KnowledgeBlog
Copyright 2002, Michael Bales

systems, information, data, concepts, knowledge, ideas, people, computers, the world, interrelationships

Archive, May - September, 2001

An exciting new world
We are entering an exciting new world.  The growing amount of text on the web, paired with ever-faster processing speeds, gives us an unprecedented opportunity to analyze and represent human knowledge.  Automated techniques can be used to process large volumes of unstructured text.  When the resulting data are analyzed, associations and patterns emerge between concepts. These methods are, by nature, imprecise; however, conclusions known to be especially imprecise can be disregarded.  The results could improve interdisciplinary collaboration and encourage scientific progress and discovery.

Michael Bales

Feel free to e-mail me.  I would love to hear from you!

mikeybales@yahoo.com

 

Sunday, August 19, 2001

I just took a nap.  As I was falling asleep, I was thinking about the alien who encoded all the world's knowledge by measuring and then drawing a line on a stick (see journal entries from 5-30-98 and 7-8-01).  The implications are interesting… we can encode information in a similar way, in a highly compressed format.  Let's say that my stick looks like this:

 

It has 40 positions.

              |

Now I have made a mark on the stick in position 16.  In doing so, I have identified the sixteenth element in a given, ordered set of 40.

   D       6       h        1 2     p

Now I have added additional information.  Notice that the possible combinations of information encodable on this stick are nearly endless.  The stick, 40 characters wide, has mostly blank spots.  The blank spots, seemingly an absence of information, actually contribute to the meaning conveyed by the stick.

What if each item on the stick were an element of information?  If we can place ASCII characters there, then we can certainly place other things there.  The proximity of A to B indicates its similarity to B.  What we have here is a condensed version of the galaxy mapping tool described earlier.  The key is that blank space, the distance between entities, contributes infinitely more information to the system than the identity or even the positions of the entities.

Well, we're at Caribou on a Sunday morning.  The birds are singing in the trees.  Maybe the word "bird*" shows up often near the words "sing*" and "tree*".  So I was thinking this morning -- what should I do if I really want to create systems to benefit humanity.  And I decided I should capitalize on already-existing systems.  And that I should invest in the hardware I'll need, in case I am overtaken by any bright ideas.

For example, if I were to build a system today, I would create a prototype "all human knowledge navigation system" based exclusively on statistics.  Here's what I mean: you start with the initial term "bird".  You see "bird" appear on the screen, along with clickable related words (words in bird's "semantic neighborhood".)  And then... wait, no, that's already been invented.  Well, what it would be really nice to have... What would I want, if I could have an ideal computer helper... no, reference tool... no, knowledge discoverer?  I still like the computer assistant that would kind of hang out and contribute mostly irrelevant information to a discussion or brainstorming session.
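
The "semantic neighborhood" idea can be sketched with nothing but counting.  What follows is a minimal illustration, not a real system: it assumes two words are related whenever they share a sentence, and the function names, the tiny stop-word list, and the sample sentences are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real system would need a far larger one.
STOP_WORDS = {"the", "a", "an", "in", "on", "of", "and", "is", "are", "to", "at"}

def tokenize(sentence):
    """Lowercase a sentence and return its words, minus stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def neighborhood(term, sentences, top_n=5):
    """Return the words that most often share a sentence with `term`.

    `term` is matched as a prefix, mimicking the "bird*" wildcard.
    """
    counts = Counter()
    for sentence in sentences:
        words = tokenize(sentence)
        if any(w.startswith(term) for w in words):
            counts.update(w for w in words if not w.startswith(term))
    return counts.most_common(top_n)

# Made-up sample sentences.
sentences = [
    "The birds are singing in the trees.",
    "A bird sings on the tree outside.",
    "Birds sing in trees at dawn.",
    "The stock market fell on Friday.",
]
print(neighborhood("bird", sentences))
```

With a large enough corpus, the returned words could become the clickable related terms shown next to "bird".  Note that "sing", "sings" and "singing" are counted separately here; merging them would need stemming, which this sketch leaves out.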

OK, here's the challenge: Given A -- that I believe a computer system can present a tremendous advantage for finding mathematical associations between terms, and that a computer system can map the nature of relationships between terms... can a computer system deduce relationships between a new term with given characteristics, and other terms it already "knows"?  Yes.  We already know that to be the case in certain circumstances.  Actually, I forgot the "challenge" that I was about to describe because I got distracted.  But I'll think of it again sometime.

So the real question is: can a computer system mine the enormous corpus of text on the web and, using mainly statistical methods and existing semantic maps, generate output that is truly useful?  Yes, given the right tools and some time, I can prove that.  That's a challenge.  OK, I am confident that I could develop such a system by cobbling together already-existing tools and by adding my own magic.  So then what would happen?  I would be driven to push the envelope -- to figure out the true extent of such a system's capabilities, eventually to arrive at a point where I am confident I have run up against the limit of this method.  This, of course, would probably be a naive assumption.

Or, maybe my current assumptions are naive:  Such a system seems limitless.  I need to know just how limitless.

But the belief that it's possible just won't go away.  So why not get to work?  I can probably think of a few realistic types of "truly useful" output.  Right?  OK, let's start with the obvious: developing data about word associations.  So if someone wants to know what's related to "hybridomas" -- just in a purely statistical sense -- such a system could help.  But let's look at a real problem.  Cancer.  Pollution.  Overpopulation.  World hunger.  SETI.  I don't want my system to be conceptually complex.  How will it work?  What can it do?  How can it transcend certain abilities of human thought simply with computational power?  Data, statistics, truth.  Unambiguous, plain, scientific truth.  Shattering human misconceptions.  Logical deduction.  No room for argument or error.  So I want to develop a system that will tell the truth?

Then what happened to fuzzy logic, Bayesian statistics, chaos theory, relativity, "the world is not black and white"?  What is truth?  Is truth relative?  See, computers are good at math.  Pure math, one plus one is truly two.  Some people argue that since math is a human construction...

But what is true in the world of humans changes over time.  My computer could say that most Americans are under 38 years old, but next year most Americans will not be under 38 years old.  So there is a certain realm of knowledge for my type of system.  It's that realm that deals with interconnectedness between constants.  Variables have no place in this system's backbone, a rigid ontology.

So if this system can tell the truth, then can I ask it a question like "is a dog an animal"?  Yes, it will say "yes".  Is that artificial intelligence?  No, I'm not trying to design an artificial intelligence system.  It's just that I KNOW we can get useful information out of the corpus of textual information on the Web, and in a way that people don't expect.

OK, here's something funny that I just noticed -- I went to Google to do a search for "semantic neighborhood", not certain that I was using the best term.  And I remarked that not even I, I who am interested in this area of technology, know of any way to figure out in a standard way WHAT CONCEPTS REALLY ARE RELATED to the term "semantic neighborhood"!!!

Friday, August 3, 2001

Wouldn't it be neat if there were a computer system that could tell you interesting things?   I mean things that are really interesting to you, based on your interests?  I know that this sort of thing already exists in the marketing world -- companies use profiles and other similar instruments to send customers tailored advertisements.

True, I could always do a Web search on my favorite things -- musicians, science topics, Volkswagen New Beetle, boomerangs, etc., and learn a lot more… but I'm picky.  I want the results in a certain format, like this:

  • The Volkswagen New Beetle is the top-selling Volkswagen in North America.

  • Mark O'Connor's main musical influences stem from the old time music traditions of the American South.

  • The conferences related to visualization are as follows: SIGIR '01, Dallas, Texas; Information Retrieval Symposium, London, England; etc.

These are just made-up examples and the facts may not be true.  But I want my information to come to me automatically in this format.  I want the computer system to find sentences that:

  1. Contain my topic of interest.

  2. State a fact about that topic or how it is related to other topics.

I want the system to indicate the source of the information, with hyperlinks, and allow me to expand any of the sentences to see surrounding sentences if desired.  And as I learn about new topics of interest, I can add them to the list.  That way, my computer system continually sends me new and interesting information about my topics of interest; I can read them as I currently read the morning paper.  I expect and accept that maybe 95% of the sentences will not be that interesting.
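
The two criteria above can be roughed out in a few lines.  This is only a sketch under loud assumptions: the documents are already downloaded into a dictionary keyed by URL, a sentence "states a fact" if it merely contains " is " or " are ", and the URLs and text are placeholders (the sample sentence reuses one of the made-up examples above).

```python
import re

def find_fact_sentences(topics, documents):
    """Return (topic, sentence, url) for sentences that mention a topic.

    `documents` maps a source URL to its plain text. A sentence "states a
    fact" here only in a crude sense: it must contain " is " or " are ".
    """
    hits = []
    for url, text in documents.items():
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            lowered = sentence.lower()
            if " is " not in lowered and " are " not in lowered:
                continue
            for topic in topics:
                if topic.lower() in lowered:
                    hits.append((topic, sentence, url))
    return hits

# Placeholder documents; neither the URLs nor the "facts" are real.
documents = {
    "http://example.com/cars": (
        "The Volkswagen New Beetle is the top-selling Volkswagen in "
        "North America. I saw one yesterday."
    ),
    "http://example.com/toys": "Boomerangs! Who throws them?",
}
for topic, sentence, url in find_fact_sentences(["New Beetle", "boomerangs"], documents):
    print(f"* {sentence} [{url}]")
```

Keeping the URL with each sentence is what would let the reader expand a hit to see its surrounding sentences.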

Sunday, July 29, 2001

As Jessica studied, I sat on the couch staring at my pencil.  Eventually I wrote: This kind of pencil -- what is a satisfactory definition?  One that permits some inference about how it is related to other objects, by their definitions.  This kind of sock.  Universal identifier.  Is "capturing the essence" sufficient?  Can the overall shape of the signature be an approximation, on the macro scale, with details apparent upon closer look, e.g. (and I drew a shape that had a distinct overall shape with a detailed surface).

To devise a method for universal identifiers to be generated automatically based on information available to the system; can be regenerated when new information is discovered.  Question -- can I set up an algorithm to process input from the Internet and produce a digital signature as output for any given concept?  Maybe, but it will not be the concept's universal identifier; it will only be the concept's digital signature based on that domain (the Internet).

For simplicity, the concept is… Kofi Annan, the Secretary General of the United Nations.  Well if I apply the search technique (described earlier) to his name, and only sentences containing Kofi Annan are returned -- then these sentences are merged into a single text file, then this text file, along with many others, could form the input for a galaxy map of all human knowledge.  Or, just use concepts related to Kofi Annan and make a galaxy map just based on those concepts -- and this map is the digital signature for Kofi Annan.

Yes, with enough detail and at a high enough resolution, this digital signature for Kofi Annan could have a visual shape.  Kofi Annan's galaxy map as subset of overall galaxy (I drew a picture of a series of dots in a particular shape on a Cartesian plane.)  The United Nations' galaxy map as a subset of overall galaxy (A similar-looking series of dots; parts that match Kofi Annan's galaxy map are circled.)  Note the parts in common (circled).  Same shape!  Recognizable by human or by computer.

At this point it was time to go to bed.  I lay awake thinking.  Unable to sleep and still conceptualizing new ideas, I used a new pen with a built-in flashlight:

Dog - ball -- simple human concepts -- Previous galaxy map sequences predictive in regular conversation -- lost the ball.  A story makes a galaxy map movie; a galaxy map movie can be condensed visually into one galaxy map.  If people are talking about George W. Bush and Iraq, and the word invade is said, then other similar conversations are retrieved and the computer could chime in, simulating intelligence.

Context is taken care of too -- the way "invade" is being used here is clear because of the words said around the word invade.  Maybe there are other uses, like those pertaining to the video game Space Invaders -- but the main portion of invade’s galaxy map includes some of the same words that are used in the conversation.  It is therefore clear that the use of the word invade is its main use.  As a result, only that meaning of invade is used in the galaxy map movie.  Also, the shape of invade can be approximated (if only the top ten associated concepts are mapped) -- or revealed in greater detail (if the top 100 or 1000 associated concepts are mapped.)  So the "resolution" of the galaxy map movie is variable.

Note this model describes the extent of relationships between concepts but not the nature of those relationships.

Application -- a simple chatterbox.  I say "salt"; it says "pepper".  I say "peas porridge hot"; it says "peas porridge cold, peas porridge in the pot, nine days old."  I say Beatles drummer; it says the top concept or two from the place where Beatles' and drummer's image maps ("semantic maps?") overlap.  ("Concept maps").  There may already be a computational linguistics term…
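
A toy version of the chatterbox, assuming the association strengths have already been mined from text.  The table below is entirely made up, and `reply` is a hypothetical name; the point is only to show the "overlap" step for a two-word prompt.

```python
# Made-up association strengths, standing in for data mined from text.
ASSOCIATIONS = {
    "salt": {"pepper": 9, "sea": 4, "shaker": 3},
    "beatles": {"lennon": 9, "mccartney": 9, "ringo": 6, "liverpool": 5},
    "drummer": {"drums": 9, "ringo": 5, "band": 4},
}

def reply(prompt):
    """Answer with the strongest concept shared by every word's neighborhood."""
    words = [w for w in prompt.lower().split() if w in ASSOCIATIONS]
    if not words:
        return None
    # Start from the first word's neighbors and keep only the shared ones.
    shared = set(ASSOCIATIONS[words[0]])
    for w in words[1:]:
        shared &= set(ASSOCIATIONS[w])
    if not shared:
        return None
    # Score each shared concept by its total strength across the prompt words.
    return max(shared, key=lambda c: sum(ASSOCIATIONS[w][c] for w in words))

print(reply("salt"))
print(reply("Beatles drummer"))
```

For a single word the "overlap" is just that word's own neighborhood, so the same function covers both the salt case and the Beatles drummer case.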

This is simulated intelligence though not truly artificial intelligence.  But it is certainly fun and possibly useful.  A good conversation mate?  Sellable?

Editorial note:  I should mention that I wrote on the back of a printed web page from http://www.pnl.gov/infoviz/papers.html, which I had read earlier that evening.  This contained abstracts from several papers, including:

Wong PC, Foote H, Leung R, Jurrus E, Adams D, and Thomas J. Vector Fields Simplification -- A Case Study of Visualizing Climate Modeling and Simulation Data Sets.  Proceedings IEEE Visualization 2000.  Salt Lake City, Utah, Oct 8 - Oct 13, 2000

Wong PC, Foote H, Leung R, Adams D, Thomas J.  Data Signatures and Visualization of Very Large Datasets.  IEEE Computer Graphics and Applications, Vol 20, No 2, March 2000.

Hetzler, Beth, Harris WM, Havre S, Whitney P.  Visualizing the Full Spectrum of Document Relationships.  In: Structures and Relations in Knowledge Organization.  Proc. 5th Int. ISKO Conf.  Würzburg: ERGON Verlag, 1998. Pp. 168-175.

Sunday, July 8, 2001

Jess and I were debating whether to go see the movie "Artificial Intelligence" tonight.  I decided that rather than watch a movie about AI, I should create an artificial intelligence system.  One that doesn't really think, but... tendencies... fuzzy logic... Bayesian, neural networks... some way to make that delicate transition from the discrete, dichotomous world of the binary circuit into the real world of usually and often and "most of the time".  It's like making an analog-to-digital conversion -- no, it is, basically, the analog to digital conversion.  This seems so, well, unlikely to happen in an ideal way.  So maybe artificial intelligence won't happen in the silicon world but in some other way.

The analog signal.  A representation of our world.  Continuous -- a mathematical function with an infinite number of X's and corresponding Y's.  Can be digitized at any resolution.....

The digitized signal is a portrait that can, in turn, be compared mathematically with other signals, yielding a measure of similarity or dissimilarity.  The digitized signal can also be represented visually, at a low or high resolution.
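
As a small illustration of digitizing at a chosen resolution and then comparing signals, here is a sketch that samples a function on the interval [0, 1] and uses cosine similarity as the measure.  Cosine similarity is my choice for the example, not something the notes prescribe, and the three test signals are arbitrary.

```python
import math

def digitize(signal, resolution):
    """Sample a continuous function on [0, 1] at `resolution` evenly spaced points."""
    return [signal(i / (resolution - 1)) for i in range(resolution)]

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length sample lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Three arbitrary "analog" signals: a sine wave, a slightly shifted copy, and a cosine.
wave = digitize(lambda x: math.sin(2 * math.pi * x), 64)
shifted = digitize(lambda x: math.sin(2 * math.pi * x + 0.1), 64)
other = digitize(lambda x: math.cos(2 * math.pi * x), 64)

print(cosine_similarity(wave, shifted))  # close to 1: nearly the same shape
print(cosine_similarity(wave, other))    # close to 0: very different shape
```

Raising or lowering the second argument of `digitize` changes the resolution of the portrait without changing how the comparison works.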

Can an artificially intelligent entity wander around our world and "learn" from spoken word?

Can every concept in the universe be represented with a short or long binary signature?  Can the binary signature be standardized in a way that every artificially intelligent entity could figure out the appropriate binary signature for a concept?  Would it phone the "mother ship" for clarification?  Could it add new knowledge to the mother ship's database?  How would this help humanity?

Could it collect other kinds of information?  Visual?

Here's a story I heard from my Korean friend in grad school, Won Suk Yoo.  There once was an alien who had the job of recording all the world's knowledge, but his brain wasn't big enough to hold all the information.  So he took a stick, measured, made a mark on the stick, and brought it back to his planet.  Mission accomplished.  How did he do it?  He converted all the world's knowledge into binary format, giving him a binary number with many digits.  He converted that number into a decimal number between zero and one by adding a decimal point at the beginning of the number.  Then he measured very carefully and made a small mark at that point on the stick, where 0 was one end of the stick and 1 was the other end.  Then he remeasured when he was back on his planet, and wrote down all the world's knowledge.  That clever alien!
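
The alien's trick can be imitated exactly on a computer by using a fraction instead of a physical stick.  In this sketch the point goes in front of the binary digits rather than decimal ones, and the number of bits is carried along with the mark, since leading zeros would otherwise be lost; that extra piece is my assumption, not part of the story.

```python
from fractions import Fraction

def encode_on_stick(text):
    """Turn text into an exact position in [0, 1) plus the number of bits used."""
    data = text.encode("utf-8")
    bits = len(data) * 8
    number = int.from_bytes(data, "big")
    # Placing a point in front of the binary digits divides by 2**bits.
    return Fraction(number, 2 ** bits), bits

def decode_from_stick(mark, bits):
    """Recover the text from the mark, given how many bits were encoded."""
    number = int(mark * 2 ** bits)
    return number.to_bytes(bits // 8, "big").decode("utf-8")

mark, bits = encode_on_stick("all the world's knowledge")
print(float(mark))                    # where on the stick to draw the line
print(decode_from_stick(mark, bits))  # the remeasured message
```

Each additional byte of text requires the mark to be placed 256 times more precisely, which is why this works with exact fractions but not with a real stick and a ruler.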

Anyway, if every concept could be represented mathematically, then maybe the degree of similarity between two concepts could be determined using the signature.  So the binary signature wouldn't be just the label for a concept, it would be a representation of the essential information about the concept as well.  But it's not a human's version of the essential information about the concept!  Let the artificially intelligent entities determine the binary signatures.  How?  Comparatively?  Based on other things that are already known?  Based on the physical characteristics of the object?  For abstract concepts, based on the words that are used to describe the concept?

I'm out of ideas, so I stare at the wall.  I see wood paneling.  I make the immediate deduction that I am not staring at a single object, but at a number of instantiations of the same object lined up side-by-side.  How do I know this?  I can't see the gaps between the panels.  I guess I know it for two reasons: first, the name of the object is "wood paneling", and I know that "paneling" is installed in this way.  Second, I guess I made a call to my own "mother ship" for more information on paneling.  I made a database query, of sorts.

Tuesday, May 15, 2001

Notes I took while listening to speakers at the AMIA spring symposium on Public Health Informatics:

How my system (using existing systems) might work:  It has input of data + information and metadata.  It organizes that information -- lots of it -- using best guess from natural language processing.  Maybe it can auto-assign relationships between things based on |relationship phrases| appearing near various concepts that appear together.  So the NLP can ID the concepts and their guessed relationship -- OR it can be done just quantitatively (much less processing time).  Then once it auto assigns the relationships, it can take a concept, map how it is related to other concepts, then do so with many other concepts.  Then it can run a query and output analogies: Call it Analogy Finder.  "Did you know that x is to y as a is to b?"
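
Once relationships have been assigned, the Analogy Finder query itself is simple.  A minimal sketch, assuming the relationships are already stored as (subject, relationship, object) triples; the triples below are made up, and the hard part (extracting them from prose) is not shown.

```python
from collections import defaultdict
from itertools import combinations

def find_analogies(triples):
    """Yield 'x is to y as a is to b' for pairs of triples sharing a relationship."""
    by_relation = defaultdict(list)
    for subject, relation, obj in triples:
        by_relation[relation].append((subject, obj))
    for relation, pairs in by_relation.items():
        for (x, y), (a, b) in combinations(pairs, 2):
            yield f"{x} is to {y} as {a} is to {b} ({relation})"

# Made-up triples standing in for auto-assigned relationships.
triples = [
    ("puppy", "young of", "dog"),
    ("kitten", "young of", "cat"),
    ("Paris", "capital of", "France"),
]
for analogy in find_analogies(triples):
    print("Did you know that " + analogy + "?")
```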

A system designed to know things about all domains of knowledge.  How about one that people can contribute to?  NO, human element is too chaotic and unpredictable.

Here's a dot com that might actually make money -- a web site -- you have to log in -- and it costs 5 cents per use -- and it's like automated 'ready reference' -- OK, let someone else do it and make a lot of money. But here is a simple way to do this one.  First, do it with one domain of knowledge (maybe use Copernic, then search for all docs containing "Santa Claus" or something.)  Then NLP all the sentences with Santa Claus only.  If you want system only to be able to answer questions containing "Santa Claus", or if you want it to output a report on Santa Claus.  Or automatically generate a term paper.  Yes, and then the system keeps that term paper (with hyperlinks) organized into paragraphs with related ideas.  Then when the end user searches on Santa Claus, he or she will get that document!  But maybe the term papers will have to be edited by humans, maybe.

OK, but what would people really like?  What would people really find useful?  No, what would revolutionize science?  OK, any system that could do any of these would be OK.

But people may not know what they need -- so don't think in those terms.  Create a powerful yet elegant system, and people will find ways to use it, right?

I need to subscribe to, or read, Science and/or Nature.  Discovery about mammalian brains using existing data.  Discovering things about human knowledge.

What about a shared information space, a dynamic environment where people can share ideas visually and communicate at the speed of speech.  The power of human thought would be incredible if people could think in parallel.  Then people could make the deductions from all the information provided by my system.

-OR- how about a computer system that can help people think or brainstorm about a topic, or organize thoughts and contribute new ideas or at least propose related concepts.  And it's visual.  You can move words around by grabbing them with your hand, and maybe the concepts have ghost ideas around them that you can activate.  When you activate them, other, related concept ghosts appear.  At the end, you have followed the path from idea A to idea K, and the system paints a picture -- some kind of colorful graphical representation -- as well as a textual representation -- of the idea chain.  Could be used by scientists for hypothesis generation… or collaboratively, if 2 or 3 people are sharing the info space.  It could also be used as a writing or next-generation word processing tool, complete with voice recognition maybe.  And you can choose the right word from a list of options by touching it.  Could also be used for exercise -- it's more interesting than walking on a treadmill -- and you can do it with loved ones.  By the way, whatever happened to AltaVista LiveTopics???

Eventually, data entry will be voice activated.  To confirm your choice, a voice will repeat what you said and it will flash on the screen.

Saturday, May 12, 2001

A thought as I was falling asleep last night -- why always do, do, do?  What is motivating me to keep moving?  Is it the fact that Baleses must always be busy?  Maybe it's the "no idleness, no laziness, no procrastination" written above the stage in my high school auditorium.  But I seem to be cursed… and especially now.  Yesterday I went to a presentation on "Knowledge Discovery through Visualization", by Battelle/Pacific Northwest National Laboratory.  They demonstrated about ten paradigms for visualization of large data sets, and most of their input was -- get this -- FREE TEXT!!!  It's as though they were planted in my life, at this very moment, for some higher purpose.  So now my head is swimming with the possibilities.

Here's a music-related one.  On the Internet right now, some people represent Irish tunes using a format called ABC.  For example, here is one of my favorite tunes in ABC format (from http://rigel.csuchico.edu/~pubscout/tunes/swallowtail.jig.html)

X:1
T:The Swallowtail Jig
M:6/8
L:1/8
K:EDor
(EF)|:GEE BEE|GEE (BAG)|FDD ADD|(dcd) (AGF)|
GEE BEE|GEE (B2 c)|(dcd) (AGF)|1 (GE)E EEF:|2(GE)E E2 B|
|:(Bcd) (e2 f)|(e2 f) (edB)|(Bcd) (e2 f)|(edB) (d2 d)|
(Bcd) (e2 f)|(e2 f) (edB)|(dcd) (AGA)|1 (GE)D E2 B :|2 (GE)D E3 |>|

Now, if a great number of these were put into a database that understood how they work, it wouldn't be difficult to create a map using multidimensional vectors.  And this information could be collapsed at a very high resolution onto 2D space, and voila!!  Tunes that are related would cluster together.  You could do one for reels, another for jigs, etc.  I know that people at Fado would love this, especially if the system used only commonly-known tunes.  It could be a good way to know what tunes logically can follow one another, for example.
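
A very rough first step toward such a map: turn each tune into a vector and compare the vectors.  This sketch assumes the crudest possible representation, a count of each note letter with octave, rhythm, and key ignored, and it stops short of the 2D projection.  The three tune fragments are cut down or invented for the example.

```python
import math
import re
from collections import Counter

def note_vector(abc_text):
    """Count note letters in the body of an ABC tune, ignoring octave and length."""
    counts = Counter()
    for line in abc_text.splitlines():
        if re.match(r"^[A-Za-z]:", line):   # header field such as "K:EDor"
            continue
        counts.update(ch.upper() for ch in line if ch in "ABCDEFGabcdefg")
    return counts

def similarity(tune_a, tune_b):
    """Cosine similarity between the note-count vectors of two tunes."""
    va, vb = note_vector(tune_a), note_vector(tune_b)
    dot = sum(va[n] * vb[n] for n in "ABCDEFG")
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

# Short fragments for illustration only; the last one is invented.
jig = "K:EDor\nGEE BEE|GEE BAG|"
close = "K:EDor\nGEE BEE|FDD ADD|"
far = "K:C\nCCC FFF|CCC FFF|"

print(similarity(jig, close))
print(similarity(jig, far))
```

A vector that also captured intervals or rhythm would presumably cluster tunes far better; the letter counts are only the simplest thing that makes the comparison runnable.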

Wednesday, May 9, 2001

Researched the Semantic Web today.  It's clear that my vision requires a dictionary that specifies the exact nature of relationships.  The relationships database might have the following fields: Unique ID, the relationship, hierarchical nature, whether fuzzy, any number of ways that the relationship is expressed in English.  Actually, this last field might better be assigned to the natural language parser, which would need to know the many ways a particular relationship can be expressed so that it can create the assertions from the free prose.  It would be interesting to know what percent of the bits of displayed (crawlable) info are in (non semantic webbified) prose.
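
One way to picture a row of that relationships database is as a small record type.  The field names follow the list above; the types (for example, treating "hierarchical nature" and "whether fuzzy" as simple yes/no flags), the ID format, and the `matches` helper are assumptions made for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Relationship:
    unique_id: str
    name: str
    hierarchical: bool            # assumption: a yes/no flag
    fuzzy: bool
    english_phrasings: list = field(default_factory=list)

    def matches(self, sentence):
        """True if any known English phrasing occurs in the sentence."""
        lowered = sentence.lower()
        return any(p.lower() in lowered for p in self.english_phrasings)

# Hypothetical example row.
is_a = Relationship(
    unique_id="REL-0001",
    name="is a kind of",
    hierarchical=True,
    fuzzy=False,
    english_phrasings=["is a kind of", "is a type of", "is a"],
)
print(is_a.matches("A dog is a kind of animal."))
```

If the phrasings moved into the natural language parser instead, as suggested above, the record would shrink to the first four fields.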

What makes my idea different from the ideas behind the semantic web?  I propose to convert certain information -- selected sentences among the prose information on the Web -- into semantic information in a single master database and search engine capable of responding to search queries with clear, logical assertions (I accept that many will be irrelevant).

Here's an interesting kind of query:  Given a story or situation in which A is related in a particular manner to B, and B to C, C to D, and D to E, how many other A, B, C, D, and E exist on the web (PERHAPS IN OTHER DOMAINS OF KNOWLEDGE) with the same exact (or similar) set of relationships?  Maybe we really can learn from history.  Or, if we enter information about our current personal situation, we can learn about other similar situations in the past.  If we read about how things were related in the past situation, we can potentially gain some insight into our current situation.
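
If the relationships were already stored as triples, this kind of query could be sketched as a search for chains with the same sequence of relationships.  The code below handles only exact matches on relationship names, and the triples are invented to show two unrelated domains with the same shape.

```python
def relation_chains(triples, length):
    """Return every path of `length` consecutive triples as (entities, relations)."""
    outgoing = {}
    for subject, relation, obj in triples:
        outgoing.setdefault(subject, []).append((relation, obj))

    def extend(entities, relations):
        if len(relations) == length:
            yield tuple(entities), tuple(relations)
            return
        for relation, obj in outgoing.get(entities[-1], []):
            if obj not in entities:          # avoid walking in circles
                yield from extend(entities + [obj], relations + [relation])

    for start in outgoing:
        yield from extend([start], [])

def similar_situations(triples, pattern):
    """Find entity chains whose relationships match `pattern` exactly."""
    pattern = tuple(pattern)
    return [ents for ents, rels in relation_chains(triples, len(pattern))
            if rels == pattern]

# Invented triples from two different domains of knowledge.
triples = [
    ("drought", "causes", "crop failure"),
    ("crop failure", "causes", "famine"),
    ("overfishing", "causes", "stock collapse"),
    ("stock collapse", "causes", "job losses"),
    ("famine", "prompts", "migration"),
]
print(similar_situations(triples, ["causes", "causes"]))
```

Allowing "similar" rather than identical relationships would need some measure of closeness between relationship names, which is left out here.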

I'm convinced that the universe of data and information on the web can be harvested.  This harvest will reap not only new hypotheses, but new knowledge.

Tuesday, May 8, 2001

At Fado last night I scribbled the following on a piece of paper:

The system feeds on the data/information on the WWW.  It uses natural language processing to parse the information.  The exact nature of the relationship between subjects is described.  The system records such connections in a database.  The system deduces things based on these relationships, creating new knowledge.  It answers questions with existing knowledge and new knowledge.

A is equal to B
B is equal to C

Therefore, A is equal to C.

T. Crowley said stocks usually go up after the signing of an international trade agreement.
The European xyz pact was an international trade agreement.
Therefore, stocks may go up.

It can use fuzzy logic and probabilities -- or,
It can be strictly based on "black and white" factual relationships.

My friend Katherine, another fiddler, wondered what I was doing and I explained briefly.  She mentioned that it sounded a lot like a story she'd heard on NPR about two days ago, about a scientist in the 1930s who proved that every discipline has things that cannot be proven.  She said they mentioned something about the Semantic Web.  When I woke up this morning I searched around on NPR's web site and found out that the NPR story was about Gödel’s incompleteness theorems.  During lunch today I came home and developed a simple model of a knowledge-based system that can deduce new information.  It looks like this:

Subject | Relationship | Object | Deduction: Creation of new knowledge
--------|--------------|--------|--------------------------------------
Atlanta traffic | increasing | |
Atlanta drivers | waste | time |
Atlanta's population | grows faster than | Atlanta's transportation system |
Atlanta's population | burned | fuel |
Fuel prices | depend upon | crude oil prices |
Crude oil prices | have | cost |
Cost of crude oil prices | increasing | | Cost of crude oil prices increasing; fuel prices depend upon crude oil prices; therefore, fuel prices are increasing.  (This information gets added to the system -- see next row)
Fuel prices | increasing | |
Michael | owns | car |
Cars | have | wheels | Michael owns a car and cars have wheels; therefore, Michael owns wheels.
Michael | owns | wheels |

Editorial note, 4-14-02: It doesn't make sense to say "Michael owns wheels."  This is common sense.
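
The fuel-price deduction in the model can be sketched as one hand-written rule applied repeatedly until nothing new turns up.  To keep it runnable, the sketch simplifies "cost of crude oil prices increasing" to "crude oil prices increasing", lowercases the subjects so they match, and adds a made-up "taxi fares" fact to show the deduced knowledge being reused.

```python
def deduce(facts):
    """Forward-chain one rule until nothing new appears.

    Rule: (X, "increasing", None) and (Y, "depend upon", X)
          => (Y, "increasing", None)
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        rising = {s for s, r, o in known if r == "increasing"}
        for subject, relation, obj in list(known):
            new_fact = (subject, "increasing", None)
            if relation == "depend upon" and obj in rising and new_fact not in known:
                known.add(new_fact)
                changed = True
    # Report only the knowledge that was not in the input.
    return known - set(facts)

facts = [
    ("crude oil prices", "increasing", None),
    ("fuel prices", "depend upon", "crude oil prices"),
    ("taxi fares", "depend upon", "fuel prices"),   # made-up extra fact
]
print(deduce(facts))
```

A similarly mechanical rule for "owns" plus "have" would produce "Michael owns wheels", so each rule needs some judgment about which relationships can safely be chained.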

After dinner today, Jessica and I discussed the limitations of this type of system.  I decided that if it were working correctly, it would come up with opposite conclusions.  It would also be unable to distinguish fact from opinion.  I thought of restricting the system to scientific journals, and Jessica pointed out that studies can reach opposite conclusions.  I became somewhat discouraged and we left for a walk.

Undaunted, I continued thinking.  When we had almost arrived at the Lullwater parking garage, I exclaimed that I had discovered another potential use for my system from yesterday.  The system could find many sentences on the Web containing "John Adams" and use this to form a data set of sentences.  The words appearing in these sentences could be given positive or negative values:  A word like "suspicious" would receive a value of two or three out of ten, while a word like "exemplary" would receive a nine or a ten.  The average value would be a score for John Adams; this score could be compared with the scores of other presidents.  The results might lend some insight into public approval of various presidents.
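
The scoring idea fits in a few lines, assuming a lexicon of word values already exists.  The four values and the sample sentences below are made up purely for illustration and say nothing about any actual president.

```python
import re

# Made-up word values on a 1-10 scale; a real lexicon would be far larger.
WORD_VALUES = {"suspicious": 2, "stubborn": 3, "honest": 8, "exemplary": 10}

def score(name, sentences):
    """Average the values of rated words in sentences that mention `name`."""
    values = []
    for sentence in sentences:
        if name.lower() not in sentence.lower():
            continue
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word in WORD_VALUES:
                values.append(WORD_VALUES[word])
    return sum(values) / len(values) if values else None

# Placeholder sentences, not real quotations.
sentences = [
    "John Adams was honest but stubborn.",
    "Many found John Adams exemplary.",
    "The weather was suspicious.",
]
print(score("John Adams", sentences))
```

The third sentence is ignored because it never mentions the name, so its "suspicious" does not drag the score down.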

Now that I think about it, I think that similar experiments were described in the book "Learned Optimism" to describe why more optimistic candidates, who use more positive words and ideas in speeches, tend to win more often.

While we walked, we discussed these ideas further.  I described the web as "a universe of data and information just waiting to be analyzed".  Jessica said that some of the information on the web can be used as-is, but most of it needs to be processed in some way before it could be useful in an automated system.  I agreed, but explained that for a long time I've been under the impression that knowledge discovery on the web would need to be limited to those few organized data sets -- to data in the rectangular table, or row-and-column, spreadsheet format.  But now I believe otherwise.

I gave her an example of a method for turning simple prose into useful information that could be entered into a data table.  Let's say you have an essay about John Adams.  I know that there is a computer program that can make a list of every word in that essay, and how many times each word appears.  This data could be entered into a data table, where the first column would contain the words and the second the number of occurrences of each word.  I realized quickly that this data would no longer carry the meaning of the original essay, because we would lose the way the words fit together.  I said it's like taking the derivative in calculus.  What's the derivative of x^2? 2x!  And what information does one lose when moving from x squared to 2x?  You know that there are two x's, but you no longer know the relationship between them -- the fact that they are supposed to be multiplied together.
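
Both halves of that observation, the word table and the loss of word order, show up in a short sketch.  The two sample sentences are just illustrations of the same words in a different order.

```python
import re
from collections import Counter

def word_table(text):
    """Return (word, count) rows, most frequent first."""
    return Counter(re.findall(r"[a-z']+", text.lower())).most_common()

original = "Adams defeated Jefferson"
scrambled = "Jefferson defeated Adams"

for word, count in word_table(original):
    print(f"{word}\t{count}")

# Same table, different meaning: the order of the words is gone.
print(dict(word_table(original)) == dict(word_table(scrambled)))
```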

There are a number of other ideas to explore.  So many words and ideas floating around in my head!  Bayesian networks, neural networks, rule-based systems, expert systems, knowledge discovery in databases (KDD), data warehousing, data mining, lexical theory, translation to different languages, etymology, linguistic theory, computational linguistics, heuristics.

Sunday, May 6, 2001

I'm reading my journal entry from 12-14-97, and it has a most exciting idea!  I know that the information on the Web could be made much more accessible.  Jessica just told me that King Richard III was suspected of killing his nephews.  The information in this sentence could be parsed automatically -- it has a subject, King Richard III; a verb, suspected of killing; and a direct object, his nephews.  With some fairly simple programming I could create a system that crawls the web searching for these types of sentences -- and only simple sentences at first, sentences that match this kind of pattern.  How will it know what's a verb form?  Well, it could analyze the sentence and assign probabilities to each of the words, according to type.  When a certain level of confidence is achieved, it would put the words into a database.  King Richard III, let's say he already has a record in the "subjects" table -- well, he would get another record.  In the "sentence" field, the sentence; in the "hyperlink" field, a link to the information; in the "verb" field, 'was suspected of killing', and in the "direct object" field, 'his nephews'.

Then, if a user asks question A, B, or C below, the response will be the original information -- hyperlinked to that information.

A) Whom was King Richard III suspected of killing?
B)  What did King Richard III do to his nephews?
C)  Which king was suspected of killing his nephews?
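
Here is a minimal sketch of the record and the three lookups.  It swaps the probability-based word typing described above for something much simpler, a fixed list of verb phrases the parser already knows, and the hyperlink is a placeholder.

```python
import re

# Hypothetical list of verb phrases the parser recognizes.
VERB_PHRASES = ["was suspected of killing", "was the father of"]

def parse(sentence, hyperlink):
    """Split a simple sentence into subject / verb / direct object, or return None."""
    for verb in VERB_PHRASES:
        match = re.match(rf"(.+?) {re.escape(verb)} (.+?)\.?$", sentence)
        if match:
            return {"subject": match.group(1), "verb": verb,
                    "direct_object": match.group(2),
                    "sentence": sentence, "hyperlink": hyperlink}
    return None

def answer(records, **known):
    """Return (sentence, hyperlink) for records matching every known field."""
    return [(r["sentence"], r["hyperlink"]) for r in records
            if all(r[k] == v for k, v in known.items())]

records = [parse("King Richard III was suspected of killing his nephews.",
                 "http://example.com/richard-iii")]   # placeholder link

# A) Whom was King Richard III suspected of killing?
print(answer(records, subject="King Richard III", verb="was suspected of killing"))
# B) What did King Richard III do to his nephews?
print(answer(records, subject="King Richard III", direct_object="his nephews"))
# C) Which king was suspected of killing his nephews?
print(answer(records, verb="was suspected of killing", direct_object="his nephews"))
```

Each question supplies two of the three fields and gets back the original sentence with its link; turning the English question into those fields is the part this sketch skips.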

We can't expect that an entirely automated system would end up with the right information in the right place.  We also can't expect the answers to be correct or even relevant.  This may already be under development.  Is there a lexical database attempting to record all human knowledge in an organized way?

John Adams was the 2nd president of the United States.  He was born on xyz date.  He had x sons and x daughters.  He is credited with xyz.  ('Is credited with' could always be simplified to 'did').  He went to school at X.

All this information would probably need to occupy one row of a table.  Should John Adams get his own table?  If so, there would need to be a meta-index linking to the John Adams table so that people who search on John Quincy Adams would retrieve his relationship with John Adams, even if the relationship is nowhere mentioned in the John Quincy Adams table.  Of course each subject would get a unique identifier, probably a long hexadecimal code with some relationship to the topic.  It wouldn't have to be a code of any particular length.  It could be as large as necessary to accommodate the level of complexity of the subject.  What about including some information within the actual code?  What about XML strings, a web page for each subject that could be indexed by an information management product like Verity search engine?

Editorial note, 4-14-02: I now know that question answering systems are incorporating similar techniques.

All original material on KnowledgeBlog copyright Michael Bales unless otherwise noted.