On Mar 30, 2010, at 11:51 PM, Duane Nickull wrote:
My response to this very reasonable question is in the context of searching BEHIND the corporate firewall, where the Google approach is not particularly effective for a variety of reasons.
I do a search for “foo”. Rather than a list of pages that contain “foo”, I am presented with a list of contexts (for lack of a better word) that “foo” is found within. This would be something like “Your search for foo yielded results in which foo has a plurality of meanings. Please narrow your search down based on the following contexts: 1. <meaning_a> 2. <meaning_b> ...” An example might be a search for “nut” which then presents the searcher with the meanings “seed of a tree”, “slang for a male's reproductive organs”, “slang for a person who works on semantic web ideas”, etc. I would then select a context, which would then present me with a narrow set of results.
To my memory an outfit in Cambridge (Massachusetts) called Northern Light (not sure if that was the name of the website) took that approach in maybe the mid 1990s. The guiding light was a David Seuss. You'd issue the query & the response would be labeled folders down the left panel, each containing the answers for a different context. Their "magic" was to have a large number of professional librarians curating the documents being returned.
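A rough sketch of the "labeled folders" idea, for illustration only. It assumes every hit already carries a curated context label (the part the librarians did by hand); the function name, document IDs and hit format are all made up here, not anything Northern Light published.

```python
from collections import defaultdict

def group_by_context(hits):
    """Group (doc_id, context_label) pairs into one 'folder' per context."""
    folders = defaultdict(list)
    for doc_id, context in hits:
        folders[context].append(doc_id)
    return dict(folders)

# Hypothetical hits for the query "nut"; the labels come from a curator,
# not from the search engine.
hits = [
    ("doc-1", "seed of a tree"),
    ("doc-2", "slang for a person who works on semantic web ideas"),
    ("doc-3", "seed of a tree"),
]

for context, docs in group_by_context(hits).items():
    print(context, "->", docs)
```

The grouping itself is trivial; the expensive part is getting a trustworthy context label onto each document in the first place.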
To make this work, when a resource is added to the index, I would see that the person tagging a resource with a label might be prompted to disambiguate the label if the plurality of meanings is detected.
Personally I am NOT in favor of relying on authors to tag their own materials since they may be too busy, too burned out, not a good tagger, etc. Granted the original author may be well motivated for personal fame & glory to tag six ways to Sunday simply to get the exposure. Depending on the material, I suspect additional readers will need to have tagging rights.
The problem, of course, with tagging is that so far (please correct me if I've missed this) tags are applied without any documented, discernible context... assuming the next reader will have the same understanding of the term as the original tagger is something that I reject immediately. Point being: what does "NO" mean? Negative? Index? North? Number? Whence the disambiguation?
Are there tagging disambiguation mechanisms? I certainly haven't found any & I've been using tags for 20 years.
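For what it's worth, here is a minimal sketch of what such a mechanism might look like: the tagger is stopped & asked to pick a sense whenever the label has more than one known meaning. This is NOT an existing tool. The sense dictionary, the AmbiguousTag exception & the add_tag function are all hypothetical names invented for the sketch, & a real sense list would have to be curated per organization.

```python
# Hypothetical sense dictionary; the entries are only examples.
SENSES = {
    "NO": ["negative", "north", "number"],
    "NUT": ["seed of a tree", "slang for a person who works on semantic web ideas"],
}

class AmbiguousTag(Exception):
    """Raised when a label has several senses and the tagger chose none."""
    def __init__(self, label, senses):
        super().__init__(f"'{label}' is ambiguous; pick one of: {', '.join(senses)}")
        self.label = label
        self.senses = senses

def add_tag(tags, label, sense=None):
    """Append (label, sense) to tags, insisting on a sense for ambiguous labels."""
    senses = SENSES.get(label.upper(), [])
    if sense is None:
        if len(senses) > 1:
            raise AmbiguousTag(label, senses)   # this is the "prompt"
        sense = senses[0] if senses else None   # unambiguous or unknown label
    elif senses and sense not in senses:
        raise ValueError(f"unknown sense {sense!r} for label {label!r}")
    tags.append((label, sense))
    return tags

tags = []
try:
    add_tag(tags, "NO")            # no sense given, so the tagger is asked
except AmbiguousTag as err:
    print(err)
add_tag(tags, "NO", "north")       # the sense is recorded alongside the label
```

The point of storing the (label, sense) pair rather than the bare label is that the next reader no longer has to guess which "NO" was meant.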
FACT: inside an organization (I don't care if that's a work group of 5 or multiple divisions with 100,000 people) there is heavy use of industry, corporate & local slang. [See "foo" above... step away from your personal technical circle & I'd bet you'd find lots of people who'd be totally baffled by "foo." I've always wondered if there's a connection to WWII's "fubar."]
To drive that home... my little 2,000 term dictionary of admittedly short "words" (I make no distinction between a real word or an acronym, abbreviation or initialism) has some 68,000 meanings.
Googling "vocabulary problem" produces an interesting academic study where the punch line is that you have AT BEST a 20% chance of guessing the word someone else used in a software interface.
I search for “foo” and the search mechanism miraculously deduces the exact top result I am seeking. This could work with simplistic cases (like a history student searching for “berlin” getting historical results based on an IP address being detected from the history lab of a school). This would obviously not cater to 100% of the searchers but maybe the semantic web (whatever that means) might be only meant to be useful for 75% of the users. This is something worthy of consideration.
This scenario is way too much like thought control.
I'd like to see some kind of interface that ASKS some contextual, narrowing questions before doing the query. That's what reference librarians do. The clueless high schooler needs to do a paper on the Panama Canal & ASKS the reference librarian... who being a contextually sensitive carbon-based life form, asks "Have you looked at Teddy Roosevelt?" The clueless student thinks: "What does Teddy Roosevelt have to do with the Panama Canal?"
I see big danger in automatically setting context (e.g. sensing the query is coming from history lab)...
In about 1992, prior to the public Internet & when CD-ROMs were popular in libraries, I was searching a CD for "data dictionary." The CD had magazines, research & lots of resources. I got a long list & was plodding thru it when I found the name of a writer I recognized. I'd always wondered if she wrote for more than "Software Magazine." I went back to the query & searched on her name... & got just the single hit. Huh? There's 7 years of "Software Magazine" on the CD & she'd write 3-4 times per year for it. After some head scratching, it turned out that the same magazine had spelled the same author's name more than one way. Obviously there was no Soundex on the name search.
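For anyone who hasn't run into Soundex: it reduces a surname to a letter plus three digits so that similar-sounding spellings collide. Below is a small sketch of the standard American Soundex rules with a name lookup on top. The surnames are made up for illustration (they are not the author from the CD), & find_author is a hypothetical helper, not part of any product.

```python
# Letter groups of American Soundex; vowels, Y, H and W carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the 4-character Soundex code of a surname ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                  # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code                   # a vowel resets prev, so repeats count again
    return ("".join(out) + "000")[:4]

def find_author(names, query):
    """Return every name whose Soundex code matches the query's code."""
    key = soundex(query)
    return [n for n in names if soundex(n) == key]

# Hypothetical bylines as they might appear in an index.
bylines = ["Stephenson", "Stevenson", "Smith"]
print(find_author(bylines, "Stevenson"))
```

Soundex keeps the first letter as-is, so it would not rescue a variant that differs in its first letter. It is a cheap safety net for name searches, not a cure.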
When using Google & getting 10,000,000 hits (when was the last time you went to the 2nd page of your Google results?) you obviously don't see this sort of error. But inside the firewall when you need to find ALL references to <whatever>, a Google mindset is going to bite you big time.