To: | scorek <scorek@xxxxx>, "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx> |
---|---|
From: | Duane Nickull <dnickull@xxxxxxxxx> |
Date: | Wed, 27 Feb 2008 08:26:02 -0800 |
Message-id: | <C3EACF1A.D47D%dnickull@xxxxxxxxx> |
Scorek:

Having built the same type of search engine from 1995 to 1999 (http://iandavis.com/blog/1999/07/goxmlsearchengine), I can tell you in advance that index bloat is a huge issue. It is the main pain point to avoid.

Google and other advanced indexes use pre-cached searches with a combination of B-tree and linked-list lookups to get the index down to about 1/8 of the size of the average web page. Contextual engines start at around 1.5x the raw size and get uglier the more you add, although we eventually got GoXML down to around 3/5 of the raw size without throwing away any meaningful information.

What both Yahoo and Google do is use ontological links, combined with secondary mechanisms such as geo-mapping and persistent knowledge of individual users, to guess the right ontological context. This works fairly well, and humans are constantly refining the trees (DMOZ).

Overlaying a KB as a set of relationships between instances in your index and nodes in an ontology/taxonomy is not difficult. The real problem is how to scale it. What is your intended scale? How big will your index be? What are acceptable parameters for search results? These are the questions I recommend focusing on first.

I am happy to share more with you, and I might even be able to find you a code base somewhere (written in ANSI C++).

Duane

On 27/02/08 5:49 AM, "scorek" <scorek@xxxxx> wrote:

Hey all,

--
**********************************************************************
"Speaking only for myself"
Senior Technical Evangelist - Adobe Systems, Inc.
Blog - http://technoracle.blogspot.com
Community Music - http://www.mix2r.com
My Band - http://www.myspace.com/22ndcentury
Adobe MAX 2008 - http://technoracle.blogspot.com/2007/08/adobe-max-2008.html
**********************************************************************

_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
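As a rough illustration of the overlay described in the message (relationships between instances in the index and nodes in an ontology/taxonomy), here is a minimal Python sketch. It is not GoXML's design, nor how Google or Yahoo actually do it. The `OntologyIndex` class, its method names, and the toy Vehicle/Car and Animal/Cat taxonomy are hypothetical. The sketch assumes naive whitespace tokenization, uses a Python dict where a production engine would use a sorted term dictionary such as a B-tree, and keeps each term's postings as a sorted list. Filtering by a concept also accepts documents tagged with any of its sub-concepts, which is one simple way to pick the right context for an ambiguous term like "jaguar".

```python
from bisect import insort
from collections import defaultdict


class OntologyIndex:
    """Toy inverted index with an ontology overlay (hypothetical sketch)."""

    def __init__(self):
        # term -> sorted postings list of document ids.
        # A dict stands in for a sorted term dictionary (e.g. a B-tree).
        self.postings = defaultdict(list)
        # document id -> ontology/taxonomy node ids it is linked to
        self.doc_concepts = defaultdict(set)
        # taxonomy node -> direct child nodes
        self.children = defaultdict(set)

    def add_subclass(self, parent, child):
        self.children[parent].add(child)

    def add_document(self, doc_id, text, concepts=()):
        # Naive whitespace tokenization; each distinct term gets one posting.
        for term in set(text.lower().split()):
            insort(self.postings[term], doc_id)
        self.doc_concepts[doc_id].update(concepts)

    def descendants(self, node):
        # Iterative depth-first walk of the taxonomy, including the node itself.
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.children.get(current, ()))
        return seen

    def search(self, term, concept=None):
        hits = self.postings.get(term.lower(), [])
        if concept is None:
            return list(hits)
        # Keep only documents linked to the concept or any of its sub-concepts.
        wanted = self.descendants(concept)
        return [d for d in hits if self.doc_concepts.get(d, set()) & wanted]


idx = OntologyIndex()
idx.add_subclass("Vehicle", "Car")
idx.add_subclass("Animal", "Cat")
idx.add_document(1, "Jaguar top speed", concepts={"Car"})
idx.add_document(2, "Jaguar habitat and diet", concepts={"Cat"})
print(idx.search("jaguar"))                     # [1, 2]
print(idx.search("jaguar", concept="Vehicle"))  # [1]
print(idx.search("jaguar", concept="Animal"))   # [2]
```

The concept links are stored alongside the postings, so every link is extra index data. At toy scale that cost is invisible, but it is one reason contextual indexes grow relative to the raw text, and why the scale questions in the message are worth answering first.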