Hey again,
First of all, thank you all for the responses and the discussion - it is great
that this forum is always so alive ;)
> Having built the same type of search engine from 1995-1999:
> http://iandavis.com/blog/1999/07/goxmlsearchengine
> I can tell you in advance there are huge bloating issues with the index.
> These are the main pain points to avoid. Google and other advanced indexes
> use pre-cached searches with a combination of B-tree and linked-list
> lookups to get the index ratio down to about 1/8 of the size of the average
> web page. Contextual engines start at around 1.5x and get uglier the more
> you add, although we eventually got GoXML down to around 3/5 of the raw size
> without throwing away any meaningful information.
> What both Yahoo and Google do is use ontological links combined with
> secondary mechanisms like geo-mapping and persistent knowledge of
> individuals to guess the right ontological context. This works fairly well
> and humans are constantly refining the trees (DMoz).
> Overlaying a KB is not difficult: treat it as a relationship between
> instances in your index and nodes in an ontology/taxonomy. The real problem
> is how to scale it.
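
To make the overlay idea concrete, here is a minimal sketch of how I read it, not Duane's actual design. The taxonomy, the document IDs, and the `OntologyOverlay` class are all hypothetical names made up for illustration; the only assumption is that each indexed instance is linked to a taxonomy node and that a search on a node also returns instances linked to its descendants.

```python
from collections import defaultdict

# Hypothetical taxonomy: child node -> parent node (None marks the root).
TAXONOMY = {
    "Thing": None,
    "Vehicle": "Thing",
    "Car": "Vehicle",
    "Bicycle": "Vehicle",
}


class OntologyOverlay:
    """Links indexed documents (instances) to taxonomy nodes."""

    def __init__(self, taxonomy):
        self.taxonomy = taxonomy
        self.docs_by_node = defaultdict(set)

    def link(self, doc_id, node):
        """Record that an indexed document is an instance of a node."""
        if node not in self.taxonomy:
            raise KeyError(f"unknown taxonomy node: {node}")
        self.docs_by_node[node].add(doc_id)

    def descendants(self, node):
        """Return the node itself plus every node below it."""
        found = {node}
        frontier = [node]
        while frontier:
            current = frontier.pop()
            # Naive full scan of the taxonomy for each visited node.
            for child, parent in self.taxonomy.items():
                if parent == current and child not in found:
                    found.add(child)
                    frontier.append(child)
        return found

    def search(self, node):
        """Return sorted IDs of documents linked to the node or its descendants."""
        result = set()
        for n in self.descendants(node):
            result |= self.docs_by_node.get(n, set())
        return sorted(result)


overlay = OntologyOverlay(TAXONOMY)
overlay.link("doc1", "Car")
overlay.link("doc2", "Bicycle")
overlay.link("doc3", "Thing")
print(overlay.search("Vehicle"))  # ['doc1', 'doc2']
```

The naive scan in `descendants` is exactly the kind of thing that would not survive at scale, which I take to be the point of the warning above.
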
> What is your intended scale? How big will your index be? What are
> acceptable parameters for search results?
> These are the questions I recommend focusing on first. I am happy to share
> more with you and I might even be able to find you a code base somewhere
> (written in ANSI C++).
Duane, thanks for the tips! I need some time to answer the
questions and analyze your solution. After that I will probably try
to reach you by e-mail!
Pawel