[ontolog-forum] Re: [Fwd: [xml-dev] Statistical vs "semantic web" appro

To: ontolog-forum@xxxxxxxxxxxxxxxx, mp-sofi-dev@xxxxxxxxxxx
From: "Peter P. Yim" <yimpp@xxxxxxxxxxx>
Date: Thu, 24 Apr 2003 12:29:28 -0700
Message-id: <3EA83B18.2010603@xxxxxxxxxxx>
Thank you, Monica, for sharing this.

Thought-provoking ... and definitely worth the bits and bytes this piece
is consuming -- although my own position is that this is not even an
"either-or" matter.

I'm passing this on ...

-ppy
--

Monica J. Martin wrote Wed, 23 Apr 2003 22:03:43 -0600:

> -------- Original Message --------
> Subject: [xml-dev] Statistical vs "semantic web" approaches to making 
> sense of the Net
> Date: Wed, 23 Apr 2003 21:09:48 -0400
> From: Mike Champion <mc@xxxxxxxxxxx>
> To: "xml-dev@xxxxxxxxxxxxx" <xml-dev@xxxxxxxxxxxxx>

> There was an interesting conjunction of articles on the ACM "technews" 
> page [http://www.acm.org/technews/current/homepage.html] -- one on "AI" 
> approaches to spam filtering  
> http://www.nwfusion.com/news/tech/2003/0414techupdate.html and the other 
> on the Semantic Web 
> http://www.computerworld.com/news/2003/story/0,11280,80479,00.html.

> What struck me is that the "AI" approach (I'll guess it makes heavy use
> of pattern matching and statistical techniques such as Bayesian
> inference) is working with raw text whose meaning the authors are
> deliberately trying to obfuscate to get past "keyword" spam filters,
> while the Semantic Web approach seems to require explicit, honest markup.
> Given the "metacrap" argument about semantic metadata
> (http://www.well.com/~doctorow/metacrap.htm) I suspect that in general
> the only way we're going to see a "Semantic Web" is for
> statistical/pattern matching software to create the semantic markup and
> metadata.  That is, if such tools can make useful inferences today about
> spam that pretends to be something else, they should be very useful in
> making inferences tomorrow about text written by people who try to say
> what they mean.

> This raises a question, for me anyway:  If it will take a "better Google
> than Google" (or perhaps an "Autonomy meets RDF") that uses Bayesian or
> similar statistical techniques to create the markup that the Semantic
> Web will exploit, what's the point of the semantic markup?  Why won't
> people just use the "intelligent" software directly?  Wearing my "XML
> database guy" hat, I hope that the answer is that it will be much more
> efficient and programmer-friendly to query databases generated by the
> 'bots containing markup and metadata to find the information one needs.
> But I must admit that 5-6 years ago I thought the world would need
> standardized, widely deployed XML markup before we could get the quality
> of searches that Google allows today using only raw HTML and the PageRank
> heuristic algorithm.

> So, anyone care to pick holes in my assumptions, or reasoning?  If one
> does accept the hypothesis that it will take smart software to produce
> the markup that the Semantic Web will exploit, what *is* the case for
> believing that it will be ontology-based logical inference engines
> rather than statistically-based heuristic search engines that people
> will be using in 5-10 years?  Or is this a false dichotomy?  Or is the
> "metacrap" argument wrong, and people really can be persuaded to create
> honest, accurate, self-aware, etc. metadata and semantic markup?

> [please note that my employer, and many colleagues at W3C, may have a
> very different take on this and please don't blame anyone but me for
> this blather!]

_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Unsubscribe/Config: 
http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx