ontolog-forum

Re: [ontolog-forum] Amazon vs. IBM: Big Blue meets match in battle for t

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Tue, 30 Jul 2013 19:32:31 -0400
Message-id: <51F84D0F.9030006@xxxxxxxxxxxxxx>
On 7/30/13 5:13 PM, John F Sowa wrote:
> Kingsley,
>
> Note the qualification:  replacing *words* (i.e., ordinary words
> in English or other NLs) with IRIs gives a misleading impression
> of precision.

Ah!

I certainly don't want to replace words with IRIs. We should leverage 
'words' in the text-based annotations that comprise the description (the 
graph) of an entity denoted by an IRI.
>
> JFS
>>> Replacing words with IRIs is worse than useless,
>>> because it gives a misleading impression of precision.
> KI
>> Doesn't this depend on the communication medium though? If any of these
>> entities are communicating by desktop, notebook, tablet, palm top, phone
>> etc., circa 2013, there is immense value in having HTTP URIs denote the
>> entities, relationships, and relations in the discourse domain. To the
>> participants in such communications the HTTP URIs will be tucked behind
>> HTML anchor tags.
> I agree that you need a precise identifier to locate something
> that you need to access on the WWW.
>
> But the issue that Hans and Ed were discussing is the problem of
> detecting the context or other information needed to determine
> the exact sense of a word in an ordinary language text.
>
> Studies of inter-annotator agreement among well-trained humans
> show that 95% agreement is very rarely achieved.  More typically,
> the best computer systems achieve about 75% accuracy.
>
> If you have a page with 300 words, you have about 15 errors
> in the best cases (95% accuracy).  With the typical accuracy
> of 75%, you would get about 75 errors in a 300-word page.
>
> What this implies is that if you want to process documents written
> in ordinary language, it's better to use the raw text designed
> for human consumption than annotated text that some human or
> computer had marked up with IRIs for word senses.

Yes, but you can also associate an IRI with raw text by way of 
annotation property based relations. That's why I made reference (in 
my post) to annotation properties such as rdfs:label [1], rdfs:comment 
[2], and skos:prefLabel [3]. I would even add dcterms:description [4] to 
that default list.
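
As a rough illustration of the point above, here is a minimal sketch in
plain Python (no RDF library) of an entity IRI whose description graph
carries raw text through those four annotation properties. The entity
IRI under example.org, the literal strings, and the helper name
annotations() are all made up for illustration; only the four property
IRIs come from the links at the end of this post.

```python
# Annotation property IRIs, as listed under "Links" in this post.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
DCTERMS_DESCRIPTION = "http://purl.org/dc/terms/description"

ANNOTATION_PROPERTIES = (
    RDFS_LABEL,
    RDFS_COMMENT,
    SKOS_PREF_LABEL,
    DCTERMS_DESCRIPTION,
)

# Hypothetical entity IRI (example.org is reserved for examples).
ENTITY = "http://example.org/id/bank#this"

# A description graph as (subject, predicate, object) triples.
# The words stay as plain-text literals; the IRI only names the entity.
graph = [
    (ENTITY, RDFS_LABEL, "bank"),
    (ENTITY, SKOS_PREF_LABEL, "bank (financial institution)"),
    (ENTITY, RDFS_COMMENT, "An organization that accepts deposits."),
    (ENTITY, DCTERMS_DESCRIPTION, "Raw text describing the entity."),
]


def annotations(triples, subject):
    """Collect the text literals attached to `subject` via annotation properties.

    Returns a dict mapping each annotation property IRI to a list of literals.
    """
    found = {}
    for s, p, o in triples:
        if s == subject and p in ANNOTATION_PROPERTIES:
            found.setdefault(p, []).append(o)
    return found


if __name__ == "__main__":
    for prop, texts in annotations(graph, ENTITY).items():
        print(prop, texts)
```

The words are never replaced here: they remain literal text that a
reader or an NLP tool can process as-is, while the IRI gives the
description something precise to hang off.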

When all is said and done, you end up with the best of both worlds, 
since there's basically a slot for everything that constitutes the 
description graph to which an entity IRI (e.g., HTTP URI) resolves.

Following what's outlined above, I do believe that one could then use 
document metadata as a "best effort" mechanism for establishing 
context-lenses through which description graph based document content is 
processed.

>
> Note that the IBM Watson system for Jeopardy! used a large number
> of different algorithms that came up with independently derived
> methods for generating an answer.  Then it used a kind of learning
> method to estimate which of the many possible answers was best.
> And they ran the system on a supercomputer with 2880 cores.

Yes, as I've stated in the past, that was a nice showcase of combined 
exploitation of:

1. Linked Data -- webby (or web-like) structured data that leverages 
HTTP URIs as an integral component
2. Linked Open Data cloud corpus -- massive Linked Data collection on 
the Web
3. NLP
4. Machine Learning
5. Entity Relationship Semantics


Take a single item out of the list above and there wouldn't be the Watson 
we saw on Jeopardy!

Links:

[1] http://www.w3.org/2000/01/rdf-schema#label
[2] http://www.w3.org/2000/01/rdf-schema#comment
[3] http://www.w3.org/2004/02/skos/core#prefLabel
[4] http://purl.org/dc/terms/description


Kingsley
>
> John
>   
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>   
>
>


--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen



