On 7/30/13 5:13 PM, John F Sowa wrote:
> Kingsley,
>
> Note the qualification: replacing *words* (i.e., ordinary words
> in English or other NLs) with IRIs gives a misleading impression
> of precision. (01)
Ah! (02)
I certainly don't want to replace words with IRIs. We should leverage
the 'words' in the text-based annotations that make up the description
(the graph) of an entity denoted by an IRI.
>
> JFS
>>> Replacing words with IRIs is worse than useless,
>>> because it gives a misleading impression of precision.
> KI
>> Doesn't this depend on the communication medium, though? If any of these
>> entities are communicating by desktop, notebook, tablet, palmtop, phone,
>> etc., circa 2013, there is immense value in having HTTP URIs denote the
>> entities, relationships, and relations in the discourse domain. To the
>> participants in such communications, the HTTP URIs will be tucked behind
>> HTML anchor tags.
> I agree that you need a precise identifier to locate something
> that you need to access on the WWW.
>
> But the issue that Hans and Ed were discussing is the problem of
> detecting the context or other information needed to determine
> the exact sense of a word in an ordinary language text.
>
> Studies of inter-annotator agreement among well-trained humans
> show that 95% agreement is very rarely achieved. More typically,
> the best computer systems achieve about 75% accuracy.
>
> If you have a page with 300 words, you have about 15 errors
> in the best cases (95% accuracy). With the typical accuracy
> of 75%, you would get about 75 errors in a 300-word page.
>
> What this implies is that if you want to process documents written
> in ordinary language, it's better to use the raw text designed
> for human consumption than annotated text that some human or
> computer had marked up with IRIs for word senses. (03)
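The error counts above come from multiplying the word count by the error rate (1 - accuracy). By linearity of expectation this holds whether or not the individual tagging errors are independent. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
def expected_sense_errors(word_count: int, accuracy_pct: int) -> int:
    """Expected number of mis-tagged words: word_count * (1 - accuracy).

    Accuracy is given as an integer percentage so the arithmetic stays exact
    for whole-number results (300 * 5 / 100 is exactly 15.0).
    """
    if not 0 <= accuracy_pct <= 100:
        raise ValueError("accuracy_pct must be between 0 and 100")
    return round(word_count * (100 - accuracy_pct) / 100)


# Best case (95%) versus typical (75%) accuracy on a 300-word page.
for pct in (95, 75):
    print(f"{pct}% accuracy on a 300-word page: ~{expected_sense_errors(300, pct)} errors")
```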
Yes, but you can also associate an IRI with raw text by way of
annotation-property-based relations. That's why I referred (in my
post) to annotation properties such as rdfs:label [1], rdfs:comment
[2], and skos:prefLabel [3]. I would even add dcterms:description [4] to
that default list. (04)
When all is said and done, you end up with the best of both worlds,
since there's basically a slot for everything that constitutes the
description graph to which an entity IRI (e.g., HTTP URI) resolves. (05)
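A minimal Python sketch of such a description graph. It assumes plain (subject, predicate, object) tuples rather than any particular RDF library. The entity IRI http://example.org/id/watson and all literal values are hypothetical, and the label preference order (skos:prefLabel, then rdfs:label, then the bare IRI) is an illustrative assumption. The last step shows the HTTP URI tucked behind an HTML anchor while the reader sees only the words.

```python
import html
from typing import Dict, List, Set, Tuple

# Full IRIs of the annotation properties cited as [1]-[4].
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
DCTERMS_DESCRIPTION = "http://purl.org/dc/terms/description"
ANNOTATION_PROPERTIES = {RDFS_LABEL, RDFS_COMMENT, SKOS_PREF_LABEL, DCTERMS_DESCRIPTION}

Triple = Tuple[str, str, str]

# Hypothetical entity IRI; its description graph keeps raw human-readable text as literals.
ENTITY = "http://example.org/id/watson"
GRAPH: Set[Triple] = {
    (ENTITY, RDFS_LABEL, "Watson"),
    (ENTITY, SKOS_PREF_LABEL, "IBM Watson"),
    (ENTITY, RDFS_COMMENT, "Question-answering system that played Jeopardy!"),
    (ENTITY, DCTERMS_DESCRIPTION, "A question-answering system built by IBM."),
}


def annotations(graph: Set[Triple], subject: str) -> Dict[str, List[str]]:
    """Collect the text literals attached to subject via the annotation properties."""
    found: Dict[str, List[str]] = {}
    for s, p, o in graph:
        if s == subject and p in ANNOTATION_PROPERTIES:
            found.setdefault(p, []).append(o)
    # Sort so the result does not depend on set iteration order.
    return {p: sorted(values) for p, values in found.items()}


def display_label(graph: Set[Triple], subject: str) -> str:
    """Prefer skos:prefLabel, then rdfs:label, then fall back to the IRI itself."""
    found = annotations(graph, subject)
    for prop in (SKOS_PREF_LABEL, RDFS_LABEL):
        if prop in found:
            return found[prop][0]
    return subject


def anchor(graph: Set[Triple], subject: str) -> str:
    """Render the entity as an HTML link: readers see words, the IRI sits in href."""
    return f'<a href="{html.escape(subject)}">{html.escape(display_label(graph, subject))}</a>'


print(anchor(GRAPH, ENTITY))
```

The raw text is never replaced; it stays available as literals hanging off the IRI, which is the "best of both worlds" point above.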
Following what's outlined above, I do believe that one could then use
document metadata as a "best effort" mechanism for establishing
context lenses through which description-graph-based document content is
processed. (06)
>
> Note that the IBM Watson system for Jeopardy! used a large number
> of different algorithms that came up with independently derived
> methods for generating an answer. Then it used a kind of learning
> method to estimate which of the many possible answers was best.
> And they ran the system on a supercomputer with 2880 cores. (07)
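The pattern John describes (many independent answer-scoring algorithms, then a learned method to pick the best answer) can be illustrated with a toy ranker. This is a sketch, not IBM's implementation. The candidate names, the three scorer names, their scores, and the weights are all hypothetical, and a logistic combination is just one common way to turn several evidence scores into a single confidence. In a real system the weights would be learned from past question/answer pairs rather than set by hand.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical evidence scores in [0, 1]; each key stands for one independent
# answer-scoring algorithm. None of these names come from IBM's actual system.
FEATURES: Dict[str, Dict[str, float]] = {
    "candidate A": {"passage_match": 0.4, "type_match": 0.1, "popularity": 0.7},
    "candidate B": {"passage_match": 0.8, "type_match": 0.9, "popularity": 0.6},
    "candidate C": {"passage_match": 0.5, "type_match": 0.5, "popularity": 0.2},
}

# Hand-set weights standing in for a model learned from past questions.
WEIGHTS: Dict[str, float] = {"passage_match": 2.0, "type_match": 3.0, "popularity": 0.5}
BIAS = -2.5


def confidence(scores: Dict[str, float]) -> float:
    """Logistic combination of the scorer outputs into a value in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(features: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return candidates sorted by descending combined confidence."""
    scored = [(cand, confidence(scores)) for cand, scores in features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(FEATURES)
best, best_conf = ranking[0]
for cand, conf in ranking:
    print(f"{cand}: {conf:.2f}")
```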
Yes, as I've stated in the past, that was a nice showcase of combined
exploitation of: (08)
1. Linked Data -- webby (or web-like) structured data that leverages
HTTP URIs as an integral component
2. Linked Open Data cloud corpus -- massive Linked Data collection on
the Web
3. NLP
4. Machine Learning
5. Entity Relationship Semantics (09)
Take any single item out of the list above and we wouldn't have had the
Watson we saw on Jeopardy! (010)
Links: (011)
[1] http://www.w3.org/2000/01/rdf-schema#label
[2] http://www.w3.org/2000/01/rdf-schema#comment
[3] http://www.w3.org/2004/02/skos/core#prefLabel
[4] http://purl.org/dc/terms/description (012)
Kingsley
>
> John
>
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>
>
> (013)
-- (014)
Regards, (015)
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen (016)