
Re: [ontolog-forum] fitness of XML for ontology

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Ali H <asaegyn@xxxxxxxxx>
Date: Tue, 11 Feb 2014 13:39:13 -0500
Message-id: <CADr70E1F4MeJVNb=Q6PoRripPwYOLozt8Q7finciw6o498A5ag@xxxxxxxxxxxxxx>
Dear Duane,

On Mon, Feb 10, 2014 at 1:07 PM, Duane Nickull <duane@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

The case you present would be structured then. If it has structure, it is structured. There is a different argument here which may be more relevant: whether to tag something as deterministic. Many structured documents are not deterministic, which is what people usually mean when they say semi-structured.


I also think that semi-structured texts do exist.

While I see why you would suggest "structured" is a binary classification - some text either has one or more structured elements, or else it has none - in many cases it is useful to speak about semi-structured texts.

Consider a newspaper article. Written in natural language, most would agree that it is an unstructured text (though some linguists would argue that the use of NL and its underlying grammar etc. is itself a form of structure, but let's put that perspective aside for the moment). Now imagine that I were to run this article through some NER algorithm which (for the sake of illustration) inserts XML tags around people, places and times in the text. Now there's an NL document with some (immediately) computable markup, though only on a subset of the NL text. I suspect that many people would call this semi-structured text, since, for example, one still might not know from the markup whether the article is about business, politics or sports.
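
To make the scenario concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the GAZETTEER lookup is a hypothetical stand-in for a real NER algorithm, and the sample sentence, the company names, and the function names tag_entities and split_structured are all invented for this example. It wraps people, places and times in XML tags, then separates what the markup makes computable from the NL text that remains untagged.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical gazetteer standing in for a real NER model (an assumption).
GAZETTEER = {
    "person": ["Jane Doe"],
    "place": ["Toronto"],
    "time": ["Tuesday"],
}


def tag_entities(text):
    """Wrap known entity strings in XML tags; leave all other text as-is."""
    lookup = {name: label
              for label, names in GAZETTEER.items()
              for name in names}
    # Try longer names first so a short name cannot shadow a longer one.
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))

    def wrap(match):
        label = lookup[match.group(0)]
        return f"<{label}>{match.group(0)}</{label}>"

    # Escape the raw text first so the result is well-formed XML.
    return "<article>" + pattern.sub(wrap, escape(text)) + "</article>"


def split_structured(xml_text):
    """Return (tagged entities, residual NL text with '_' per entity)."""
    root = ET.fromstring(xml_text)
    entities = [(el.tag, el.text) for el in root]
    # Untagged text lives in root.text and in each child's tail.
    residual = (root.text or "") + "".join(
        "_" + (el.tail or "") for el in root)
    return entities, residual


sentence = "Jane Doe said on Tuesday that Acme will buy Initech in Toronto."
entities, residual = split_structured(tag_entities(sentence))
print(entities)
print(residual)
```

In this sketch the person, time and place come back as tagged elements, while the phrase "Acme will buy Initech" is left in the residual string, that is, in the untagged NL portion of the document.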

Would I be correct in thinking that you would call the NL document with its NER elements marked up "structured"? If so, I hope you would agree, or at least empathize, that from the perspective of a firm interested in, say, tracking mergers and acquisitions in a financial domain, this level of markup would at best be semi-structured, since important semantics remain in the unmarked NL. It is a mix of structured and unstructured data, and hence it is semi-structured.



(•`'·.¸(`'·.¸(•)¸.·'´)¸.·'´•) .,.,
