Well put, Ed:
A text ‘is structured’ if the relevant information can be extracted
by a rote process that interprets the structural elements. It is
‘semi-structured’, if you can get some of the information that way,
but you have to deal with ‘unstructured content’ to extract other
important information for your purpose. It is ‘unstructured’ if you
can get little or no useful information by applying a rote
interpretation process.
Examples:
DateTime stamps, dates, times, 3-line city addresses, email addresses,
and URLs are all structured in the bidirectional conversion to strings
and back.
Patent claims are semi-structured, and patent specifications are
unstructured, in many common application purposes.
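As a rough illustration of the "bidirectional conversion to strings and back" for two of those examples, here is a minimal Python sketch using only the standard library. The function names are made up for this note, and the sample values are simply taken from this thread; it assumes ISO 8601 timestamps and ordinary http URLs.

```python
from datetime import datetime
from urllib.parse import urlsplit, urlunsplit


def roundtrip_timestamp(text):
    """Parse an ISO 8601 timestamp by rote, then convert it back to a string."""
    stamp = datetime.fromisoformat(text)  # string -> structured object
    return stamp.isoformat()              # structured object -> string


def roundtrip_url(text):
    """Split a URL into its named parts, then reassemble it."""
    parts = urlsplit(text)                # scheme, netloc, path, query, fragment
    return urlunsplit(parts)


if __name__ == "__main__":
    stamp_text = "2014-02-10T11:03:00"
    url_text = "http://ontolog.cim3.net/forum/ontolog-forum/"
    # Both conversions are purely mechanical and lose nothing here.
    assert roundtrip_timestamp(stamp_text) == stamp_text
    assert roundtrip_url(url_text) == url_text
    print(urlsplit(url_text).netloc)
```

No judgment is needed at any step, which is the sense in which these strings count as structured.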
But as Ed mentioned, it can be highly application dependent, and also
technology dependent. A few years from now, the bar for each of these
terms will likely have been raised, depending on which successful new
inventions reach the consumer markets over that time.
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Barkmeyer, Edward J
Sent: Monday, February 10, 2014 11:03 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] fitness of XML for ontology
Duane,
It seems to me only that we have different understandings of ‘the
structure of a corpus’, and the relationship of the structure to the
information content.
I think:
A text ‘is structured’ if the relevant information can be extracted
by a rote process that interprets the structural elements. It is
‘semi-structured’, if you can get some of the information that way,
but you have to deal with ‘unstructured content’ to extract other
important information for your purpose. It is ‘unstructured’ if you
can get little or no useful information by applying a rote
interpretation process.
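One way to make that three-way distinction concrete is the Python sketch below. It is only an illustration under stated assumptions: the "Label: value" line convention, the function names, and the sample record are all hypothetical, and the `wanted` argument stands in for "the relevant information for your purpose".

```python
import re

# Assumed convention: a structural element is a line of the form "Label: value".
FIELD = re.compile(r"^(?P<key>[A-Za-z ]+):\s*(?P<value>.+)$")


def rote_extract(text):
    """Return the fields found by rote, plus the lines left as free text."""
    fields, leftover = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            fields[match.group("key").strip()] = match.group("value").strip()
        else:
            leftover.append(line)
    return fields, leftover


def classify(text, wanted):
    """Label the text relative to the fields a given purpose needs."""
    fields, _ = rote_extract(text)
    found = [key for key in wanted if key in fields]
    if len(found) == len(wanted):
        return "structured"        # everything relevant came out by rote
    if found:
        return "semi-structured"   # some by rote, the rest is in the prose
    return "unstructured"          # rote processing yields nothing useful


record = "Date: 2014-02-10\nSender: Ed\nThe rest of the message is ordinary prose."
print(classify(record, ["Date", "Sender"]))  # structured
print(classify(record, ["Date", "Topic"]))   # semi-structured
print(classify(record, ["Topic"]))           # unstructured
```

The same record gets three different labels depending on what is wanted from it, so in this sketch the label is a property of the text and the purpose together, not of the text alone.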
Do you have a definition for your use of ‘is structured’?
-Ed
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Duane Nickull
Sent: Monday, February 10, 2014 1:08 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] fitness of XML for ontology
The case you present would be structured then. If it has structure, it
is structured. There is a different argument here, which may be more
relevant, about tagging something as deterministic. Many structured
documents are not deterministic, which is what people usually mean when
they say semi-structured.
***********************************
Technoracle Advanced Systems Inc.
Consulting and Contracting; Proven Results!
i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
NOTICE: This e-mail and any attachments may contain confidential
information. If you are the intended recipient, please consider this a
privileged communication, not to be forwarded without explicit approval
from the sender. If you are not the intended recipient, please notify
the sender immediately by return e-mail, delete this e-mail and destroy
any copies. Any dissemination or use of this information by a person
other than the intended recipient is unauthorized and may be illegal.
The originator reserves the right to monitor all e-mail communications
through its networks for quality control purposes.
Duane,
> I would like to suggest that structured/unstructured is binary.
> "Semi-structured" is ipso facto structured.
Well, there is some middle ground. A lot of nominally structured data
has fields whose content may be critical information in an unstructured
form. (This is often the case with ontologies, e.g. OWL annotations.)
Similarly, you can have essentially unstructured text with formal
attachments, like a spreadsheet. I think that is what Rich means by
‘semi-structured’, given his example.
-Ed
I would like to suggest that structured/unstructured is binary.
"Semi-structured" is ipso facto structured.
At 100-300 words, how little structure does it have to have before
it's “unstructured”? How about patent claim language, which can be up
to and beyond 1000 words in a single sentence? How about Wikipedia
articles, which can be several thousand words in dozens of sentences,
all encoded as strings?
-Rich
I would say this is not unstructured but little structured, similar to
a table in a db with one or two fields containing a text blob.
Martijn
On Saturday, February 8, 2014, Rich Cooper <rich@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Dear Kingsley,
XML can also represent unstructured text in string format. For example,
the description of a property title's metes and bounds is typically
from one hundred to three hundred words of text, as written by the
surveyor, and makes up one of the attributes in string form. The XML
parser simply passes it as a string to the software.
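A minimal Python sketch of that situation, using the standard library's xml.etree.ElementTree. The element names, the parcel number, and the shortened description text are invented for illustration and are not taken from any real title record.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: one structured attribute plus a free-text description.
DOC = """<property parcel="123-456">
  <description>Beginning at an iron pin on the north line of Elm Street;
  thence north 200 feet to a stone; thence east 150 feet to an oak tree;
  thence to the point of beginning.</description>
</property>"""


def read_property(xml_text):
    """Return the parcel id and the surveyor's description as plain strings."""
    root = ET.fromstring(xml_text)
    parcel = root.get("parcel")                  # extracted by rote
    description = root.findtext("description")   # handed over as an opaque string
    return parcel, description


parcel, description = read_property(DOC)
print(parcel)
print(type(description).__name__, len(description.split()), "words")
```

The parser locates the description by rote, but everything inside it (bearings, distances, landmarks) still has to be recovered by some other means.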
-Rich
-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Kingsley Idehen
Sent: Saturday, February 08, 2014 10:20 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] fitness of XML for ontology
On 2/8/14 12:32 AM, Paul Tyson wrote:
> "-- It is a data base language for text.
That's the crux of the matter! A database is a document comprised of
structured data. It isn't a database management system, i.e., an
application that provides services such as storage, indexing,
declarative query access, etc.
XML is fine as a mechanism for marking up structured data in a
document. The utility of this process isn't optimal if the endeavor
requires entity relationships and relation semantics to be discernible
and comprehensible to human authors and readers.
Links:
[1] http://bit.ly/1ievivx -- Database
[2] http://bit.ly/1d6gvSR -- Database Management System (DBMS)
[3] http://bit.ly/1n1FrMr -- Relational Database Management System (RDBMS)
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J