Structured to me means that there are patterns within the content that can be used to divide it into smaller divisions. Looking at the examples below, I would state that they are all structured, given you have stated they can be divided into smaller components.
I am using this at a purely abstract level and there is no presumption that the smaller components can be retrieved via any programming construct or language.
If you decide to use this in the context of a specific programming language and data chunk, then it would depend on whether or not you can retrieve the content and get it out of the larger corpus.
Not sure if this is in alignment with anyone else's definitions.
Duane
*********************************** Technoracle Advanced Systems Inc. Consulting and Contracting; Proven Results! i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile t. @duanenickull
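
As one concrete reading of the point about a specific programming language and data chunk, here is a minimal Python sketch, assuming the chunk is an ISO 8601 timestamp or a URL, of retrieving the smaller components from the larger string with standard-library parsers. The helper names `split_timestamp` and `split_url` and the sample values are illustrative assumptions, not anything proposed in this thread.

```python
from datetime import datetime
from urllib.parse import urlsplit, urlunsplit


def split_timestamp(text: str) -> dict:
    """Divide an ISO 8601 timestamp string into named components."""
    stamp = datetime.fromisoformat(text)
    return {
        "year": stamp.year,
        "month": stamp.month,
        "day": stamp.day,
        "hour": stamp.hour,
        "minute": stamp.minute,
        "second": stamp.second,
    }


def split_url(text: str) -> dict:
    """Divide a URL string into scheme, host, and path."""
    parts = urlsplit(text)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


# Hypothetical sample chunks, chosen only for illustration.
stamp_text = "2014-02-08T10:20:00"
url_text = "http://ontolog.cim3.net/forum/ontolog-forum/"

stamp_parts = split_timestamp(stamp_text)
url_parts = split_url(url_text)

# Both chunks also survive the trip from string to components and back.
stamp_round_trip = datetime.fromisoformat(stamp_text).isoformat()
url_round_trip = urlunsplit(urlsplit(url_text))
```

Whether a chunk counts as retrievable in this sense depends on a parser existing for its pattern; free prose passed to `datetime.fromisoformat` raises a `ValueError` instead of yielding components.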
Well put, Ed:

A text ‘is structured’ if the relevant information can be extracted by a rote process that interprets the structural elements. It is ‘semi-structured’ if you can get some of the information that way, but you have to deal with ‘unstructured content’ to extract other important information for your purpose. It is ‘unstructured’ if you can get little or no useful information by applying a rote interpretation process.

Examples: DateTime stamps, Dates, Times, 3-line city addresses, Email addresses, and URLs are all structured, in that they convert bidirectionally to strings and back. Patent Claims are semi-structured, and Patent Specifications are unstructured, for many common application purposes.

But as Ed mentioned, it can be highly application dependent, and also technology dependent. A few years from now, the level implied by these terms will likely be raised, depending on what successful new inventions reach the consumer markets over that time.

-Rich

Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2

Duane,

It seems to me only
that we have different understandings of ‘the structure of a corpus’, and the relationship of the structure to the information content. I think:

A text ‘is structured’ if the relevant information can be extracted by a rote process that interprets the structural elements. It is ‘semi-structured’ if you can get some of the information that way, but you have to deal with ‘unstructured content’ to extract other important information for your purpose. It is ‘unstructured’ if you can get little or no useful information by applying a rote interpretation process.

Do you have a definition for your use of ‘is structured’?

-Ed

The case you present
would be structured then. If it has structure, it is structured. There is a different argument here, which may be more relevant: whether to tag something as deterministic. Many structured documents are not deterministic, which is what people usually mean when they state semi-structured.

***********************************
Technoracle Advanced Systems Inc.
Consulting and Contracting; Proven Results!
i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile

NOTICE: This
e-mail and any attachments may contain confidential information. If you
are the intended recipient, please consider this a privileged communication,
not to be forwarded without explicit approval from the sender. If you are not
the intended recipient, please notify the sender immediately by return
e-mail, delete this e-mail and destroy any copies. Any dissemination
or use of this information by a person other than the intended recipient
is unauthorized and may be illegal. The originator reserves the right
to monitor all e-mail communications through its networks for quality
control purposes.

Duane,

> I would like to suggest that structured/unstructured is binary. "Semi-structured" is ipso facto structured.

Well, there is some middle ground. A lot of nominally structured data has fields whose content may be critical information in an unstructured form. (This is often the case with ontologies, e.g. OWL annotations.) Similarly, you can have essentially unstructured text with formal attachments, like a spreadsheet. I think that is what Rich means by ‘semi-structured’,
given his example.

-Ed

At 100-300 words, how little structured
does it have to be before it's “unstructured”? How about patent claim language, which can be up to and beyond 1000 words in a single sentence? How about Wikipedia articles, which can be several thousand words in dozens of sentences, all encoded as strings?

-Rich

I would say this is not unstructured but little structured. Similar to a table in a db with one or two fields containing a text blob.

Martijn
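
The table comparison can be sketched in Python with the standard `sqlite3` module. This is only an illustration under stated assumptions: the `parcel` table, its column names, and the sample row are hypothetical and do not come from anyone's actual data. It shows two fields that a rote query can filter on, next to one text blob that the database only hands back as an opaque string.

```python
import sqlite3

# In-memory database; the schema and the row are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcel ("
    "parcel_id TEXT PRIMARY KEY, "
    "recorded_on TEXT, "
    "description TEXT)"
)
conn.execute(
    "INSERT INTO parcel VALUES (?, ?, ?)",
    (
        "P-001",
        "2014-02-08",
        "Beginning at an iron pin on the north line of Main Street; "
        "thence north 100 feet; thence east 50 feet.",
    ),
)

# The structured fields can be filtered and compared by a rote query.
structured_row = conn.execute(
    "SELECT parcel_id, recorded_on FROM parcel WHERE recorded_on >= ?",
    ("2014-01-01",),
).fetchone()

# The blob comes back whole; SQL has no view of what it says.
blob = conn.execute(
    "SELECT description FROM parcel WHERE parcel_id = ?",
    ("P-001",),
).fetchone()[0]

conn.close()
```

Extracting anything from `blob` (bearings, distances, street names) would need a separate text-processing step, which is the sense in which such a row is only "little structured".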
On Saturday, February 8, 2014, Rich Cooper <rich@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Dear Kingsley,

XML can also represent unstructured text in string format. For example, the description of a property title's metes and bounds is typically from one hundred to three hundred words of text, as written by the surveyor, and makes up one of the attributes in string form. The XML parser simply passes it as a string to the software.

-Rich

-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of
Kingsley Idehen
Sent: Saturday, February 08, 2014 10:20 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] fitness of XML for ontology

On 2/8/14 12:32 AM, Paul Tyson wrote:
> "-- It is a data base language for text.

That's the crux of the matter!

A database is a document comprised of structured data. It isn't a database management system, i.e., an application that provides services such as storage, indexing, declarative query access, etc.

XML is fine as a mechanism for marking up structured data in a document. The utility of this process isn't optimal if the endeavor requires entity relationships and relation semantics to be discernible and comprehensible to human authors and readers.

Links:
[1] http://bit.ly/1ievivx -- Database
[2] http://bit.ly/1d6gvSR -- Database Management System (DBMS)
[3] http://bit.ly/1n1FrMr -- Relational Database Management System (RDBMS)

--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
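
To make the XML exchange above concrete, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The `title_record` element, its attribute names, and the sample description are hypothetical, made up only to mirror the metes-and-bounds case raised earlier in the thread. It shows the parser handing over marked-up values by name while passing the surveyor's text through as a plain string.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: two marked-up values plus one free-text attribute.
doc = (
    '<title_record parcel="P-001" recorded="2014-02-08" '
    'description="Beginning at an iron pin on the north line of '
    'Main Street; thence north 100 feet; thence east 50 feet to the '
    'point of beginning."/>'
)

root = ET.fromstring(doc)

# Values that the markup names can be fetched by a rote lookup.
parcel = root.get("parcel")
recorded = root.get("recorded")

# The description is delivered as one string, with no further structure.
description = root.get("description")
word_count = len(description.split())
```

Nothing in the markup tells the software which words in `description` are bearings or distances; the entity relationships inside that text remain for a human reader or a separate process to work out.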
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J