> Duane,
>
> With that definition, can you identify any significant content that isn’t “structured”?
Interesting to explore. In the abstract, possibly not easily, since an infinite number of things may provide structure to something. There are definitely some examples, however.
Sand on the beach is not structured other than the relationship that a "beach" is "an aggregation" of "sand particles".
The ocean is not structured other than the ocean is a structure aggregated from unordered water molecules. My coffee cup is not structured as it has no discernible aggregation that I can see with my eyes on a non-microscopic level. So what else is unstructured?

A block of wood
A chunk of cement
A chunk of plastic
Etc…
A binary file may be unstructured given I cannot even read into the contents of it. A JPEG may have no structure other than being able to go to the byte level.
I generally only like to use the term "structured" within a context. In the context of a programming language accessing data, it is much easier to use the term "unstructured" if the data cannot be systematically decomposed into smaller components.
Duane *********************************** Technoracle Advanced Systems Inc. Consulting and Contracting; Proven Results! i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile t. @duanenickull
NOTICE: This e-mail and any attachments may contain confidential information. If you are the intended recipient, please consider this a privileged communication, not to be forwarded without explicit approval from the sender. If you are not the intended recipient, please notify the sender immediately by return e-mail, delete this e-mail and destroy any copies. Any dissemination or use of this information by a person other than the intended recipient is unauthorized and may be illegal. The originator reserves the right to monitor all e-mail communications through its networks for quality control purposes.
Duane,

With that definition, can you identify any significant content that isn’t “structured”? As Rich pointed out, even natural language content is divided into sentences, and sentences have grammatical structure with elements that are phrases made up of words, all of which contributes to conveying the intent. In a similar way, a binary data stream, e.g., telemetry data or streaming video, has some internal organization as a set of information units, with rules for its interpretation.

-Ed

15-20 years ago, Haim Kilov succinctly summed up what I was on about: “I won’t agree with anything you say until you define your terms.”

Structured to me means that there are patterns within the content that can be used to divide it into smaller divisions. Looking at the examples below, I would state that they are all structured, given you have stated they can be divided into smaller components. I am using this at a purely abstract level, and there is no presumption that the smaller components can be retrieved via any programming construct or language. If you decide to use this in the context of a specific programming language and data chunk, then it would depend on whether or not you can retrieve the content and get it out of the larger corpus.

Not sure if this is in alignment with anyone else's definitions.

Well put, Ed:

A text ‘is structured’ if the relevant information can be extracted by a rote process that interprets the structural elements. It is ‘semi-structured’ if you can get some of the information that way, but you have to deal with ‘unstructured content’ to extract other important information for your purpose. It is ‘unstructured’ if you can get little or no useful information by applying a rote interpretation process.

Examples:
DateTime stamps, dates, times, 3-line city addresses, email addresses, and URLs are all structured, in that they convert to strings and back in both directions.
Patent claims are semi-structured, and
patent specifications are unstructured, for many common application purposes.
But as Ed mentioned, it can be highly application dependent, and also technology dependent. A few years from now, the bar for these terms will likely be raised, depending on what successful new inventions reach the consumer markets over that time.
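To make the "conversion to strings and back" examples concrete, here is a minimal Python sketch using only the standard library. It is an illustration rather than anything posted in the thread: the helper names are hypothetical, and the sample timestamp and the address `rich@example.org` are made-up values. Each helper parses a string into its components by rote and then serializes those components again.

```python
from datetime import datetime
from email.utils import formataddr, parseaddr
from urllib.parse import urlsplit, urlunsplit


def roundtrip_datetime(text: str) -> str:
    """Parse an ISO 8601 timestamp into fields, then write it back out."""
    parsed = datetime.fromisoformat(text)  # year, month, day, hour, ...
    return parsed.isoformat()


def roundtrip_url(text: str) -> str:
    """Split a URL into scheme, host, path, etc., then reassemble it."""
    parts = urlsplit(text)
    return urlunsplit(parts)


def roundtrip_email(text: str) -> str:
    """Split a 'Name <address>' string into its two parts and rejoin them."""
    name, address = parseaddr(text)
    return formataddr((name, address))


if __name__ == "__main__":
    # Hypothetical sample values, chosen only for illustration.
    print(roundtrip_datetime("2014-02-08T10:20:00"))
    print(roundtrip_url("http://ontolog.cim3.net/forum/ontolog-forum/"))
    print(roundtrip_email("Rich Cooper <rich@example.org>"))
```

For these inputs the strings come back unchanged, and the intermediate objects expose the components directly (for example `urlsplit(...).netloc`). No comparable rote procedure recovers the "relevant information" from a patent specification, which is the contrast the examples draw.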
-Rich

Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2

Duane,

It seems to me only that we have different understandings of ‘the structure of a corpus’, and the relationship of the structure to the information content.

I think: A text ‘is structured’ if the relevant information can be extracted by a rote process that interprets the structural elements. It is ‘semi-structured’, if you can get some of the information that way, but you have to deal with ‘unstructured content’ to extract other important information for your purpose. It is ‘unstructured’ if you can get little or no useful information by applying a rote interpretation
process.

Do you have a definition for your use of ‘is structured’?

-Ed

The case you present would be structured then. If it has structure, it is structured. There is a different argument here that may be more relevant: whether to tag something as deterministic. Many structured documents are not deterministic, which is what people usually mean when they say semi-structured.

Duane,

> I would like to suggest that structured/unstructured is binary. "Semi-structured" is ipso facto structured.
Well, there is some middle ground. A lot of nominally structured data has fields whose content may be critical information in an unstructured form. (This is often the case with ontologies, e.g. OWL annotations.) Similarly, you can have essentially unstructured text with formal attachments, like a spreadsheet. I think that is what Rich means by ‘semi-structured’, given his example.

-Ed

I would like to suggest that structured/unstructured is binary. "Semi-structured" is ipso facto structured.

At 100-300 words, how little structure does it have to have before it’s “unstructured”? How about patent claim language, which can be up to and beyond 1000 words in a single sentence? How about Wikipedia articles, which can be several thousand words in dozens of sentences, all encoded as strings?

-Rich

I would say this is not unstructured but little structured. Similar to a table in a db with one or two fields containing a text blob.

Martijn
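As a rough illustration of the point in the message quoted below, that an XML parser simply hands free text to the software as a string, here is a small Python sketch. The element names (`parcel`, `county`, `description`) and the sample legal description are invented for the example; a real title record would look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: two rote-extractable fields plus one free-text blob.
SAMPLE = """<parcel id="42">
  <county>Orange</county>
  <description>Beginning at an iron pin on the north line of Main Street; thence north 200 feet to a stake; thence east 100 feet to a stake.</description>
</parcel>"""


def extract_parcel(xml_text: str) -> dict:
    """Pull out the fields the markup exposes; the description stays a string."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),                        # attribute value
        "county": root.findtext("county"),           # short, structured field
        "description": root.findtext("description"),  # surveyor's free text
    }


if __name__ == "__main__":
    record = extract_parcel(SAMPLE)
    print(record["id"], record["county"])
    print(type(record["description"]).__name__, len(record["description"]))
```

The identifier and county come out by rote. The description arrives as one opaque string, so the bearings and distances inside it still need separate text processing, which is roughly the "one or two fields containing a text blob" situation.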
On Saturday, February 8, 2014, Rich Cooper <rich@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Dear Kingsley,

XML can also represent unstructured text in string format. For example, the description of a property title's metes and bounds is typically from one hundred to three hundred words of text, as written by the surveyor, and makes up one of the attributes in string form. The XML parser simply passes it as a string to the software.

-Rich

-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Kingsley Idehen
Sent: Saturday, February 08, 2014 10:20 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] fitness of XML for ontology

On 2/8/14 12:32 AM, Paul Tyson wrote:
> "-- It is a data base language for text.

That's the crux of the matter! A database is a document comprised of
structured data. It isn't a database management system, i.e., an
application that provides services such as storage, indexing,
declarative query access, etc.

XML is fine as a mechanism for marking up structured data in a document.
The utility of this process isn't optimal if the endeavor requires
entity relationships and relation semantics to be discernible and
comprehensible to human authors and readers.

Links:

[1] http://bit.ly/1ievivx -- Database
[2] http://bit.ly/1d6gvSR -- Database Management System (DBMS)
[3] http://bit.ly/1n1FrMr -- Relational Database Management System (RDBMS)

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J