
Re: [ontolog-forum] Data, Silos, Interoperability, and Agility

To: Gary Berg-Cross <gbergcross@xxxxxxxxx>
Cc: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: John Bottoms <john@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 29 Sep 2013 21:06:34 -0400
Message-id: <5248CE9A.2090300@xxxxxxxxxxxxxxxxxxxx>
Gary (comments below in blue)
On 9/29/2013 11:05 AM, Gary Berg-Cross wrote:
John B

You mentioned taxonomies as part of the NIST Big Data: Definitions and Taxonomies paper. Often taxonomies are used in definitions,
so I was interested to see what definitions of data and Big Data were in the document.  Actually the definitions seemed quite modest
and perhaps homegrown.  For example, there was this for Data Curation:

Data Curating provides cleansing, outlier removal, standardization for the ingestion and storage processes

This type of thing got me to wondering whether there is a good source of prior work to build these definitions on.

Yes, I was wondering that also as I read through it. The materials in the paper are broadly stated definitions and concepts. I believe that this reflects NIST's charter, in that it is a step toward gathering consensus on the use and structure of Big Data systems and taxonomies.

As to prior work, I would argue that tomorrow's systems require tomorrow's definitions. One may start with an existing definition, but the real key is identifying the stakeholders and gathering their contributions toward building robust definitions.

The first issue one encounters is the multitude of stakeholders and the broadly stated goals of "Big Data". Each person hears them speaking in their own language (as in Babel), when in fact what really exists is an unspecified concept.

The next issue is that as soon as you define a group of stakeholders you have segregated off a group with common concepts, language, jargon and syntax. That is probably as it should be. Psychologists can tell us that children can learn multiple languages when they are young, but as they grow they want to disregard the languages that are less useful. This results in technology islands or silos when discussing Big Data. My personal belief is that the silos or segregations that are most useful are those that are aligned with the academic world. They closely track industry needs, with some delay; the academic world has a long memory, and it also focuses the discussions around topics that accrete reasonably well. So, are academic research areas "silos"? How do we handle common issues that cross over between multiple academic areas? That is a question of growing concern.

There are other issues that affect definitions as well. One of my personal favorites concerns the myths that we select to define where we are, and where we should go. Another is my desire to have some functional definitions for ontology components that are machine and human readable. These may have to wait for a later discussion.

I didn't see citations of things from ISO, for example.  Not that these would necessarily be better, but one might review
prior work in the data sciences as at least a point of departure and comparison.

Actually some of the best discussion to point to is on the Ontolog Forum, under the topic

"What is Data? What is a Datum?"

I'm not sure that anyone picked up on this exchange and tried to form concise definitions from it,
but that might be a nice thing to see in a definitional effort. Perhaps this is already part of some definitional effort,
and if it is I would welcome knowing about it.  Definitions of this type, which ground some higher concepts in
lower ones, can tend to spin around and be reinvented without progressing, such as definitions of data that start
simply with "data is information".

For example: in computing, data is information that has been translated into a form that is more convenient to move or process.


Structured Data?

"In the broadest terms, Structured Data is information that has been organized in a certain way to be interpreted by someone or something."

Discussed as part of Schema.org


Gary Berg-Cross, Ph.D.  
SOCoP Executive Secretary
Knowledge Strategies    
Potomac, MD

On Fri, Sep 27, 2013 at 6:44 PM, John Bottoms <john@xxxxxxxxxxxxxxxxxxxx> wrote:
David et al,

Last month (Aug.) NIST released a document on taxonomies for Big Data. It is interesting to this discussion in part because it includes an architectural drawing and addresses interfaces to legacy systems.

Further, in describing the basic architecture for Big Data, they briefly discuss their architectural view of analytics and taxonomies.

In looking at taxonomy papers that have been published recently, it is apparent that the definition and use of the term "taxonomy" is malleable. Perhaps that is ok for now because a broad definition may be appropriate for the level of development.

The paper, an MS Word document titled "NIST Big Data: Definitions and Taxonomies", is at:
(http://bigdatawg.nist.gov/_uploadfiles/M0142_v4_2364649822.docx)

-John Bottoms
 FirstStar Systems
 Concord, MA USA

On 9/27/2013 6:23 PM, Kingsley Idehen wrote:
On 9/27/13 3:24 PM, John F Sowa wrote:
On 9/27/2013 1:39 PM, Kingsley Idehen wrote:
To what degree did data-silo-fication matter (in the minds of users and
IT decision makers) prior to the ubiquitous Web and Internet explosions?
That was the central focus of the conceptual schema (CS) work during
the 1970s.  Their goal was to define the conceptual schema as the
semantic specification language (roughly speaking, logic + ontology).
In fact, my first publication on conceptual graphs addressed that issue:

     Conceptual graphs for a database interface

Yes, I read that paper. 1976 was clearly an insightful year :-)

Then the APIs for all applications, the physical DBs (network,
relational, or hierarchical), and the user interface would be
mapped to and from the CS.

That would enable applications and user interfaces to be independent
of the details of the physical storage.  You could mix & match them
in any combination.
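
The mix & match idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the ANSI or ISO conceptual schema work itself: the `Fact` triple, the two toy stores, and the `report` function are all hypothetical names invented here. The point it tries to show is that the application talks only to the conceptual layer, so the physical layout underneath can be swapped.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical conceptual-schema unit: one statement about one entity.
@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: str


class Store(Protocol):
    """Any physical store must map to and from conceptual-schema facts."""

    def save(self, facts: list[Fact]) -> None: ...
    def load(self) -> list[Fact]: ...


class RelationalStore:
    """Toy 'relational' layout: one row per entity, one column per attribute."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, str]] = {}

    def save(self, facts: list[Fact]) -> None:
        for f in facts:
            self.rows.setdefault(f.entity, {})[f.attribute] = f.value

    def load(self) -> list[Fact]:
        return [Fact(e, a, v) for e, row in self.rows.items() for a, v in row.items()]


class HierarchicalStore:
    """Toy 'hierarchical' layout: attribute at the top, entities beneath it."""

    def __init__(self) -> None:
        self.tree: dict[str, dict[str, str]] = {}

    def save(self, facts: list[Fact]) -> None:
        for f in facts:
            self.tree.setdefault(f.attribute, {})[f.entity] = f.value

    def load(self) -> list[Fact]:
        return [Fact(e, a, v) for a, branch in self.tree.items() for e, v in branch.items()]


def report(store: Store) -> list[str]:
    """An 'application' that sees only facts, never the physical layout."""
    return sorted(f"{f.entity}.{f.attribute} = {f.value}" for f in store.load())


# Usage: the same facts go into both layouts and the application cannot tell them apart.
facts = [Fact("emp1", "name", "Ada"), Fact("emp1", "dept", "R&D"), Fact("emp2", "name", "Bob")]
relational, hierarchical = RelationalStore(), HierarchicalStore()
relational.save(facts)
hierarchical.save(facts)
same_view = report(relational) == report(hierarchical)
```

Each store carries its own mapping to and from the shared facts, which is roughly the role the conceptual schema was meant to play between applications, user interfaces, and physical databases.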


Yes, but what happened after that, following the rise of the SQL RDBMS?
Unfortunately, certain vendors correctly saw that the CS could weaken
their market dominance.  So they blocked all attempts to define a
standard for the conceptual schema.  That project ended as an ANSI
technical report in 1978.  It was later revived by ISO, and ended as
an ISO TR in 1987 and another TR in 1999.  No standards.

And they've played this game successfully for years. It's taken the combined effects of the Web and Internet to create an industry inflection that's finally altered the landscape -- for these counter-productive RDBMS vendor patterns.

In comparison, I would call the SW hype naive, provincial, and
based on wishful thinking that was untested against reality.
Methinks, too harsh, even on its very worst "poor narrative" day :-)
Maybe.  But my "harsh" words are the result of my frustration with
the lost opportunity.

I understand the frustration, believe me I do. That said, I also believe the opportunity isn't lost thanks to the principles that underlie Linked Open Data.

I had hoped that an SW along the lines of TBL's
DAML proposal could produce a conceptual schema that was outside the
clutches of the DB vendors -- but interoperable with them.

That's exactly where we are headed. It's what Linked Open Data is all about, i.e., delivering Open Data Connectivity, Open Database Connectivity [1], etc.

[1] http://bit.ly/15zUSDa -- Data Connectivity & Database Connectivity Pathways Illustrated (draft) .


Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
