Re: [ontolog-forum] Big Data Buzzwords From A to Z

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: William Frank <williamf.frank@xxxxxxxxx>
Date: Sun, 2 Dec 2012 18:48:08 -0500
Message-id: <CALuUwtAuzWzV3WqGLYGAvAPba6YR14RPmd9JGcwk6c7qs+3Wqg@xxxxxxxxxxxxxx>
Thanks, Kingsley, John, and of course Peter Chen, a father to us all.  I am very happy to see these three pieces of work linked.

I have found some very skilled developers in the Hadoop space who believe that, because they are not using a relational database, they have no need for E/R-style concept models (whether expressed in traditional E/R, UML, etc.). I have seen them get along fine, being smart enough to keep the underlying relationships in their heads, until their systems start to grow. I wonder whether there are any brief, very practically oriented sources that I could point them to, so that the folly could be avoided instead of having to wait until it obviously needs fixing.


On Sun, Dec 2, 2012 at 5:21 PM, Kingsley Idehen <kidehen@xxxxxxxxxxxxxx> wrote:
On 12/2/12 4:58 PM, Obrst, Leo J. wrote:
Kingsley, I agree. Implicit semantics in procedural code and structural data models just don't get us where we need to go. Humans simply cannot do all the semantic interpretation, or you never break out of this bottleneck. Machines have to lend a hand, by doing some explicit semantic interpretation. Machine understanding? No. Don't go down  that rabbit hole.

Thanks,
Leo

Yep!

Today's "Big Data" meme is just the latest in a series of buzzwords (and phrases) that dance around the fact that entity relationship model semantics can exist in self-describing structured data. Arguments about formats are an eternal distraction that plays into the hands of marketeers that fall into two categories:

1. understand the problem but muddy the waters for competitive reasons
2. are clueless about the matter and muddy the waters inadvertently.

As clearly articulated by Peter Chen [2] circa 1976, an entity relationship model can be used to construct a unified view of disparately shaped data. Similar clarity comes from John in his work relating to conceptual graphs and the use of Logic as the conceptual schema for driving data integration via semantics [1].
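
As a rough illustration of the point about a unified view (this is not taken from Chen's paper or from John's slides), here is a minimal Python sketch. It assumes two made-up sources of different shape, a flat row and a nested document, and maps both to plain (entity, attribute, value) statements so that one lookup function works across them. All record contents, attribute names, and function names are hypothetical.

```python
# Hypothetical records from two differently shaped sources.
crm_row = {"cust_id": 7, "name": "Acme Corp", "city": "Boston"}
order_doc = {"order": {"id": "A-100", "buyer": 7, "items": ["widget", "gear"]}}


def crm_to_triples(row):
    """Map a flat row to (entity, attribute, value) statements."""
    subject = f"customer/{row['cust_id']}"
    return [
        (subject, "type", "Customer"),
        (subject, "name", row["name"]),
        (subject, "city", row["city"]),
    ]


def order_to_triples(doc):
    """Map a nested document to the same statement shape."""
    order = doc["order"]
    subject = f"order/{order['id']}"
    triples = [
        (subject, "type", "Order"),
        # The relationship is stated explicitly instead of living in code.
        (subject, "placedBy", f"customer/{order['buyer']}"),
    ]
    triples += [(subject, "contains", item) for item in order["items"]]
    return triples


def values(triples, subject, attribute):
    """Return every value stated for one attribute of one entity."""
    return [v for s, a, v in triples if s == subject and a == attribute]


# One unified view over both sources.
unified = crm_to_triples(crm_row) + order_to_triples(order_doc)

# Follow the relationship from the order to the customer's name.
buyer = values(unified, "order/A-100", "placedBy")[0]
buyer_name = values(unified, buyer, "name")[0]
print(buyer_name)
```

The design choice worth noting is that the "placedBy" relationship is data, so it can be inspected and queried, rather than an assumption buried in whichever program happens to join the two sources.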

Where it all gets into trouble is when a single entity makes a power grab for all things relating to semantics, which is where RDF drove itself into a horrible ditch the first time around. Of course, this wasn't the fault of those who know much better (e.g., Pat Hayes and a few others who worked on RDF). The problem arose from those who (for the most part) discovered these concepts via RDF and failed to understand that RDF was a tweak of what already existed, as is always the case in an innovation continuum. The problem was then compounded when the Linked Data meme took off.

Links:

1. http://slidesha.re/SbfHQG -- Integrating Semantic Systems (John Sowa)
2. http://bit.ly/YTdz3N -- Entity Relationship Model: Unified View of Data (Peter Chen).
3. http://slidesha.re/UGg18k -- connecting Big Data, Linked Data, RDF, and the Semantic Web vision (my attempt to connect these artificially disconnected dots).

Kingsley

-----Original Message-----
From: ontolog-forum-bounces@ontolog.cim3.net [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Kingsley Idehen
Sent: Sunday, December 02, 2012 4:22 PM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] Big Data Buzzwords From A to Z

On 12/1/12 11:11 PM, John F Sowa wrote:
I came across some slides with "Big Data Buzzwords From A to Z".
So I browsed through them to see what terms the author (and his
associates) consider important.  See below for the list of terms
and the URL of the slides.

The words 'logic' and 'ontology' are not on their list.  Tools
developed and supported by the Apache Foundation are very well
represented.  The word for X is 'XML'.  But none of the slides
mention the Semantic Web or any of its notations and tools.
John,

Which is why "Big Data" is typically pitched as being all about the
exponential effects:

1. data volume
2. data velocity
3. data variety.

Unfortunately, the proponents of this particular meme omit
"data verity", which is exactly where the virtues of machine-
comprehensible semantics delivered by self-describing structured data
come into play. Basically, they never home in on the real problem, one
that's 40+ years old, at the very least.

When all is said and done, the mercurial pursuit (40+ years and
counting) is still all about agility driven by insights culled from
heterogeneously shaped data, accessible from disparate network locations.

To me, meme labels and monikers just don't matter; the substance of the
matter -- insightful data access, integration, and management -- is
ultimately the immutable item of relevance :-)

Kingsley
John

____________________________________________________________________
Big Data Buzzwords From A to Z

by Rick Whiting

Big data is one of the, well, biggest trends in IT today, and it has
spawned a whole new generation of technology to handle it. And, with
new technologies come new buzzwords: acronyms, technical terms,
product names, etc.

Even the phrase "big data" itself can be confusing. Many think of
"lots of data" when they hear it, but big data is much more than just
data volume.

Here, in alphabetical order, are some of the buzzwords we think you
need to be familiar with.

ACID, Atomicity, Consistency, Isolation and Durability.
Big data.
Columnar (or Column-Oriented) Database.
Data Warehousing.
Extract, transform and load (ETL).
Flume, a technology in the Apache Hadoop family.
Geospatial Analysis.
Hadoop.
In-Memory Database.
Java.
Kafka, a high-throughput, distributed messaging system.
Latency.
Map/reduce.
NoSQL Databases.
Oozie, an open-source workflow engine.
Pig, a platform for analyzing huge data sets.
Quantitative Data Analysis.
Relational Database.
Sharding, a form of database partitioning.
Text Analytics.
Unstructured Data.
Visualization.
Whirr, libraries for running big data cloud services.
XML.
Yottabyte.
ZooKeeper, manage and coordinate Hadoop nodes.

http://www.crn.com/slide-shows/data-center/240142568/big-data-buzzwords-from-a-to-z.htm


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@ontolog.cim3.net
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J




--

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

















--
William Frank

413/376-8167


This email is confidential and proprietary, intended for its addressees only.
It may not be distributed to non-addressees, nor its contents divulged,
without the permission of the sender.

