ontolog-forum

Re: [ontolog-forum] Big Data Buzzwords From A to Z

To: "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Natale, Bob" <RNATALE@xxxxxxxxx>
Date: Sun, 2 Dec 2012 17:45:47 +0000
Message-id: <A65E21691881E64DBF058A66E53068ED069A8764@xxxxxxxxxxxxxxxxxx>
Most people immersed in Big Data processing today would probably have chosen 
"Advanced Analytics"* as the most representative "A" word (not "ACID").  Its 
meaning would certainly encompass the list's "geospatial analysis", "quantitative data 
analysis", and "text analytics" entries, but it also covers predictive, 
stream, and other varieties of analytics, as applied to some understanding of 
domain relevance.      (01)

And concerning that last aspect, Advanced Analytics could benefit hugely from 
applying the Reasoner capabilities in the Semantic Web tools 
category.  We need to find the "sweet spot" where the Reasoner does not require a 
full ontology and where the Big Data metadata provides a useful degree of 
discrimination (clarity and granularity) ... on the latter, perhaps the Apache 
Accumulo project (contributed by the NSA), with its multi-element keys enabling 
cell-based access control (and other capabilities), is a target at which 
Reasoner developers could aim?  (See "Accumulo:  Extensions to Google's Bigtable
Design", http://people.apache.org/~afuchs/slides/morgan_state_talk.pdf, for an 
authoritative introduction to Accumulo ...)     (02)
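To make "cell-based access control" concrete: in Accumulo each key carries, besides row, column family, column qualifier, and timestamp, a column visibility label, a boolean expression over security labels such as BILLING&(ADMIN|AUDIT). A cell is returned only if the reader's authorizations satisfy that expression. The Python below is a simplified sketch of that idea, not Accumulo's Java API. The Key class, the visible() and scan() helpers, and the sample labels are hypothetical, and the real evaluator handles details this sketch omits.

```python
import re
from dataclasses import dataclass

# Toy model of an Accumulo-style key (hypothetical class, not Accumulo's Java API):
# the column visibility label travels with every individual cell.
@dataclass(frozen=True)
class Key:
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

# Labels are runs of word-ish characters; operators are &, |, and parentheses.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_.:/-]+|[&|()])")

def _tokenize(expr):
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def visible(expr, auths):
    """True if the visibility expression is satisfied by the reader's authorizations."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # an empty label means the cell is readable by everyone
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ends unexpectedly")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            val = expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return val
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    def expression():
        nonlocal pos
        val, op = term(), None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            # Like Accumulo, refuse to guess precedence between & and |.
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | needs parentheses")
            op = tokens[pos]
            pos += 1
            rhs = term()  # always parse the operand so its tokens are consumed
            val = (val and rhs) if op == "&" else (val or rhs)
        return val

    result = expression()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result

def scan(cells, auths):
    """Return only the (key, value) cells this reader is allowed to see."""
    return [(k, v) for k, v in cells if visible(k.visibility, auths)]

cells = [
    (Key("patient1", "vitals", "bp", "MEDICAL", 1), "120/80"),
    (Key("patient1", "billing", "balance", "BILLING&(ADMIN|AUDIT)", 1), "250.00"),
    (Key("patient1", "info", "name", "", 1), "Doe"),
]
print([v for _, v in scan(cells, {"MEDICAL"})])           # ['120/80', 'Doe']
print([v for _, v in scan(cells, {"BILLING", "AUDIT"})])  # ['250.00', 'Doe']
```

The same row yields different views for different readers. That per-cell granularity of metadata is what makes it a plausible place for a Reasoner to attach.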

[ * - Some might want to use "Analytics" w/o the "Advanced" qualifier here, but 
I consider it useful to distinguish between them ... in my usage, "Analytics" 
can be readily mechanized in software, "Advanced Analytics" requires 
qualitatively more reasoning power applied across a broader data and semantic 
scope ... successful application of "Advanced Analytics" in a given domain over 
time leads to more and improved "Analytics" capabilities in that domain.]    (03)

Cheers,
BobN    (04)

-----Original Message-----
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx 
[mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F Sowa
Sent: Sunday, December 02, 2012 10:47 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] Big Data Buzzwords From A to Z    (05)

Dear Matthew,    (06)

MW
> OK, so based on this list, big data is mostly about massively parallel
> data warehousing and the low level technologies that support various
> approaches to this, particularly on cheap hardware.    (07)

I agree that many of the terms in the list suggest that conclusion --
the terms 'Hadoop', 'map/reduce', and the names of software designed
to process huge amounts of data.  Map/reduce is a programming model published
by Google, and Hadoop is an open-source implementation developed largely at Yahoo.    (08)
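For readers who have not seen map/reduce, here is a single-process Python sketch of the programming model, using the usual word-count example. The function names and sample documents are illustrative only and are not Hadoop's API. A real framework runs many map and reduce tasks in parallel on many machines and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    # Map: emit an intermediate (word, 1) pair for every word in one document.
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values that share one key.
    return key, sum(values)

def map_reduce(docs):
    intermediate = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())

docs = {"d1": "big data big ideas", "d2": "data about data"}
print(map_reduce(docs))  # {'big': 2, 'data': 3, 'ideas': 1, 'about': 1}
```

Because each map call sees one document and each reduce call sees one key, both phases can be spread across cheap hardware without the tasks coordinating with each other.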

But Google and Yahoo process huge amounts of web data. Cost/performance
is very important for them, but so is semantics.  They also process,
store, and analyze the links in and among web pages and their contents.    (09)
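As a small illustration of analyzing links for their semantics, the sketch below inverts outgoing links into incoming links and collects the anchor text that other pages use when pointing at a page. Anchor-text aggregation is a well-known web-search technique, but this is only a toy under stated assumptions, not a description of what Google or Yahoo actually run. The link records and page names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (source page, target page, anchor text).
links = [
    ("a.example/news", "b.example/hadoop", "Hadoop tutorial"),
    ("c.example/blog", "b.example/hadoop", "intro to Hadoop"),
    ("b.example/hadoop", "a.example/news", "tech news"),
]

def index_links(links):
    """Build page -> pages linking to it, and page -> words others use to describe it."""
    inlinks = defaultdict(set)
    anchor_terms = defaultdict(Counter)
    for src, dst, text in links:
        inlinks[dst].add(src)
        anchor_terms[dst].update(text.lower().split())
    return inlinks, anchor_terms

inlinks, terms = index_links(links)
print(sorted(inlinks["b.example/hadoop"]))       # ['a.example/news', 'c.example/blog']
print(terms["b.example/hadoop"].most_common(1))  # [('hadoop', 2)]
```

The words describing a page come partly from other pages, which is one way semantic information about a page's content can be derived from the pages that link to it.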

The letter D is represented by Data warehousing.  But note slide 5:
> But as data volumes explode, data warehouse systems are rapidly
> changing. They need to store more data -- and more kinds of data
> -- making their management a challenge.    (010)

http://www.crn.com/slide-shows/data-center/240142568/big-data-buzzwords-from-a-to-z.htm?pgno=5    (011)

MW
> In addition, analysis of unstructured data is thrown in as well, but
> I guess that is just an input to the data warehousing.    (012)

Note the terms 'text analytics', 'geospatial analysis', 'quantitative
data analytics', and 'visualization'.  Those techniques are definitely
concerned with the semantics of everything in the warehouse.    (013)

As slide 5 says, data warehousing means much more than it did when
the term was introduced 25 years ago.  Google and Yahoo, for example,
have enormous data warehouses, but for each web page they store as much
semantic information about its content as they can derive from it and
from all the pages it links to and is linked from.    (014)

Note all the terms that refer to databases:  'columnar database',
'NoSQL', and 'relational database'.  The term 'extract, transform,
and load' addresses the issues of aligning and mapping independent
databases.  Note 'sharding' for partitioning databases -- that
requires a lot of semantics about a database and how it's used.
Note 'Whirr' for "running libraries for data cloud services".    (015)
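Why sharding needs semantics: the partitioning rule depends on choosing a shard key that matches how the data is queried. The Python below is a minimal hash-sharding sketch. The orders table, the customer_id shard key, and the helper names are hypothetical, and hash-modulo routing is only one of several partitioning schemes (range and directory-based schemes are others).

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard number with a hash that is stable across runs."""
    # Python's built-in hash() is salted per process for strings, so a digest
    # is used instead to keep the key-to-shard mapping reproducible.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(rows, key_field, num_shards):
    """Split rows into num_shards lists according to the chosen shard key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards

orders = [
    {"customer_id": 42, "order_id": 1, "total": 19.99},
    {"customer_id": 42, "order_id": 2, "total": 5.00},
    {"customer_id": 7, "order_id": 3, "total": 120.00},
]
shards = partition(orders, "customer_id", 4)
for i, shard in enumerate(shards):
    print(i, [r["order_id"] for r in shard])
```

Sharding on customer_id keeps each customer's orders together, so a per-customer query touches one shard, while a query across all customers must fan out to every shard. Note also that with plain modulo routing, changing the number of shards moves most keys; consistent hashing is the usual remedy.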

Note Kafka -- a messaging system developed by LinkedIn and
contributed to the Apache Foundation.  LinkedIn maintains a lot
of semantic information about its members and their interests.    (016)
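Kafka's core abstraction is a topic split into partitions, each an append-only log. Producers append messages, and consumers read forward from an offset they track themselves. The in-memory Python below is only a conceptual sketch under those assumptions, not Kafka's client API. The Topic and Consumer classes and the "member-updates" topic are hypothetical, and CRC32 stands in for whatever hash Kafka's own partitioner uses.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class Topic:
    """In-memory stand-in for a partitioned, append-only topic (not Kafka's client API)."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def send(self, key, value):
        """Append a message; messages with the same key land in the same partition."""
        p = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

class Consumer:
    """Reads each partition in order and remembers its own position (offset)."""
    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * topic.num_partitions

    def poll(self):
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return batch

topic = Topic("member-updates", 3)
topic.send("member-17", "added skill: ontology")
topic.send("member-17", "changed title")
topic.send("member-99", "joined group")
consumer = Consumer(topic)
print(consumer.poll())
print(consumer.poll())  # []: nothing new since the last poll
```

Keying by member keeps each member's updates in order within one partition. Reading does not delete messages, so another consumer with its own offsets can replay the same stream.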

The tools contributed to the Apache Foundation are developed and used
by multibillion-dollar corporations.  They hire the best graduates from
the best universities.  Compared to them, the academics who designed
OWL and SPARQL are amateurs who don't understand the problems.    (017)

I don't believe we should dump everything developed by the SW.
But if the academics want mainstream IT to adopt their toys, they
have to understand the problems of mainstream IT.  They can start
by studying what industry does now and needs to do in the future.    (018)

John    (019)

_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (020)
