Re: [ontolog-forum] Big Data Buzzwords From A to Z

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Sun, 02 Dec 2012 10:47:22 -0500
Message-id: <50BB780A.2040105@xxxxxxxxxxx>

Dear Matthew,    (01)

MW
> OK, so based on this list, big data is mostly about massively parallel
> data warehousing and the low level technologies that support various
> approaches to this, particularly on cheap hardware.    (02)

I agree that many of the terms in the list suggest that conclusion --
the terms 'Hadoop', 'map/reduce', and the names of software designed
to process huge amounts of data.  Map/reduce is a programming model
published by Google, and Hadoop is an open source implementation of it,
now an Apache project, that was developed largely at Yahoo.    (03)
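
For readers who have not seen map/reduce, here is a minimal single-process
sketch in Python of the three steps (map, group by key, reduce) applied to
word counting.  The function names and the sample documents are invented
for illustration; this is not Hadoop's API, and a real system would run
the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group step: collect all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample input, not taken from the slides.
counts = word_count({"a": "big data big", "b": "data warehouse"})
print(counts)
```

Note that nothing in these steps knows what the words mean.  That is the
sense in which map/reduce is a low-level technology.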

But Google and Yahoo process huge amounts of web data. Cost/performance
is very important for them, but so is semantics.  They also process,
store, and analyze the links in and among web pages and their contents.    (04)

The letter D is represented by Data warehousing.  But note slide 5:
> But as data volumes explode, data warehouse systems are rapidly
> changing. They need to store more data -- and more kinds of data
> -- making their management a challenge.    (05)

http://www.crn.com/slide-shows/data-center/240142568/big-data-buzzwords-from-a-to-z.htm?pgno=5    (06)

MW
> In addition, analysis of unstructured data is thrown in as well, but
> I guess that is just an input to the data warehousing.    (07)

Note the terms 'text analytics', 'geospatial analysis', 'quantitative
data analytics', and 'visualization'.  They are definitely concerned
about the semantics of everything in the warehouse.    (08)

As slide 5 says, data warehousing means much more than it did when
the term was introduced 25 years ago.  Google and Yahoo, for example,
have enormous data warehouses, but each web page has as much semantic
information about its content as they can derive from it and from
all the pages it's linked to and from.    (09)

Note all the terms that refer to databases:  'columnar database',
'NoSQL', and 'relational database'.  The term 'extract, transform,
and load' addresses the issues of aligning and mapping independent
databases.  Note 'sharding' for partitioning databases -- that
requires a lot of semantics about a database and how it's used.
Note 'Whirr' for "running libraries for data cloud services".    (010)
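
To illustrate the point about sharding, here is a small Python sketch of
hash-based partitioning.  The record layout, the field name 'member_id',
and the shard count are assumptions made up for the example, and real
systems may use range-based or directory-based schemes instead.  The
mechanics are trivial; the hard part is choosing the key, and that
depends on what the data means and how it is queried.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to keep placement repeatable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def partition(records, key_field, num_shards):
    """Distribute records across shards by the chosen key field."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        index = shard_for(str(record[key_field]), num_shards)
        shards[index].append(record)
    return shards

# Hypothetical records: sharding by member keeps each member's
# rows together, so per-member queries touch only one shard.
records = [
    {"member_id": "m1", "event": "login"},
    {"member_id": "m2", "event": "post"},
    {"member_id": "m1", "event": "logout"},
]
shards = partition(records, "member_id", 4)
```

Sharding the same records by 'event' instead would scatter each member's
history across shards.  Which choice is right depends on the queries.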

Note Kafka -- a messaging system developed by LinkedIn and
contributed to the Apache Software Foundation.  LinkedIn maintains a
lot of semantic information about its members and their interests.    (011)

The tools contributed to the Apache Software Foundation are developed
and used by multibillion-dollar corporations.  They hire the best
graduates from the best universities.  Compared to them, the academics
who designed OWL and SPARQL are amateurs who don't understand the
problems.    (012)

I don't believe we should dump everything developed by the Semantic
Web (SW) community.  But if the academics want mainstream IT to adopt
their toys, they have to understand the problems of mainstream IT.
They can start by studying what industry does now and needs to do in
the future.    (013)

John    (014)
