Re: [ontolog-forum] FW: Looking to the Future of Data Science - NYTimes.

To: "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Barkmeyer, Edward J" <edward.barkmeyer@xxxxxxxx>
Date: Tue, 2 Sep 2014 15:25:16 +0000
Message-id: <d6755fdc213345e6ab3fbf77f2611b3d@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

I regret to say, I think this definition is about buzzword maintenance.  The idea is clearly:  Big Data is about inventing a new information processing technology that will work better for datasets that RDB technology just can’t handle – “a paradigm shift” in technology. 


What is wanted is not a paradigm shift in processing technology – the last two paradigm shifts got us XML databases and XQuery and RDF triple stores, both of which are clumsy repositories that just make the Big Data problem more expensive. 


What is wanted (as Michael Brunnbauer hinted) is a paradigm shift in data acquisition mindset.  I will paraphrase some other contribution to this exploder, which I have since lost:  “If you don’t know what you have when you get it, you will never know it later.”  


There is a big difference between large volumes of data that must be maintained in order to perform a particular set of business or governmental functions and responsibilities, and large volumes of data that are available and might enable some analytical process that is at best desirable.  Amazingly enough, we have muddled through the support of the former for 50 years with established technologies and state-of-the-art computational resources.  Newer technologies became established as the quality of their implementations, and of the resources for supporting them, became able to carry the increasing load.  We have been able to do this by working around the limitations to deliver satisfactory, if less than ideal, services somehow.  As John Sowa and others have said, this is a recurring problem; it is not a new problem.


The problem we have is with our appetite.  There is so much information food out there that we could surely find the taste treats for the most discriminating palates if we could just search it all fast enough.  That is all very exciting, but it is irrelevant to solving the problem of delivering to everyone his daily information bread.  The problem is in focusing on what we need to process, not what we would like to process.  The people who are concerned about data they need to process in order to deliver adequate services and products are experiencing the 2014 version of the 1960 problem.  The rest are just blowing Big Data horns.


The would-be ISO definition fails to say: 

Big Data: a data set(s) with characteristics that for *a required function* at a given point in time cannot be efficiently processed using current/existing/established/traditional technologies and techniques in order to *provide adequate support for that function*.


It is not about an arbitrary “particular problem domain” or being able to “extract [some perceived] value”.   That is an academic view, and why we have research institutions.




"Know your own resources; emphasize your strengths but do not
ignore your limitations.  Plan what you know you can do; and do
not consider what you cannot do."
  -- Sun Tzu, c. 300 B.C.



From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of David Price
Sent: Saturday, August 30, 2014 2:43 PM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] FW: Looking to the Future of Data Science - NYTimes.com - 2014.08.27


On 30 Aug 2014, at 18:35, Kingsley Idehen <kidehen@xxxxxxxxxxxxxx> wrote:

When one accesses Big Data for some purpose, what has one really accessed? 

Data distorted by qualification using a meaningless buzz-phrase 


According to the ISO task force [1] trying to make sense of big data, a draft definition is:


Big Data: a data set(s) with characteristics that for a particular problem domain at a given point in time cannot be efficiently processed using current/existing/established/traditional technologies and techniques in order to extract value


You may not agree with the text of that particular definition, but it is not simply a buzzword. Established technologies have their limits, and "big data" is what organizations are creating that has "broken" those technologies, causing those organizations to look for alternatives. As with anything new and in the media, there will be hangers-on who pervert the original intent (e.g. the RDBMS vendors now claiming they are big data too). However, the fact that some organizations try to take advantage of a problem for their own purposes does not mean the problem does not exist.


FWIW That same definition also includes the text:


[Editor’s Note on June 2014]: there is a concern about the Big Data definition is too narrow within the dataset(s) concept; it is more a paradigm shift of technology changes. Contributions are invited.

Perhaps a few of you might want to take them up on their invitation.









