I regret to say, I think this definition is about buzzword
maintenance. The idea is clearly: Big Data is about
inventing a new information processing technology that will
work better for datasets that RDB technology just can’t handle
– “a paradigm shift” in technology.
What is wanted is not a paradigm shift in processing technology –
the last two paradigm shifts got us XML databases and XQuery
and RDF triple stores, both of which are clumsy repositories
that just make the Big Data problem more expensive.
What is wanted (as Michael Brunnbauer hinted) is a paradigm shift
in data acquisition mindset. I will paraphrase another
contribution to this exploder, which I have since lost: “If
you don’t know what you have when you get it, you will never
know it later.”
There is a big difference between large volumes of data that must be
maintained in order to perform a particular set of business or
governmental functions and responsibilities, and large volumes
of data that are available and might enable some analytical
process that is at best desirable. Amazingly enough, we have
muddled through the support of the former for 50 years with
established technologies and state of the art computational
resources, and newer technologies have become established as
the quality of the implementations and the resources for
supporting them became able to carry the increasing load. We
have been able to do this by working around the limitations to
deliver satisfactory, if less than ideal, services somehow.
As John Sowa and others have said, this is a recurring
problem; it is not a new problem.
The problem we have is with our appetite. There is so much
information food out there that we could surely find the taste
treats for the most discriminating palates if we could just
search it all fast enough. That is all very exciting, but it
is irrelevant to solving the problem of delivering to everyone
his daily information bread. The problem is in focusing on
what we need to process, not what we would like to process.
The people who are concerned about data they need to process
in order to deliver adequate services and products are
experiencing the 2014 version of the 1960 problem. The rest
are just blowing Big Data horns.
The would-be ISO definition fails to say:
Big Data: a data set(s) with
characteristics that for *a required function*
at a given point in time cannot be efficiently processed using
current/existing/established/traditional technologies and
techniques in order to *provide adequate support for that function*.
It is not about an arbitrary “particular problem domain” or being
able to “extract [some perceived] value”. That is an
academic view, and that is why we have research institutions.