On 8/30/2014 11:01 PM, John F Sowa wrote:
David, Phil, and John B,
I suggest a very simple definition for Big Data:
Data whose size N (in bytes) is so large that any algorithm whose
running time is polynomial in N with an exponent greater than 1
is prohibitively expensive on existing hardware.
This definition scales with the technology. It was true in 1960, when
people did research on sorting algorithms that took O(N log N) time.
Computers today are a million times bigger and faster than in 1960,
but Big Data today still cannot be processed by any polynomial algorithm
(with an exponent greater than 1).
And that definition will still be true when computers are a million
times bigger and faster than today's.
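John S.'s definition can be made concrete with a little arithmetic. The numbers below are my own illustration, not from the thread: take a terabyte of data and a hypothetical machine doing 10^9 operations per second, and compare a near-linear O(N log N) algorithm against a polynomial one with exponent 1.5.

```python
# Illustration (my numbers, not from the thread): why an exponent
# greater than 1 becomes prohibitive at Big Data scale.
import math

N = 10**12            # one terabyte, in bytes (assumed input size)
ops_per_sec = 10**9   # assumed processing rate

nlogn_secs = N * math.log2(N) / ops_per_sec   # near-linear algorithm
poly_secs = N ** 1.5 / ops_per_sec            # exponent 1.5 algorithm

print(f"N log N: {nlogn_secs / 3600:.1f} hours")      # about half a day
print(f"N^1.5  : {poly_secs / (86400 * 365):.1f} years")
```

The near-linear pass finishes in hours; the superlinear one takes decades. That gap only widens as N grows, which is why the definition scales with the technology.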
Sorry John, I disagree. Metrics are useful for assessment, but there
are times when mathematics needs to be supplemented with other
concepts in CS. It is not clear that having an assessment is
sufficient to show you the way. It might even lead to a false
negative.
When I was developing our browser in '87, I was told absolutely,
by known experts, that it was not possible to search large documents
in a reasonable time (3 seconds in those days). (The less
knowledgeable experts just kept their opinions to themselves.) The
argument was that text could only be searched using relational
database tables, and that was too slow to be usable. I suggested
other approaches and was told that tables were the only way to
search text. I am not sure whether the relational DB tables of those
days had B-trees in them, but even if they did, they would have been
used for control, not for data.
We adopted B-trees, and they have become one of the mainstays of
search today, along with B+ trees, red-black trees, and PageRank.
Our search across the largest files we could muster at the time,
40 MB, had response times of 30 ms. This was just one of the myths
that made education about search difficult at the time.
I'll be saying more on that later.
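The idea behind that 30 ms result can be sketched in a few lines. This is my own minimal sketch, not the 1987 implementation: a sorted term index probed by binary search, which gives the same O(log N) lookup behavior that makes B-tree search fast, instead of scanning the text linearly.

```python
# A minimal sketch (mine, not the 1987 browser): an inverted index
# with a sorted term list, searched by binary search. B-trees give
# the same O(log N) lookup on disk-resident data.
import bisect

def build_index(documents):
    """Return a sorted term list plus postings: term -> [doc ids]."""
    postings = {}
    for doc_id, text in enumerate(documents):
        for term in set(text.lower().split()):
            postings.setdefault(term, []).append(doc_id)
    return sorted(postings), postings

def lookup(terms, postings, term):
    """Binary-search the sorted term list; O(log N) comparisons."""
    i = bisect.bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return postings[term]
    return []

docs = ["the old north bridge", "big data and search", "search trees"]
terms, postings = build_index(docs)
print(lookup(terms, postings, "search"))   # found in docs 1 and 2
```

The point is architectural: the index is built once, and each query then touches only a logarithmic number of keys rather than every byte of text.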
But the point I would like to make is that creative use of data
structures and architectures can be useful when algorithmic
solutions can't be found. We have always had a practice in CS of
dealing with large data; I date my entry into the discipline at
1964.
-John Bottoms
FirstStar Systems
Concord, MA USA
(not far from The Old North Bridge)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J