ontolog-forum

Re: [ontolog-forum] FW: Looking to the Future of Data Science - NYTimes.

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John Bottoms <john@xxxxxxxxxxxxxxxxxxxx>
Date: Sat, 30 Aug 2014 23:39:45 -0400
Message-id: <54029901.3010706@xxxxxxxxxxxxxxxxxxxx>
On 8/30/2014 11:01 PM, John F Sowa wrote:
David, Phil, and John B,

I suggest a very simple definition for Big Data:

Data whose size N (in bytes) is so large that any algorithm that
takes time that is polynomial in N (for any exponent greater than 1)
is prohibitively expensive with existing hardware.

This definition scales with the technology.  It was true in 1960, when
people did research on sorting algorithms that took (N log N) time.

The computers today are a million times bigger and faster than in 1960,
but Big Data today cannot be processed by any polynomial algorithm
(with an exponent greater than 1).

And that definition will still be true when computers are a million
times bigger and faster than today's.
Sorry John, I disagree. Metrics are useful for assessment, but there are times when mathematics needs to be supplemented with other concepts in CS. It is not clear that having an assessment is sufficient to show you the way. It might even lead to a false negative.

When I was developing our browser in '87, I was told absolutely, by known experts, that it was not possible to search large documents in a reasonable time (3 seconds in those days). (The less knowledgeable experts just kept their opinions to themselves.) The argument was that text could only be searched using relational database tables, and that was too slow to be usable. I suggested other approaches and was told that tables were the only way to search text. I am not sure whether the relational db tables in those days had B-trees in them, but even if they did, they would have been used for control, not for data.

We adopted B-trees, and they have become one of the mainstays of search today, along with B+ trees, red-black trees, and PageRank. Across the largest files we could muster at the time, 40MB, our search response times were 30ms. This was just one of the myths that made education about search difficult at the time. I'll be saying more on that later.
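
To illustrate the general idea (this is not the code we shipped, only a minimal sketch), the Python below contrasts rescanning the text on every query with looking a word up in an ordered index built once. A sorted list searched with `bisect` stands in for the B-tree here, which is an assumption made for brevity; the names `build_index`, `lookup`, and `linear_scan` are hypothetical.

```python
import bisect
import re
from collections import defaultdict


def build_index(text):
    """Build an ordered word index once: sorted keys plus their offsets."""
    postings = defaultdict(list)
    for match in re.finditer(r"\w+", text.lower()):
        postings[match.group()].append(match.start())
    keys = sorted(postings)  # sorted keys stand in for the ordered tree
    return keys, [postings[k] for k in keys]


def lookup(index, word):
    """Binary-search the sorted keys: O(log K) for K distinct words."""
    keys, offsets = index
    target = word.lower()
    i = bisect.bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return offsets[i]
    return []


def linear_scan(text, word):
    """Baseline: rescan the whole text on every query, O(N) per search."""
    pattern = r"\b%s\b" % re.escape(word.lower())
    return [m.start() for m in re.finditer(pattern, text.lower())]


text = "B-trees keep keys sorted so a search touches only a few pages. Search stays fast."
index = build_index(text)
print(lookup(index, "search"))       # offsets found through the index
print(linear_scan(text, "search"))   # same offsets, found by a full scan
```

The index costs one pass over the text to build, after which each query touches only a handful of keys instead of every byte. A real B-tree adds the property that those few keys sit in a few disk pages, which is what mattered on the hardware of the day.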

But the point I would like to make is that creative use of data structures and architectures can be useful when algorithmic solutions can't be found. We have always had a practice in CS of dealing with large data; I date my entry into the discipline at 1964.

-John Bottoms
 FirstStar Systems
 Concord, MA USA
 (not far from The Old North Bridge)


John
 
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
 



