Re: [ontolog-forum] FW: Looking to the Future of Data Science - NYTimes.

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Sun, 31 Aug 2014 02:33:57 -0400
Message-id: <5402C1D5.3000008@xxxxxxxxxxx>

On 8/30/2014 11:39 PM, John Bottoms wrote:
> I disagree.    (01)

I have no idea what you're disagreeing with.  I basically agree with
what you wrote, and I can't see anything that is inconsistent with
my recommended definition:    (02)

To repeat:
> I suggest a very simple definition for Big Data:
> Data whose size N (in bytes) is so large that any algorithm that
> takes time that is polynomial in N (for any exponent greater than 1)
> is prohibitively expensive with existing hardware.    (03)
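To make the definition concrete, here is a rough back-of-the-envelope sketch. The throughput of 10^9 basic steps per second and the 1-terabyte data size are illustrative assumptions, not figures from this thread, and the function name is hypothetical.

```python
import math

OPS_PER_SECOND = 1e9              # assumed throughput, for illustration only
SECONDS_PER_YEAR = 365 * 24 * 3600


def estimated_seconds(n_bytes, exponent=1, log_factor=False):
    """Rough running time for an algorithm that takes N**exponent steps
    (multiplied by log2 N when log_factor is True)."""
    steps = n_bytes ** exponent
    if log_factor:
        steps *= math.log2(n_bytes)
    return steps / OPS_PER_SECOND


if __name__ == "__main__":
    n = 1e12  # one terabyte, chosen only as an example size
    for label, exp, lg in [("N", 1, False), ("N log N", 1, True), ("N^2", 2, False)]:
        secs = estimated_seconds(n, exp, lg)
        print(f"{label:8s} {secs:.3g} s  ({secs / SECONDS_PER_YEAR:.3g} years)")
```

Under these assumptions, a linear pass over a terabyte takes about 1000 seconds and an (N log N) pass about 11 hours, while an N^2 algorithm would need roughly 30 million years.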

> When I was developing our browser in '87 I was told absolutely,
> by known experts, that it was not possible to search large documents
> in a reasonable time (3 seconds in those days). The argument was
> that text could only be searched using relational database tables and
> that was too slow to make it usable.    (04)

Clearly, those so-called experts didn't know what they were talking
about.  But it's irrelevant to the definition I proposed.  Fundamental
principles:    (05)

  1. With indexes, you can find what you're looking for in (log N) time.    (06)

  2. But to create the indexes, you need algorithms that take no more
     than (N log N) time.    (07)

  3. For finding data, linear searches are bad.  For creating indexes,
     polynomial time (for any exponent greater than 1) is hopelessly
     inefficient.    (08)

  4. There's always room for innovation in finding (log N) algorithms
     for searching more complex data in more flexible ways.    (09)
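Principles 1 through 3 can be sketched in a few lines of Python. This is a minimal illustration that uses a sorted list as the index, not the B-tree discussed in this thread; the function names are hypothetical.

```python
from bisect import bisect_left


def build_index(records):
    """Build a sorted list of (key, position) pairs.
    Sorting takes O(N log N) comparisons."""
    return sorted((key, pos) for pos, key in enumerate(records))


def lookup(index, key):
    """Find every position of key by binary search.
    Locating the first match takes O(log N) comparisons."""
    # The 1-tuple (key,) sorts before every (key, pos) pair with that key.
    i = bisect_left(index, (key,))
    positions = []
    while i < len(index) and index[i][0] == key:
        positions.append(index[i][1])
        i += 1
    return positions


def linear_search(records, key):
    """The O(N) baseline: scan every record."""
    return [pos for pos, k in enumerate(records) if k == key]


if __name__ == "__main__":
    records = ["pear", "apple", "fig", "apple"]
    index = build_index(records)
    print(lookup(index, "apple"))           # positions found via the index
    print(linear_search(records, "apple"))  # same answer, but scans everything
```

The index is paid for once at (N log N) cost; after that, each lookup touches only about log N entries plus the matches themselves, instead of all N records.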

> We adopted B-trees and that has become one of the mainstays, along
> with B+ trees, Red/Black trees and PageRank, of search today.    (010)

Of course.  Those algorithms are logarithmic.  With any kind of
hardware, you need logarithmic algorithms to search indexed data.
And you must have no worse than (N log N) to create the indexes.    (011)

> This was just one of the myths that made education about search
> difficult at the time.    (012)

Back in the 1960s, Donald Knuth was very clear about these issues.
Anybody who had studied Knuth could never have made the kind of
claims you're talking about.    (013)

> But the point I would like to make is that creative use of data
> structures and architectures can be useful when algorithmic solutions
> can't be found. We have always had a practice in CS of dealing with
> large data; I date my entry into the discipline at 1964.    (014)

The field known as computer science (or informatics) was established
in the mid-60s.  There were many pioneers who established the basic
principles.  By the 1970s, the principles were fairly well known.    (015)

But the PCs of the 1980s brought a new generation of kiddies who
had no training in any of the hard-won results of the '60s and '70s.    (016)

Those people you met in 1987 either (a) belonged to the younger and
stupider generation or (b) belonged to the older generation that had
never studied computer science.  In my 30 years at IBM, I met plenty
of both -- along with some who had invented many of the ideas.    (017)

John    (018)
