
Re: [ontolog-forum] FW: Looking to the Future of Data Science - NYTimes.com - 2014.08.27

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Fitch, Dale K." <fitchd@xxxxxxxxxxxx>
Date: Sun, 31 Aug 2014 16:51:18 +0000
Message-id: <1E593BE8FDD2AD47A40EDCDA7066AA17D41BEFAD@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Is it at all helpful to more precisely identify what is contributing to the 
bigness in big data? In my understanding there are three components:
1. Instances of attributes - what most often gets mentioned with references to 
millions of rows of data
2. The number of attributes - sometimes these millions of rows of data pertain 
to only a few dozen columns
3. The 'thing' these attributes are describing - it seems to me this 'thing' is 
addressed less often    (01)
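The first two components multiply together into the raw byte count N that appears in John Sowa's definition further down the thread. A minimal back-of-envelope sketch, assuming a dense table with a fixed number of bytes per cell; the function name, the 8-byte default, and the example row and column counts are hypothetical illustrations, not figures from this thread:

```python
def estimated_size_bytes(rows: int, attributes: int, bytes_per_value: int = 8) -> int:
    """Rough byte count N for a dense table: rows x attributes x bytes per value.

    bytes_per_value=8 is an assumption (one 64-bit value per cell); real
    storage adds indexes and encoding overhead and may be compressed.
    """
    return rows * attributes * bytes_per_value


if __name__ == "__main__":
    # Hypothetical example: millions of rows over a few dozen columns,
    # as in component 2 above.
    print(estimated_size_bytes(5_000_000, 36))
```

The third component, the 'thing' being described, has no counterpart in this arithmetic, which is consistent with the observation that it is addressed less often.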

If we are to learn from the past:
1. Computing power will continue to increase, making the number of instances a 
trivial matter
2. Our ability to model the increasing number of attributes will continue to 
evolve as the statistical and computer sciences pursue mutually beneficial 
goals through new algorithms
3. Which brings me back to the 'thing,' which I believe lies under the purview 
of logicians and our ability to understand the world    (02)

Dale
________________________________________
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx 
[ontolog-forum-bounces@xxxxxxxxxxxxxxxx] on behalf of John F Sowa 
[sowa@xxxxxxxxxxx]
Sent: Sunday, August 31, 2014 1:33 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] FW: Looking to the Future of Data Science - 
NYTimes.com - 2014.08.27    (03)

On 8/30/2014 11:39 PM, John Bottoms wrote:
> I disagree.    (04)

I have no idea what you're disagreeing with.  I basically agree with
what you wrote, and I can't see anything that is inconsistent with
my recommended definition:    (05)

To repeat:
> I suggest a very simple definition for Big Data:
>
> Data whose size N (in bytes) is so large that any algorithm that
> takes time that is polynomial in N (for any exponent greater than 1)
> is prohibitively expensive with existing hardware.    (06)
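To make the definition concrete, here is a rough sketch that converts step counts into wall-clock estimates. It assumes a hypothetical throughput of 10^9 simple operations per second and treats each growth function as an exact step count; both are simplifications for illustration, and the names OPS_PER_SECOND, growth, and table are made up for this example:

```python
import math

# Hypothetical throughput: 1e9 simple operations per second (an assumption,
# not a measured figure for any particular hardware).
OPS_PER_SECOND = 1e9

# Step counts as functions of N, the data size in bytes.
growth = {
    "log N": lambda n: math.log2(n),
    "N log N": lambda n: n * math.log2(n),
    "N^1.5": lambda n: n ** 1.5,
    "N^2": lambda n: float(n) ** 2,
}


def table(n: int) -> dict:
    """Estimated seconds for each growth rate at data size n."""
    return {name: cost(n) / OPS_PER_SECOND for name, cost in growth.items()}


if __name__ == "__main__":
    n = 10 ** 12  # one terabyte, counted in bytes
    for name, secs in table(n).items():
        print(f"{name:>8}: {secs:.3g} s ({secs / 86400 / 365:.3g} years)")
```

Under these assumptions, at N = 10^12 bytes the N log N case comes to roughly 11 hours, while any exponent greater than 1 is out of reach: N^1.5 takes about 32 years and N^2 over 30 million years.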

JB
> When I was developing our browser in '87 I was told absolutely,
> by known experts, that it was not possible to search large documents
> in a reasonable time (3 seconds in those days). The argument was
> that text could only be searched using relational database tables and
> that was too slow to make it usable.    (07)

Clearly, those so-called experts didn't know what they were talking
about.  But it's irrelevant to the definition I proposed.  Fundamental
principles:    (08)

  1. With indexes, you can find what you're looking for in (log N) time.    (09)

  2. But to create the indexes, you need algorithms that take no more
     than (N log N) time.    (010)

  3. For finding data, linear searches are bad.  For creating indexes,
     polynomial time with an exponent greater than 1 is hopelessly
     inefficient.    (011)

  4. There's always room for innovation in finding (log N) algorithms
     for searching more complex data in more flexible ways.    (012)
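Principles 1 to 3 can be illustrated with a minimal sketch, assuming the data is a list of string keys and the index is a sorted list of (key, position) pairs. Building the index is a comparison sort, O(N log N); each lookup is a binary search with Python's standard bisect module, O(log N); the linear scan is the O(N) approach to avoid. The function names are hypothetical:

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def build_index(records: Sequence[str]) -> List[Tuple[str, int]]:
    """Sort (key, position) pairs once: O(N log N) comparisons."""
    return sorted((key, pos) for pos, key in enumerate(records))


def indexed_lookup(index: List[Tuple[str, int]], key: str) -> Optional[int]:
    """Binary search the sorted index: O(log N) comparisons per lookup.

    (key, -1) sorts before every real entry for key, because positions
    are >= 0, so this finds the first (lowest) position holding key.
    """
    i = bisect.bisect_left(index, (key, -1))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None


def linear_lookup(records: Sequence[str], key: str) -> Optional[int]:
    """Scan every record: O(N) comparisons per lookup."""
    for pos, rec in enumerate(records):
        if rec == key:
            return pos
    return None
```

The one-time O(N log N) cost of build_index is amortized over many lookups; once the index exists, each query touches about log2 N entries instead of all N.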

JB
> We adopted B-trees and that has become one of the mainstays, along
> with B+ trees, Red/Black trees and PageRank, of search today.    (013)
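Why B-tree lookups are logarithmic can be seen in a minimal in-memory sketch of textbook (CLRS-style) B-tree insertion and search with minimum degree t. This is an illustrative assumption, not the implementation used in the 1987 browser: each node holds at most 2t - 1 keys, full nodes are split on the way down, and a search visits one node per level, so its cost grows with the tree height, which is O(log_t N). The class and method names are hypothetical:

```python
from bisect import bisect_left, bisect_right, insort
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


class BTree:
    """Minimal B-tree with minimum degree t: insert and search only."""

    def __init__(self, t: int = 3):
        if t < 2:
            raise ValueError("minimum degree t must be at least 2")
        self.t = t
        self.root = Node()

    def search(self, key: int) -> bool:
        """Walk down one node per level, binary-searching each node's keys."""
        node = self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]

    def insert(self, key: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: split it under a new root, so the tree grows by one level.
            new_root = Node(children=[self.root])
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def height(self) -> int:
        """Number of edges from the root to a leaf (all leaves share a depth)."""
        h, node = 0, self.root
        while not node.leaf:
            node, h = node.children[0], h + 1
        return h

    def _split_child(self, parent: Node, i: int) -> None:
        """Split the full child parent.children[i], moving its median key up."""
        t = self.t
        full = parent.children[i]
        right = Node(keys=full.keys[t:], children=full.children[t:])
        median = full.keys[t - 1]
        full.keys = full.keys[: t - 1]
        full.children = full.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, node: Node, key: int) -> None:
        """Descend to a leaf, splitting full children before entering them."""
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)
```

For a tree of N keys the height is at most log_t((N + 1) / 2), so with t = 3 and 1000 keys no search visits more than six nodes. On disk, t is usually chosen so that a node fills one page, which is why B-trees suit large indexes.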

Of course.  Those algorithms are logarithmic.  With any kind of
hardware, you need logarithmic algorithms to search indexed data.
And you must have no worse than (N log N) to create the indexes.    (014)

JB
> This was just one of the myths that made education about search
> difficult at the time.    (015)

Back in the 1960s, Donald Knuth was very clear about these issues.
Anybody who had studied Knuth could never have made the kind of
claims you're talking about.    (016)

JB
> But the point I would like to make is that creative use of data
> structures and architectures can be useful when algorithmic solutions
> can't be found. We have always had a practice in CS of dealing with
> large data; I date my entry into the discipline at 1964.    (017)

The field known as computer science (or informatics) was established
in the mid-60s.  There were many pioneers who established the basic
principles.  By the 1970s, the principles were fairly well known.    (018)

But the PCs of the 1980s brought a new generation of kiddies who
had no training in any of the hard-won results of the '60s and '70s.    (019)

Those people you met in 1987 either (a) belonged to the younger and
stupider generation or (b) belonged to the older generation that had
never studied computer science.  In my 30 years at IBM, I met plenty
of both -- along with some who had invented many of the ideas.    (020)

John    (021)

_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (022)

