ontolog-forum

Re: [ontolog-forum] FW: Looking to the Future of Data Science - NYTimes.

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: John Bottoms <john@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 31 Aug 2014 07:57:28 -0400
Message-id: <54030DA8.6070901@xxxxxxxxxxxxxxxxxxxx>
JohnS,

You wrote:
"Data whose size N (in bytes) is so large that any algorithm that takes 
time that is polynomial in N (for any exponent greater than 1) is 
prohibitively expensive with existing hardware."

This is what I was responding to. What you say is true and a good place 
to start. The primary issue is not the size of the data; it is more 
of an architectural question. Very few Big Data systems collect all their 
data in a short time. More often the data is aggregated into a large 
data set over a long period.
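
To put rough numbers on the quoted definition, here is a minimal sketch. The throughput of 10^9 basic steps per second and the one-terabyte data size are assumptions chosen for illustration, not measurements of any real system.

```python
import math

OPS_PER_SECOND = 1e9  # assumed throughput, not a measured figure


def estimated_seconds(n_bytes, cost):
    """Return the estimated running time in seconds for cost(n) basic steps."""
    return cost(n_bytes) / OPS_PER_SECOND


def n_log_n(n):
    # Step count typical of sorting or building an index.
    return n * math.log2(n)


def n_squared(n):
    # Step count of a polynomial algorithm with exponent 2.
    return n ** 2


N = 10 ** 12  # one terabyte, a hypothetical data size
SECONDS_PER_YEAR = 365 * 24 * 3600

index_build_hours = estimated_seconds(N, n_log_n) / 3600
quadratic_years = estimated_seconds(N, n_squared) / SECONDS_PER_YEAR

print(f"N log N: about {index_build_hours:.0f} hours")
print(f"N^2:     about {quadratic_years:.1e} years")
```

Under these assumptions the N log N job finishes in roughly eleven hours, while the quadratic one would need tens of millions of years, which is the sense in which an exponent greater than 1 becomes prohibitive.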

There are exceptions, such as the terabyte-level satellite systems. But 
even with those, the analysis takes much longer than the data collection. 
I think we all agree that when the data fills an ocean, we can't 
drink the ocean dry. Many systems can deal with large data by indexing 
dynamically across time. That has proven successful for web information 
systems.
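
One way to picture indexing dynamically across time is an inverted index that is updated as each document arrives, so no pass over the whole collection is ever needed. The sketch below is a toy illustration only; the class name `IncrementalIndex` and the sample documents are hypothetical, and real web-scale systems are far more elaborate.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that is updated as each document arrives."""

    def __init__(self):
        # word -> ids of the documents that contain it
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # The cost depends only on the size of the new document.
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, word):
        # A single hash lookup; the stored documents are never scanned.
        return sorted(self.postings.get(word.lower(), set()))


index = IncrementalIndex()
index.add(1, "satellite image of the ocean")
index.add(2, "web pages about the ocean floor")
index.add(3, "satellite telemetry")  # arrives later; earlier entries are untouched

print(index.search("ocean"))
print(index.search("satellite"))
```

Each arrival touches only the entries for its own words, which is why the work can be spread over the same long period in which the data accumulates.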

I am more concerned with the lack of expertise in Big Data systems. I 
have yet to see a big data system that really justified designs beyond 
our current level of expertise. To me, Big Data is a successful marketing 
term with very little published rationale. And I am unconvinced that it 
should be part of a curriculum beyond a decent and balanced discussion 
as part of systems design. Perhaps someone here could set me straight on 
this.

For a "hard science", CS seems to have more than its share of myths. But 
let's be careful about calling things myths. All of what we do is based 
on myths that have grown up in CS. We hope that we keep the good 
myths, adopt the new myths that are useful, and discard the old myths 
that are not as useful. That should be an academic discussion, but 
marketing terms have started to leak into the ivory towers and technical 
conferences.

I cringe when I hear high school students talking about "majoring in Big 
Data" when they have never had an introduction to what it is. They just 
know that the term is "getting a lot of interest", so there must be an 
opportunity there.

We also have a fair share of unproductive, blocking myths in AI. As 
Easterly points out in "The Tyranny of Experts", there are high-level 
experts who impose very poor decisions through their positions and 
roles as experts. There may be a justification for that as part of 
diplomacy, but questionable decisions are at times supported by academia.

I appreciate the work DougL has done with Cyc and I understand his 
dilemma with his work going forward. What I don't understand is the 
justification of large investments in AI systems that do not have a 
strong foundation. The Fifth Gen work was another such system. 
Perhaps this is part of "Learning by Doing", but we should all beware of 
how these systems come into existence. We had a similar story in the 
early days of CD-ROMs.

One company sucked up and blew through $26M in capital with little to 
show for it other than nice offices. We should have a discussion about 
the myth-making and myth-breaking process as part of every core 
undergrad curriculum. My experience has been just the opposite. In a CS 
classroom at one well-revered Cambridge college, we were informed that the 
best metric for a data system was "whatever the customer believes it to be".

When I survey the marketing outreach now current in CS and try to project 
forward to work with ontologies, I think we are in for a rough time. 
Likewise, as we get AI into robots and large database systems, we may be 
offering corporations an opportunity to present a new face on how 
ontologies "really work" that is more marketing than effective design. 
We are already behind the 8-ball because we have not put together an 
effective presentation covering ontologies, and they are not discussed in 
schools.

-John Bottoms

On 8/31/2014 2:33 AM, John F Sowa wrote:
> On 8/30/2014 11:39 PM, John Bottoms wrote:
>> I disagree.
> I have no idea what you're disagreeing with.  I basically agree with
> what you wrote, and I can't see anything that is inconsistent with
> my recommended definition:
>
> To repeat:
>> I suggest a very simple definition for Big Data:
>>
>> Data whose size N (in bytes) is so large that any algorithm that
>> takes time that is polynomial in N (for any exponent greater than 1)
>> is prohibitively expensive with existing hardware.
> JB
>> When I was developing our browser in '87 I was told absolutely,
>> by known experts, that it was not possible to search large documents
>> in a reasonable time (3 seconds in those days). The argument was
>> that text could only be searched using relational database tables and
>> that was too slow to make it usable.
> Clearly, those so-called experts didn't know what they were talking
> about.  But it's irrelevant to the definition I proposed.  Fundamental
> principles:
>
>    1. With indexes, you can find what you're looking for in (log N) time.
>
>    2. But to create the indexes, you need algorithms that take no more
>       than (N log N) time.
>
>    3. For finding data, linear searches are bad.  For creating indexes,
>       polynomial time is hopelessly inefficient.
>
>    4. There's always room for innovation in finding (log N) algorithms
>       for searching more complex data in more flexible ways.
>
> JB
>> We adopted B-trees and that has become one of the mainstays, along
>> with B+ trees, red-black trees and PageRank, of search today.
> Of course.  Those algorithms are logarithmic.  With any kind of
> hardware, you need logarithmic algorithms to search indexed data.
> And you must have no worse than (N log N) to create the indexes.
>
> JB
>> This was just one of the myths that made education about search
>> difficult at the time.
> Back in the 1960s, Donald Knuth was very clear about these issues.
> Anybody who had studied Knuth could never have made the kind of
> claims you're talking about.
>
> JB
>> But the point I would like to make is that creative use of data
>> structures and architectures can be useful when algorithmic solutions
>> can't be found. We have always had a practice in CS of dealing with
>> large data; I date my entry into the discipline at 1964.
> The field known as computer science (or informatics) was established
> in the mid-60s.  There were many pioneers who established the basic
> principles.  By the 1970s, the principles were fairly well known.
>
> But the PCs of the 1980s brought a new generation of kiddies who
> had no training in any of the hard-won results of the '60s and '70s.
>
> Those people you met in 1987 either (a) belonged to the younger and
> stupider generation or (b) belonged to the older generation that had
> never studied computer science.  In my 30 years at IBM, I met plenty
> of both -- along with some who had invented many of the ideas.
>
> John
>   
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>   

