ontolog-forum

[ontolog-forum] FW: FW: Looking to the Future of Data Science

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Rich Cooper" <rich@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 2 Sep 2014 19:54:31 -0700
Message-id: <038901cfc722$63279e30$2976da90$@englishlogickernel.com>

“Whoops”

               -Rick Perry.

 

I forgot to include Hans’ email body inline in my last post.

 

-Rich

 

Sincerely,

Rich Cooper

EnglishLogicKernel.com

Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

From: Hans Polzer [mailto:hpolzer@xxxxxxxxxxx]
Sent: Tuesday, September 02, 2014 7:39 PM
To: 'Rich Cooper'
Subject: FW: [ontolog-forum] FW: Looking to the Future of Data Science

 

Rich,

 

Not sure why my email response to the ontolog forum keeps getting rejected by the ontolog mail server as spam, so I am sending my email directly to you.

 

Hans

 

From: Hans Polzer [mailto:hpolzer@xxxxxxxxxxx]
Sent: Tuesday, September 02, 2014 10:37 PM
To: [ontolog-forum] (ontolog-forum@xxxxxxxxxxxxxxxx)
Subject: FW: [ontolog-forum] FW: Looking to the Future of Data Science

 

Rich,

 

It’s not just recognizing new information obscured by system/schema complexity, it’s also about recognizing information resulting from correlations across multiple data sources that are not otherwise integrated or managed as a single entity for some set of purposes. Of course, such discovery has elements of uncertainty and incompleteness precisely because such sources are not integrated or managed as a single entity.  There is also the issue of data capture context and purpose, which impacts the potential utility and validity of inferences drawn from accessing such data for other purposes. Much of the anticipated benefit from accessing big data is from drawing inferences and insights that were not part of the original context for capturing and analyzing the data in its original source systems.
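To make the incompleteness point concrete, here is a minimal sketch, assuming two hypothetical, independently managed sources that share only an identifier field (`port_id`, along with every record below, is invented for illustration). Correlating them yields some matches, but each source also holds keys the other never saw. Nothing in either source guarantees a complete or consistent join.

```python
# Hypothetical example: two independently managed sources that happen to
# share an identifier. Neither was designed to be joined with the other.
shipping_records = [
    {"port_id": "P1", "tonnage": 1200},
    {"port_id": "P2", "tonnage": 800},
    {"port_id": "P3", "tonnage": 450},
]
weather_records = [
    {"port_id": "P1", "storm_days": 4},
    {"port_id": "P3", "storm_days": 11},
    {"port_id": "P4", "storm_days": 2},
]


def correlate(left, right, key):
    """Join two sources on `key`, returning the matched rows plus the keys
    found in only one source (the incompleteness described above)."""
    right_index = {r[key]: r for r in right}
    matched = []
    left_only = []
    for row in left:
        other = right_index.get(row[key])
        if other is None:
            left_only.append(row[key])
        else:
            matched.append({**row, **other})
    left_keys = {row[key] for row in left}
    right_only = [k for k in right_index if k not in left_keys]
    return matched, left_only, right_only


matched, left_only, right_only = correlate(shipping_records, weather_records, "port_id")
print(len(matched), left_only, right_only)  # prints: 2 ['P2'] ['P4']
```

Any inference drawn from the two matched rows silently ignores P2 and P4. That gap is exactly what an integrated, single-purpose system would have been designed to prevent.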

 

This also points out yet another concern/challenge with big data that impacts the scalability and computability aspects that have been much discussed in previous posts on this topic.  In most big data examples I have seen, the data needs to be accessed “in situ”, usually over long-haul network connections and in various data source formats and access methods. In other words, you don’t get to specify and implement the access methods or hardware resources applied to the problem, except maybe at the point at which you have extracted information over the network connections and created your own data set(s) from the data sources.
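A minimal sketch of the "extract and create your own data set" step, assuming two hypothetical remote sources that return different formats (CSV and JSON) with different field names. The payloads are inlined strings standing in for what would really arrive over long-haul connections. The only point where you control the schema is the local data set you build at the end.

```python
import csv
import io
import json

# Hypothetical payloads standing in for data pulled from two remote sources
# that expose different formats. In practice these would arrive over the network.
CSV_PAYLOAD = "id,name\n1,alpha\n2,beta\n"
JSON_PAYLOAD = '[{"identifier": 3, "label": "gamma"}]'


def from_csv(text):
    # csv.DictReader yields one dict per row, with string values.
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "name": r["name"]} for r in reader]


def from_json(text):
    # This source uses different field names, so map them onto the local schema.
    return [{"id": r["identifier"], "name": r["label"]} for r in json.loads(text)]


# The locally created data set: the first point where we control the format.
local_dataset = from_csv(CSV_PAYLOAD) + from_json(JSON_PAYLOAD)
print(local_dataset)
```

Each new source needs its own adapter function like `from_json`. That per-source cost, rather than raw volume, is often what limits scalability when the data must be accessed in situ.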

 

Hans Polzer

 

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Rich Cooper
Sent: Tuesday, September 02, 2014 4:03 PM
To: '[ontolog-forum] '
Subject: Re: [ontolog-forum] FW: Looking to the Future of Data Science - NYTimes.com - 2014.08.27

 

EJB:>  What is wanted is not a paradigm shift in processing technology – the last two paradigm shifts got us XML databases and XQuery and RDF triple stores, both of which are clumsy repositories that just make the Big Data problem more expensive. 

You list three items, yet say “both of which” are clumsy.  Actually, the first item, XML, has been a very useful method for communicating within N-tier systems.  It has great value there but is usually converted into the tables, columns and domains of RDBs where the info gets stored.  So XML is not a problem for most systems.  There are even free XML parsers which have been packaged as components for programmers to call so they don’t have to do the parsing themselves.  It has been very, very useful for interchanging data among multiple systems. 
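As a rough illustration of the pattern described here (XML as the interchange format, relational tables as the store), here is a minimal sketch using Python's standard-library XML parser and an in-memory SQLite database. The `<orders>` message and the `orders` table are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML message exchanged between tiers of an N-tier system.
MESSAGE = """
<orders>
  <order id="101"><customer>Acme</customer><total>250.00</total></order>
  <order id="102"><customer>Globex</customer><total>99.50</total></order>
</orders>
"""


def load_orders(xml_text, conn):
    """Parse the XML with an off-the-shelf parser and store each <order>
    element as a row in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    root = ET.fromstring(xml_text.strip())
    rows = [
        (int(o.get("id")), o.findtext("customer"), float(o.findtext("total")))
        for o in root.iter("order")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
print(load_orders(MESSAGE, conn))  # prints: 2
print(conn.execute("SELECT customer, total FROM orders ORDER BY id").fetchall())
```

The parsing itself is one library call. The application code is mostly the mapping from elements and attributes onto columns and domains.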

EJB:> What is wanted (as Michael Brunnbauer hinted) is a paradigm shift in data acquisition mindset.  I will paraphrase some other contribution to this exploder, which I have since lost:  “If you don’t know what you have when you get it, you will never know it later.”  

Wrong!!!!  The whole point of discovery systems is in recognizing new information that was in the database, but which is obscured from the obvious observers due to the complexity of typical systems today.  You don’t know what it is in advance; you can only discover it through analysis. 

 

The stuff that is already known to be in the database can just be queried.  But the full range of relationships, which are NOT KNOWN uniquely in the data model, can be brought out through discovery processes. 
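One simple kind of discovery, shown only as an illustration and not as the ELK method referenced below, is to look for column pairs in different tables whose values overlap heavily. That overlap suggests a relationship the schema never declared. This minimal sketch assumes small hypothetical tables held in memory as dicts of columns. The `threshold` parameter is an arbitrary choice.

```python
from itertools import combinations

# Hypothetical tables; no foreign keys are declared between them.
tables = {
    "patents": {"patent_no": ["US1", "US2", "US3"], "assignee_code": ["A7", "A9", "A7"]},
    "companies": {"code": ["A7", "A9", "B1"], "name": ["Acme", "Initech", "Globex"]},
}


def candidate_links(tables, threshold=0.8):
    """Report column pairs in different tables whose distinct values overlap
    heavily: a simple signal of a relationship the schema never declared."""
    columns = [
        (t, c, set(vals)) for t, cols in tables.items() for c, vals in cols.items()
    ]
    found = []
    for (t1, c1, v1), (t2, c2, v2) in combinations(columns, 2):
        if t1 == t2 or not v1 or not v2:
            continue
        # Fraction of the smaller column's distinct values present in the other.
        overlap = len(v1 & v2) / min(len(v1), len(v2))
        if overlap >= threshold:
            found.append((f"{t1}.{c1}", f"{t2}.{c2}", round(overlap, 2)))
    return found


print(candidate_links(tables))
# prints: [('patents.assignee_code', 'companies.code', 1.0)]
```

A real discovery system would also have to weigh coincidental overlaps, such as small code sets and shared stop values, and mine the unstructured columns. This sketch only covers exact value matches.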

 

See http://www.EnglishLogicKernel.com/ElkForPatents.html for an example of the kinds of things that can be discovered from relational databases containing both structured and unstructured columns, as in the USPTO database of patents. 

 

-Rich

 


From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Kingsley Idehen
Sent: Tuesday, September 02, 2014 10:34 AM
To: ontolog-forum@xxxxxxxxxxxxxxxx
Subject: Re: [ontolog-forum] FW: Looking to the Future of Data Science - NYTimes.com - 2014.08.27

 

On 9/2/14 11:25 AM, Barkmeyer, Edward J wrote:

I regret to say, I think this definition is about buzzword maintenance.  The idea is clearly:  Big Data is about inventing a new information processing technology that will work better for datasets that RDB technology just can’t handle – “a paradigm shift” in technology. 

 

What is wanted is not a paradigm shift in processing technology – the last two paradigm shifts got us XML databases and XQuery and RDF triple stores, both of which are clumsy repositories that just make the Big Data problem more expensive. 

 

What is wanted (as Michael Brunnbauer hinted) is a paradigm shift in data acquisition mindset.  I will paraphrase some other contribution to this exploder, which I have since lost:  “If you don’t know what you have when you get it, you will never know it later.”  

 

There is a big difference between large volumes of data that must be maintained in order to perform a particular set of business or governmental functions and responsibilities, and large volumes of data that are available and might enable some analytical process that is at best desirable.  Amazingly enough, we have muddled through the support of the former for 50 years with established technologies and state-of-the-art computational resources, and newer technologies have become established as the quality of the implementations and the resources for supporting them became able to carry the increasing load.  We have been able to do this by working around the limitations to deliver satisfactory, if less than ideal, services somehow.  As John Sowa and others have said, this is a recurring problem; it is not a new problem.

 

The problem we have is with our appetite.  There is so much information food out there that we could surely find the taste treats for the most discriminating palates if we could just search it all fast enough.  That is all very exciting, but it is irrelevant to solving the problem of delivering to everyone his daily information bread.  The problem is in focusing on what we need to process, not what we would like to process.  The people who are concerned about data they need to process in order to deliver adequate services and products are experiencing the 2014 version of the 1960 problem.  The rest are just blowing Big Data horns.

 

The would-be ISO definition fails to say: 

Big Data: a data set(s) with characteristics that for *a required function* at a given point in time cannot be efficiently processed using current/existing/established/traditional technologies and techniques in order to *provide adequate support for that function*.

 

It is not about an arbitrary “particular problem domain” or being able to “extract [some perceived] value”.   That is an academic view, and why we have research institutions.

 

-Ed


Ed,

Great addition to this evolving conversation. Naturally, I've incorporated your comments into the "Big Data" description that I am maintaining:

[1] http://linkeddata.uriburner.com/describe/?url= -- without the effect of owl:sameAs relation reasoning and inference

[2] http://linkeddata.uriburner.com/describe/?url= -- with the effect of owl:sameAs relation semantics reasoning and inference

[3] https://plus.google.com/112399767740508618350/posts/79nHeum5DQR -- how I am using G+ post based nanotations to fit the pieces of this puzzle together, as I encounter new and interesting insights

[4] https://plus.google.com/112399767740508618350/posts/MRsyNtqgTXz -- ditto in regards to comments by John Sowa.
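The difference between links [1] and [2] can be illustrated in miniature. This sketch is a toy in-memory model and is not how URIBurner or its underlying engine implements sameAs inference. It treats owl:sameAs as an equivalence relation (symmetric and transitive), groups resources with a union-find structure, and merges their properties. All identifiers (`ex:BigData`, `ex2:Big_data`, and so on) are hypothetical placeholders, not real IRIs.

```python
SAME_AS = "owl:sameAs"

# Hypothetical triples; identifiers are placeholders, not real IRIs.
triples = [
    ("ex:BigData", "rdfs:label", "Big Data"),
    ("ex2:Big_data", "rdfs:comment", "Data too large for established tools"),
    ("ex3:bigdata", "ex:discussedOn", "ontolog-forum"),
    ("ex:BigData", SAME_AS, "ex2:Big_data"),
    ("ex2:Big_data", SAME_AS, "ex3:bigdata"),
    ("ex:Other", "rdfs:label", "Unrelated"),
]


def find(parent, x):
    """Return the representative of x's equivalence class (path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def describe(subject, triples, use_same_as):
    """Return (predicate, object) pairs for `subject`, optionally merging
    every resource linked to it, directly or transitively, by owl:sameAs."""
    if not use_same_as:
        return sorted((p, o) for s, p, o in triples if s == subject and p != SAME_AS)
    parent = {}
    for s, p, o in triples:
        if p == SAME_AS:
            union(parent, s, o)
    root = find(parent, subject)
    return sorted(
        (p, o) for s, p, o in triples if p != SAME_AS and find(parent, s) == root
    )


print(len(describe("ex:BigData", triples, use_same_as=False)))  # prints: 1
print(len(describe("ex:BigData", triples, use_same_as=True)))   # prints: 3
```

Without the inference, the description shows only what was asserted directly about `ex:BigData`. With it, statements made about its aliases are pulled in as well.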


Related:

[1] http://kidehen.blogspot.com/2014/07/nanotation.html -- about Nanotation
[2] https://twitter.com/kidehen/status/506813897043881984 -- Tweet related to the paradigm shift re. data acquisition (i.e., RDF-sentence-based Nanotations that fit into place where text exists).

-- 
Regards,
 
Kingsley Idehen       
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
