Re: [ontolog-forum] Getting The Dirt On Big Data

To:	"[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From:	"Barkmeyer, Edward J" <edward.barkmeyer@xxxxxxxx>
Date:	Thu, 12 Sep 2013 00:45:32 +0000
Message-id:	<9073cb818c404b43b8b4633f44896504@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

Kingsley,

Regrettably, versions of this have been bandied about the manufacturing industries and others for the last five years. What is not obvious from this 15th-hand excerpt is the content of the original Forbes report (sometimes attributed to Business Week). The BIG problem for 80+% of the companies reporting is erroneous customer addresses and customer contact information. The consequences are returned shipments and invoices, lost business, etc. The second big source of problems in the supply chain is incorrect identifiers for shipments, parts, etc., which result in incorrect hazard coding, incorrect customs coding, etc., which in turn produce long shipment delays, emergency shipments, and other serious costs. The third major source is incorrect, often out-of-date, engineering data, which results in shipments of parts that aren’t what you expected. After we get past the first 80%, which is largely a consequence of humans getting it wrong at the outset, the next two big problem areas can be addressed primarily by automating the interchanges and getting the human copiers out of the loop. Most of this latter problem is about “spreadsheet data management” and one-off “data management for X” systems that can’t export anything any other system can read.

There is nothing subtle here. It is 99% about getting business practice to catch up to 1990. These are big problems for mid-size databases. The BIG Data problems are an entirely different matter, and we must not confuse the two, even though the media hype does. Nobody’s customer-records database has the Social Security Administration’s problems.

What we did try to do with semantic technologies, and what proved to be a hard problem, was to capture a lot of transactions and compare the data/assertions in them with a nominally correct knowledge base, thereby identifying miscopying and miscoding events before they caused serious trouble. But the logic technologies of 2010, at least, were not up to maintaining the nominally correct knowledge base in the face of actual changes in state, or to comparing the ‘assertions’ (which might be false) with the KB very well, and the logistical problem of capturing the transactions is major. (An automotive manufacturer has 30,000 supply-chain transactions a day.) One of our industry evaluators characterized the project as a “mammoth tooth tower”: only an incredibly wide-eyed academic would even try something like that! :-)
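For readers unfamiliar with the idea, the screening step described above (compare each transaction's assertions against a reference knowledge base and flag mismatches) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual method: the knowledge base is a hypothetical in-memory dictionary, and the field names (`hazard_class`, `hs_code`) and part IDs are invented for the example. It deliberately uses a naive equality check, which sidesteps exactly the hard parts mentioned: keeping the KB current as state changes, and reasoning about assertions that may themselves be false.

```python
from dataclasses import dataclass

# Hypothetical "nominally correct" reference KB: part ID -> expected codes.
# Field names and values are illustrative assumptions only.
REFERENCE_KB = {
    "P-1001": {"hazard_class": "3", "hs_code": "2710.19"},
    "P-2002": {"hazard_class": "none", "hs_code": "8708.99"},
}


@dataclass
class Transaction:
    txn_id: str
    part_id: str
    assertions: dict  # field -> value as asserted in the transaction


def find_discrepancies(txn, kb):
    """Return (txn_id, field, asserted, expected) tuples for each mismatch."""
    expected = kb.get(txn.part_id)
    if expected is None:
        # Identifier not in the KB: possibly a miscopied part number.
        return [(txn.txn_id, "part_id", txn.part_id, None)]
    issues = []
    for field, asserted in txn.assertions.items():
        # Only fields the KB knows about can be checked; others are skipped.
        if field in expected and expected[field] != asserted:
            issues.append((txn.txn_id, field, asserted, expected[field]))
    return issues


def screen(transactions, kb):
    """Screen a batch of transactions and collect all discrepancies."""
    return [issue for t in transactions for issue in find_discrepancies(t, kb)]


if __name__ == "__main__":
    txns = [
        Transaction("T1", "P-1001", {"hazard_class": "3", "hs_code": "2710.19"}),
        Transaction("T2", "P-2002", {"hazard_class": "3"}),  # miscoded hazard
        Transaction("T3", "P-9999", {}),  # unknown identifier
    ]
    for issue in screen(txns, REFERENCE_KB):
        print(issue)
```

Treating the KB as ground truth is the simplifying assumption here; in practice a mismatch may just as well mean the KB is out of date, which is why the real problem is much harder than this sketch suggests.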

-Ed

Edward J. Barkmeyer Email: edbark@xxxxxxxx

National Institute of Standards & Technology

Systems Integration Division

100 Bureau Drive, Stop 8263 Work: +1 301-975-3528

Gaithersburg, MD 20899-8263 Mobile: +1 240-672-5800

"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority."

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Kingsley Idehen
Sent: Wednesday, September 11, 2013 6:25 PM
To: [ontolog-forum]
Subject: [ontolog-forum] Getting The Dirt On Big Data

All,

Here's an article [1] by Jim Hendler and Karen Waterman that I believe many on this list will find interesting.

An interesting excerpt:

"A 2010 survey by Forbes of more than 200 global enterprises revealed that more than a third believed they were losing more than $10 million per year because of bad data. In 2011, an industry executive produced a rough estimate that U.S. business losses due to bad data were now *$3 trillion* per year, more than double the federal deficit for that year."

Links:

[1] http://online.liebertpub.com/doi/pdfplus/10.1089/big.2013.0026 .

--

Regards,

Kingsley Idehen

Founder & CEO

OpenLink Software

Company Web: http://www.openlinksw.com

Personal Weblog: http://www.openlinksw.com/blog/~kidehen

Twitter/Identi.ca handle: @kidehen

Google+ Profile: https://plus.google.com/112399767740508618350/about

LinkedIn Profile: http://www.linkedin.com/in/kidehen


_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
