Regrettably, versions of this have been bandied about the manufacturing industries, among others, for the last five years. What is not obvious from this 15th-hand excerpt is the content of the original Forbes (variously Business Week) report. The BIG problem for 80+% of the companies reporting is erroneous customer addresses and erroneous customer contact information; the consequence is returned shipments and invoices, lost business, etc. The second big source of problems in the supply chain is incorrect identifiers for shipments, parts, etc., which result in incorrect hazards coding, incorrect customs coding, etc., which in turn produce long shipment delays, emergency shipments, and other serious costs. The third major source is incorrect, often out-of-date, engineering data, which results in shipments of parts that aren't what you expected. Once we get past the first 80%, which is largely a consequence of humans getting it wrong at the outset, the next two big problem areas can be addressed primarily by automating the interchanges and getting the human copiers out of the loop. Most of this latter problem is about "spreadsheet data management" and one-off "data management for X" systems that can't export anything any other system can read.
There is nothing subtle here. It is 99% about getting business practice to catch up to 1990. These are big problems for middle-size databases. The BIG Data problems are an entirely different matter, and we must not confuse the two, even though the media hype does. Nobody's customer-records database has the Social Security Administration's problems.
What we did try to do with semantic technologies, and it proved to be a hard problem, was to capture a large volume of transactions and compare the data/assertions in them with a nominally correct knowledge base, thereby identifying miscopying and miscoding events before they caused serious trouble. But the logic technologies of 2010, at least, were not up to maintaining the nominally correct knowledge base in the face of actual changes in state, or to comparing the 'assertions' (which might be false) with the KB very well, and the logistical problem of capturing the transactions is major. (An automotive manufacturer has 30,000 supply-chain transactions a day.) One of our industry evaluators characterized the project as a "mammoth tooth tower": only an incredibly wide-eyed academic would even try something like that!
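The comparison step described above can be sketched in miniature. This is a toy Python illustration, not the project's actual design: the part numbers, codes, and field names are invented, and the real difficulties (maintaining the reference KB as the world changes, and capturing 30,000 transactions a day) are exactly what this toy omits.

```python
# Hypothetical sketch: flag transaction assertions that disagree with a
# nominally correct reference knowledge base, catching miscoding events
# before they cause shipment delays. All identifiers here are invented.

REFERENCE_KB = {
    # part_id -> nominally correct attributes
    "P-1042": {"hazard_class": "3", "customs_code": "8708.99"},
    "P-2077": {"hazard_class": "9", "customs_code": "8536.50"},
}

def check_transaction(txn):
    """Return a list of (field, asserted, expected) mismatches for one transaction."""
    expected = REFERENCE_KB.get(txn["part_id"])
    if expected is None:
        # An identifier the KB has never seen is itself a likely miscopy.
        return [("part_id", txn["part_id"], "unknown identifier")]
    return [
        (field, txn.get(field), expected[field])
        for field in ("hazard_class", "customs_code")
        if txn.get(field) != expected[field]
    ]

# A miscoded customs entry is caught before the shipment goes out:
bad_txn = {"part_id": "P-1042", "hazard_class": "3", "customs_code": "8708.10"}
print(check_transaction(bad_txn))
# [('customs_code', '8708.10', '8708.99')]
```

The hard part in practice was not this comparison but keeping `REFERENCE_KB` correct as engineering data changed, which is the limitation noted above.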
Edward J. Barkmeyer Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Systems Integration Division
100 Bureau Drive, Stop 8263 Work: +1 301-975-3528
Gaithersburg, MD 20899-8263 Mobile: +1 240-672-5800
"The opinions expressed above do not reflect consensus of NIST,
and have not been reviewed by any Government authority."
From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx]
On Behalf Of Kingsley Idehen
Sent: Wednesday, September 11, 2013 6:25 PM
Subject: [ontolog-forum] Getting The Dirt On Big Data
Here's an article by Jim Hendler and Karen Waterman that I believe many on this list will find interesting.
An interesting excerpt:
" A 2010 survey by Forbes of more than 200 global enterprises revealed that more than a third believed they were losing more than $10 million per year because of bad data.
In 2011, an industry executive produced a rough estimate that U.S. business losses due to bad data were now
*$3 trillion* per year,
more than double the federal deficit for that year.
http://online.liebertpub.com/doi/pdfplus/10.1089/big.2013.0026
Kingsley Idehen
Founder & CEO
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen