On 5/21/2013 2:05 PM, David Eddy wrote:
> John -
>
> On May 21, 2013, at 10:20 AM, John Bottoms wrote:
>> With the most complex data sets I've worked on, which are on the
>> order of 150 million points of dirty data,
> So what does the enterprising Big Data Scientist do with so much
> suspect data? Clean it up? Smooth out the statistical anomalies?
> Cross their fingers?
DEddy,
The cleanup for that project was based on an informed understanding
of the types of errors. Some came from the Scantron machine that
read the data sheets, and some came from incorrect input by the
users. When developing metrics, outliers offer little value in some
cases. Sometimes the data is graphed, and the choice of graph
matters. Sometimes "rule-of-thumb" metrics are used, but you have
to know when they are valid. At times an estimation calculation is
done first and then fed into the statistical analysis. Statisticians
and psychometricians have an almost intuitive feel for how to deal
with dirty data. It is a part of Big Data that has not yet been
addressed sufficiently.
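
For what it's worth, here is a minimal sketch in Python of one such
rule of thumb: flagging outliers with the median-absolute-deviation
(modified z-score) test before estimating summary statistics. The 3.5
cutoff and the sample scores are illustrative assumptions on my part,
not values from the project described above.

    # Minimal sketch: flag outliers with a median-absolute-deviation
    # (MAD) rule of thumb before computing statistics on dirty data.
    # The 3.5 threshold is a common convention, assumed here.
    import statistics

    def mad_filter(values, threshold=3.5):
        """Return (clean, outliers) using the modified z-score rule."""
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values)
        if mad == 0:                      # degenerate case: no spread
            return list(values), []
        clean, outliers = [], []
        for v in values:
            z = 0.6745 * (v - med) / mad  # modified z-score
            (outliers if abs(z) > threshold else clean).append(v)
        return clean, outliers

    scores = [72, 75, 71, 74, 73, 9, 250, 76]  # hypothetical readings
    clean, outliers = mad_filter(scores)
    print("kept:", clean)       # run statistics on the cleaned data
    print("flagged:", outliers) # review these against the source sheets

Knowing whether the flagged values are Scantron misreads or genuine
extremes still takes the kind of informed judgment described above.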
-John Bottoms
FirstStar Systems
Concord, MA USA