One major issue with Big Data (and with any data, for that matter) is the scope of the data sets, which is usually left implicit and often inferred from external knowledge about the data source. By the scope of a data set I mean the portion of reality that the data set purports to represent. Sometimes the complexity of a data set's representation is due to the explicit inclusion of scope information, but usually scope is left unspecified in the data representation.

For example, if one is trying to determine air traffic patterns from data sets provided by the various national/regional air traffic authorities or airlines, then aside from all the differences in representation and complexity among the sources, one has to determine what portion of the overall air traffic is captured by the aggregate of the sources one has access to, and whether there is any overlap among the sources (and what the nature of that overlap might signify with respect to one's objective for accessing the data sources). Do some of the sources include general aviation traffic, or only scheduled commercial (passenger?) traffic? What portion of the world's air traffic (of a particular set of types) do we not have data sources for? Are the time ranges of the data sources compatible with the data access objectives? Does a particular source include military aircraft traffic? Does it include charters? Does it include Government executive aircraft?

What about helicopter traffic, lighter-than-air traffic, or UAVs? Up to and down to what vehicle sizes? What about sub-orbital or orbital traffic (even if one excludes "space" traffic as not being "air" traffic, space and orbital traffic typically traverses the atmosphere when launched and often returns through the atmosphere)? Are hovercraft considered air traffic? What about gliders, paragliders, and "airsuits", or are we only interested in powered aircraft or fuel-burning aircraft (not all powered craft burn fuel)? Note that there are fuel-burning paragliders. Are rockets/missiles and artillery considered "air" traffic?

When one accesses Big Data for some purpose, what has one really accessed?

Data distorted by qualification with a meaningless buzz-phrase.

How big is "Big"?

Reinforcing my comment above.

More importantly, how big a portion of what one is looking for does Big Data represent?


And what can one safely conclude for the purposes at hand, given that scope information (assuming it is available or can be inferred)? I'm not sure this is totally a question of logic.

Correct, it is inevitably illogical, thanks to the meaningless nature of a classic marketing buzz-phrase [1].

With regard to Air Traffic Control data, I found an ontology from NASA and then tweaked it with these SPARQL statements:

## Air Traffic Control Ontology

# URI: <http://cdm.arc.nasa.gov/tfm.owl#>
# URL: <http://ti.arc.nasa.gov/m/profile/shawn/tfmontology/tfmBJ1.owl>
# Assumption: written as a SPARQL 1.1 Update (INSERT ... WHERE),
# with UNION joining the alternative patterns.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

INSERT
 {GRAPH  <http://cdm.arc.nasa.gov/tfm.owl>
   {
             ?s rdfs:isDefinedBy  <http://cdm.arc.nasa.gov/tfm.owl#> .
             <http://cdm.arc.nasa.gov/tfm.owl#> <http://open.vocab.org/terms/defines> ?s .
             <http://cdm.arc.nasa.gov/tfm.owl#> a owl:Ontology .
             ?s <http://www.w3.org/2007/05/powder-s#describedby> <http://ti.arc.nasa.gov/m/profile/shawn/tfmontology/tfmBJ1.owl> .
             <http://ti.arc.nasa.gov/m/profile/shawn/tfmontology/tfmBJ1.owl>  <http://open.vocab.org/terms/describes> ?s .
   }
 }
WHERE
 {GRAPH <http://ti.arc.nasa.gov/m/profile/shawn/tfmontology/tfmBJ1.owl>
   {
             {?s rdfs:subClassOf ?o}
       UNION {?s rdfs:subPropertyOf ?o}
       UNION {?s owl:equivalentClass ?o}
       UNION {?s owl:equivalentProperty ?o}
       UNION {?s a ?o}
   }
 }
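
As a quick check on the result, a query along the following lines could list the terms that now point back to the ontology URI. This is only a sketch: it assumes the triples ended up in the named graph <http://cdm.arc.nasa.gov/tfm.owl>, as the first GRAPH clause suggests, and the LIMIT of 50 is an arbitrary choice for browsing.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# List terms in the target graph that are linked back to the ontology URI.
# The graph name is taken from the statements above; LIMIT is arbitrary.
SELECT DISTINCT ?term
WHERE {
  GRAPH <http://cdm.arc.nasa.gov/tfm.owl> {
    ?term rdfs:isDefinedBy <http://cdm.arc.nasa.gov/tfm.owl#> .
  }
}
ORDER BY ?term
LIMIT 50
```

An empty result would suggest that the terms were written to a different graph or that the update did not run.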

End product:

[1] http://linkeddata.uriburner.com/c/8BPYQ5 -- Tweaked (for easy navigation and exploration) Air Traffic Control Ontology
[2] http://linkeddata.uriburner.com/c/9C6PQJ35 -- A Class from the Ontology


