
Re: [ontology-summit] Clarification re Big Data Challenges Synthesis

To: Ontology Summit 2012 discussion <ontology-summit@xxxxxxxxxxxxxxxx>
From: Simon Spero <sesuncedu@xxxxxxxxx>
Date: Mon, 2 Apr 2012 17:38:27 -0400
Message-id: <CADE8KM57xDtX+Gprv=0v4g=h-ATO6TP0o2CRsFWw1-WU9asFGw@xxxxxxxxxxxxxx>
I have to agree that the claims must be interpreted contextually, since, to the extent I understand them, they are false as universal statements.

  1. It is possible for something to be computationally expensive, yet have minimal or no effect on latency or throughput.  Obvious examples are: 
    1. where the ontological expression is used to identify the semantics of a value within a scientific dataset, and is used only for matching;
    2. where it is used to generate a streaming transformation kernel at the start of the run (or other cases where the computation is performed only once); for example, using a JIT to generate partially evaluated kernels for execution on one or more GPUs (see the first sketch after this list);
    3. where the time for the computation is masked by other latency;
    4. where the job is limited by I/O, the more expressive representation is more compact than the less expressive one, and the cost of the extra computation is less than the time saved by the reduced I/O. For example, a representation ten times more compact saves 9 GB of reads per 10 GB of data; at 200 MB/s that is about 45 seconds, which can easily exceed the extra CPU time.
  2. I cannot properly interpret "Higher expressivity often involves more than one piece of information from the abox – meaning you have to cross server boundaries. With lower expressivity you can replicate the ontology everywhere on the cluster and answer questions LOCALLY." 
    1. If you only need one piece of information from the a-box, it's not big data, it's a big datum.
    2. I cannot see how this necessarily requires crossing server boundaries. 
    3. I also cannot see how lower or higher expressivity necessarily bears on the ability to replicate the ontology. If the more expressive form is more compact than the less expressive form, then the opposite may be true.
    4. If the t-box is such that it cannot fit on a node for scientific data processing, then it's either a problem with the t-box, a problem of too-limited expressivity, or a problem with the problem.
    5. Libraries and formats for dealing with scientific data sets, such as HDF5 and MPI, are designed to work with slices and chunks of large sets (see the second sketch after this list). If the problem with expressivity is that it fails to prevent captured or generated data from being represented, then the usual code will take it in stride (hah).
    6. Big scientific datasets are typically rather homogeneous. Very little raw data generated by Herschel SPIRE has associated pizza toppings. Also note that the Level-0 output is pure sensor data: by combining it with calibration data (plus a little theory), the raw data is converted into physical units not specific to the instrument.
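To make point 1.2 concrete, here is a minimal sketch in Python of the compile-once pattern. All names (UNIT_AXIOMS, compile_kernel, the column labels) are hypothetical illustrations, not any real API: the expressive, ontology-driven work of resolving what each column means happens once at the start of the run, and the per-record kernel that results is plain arithmetic, so the expressivity adds nothing to steady-state latency or throughput.

```python
# Hypothetical illustration: resolve column semantics ONCE, then stream.
# UNIT_AXIOMS stands in for whatever the ontology/t-box would tell us;
# in a real system this table would be derived by a reasoner.
UNIT_AXIOMS = {
    "temp_mK":   (1e-3, 0.0),    # millikelvin -> kelvin
    "temp_degC": (1.0, 273.15),  # degrees Celsius -> kelvin
}

def compile_kernel(columns):
    """Pay the 'expressive' cost here, once per run: resolve the
    semantics of each column, then return a closure that does only
    cheap arithmetic per record."""
    plan = [UNIT_AXIOMS[c] for c in columns]
    def kernel(record):
        return [v * scale + offset for v, (scale, offset) in zip(record, plan)]
    return kernel

kernel = compile_kernel(["temp_mK", "temp_degC"])  # once per run
stream = [(293150.0, 20.0), (293650.0, 20.5)]      # millions of rows in practice
for record in stream:
    print(kernel(record))  # per-record cost independent of ontology expressivity
```

And for point 2.5, a sketch of the sliced access HDF5 is built for, using the h5py bindings (the file and dataset names are made up). Only the chunks overlapping the requested hyperslab are read, so a dataset much larger than memory, or than a single node, is handled routinely:

```python
import numpy as np
import h5py  # Python bindings for HDF5

# Create a toy chunked dataset standing in for instrument output.
with h5py.File("toy_level0.h5", "w") as f:
    data = np.arange(1_000_000, dtype="f4").reshape(1000, 1000)
    f.create_dataset("detector", data=data, chunks=(100, 100))

# Read back only a 100x100 slice; HDF5 touches only the chunks it needs.
with h5py.File("toy_level0.h5", "r") as f:
    block = f["detector"][200:300, 400:500]
    print(block.shape, float(block.mean()))
```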
Simon


On Mon, Apr 2, 2012 at 3:34 PM, Ali SH <asaegyn+out@xxxxxxxxx> wrote:
Hello all,

I want to direct some attention to this segment on the Big Data Challenges synthesis page.

3. Many times people try to have both expressivity and scale. This is very expensive    (38G5)

Don’t be seduced by expressivity    (38G6)

* Just because you CAN say it doesn’t mean you SHOULD say it. Stick to things that are strictly useful to building your big data application.    (38G7)

Computationally expensive    (38G8)

* Expressivity is not free. It must be paid for either with load throughput or query latency, or both.    (38G9)

Not easily partitioned    (38GA)

* Higher expressivity often involves more than one piece of information from the abox – meaning you have to cross server boundaries. With lower expressivity you can replicate the ontology everywhere on the cluster and answer questions LOCALLY.    (38GB)

A little ontology goes a long way    (38GC)

* There can be a lot of value just getting the data federated and semantically aligned.    (38GD)

As noted at last Friday's organizing committee meeting, this section may elicit a lot of comments; we should address it prior to the cut-off for Communique revisions.

My interpretation of the above is that the claim is contextual. It is certainly true that in some cases a small amount of machine-readable semantics can go a long way. As noted in bullet (38G7), it really seems to depend on the target application and the underlying value proposition that drives the creation or application of a computational ontology to the problem space.

The wording above focuses on the negatives of increased expressivity, which imo is less constructive than highlighting that the intended application and purpose of the ontology should drive the level of required expressivity. Most of the points above would then apply only to those cases where the value proposition and intended ontology applications really do require limited expressivity.

Indeed, Leo's slides from the 2007 summit, esp. slides 20, 25 & 26, say pretty much the same thing: http://ontolog.cim3.net/file/resource/presentation/LeoObrst_20060112/OntologySpectrumSemanticModels--LeoObrst_20060112.ppt

What do others think?

Best,
Ali


_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/ontology-summit/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontology-summit/
Unsubscribe: mailto:ontology-summit-leave@xxxxxxxxxxxxxxxx
Community Files: http://ontolog.cim3.net/file/work/OntologySummit2012/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2012
Community Portal: http://ontolog.cim3.net/wiki/


