OpenOntologyRepository: "OOR for Big Data" Workshop-I - Tue 2012_08_14    (3DSM)

Topic: "OOR for Big Data" - Brainstorm Session    (3DTW)

Session Chair: MikeDean (OOR; Raytheon-BBN)    (3DTX)

Archives:    (3DTY)

Conference Call Details:    (3DSP)

Attendees    (3DT7)

Agenda Ideas:    (3DTH)

please insert any additional items below (along with your name for follow-up purposes)    (3DTI)

Abstract:    (3DUE)

Topic: "OOR for Big Data" - Brainstorm Session    (3DUF)

This is a follow-up session to the presentation our team made on the case for "Leveraging OOR in Big Open Data" at the virtual panel session on ConferenceCall_2012_05_17.    (3DSN)

We will use this workshop to strategize and brainstorm on how we should approach this very important initiative going forward.    (3DSO)

The plan is to use this session for a fairly open discussion about the use of OOR for Big Data. There will not be formal presentations (we are still a bit early for that). The chair will try to seed some discussion topics.    (3DUH)

Agenda:    (3DUI)

Topic: "OOR for Big Data" - Brainstorm Session    (3DUJ)

Proceedings:    (3DUX)

Please refer to the above    (3DUY)

IM Chat Transcript captured during the session:    (3DUZ)

 see raw transcript here.    (3DV0)
 (for better clarity, the version below is a re-organized and lightly edited chat-transcript.)
 Participants are welcome to make light edits to their own contributions as they see fit.    (3DV1)
 -- begin in-session chat-transcript --    (3DV2)
	[08:22] PeterYim: Welcome to the    (3E0G)
	 = "OOR for Big Data" Workshop-I - Tue 2012_08_14  =    (3E0H)
	Topic: "OOR for Big Data" Brainstorm Session    (3E0I)
	Session Chair: Mike Dean (OOR; Raytheon-BBN)    (3E0J)
	Session page: http://ontolog.cim3.net/cgi-bin/wiki.pl?OOR/ConferenceCall_2012_08_14    (3E0K)
	Mute control: *7 to un-mute ... *6 to mute    (3E0L)
	Can't find Skype Dial pad?
	* for Windows Skype users: it's under the "Call" dropdown menu as "Show Dial pad"
	* for Linux Skype users: please stay with (or downgrade to) Skype version 2.x for now 
	  (as a Dial pad seems to be missing on Linux-based Skype v4.x for skype-calls.)    (3E0M)
	 == Proceedings: ==    (3E0N)
	[08:37] PeterYim: == MikeDean starting the session off with his intro slides ...    (3E0O)
	[08:41] ToddSchneider: [ref. Mike's slides #4 & 5] How do these supporting technologies fit into 
	overall IT/data architectures?    (3E0P)
	[08:47] PeterYim: == Q&A and Open Discussion ...    (3E0Q)
	[08:48] anonymous morphed into ElizabethFlorescu    (3E0R)
	[08:49] TerryLongstreth: Mike mentioned RDF from SQL productions; Terry asked if that included 
	capturing the logic from constraints, triggers (Mike added views).    (3E0S)
	[08:51] MikeBennett: How do you manage the relationship between the logical data model which 
	corresponds to the (e.g. SQL) data source, and the ontology of the domain? How do we distinguish an 
	ontology which is an RDF model of a data model design from an ontology of the real things?    (3E0T)
	[08:53] MikeBennett: That is, how to formalize the model theoretic relationship between the elements 
	in the RDF model, and the things which they represent (data elements v things).    (3E0U)
	[08:54] PeterYim: our opportunity (and challenge, of course) could be in: taking where the Linked 
	Data people leave off (say, having extracted a vocabulary on the dataset) and come up with the 
	ontology and the value added services that OOR is poised to provide ... of course, we should look at 
	starting from the dataset as well (not just from the vocabulary)    (3E0V)
	[08:58] MikeBennett: Suggestion: in extracting information from large datasets, is there a role for 
	an "Ontology of the data" and a clear way of distinguishing this from the ontology of the subject 
	matter itself?    (3E0W)
	[09:01] MichaelGruninger: @MikeBennett: Do you see a role for an "ontology of the data"? What kinds 
	of concepts and relations do you see in such an ontology? Would it be domain-independent?    (3E0X)
	[09:04] MikeBennett: This is what I'm wondering. It's all too easy for someone to take a logical 
	model (which has been designed, rightly), convert it into RDF/OWL and say "Lo, an ontology". I 
	think there must be a role for it in data extraction, but also a clear difference from ontologies of 
	the subject matter. With perhaps an RDF/OWL mapping ontology between the two?    (3E0Y)
	[09:06] MikeBennett: @Michael PS I suspect that the basic concepts such a model would be built from, 
	i.e. the top level classes, might be the constructs that exist in the data modeling language, e.g. 
	data table, join table, UML class, UML attribute and so on? I don't know, I'm still ruminating on 
	this.    (3E0Z)
	[09:03] PeterYim: candidate data that we might get our hands on and communities we can collaborate 
	with are probably in the domains of: (i) government data; (ii) geospatial / geo-science; (iii) 
	biomedical (iv) standards    (3E10)
	[09:05] PeterYim: a mini-series is coming up shortly - see: 
	http://ontolog.cim3.net/cgi-bin/wiki.pl?EarthScienceOntolog ... the kick-off session for the series 
	is coming up next week (Thu 2012.08.23)    (3E11)
	[09:06] TerryLongstreth: Big data characterized by Size, by Complexity, by SxC?    (3E12)
	[09:08] TerryLongstreth: A characteristic of big systems is the fuzziness of the boundaries; does 
	the same hold for big data?    (3E13)
	[09:10] ToddSchneider: Ken, when the phrase 'big data' is used I haven't seen qualification to the 
	number of sources. Perhaps there's an implicit assumption of a single source.    (3E14)
	[09:11] KenBaclawski: When we talk about Big Data and OOR, there are two different problems: a 
	relatively small number of large complex ontologies (as in BioPortal which has ontologies with 
	millions of concepts) or a very large number (on the order of millions) of relatively simple 
	ontologies (e.g., the fields of a CSV file).    (3E15)
	[09:19] ToddSchneider: Ken, Okay there may be performance issues when the number of ontologies 
	becomes large.    (3E16)
	[09:18] TerryLongstreth: Challenge for OOR would seem to be more towards complexity rather than size 
	of a Big Data environment    (3E17)
	[09:20] MikeBennett: Open linked data marketing issue: all these governments that are putting out 
	RDF linked data on the basis of the benefits of semantics: are they aware of the available 
	ontologies (per OOR), so that they start to use existing ontologies as the conceptual model from 
	which to structure new and future open linked data outputs?    (3E18)
	[09:20] MikeBennett: This also requires an awareness of the role of an ontology AS a conceptual 
	model - many folks are simply not aware of basic top down modeling best practice.    (3E19)
	[09:22] MichaelGruninger: Two different kinds of applications of OOR: 1) using ontologies within OOR 
	together with existing data sets e.g. integration, decision support. 2) using ontologies within OOR 
	to design new data sets or redesign existing data sets    (3E1A)
	[09:27] MikeDean: As part of "moving to the cloud", many organizations are moving from relational 
	databases to map/reduce frameworks such as Hadoop. Can ontologies be used to assist with or guide 
	such migration?    (3E1B)
	[09:30] ToddSchneider: Have to go. Cheers.    (3E1C)
	[09:37] KenBaclawski: Big Data also involves format conversion issues as well as semantics 
	(e.g., integer vs floating point, image formats).    (3E1D)
	[09:39] MikeBennett: OOR: would not include metadata about data formats (how could it?) but there is 
	a role for some demonstration of description of how to use this stuff, as part of the usage of OOR 
	in that scenario.    (3E1E)
	[09:39] KenBaclawski: Clarify these terms: data registry, data repository, ontology registry, 
	ontology repository.    (3E1F)
	[09:40] MikeBennett: Also clarify how these different moving parts may be framed in terms of the 
	Zachman Framework or some similar formal development framework.    (3E1G)
	[09:40] TerryLongstreth: The loss of meaning in data conversions should be part of the ontology    (3E1H)
	[09:41] TerryLongstreth: ...perhaps a separate micro-theory about data formats and lossy vs. 
	lossless conversions    (3E1I)
	[09:44] MikeBennett: Incompatible data sources: what ontologies are for.    (3E1J)
	[09:47] MikeDean: http://icpsr.umich.edu is a large repository of social science data    (3E1K)
	[09:47] KenBaclawski: I suggest as an action item: capture the use cases we just identified.    (3E1L)
	[09:48] MikeDean: Most domains/projects seem to have their own data registries    (3E1M)
	[09:58] TerryLongstreth: Are we agreed that Big Data for OOR is primarily an issue of complexity of 
	the target relationships?    (3E1N)
	[10:00] TerryLongstreth: Apparently no consensus yet    (3E1O)
	[10:01] MikeBennett: Or is it "Data that is Big" v "Big data architecture", i.e. Hadoop / MapReduce? 
	Both seemed relevant according to today's conversation.    (3E1P)
	[10:04] PeterYim: as long as we are not arguing about what "Big Data" is ... and we know our 
	objective is to tackle how OOR can "serve" Big Data, I think we are fine (at least for now; until we 
	need to tackle it at a finer granularity)    (3E1Q)
	[10:12] PeterYim: assuming we have consensus on putting a focus on "OOR for Big Data" (along with 
	what we have been doing so far with OOR), we might want to: (a) identify the data, use case(s) and 
	one or two community(ies) we want to work closely with, to take this effort to the next level, (b) 
	figure out what we need to be doing differently (from this point onwards) for OOR    (3E1R)
	[10:21] PeterYim: Following event dates confirmed    (3E1S)
	(ref. http://ontolog.cim3.net/cgi-bin/wiki.pl?OOR/ConferenceCall_2012_08_07#nid3DJ5 ... and other events already scheduled)
	(... also incorporating consensus arrived at through email exchanges after the session)
	* OOR Funding-I - Chair: KenBaclawski - "rethink strategy" - Tue 2012_08_21
	* No meeting on Tue 2012_08_28
	* OOR regular monthly team meeting - Tue 2012.09.04
	* OOR Architecture & API Workshop-XIII - Co-chairs: KenBaclawski & ToddSchneider - "use cases" - Tue 2012_09_11
	* OOR Content Workshop-IV - Co-chairs: MichaelGruninger and MikeDean - "Capturing FOIS Ontology Content" - Tue 2012_09_18
	* No meeting on Tue 2012_09_25
	* No meeting on Tue 2012_10_02
	* OOR regular monthly team meeting - Tue 2012.10.09
	* OOR Metadata Workshop-VIII - Chair: MichaelGruninger - will do "Mapping" and/or "metadata in OOR for Big Data" - Tue 2012_10_16
	* OOR Code Development-IX - Chair: MikeDean - Tue 2012_10_23
	* OOR Infrastructure-II - Chair: PeterYim - tba (will schedule this later; possibly after the next major release of the BioPortal vm appliance)    (3E1T)
	[10:04] PeterYim: great session!    (3E1U)
	[10:05] PeterYim: -- session ended: 10:03am PDT --    (3E1V)
 -- end of in-session chat-transcript --    (3DV3)
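
The distinction raised in the chat between an "ontology of the data", an ontology of the subject matter, and a possible mapping between the two could be sketched roughly as follows. This is only an illustration and was not presented in the session: the prefixes (db:, dm:, ex:, map:), the example table, and the helper names lift_row and undeclared_targets are all hypothetical, and plain Python tuples stand in for a real RDF library.

```python
# Illustrative sketch only. All prefixes, URIs, table and column names
# below are hypothetical and do not come from the session or from OOR.
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# 1. "Ontology of the data": describes the data model itself, built from
#    constructs of the data modeling language (table, column).
DATA_MODEL = [
    Triple("db:person", "rdf:type", "dm:Table"),
    Triple("db:person.name", "rdf:type", "dm:Column"),
    Triple("db:person.birth_year", "rdf:type", "dm:Column"),
    Triple("db:person.row_version", "rdf:type", "dm:Column"),
]

# 2. Ontology of the subject matter: describes the real things.
DOMAIN = [
    Triple("ex:Person", "rdf:type", "owl:Class"),
    Triple("ex:hasName", "rdf:type", "owl:DatatypeProperty"),
    Triple("ex:hasBirthYear", "rdf:type", "owl:DatatypeProperty"),
]

# 3. Mapping between the two layers, kept separate from both.
MAPPING = [
    Triple("db:person", "map:representsClass", "ex:Person"),
    Triple("db:person.name", "map:representsProperty", "ex:hasName"),
    Triple("db:person.birth_year", "map:representsProperty", "ex:hasBirthYear"),
]


def undeclared_targets(mapping, domain):
    """Return mapping targets that the domain ontology does not declare."""
    declared = {t.s for t in domain}
    return [t.o for t in mapping if t.o not in declared]


def lift_row(table, key, row, mapping):
    """Translate one row (column -> value) into domain-level triples."""
    classes = {t.s: t.o for t in mapping if t.p == "map:representsClass"}
    props = {t.s: t.o for t in mapping if t.p == "map:representsProperty"}
    subject = f"ex:{table}/{key}"
    triples = [Triple(subject, "rdf:type", classes[f"db:{table}"])]
    for column, value in row.items():
        prop = props.get(f"db:{table}.{column}")
        if prop is not None:  # unmapped columns stay in the data layer only
            triples.append(Triple(subject, prop, str(value)))
    return triples


if __name__ == "__main__":
    row = {"name": "Ada", "birth_year": 1815, "row_version": 3}
    for triple in lift_row("person", 42, row, MAPPING):
        print(triple)
```

In this sketch the row_version column is described in the data-model layer but has no counterpart in the domain ontology, which is one way of keeping data elements and the things they represent apart.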

Additional Resources:    (3DV8)


For the record ...    (3DVM)

How To Join (while the session is in progress)    (3DVN)