OntologySummit2014 session-11 Track-D: Tackling the Variety Problem in Big Data - II - Thu 2014-03-27    (46RL)

Panelists / Briefings:    (4AOP)

Archives:    (4AVG)

Abstract    (4AOK)

Track D: Tackling the Variety Problem in Big Data - II ... intro slides    (4APO)

This is our 9th OntologySummit, a joint initiative by Ontolog, NIST, NCOR, NCBO, IAOA & NCO_NITRD with the support of our co-sponsors.    (4AUX)

Since the beginnings of the Semantic Web, ontologies have played key roles in the design and deployment of new semantic technologies. Yet over the years, the level of collaboration between the Semantic Web and Applied Ontology communities has been much less than expected. Within Big Data applications, ontologies appear to have had little impact.    (4AUY)

This year's Ontology Summit is an opportunity for building bridges between the Semantic Web, Linked Data, Big Data, and Applied Ontology communities. On the one hand, the Semantic Web, Linked Data, and Big Data communities can bring a wide array of real problems (such as performance and scalability challenges and the variety problem in Big Data) and technologies (automated reasoning tools) that can make use of ontologies. On the other hand, the Applied Ontology community can bring a large body of common reusable content (ontologies) and ontological analysis techniques. Identifying and overcoming ontology engineering bottlenecks is critical for all communities.    (4AUZ)

OntologySummit2014 will pose and address the primary challenges in these areas of interaction among the different communities. The Summit activities will bring together insights and methods from these different communities, synthesize new insights, and disseminate knowledge across field boundaries.    (4AVK)

At the Launch Event on 16 Jan 2014, the organizing team provided an overview of the program and of how we will be framing the discourse. Today's session (OntologySummit2014 session-11) is the second virtual panel session featured by Track-D, which focuses on "Tackling the Variety Problem in Big Data."    (4AV0)

The session today continues the successful first session by examining the many issues that arise for managing variety in enterprises and on the web in general, and more specifically for the data being generated by cities.    (4AV1)

Data governance is "a collection of disciplines that ensure data is managed adequately in an enterprise." Malcolm Chisholm will discuss what is involved in data governance and its implications for managing Variety in Big Data. Dan Brickley will be examining the problems of managing variety on the web with Schema.org.    (4AYI)

More than half of the world's population lives in cities and the proportion is growing, so cities are an enormous source of data. However, it is not just the amount of data that is daunting, but also the enormous variety, both within a single city and among the thousands of different cities. The session today includes Mark Fox and Rosario Uceda-Sosa, who are addressing some of the many aspects of the variety of data generated by cities.    (4AYJ)

After the panelists' briefings, there will be time for Q&A and an open discussion among the panel and all participants.    (4APL)

For more information about Track D, see Track D page.    (4APM)

Please add your input to the discussion at: OntologySummit2014_Tackling_Variety_in_BigData_CommunityInput    (4APN)

See more details at: OntologySummit2014 (homepage for this summit)    (4AO7)

Briefings    (4AOL)

Agenda:    (4AP5)

OntologySummit2014 session-11 Track-D: Tackling the Variety Problem in Big Data-II    (4AP7)

Session Format: this is a virtual session conducted over an augmented conference call    (4AP8)

Proceedings    (4AV3)

Please refer to the above    (4AV4)

IM Chat Transcript captured during the session:    (4AV5)

 see raw transcript here.    (4AV6)
 (for better clarity, the version below is a re-organized and lightly edited chat-transcript.)
 Participants are welcome to make light edits to their own contributions as they see fit.    (4AV7)
 -- begin in-session chat-transcript --    (4AV8)
	------
	Chat transcript from room: summit_20140327
	2014-03-27 GMT-08:00 [PDT]
	------    (4B2L)
	[9:18] PeterYim: Welcome to the    (4B2M)
	 = OntologySummit2014 session-11 Track-D: Tackling the Variety Problem in Big Data - II - Thu 2014-03-27 =    (4B2N)
	Summit Theme: OntologySummit2014: "Big Data and Semantic Web Meet Applied Ontology"    (4B2O)
	Session Topic: Track D: "Tackling the Variety Problem in Big Data - II"    (4B2P)
	Session Co-chairs: Professor KenBaclawski (Northeastern University), Professor AnneThessen (Arizona State University)    (4B2Q)
	Panelists / Briefings:    (4B2R)
	* Professor MarkFox (University of Toronto) - "Variety in Big Data: A Cities Perspective"    (4B2S)
	* Dr. MalcolmChisholm (AskGet.com) - "Data Governance to Manage Variety in Big Data"    (4B2T)
	* Mr. DanBrickley (Google) - "Schema.org, FOAF and Linked Data: Lessons for Web-scale vocabulary deployment"    (4B2U)
	* Dr. RosarioUcedaSosa (IBM) - "Open Data, Big Data and Smart Cities"    (4B2V)
	Logistics:    (4B2W)
	* Refer to details on session page at: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_27    (4B2X)
	* (if you haven't already done so) please click on "settings" (top center) and morph from "anonymous" to your RealName; also please enable "Show timestamps" while there.    (4B2Y)
	* Mute control (phone keypad): *7 to un-mute ... *6 to mute    (4B2Z)
	* Attn: Skype users ... see: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_27#nid49OW
	** you may connect to (the skypeID) "joinconference" whether or not it indicates that it is online 
	   (i.e. even if it says it is "offline," you should still be able to connect to it.)
	** if you are using skype and the connection to "joinconference" is not holding up, try using (your favorite POTS or 
	   VoIP line, etc.) either your phone, skype-out or google-voice and call the US dial-in number: +1 (206) 402-0100 
	   ... when prompted enter Conference ID: 141184#
	** Can't find Skype Dial pad?
	*** for Windows Skype users: Can't find Skype Dial pad? ... it's under the "Call" dropdown menu as "Show Dial pad"
	*** for Linux Skype users: if the dialpad button is not shown in the call window you need to press the "d" hotkey to enable it    (4B30)
	* when posting in this Chat-room, kindly observe the following ...
	** whenever a name is used, please use the full WikiWord name format (every time you don't, some volunteer will have to make an edit afterwards)
	** always provide context (like: "[ref. JaneDoe's slide#12], I think the point about context is great" ... rather than "that's great!" 
	   as the latter would mean very little in the archives.)
	** when responding to a specific individual's earlier remarks, please cite his/her full WikiWord names *and* 
	   the timestamp (in PST) of his/her post that you are responding to (e.g. "@JaneDoe [11:09] - I agree, but, ...")
	** use fully qualified URLs (include http:// ) without symbols (like punctuation or parentheses, etc.) right before or after the URL    (4B31)
	Attendees: AleksandraSojic, AlexShkotin, AliHashemi, AmandaVizedom, AnatolyLevenchuk, 
	BartGajderowicz, CarmenChui, ChristiKapp, ChristopherSpottiswoode, ConradBeaulieu, DanBrickley, 
	EdBernot, HaroldBoley, JamesOverton, KenBaclawski, KrzysztofJanowicz, KushagraThakur, 
	LamarHenderson, LeoObrst, LesMorgan, LianaKiff, MalcolmChisholm, MarcelaVegetti, MariaHerrero, 
	MarkFox, MartinDavtyan, MatthewWest, MichaelGruninger, MikeDean, MikeRiben, NaicongLi, NancyWiegand, 
	PeterYim, RamSriram, RexBrooks, RichardMartin, RosarioUcedaSosa, ShahanKhatchadourian, SiewLam, 
	SimonSpero, StefanoBorgo, SundayOjo, ToddSchneider, TorstenHahmann, UriShani, VictorAgroskin, 
	VitLibal    (4B32)
	 == Proceedings ==    (4B33)
	[4:14] anonymous morphed into MalcolmChisholm    (4B34)
	[4:14] MatthewWest: Hello world    (4B35)
	[4:16] MalcolmChisholm: This is my response    (4B36)
	[4:18] MatthewWest: @MalcolmChisholm Slide 10: what did you mean by that?    (4B37)
	[9:13] anonymous morphed into DanBrickley    (4B38)
	[9:15] anonymous morphed into RosarioUcedaSosa    (4B39)
	[9:17] anonymous morphed into MarkFox    (4B3A)
	[9:18] MarkFox: Reminds me of the IEEE multi-topic computer conferences of the 80s where there were 
	20 parallel sessions, and the audience was composed only of the speakers :)    (4B3B)
	[9:19] DanBrickley: hi folks.    (4B3C)
	[9:19] DanBrickley: I'm dialed into the phone bridge and the audio seems clear.    (4B3D)
	[9:22] MarkFox: I'm on skype listening to Muzak.    (4B3E)
	[9:23] anonymous morphed into LamarHenderson    (4B3F)
	[9:26] DanBrickley: I hear noises...    (4B3G)
	[9:27] ShahanKhatchadourian: hi all    (4B3H)
	[9:29] anonymous1 morphed into MartinDavtyan    (4B3I)
	[9:30] DanBrickley: I'm hearing choppy noises.    (4B3J)
	[9:30] EdBernot2 morphed into EdBernot    (4B3K)
	[9:30] EdBernot: Hello everybody    (4B3L)
	[9:31] DanBrickley: I didn't hear Peter's response to me very clearly.    (4B3M)
	[9:31] DanBrickley: maybe when others are muted all ok.    (4B3N)
	[9:32] anonymous morphed into LesMorgan    (4B3O)
	[9:33] DanBrickley: [various road/car/traffic style noises.]    (4B3P)
	[9:33] MartinDavtyan: Sorry, does the screen sharing work/is on?    (4B3Q)
	[9:35] MatthewWest morphed into MatthewWest    (4B3R)
	[9:39] PeterYim: == KenBaclawski starts session on behalf of the co-chairs ... see slides 
	under: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_27#nid4AOX    (4B3S)
	[9:41] anonymous morphed into MikeRiben    (4B3T)
	[9:44] PeterYim: == AnatolyLevenchuk making an announcement about the hackathon ...    (4B3U)
	[9:53] AnatolyLevenchuk: Hackathon update: 
	http://ontolog.cim3.net/file/work/OntologySummit2014/2014-03-27_OntologySummit2014_Tackling-the-Variety-Problem-in-Big-Data-2/OntologySummit2014_Hackathon-s11-announcement--AnatolyLevenchuk_20140327.pdf    (4B3V)
	[9:53] PeterYim: == MarkFox presenting ...    (4B3W)
	[9:56] ... anonymous morphed into UriShani    (4B3X)
	[9:56] ... PeterYim: @KenBaclawski - we have 41 people on the call, and only 32 in the chat-room, 
	you might want to remind everyone verbally, the next time you do a speaker transition    (4B3Y)
	[10:04] ... KenBaclawski: @PeterYim - Will do.    (4B3Z)
	[10:03] ... LeoObrst: Sorry, joining late.    (4B40)
	[10:08] ... anonymous1 morphed into KushagraThakur    (4B41)
	[10:04] MatthewWest: Slide 9: In Shell we found similar problems when we tried to bring data 
	together from different Group Companies - indicators and data in general developed independently 
	were not comparable. Since then I'd actually be surprised if independently developed data/indicators 
	were comparable.    (4B42)
	[10:16] MarkFox: Part of what the Global Cities Institute will be providing with the ISO standard is 
	a process for certifying that cities are conforming to the standard. It helps, but is not a complete solution.    (4B43)
	[10:12] MatthewWest: Slide 34: An ontology is not sufficient to ensure fidelity. Unfortunately, 
	there is little you can do to prevent people using slots in a data model in ways other than those 
	that are intended - without active management of the data creation process.    (4B44)
	[10:14] MichaelGruninger: @MatthewWest[10:12] The idea is to have enough axioms in the ontology to 
	verify that the entered data is consistent with other parts of the data model to ensure consistency    (4B45)
	[10:23] MatthewWest: @MichaelGruninger: I agree that you can check consistency at a logical level - 
	and that is very useful, but that only covers some of the things you can do wrong.    (4B46)
	[10:13] PeterYim: == MalcolmChisholm presenting ...    (4B47)
	[10:25] SimonSpero: @MalcolmChisholm, slide 4: Columnar databases are not always schemaless;
	e.g. Amazon Redshift    (4B48)
	[10:31] SimonSpero: ^^ also Virtuoso : 
	http://docs.openlinksw.com/virtuoso/coredbengine.html#colstore    (4B49)
	[10:33] MalcolmChisholm: @SimonSpero Thanks for the heads up    (4B4A)
	[10:32] ChristiKapp: @MalcolmChisholm Data Governance Institute was created in July 2004 - 
	http://www.datagovernance.com/ - domain name registered in 2003. We used to talk about it earlier than 
	that here in the Orlando area.    (4B4B)
	[10:34] MalcolmChisholm: @ChristiKapp - Yes I first learned about it from Gwen Thomas in Orlando in 
	2005 (she began http://www.datagovernance.com/)    (4B4C)
	[10:32] PeterYim: == DanBrickley presenting ...    (4B4D)
	[10:35] anonymous2 morphed into LamarHenderson    (4B4E)
	[10:39] anonymous1 morphed into SundayOjo    (4B4F)
	[10:34] PeterYim: ... on slide#1 now    (4B4G)
	[10:36] ... PeterYim: @ALL: what DanBrickley is calling slide#2 is labeled "3" on the slide deck (he 
	started on "0", the label starts on "1" unfortunately)    (4B4H)
	[10:36] ... AmandaVizedom: @danbri (DanBrickley) - the slides we have are numbered -1 from yours 
	 (start with 1 rather than 0; schema.org movie description is on slide 3).    (4B4I)
	[10:41] ... AmandaVizedom: yes    (4B4J)
	[10:52] ... PeterYim: I have just uploaded an updated version of DanBrickley's slides, that is 
	numbered starting from "0"    (4B4K)
	[10:47] MatthewWest: It seems to me that what Schema.org is offering is an answer to the identity 
	question - a common name for some thing. This is enormously valuable in practice, since until you 
	have this licked there is not much point to reasoning.    (4B4L)
	[10:51] PeterYim: ... on slide (labelled) #18 now    (4B4M)
	[10:51] KrzysztofJanowicz: great talk, very informative    (4B4N)
	[10:54] AmandaVizedom: @DanBrickley, thanks for that talk. I found some of your brief, semi-aside 
	comments quite interesting. E.g.: "hiding" variety, essentially making mirror concepts inside 
	schema.org and keeping provenance accessible, vs. importing an open-ended number of other vocab, as 
	method of reuse.    (4B4O)
	[10:55] AmandaVizedom: This latter intersects with an increasingly common point of concern for 
	users of w3c stack ontologies/vocabs, as all that importing can really challenge onto usability and 
	manageability.    (4B4P)
	[10:57] SimonSpero: @MatthewWest, @danbri (DanBrickley): It seems to sort of punt on the identity 
	question - sameAs ~= foaf:page    (4B4Q)
	[10:57] SimonSpero: (unhelpful smile foaf : page )    (4B4R)
	[11:10] MatthewWest: @SimonSpero: The key is that identity has to be managed, managed duplication 
	still works - it is technically a little less efficient than no duplication, but may be politically 
	much more efficient.    (4B4S)
	[10:52] PeterYim: == RosarioUcedaSosa presenting ...    (4B4T)
	[11:23] BartGajderowicz: @Rosario, my MSc work used machine learning on instance data to identify 
	similarities between ontologies associated with those instances (ontology mapping). I'm wondering 
	whether this is along the same lines of research as IBM's Helix project?    (4B4U)
	[11:23] RosarioUcedaSosa: It may. Send me the refs to rosariou@us.ibm.com. Thanks    (4B4V)
	[11:25] BartGajderowicz: Will do. Thanks    (4B4W)
	[11:13] PeterYim: == Q & A and Open Discussion ...    (4B4X)
	[11:13] PeterYim: ... Question from MartinDavtyan ...    (4B4Y)
	[11:14] ... PeterYim: (one person is still identified as "anonymous") will you please click 
	on "settings" (top center) and morph from "anonymous" to your RealName; also please enable "Show 
	timestamps" while there.    (4B4Z)
	[11:15] ... PeterYim: (I still see a few names who are "new" here) @Those who are not already 
	subscribed to the [ontology-summit] mailing list: please do so (to receive all notifications and 
	participate in the ongoing asynchronous discourse) - 
	http://ontolog.cim3.net/mailman/listinfo/ontology-summit (or drop me a line - peter.yim [at] 
	cim3.com)    (4B50)
	[11:20] anonymous1 morphed into ConradBeaulieu    (4B51)
	[11:20] DanBrickley: I can't go into details but it is common knowledge that Google does a lot of 
	stats, machine learning etc. http://research.google.com/pubs/papers.html has published papers in 
	general. The most interesting published crossover (machine learning <-> entities/semantics/Freebase) 
	work lately is https://code.google.com/p/word2vec/    (4B52)
	[11:24] SimonSpero: Filling in missing data (Imputation) is heavily used by e.g. the Census    (4B53)
	[11:24] DanBrickley: oh, this dan?    (4B54)
	[11:25] DanBrickley: it was just an aside...    (4B55)
	[11:25] SimonSpero: e.g. http://www.census.gov/srd/papers/pdf/rr99-02.pdf    (4B56)
	[11:27] SimonSpero: or: 
	http://www.census.gov/content/dam/Census/programs-surveys/ahs/working-papers/hotdeck.pdf    (4B57)
	[11:26] DanBrickley: http://www.w3.org/blog/news/archives/3758    (4B58)
	[11:28] DanBrickley: http://lists.w3.org/Archives/Public/public-vocabs/2014Mar/0111.html    (4B59)
	[11:29] SimonSpero: TriG?    (4B5A)
	[11:29] SimonSpero: (named graph / dataset)    (4B5B)
	[11:29] DanBrickley: http://datasets.schema-labs.appspot.com    (4B5C)
	[11:30] AmandaVizedom: As an aside to the particular message link Dan just posted, I'd like to 
	express my positive experience with the public vocabs list: 
	http://lists.w3.org/Archives/Public/public-vocabs/...    (4B5D)
	[11:28] PeterYim: @ALL: please try to capture the verbal discussions onto the chat (for archival 
	purposes) as the chat-transcript will, as always, be archived as part of the session proceedings    (4B5E)
	[11:30] MartinDavtyan: Summary of my question: Are there any data analysis practices that are making 
	use of ontological metadata? Can metadata be used for handling the issue of missing data when using 
	data from sources with different data structures? Are there statistical tools working not on the 
	integrated data (as in, already merged from different sources to form standardized data structure), 
	but on federated data?    (4B5F)
	[11:30] MartinDavtyan: Simon, thanks a lot for the link!    (4B5G)
	[11:30] DanBrickley: so a link from Datasets and schema.org to the city data theme: 
	https://data.sfgov.org/Transportation/Parking-meters/28my-4796    (4B5H)
	[11:31] DanBrickley: found via schema.org Dataset.. but the challenge is: how can search engines 
	know more than "this is a dataset" linked in 
	https://data.sfgov.org/browse?q=transportation&sortBy=relevance&tags=parking&utf8=%E2%9C%93    (4B5I)
	[11:32] DanBrickley: list is here: http://lists.w3.org/Archives/Public/public-vocabs/    (4B5J)
	[11:33] DanBrickley: (nice mix of search marketing and KR debates there :)    (4B5K)
	[11:33] AmandaVizedom: Very true, Dan!    (4B5L)
	[11:33] KenBaclawski: Community input page for Track D: 
	http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014_Tackling_Variety_In_BigData_CommunityInput    (4B5M)
	[11:32] PeterYim: @ALL: as announced by our Symposium co-chairs, Professor TimFinin and Dr. Ram 
	Sriram yesterday, our Apr 28~29 Symposium (at NSF in Greater Washington DC) is now open for 
	registration. Please register yourself ASAP, as capacity is limited - see: 
	http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014/WorkshopRegistration ... Note that new 
	information about the availability (until Apr-4) of hotel reservation block (with preferred rates) 
	has been posted!    (4B5N)
	[11:32] PeterYim: @ALL: Please mark your calendars and reserve this time, every Thursday, for the 
	OntologySummit2014 virtual panel session series. In particular ... Session-12 will be up next 
	Thursday - Thu 2014.04.03 - OntologySummit2014: "Synthesis-II: Technical Tracks & Hackathon" *** 
	Again, please pay special attention to the start-time (9:30am PDT), as by then both 
	N.America and Europe will be in Summer time, but there are still other regions that don't do 
	daylight saving time at all! *** - see developing details at: 
	http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_04_03 ... the start-time for various 
	time-zones will be clearly posted there    (4B5O)
	[11:33] PeterYim: Of course ... See you at the HACKATHON this Saturday (Mar-29) - see latest details 
	at: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014_Hackathon    (4B5P)
	[11:33] ShahanKhatchadourian: thank you all    (4B5Q)
	[11:33] PeterYim: Great session!    (4B5R)
	[11:33] EdBernot: Great session, thanks!    (4B5S)
	[11:34] AmandaVizedom: It's that practical meets principled collision / collaboration that I've 
	enjoyed so much. :-)    (4B5T)
	[11:35] PeterYim: -- session ended: 11:31am PDT --    (4B5U)
 -- end of in-session chat-transcript --    (4AV9)

Additional Resources:    (4APD)
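
As an illustration of the imputation idea raised in the chat (SimonSpero [11:24], in response to MartinDavtyan's question about missing data), below is a minimal sketch of a sequential hot-deck fill. It is a simplified teaching example, not the Census Bureau's actual procedure; the function name, the field names, and the parking-meter records are all hypothetical.

```python
def sequential_hot_deck(records, cell_key, field):
    """Fill missing `field` values with the most recently observed value
    from an earlier record in the same imputation cell (same `cell_key`).

    Returns new dicts; the input records are not modified.
    """
    last_seen = {}  # imputation cell -> latest observed (donor) value
    result = []
    for rec in records:
        filled = dict(rec)
        cell = rec[cell_key]
        value = rec.get(field)
        if value is None:
            # Borrow from a donor if one exists; otherwise leave it missing.
            if cell in last_seen:
                filled[field] = last_seen[cell]
                filled["imputed"] = True
            else:
                filled["imputed"] = False
        else:
            last_seen[cell] = value
            filled["imputed"] = False
        result.append(filled)
    return result


# Hypothetical city records: hourly parking-meter rates by district.
meters = [
    {"id": 1, "district": "north", "rate": 2.0},
    {"id": 2, "district": "south", "rate": 3.5},
    {"id": 3, "district": "north", "rate": None},
    {"id": 4, "district": "south", "rate": None},
    {"id": 5, "district": "east", "rate": None},
]
filled = sequential_hot_deck(meters, "district", "rate")
```

Record 5 stays missing because no donor exists in its cell, and the `imputed` flag keeps filled values distinguishable from observed ones, in line with the provenance concerns raised during the session.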


For the record ...    (4AUO)

How To Join (while the session is in progress)    (4AUP)

*** Please pay special attention to the start-time for this session, as this week is among the tricky ones, when North America is already in Summer time, Europe is still in Winter time, and lots of other regions don't even do daylight saving time at all! ***    (49PZ)

Conference Call Details    (49ON)

Attendees    (49PK)