------ Chat transcript from room: summit_20140327 2014-03-27 GMT-08:00 [PDT] ------

[9:18] PeterYim: Welcome to the

= OntologySummit2014 session-11 Track-D: Tackling the Variety Problem in Big Data - II - Thu 2014-03-27 =

Summit Theme: OntologySummit2014: "Big Data and Semantic Web Meet Applied Ontology"

Session Topic: Track D: "Tackling the Variety Problem in Big Data - II"

Session Co-chairs: Professor KenBaclawski (Northeastern University), Professor AnneThessen (Arizona State University)

Panelists / Briefings:

* Professor MarkFox (University of Toronto) - "Variety in Big Data: A Cities Perspective"
* Dr. MalcolmChisholm (AskGet.com) - "Data Governance to Manage Variety in Big Data"
* Mr. DanBrickley (Google) - "Schema.org, FOAF and Linked Data: Lessons for Web-scale vocabulary deployment"
* Dr. RosarioUcedaSosa (IBM) - "Open Data, Big Data and Smart Cities"

Logistics:

* Refer to details on session page at: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_27
* (if you haven't already done so) please click on "settings" (top center) and morph from "anonymous" to your RealName; also please enable "Show timestamps" while there.
* Mute control (phone keypad): *7 to un-mute ... *6 to mute
* Attn: Skype users ... see: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_27#nid49OW
** you may connect to (the skypeID) "joinconference" whether or not it indicates that it is online (i.e. even if it says it is "offline," you should still be able to connect to it.)
** if you are using skype and the connection to "joinconference" is not holding up, try using (your favorite POTS or VoIP line, etc.) either your phone, skype-out or google-voice and call the US dial-in number: +1 (206) 402-0100 ... when prompted enter Conference ID: 141184#
** Can't find Skype Dial pad?
*** for Windows Skype users: Can't find Skype Dial pad? ...
it's under the "Call" dropdown menu as "Show Dial pad"
*** for Linux Skype users: if the dialpad button is not shown in the call window you need to press the "d" hotkey to enable it
* when posting in this Chat-room, kindly observe the following ...
** whenever a name is used, please use the full WikiWord name format (every time you don't, some volunteer will have to make an edit afterwards)
** always provide context (like: "[ref. JaneDoe's slide#12], I think the point about context is great" ... rather than "that's great!" as the latter would mean very little in the archives.)
** when responding to a specific individual's earlier remarks, please cite his/her full WikiWord names *and* the timestamp (in PST) of his/her post that you are responding to (e.g. "@JaneDoe [11:09] - I agree, but, ...")
** use fully qualified url's (include http:// ) without symbols (like punctuation or parentheses, etc.) right before or after that URL

Attendees: AleksandraSojic, AlexShkotin, AliHashemi, AmandaVizedom, AnatolyLevenchuk, BartGajderowicz, CarmenChui, ChristiKapp, ChristopherSpottiswoode, ConradBeaulieu, DanBrickley, EdBernot, HaroldBoley, JamesOverton, KenBaclawski, KrzysztofJanowicz, KushagraThakur, LamarHenderson, LeoObrst, LesMorgan, LianaKiff, MalcolmChisholm, MarcelaVegetti, MariaHerrero, MarkFox, MartinDavtyan, MatthewWest, MichaelGruninger, MikeDean, MikeRiben, NaicongLi, NancyWiegand, PeterYim, RamSriram, RexBrooks, RichardMartin, RosarioUcedaSosa, ShahanKhatchadourian, SiewLam, SimonSpero, StefanoBorgo, SundayOjo, ToddSchneider, TorstenHahmann, UriShani, VictorAgroskin, VitLibal

== Proceedings ==

[4:14] anonymous morphed into MalcolmChisholm
[4:14] MatthewWest: Hello world
[4:16] MalcolmChisholm: This is my response
[4:18] MatthewWest: @MalcolmChisholm Slide 10: what did you mean by that?
[9:13] anonymous morphed into DanBrickley
[9:15] anonymous morphed into RosarioUcedaSosa
[9:17] anonymous morphed into MarkFox
[9:18] MarkFox: Reminds me of the IEEE multi-topic computer conferences of the 80s where there were 20 parallel sessions, and the audience was composed only of the speakers :)
[9:19] DanBrickley: hi folks.
[9:19] DanBrickley: I'm dialed into the phone bridge and the audio seems clear.
[9:22] MarkFox: I'm on skype listening to Muzak.
[9:23] anonymous morphed into LamarHenderson
[9:26] DanBrickley: I hear noises...
[9:27] ShahanKhatchadourian: hi all
[9:29] anonymous1 morphed into MartinDavtyan
[9:30] DanBrickley: I'm hearing choppy noises.
[9:30] EdBernot2 morphed into EdBernot
[9:30] EdBernot: Hello everybody
[9:31] DanBrickley: I didn't hear Peter's response to me very clearly.
[9:31] DanBrickley: maybe when others are muted all ok.
[9:32] anonymous morphed into LesMorgan
[9:33] DanBrickley: [various road/car/traffic style noises.]
[9:33] MartinDavtyan: Sorry, does the screen sharing work/is on?
[9:35] MatthewWest morphed into MatthewWest
[9:39] PeterYim: == KenBaclawski starts session on behalf of the co-chairs ... see slides under: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_03_27#nid4AOX
[9:41] anonymous morphed into MikeRiben
[9:44] PeterYim: == AnatolyLevenchuk making an announcement about the hackathon ...
[9:53] AnatolyLevenchuk: Hackathon update: http://ontolog.cim3.net/file/work/OntologySummit2014/2014-03-27_OntologySummit2014_Tackling-the-Variety-Problem-in-Big-Data-2/OntologySummit2014_Hackathon-s11-announcement--AnatolyLevenchuk_20140327.pdf
[9:53] PeterYim: == MarkFox presenting ...
[9:56] ... anonymous morphed into UriShani
[9:56] ... PeterYim: @KenBaclawski - we have 41 people on the call, and only 32 in the chat-room, you might want to remind everyone verbally, the next time you do a speaker transition
[10:04] ... KenBaclawski: @PeterYim - Will do.
[10:03] ... LeoObrst: Sorry, joining late.
[10:08] ...
anonymous1 morphed into KushagraThakur
[10:04] MatthewWest: Slide 9: In Shell we found similar problems when we tried to bring data together from different Group Companies - indicators and data in general developed independently were not comparable. Since then I'd actually be surprised if independently developed data/indicators were comparable.
[10:16] MarkFox: Part of what the Global Cities Institute will be providing with the ISO standard is a process for certifying that cities are conforming to the standard. It helps, but is not a complete solution.
[10:12] MatthewWest: Slide 34: An ontology is not sufficient to ensure fidelity. Unfortunately, there is little you can do to prevent people using slots in a data model in ways other than those that are intended - without active management of the data creation process.
[10:14] MichaelGruninger: @MatthewWest[10:12] The idea is to have enough axioms in the ontology to verify that the entered data is consistent with other parts of the data model to ensure consistency
[10:23] MatthewWest: @MichaelGruninger: I agree that you can check consistency at a logical level - and that is very useful, but that only covers some of the things you can do wrong.
[10:13] PeterYim: == MalcolmChisholm presenting ...
[10:25] SimonSpero: @MalcolmChisholm, slide 4 : Columnar databases are not always schemaless ; e.g. Amazon Redshift
[10:31] SimonSpero: ^^ also Virtuoso : http://docs.openlinksw.com/virtuoso/coredbengine.html#colstore
[10:33] MalcolmChisholm: @SimonSpero Thanks for the heads up
[10:32] ChristiKapp: @MalcolmChisholm Data Governance Institute was created in July 2004 - http://www.datagovernance.com/ - domain name registered in 2003. We used to talk about it earlier than that here in Orlando area.
[10:34] MalcolmChisholm: @ChristiKapp - Yes I first learned about it from Gwen Thomas in Orlando in 2005 (she began http://www.datagovernance.com/)
[10:32] PeterYim: == DanBrickley presenting ...
[10:35] anonymous2 morphed into LamarHenderson
[10:39] anonymous1 morphed into SundayOjo
[10:34] PeterYim: ... on slide#1 now
[10:36] ... PeterYim: @ALL: what DanBrickley is calling slide#2 is labeled "3" on the slide deck (he started on "0", the label starts on "1" unfortunately)
[10:36] ... AmandaVizedom: @danbri (DanBrickley) - the slides we have are numbered -1 from yours (start with 1 rather than 0; schema.org movie description is on slide 3).
[10:41] ... AmandaVizedom: yes
[10:52] ... PeterYim: I have just uploaded an updated version of DanBrickley's slides, that is numbered starting from "0"
[10:47] MatthewWest: It seems to me that what Schema.org is offering is an answer to the identity question - a common name for some thing. This is enormously valuable in practice, since until you have this licked there is not much point to reasoning.
[10:51] PeterYim: ... on slide (labelled) #18 now
[10:51] KrzysztofJanowicz: great talk, very informative
[10:54] AmandaVizedom: @DanBrickley, thanks for that talk. I found some of your brief, semi-aside comments quite interesting. E.g.: "hiding" variety, essentially making mirror concepts inside schema.org and keeping provenance accessible, vs. importing an open-ended number of other vocab, as method of reuse.
[10:55] AmandaVizedom: This latter intersects with an increasingly common point of concern for users of w3c stack ontologies/vocabs, as all that importing can really challenge onto usability and manageability.
[10:57] SimonSpero: @MatthewWest, @danbri (DanBrickley): It seems to sort of punt on the identity question - sameAs ~= foaf:page
[10:57] SimonSpero: (unhelpful smile foaf : page )
[11:10] MatthewWest: @SimonSpero: The key is that identity has to be managed, managed duplication still works - it is technically a little less efficient than no duplication, but may be politically much more efficient.
[10:52] PeterYim: == RosarioUcedaSosa presenting ...
[11:23] BartGajderowicz: @Rosario, my MSc work used machine learning on instance data to identify similarities between ontologies associated with those instances (ontology mapping). I'm wondering whether this is along the same lines of research on IBM's Helix project?
[11:23] RosarioUcedaSosa: It may. Send me the refs to rosariou@us.ibm.com. Thanks
[11:25] BartGajderowicz: Will do. Thanks
[11:13] PeterYim: == Q & A and Open Discussion ...
[11:13] PeterYim: ... Question from MartinDavtyan ...
[11:14] ... PeterYim: (one person is still identified as "anonymous") will you please click on "settings" (top center) and morph from "anonymous" to your RealName; also please enable "Show timestamps" while there.
[11:15] ... PeterYim: (I still see a few names who are "new" here) @Those who are not already subscribed to the [ontology-summit] mailing list: please do so (to receive all notifications and participate in the ongoing asynchronous discourse) - http://ontolog.cim3.net/mailman/listinfo/ontology-summit (or drop me a line - peter.yim [at] cim3.com)
[11:20] anonymous1 morphed into ConradBeaulieu
[11:20] DanBrickley: I can't go into details but it is common knowledge that Google does a lot of stats, machine learning etc. http://research.google.com/pubs/papers.html has published papers in general. The most interesting published crossover (machine learning <-> entities/semantics/Freebase) work lately is https://code.google.com/p/word2vec/
[11:24] SimonSpero: Filling in missing data (Imputation) is heavily used by e.g. the Census
[11:24] DanBrickley: oh, this dan?
[11:25] DanBrickley: it was just an aside...
[11:25] SimonSpero: e.g. http://www.census.gov/srd/papers/pdf/rr99-02.pdf
[11:27] SimonSpero: or: http://www.census.gov/content/dam/Census/programs-surveys/ahs/working-papers/hotdeck.pdf
[11:26] DanBrickley: http://www.w3.org/blog/news/archives/3758
[11:28] DanBrickley: http://lists.w3.org/Archives/Public/public-vocabs/2014Mar/0111.html
[11:29] SimonSpero: TriG?
[11:29] SimonSpero: (named graph / dataset)
[11:29] DanBrickley: http://datasets.schema-labs.appspot.com
[11:30] AmandaVizedom: As an aside to the particular message link Dan just posted, I'd like to express my positive experience with the public vocabs list : http://lists.w3.org/Archives/Public/public-vocabs/...
[11:28] PeterYim: @ALL: please try to capture the verbal discussions onto the chat (for archival purposes) as the chat-transcript will, as always, be archived as part of the session proceedings
[11:30] MartinDavtyan: Summary of my question: Are there any data analysis practices that are making use of ontological metadata? Can metadata be used for handling the issue of missing data when using data from sources with different data structure? Are there statistical tools working not on the integrated data (as in, already merged from different sources to form standardized data structure), but on federated data?
[11:30] MartinDavtyan: Simon, thanks a lot for the link!
[11:30] DanBrickley: so a link from Datasets and schema.org to the city data theme: https://data.sfgov.org/Transportation/Parking-meters/28my-4796
[11:31] DanBrickley: found via schema.org Dataset.. but the challenge is: how can search engines know more than "this is a dataset" linked in https://data.sfgov.org/browse?q=transportation&sortBy=relevance&tags=parking&utf8=%E2%9C%93
[11:32] DanBrickley: list is here: http://lists.w3.org/Archives/Public/public-vocabs/
[11:33] DanBrickley: (nice mix of search marketing and KR debates there :)
[11:33] AmandaVizedom: Very true, Dan!
[11:33] KenBaclawski: Community input page for Track D: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014_Tackling_Variety_In_BigData_CommunityInput
[11:32] PeterYim: @ALL: as announced by our Symposium co-chairs, Professor TimFinin and Dr. Ram Sriram yesterday, our Apr 28~29 Symposium (at NSF in Greater Washington DC) is now open for registration.
Please register yourself ASAP, as capacity is limited - see: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014/WorkshopRegistration ... Note that new information about the availability (until Apr-4) of hotel reservation block (with preferred rates) has been posted!
[11:32] PeterYim: @ALL: Please mark your calendars and reserve this time, every Thursday, for the OntologySummit2014 virtual panel session series. In particular ... Session-12 will be up next Thursday - Thu 2014.04.03 - OntologySummit2014: "Synthesis-II: Technical Tracks & Hackathon" *** Again, please pay special attention to the start-time (9:30am PDT), as this week both N.America and Europe will be in Summer time, but there are still other regions that don't do daylight saving time at all! *** - see developing details at: http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2014_04_03 ... the start-time for various time-zones will be clearly posted there
[11:33] PeterYim: Of course ... See you at the HACKATHON this Saturday (Mar-29) - see latest details at: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014_Hackathon
[11:33] ShahanKhatchadourian: thank you all
[11:33] PeterYim: Great session!
[11:33] EdBernot: Great session, thanks!
[11:34] AmandaVizedom: It's that practical meets principled collision / collaboration that I've enjoyed so much. :-)
[11:35] PeterYim: -- session ended: 11:31am PDT --
------