
Re: [ontolog-forum] WebSchemas, Schema.org and W3C

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Wed, 06 Feb 2013 15:16:55 -0500
Message-id: <5112BA37.1080600@xxxxxxxxxxxxxx>
All,

A message that I forgot to pass on to this list.

Kingsley

On 1/22/13 11:45 AM, Bernard Vatant wrote:
Hi Dan

I can't believe that such a rich and thoughtful message did not get any answer (at least any public one) in ten days.
Thanks for writing it down anyway. I wanted to answer right away when you posted it, but until today I did not have the bandwidth to do so properly.

So here goes, answers below. For those who don't care to drill down into such a long discussion, note that the main point is a call to action that will certainly be duplicated on other channels, but I extract it here:

ACTION

Make a list of "globally adopted schemas" (vocabularies) and put the name, email, URI or other Web identifier of a responsible agent in front of each one: https://docs.google.com/spreadsheet/ccc?key=0AiYc9tLJbL4SdHByWkRYUkYxZU5qS1lQOE5FV0hiNlE#gid=0
Free to edit by anyone. If you are currently responsible for a vocabulary, put your name and contact email address.
Let's take a month to see what we can gather. A month from now I will mail everyone declared responsible to get their confirmation, lock the document, and add this information to the LOV vocabulary descriptions.

If you want to make sure what I mean by "responsible", read details below.

Best

Bernard

2013/1/13 Dan Brickley <danbri@xxxxxxxxxx>

...
As a member of the RDF community since 1997, I'm painfully aware of
some of our failings. It is (as has been expressed already in this
thread) important to avoid over-burdening schema.org with every hope
and aspiration that attaches to the RDF, '[sS]emantic [wW]eb', 'Linked
[open] Data' etc labels. Or put another way; schema.org has no
intention of being overburdened with such things.

Two particular failings of our community come to mind. One is that we
have an endearing and frustrating architecture of politeness based on
the use of namespaces that has led to a situation in which we have a
fragmented suite of independent vocabularies that are hard for new
parties to adopt.

I'm not sure that fragmentation and independence are the main obstacle to adoption. Or else the same can be said of any linked data and dataset. Vocabularies are a particular kind of linked data, but they are linked data. Linked data are also fragmented and managed by independent sources. Choosing a reference vocabulary should be no more and no less of an issue than choosing reference entities in an authority list, a thesaurus or any kind of linked data base.
Main issues for adoption of reference URIs are quality and sustainability of the resources and responsibility of the publisher.
Discovering vocabularies might be tough, although we have more and more tools for that (not to mention LOV again here), but assessing those three key parameters (quality, sustainability, responsibility) is a headache, mainly because many vocabulary publishers do not take them seriously, as attested by the glaring lack of documentation and metadata for many of them.
We now have serious people and organizations eager to enter the linked data game, and more and more often I meet the question: "can we trust X or Y to be still available in 5-10 years?"

The culture around RDF is that you only publish
schemas for the 'diffs', the missing vocabulary that wasn't covered by
a jumbled mix of existing terminology. So anyone doing document-like
markup would be frowned at - "Did you consider using Dublin Core?";
anyone publishing an RDF vocabulary describing people "Why didn't you
use FOAF?", and so on. And the very architecture that supported this -
namespaces - allowed us to continue to design these parallel
descriptive systems without being forced to sit down together and work
out how they can be combined to solve real world problems.

Indeed. But previous architectures provided parallel vocabularies in parallel formats that were not interoperable at all, so we have made real progress. We had, and still have around, people convinced that the linked data technical infrastructure would work without social agreement. But "Publish and let the Web do the REST" just does not work. Not only for vocabularies, but again for linked data at large. It seems to me that we now have more and more people convinced that the technical interoperability ensured by the common linked data infrastructure is not enough if there is no social coordination. So let's sit down together, indeed.
See e.g. the theme of the next DC conference: http://dcevents.dublincore.org/index.php/IntConf/dc-2013

But I also agree with others that this forum is not necessarily the one to solve all problems, and certainly not by bringing every other vocabulary under the schema.org umbrella. Opening several focused tables of conversation is certainly more profitable.
 
 A couple of years ago, I did sit down and look at the words we'd
chosen in various deployed and popular-ish RDF vocabularies; I called
it "Zoo"; https://github.com/danbri/Zoo/blob/master/zoo.foaf.tv/index.html
https://github.com/danbri/Zoo/blob/master/zoo.foaf.tv/zoo/raw_manifest.txt
... this showed that 'Collection' was used in bibo:, swan:, 'Work' in
skos:; cc: vcard:; 'description' in dcterms: doap: gr: ical: sioc:,
'category' in 'doap: gr: po: vcard:', 'subject' in dcterms: po: rdf:
sioc:, title in 'dcterms: foaf: sioc: vcard:' and so on.
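
The kind of overlap listed above can be pictured as an inverted index from local term names to the namespace prefixes that define them. What follows is only a minimal sketch in Python, not the Zoo's actual code: `VOCAB_TERMS` is a small hand-picked subset of the overlaps quoted above, and the function name `shared_terms` is hypothetical.

```python
from collections import defaultdict

# Sample data only: a hand-picked subset of the overlaps quoted above,
# keyed by namespace prefix. Not a complete description of any vocabulary.
VOCAB_TERMS = {
    "dcterms": {"description", "subject", "title"},
    "doap": {"description", "category"},
    "foaf": {"title"},
    "sioc": {"description", "subject", "title"},
    "vcard": {"category", "title"},
}


def shared_terms(vocab_terms, min_vocabs=2):
    """Map each local name to the sorted prefixes that define it,
    keeping only names found in at least `min_vocabs` vocabularies."""
    index = defaultdict(set)
    for prefix, terms in vocab_terms.items():
        for term in terms:
            index[term].add(prefix)
    return {
        term: sorted(prefixes)
        for term, prefixes in sorted(index.items())
        if len(prefixes) >= min_vocabs
    }


if __name__ == "__main__":
    # Print one line per shared local name, followed by its prefixes.
    for term, prefixes in shared_terms(VOCAB_TERMS).items():
        print(f"{term}: {' '.join(p + ':' for p in prefixes)}")
```

Matching on local names alone says nothing about whether the terms mean the same thing; it only shows where two vocabularies chose the same word, which is where a conversation between their editors could start.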

You could easily enrich this zoo now with the LOV search
lov.okfn.org/dataset/lov/search/#s=description
lov.okfn.org/dataset/lov/search/#s=title
...
 
Part of my hope for this forum is that  -yes, heavily nudged by the creation of
schema.org - RDF vocabulary managers and editors could finally take
the time to stay in touch.

Indeed!
 
That parties working on vocabularies
designed to be deployed alongside each other, could do the world a
favour and talk to each other a bit more.

YES !
 
It is good that we have the
namespaces technical mechanism; but it has for too long allowed us to
sidestep the need to talk about how different vocabularies fit
together as more than mere triples.

Having pursued the same objective inside the LOV project for about three years now, I would say that the main obstacle we've met is the pervasive lack of responsibility of vocabulary owners/authors/creators/publishers/curators. We have gathered more than 300 vocabularies, but for many of them it is not possible to identify who is the current responsible entity (person or organisation), under any definition of the word at http://en.wiktionary.org/wiki/responsible. In a nutshell, people don't do things seriously, and/or they don't answer when called. I don't say it's a general rule, but for potential adopters it's very difficult to tell whether there is someone responsible behind a given vocabulary, in particular in those frequent cases where the project is closed, the original editor has moved on, or does not answer mails, etc.

It seems to me that a simple basic action should be taken to start with, either here or in any relevant forum, which would be in a nutshell: responsible people, step forward. Who wants to play nicely in this game, how do you make it public, and how would others know about it? We can define a simple markup on vocabularies, similar in spirit to Creative Commons, showing the level of engagement or responsibility involved in publishing the vocabulary. Lists of vocabularies along with their current curators, who endorse a certain number of social rules, such as taking part in processes where their vocabularies are put on the table with other relevant ones, could easily be published and updated on a regular basis. We have already exchanged with Tom Baker on this; DCMI has thought seriously about those issues for a while, as you (Dan) are well aware. 2013 should be a year of serious action on this.

The point is that in this community too many people have come to know each other too well, so that they don't see why those implicit connections and involvements should be made explicit anywhere. But for people from outside, all this is currently totally opaque.
 
So WebSchemas was designed to be something a bit more than 'the
schema.org mailing list at W3C', and I still believe that. We (the
larger 'we') need a forum in which all schemas intended for
planet-wide use are equally 'on topic'. The existence of schema.org
should not have a chilling effect on the design, use and deployment of
other RDF vocabularies. Even if the schema.org partner companies are
not in a position right now to collectively promise to
support/understand/use/endorse non-schema.org vocabulary, it is still
healthy to have multiple efforts, initiatives and perspectives. (The
move towards RDFa Lite is a very positive thing here, btw.)

Very glad to read that. Diversity is good, but my suggestions above might help to clarify who 'we' are to begin with.
 
The second failing of the community around RDF is that we have - as
the years have drifted by - acquired a reputation for enjoying talk
over action, and this isn't entirely undeserved.

But basically unfair. We've talked a lot, but achieved a lot also.
There is an amazing number of people around who are able to talk and code at the same time :)
 
Yesterday I was
re-reading some old mail threads with the late and lamented Aaron
Swartz - http://lists.foaf-project.org/pipermail/foaf-dev/2000-August/004215.html
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Jul/0034.html
- that frustration was already present in 2000. In the charter for
this WebSchemas group i.e.
http://www.w3.org/2001/sw/interest/webschema.html we list some semweb
permathread themes explicitly as out-of-scope.

"Out of scope topics include:

* Advocacy of data models or syntaxes without attention to real-world use cases
* The use of inference
* debate over foundational ontologies"

This does not mean that inference and foundational ontologies are
uninteresting or unimportant, just that every successful forum needs
to have some core scope, and that we have plenty of other places
around W3C to debate those topics. What makes the WebSchemas group
special? Just that here, finally, we have somewhere where parties
responsible for globally adopted RDF schemas can do the responsible
thing and stay more carefully in touch with each other.

You wrote the word: responsible. Now let's make a list of "globally adopted schemas" and put the name, email, URI or other Web identifier of the responsible agent in front of each one. Simple action, I've started here:
https://docs.google.com/spreadsheet/ccc?key=0AiYc9tLJbL4SdHByWkRYUkYxZU5qS1lQOE5FV0hiNlE#gid=0
Free to edit by anyone. If you are currently responsible for a vocabulary, put your name and contact email address.
Let's take a month to see what we can gather. A month from now I will mail everyone declared responsible to get their confirmation, lock the document, and add this information to the LOV vocabulary descriptions.
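
To make the intended workflow concrete, here is a rough sketch in Python of what one row of such a list could hold, and how the two follow-up steps (spotting vocabularies nobody has claimed, and mailing declared agents for confirmation) could be derived from it. This is an assumption-laden illustration, not the layout of the spreadsheet or of LOV: the class `VocabularyEntry`, its field names, the helper functions and the `example.org` sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VocabularyEntry:
    """One row of a hypothetical 'who is responsible' list."""
    prefix: str                        # short name of the vocabulary
    uri: str                           # namespace URI of the vocabulary
    responsible: Optional[str] = None  # name, email or URI; None if nobody declared
    confirmed: bool = False            # True once the declared agent has confirmed


def unclaimed(entries: List[VocabularyEntry]) -> List[str]:
    """Prefixes of vocabularies with no declared responsible agent."""
    return [e.prefix for e in entries if not e.responsible]


def awaiting_confirmation(entries: List[VocabularyEntry]) -> List[str]:
    """Prefixes whose declared agent still has to be mailed for confirmation."""
    return [e.prefix for e in entries if e.responsible and not e.confirmed]


# Placeholder rows only; none of these describe a real vocabulary or person.
SAMPLE = [
    VocabularyEntry("ex1", "http://example.org/ns/one#",
                    responsible="mailto:curator@example.org", confirmed=True),
    VocabularyEntry("ex2", "http://example.org/ns/two#",
                    responsible="mailto:editor@example.org"),
    VocabularyEntry("ex3", "http://example.org/ns/three#"),
]

if __name__ == "__main__":
    print("No responsible agent declared:", unclaimed(SAMPLE))
    print("Declared, awaiting confirmation:", awaiting_confirmation(SAMPLE))
```

Keeping "declared" and "confirmed" as separate fields mirrors the two-step process described above: anyone can edit the list, but an entry only counts once the named agent has answered.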

As Martin points out in a mail that arrived while typing this, ... one
list is not going to be enough for everything. And in terms of work
style for getting (sub-)schemas created and integrated, one size
doesn't fit all. What we've found with schema.org is that different
collaboration styles make sense for different domains. I suggested a
W3C Community Group to Richard Wallis and I'm pleased to see that it
has independent existence and activity. A few months ago I helped set
up a 'sports schemas' group (just a Google Group mailing list), but
that initiative is yet to thrive. We have a very active and largely
independent community around the LRMI vocabulary managed quite
separately, but linked to this one by mail, wiki and occasional audio
catchups. There is of course Good Relations, which also enjoys
independent existence.

And there is an ongoing effort to move the Time Ontology forward beyond its current "draft status".
 
In general I think W3C community groups are a fine mechanism for more
focussed and intense vocabulary collaboration, and this forum serves
more for integration issues and high level overview on how all the
pieces of the jigsaw fit together. It could be great, for example, to
see a community group around modeling fiction (and Comics?), but we
also need a place where all such efforts can report back to the wider
community. The creation of schema.org has made all this more urgent
and timely, but it is something we've needed for a while. In the
Dublin Core world we talk about this as 'application profiles';
templates and examples explaining how independently designed pieces of
vocabulary can be mixed together to address real world descriptive
needs. It should happen at W3C, schema.org should engage with it, but
the need is broader. I think WebSchemas is the right place for it.

I should also mention that there are a few areas now where groups
elsewhere around W3C have come up with vocabulary (e.g. Organization +
Registered Organization vocabs; DCAT/ADMS; Geo and post addresses)
that will likely inform improvements to schema.org. There is a need
for somewhere public to work out details around stability/versions,
appropriate acknowledgement, etc.

Exactly. What I call "sustainable vocabulary management".
I would like to mention that in France, in the framework of the Datalift project (datalift.org), our partners include the national institutions INSEE (statistics) and IGN (geography), which are working together to publish linked data and to harmonize their vocabularies and data with each other and with the general vocabulary ecosystem. Those are "serious", "normal" data publishers playing the game nicely.
 
The fundamental problem of schema design is that the world is not
tidily partitioned; that all use cases interact and overlap -
'Intertwingularity'.  We can make focussed sub-fora for figuring out
how to describe sports, or fiction, or journals and books, but the
combinations and scope overlaps can be overwhelming. While good design
can help, perhaps even more important is communication.

Again, triple YES !
 
And for that we need somewhere to talk. I don't think it ultimately
matters hugely whether there is a schema.org-specific mailing list at
W3C alongside a more general 'all vocabularies' one, versus a single
list as we have now. My preference is for a unified forum, and we will
likely spin off various schema.org-specific lists for specific
detailed schema.org topics. But given schema.org's cross-domain
nature, it seems important for the project to remain highly visible in
a cross-domain, multi-schema forum.

Dan

> //Ed
>
> [1] http://www.w3.org/2001/sw/interest/webschema.html
>




--
Bernard Vatant
Vocabularies & Data Engineering
Tel :  + 33 (0)9 71 48 84 59
Skype : bernard.vatant
Blog : the wheel and the hub

--------------------------------------------------------
Mondeca                             
3 cité Nollez 75018 Paris, France
Follow us on Twitter : @mondecanews


-- 

Regards,

Kingsley Idehen	      
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen






_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
