OntologySummit2014_Hackathon - Project: (48GO)
Reference data for Anime and Manga: Semantic Linking and Publishing of Diverse Data-Sets (4BO0)
Project roster page: OntologySummit2014_Hackathon_ReferenceDataForAnimeAndManga (this page). (48F7)
Team lead: VictorAgroskin (MSK, UTC+4), vic5784 at gmail.com (48F8)
Final report slides: http://ontolog.cim3.net/file/work/OntologySummit2014/2014-04-28_29_OntologySummit2014_Symposium/Hackathon-RD-Anime_OntologySummit2014_Symposium-Report--VictorAgroskin_20140429.pdf (4DB5)
Short report, also provided to Summit community on 2014-04-01 by discussion list (4BO1)
The Reference data for Anime and Manga Hackathon took place on the 29th of March 2014, 12:00-20:00 MSK (virtual and real session), with subsequent activities during the next two days. (4BO2)
The real session gathered 3 participants in Moscow; 6 more people from Russia, Belarus, Switzerland and Germany connected online. The working language of the hackathon was Russian, with occasional use of English when international participants connected to the session. (4BO3)
Implementation work was carried out in the environment of .15926 Editor (specially tailored version downloadable from http://techinvestlab.ru/files/15alpha/dot15926Editor15alpha.rar ). (4BO4)
Some initial data sets, developed reference data and patterns, adapter code and project results are published in the open project repository http://github.com/ailev/anird (4BO5)
During the event the following tasks were performed: (4BO6)
1. Native database analysis. Two online data sources were identified for the work: http://www.animenewsnetwork.com/ and http://anidb.net . Various database access methods were identified and tested, resulting in XML database exports for local access and API specifications for remote access. (4BO7)
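The local-access route from task 1 (XML exports) can be illustrated with a minimal parsing sketch. It is not the hackathon adapter code: the sample XML is hand-written, and its element and attribute names (`anime`, `info`, `staff`, `task`, `person`) are assumptions about the export layout that should be checked against a real export before use.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element and attribute names are assumptions
# about the export layout, not a copy of a real response.
SAMPLE_XML = """<ann>
  <anime id="1" type="TV" name="Example Title">
    <info type="Main title" lang="EN">Example Title</info>
    <info type="Alternative title" lang="JA">Ekuzanpuru</info>
    <staff>
      <task>Director</task>
      <person id="10">A. Person</person>
    </staff>
  </anime>
</ann>"""


def parse_export(xml_text):
    """Return one dict per <anime> element with its titles and staff."""
    root = ET.fromstring(xml_text)
    records = []
    for anime in root.iter("anime"):
        # Keep only <info> entries whose type names a title.
        titles = [
            (info.get("type"), info.text)
            for info in anime.findall("info")
            if info.get("type", "").endswith("title")
        ]
        staff = []
        for entry in anime.findall("staff"):
            person = entry.find("person")
            if person is None:
                continue  # skip incomplete staff entries
            staff.append((entry.findtext("task"), person.get("id"), person.text))
        records.append({"id": anime.get("id"), "titles": titles, "staff": staff})
    return records


records = parse_export(SAMPLE_XML)
```

The two lists per record correspond to the two aspects addressed in the modeling: alternative titles and staff involvement.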
2. Ontology analysis and modeling. As planned, we focused on the 4D ontology analysis prescribed by the ISO 15926 modeling methodology. (4BO8)
Just two aspects of the vast anime and manga domain were addressed: alternative identifications of creative products, and work division between staff members of product teams. (4BO9)
For alternative identifications, the Classified Identification model and the corresponding ISO 15926 template were selected. Language identification in the models was omitted for lack of time. (4BOA)
For work division, a complex Activity model was developed, describing the involvement of the creative product itself and the participation of individuals in classified staff activities. (4BOB)
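The two models from task 2 can be pictured as plain records. This is only an illustrative sketch: the class names, field names and `urn:example:` identifiers are hypothetical and do not reproduce the role names of the ISO 15926 templates or the patterns in the project repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedIdentification:
    identified: str  # URI of the creative product being identified
    identifier: str  # the name or number itself, kept as a string
    id_type: str     # class of identification (by name, by number, per database)


@dataclass(frozen=True)
class StaffActivity:
    activity: str        # URI of the activity individual
    activity_class: str  # classifier of the activity, e.g. a staff role
    product: str         # creative product involved in the activity
    participant: str     # individual participating in the activity


def identifiers_of(product, identifications):
    """All alternative identifiers recorded for one product."""
    return sorted(i.identifier for i in identifications if i.identified == product)


def participants_in(product, activity_class, activities):
    """Individuals participating in activities of a given class for a product."""
    return sorted(
        a.participant
        for a in activities
        if a.product == product and a.activity_class == activity_class
    )


idents = [
    ClassifiedIdentification("urn:example:product-1", "Example Title", "urn:example:NameInSourceA"),
    ClassifiedIdentification("urn:example:product-1", "4242", "urn:example:NumberInSourceB"),
]
acts = [
    StaffActivity("urn:example:activity-1", "urn:example:Directing",
                  "urn:example:product-1", "urn:example:person-1"),
]
```

Keeping the identification type as a separate classifier is what lets the same product carry a name from one database and a number from another without the two being confused.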
3. Identification schema development. A namespace for semantic data was selected, and a UUID-based, non-human-readable URI schema was adopted to avoid the language identification and text transformations required for human-readable URIs. (4BOC)
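A UUID-based URI schema of the kind described in task 3 could look like the sketch below. The namespace `http://example.org/anime-rdl#` and the `id` prefix are assumptions made for illustration; the namespace actually chosen during the hackathon is not stated here. The letter prefix is added only because XML local names cannot start with a digit.

```python
import uuid

# Hypothetical namespace; the one selected at the hackathon is not given in the report.
NAMESPACE = "http://example.org/anime-rdl#"


def new_uri():
    """Mint an opaque URI: no title text, so no language or transliteration issues."""
    return NAMESPACE + "id" + str(uuid.uuid4())


first = new_uri()
second = new_uri()
```

Because the local part carries no meaning, titles in any language or script stay in the data as identifiers attached to the URI instead of being encoded into it.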
4. Reference data development. An initial set of classes was developed for the ontology model: classes to classify creative work, team members, staff activities and basic types of identification (identification by name and by number in the context of corresponding databases). Additional classes for identification types and for activity classifiers were discovered in the data sources and created during the data import. (4BOD)
A set of initial templates from ISO 15926-8 was selected for modeling as the most generic template set available. (4BOE)
We were unable to reuse any other existing ontology for our reference data. Some ontologies for creative work were reviewed, but they were either not compliant with our 4D requirements or too detailed in areas we had no plans to address. (4BOF)
5. Ontology pattern development. Native database data models were described by ontology patterns using the IIP methodology applied to ISO 15926 industrial data (as implemented in .15926 Editor). (4BOG)
Each database model was described by two patterns (for title import and for staff activity import). Pattern construction also included basic patterns for data visualization and some low-level patterns used as building blocks for import patterns. (4BOH)
6. Database adapter coding. Pattern-based adapter code for the .15926 Platform was developed to query, parse and transform the selected databases, using both the XML dumps available to us and remote APIs. (4BOI)
7. Actual data processing. Selected data from both native databases were processed and transformed to RDF following the single ontology model. (4BOJ)
At this point the day-long hackathon activity was interrupted. Several problems were identified in the mapping engine of the .15926 Editor, which led to code and pattern changes during the next two days. In the end the RDF data sources were created and verified to be compliant with the ontology model as initially developed. (4BOK)
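The transformation to RDF in task 7 can be sketched with a tiny N-Triples writer. This is a simplified illustration, not the .15926 mapping engine: the namespace and the property names (`hasIdentified`, `hasIdentifier`, `hasIdentificationType`) are hypothetical, and representing a template instance as one node with a role property per argument is an assumption about the general shape of the output.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NS = "http://example.org/anime-rdl#"  # hypothetical namespace


def escape(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def triple(s, p, o, literal=False):
    """Format one N-Triples line; the object is a URI unless literal=True."""
    obj = '"%s"' % escape(o) if literal else "<%s>" % o
    return "<%s> <%s> %s ." % (s, p, obj)


def identification_triples(node, product, identifier, id_type):
    """One template instance node with a role property per argument."""
    return [
        triple(node, RDF_TYPE, NS + "ClassifiedIdentification"),
        triple(node, NS + "hasIdentified", product),
        triple(node, NS + "hasIdentifier", identifier, literal=True),
        triple(node, NS + "hasIdentificationType", id_type),
    ]


lines = identification_triples(
    NS + "id-t1", NS + "id-p1", 'Say "Hi"', NS + "NameInSourceA"
)
```

Records from both databases would pass through the same function, which is what keeps the output aligned with a single ontology model.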
Two more tasks remain unfinished from the original hackathon schedule and will be addressed to some extent by the .15926 team during the development of the next release of .15926 Editor: (4BOL)
- data source comparison and data merge (requires analysis of product title variability); (4BOM)
- Linked Data page configuration and web access to merged data for visual review and comparison. (4BON)
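The title variability analysis needed for the first unfinished task could start from something like the sketch below. It is one possible approach, not work done at the hackathon: the normalization rules and the 0.9 threshold are arbitrary assumptions, and real title matching would also have to deal with transliteration and translated titles.

```python
import difflib
import unicodedata


def normalize(title):
    """Fold case and width, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", title).casefold()
    kept = "".join(c if c.isalnum() else " " for c in text)
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized titles."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_titles(left, right, threshold=0.9):
    """Pair each title in `left` with its best match in `right`, if close enough."""
    pairs = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b), default=None)
        if best is not None and similarity(a, best) >= threshold:
            pairs.append((a, best))
    return pairs
```

Candidate pairs found this way would still need review before two product records are merged.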
Initial data, patterns and adapter code are available in the project repository. Anyone is welcome to test our environment and repeat the work described above, changing, augmenting or completing it. We'll be glad to answer questions and help with a deeper understanding of our technologies and results. (4BOO)
Reference data for Anime and Manga: Semantic Linking and Publishing of Diverse Data-Sets (48F6)
Event starts 29th of March 2014 12:00 MSK / 8:00 UTC / 00:00 PST "virtureal" (Moscow in person in TechInvestLab office, all over the world via mikogo.com session 379-614-845). (48F9)
Ontology-based integration of engineering data should be transparent to both engineers and logicians. What does that mean? It means understanding, at the same time, plant models, equipment and P&ID diagrams on one side, and classes, properties, inverse properties and inheritance on the other. (48FA)
- Few people know both. Even fewer among them use the same terms for objects they probably share. (48FB)
- Engineering datasets come from industrial proprietary CAD systems, ontology tools are open-source university apps. (48FC)
- Engineering data is guarded behind corporate firewalls, ontology research is public. (48FD)
We need a non-industrial example of engineering ontology work to test and publicize approaches, methods and tools. What can it be? "Anime and Manga production as engineering"! (48FE)
- Studio, distribution and fandom as Enabling system and inside-anime world as System-of-interest complexly related. (48FF)
- Derivative works as Product lines and Custom-tuned models: what is our typical system? (48FG)
- Complex data with high variety: (48FH)
a) Reference Data Library (48FK)
Anime and manga reference data library creation: ontology of creative works on the base of engineering ontology brainstorm. (48FL)
- what is anime and manga reference data (ontology)? (48FM)
- what reference data are available for reuse? (48FN)
We hold firm ontology commitments, grounded in ISO 15926 4D extensional ontology (self-education reading sequence: http://levenchuk.com/2012/10/01/iso-15926-self-education-sequence/). (48FO)
We want to be a part of federated Reference Data Libraries (RDL). (48FP)
We prefer to link to PCA reference data wherever possible: (http://rds.posccaesar.org/downloads/PCA-RDL.owl.zip or http://posccaesar.org/endpoint/sparql). (48FQ)
We will try to develop a domain ontology (classes and templates for n-ary relations) rich enough to map a sensible selection of anime-manga online resources. (48FR)
We will evaluate reuse of other ontologies (look at BBC Programmes ontology http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml, Schema.org for TV and Radio markup http://blog.schema.org/2013/12/schemaorg-for-tv-and-radio-markup.html) (48FS)
Results: published anime-manga domain RDL (OWL file and probably a SPARQL endpoint) (48FT)
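Looking up a class in the PCA reference data through the SPARQL endpoint named above could be prepared as in the sketch below. It only builds the request following the standard SPARQL protocol (a `query` parameter on a GET request) and does not send it. Matching on an exact plain `rdfs:label` is an assumption about how the PCA data is labeled, and the endpoint may no longer be available at that address.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://posccaesar.org/endpoint/sparql"  # endpoint named in the text


def label_query(label):
    """SPARQL query for things with an exact plain-literal rdfs:label (assumed)."""
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        'SELECT ?cls WHERE { ?cls rdfs:label "%s" } LIMIT 10' % safe
    )


def build_request(label):
    """Prepare (but do not send) a SPARQL protocol GET request."""
    url = ENDPOINT + "?" + urlencode({"query": label_query(label)})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request("PUMP")
# To actually run it: urllib.request.urlopen(request), then parse the JSON body.
```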
b) Semantization of Datasets (48FU)
Ontology (reference data library) pattern-based mapping of anime and manga Data to semantic format (48FV)
We will develop patterns for anime-manga semantic data structures (based on classes and templates of the domain RDL). (48FW)
We will program adapters to access selected datasets and map native data models to the developed patterns. (48FX)
We will run the adapters and create semantic data from anime-manga sources in ISO 15926-8 RDF. (48FY)
Datasets: (48FZ)
- http://anidb.net (API: http://wiki.anidb.net/w/API) (48G0)
- http://myanimelist.net (API: http://myanimelist.net/modules.php?go=api) (48G1)
- IMDB (how to get it: http://blog.teamtreehouse.com/coding-a-dynamic-imdb-webapp-using-my-movie-api) (48G2)
- Wikipedia (48G3)
- example of XML page: http://cdn.animenewsnetwork.com/encyclopedia/api.xml?anime=3750 (49AI)
- Documentation for .15926 Editor patterns and mapping: http://techinvestlab.ru/V4 (48G4)
Example of markup for TVSeries for Schema.org -- http://schema.org/TVSeries (49AJ)
Results: patterns, adapters for .15926 Editor 1.5alpha. RDL in ISO 15926-8 format (.owl file and SPARQL endpoint). (48G5)
c) The Linked Data (48G6)
Dereference (publishing) of anime and manga reference data library on the web (48G7)
We will attempt to use the Flask web framework (http://flask.pocoo.org/) to publish our semantic data in human-readable form. (48G8)
We will use the same patterns used for mapping to guide human representation. (48G9)
We will attempt to enhance Linked Data pages with more semantic mark-up (schema.org?). (48GA)
Result: published human readable Linked Data pages for anime-manga semantic data. (48GB)
Whom do we need? (48GC)
- Otaku (anime and manga fans) to understand data and engineers to build ontology. (48GD)
- Data modelers to develop patterns. (48GE)
- Python programmers to access native APIs of online resources and encode mappings. (48GF)
- Web designers to develop concept pages. (48GG)
- Semantic markup (schema.org anyone?) specialists to develop and embed semantic markup. (48GH)
Tools: (48GI)
- documentation: http://techinvestlab.ru/15926EditorDocumentation (48GJ)
- .15926 Editor 1.5alpha version for Reference Data for Anime and Manga Hackathon: http://techinvestlab.ru/files/15alpha/dot15926Editor15alpha.rar (49N2)
Public code repository for the Reference Data for Anime and Manga Hackathon: http://github.com/ailev/anird (49AK)
ISO 15926 self-education sequence: http://levenchuk.com/2012/10/01/iso-15926-self-education-sequence/ (48GK)
Many of the team members who made last year's ISO 15926-related hackathon a success are expected to participate in this project (see the results of last year's hackathon: http://ontolog.cim3.net/forum/ontology-summit/2013-04/msg00038.html). (48GL)
Contacts: (48GM)
- Team lead: VictorAgroskin, vic5784 at gmail.com (48GN)