[ontology-summit] Reference data for Anime and Manga - Report for Hackat

To: Ontology Summit 2013 discussion <ontology-summit@xxxxxxxxxxxxxxxx>
From: Victor Agroskin <vic5784@xxxxxxxxx>
Date: Wed, 2 Apr 2014 02:55:04 +0400
Message-id: <CAJbbDjREwHe0_+PRUcO7xDO_RMhfYZWA-PkfD3DnojpZy6Yv4w@xxxxxxxxxxxxxx>
The Reference Data for Anime and Manga Hackathon took place on 29
March 2014, 12:00 – 20:00 MSK (virtual and in-person sessions), with
follow-up activities over the next two days.    (01)

The in-person session gathered 3 participants in Moscow; 6 more people
from 4 countries connected online. The working language of the
hackathon was Russian, with occasional use of English when
international participants connected to the session.    (02)

Implementation work was carried out in the .15926 Editor environment
(a specially tailored version is downloadable from
http://techinvestlab.ru/files/15alpha/dot15926Editor15alpha.rar ).    (03)

Some initial data sets, developed reference data and patterns, adapter
code and project results are published in the open project repository
http://github.com/ailev/anird .    (04)

During the event the following tasks were performed:    (05)

1. Native database analysis. Two online data sources were identified
for the work: http://www.animenewsnetwork.com/ and http://anidb.net .
Various database access methods were identified and tested, resulting
in XML database exports for local access and API specifications for
remote access.    (06)

2. Ontology analysis and modeling. As planned, we focused on the 4D
ontology analysis prescribed by the ISO 15926 modeling methodology.
Just two aspects of the vast anime and manga domain were addressed:
alternative identifications of creative products and the division of
work between staff members of product teams.    (07)

For alternative identifications, the Classified Identification model
and the corresponding ISO 15926 template were selected. Language
identification was omitted from the models for lack of time.    (08)
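
As a rough illustration of the idea (not the actual ISO 15926 template
or the hackathon's reference data), a classified identification can be
thought of as a record linking an identified object, an identifier and
the class of that identification. All field names, class names and
sample values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedIdentification:
    # Field names are illustrative, not the roles of the ISO 15926 template.
    identified: str   # placeholder key for the creative product
    identifier: str   # the name or number used to identify it
    id_class: str     # kind of identification, e.g. a name in one database


def identifiers_of(records, identified):
    """Group all alternative identifiers of one object by identification class."""
    grouped = {}
    for rec in records:
        if rec.identified == identified:
            grouped.setdefault(rec.id_class, []).append(rec.identifier)
    return grouped


# Made-up sample data: one work known under several identifiers.
records = [
    ClassifiedIdentification("work-1", "Example Title", "NameInSourceA"),
    ClassifiedIdentification("work-1", "1234", "NumberInSourceA"),
    ClassifiedIdentification("work-1", "Example Title!", "NameInSourceB"),
    ClassifiedIdentification("work-2", "Other Title", "NameInSourceA"),
]

alternatives = identifiers_of(records, "work-1")
```

Keeping the identification class on every record is what allows a name
from one database and a number from another to coexist for the same
product without being confused.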

For work division, a complex Activity model was developed, describing
the involvement of the creative product itself and the participation
of individuals in classified staff activities.    (09)

3. Identification schema development. A namespace for semantic data
was selected, and a UUID-based, non-human-readable URI scheme was
chosen to avoid the language identification and text transformations
required for human-readable URIs.    (010)
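
A minimal sketch of what UUID-based URI minting might look like,
assuming a placeholder namespace (http://example.org/anird/ is
hypothetical, not the namespace selected at the hackathon) and random
version 4 UUIDs. Human-readable labels are kept beside the URI rather
than encoded in it.

```python
import uuid

BASE = "http://example.org/anird/"  # hypothetical namespace, for illustration only


class UriRegistry:
    """Mint opaque URIs and remember the label each one was minted for."""

    def __init__(self, base=BASE):
        self.base = base
        self.labels = {}

    def mint(self, label):
        # The label never enters the URI, so no transliteration,
        # escaping or language tagging of the title is needed.
        uri = self.base + str(uuid.uuid4())
        self.labels[uri] = label
        return uri


registry = UriRegistry()
first = registry.mint("Example Title")
second = registry.mint("Example Title")  # same label, still a distinct URI
```

The trade-off is that such URIs say nothing to a human reader, so any
review of the data depends on the labels stored alongside them.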

4. Reference data development. An initial set of classes was developed
for the ontology model: classes to classify creative work, team
members, staff activities and basic types of identification
(identification by name and by number in the context of corresponding
databases). Additional classes for identification types and for
activity classifiers were discovered in the data sources and created
during the data import.    (011)

The set of initial templates from ISO 15926-8 was selected for
modeling as the most generic template set available.    (012)

We were unable to reuse any other existing ontology for our reference
data. Some ontologies for creative work were considered, but they were
either not compliant with our 4D requirements or too detailed in
areas we had no plans to address.    (013)

5. Ontology pattern development. Native database data models were
described by ontology patterns using the IIP methodology applied to
ISO 15926 industrial data (as implemented in .15926 Editor).    (014)

Each database model was described by two patterns (for title import
and for staff activity import). Pattern construction also included
basic patterns for data visualization and some low-level patterns used
as building blocks for import patterns.    (015)

6. Database adapter coding. Pattern-based adapter code for the .15926
Platform was developed to query, parse and transform the selected
databases, using both the XML dumps available to us and remote APIs.    (016)

7. Actual data processing. Selected data from both native databases
were processed and transformed to RDF following a single ontology
model.    (017)
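
The general shape of such a transformation can be sketched as follows.
This is a simplified illustration, not the pattern-based adapter code
from the repository: the two source record layouts, the example.org
namespace and the CreativeWork class are all hypothetical, and plain
rdf:type and rdfs:label statements stand in for the ISO 15926
templates actually used.

```python
NS = "http://example.org/anird/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def from_source_a(rec):
    # Hypothetical layout of one source: {"id": "1234", "title": "..."}
    return {"key": "a-" + rec["id"], "name": rec["title"]}


def from_source_b(rec):
    # Hypothetical layout of the other source: {"aid": 77, "names": ["..."]}
    return {"key": "b-%d" % rec["aid"], "name": rec["names"][0]}


def literal(text):
    """Quote a string as an N-Triples literal, escaping \\, \" and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_ntriples(item):
    """Serialize one normalized item as two N-Triples lines."""
    subject = "<%s%s>" % (NS, item["key"])
    return [
        "%s <%s> <%sCreativeWork> ." % (subject, RDF_TYPE, NS),
        "%s <%s> %s ." % (subject, RDFS_LABEL, literal(item["name"])),
    ]


# Both sources end up in the same shape before serialization.
items = [
    from_source_a({"id": "1234", "title": 'Say "Hi"'}),
    from_source_b({"aid": 77, "names": ["Other Title", "Alt Title"]}),
]
lines = [line for item in items for line in to_ntriples(item)]
```

Normalizing each source into one intermediate shape first means the
serialization step is written once, whatever the native layout was.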

At this point the day-long hackathon activity was interrupted. Several
problems were identified in the mapping engine of the .15926 Editor,
which led to code and pattern changes during the next two days. In the
end, the RDF data sources were created and verified to be compliant
with the ontology model as initially developed.    (018)

Two more tasks remain unfinished from the original hackathon schedule
and will be addressed to some extent by the .15926 team during the
development of the next release of .15926 Editor:
   - data source comparison and data merge (requires analysis of
product title variability);
   - Linked Data page configuration and web access to merged data for
visual review and comparison.    (019)

Initial data, patterns and adapter code are available in the project
repository. Anyone is welcome to test our environment and repeat the
work described above, changing, augmenting or completing it. We'll be
glad to answer questions and help with a deeper understanding of our
technologies and results.    (020)

Victor Agroskin
TechInvestLab.ru    (021)

