What do we need to get SemWeb working for
industry?
1) An examination of methods for certifying that
the user's semantics for a term match the definition of that term. Data exchange
experience shows that the user view of what a term means is always local to a
particular business culture, which may be confined to a single department.
Standard example - how many man-hours in a man-year? When I talk to the pay
department, there should be about 2,200; when I talk to my line manager, there
should be 1,650, and to my project manager, 1,500. And, btw, there are 10 months
in an EU year.
2) In particular domains, an agreed set of
certification methods (following from 1) - something like an ISO 9000 audit. The
rigour of any particular method will be a trade-off between audit cost and risk.
In pizza sales, there is unlikely to be any audit, whereas if we were to trade
aircraft parts openly, a very high level of conformance checking would be needed. (I
looked at certification requirements in a paper for the SIMDAT project on cloud
computing and SOAs - and this topic appears to be one in which investments are
being made.)
3) A standard for communicating certification
conformance. There is not much point in a business investing in the SemWeb to do
business electronically if, before any business can be done, the commercial
department has to check manually whether the other party is ISO 9000 certified.
(Also considered in a SIMDAT paper.)
4) A trust infrastructure to support assertions
about certification. This can probably draw on work on security trust
infrastructures (e.g. the TRUSTCOM project).
5) The development of methods, methodologies and
criteria for constructing ontologies and, in particular, for identifying the
terms in an ontology (cf Chris Partridge's work - is it the last word on the
subject?). It is my view that the terms needed for a business ontology are
precisely those that apply at decision points in business processes, and which
parametrise the process or select between alternative processes. User domain
ontologies are only relevant to the extent that businesses interoperate, and
upper ontologies are therefore relevant only as guides to constructing user
domain ontologies where the scope of interoperation is not known in advance.
Most of what passes for a taxonomy is a set of heuristics to guide the user to
find the right terms, and is not relevant to the user domain ontology (but see
6).
6) The development of heuristics to guide the
users to the right terms - this is primarily a human factors study. I would
expect most user domain ontologies to be supported by multiple heuristic
taxonomies to match the cultural habits of particular groups of users.
7) The development of domain standards against
which businesses can be certified. A user domain ontology developed without
considering 1 to 5 above is likely to fail or be replaced relatively rapidly. In
conventional technologies, the Oil and Gas area (ISO 15926, I think) and some
areas of industrial products (ISO 10303 series) provide examples, with the
CAx-IF providing a certification-like function for design geometry software.
8) The development of persistence criteria for
assertions, and a way of communicating them. It takes a finite time to assemble
and process a set of assertions. We know that the assertions businesses make
change over time, and therefore we need assurance that the assertions we rely on
for a business transaction remain valid, at least for the length of the
transaction. This could be analogous to record locking in conventional database
systems, where, for example, the person buying the last ticket on an aeroplane
takes a lock on the record until they pay or decline, so that nobody else can
buy the same ticket in the meantime. There are many more cases than simple
transaction locking.
9) Methods for detecting and dealing with viral
assertions - i.e. false assertions inserted into a trusted source. The problem
is not simply to correct the assertions, but to propagate a warning to anyone
who may have used the assertion (and may have cached the
result).
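
As a rough illustration of points 8 and 9, here is a minimal Python sketch, not
a proposal for an actual standard. It assumes each assertion carries an explicit
validity window and that the trusted source records who has relied on it, so
that a retraction can be turned into a list of parties to warn. The names
Assertion, AssertionStore, holds_throughout, use and retract are all
hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Assertion:
    """A published statement with a (hypothetical) validity window."""
    assertion_id: str
    statement: str
    valid_from: datetime
    valid_until: datetime
    retracted: bool = False

    def holds_throughout(self, start: datetime, end: datetime) -> bool:
        # Point 8: usable only if it has not been retracted and its validity
        # window covers the whole transaction interval.
        return (not self.retracted
                and self.valid_from <= start
                and end <= self.valid_until)


@dataclass
class AssertionStore:
    """Hypothetical trusted source that remembers who relied on what."""
    assertions: dict = field(default_factory=dict)
    consumers: dict = field(default_factory=dict)  # assertion_id -> set of names

    def publish(self, assertion: Assertion) -> None:
        self.assertions[assertion.assertion_id] = assertion

    def use(self, assertion_id: str, consumer: str,
            start: datetime, end: datetime):
        """Return the assertion if it holds for [start, end], else None."""
        assertion = self.assertions[assertion_id]
        if not assertion.holds_throughout(start, end):
            return None
        # Record the consumer so a later retraction can reach them (point 9).
        self.consumers.setdefault(assertion_id, set()).add(consumer)
        return assertion

    def retract(self, assertion_id: str) -> list:
        """Mark an assertion as false and return the consumers to warn."""
        self.assertions[assertion_id].retracted = True
        return sorted(self.consumers.get(assertion_id, set()))


# Example: a one-hour certification claim used by two would-be buyers.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
store = AssertionStore()
store.publish(Assertion("a1", "Supplier X is ISO 9000 certified",
                        now, now + timedelta(hours=1)))

accepted = store.use("a1", "buyer-A", now, now + timedelta(minutes=30))
rejected = store.use("a1", "buyer-B", now, now + timedelta(hours=2))
to_warn = store.retract("a1")  # only buyer-A actually relied on the claim
```

The sketch keeps the list of consumers at the source, which is only one possible
design. In an open setting the source may not know who has cached a result, and
that is exactly where the propagation problem in 9 becomes hard.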