What do we need to get
SemWeb working for industry?
 
1) An examination of methods
for certifying that the user semantics for terms matches the definition of the
term. Data exchange experience shows that the user view of what terms mean is
always local to a particular business culture, which may be confined to a
single department. Standard example - how many man-hours in a man-year? When I
talk to the pay department, there should be about 2,200; when I talk to my line
manager, there should be 1,650, and to my project manager, 1,500. And, btw,
there are 10 months in an EU year.
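
The man-year example can be sketched in code. This is only an illustration,
assuming that each definition is recorded together with the business culture
that uses it; the names LocalDefinition, resolve and same_meaning are
hypothetical, and the figures are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalDefinition:
    term: str
    context: str  # the business culture that uses the term this way
    value: float
    unit: str


# Figures from the man-year example; the context labels are illustrative.
DEFINITIONS = [
    LocalDefinition("man-year", "pay department", 2200, "man-hours"),
    LocalDefinition("man-year", "line manager", 1650, "man-hours"),
    LocalDefinition("man-year", "project manager", 1500, "man-hours"),
]


def resolve(term: str, context: str) -> LocalDefinition:
    """Return the definition of `term` as used in `context`, or fail loudly."""
    for definition in DEFINITIONS:
        if definition.term == term and definition.context == context:
            return definition
    raise LookupError(f"no definition of {term!r} for context {context!r}")


def same_meaning(term: str, context_a: str, context_b: str) -> bool:
    """True only if both contexts attach the same value and unit to the term."""
    a, b = resolve(term, context_a), resolve(term, context_b)
    return (a.value, a.unit) == (b.value, b.unit)
```

A certification method would need something much richer than an equality
check, but even this shows that the term alone is not enough: any exchange of
"man-year" has to carry the context with it.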
 
2) In particular domains, an
agreed set of certification methods (following from 1) - something like an ISO
9000 audit. The rigour of any particular method will be a trade-off between
audit cost and risk. In pizza sales, there is unlikely to be any audit, whereas
if we were to trade aircraft parts openly, a very high level of conformance
checking would be needed. (I looked at certification requirements in a paper for the
SIMDAT project on cloud computing and SOAs - and this topic appears to be one
in which investments are being made.)
 
3) A standard for
communicating certification conformance. There is not much point in a business
investing in the SemWeb to do business electronically if, before you can do
business, the commercial department has to check manually whether the
business is ISO 9000 certified. (Also considered in a SIMDAT paper.)
 
4) A trust infrastructure to
support assertions about certification. This can probably draw on work on
security trust infrastructures (e.g. the TRUSTCOM project).
 
5) The development of
methods, methodologies and criteria for constructing ontologies, and, in
particular, for identifying the terms in an ontology (cf. Chris Partridge's work
- is it the last word on the subject?). It is my view that the terms needed for
a business ontology are precisely those that apply at decision points in
business processes, and which parametrise the process or select between
alternative processes. User domain ontologies are only relevant to the extent
that businesses interoperate, and upper ontologies are therefore relevant only
as guides to constructing user domain ontologies where the scope of
interoperation is not known in advance. Most of what passes for a taxonomy is a
set of heuristics to guide the user to find the right terms, and is not
relevant to the user domain ontology (but see 6).
 
6) The development of
heuristics to guide the users to the right terms - this is primarily a human
factors study. I would expect most user domain ontologies to be supported by
multiple heuristic taxonomies to match the cultural habits of particular groups
of users.
 
7) The development of domain
standards against which businesses can be certified. A user
domain ontology developed without considering 1 to 5 above is likely to fail or
be replaced relatively rapidly. In conventional technologies, the Oil and Gas
area (ISO 15926, I think) and some areas of industrial products (ISO 10303
series) provide examples, with the CAx-IF providing a certification-like
function for design geometry software.
 
8) The development of
persistence criteria for assertions, and a way of communicating them. It takes
a finite time to assemble and process a set of assertions. We know that the
assertions businesses make change over time, and therefore we need assurance that
the assertions we rely on for a business transaction remain valid, at least for
the length of the transaction. This could be analogous to record locking in
conventional database systems, where, for example, the person buying the last
ticket on an aeroplane takes a lock on the record until they pay or decline,
preventing anyone else buying the ticket, and ensuring only one person can buy
it. There are many more cases than simple transaction locking.
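
The simplest persistence criterion, a validity window, can be sketched as
follows. This is a sketch under an assumption of mine: that each assertion is
published with an interval during which the publisher promises not to change
it. The names Assertion, covers and safe_to_transact are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Assertion:
    statement: str
    valid_from: datetime
    valid_until: datetime  # publisher's promise not to change it before this


def covers(assertion: Assertion, start: datetime, duration: timedelta) -> bool:
    """True if the assertion is guaranteed stable for the whole transaction."""
    return assertion.valid_from <= start and start + duration <= assertion.valid_until


def safe_to_transact(assertions, start: datetime, duration: timedelta) -> bool:
    """A transaction is safe only if every assertion it relies on covers it."""
    return all(covers(a, start, duration) for a in assertions)
```

This handles only the case where validity is known in advance. The ticket
example is different, because there the buyer creates the guarantee by taking
the lock, and that would need a protocol rather than a published interval.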
 
9) Methods for detecting and
dealing with viral assertions - i.e. false assertions inserted into a trusted
source. The problem is not simply to correct the assertions, but to propagate a
warning to anyone who may have used the assertion (and may have cached the
result).
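
The propagation problem in 9 can be sketched as a walk over a usage log. This
assumes, and it is a large assumption, that the trusted source records who
used each assertion and which results were derived from it. ProvenanceLog,
record_use and retract are hypothetical names, not an existing API.

```python
from collections import defaultdict, deque


class ProvenanceLog:
    """Hypothetical log of who used each assertion and what was derived from it."""

    def __init__(self):
        self._consumers = defaultdict(set)  # assertion id -> parties that used it
        self._derived = defaultdict(set)    # assertion id -> ids of derived results

    def record_use(self, assertion_id, consumer, derived_id=None):
        """Note that `consumer` used the assertion, optionally producing a result."""
        self._consumers[assertion_id].add(consumer)
        if derived_id is not None:
            self._derived[assertion_id].add(derived_id)

    def retract(self, assertion_id):
        """Return (tainted assertion ids, consumers who need a warning)."""
        tainted, to_warn = set(), set()
        queue = deque([assertion_id])
        while queue:
            current = queue.popleft()
            if current in tainted:
                continue  # already handled; also guards against cycles
            tainted.add(current)
            to_warn |= self._consumers.get(current, set())
            queue.extend(self._derived.get(current, set()))
        return tainted, to_warn
```

For example, if one party derives a cached result from a false assertion and a
second party then relies on that result, retracting the original assertion
yields both parties as recipients of the warning. Detecting that the assertion
was false in the first place is not addressed by this sketch.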