In case someone is interested, this is how we have set up our system:
1. Everything is an entity - including relationships.
2. An entity has no name, only an auto-generated unique numeric ID that is meaningless outside of the data-store.
3. An entity is meaningless if it does not have relations.
4. The naming of any entity involves a relation to a language entity and a script/syntactic role data element.
5. All relations are basically represented as triplets of entities: source, relation, value.
6. Each relationship triplet also has an optional reference to a source entity, as well as start and stop dates between which the relation is active.
7. Entities inherit properties from ‘hyponymy’ (class) and ‘part of’ relations.
8. Entities (typically classes) have a set of relations that allow a structure to be composed: mandatory or optional values that may be inherited either as a value (fixed for children) or as a parameter (each child has its own value).
9. There is a special SYNTRANS relation that allows synonymous compositions to be linked. Literal translations between different languages are considered SYNTRANS relations.
10. There is a category of ‘syntactical’ relations with which ‘head’/‘modifier’ compositions can be created.
11. There is a category of ‘semantic’ relations that can be dynamically expanded (they are entities, and they fall under the same SYNTRANS equivalency processing). These relations all follow the construction: (preposition) -> (entity) *-> (postposition)* (* = optional).
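As a rough illustration of points 1 to 7, here is a minimal in-memory sketch in Python. It is not our actual implementation: the names `Triple`, `Store`, `new_entity` and `inherit_via` are made up for this example, and the inheritance walk is only one plausible reading of point 7.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Optional

_next_id = count(1)  # hypothetical ID generator: IDs mean nothing outside the store


def new_entity() -> int:
    """Create an entity: just a fresh numeric ID, no name attached."""
    return next(_next_id)


@dataclass(frozen=True)
class Triple:
    source: int                      # entity the relation starts from
    relation: int                    # the relation is itself an entity (point 1)
    value: int                       # entity the relation points to
    reference: Optional[int] = None  # optional reference entity (point 6)
    start: Optional[date] = None     # relation is active from this date, if set
    stop: Optional[date] = None      # relation is active until this date, if set

    def active_on(self, day: date) -> bool:
        return ((self.start is None or self.start <= day)
                and (self.stop is None or day <= self.stop))


class Store:
    def __init__(self):
        self.triples = []

    def add(self, source, relation, value, **kw) -> Triple:
        t = Triple(source, relation, value, **kw)
        self.triples.append(t)
        return t

    def direct(self, source, day):
        """Triples that start at `source` and are active on `day`."""
        return [t for t in self.triples
                if t.source == source and t.active_on(day)]

    def properties(self, entity, inherit_via, day):
        """Own triples plus those inherited through the relations in `inherit_via`."""
        seen, stack, found = set(), [entity], []
        while stack:
            current = stack.pop()
            if current in seen:       # guard against circular references
                continue
            seen.add(current)
            for t in self.direct(current, day):
                if t.relation in inherit_via:
                    stack.append(t.value)   # climb to the class / whole
                else:
                    found.append(t)
        return found
```

With a ‘hyponymy’ relation entity passed in `inherit_via`, calling `properties` on a child entity also returns the triples attached to its class, restricted to those active on the given day.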
With that you can store, map, and search just about anything. I am working on the formulation of proper English and the corresponding proper German presentation of any semantic network identified. Proof-of-concept data is mapped in 64 or so other languages, including Arabic, Hebrew, Cyrillic, Greek and a few other scripts, a number of coding systems, chemical formulas …
Due to its nature, you can identify a partial concept with a phrase and the system will offer ‘known’ related concepts. Once you pick one, it supplements it with the information it already knows and lets you follow links. All presented information is generated from the underlying semantic network that represents the concept, in the language that the browser is set to.
The system is able to circumscribe terms based on the composition of the concept.
We have no issues naming relationships: we just call them what they are, and that gets close enough for refinement.
Whether the noun, adjectival, and verbal forms of a concept need to be separated is still undecided. There seem to be rules that hold between them:
(adj1) := (pertaining to) (noun)
(verb) := (making of)(noun)
(adj2) := (in process of) (verb)
‘Pertaining’, ‘making’, and ‘process’ are examples; I am not sure yet whether these will develop into categories. We currently have these relations as separate entities with relations to each other, but we could also develop a procedure to present the variations from a single concept.
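To make the idea of deriving the variations from a single concept concrete, here is a small Python sketch. The `RULES` table simply restates the three example rules above; the `derive` function and the nested-tuple output are assumptions for illustration, not something the system does today.

```python
# Each rule says: this form is (relation) applied to another form.
# The table restates the three example rules; it is not a fixed category list.
RULES = {
    "adj1": ("pertaining to", "noun"),
    "verb": ("making of", "noun"),
    "adj2": ("in process of", "verb"),
}


def derive(form, noun):
    """Expand a form into nested (relation, base) pairs rooted at the noun."""
    if form == "noun":
        return noun
    relation, base_form = RULES[form]
    return (relation, derive(base_form, noun))
```

For example, `derive("adj2", "paint")` gives `("in process of", ("making of", "paint"))`, so only the noun concept would need to be stored and the other forms could be presented on demand.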
There is interesting metadata in which concepts do or do not have the various syntactical forms.
We are working with about 3 million triplets. Yes, there are circular references, which cause headaches, and there is definitely the problem of getting back too much information.
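One common way to tame both problems is a breadth-first walk that remembers visited entities and stops at a depth and result limit. The Python sketch below assumes triplets are plain `(source, relation, value)` tuples; the function name `related` and the parameters `max_depth` and `limit` are made up for this example and do not describe what we currently run.

```python
from collections import deque


def related(triples, start, max_depth=2, limit=50):
    """Breadth-first walk over (source, relation, value) tuples from `start`."""
    outgoing = {}
    for source, relation, value in triples:
        outgoing.setdefault(source, []).append((relation, value))

    seen = {start}                 # visited entities: this is what breaks cycles
    queue = deque([(start, 0)])
    results = []
    while queue and len(results) < limit:
        node, depth = queue.popleft()
        if depth == max_depth:     # do not expand beyond the depth limit
            continue
        for relation, value in outgoing.get(node, []):
            if value in seen:
                continue
            seen.add(value)
            results.append((node, relation, value))
            if len(results) == limit:   # cap the amount of information returned
                break
            queue.append((value, depth + 1))
    return results
```

The visited set guarantees termination even when the network loops back on itself, while `max_depth` and `limit` bound how much comes back for one query.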
We need some form of transactional system plus arbitration before we can let the general public onto this.
IMPORTANT FYI: the meaning of a concept is located in the intersection between two languages!
In other words, how someone translates an English term into another language defines what the translator meant.