Mike Bergman released the first draft of its UMBEL ontology. Me and some other people helped him to come up with that new ontology.What is UMBEL? UMBEL is a lightweight subject reference structure. People can see it as a pool of subjects. Subjects are related together at a synonymy level; so, subjects of related meaning will be binded together.
The objectives of this new ontology are:
- A reference umbrella subject binding ontology, with its own pool of high-level binding subjects
- Lightweight mechanisms for binding subject-specific ontologies to this structure
- A standard listing of subjects that can be refererenced by resources described by other ontologies (e.g., dc:subject)
- Provision of a registration and look-up service for finding appropriate subject ontologies
- Identification of existing data sets for high-level subject extraction
- Codification of high-level subject structure extraction techniques
- Identification and collation of tools to work with this subject structure, and
- A public Web site for related information, collaboration and project coordination.
Given these objectives, I see a couple of main applications where such ontology could be used:
- Helping systems to find data sources for a given ontology. UMBEL is much more than a subject structure. In fact, UMBEL will bind subjects, with related ontologies and data sources for these related ontologies. So, for a given subject, people will be able to find related ontologies, and then related data sources.
- Acting as a subject reference backbone. So, it could be use by people to links resources, using dc:subject, to its subject resource (the UMBEL subject proxy resource), etc.
- Could be used by user interface to help them with handling subjects (keywords) references to find related ontologies (that have the power to describe these subjects).
- Eventually it should be used by PingtheSemanticWeb to bind pinged data to the subject reference structure.
- And probably many others.
Creation of the Ontology
A procedure will be created to automatically generate the ontology. The gross idea is to reuse existing knowledge bases to create the set of subjects, and their relationship, that will create the ontology. So, the idea is to come up with a representative, not too general, not too specialized, set of subjects. For that, we will play with knowledge bases such as WordNet, Wikipedia, Dmoz, etc. We will try to find out how we could prune unnecessary subjects out of them, how we could create such a subject reference framework by taking a look at the intersection of each data set, etc. The procedure is not yet developed, but the first experiments will look like that.
As explained in the draft:
The acceptance of the actual subjects and their structure is one key to the acceptance — and thus use and usefulness — of the UMBEL ontology. (The other key is simplicity and ease-of-use or tools.) A suitable subject structure must be adaptable and self-defining. It should reflect expressions of actual social usage and practice, which of course changes over time as knowledge increases and technologies evolve.
A premise of the UMBEL project is that suitable subject content and structures already exist within widely embraced knowledge bases. A further premise is that the ongoing use of these popular knowledge bases will enable them to grow and evolve as societal needs
and practices grow and evolve.
The major starting point for the core subject pool is WordNet. It is universally accepted, has complete noun and class coverage, has an excellent set of synonyms, and has frequency statistics. It also has data regarding hierarchies and relationships useful to the UMBEL look-up reference structure, the ‘unofficial’ complement to the core ontology.
A second obvious foundation to building a subject structure is Wikipedia. Wikipedia’s topic coverage has been built entirely from the bottom up by 75,000 active contributors writing articles on nearly 1.8 million subjects in English alone, with versions in other
degrees of completeness for about 100 different languages. There is also a wealth of internal structure within Wikipedia’s templates.
These efforts suggest a starting formula for the UMBEL project of W + W + S + ? (for WordNet + Wikipedia + SKOS + other?). Other potential data sets with rich subject coverage include existing library classification systems, upper-level ontologies such as SUMO, Proton or DOLCE, the contributor-built Open Directory Project, subject ‘primitives’ in other languages such as Chinese, or the other sources listed in Appendix 2 – Candidate Subject Data Sets.
Though the choice of the contributing data sets from which the UMBEL subject structure is to be built will never be unanimous, using sources that have already been largely selected by large portions of the Web-using public will go a long ways to establishing authoritativeness. Moreover, since the subject structure is only intended as a lightweight reference — and not a complete closed-world definition — the UMBEL project is also setting realistic thresholds for acceptance.
If you are interested in such an ontology project, please join us on the mailing list of the ontology’s development group, ask questions, writes comments and suggestions.
Next step is to start creating a first version of the subject proxies.