Role and Use of Ontologies in the Open Semantic Framework

Ontologies are to the Open Semantic Framework what the human was to the Mechanical Turk. The hidden human inside the Mechanical Turk orchestrated each and every chess move. To observers, however, the automated chess machine looked like exactly what it pretended to be: a new kind of intelligent machine. The year was 1770.

Ontologies play exactly the same role for the Open Semantic Framework (OSF): they orchestrate each and every move of all the pieces within OSF. They are what instruct structWSF, the Semantic Components, conStruct, and all the other derivative user interface pieces how to behave.

In this (lengthy) blog post, I will present the main ontologies that have an impact on different parts of OSF. We will see how different ontology classes and properties, and the descriptions of the records indexed in the system, can affect the behavior of OSF.

In addition to this post, Mike has also published a blog post today that gives an overview of the OSF ontology modularization and architecture.


What is an Ontology?

An ontology is the definition of a vocabulary, and of the rules for combining its terms, used to describe things that need to be communicated.

This is yet another tentative definition of an ontology as applied to the semantic web. Before explaining that definition, I would like to state what I think is the main purpose of an ontology:

The main purpose of an ontology is to communicate coherent and consistent information.

Different Kinds of Ontologies

Over the years, I have tended to use the word “vocabulary,” along with the word “ontology,” in different blog posts and technical documents. However, the usage of each word may not always have been clear. Is a vocabulary an ontology? Is an ontology a vocabulary? Are these concepts synonymous? There is an important distinction to make: an ontology can be a vocabulary, but an ontology is much more than a simple vocabulary.

Ontologies can describe all kinds of well-known knowledge representation structures, some simple, others much more complex. Here is a small list of some of them:

  • lexicons
  • taxonomies, or
  • higher order knowledge description frameworks

In its most basic usage, an ontology will define a vocabulary. It will simply define the terms (words) that belong to that vocabulary, without saying anything about how these words are to be used.

Then, an ontology can evolve into a taxonomy by defining hierarchical relationships between the terms that compose the vocabulary.

Finally, it can evolve further to become a higher order knowledge description framework that defines more complex usage rules: usage restrictions, all kinds of relationships between described entities, and so on. New knowledge can also be inferred from it. This is why I say that an ontology is not strictly a simple vocabulary, but a powerful knowledge description framework.
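
To make this progression concrete, here is a minimal sketch in Python with rdflib, using a hypothetical http://example.org/books# namespace rather than any real OSF ontology: the same graph starts as a bare vocabulary, gains a taxonomy, and then picks up richer usage rules.

```python
# A minimal sketch (not an actual OSF ontology) of the three stages, using
# rdflib and a hypothetical http://example.org/books# namespace.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/books#")
g = Graph()
g.bind("ex", EX)

# Stage 1 -- vocabulary: simply declare the terms, with no usage rules.
for cls in (EX.Document, EX.Book, EX.Proceedings):
    g.add((cls, RDF.type, OWL.Class))
g.add((EX.author, RDF.type, OWL.ObjectProperty))

# Stage 2 -- taxonomy: add hierarchical relationships between the terms.
g.add((EX.Book, RDFS.subClassOf, EX.Document))
g.add((EX.Proceedings, RDFS.subClassOf, EX.Document))

# Stage 3 -- richer framework: add usage rules (a property domain, a
# disjointness axiom) from which a reasoner can infer new knowledge.
g.add((EX.author, RDFS.domain, EX.Document))
g.add((EX.Book, OWL.disjointWith, EX.Proceedings))

print(g.serialize(format="turtle"))
```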

Knowledge Base

As we saw above, the main purpose of an ontology is to enable the creation of a coherent and consistent knowledge base of information that can be communicated. So an ontology is a kind of language that lets you create knowledge bases that are coherent and consistent, and from which new knowledge can be inferred. That is done by following the usage rules defined in the ontology.

However, there is another important aspect to take into account: an ontology will describe knowledge that is coherent and consistent, but only according to that ontology’s own worldview. This means that two ontologies describing the same domain of knowledge could each consistently and coherently describe information according to their own view of the world.

Let’s take an example. Say that two book stores developed their own ontologies to describe the books they sell. Since both companies sell books, chances are good that they will use the same vocabulary to describe them. However, the usage rules between these terms may differ between the two stores. One of the book stores could say that a proceedings volume is a specialized kind of book. The other book store could say that no, a proceedings volume is not a specialized kind of book; it is a document, just like a book is. So both would describe a proceedings volume as a document, but they would have different interpretation rules about what a book really is. As you can see, both book stores use the same vocabulary to describe their library of books, but they interpret its meaning differently. If the two stores had to exchange information about books in the future, they wouldn’t have much difficulty, since they probably share the same vocabulary, but their interpretations of that information might differ. The result of these differences in interpretation might be where a book gets classified within the store, or how customers can search for a specific book using different filtering criteria, and so on.
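
Here is a minimal sketch of the two book stores in Python with rdflib, under a hypothetical http://example.org/bookstore# namespace (the class and instance names are made up for illustration). Both stores load the very same record, but their different subclass rules give different answers to the question “is this a book?”.

```python
# Two worldviews over the same record, with a hypothetical namespace.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/bookstore#")

# The record both stores describe with the same vocabulary.
record = [
    (EX.iswc2010, RDF.type, EX.Proceedings),
    (EX.iswc2010, EX.title, Literal("ISWC 2010 Proceedings")),
]

# Store A's worldview: a proceedings volume is a specialized kind of book.
store_a = Graph()
store_a.add((EX.Proceedings, RDFS.subClassOf, EX.Book))
store_a.add((EX.Book, RDFS.subClassOf, EX.Document))

# Store B's worldview: a proceedings volume is a document, but not a book.
store_b = Graph()
store_b.add((EX.Proceedings, RDFS.subClassOf, EX.Document))
store_b.add((EX.Book, RDFS.subClassOf, EX.Document))

def counts_as(graph, cls, target):
    """True if cls is subsumed by target through rdfs:subClassOf links."""
    return target in graph.transitive_objects(cls, RDFS.subClassOf)

for name, store in (("Store A", store_a), ("Store B", store_b)):
    for triple in record:
        store.add(triple)
    print(name, "shelves it as a Book:", counts_as(store, EX.Proceedings, EX.Book))
# Store A shelves it as a Book: True; Store B shelves it as a Book: False
```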

This is no different from what happens in our daily lives: is there a day when you don’t hear people arguing over different points of view? Exactly the same thing happens here. We may all witness the very same events, images, and sounds, but we may each interpret them differently.

Ontologies in the Open Semantic Framework?

Ontologies are so flexible that we chose to make them the “brain” of the Open Semantic Framework.

We wanted to use the most flexible knowledge description framework available, one that would enable us to integrate any possible information source described using any existing kind of knowledge representation structure, from the simple to the really complex: lexicons, taxonomies, relational schemas, etc. By using ontologies as its central piece, OSF is a flexible data integration framework that can consolidate information from various heterogeneous sources.

If we remember the definition we started with, ontologies are not just about describing terms and their relationships in a coherent and consistent way. The ultimate purpose is to communicate that information. That is what the structWSF part of the Open Semantic Framework does: it lets any system with access to the Internet send, receive, and manipulate information in multiple formats through a series of web service endpoints.

More Reading

Finally, I suggest you read Mike’s Intrepid Guide to Ontologies to get a better understanding of where ontologies come from, how they work, what other formats exist, what the different approaches to ontologies are, and what tools currently exist for working with them.

UMBEL Blooms with New Colors

We are happy to announce the new, intermediate, UMBEL version 0.80. This is a major upgrade of the UMBEL ontology: both its vocabulary and its reference structure have been greatly enhanced, an upper structure called the SuperTypes has been added, and everything has been updated to OWL 2. You can read more about the overall changes in Mike’s blog post.

In this blog post I will focus on two topics: how to use some existing tools and frameworks to view and manage the reference concept structure, and how one can use and leverage the coherency of that reference structure.

Navigating and Updating the Reference Structure

One thing that was lacking in the previous version of UMBEL was a user interface tool that would let you navigate and update the reference structure as you wished. Because of the way the conceptual structure was created, tools such as Protégé had a hard time loading it, owing to all the individuals that were created (such as the SemSet individuals).

As stated in Mike’s blog post, we made significant changes to the UMBEL vocabulary and to how we instantiate the reference structure. Along with the OWL 2 upgrade, we made sure that Protégé 4.1 and the latest version of the OWL API could easily load both the UMBEL vocabulary and the reference structure.

Reasoning

One of the major additions to UMBEL v0.80 is the SuperTypes upper structure, an organizational layer above the UMBEL reference structure. We created these SuperTypes because we found that we could effectively cluster most UMBEL reference concepts into a small set of mostly distinct upper concepts (33 in fact, 29 of which are designated as disjoint).

This new SuperTypes structure helps us mine external sources of information by leveraging related concepts in the reference structure. Moreover, the SuperTypes also help us perform easier, simpler, better, and faster reasoning over the entire structure of roughly 21,000 reference concepts.

Thus, the SuperTypes provide a new tool to help determine whether the UMBEL reference structure is consistent and coherent within itself. This is important, of course, to ensure that linkages between UMBEL and external ontologies are consistent and coherent as well.

So far, the entire reference concept structure has been tested for coherency according to the restrictions we defined at the level of the SuperTypes upper structure. Using different reasoners such as Pellet, FaCT++, and HermiT (available by default with Protégé 4.1), we made sure that all the statements made between the RefConcept classes and individuals, and all the statements made between these and the SuperTypes upper structure, are consistent. This method enabled us to find and fix some early assignment issues.

This new upper structure, along with the now consistent reference structure, helps provide confidence that statements based on UMBEL reference concepts are also consistent. And all of this is made more testable by virtue of being able to use the OWL API and Protégé with its embedded reasoners.

How is Coherency Tested?

This is the core question. In fact, the more informative answer to this question will be part of a forthcoming blog post. But let’s start here.

The current way to check whether the structure is coherent is to make sure that no individual belongs to two different SuperTypes that are stated to be disjoint. What we did with the SuperTypes upper structure is really simple: we categorized each and every RefConcept (using rdfs:subClassOf) under a SuperType. Most of the SuperTypes are disjoint: this means that if an individual has an rdf:type of two SuperTypes that are stated to be disjoint, then you end up with an incoherent structure, because you are making a statement that is not permitted by the reference structure.
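
As a minimal, hand-rolled illustration of that disjointness test, here is a Python sketch with rdflib that uses hypothetical SuperType and RefConcept URIs rather than the real UMBEL files; in practice, as described next, this job is delegated to an OWL reasoner such as Pellet.

```python
# Flag individuals typed (via their RefConcepts) under two disjoint SuperTypes.
# All URIs below are hypothetical stand-ins for the real UMBEL structure.
from itertools import combinations
from rdflib import Graph, Namespace, RDF, RDFS, OWL

ST = Namespace("http://example.org/supertypes#")
RC = Namespace("http://example.org/refconcepts#")
EX = Namespace("http://example.org/data#")

g = Graph()
# SuperTypes upper structure: two disjoint SuperTypes.
g.add((ST.Animals, OWL.disjointWith, ST.Products))
# RefConcepts categorized under the SuperTypes with rdfs:subClassOf.
g.add((RC.Dog, RDFS.subClassOf, ST.Animals))
g.add((RC.Truck, RDFS.subClassOf, ST.Products))
# A (deliberately bad) statement: one individual typed under both branches.
g.add((EX.rex, RDF.type, RC.Dog))
g.add((EX.rex, RDF.type, RC.Truck))

def reachable_classes(graph, individual):
    """Collect every class reachable from the individual's rdf:type via rdfs:subClassOf."""
    found = set()
    for cls in graph.objects(individual, RDF.type):
        found.update(graph.transitive_objects(cls, RDFS.subClassOf))
    return found

for individual in set(g.subjects(RDF.type, None)):
    reached = reachable_classes(g, individual)
    for a, b in combinations(reached, 2):
        if (a, OWL.disjointWith, b) in g or (b, OWL.disjointWith, a) in g:
            print(f"Incoherent: {individual} falls under disjoint SuperTypes {a} and {b}")
```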

So, the way to check whether your statements are coherent according to this structure is to make your statements (right now, in terms of individual instantiation) and then to check them using a reasoner such as Pellet. There is now a general testing structure to see whether any ontology is coherent with respect to the UMBEL reference structure.

In the next blog post in this series, I will show you how to use exactly the same method for coherency testing, but this time to test whether linkages between external ontologies and the UMBEL reference structure are consistent. In that case, you make the class-to-class assertions you want, instantiate individuals of those classes, and then run the reasoner. The reasoner will tell you whether your ontology is still consistent according to the structure and the new statements you created.

Next Step

In parallel with these tutorials, we are also working hard on the next version of UMBEL. As outlined in the Next Changes section of the new UMBEL website, the next step is to release UMBEL v1.0, with a set of new features, before Christmas.

Different World Views (TBox) for the same Structs (ABox)

Mike continues his series of blog posts about the distinction between ABoxes (the assertion box; the data instances) and TBoxes (the terminology box; the data schemas). Mike suggests that people make a distinction between the data instances (individuals) that belong to the ABox, and the vocabularies (schemas, ontologies, or however you choose to call these formal specifications of conceptualizations) that belong to the TBox.

I want to hammer home an important point that emerged in our recent discussions about these questions: the TBox defines the language used to describe different kinds of things, and the ABox is the actual description of those things. However, there is an important distinction to make here: there is a difference between using properties to describe a thing and understanding what the use of those properties means.

Let’s take the use case of two systems that exchange data. The data instances transmitted between the two systems will be exactly the same: their ABox description will be the same; they will use the same properties and the same values to describe the same things. However, nothing tells us how each of these properties will be processed, understood, and managed by the two systems. Each system has its own worldview. This means that their TBoxes (the meaning of the classes and properties used to describe data instances) will probably be different, and so the same data will be interpreted and handled differently.
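
Here is a minimal sketch of that exchange scenario, with hypothetical namespaces and hand-rolled RDFS-style subsumption instead of a full reasoner: the very same ABox serialization is loaded by two systems whose TBoxes say different things about what a book is.

```python
# Same ABox, two TBoxes, two interpretations. All names are hypothetical.
from rdflib import Graph, Namespace, RDFS

EX = Namespace("http://example.org/exchange#")

# The ABox that travels over the wire -- byte-for-byte identical for both.
abox_turtle = """
@prefix ex: <http://example.org/exchange#> .
ex:republic a ex:Book ;
    ex:title "The Republic" .
"""

# System 1's TBox: a book is a work of art.
tbox_1 = [(EX.Book, RDFS.subClassOf, EX.WorkOfArt)]
# System 2's TBox: a book is something you can start a fire with.
tbox_2 = [(EX.Book, RDFS.subClassOf, EX.FireStarter)]

for name, tbox in (("System 1", tbox_1), ("System 2", tbox_2)):
    g = Graph()
    g.parse(data=abox_turtle, format="turtle")  # same data instances
    for triple in tbox:                         # different worldview
        g.add(triple)
    inferred = set(g.transitive_objects(EX.Book, RDFS.subClassOf))
    print(name, "sees ex:republic as:",
          sorted(c.n3(g.namespace_manager) for c in inferred))
```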

I think the fact that two systems may process the same information differently is the lesser evil. This is no different from how humans communicate. Different people have different worldviews that dictate how they see and reason over things. One person can see a book and think of it as a piece of art, while another person might say: “Great! I finally have something to start that damned fire!”. The description of the thing (the book) didn’t change, but its meaning changed from one person to another. Exactly the same thing applies to systems that exchange data instances.

This is really important, since considerations of the TBox (how data instances are interpreted) shouldn’t be bound to considerations of the ABox (the actual data instances that are transmitted). Otherwise, no systems would ever be able to exchange data, considering that they will more than likely always have different worldviews for the same data (they will handle and reason over data instances differently).

I think this is a really important thing to keep in mind going forward, because there won’t ever be a single set of ontologies to describe everything on the semantic web. There will be multiple ontologies describing the same things, and there will be an endless number of versions of these ontologies (there are already many). And finally, the cherry on the cake: how these ontologies are handled and implemented differs from system to system!

But take care here; this doesn’t mean that we can’t exchange meaningful data between different systems. It only means that different worldviews exist, and that care should be taken not to mix the data with the interpretation of its concepts. This is yet another reason why we have to split apart the concerns of the ABox and the TBox.

Exploding DBpedia’s Domain using UMBEL

A couple of challenges I have found with DBpedia are that it is hard for a system to interact with the dataset, and hard to figure out how to interpret the information instantiated in it. It is hard to know which properties are used to describe individuals, and hard to know what the classes refer to. It is also hard for standalone and agent software to understand the nature of the individuals that DBpedia instantiates, because the classes they belong to are generally unknown or poorly defined.

In this blog post I suggest using a method known as “exploding the domain” to try to overcome these difficulties in using and understanding DBpedia. This adds still further usefulness to DBpedia’s considerable value. The demonstration is based on the UMBEL subject concept structure.

As I will demonstrate below, this method consists of contextualizing classes in a coherent framework in order to explode their domains. By exploding the domain of a class, we link it to other classes defined in external ontologies; this also helps standalone and agent software understand the meaning of that class (at least if they understand the meaning of the classes linked to it). Note that we are able to explode the domains by linking classes using only three properties: rdfs:subClassOf, owl:equivalentClass, and umbel:isAligned.
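
Here is a minimal sketch, in Python with rdflib, of what a traversal over those three properties could look like; the namespaces and linkage triples are illustrative stand-ins, not the actual UMBEL structure.

```python
# "Exploding the domain": from one class, follow rdfs:subClassOf,
# owl:equivalentClass and umbel:isAligned to gather the classes that give
# it context. All triples and namespaces below are illustrative.
from rdflib import Graph, Namespace, RDFS, OWL

UMBEL = Namespace("http://umbel.org/umbel#")
SC = Namespace("http://umbel.org/umbel/sc/")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
CYC = Namespace("http://example.org/opencyc#")

g = Graph()
g.add((SC.Intellectual, RDFS.subClassOf, SC.Person))
g.add((SC.Person, OWL.equivalentClass, FOAF.Person))
g.add((FOAF.Person, RDFS.subClassOf, FOAF.Agent))
g.add((SC.Person, UMBEL.isAligned, CYC.Person))

LINK_PROPERTIES = (RDFS.subClassOf, OWL.equivalentClass, UMBEL.isAligned)

def explode_domain(graph, cls):
    """Walk the graph over the three linkage properties, starting from cls."""
    seen, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for prop in LINK_PROPERTIES:
            # equivalentClass is symmetric, so follow it in both directions.
            neighbours = list(graph.objects(current, prop))
            if prop == OWL.equivalentClass:
                neighbours += list(graph.subjects(prop, current))
            for n in neighbours:
                if n not in seen:
                    seen.add(n)
                    frontier.append(n)
    return seen - {cls}

print(explode_domain(g, SC.Intellectual))
# -> sc:Person, foaf:Person, foaf:Agent, and the aligned Cyc class
```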

First of all, let me give some background information about how DBpedia individuals and UMBEL named entities have been created, and how both datasets have been linked together.

How DBpedia individuals are instantiated

DBpedia is a dataset based on the well-known Wikipedia encyclopedia. Basically, DBpedia creates one individual for each Wikipedia page. Most of the individuals instantiated in this way are what we call “named entities” in UMBEL parlance.

But to be instantiated, an individual has to belong to a class. DBpedia chose to use Yago’s classification system (which is based on WordNet) to instantiate those individuals. This means that (theoretically) every DBpedia individual belongs to at least one Yago class, and in some rarer cases is also an instance of classes defined in external ontologies.

How UMBEL named entities have been created

For its part, UMBEL’s named entity dictionaries come from different data sources. Currently, most public UMBEL named entities also come from Yago (example: Aristotle), but many come from the DBTune dataset (example: Pete Baron) or other sources. (UMBEL’s design allows more named entities to be plugged into the system as additional dictionaries at will.)

However, unlike DBpedia, we do not use Yago’s classification system to instantiate these named entities. And unlike Yago, we do not use the WordNet classes to instantiate the named entities either.

The current UMBEL subject concept structure is based on OpenCyc. This means that the relations between the classes that instantiate the UMBEL named entities come from the Cyc knowledge base.

So while we use Yago’s named entities (from Wikipedia) as a starting basis, we instantiate them using the UMBEL subject concept classes instead of the WordNet classes. So, basically, we have switched the WordNet conceptual framework for the UMBEL (or OpenCyc) one.

But, how did we create these UMBEL named entities, instantiated using UMBEL subject concept classes and based on Yago? Here is the linkage path:

Yago classes → WordNet synsets ← Cyc collections ← OpenCyc classes ← UMBEL subject concept classes

Et voilà!

How UMBEL named entities are linked to DBpedia individuals

OK, so now how do we link UMBEL named entities to DBpedia individuals? It is simple. Remember that DBpedia individuals have been created from Wikipedia pages. Also remember that Yago individuals come from the same Wikipedia pages. We can then make the link between the individuals from DBpedia and the individuals from Yago based on Wikipedia URLs.

Exactly the same logic applies for linking DBpedia individuals to UMBEL named entities.
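
Here is a minimal sketch of how such owl:sameAs links can be generated, assuming we can map each DBpedia individual and each UMBEL named entity back to the Wikipedia URL it was derived from; the mapping tables and URIs below are hypothetical.

```python
# Join DBpedia individuals and UMBEL named entities on their shared
# Wikipedia URL and emit owl:sameAs links. The mappings are hypothetical.
from rdflib import Graph, Namespace, OWL

DBPEDIA = Namespace("http://dbpedia.org/resource/")
NE = Namespace("http://example.org/umbel/ne/")

# Wikipedia URL -> DBpedia individual derived from that page.
dbpedia_from_wikipedia = {
    "http://en.wikipedia.org/wiki/Plato": DBPEDIA.Plato,
    "http://en.wikipedia.org/wiki/Aristotle": DBPEDIA.Aristotle,
}
# Wikipedia URL -> UMBEL named entity derived from the same page (via Yago).
umbel_from_wikipedia = {
    "http://en.wikipedia.org/wiki/Plato": NE.Plato,
    "http://en.wikipedia.org/wiki/Aristotle": NE.Aristotle,
}

linkage = Graph()
for wikipedia_url, dbpedia_individual in dbpedia_from_wikipedia.items():
    named_entity = umbel_from_wikipedia.get(wikipedia_url)
    if named_entity is not None:
        # Same Wikipedia page on both sides => same real-world individual.
        linkage.add((dbpedia_individual, OWL.sameAs, named_entity))

print(linkage.serialize(format="turtle"))
```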

The end result of this linkage is that we have UMBEL named entities that are the same as DBpedia individuals. The difference is that the UMBEL named entities are now instances of UMBEL subject concepts: a totally different conceptual structure.

Remember that these named entities are contextualized in a coherent conceptual framework. And this characteristic means a lot for what is yet to come.

Web services to search and visualize these named entities

We created two new web services on the UMBEL web services home page (the user interface to these web services; the endpoints will be released later) to help people interact with these named entities:

  1. The “Search Named Entities Dictionaries” web service
  2. The “Named Entity Detailed Report” web service

The first web service lets you search amongst all publicly available UMBEL named entities dictionaries.

The second web service lets you visualize detailed information about any named entity.

This information page shows you the full scope of information about a named entity: which classes it belongs to (subject concept classes as well as external classes); which other individuals, from other datasets, are identical to it; examples of web services that get queried with information about this named entity; etc.

Exploding the domain of Plato

Now that this background information has been established, let’s take a look at what happens when we link DBpedia individuals to UMBEL named entities, and at how this actually works to explode the domain.

Let’s take the example of dbpedia:Plato. This individual is currently defined in DBpedia as:

  • yago:AncientGreekPhysicists
  • yago:PhilosophersOfLanguage
  • yago:PhilosophersOfLaw
  • yago:PoliticalPhilosophers
  • yago:AncientGreekVegetarians
  • yago:AcademicPhilosophers
  • yago:Philosopher110423589

Fine, but what does this mean? What if my system doesn’t know any of these classes? We, as humans, know that Plato is a person, a human being. But it is totally another story for a software agent.

What we want to do here is to explode Plato’s domain to try to find a meaning that my software system can understand.

In UMBEL, the “Plato” named entity is defined as an umbel:Person and an umbel:Intellectual. If you take a look at the detailed report for these two subject concepts, you will be able to see in the section “Broader Subject Concepts” the super-classes that Plato belongs to. So we know that Plato is a social being, a homo sapiens, etc. This is basically what happens with Yago too, except that the conceptual structure (the way to describe the entity) differs.

However, one thing that has happened is that we have exploded Plato’s domain with classes defined in external ontologies. As you can see in the sections “Broader External Classes” and “Equivalent External Classes”, Plato is also a foaf:Person, a foaf:Agent, and a cyc:Person.

This means that even if my software agent doesn’t know what yago:Person100007846 means, it may know what a foaf:Person or a foaf:Agent means. And if it does, it will be able to manipulate the entity properly: display it in a special way, refer to it as a person, and do whatever else it can with information about a “person”.

This exploding of the domain works because these external ontology classes have been referentially linked to a coherent conceptual structure.

The inference path

Let’s take a look at the fundamental reasons why the scenario above works.

First, you and your system have to trust the UMBEL named entity dictionaries and the UMBEL subject concept structure to perform the inference explained below. If you and your system trust these linkage assertions, then you will be able to act on the knowledge that has been inferred.

DBpedia individuals are linked to UMBEL named entities using the owl:sameAs property. This means that DBpedia individual A is identical to (has the same semantic meaning as) the UMBEL named entity B. They both refer to the same individual.

This means that if B is defined as being of rdf:type sc:Person (“sc” stands for Subject Concept), then we can infer that A is defined as being of rdf:type sc:Person too.

If sc:Person is owl:equivalentClass with foaf:Person, we can infer that umbel:B is a foaf:Person, so that dbpedia:A is a foaf:Person too!
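
Here is a minimal, hand-rolled sketch of that inference path in Python with rdflib; a production system would delegate this to an OWL reasoner, and the linkage triples below are illustrative stand-ins for the real UMBEL and DBpedia data.

```python
# The inference path: sameAs -> rdf:type -> equivalentClass/subClassOf.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

DBPEDIA = Namespace("http://dbpedia.org/resource/")
NE = Namespace("http://example.org/umbel/ne/")
SC = Namespace("http://umbel.org/umbel/sc/")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.add((DBPEDIA.Plato, OWL.sameAs, NE.Plato))          # A owl:sameAs B
g.add((NE.Plato, RDF.type, SC.Person))                # B rdf:type sc:Person
g.add((SC.Person, OWL.equivalentClass, FOAF.Person))  # sc:Person == foaf:Person

def inferred_types(graph, individual):
    """Gather types of the individual and of everything it is sameAs,
    then expand them along subClassOf and (symmetric) equivalentClass."""
    same = {individual} | set(graph.transitive_objects(individual, OWL.sameAs))
    types = set()
    for node in same:
        for cls in graph.objects(node, RDF.type):
            types.add(cls)
            types.update(graph.transitive_objects(cls, RDFS.subClassOf))
            types.update(graph.objects(cls, OWL.equivalentClass))
            types.update(graph.subjects(OWL.equivalentClass, cls))
    return types

print(FOAF.Person in inferred_types(g, DBPEDIA.Plato))  # True
```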

We can see similar examples of exploding the domain below.

Exploring ConceptualWorks, PeriodicalSeries and NewspaperSeries

In my “UMBEL as a Coherent Framework to Support Ontology Development” blog post from last week, I showed how UMBEL subject concepts act to create context for linked classes defined in external ontologies. Since DBpedia individuals are instances of classes, and some of those classes are linked to UMBEL, these subject concept classes also give context to those individuals!

As some examples, go ahead and take a look at the “Named Entities for …” section of these detailed report pages.

The partial list of named entities returned by the detailed report viewer shows named entities that mainly come from Wikipedia (and so have links to DBpedia). These subject concepts give a coherent context to those DBpedia individuals.

You should quickly notice, for example, that dbpedia:Kansas_City_Times is not only a sc:NewspaperSeries, a sc:PeriodicalSeries, and a sc:ConceptualWork; it is also a frbr:Work, a bibo:Periodical, and a bibo:Newspaper.

The context created by these UMBEL subject concepts gives new power not only to the linked external classes, but also to their instances, such as these DBpedia individuals!

Conclusion

Contexts created by UMBEL subject concepts emerge from the linkage that exists between all the subject concepts, and from the linkage between those subject concept classes and classes defined in external ontologies. These contexts are consistent because of the coherence of the underlying structure, which is powered by OpenCyc (Cyc).

So far, most Linked Data has been about the “things” or named entities of the world, organized according to either Wikipedia categories or WordNet. These structures may have some internal structural consistency, but they were never designed to play the role of a coherent reference framework. The coherence of UMBEL (based on the coherence of Cyc) is a powerful contextual lever for bringing order to this chaos.

Once information gets linked to a coherent framework such as UMBEL, things start to happen; powerful things. And, with each new linkage and relation to additional external ontologies, that power increases exponentially.

I wrote this blog post to show, once again, the power of exploding the domain, using DBpedia as an example, and how UMBEL can help in using and leveraging such big datasets.