#umbel – Frederick Giasson

New UMBEL 1.50 Ships With 20 Linked Ontologies

May 12, 2016 (updated November 17, 2016) Frederick Giasson

I am proud to announce the immediate release of UMBEL version 1.50. This is a major effort that took a year to release.

What is UMBEL?

Let’s start by explaining what UMBEL is for those who have never encountered this project before. UMBEL stands for “Upper Mapping and Binding Exchange Layer”. It is a conceptual structure designed to help content interoperate between systems.

UMBEL is a coherent general structure of 34,000 reference concepts which provides a scaffolding to link and interoperate other datasets and domain vocabularies. The concepts are organized into 31 mostly disjoint SuperTypes.

UMBEL is written in OWL 2 and SKOS.


Major UMBEL Release: 1.10

September 9, 2014 Frederick Giasson

After more than 2 years, we are now finally releasing a new version of the UMBEL ontology and reference concept structure. One might think that we haven’t worked on the project all that time, but it is not strictly true.

We did improve the mapping to external vocabularies/ontologies, and we worked a lot on linking Wikipedia pages to the UMBEL structure, but we haven’t had time to release a new version… until now!

For people new to the ontology, UMBEL is a general reference structure of about 28,000 reference concepts, which provides a scaffolding to link and interoperate other datasets and domain vocabularies. Its main purpose is to provide a coherent conceptual structure that we can use to link and interoperate unrelated data sources. But it can also be used to describe information, like any other ontology.

What is new with the ontology?

The major change in UMBEL is not the structure itself, but the piece of software used to generate it. In fact, the previous system we developed for generating UMBEL was about 7 years old. It was a bit clunky and really not that easy to work with.

Based on our prior experience with UMBEL, we chose to drop it and to create a brand new UMBEL reference structure generator. This new generator has been developed in Clojure and uses the latest version of the OWL API. It makes the management of the structure much simpler, which means that it will help us release new UMBEL versions more regularly. We also have a suite of tools to analyze the structure and to pinpoint possible issues.

Other than that, we updated the Schema.org, DBpedia Ontology and Geonames Ontology mappings to UMBEL. This is a major effort undertaken by Mike for this new version. The mappings are composed of:

754 rdfs:subClassOf relationships between Schema.org classes and UMBEL reference concepts
688 rdfs:subClassOf relationships between DBpedia Ontology classes and UMBEL reference concepts
682 rdfs:subClassOf relationships between Geonames Ontology classes and UMBEL reference concepts

These new mappings will help manage data instances that use these external ontologies/schemas within a broader conceptual structure (which is UMBEL). This lets us reason over the external data using the UMBEL conceptual structure even if the external data sources didn’t originally use UMBEL to describe their data. That is one of the main features of UMBEL.
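As a rough illustration of how such mappings let external data be reasoned over, the sketch below treats each rdfs:subClassOf statement as an edge and walks upward from an external class. The class identifiers and the three edges are invented for the example; they are not taken from the actual UMBEL mapping files.

```python
# Toy subclass edges: child class -> set of direct parent classes.
# All identifiers below are illustrative assumptions, not real mapping data.
SUBCLASS_OF = {
    "schema:FireStation": {"umbel-rc:FireStation"},
    "umbel-rc:FireStation": {"umbel-rc:EmergencyFacility"},
    "umbel-rc:EmergencyFacility": {"umbel-rc:Building"},
}


def inferred_types(cls):
    """Return every class reachable from `cls` by following subClassOf edges."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# An instance typed only with the external class is also, by inference,
# an instance of every UMBEL concept above that class.
print(sorted(inferred_types("schema:FireStation")))
```

With data like this, a record typed only as the external class can be retrieved by a query expressed against any of the UMBEL concepts above it.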

We also managed to add a few hundred UMBEL reference concepts. Most of them were added to create these new linkages with the external ontologies. Others have been added because they were improving the overall structure.

A few weeks back, we found an issue with the umbel:superClassOf assignments, which has now also been resolved in version 1.10.

In the previous versions of UMBEL, the preferred labels were not unique: a few hundred concepts shared their preferred label with another concept. This was not an issue in itself, but it is not a best practice when creating an ontology. We removed all these non-distinct preferred labels so that every preferred label is now unique.

We added a few skos:broader and skos:narrower relationships between some of the reference concepts. In the previous versions, all the relationships were skos:broaderTransitive and skos:narrowerTransitive properties only.

Finally, we made sure that the entire UMBEL reference structure (Core + the Geo module) is free of inconsistencies and is satisfiable.

What is new with the portal and web services?

This new version of UMBEL also led us to add a few new features to the UMBEL website. The most apparent one is the new External Linkage section that may appear at the top of a reference concept page (it does not appear if there are no external links for a given reference concept). This section shows the linkage between the UMBEL reference concept and other external classes.

Another new feature is the blue Core tag at the right of the URI of the reference concept. This tag tells you where the reference concept comes from. Another tag that you may encounter is the green Geo tag, which tells you that the reference concept comes from the UMBEL Geo module. The same tags appear in the search resultsets.

What is next?

Because UMBEL is an ontology, by nature it will always evolve over time. Things change, and the way we see the world can always improve.

For the next version of UMBEL, we will analyze the entire reference concept structure using different algorithms, heuristics and other techniques to find conceptual gaps in it. The goal of this analysis is to tighten the structure and to obtain a better, more fine-grained conceptual hierarchy.

Another thing we want to do in upcoming versions is to improve the Super Types structure of UMBEL. As you may know, many of the Super Types are non-disjoint because some of the concepts belong to multiple Super Type classes. What we want to do here is to create new Super Type classes that are the intersection of two or more Super Types, and to use them to categorize the concepts that belong to multiple Super Types. That way, we will end up with a better classification of the UMBEL reference concepts from a Super Types standpoint.
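One way to picture this is to group concepts by the exact combination of Super Types they belong to, so that each combination of two or more becomes a candidate intersection Super Type. This is only a sketch of the idea, and the concept and Super Type names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical memberships: concept -> Super Types it belongs to.
MEMBERSHIP = {
    "Hospital": {"Facilities", "Organizations"},
    "University": {"Facilities", "Organizations"},
    "Bridge": {"Facilities"},
    "Charity": {"Organizations"},
}


def intersection_super_types(membership):
    """Group concepts that share the same combination of two or more Super Types."""
    groups = defaultdict(list)
    for concept, super_types in membership.items():
        if len(super_types) >= 2:
            groups[frozenset(super_types)].append(concept)
    return {key: sorted(concepts) for key, concepts in groups.items()}


for super_types, concepts in intersection_super_types(MEMBERSHIP).items():
    print(" & ".join(sorted(super_types)), "->", concepts)
```

Concepts that belong to a single Super Type are left where they are; only the overlapping ones are regrouped.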

Another thing we want to do with the UMBEL web services is to update them so that you can query the linkage to the external ontologies. For now, you can see the linkage when querying the sub-classes and super-classes of a reference concept. But you cannot yet ask the web services, for example, for all the sub-classes of the http://schema.org/FireStation class.

As you can see, the UMBEL ontology and web services will continue to evolve over time to enable new ways to leverage the conceptual structure and external data sources.

UMBEL: New Shortest Path Web Service & Tag Web Documents

July 30, 2014 Frederick Giasson

We just released a new UMBEL ontology graph analysis web service endpoint: the Shortest Path web service endpoint.

The Shortest Path Web service is used to get the shortest path between two UMBEL reference concepts by following the path of a transitive property. The concepts that belong to that path will be returned by the server.

This web service is similar to the degree web service endpoint, but the actual path is shown. It is (marginally) more costly than degree, so if you don’t need to know the actual concepts that participate in the shortest path between two concepts, you should use the degree web service endpoint instead.

The graph created by the UMBEL reference concepts ontology is mostly a directed acyclic graph (DAG). This means that a given pair of concepts is not necessarily linked via all the properties. In these cases, the Shortest Path endpoint returns an error message rather than the path concepts.
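For readers who want to see what such a lookup involves, here is a minimal breadth-first search over a toy directed graph. It is a sketch of the general technique, not the service's actual implementation, and the concept names are placeholders.

```python
from collections import deque

# Toy directed graph for one transitive property:
# concept -> directly linked concepts. Names are placeholders.
EDGES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "X": [],
}


def shortest_path(edges, start, goal):
    """Breadth-first search. Returns the ordered list of concepts on a
    shortest path from start to goal, or raises LookupError if none exists."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in edges.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    raise LookupError(f"no path from {start} to {goal}")


print(shortest_path(EDGES, "A", "E"))
```

When the two concepts are not connected by the chosen property, the function raises an error instead of returning a path, which mirrors the error case described above.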

Intended Users

This new web service endpoint is intended for users that want to perform graph/network analysis tasks on the UMBEL web service endpoint.

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON or in EDN (Extensible Data Notation).

This endpoint will return a vector (so the order of the results is important) of the concepts that participate in the shortest path. For each concept, its URI and preferred label are returned.

The Online Tool

We also provide an online shortest path tool that people can use to experience interacting with the web service.

The user first selects the two concepts between which to find the shortest path, then selects the transitive property to follow to find the path.

Once the user clicks the Get Shortest Path button, they get the ordered list of concepts that compose the path.

If no path exists between the two concepts for the selected property, an error message is displayed to the user.

Tagging Web Documents with the UMBEL Taggers

Another improvement included with this release is the enhancement of the UMBEL taggers. It is now possible to tag any document accessible on the Web. The only thing you have to do is to provide a URL where the tagger will find the document to download and tag.

The user interface for the taggers was also modified to expose this new functionality. You now have the choice to give a text or a URL as input to the endpoints.

New UMBEL Concept Noun Tagger Web Service & Other Improvements

July 21, 2014 Frederick Giasson

Last week, we released the UMBEL Concept Plain Tagger web service endpoint. Today we are releasing the UMBEL Concept Noun Tagger.

This noun tagger uses UMBEL reference concepts to tag an input text, and is based on the plain tagger, except as noted below.

The noun tagger matches the plain labels of the reference concepts against the nouns of the input text. With this tagger, no manipulations are performed on the reference concept labels or on the input text unless you enable the stemmer. Also, there is NO disambiguation performed by the tagger if multiple concepts are tagged for a given keyword.

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow.

Stemming Option

This web service endpoint does have a stemming option. If the option is specified, then the input text will be stemmed and the matches will be made against an index where all the preferred and alternative labels have been stemmed as well. Then, once the matches occur, the tagger will recompose the text such that unstemmed versions of the input text and the tagged reference concepts are presented to the user.

Depending on the use case, users may prefer to turn the stemming option on or off on this web service endpoint.
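The stem, match, then recompose flow can be sketched as follows. Everything here is an assumption made for illustration: the labels and concept identifiers are invented, the stemmer is a crude stand-in for a real one, and the sketch matches any word rather than only nouns.

```python
import re

# Hypothetical concept labels; the identifiers are placeholders,
# not real UMBEL URIs.
LABELS = {
    "mammal": "rc:Mammal",
    "fire station": "rc:FireStation",
}


def crude_stem(word):
    """Very rough stand-in for a real stemmer: lower-case, drop a plural 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def stem_phrase(phrase):
    return " ".join(crude_stem(w) for w in phrase.split())


# Index built once: stemmed label -> concept.
STEMMED_INDEX = {stem_phrase(label): concept for label, concept in LABELS.items()}


def tag(text, max_words=3):
    """Match stemmed word n-grams against the stemmed index, but report the
    original (unstemmed) surface text together with its token position."""
    tokens = re.findall(r"[A-Za-z]+", text)
    stems = [crude_stem(t) for t in tokens]
    matches = []
    for start in range(len(tokens)):
        for size in range(1, max_words + 1):
            end = start + size
            if end > len(tokens):
                break
            concept = STEMMED_INDEX.get(" ".join(stems[start:end]))
            if concept:
                matches.append((" ".join(tokens[start:end]), concept, start))
    return matches


print(tag("Mammals live near fire stations"))
```

The matching happens on stems, yet each result carries the words as they appeared in the input, which is the recomposition step described above.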

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the noun tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.

The Online Tool

We also provide an online tagging tool that people can use to experience interacting with the web service.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

The tagged concepts can be clicked to have access to their full description.

Other UMBEL Website Improvements

We also made some more improvements to the UMBEL website.

Search Autocompletion Mode

First, we created a new autocomplete option on the UMBEL Search web service endpoint. Often people know the concept they want to look at, but they don’t want to go to a search results page to select that concept. What they want is to get concept suggestions instantly based on the letters they are typing in a search box.

Such a feature requires a special kind of search which we call an “autocompletion search”. We added that special mode to the existing UMBEL search web service endpoint. Such a search query takes about 30 milliseconds to process. Most of that time is due to network latency, since the actual search function takes about 0.5 millisecond to complete.

To use that new mode, you only have to append /autocomplete to the base search web service endpoint URL.
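To give a feel for why a prefix lookup can be this fast, here is a small sketch of an autocompletion search over a sorted list of preferred labels using binary search. It only illustrates the general idea: the labels are invented, and the actual endpoint relies on its own search engine rather than on this code.

```python
from bisect import bisect_left

# Hypothetical preferred labels; a real index would hold every reference concept.
PREF_LABELS = sorted(["fire", "fire station", "firefighter", "fish", "mammal"])


def autocomplete(prefix, labels=PREF_LABELS, limit=10):
    """Return up to `limit` labels starting with `prefix`, using binary
    search on a sorted, lower-cased label list."""
    prefix = prefix.lower()
    start = bisect_left(labels, prefix)
    results = []
    for label in labels[start:]:
        if not label.startswith(prefix) or len(results) == limit:
            break
        results.append(label)
    return results


print(autocomplete("fire"))
```

Because the labels are sorted, all candidates for a prefix sit next to each other, so the lookup touches only a handful of entries per keystroke.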

Search Autocompletion Widget

Now that we have this new autocomplete mode for the Search endpoint, we also leveraged it to add autocompletion behavior on the top navigation search box on the UMBEL website.

Now, when you start typing characters in the top search box, you will get a list of possible reference concept matches based on the preferred labels of the concepts. If you select one of them, you will be redirected to their description page.

Tagged Concepts Within Concept Descriptions

Finally, we improved the quality of the concept description reading experience by linking concepts that were mentioned in the descriptions to their respective concept pages. You will now see hyperlinks in the concept descriptions that link to other concepts.

New UMBEL Web Services

June 30, 2014 Frederick Giasson

I am happy to announce the immediate availability of a brand new UMBEL website and a new set of eight UMBEL web services.

UMBEL (Upper Mapping and Binding Exchange Layer) is a general reference structure of 28,000 concepts, which provides a scaffolding to link and interoperate other datasets and domain vocabularies. This project is now six years old.

I would recommend that you read Mike’s blog post about this new release if you want more background information about UMBEL and a better understanding of how it can help you integrate, manage, publish and reason over your data.

In this blog post, I will focus on the technical aspects of this new web site and the new set of web service endpoints.

Toward a Better Web Experience

The Web is changing fast. Techniques for developing web sites are constantly and quickly evolving. People use all kinds of devices with different screen sizes to consume Web content. Websites are becoming more and more responsive thanks to clever architecture design and simpler user interfaces. This is the kind of website we wanted to create for the new UMBEL website.

Clojure Web Service Endpoints at the Core

The core of the new UMBEL website is the new web services. As soon as you perform a search, or look at the description of a reference concept or a super type, your browser makes a series of asynchronous queries to the UMBEL web service endpoints.

The average query time is about 60 milliseconds for any web service query. This means that a web page is fully loaded within 300 to 500 milliseconds, where most of the time is spent downloading the web files (the JavaScript, CSS, HTML and image files) and not querying the web service endpoints. Bearing in mind that the website currently runs on a small server with a single core and 1.8 GB of RAM, these are really good performance figures.

We are initially releasing 8 web service endpoints (with more to follow). They have been created to help developers quickly start using the reference structure without having to download and deploy the entire structure on their own infrastructure. The 8 web services are:

Search Concept
Get Concept
Get Super Type
Get Narrower Concept
Get Broader Concept
Get Sub Classes
Get Super Classes
Degree

All these web services calculate their results at runtime. For example, if you want to find the degree between two reference concepts, then the degree is calculated at runtime. It is the same for all the web services that do inferencing, like the Get narrower concepts or Get broader concepts web service endpoints.

To get these excellent performance figures, we used Clojure as the programming language to develop the new web service endpoints, and we defined the UMBEL structure itself as Clojure code.

Each web service endpoint is composed of simple pure functions that perform calculations on the UMBEL graph of 28,000 nodes. The functions amount to no more than 30 lines of code per endpoint, which greatly simplifies their creation, debugging, maintenance and optimization. We then use contributed libraries such as Ring and Compojure to manage the creation of the web service endpoints, and Clucy/Lucene for the search engine.

The web services can easily be scaled horizontally since everything is self-contained in a single WAR file that can be deployed on new servers in a few clicks. The new servers can then participate in a cluster of UMBEL web service servers.

Another advantage of using this technology stack for creating the UMBEL web service endpoints is that UMBEL is not just a reference structure or a set of web service endpoints. It is also a programming API that could be used in any Clojure or Java application. The UMBEL reference structure, along with all the functions that use it, will be available as a JAR file. That way, UMBEL becomes portable. It could be used as a library in any JVM application without requiring it to send queries to external web services, or to create complex stacks to deploy and use the UMBEL reference structure in different applications.

Bootstrap as the HTML/CSS/JavaScript Framework

The previous UMBEL website was using Drupal 6. For those who used it, it was sometimes clunky, slow to respond and heavyweight. The problem is that we did not need a full CMS to develop a simple UMBEL website that is only informational.

We wanted a responsive experience for the UMBEL user. We wanted to have the fastest experience possible and we wanted to have this experience on any kind of device: desktop computers, tablets, mobile phones, etc.

This is why we chose to develop the new UMBEL website using Twitter’s Bootstrap HTML, CSS and JavaScript framework. This is a framework that anybody can use to quickly create simple, beautiful and modern websites. It uses a grid system to create responsive user interfaces on any kind of device (screen size). That way, UMBEL users have the same kind of experience whether they are using a normal desktop screen, a tablet or their mobile phone.

This choice enabled us to create a simple, modern, nice looking and responsive website for UMBEL.

Introduction to the UMBEL Web Services

Now let’s take the time to introduce each of the UMBEL web service endpoints. The first thing to know is that the UMBEL web service endpoints are free to use, have no usage limits and are not throttled.

Search Concept Web Service

The Search Web service is used to find UMBEL reference concepts that match a search string. This is the primary tool for finding available concepts in the reference structure. It supports the Lucene query syntax and search queries can be constrained on different fields like the preferred label, alternative labels, descriptions and URI.

Get Concept Web Service

The Get Concept Web service is used to get the full description of a UMBEL Reference Concept. By querying this Web service endpoint, you will get the preferred label, all the alternative labels (namely, the items in the semset), the sub/super classes of the concept, the broader/narrower concepts and the description of that concept.

This is the Web service endpoint that should be used to get the direct relationships with any other reference concept.

Reference concept descriptions are available as N-Triples, RDF+XML, structJSON or Clojure code.

Get Super Type Web Service

The Get Super Type Web service is used to get the full description of a UMBEL Super Type. By querying this Web service endpoint, you will get the preferred label, all of the alternative labels, the description, and the disjoint super types of a target super type.

Get Narrower Concept Web Service

The Get Narrower Concept Web service is used to get the list of all the narrower concepts of a given reference concept. This processing is done by inference, which means that if A -> B -> C are narrower concepts, then the narrower concepts of A are both B and C, which is what will be returned by the endpoint.

Get Broader Concept Web Service

The Get Broader Concept Web service is used to get the list of all the broader concepts for a given reference concept. This processing is done by inference, which means that if A -> B -> C are broader concepts, then the broader concepts of C are both A and B, which is thus what will be returned by the endpoint.

The broader reference concepts do not include the super type as their top concept (use the Get Super-Class-Of web service endpoint for that).
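The inference described for the narrower and broader endpoints amounts to following the same links transitively, in one direction or the other. The sketch below reuses the A -> B -> C example from the text; it illustrates the principle only and is not the endpoints' actual code.

```python
# Direct "narrower" links for the toy hierarchy A -> B -> C used in the text.
NARROWER = {"A": ["B"], "B": ["C"]}

# The "broader" links are the same edges read in the opposite direction.
BROADER = {}
for parent, children in NARROWER.items():
    for child in children:
        BROADER.setdefault(child, []).append(parent)


def closure(concept, links):
    """Follow `links` transitively from `concept`, nearest concepts first."""
    found, frontier = [], [concept]
    while frontier:
        next_frontier = []
        for node in frontier:
            for linked in links.get(node, ()):
                if linked not in found:
                    found.append(linked)
                    next_frontier.append(linked)
        frontier = next_frontier
    return found


print(closure("A", NARROWER))  # ['B', 'C']
print(closure("C", BROADER))   # ['B', 'A']
```

The same function applied to sub class and super class links would give the inferred sub classes and super classes.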

Get Sub Classes Web Service

The Get Sub Classes Web service is used to get the list of all the sub classes of a given reference concept. This processing is done by inference, which means that if A -> B -> C are sub classes, then the sub classes of A are both B and C, which is what will be returned by the endpoint.

Get Super Classes Web Service

The Get Super Classes Web service is used to get the list of all the super classes of a given reference concept. This processing is done by inference, which means that if A -> B -> C are super classes, then the super classes of C are both A and B, which is what will be returned by the endpoint.

The super classes do include the super types as their top concept, unlike the results of the Get Broader Concept web service endpoint.

Degree Web Service

The Degree Web service is used to get the degree (measure of distance) between two UMBEL reference concepts by following the path of a transitive property.
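As an illustration, and assuming that the degree is the number of links on the shortest directed path between the two concepts, it can be computed with a breadth-first search that tracks distances only. The graph and names below are placeholders.

```python
from collections import deque

# Toy links for one transitive property; concept names are placeholders.
LINKS = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}


def degree(links, source, target):
    """Number of hops on the shortest directed path, or None if unreachable."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for linked in links.get(node, ()):
            if linked not in distances:
                distances[linked] = distances[node] + 1
                queue.append(linked)
    return None


print(degree(LINKS, "A", "D"))
```

Unlike a path lookup, nothing but a counter is kept per concept, which is why a distance-only query is the cheaper of the two.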

Conclusion

This new website, along with these new web service endpoints, is still using the UMBEL reference structure version 1.05. However, in the coming month or two, a new version of the reference structure should be released. The structure itself won’t change much except for the introduction of a few new reference concepts. But new mechanisms (mostly related to attributes) will be introduced. It will also come with a brand new mapping to external data schemas and data sources such as Schema.org, Wikipedia, etc.

On my side, I will start writing more about UMBEL. New web service endpoints will be released over time. The API available to use, manage and leverage the structure will constantly expand.

I will also write about how the UMBEL reference structure can be used and how it can be leveraged to integrate data sources, to expand search queries, etc.

