Archive for the 'Open Semantic Framework' Category

Open Semantic Framework 3.0.1 Released

I am happy to announce the immediate availability of the Open Semantic Framework version 3.0.1. This new version includes a set of fixes made to different components of the framework over the last few months. The biggest addition is the new OSF Installer, which deploys OSF on Ubuntu 14.04 LTS servers.

A Community Effort

This new release of the OSF Installer is an effort of the growing Open Semantic Framework community. The upgrade of the installer to deploy the OSF stack on the latest Ubuntu Long Term Support (LTS) version, 14.04, was created by William (Bill) Anderson.

Samar Acharya also suggested decoupling the PHP5 Debian packages from the core OSF Installer repository to make it easier to support future versions of Ubuntu or other Linux distributions. This led to the creation of the new OSF-Installer-Ext repository, which is only used to host distribution-specific files such as the PHP5 Debian packages.

Upgrading Existing Installations

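Existing OSF installations can be upgraded using the OSF Installer. First, upgrade the installer itself. The paths below begin with ./, so they resolve to /usr/share/osf-installer only when the commands are run from the root directory (/):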

# Upgrade the OSF Installer
./usr/share/osf-installer/upgrade.sh

Then you can upgrade the components using the following commands:

# Upgrade the OSF Web Services
./usr/share/osf-installer/osf --upgrade-osf-web-services="3.0.1"

# Upgrade the OSF WS PHP API
./usr/share/osf-installer/osf --upgrade-osf-ws-php-api="3.0.1"

# Upgrade the OSF Tests Suites
./usr/share/osf-installer/osf --upgrade-osf-tests-suites="3.0.1"

# Upgrade the Datasets Management Tool
./usr/share/osf-installer/osf --upgrade-osf-datasets-management-tool="3.0.1"

# Upgrade the Data Validator Tool
./usr/share/osf-installer/osf --upgrade-osf-data-validator-tool="3.0.1"

New UMBEL Concept Tagger Web Service

We just released a new UMBEL web service endpoint and online tool: the Concept Tagger Plain.

This plain tagger uses UMBEL reference concepts to tag an input text. The OBIE (Ontology-Based Information Extraction) method is used, driven by the UMBEL reference concept ontology. By plain we mean that the words (tokens) of the input text are matched to either the preferred labels or alternative labels of the reference concepts. The simple tagger is merely making string matches to the possible UMBEL reference concepts.

This tagger uses the plain labels of the reference concepts as matches against the input text. With this tagger, no manipulations are performed on the reference concept labels nor on the input text (like stemming, etc.). Also, there is NO disambiguation performed by the tagger if multiple concepts are tagged for a given keyword.
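To make the matching behavior concrete, here is a minimal sketch of a plain tagger in Python. It is not the UMBEL implementation: the concept identifiers, the labels and the function names are all hypothetical, and it only handles single-word labels. It shows the two properties described above: labels are compared as exact strings (no stemming, no case folding), and every concept that carries a matching label is returned, with no disambiguation.

```python
import re
from collections import defaultdict

# Hypothetical miniature label index: concept id -> labels.
# Real concepts and labels would come from the UMBEL reference structure.
CONCEPTS = {
    "Bird": {"pref": "bird", "alt": ["birds", "avian"]},
    "Crane_Bird": {"pref": "crane", "alt": []},
    "Crane_Machine": {"pref": "crane", "alt": ["cranes"]},
}


def build_label_index(concepts):
    """Map each exact label to the (concept, label type) pairs that carry it."""
    index = defaultdict(list)
    for concept, labels in concepts.items():
        index[labels["pref"]].append((concept, "pref"))
        for alt in labels["alt"]:
            index[alt].append((concept, "alt"))
    return index


def plain_tag(text, concepts):
    """Tag tokens by exact string match: no stemming, no disambiguation."""
    index = build_label_index(concepts)
    tags = {}
    for position, match in enumerate(re.finditer(r"\w+", text)):
        token = match.group(0)
        hits = index.get(token)
        if not hits:
            continue
        # Keep every matching concept and record where the token occurred.
        entry = tags.setdefault(token, {"concepts": hits, "positions": []})
        entry["positions"].append(position)
    return tags


result = plain_tag("the crane watched the birds near another crane", CONCEPTS)
```

In this sketch the token "crane" is tagged with both hypothetical crane concepts, because nothing chooses between them, while "Birds" with a capital letter would match nothing, because the input text is not normalized.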

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the initial input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow (see conclusion).

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.

The Online Tool

We also provide an online tagging tool that people can use to experience interacting with the web service.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

The tagged concepts can be clicked to have access to their full description.

EDN and ClojureScript

An interesting aspect of this user interface is that it is implemented in ClojureScript, and the data exchanged between the user interface and the tagger web service endpoint is serialized as EDN. When the UI receives the resultset from the endpoint, it only has to read the EDN with the ClojureScript reader (cljs.reader/read-string) to treat the output of the web service endpoint as data native to the application.

No parsing of a non-native data format is necessary, which makes the UI code simpler and makes data manipulation much more natural for the developer, since no external API is needed.
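The idea can be illustrated outside ClojureScript. The sketch below is only a loose Python analogy, not the UMBEL code: the payload is written as Python literals rather than EDN, its keys are hypothetical, and ast.literal_eval stands in for cljs.reader/read-string. Like the ClojureScript reader, it turns a serialized literal into native data structures without executing arbitrary code.

```python
import ast

# Hypothetical payload, written as Python literals to stand in for EDN.
payload = '{"crane": {"count": 2, "positions": [1, 7]}}'

# One call turns the text into native dicts, lists and integers.
resultset = ast.literal_eval(payload)

# The data can then be used with ordinary language constructs.
total = sum(entry["count"] for entry in resultset.values())
```

Once the text has been read, nothing else sits between the application and the data: the resultset is manipulated with the language's own collection functions.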

What is Next?

This is the first of a series of tagging web service endpoints that will be released. Our intent is to release UMBEL tagging services with different levels of sophistication. Depending on how people want to use UMBEL, they will have access to different tagging services that they can use and supplement with their own techniques to get their desired results.

The next taggers (not in order) that are planned to be released are:

  • Plain tagger – no weighting or classification except by occurrence count
    • Entity plain tagger (using the Wikidata dictionary)
    • Scones plain tagger – concept + entity
  • Noun tagger – with POS, only tags the nouns; generally, the preferred, simplest baseline tagger
    • Concept noun tagger
    • Entity noun tagger
    • Scones noun tagger
  • N-gram tagger – a phrase-based tagger
    • Concept n-gram tagger
    • Entity n-gram tagger
    • Scones n-gram tagger
  • Complete tagger – combinations of above with different machine learning techniques
    • Concept complete tagger
    • Entity complete tagger
    • Scones complete tagger

So, we welcome you to try out the system online and we welcome your comments and suggestions.

New UMBEL Web Services

I am happy to announce the immediate availability of a brand new UMBEL website and a new set of eight UMBEL web services.

UMBEL (Upper Mapping and Binding Exchange Layer) is a general reference structure of 28,000 concepts, which provides a scaffolding to link and interoperate other datasets and domain vocabularies. This project is now six years old.

I would recommend that you read Mike’s blog post about this new release if you want more background information about UMBEL and a better understanding of how it can help you integrate, manage, publish and reason over your data.

In this blog post, I will focus on the technical aspects of this new web site and the new set of web service endpoints.

Toward a Better Web Experience

The Web is changing fast. Techniques for developing websites are evolving constantly and quickly. People use all kinds of devices with different screen sizes to consume Web content. Websites are becoming more and more responsive thanks to clever architecture design and simpler user interfaces. This is the kind of website we wanted to create for the new UMBEL website.

Clojure Web Service Endpoints at the Core

The core of the new UMBEL website is the new set of web services. As soon as you perform a search, or look at the description of a reference concept or a super type, your browser makes a series of asynchronous queries to the UMBEL web service endpoints.

The average query time is about 60 milliseconds for any web service query. This means that a web page is fully loaded within 300 to 500 milliseconds, with most of that time spent downloading the web files (the JavaScript, CSS, HTML and image files) rather than querying the web service endpoints. Bearing in mind that the website currently runs on a small server with a single core and 1.8 GB of RAM, these are really good performance figures.

We are initially releasing 8 web service endpoints (with more to follow). They have been created to help developers quickly start using the reference structure without having to download and deploy the entire structure on their own infrastructure. The 8 web services are:

  1. Search concept
  2. Get concept
  3. Get super type
  4. Get narrower concepts
  5. Get broader concepts
  6. Get sub-classes
  7. Get super-classes
  8. Degree

All these web services calculate their results at runtime. For example, if you want to find the degree between two reference concepts, the degree is calculated at runtime. The same is true for all the web services that perform inference, such as the Get narrower concepts and Get broader concepts web service endpoints.

To get these excellent performance measures, we used Clojure as the programming language and framework to develop the new web service endpoints, and we defined the UMBEL structure as Clojure code.

Each web service endpoint is composed of simple pure functions that perform calculations on the UMBEL graph of 28,000 nodes. None of the functions is more than 30 lines of code (per endpoint), which greatly simplifies their creation, debugging, maintenance and optimization. We then use contributed libraries such as Ring and Compojure to manage the creation of the web service endpoints, and Clucy/Lucene for the search engine.

The web services can easily be scaled horizontally since everything is self-contained in a single WAR file that can be deployed on new servers in a few clicks. The new servers can then join a cluster of UMBEL web service servers.

Another advantage of using this technology stack for creating the UMBEL web service endpoints is that UMBEL is not just a reference structure or a set of web service endpoints. It is also a programming API that can be used in any Clojure or Java application. The UMBEL reference structure, along with all the functions that use it, will be available as a JAR file. That way, UMBEL becomes portable. It can be used as a library in any JVM application without having to send queries to external web services, or to create complex stacks to deploy and use the UMBEL reference structure in different applications.

Bootstrap as the HTML/CSS/JavaScript Framework

The previous UMBEL website used Drupal 6. For those who used it, it was sometimes clunky, not very responsive and heavyweight. The problem is that we did not need a full CMS to develop a simple UMBEL website that is only informational.

We wanted a responsive experience for the UMBEL user. We wanted to have the fastest experience possible and we wanted to have this experience on any kind of device: desktop computers, tablets, mobile phones, etc.

This is why we chose to develop the new UMBEL website using Twitter’s Bootstrap HTML, CSS and JavaScript framework. This is a framework that anybody can use to quickly create simple, beautiful and modern websites. It uses a grid system to create responsive user interfaces on any kind of device (screen size). That way, UMBEL users have the same kind of experience whether they are using a normal desktop screen, a tablet or their mobile phone.

This choice enabled us to create a simple, modern, nice looking and responsive website for UMBEL.

Introduction to the UMBEL Web Services

Now let’s take the time to introduce each of the UMBEL web service endpoints. The first thing to know is that the UMBEL web service endpoints are free to use, have no usage limits and are not throttled.

Search Concept Web Service

The Search Web service is used to find UMBEL reference concepts that match a search string. This is the primary tool for finding available concepts in the reference structure. It supports the Lucene query syntax and search queries can be constrained on different fields like the preferred label, alternative labels, descriptions and URI.

Get Concept Web Service

The Get Concept Web service is used to get the full description of a UMBEL Reference Concept. By querying this Web service endpoint, you will get the preferred label, all the alternative labels (namely, the items in the semset), the sub/super classes of the concept, the broader/narrower concepts and the description of that concept.

This is the Web service endpoint that should be used to get the direct relationships with any other reference concept.

Reference concept descriptions are available as N-Triples, RDF+XML, structJSON or Clojure code.

Get Super Type Web Service

The Get Super Type Web service is used to get the full description of a UMBEL Super Type. By querying this Web service endpoint, you will get the preferred label, all of the alternative labels, the description, and the disjoint super types of a target super type.

Get Narrower Concept Web Service

The Get Narrower Concept Web service is used to get the list of all the narrower concepts of a given reference concept. This processing is done by inference, which means that if A -> B -> C are narrower concepts, then the narrower concepts of A are both B and C, which is what will be returned by the endpoint.
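The inference described here amounts to following the direct narrower links transitively. The sketch below shows that idea in Python with a pure function over a small in-memory graph. It is an illustration under assumptions, not the endpoint's code: the graph, the concept names and the function name are hypothetical.

```python
# Hypothetical direct "narrower" links; the concept names are made up.
NARROWER = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
}


def narrower_concepts(concept, narrower):
    """Return every concept reachable through direct narrower links."""
    found = []
    seen = {concept}  # guards against cycles and duplicate visits
    stack = list(narrower.get(concept, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        found.append(current)
        stack.extend(narrower.get(current, []))
    return sorted(found)


narrower_of_a = narrower_concepts("A", NARROWER)
```

With the A -> B -> C chain above, the narrower concepts of A are B and C. Running the same traversal over the reversed links would give the broader concepts of a concept instead.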

Get Broader Concept Web Service

The Get Broader Concept Web service is used to get the list of all the broader concepts for a given reference concept. This processing is done by inference, which means that if A -> B -> C are broader concepts, then the broader concepts of C are both A and B, which is thus what will be returned by the endpoint.

The broader reference concepts do not include the super type as their top concept (use the Get Super-Class-Of web service endpoint for that).

Get Sub Classes Web Service

The Get Sub Classes Web service is used to get the list of all the sub classes of a given reference concept. This processing is done by inference, which means that if A -> B -> C are sub classes, then the sub classes of A are both B and C, which is what will be returned by the endpoint.

Get Super Classes Web Service

The Get Super Classes Web service is used to get the list of all the super classes of a given reference concept. This processing is done by inference, which means that if A -> B -> C are super classes, then the super classes of C are both A and B, which is what will be returned by the endpoint.

The super classes do include the super types as their top concept (use the Get Super-Class-Of web service endpoint for that).

Degree Web Service

The Degree Web service is used to get the degree (measure of distance) between two UMBEL reference concepts by following the path of a transitive property.
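One way to picture such a distance is as the number of hops between two concepts along the links of a single property. The Python sketch below uses a breadth-first search for that. The graph, the names and the choice of "shortest number of hops" as the measure are assumptions made for illustration; the actual Degree endpoint may define or compute the degree differently.

```python
from collections import deque

# Hypothetical direct links for one transitive property (e.g. super classes).
SUPER_CLASSES = {
    "C": ["B"],
    "B": ["A"],
    "A": [],
    "D": ["A"],
}


def degree(start, target, links):
    """Count the hops from start to target along the property, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        concept, hops = queue.popleft()
        if concept == target:
            return hops
        for neighbor in links.get(concept, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None  # target is not reachable along this property


c_to_a = degree("C", "A", SUPER_CLASSES)
```

In this toy graph the degree between C and A is 2, and no degree exists between C and D because D is not reachable from C along the property.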

Conclusion

This new website, along with these new web service endpoints, still uses the UMBEL reference structure version 1.05. However, in the coming month or two, a new version of the reference structure should be released. The structure itself won’t change much apart from the introduction of a few new reference concepts, but new mechanisms (mostly related to attributes) will be introduced. It will also come with a brand new mapping to external data schemas and data sources such as Schema.org, Wikipedia, etc.

On my side, I will start writing more about UMBEL. New web service endpoints will be released over time. The API available to use, manage and leverage the structure will constantly expand.

I will also write about how the UMBEL reference structure can be used, how it can be leveraged to integrate data sources, to expand search queries, etc.



