Tag Archive for 'webservice'

Open Semantic Framework 3.2 Released

Structured Dynamics is happy to announce the immediate availability of the Open Semantic Framework version 3.2. This is the second important OSF release in a month and a half. triple_120

This new major release of OSF changes the way the web services communicate with the triple store. Originally, OSF web services were using a ODBC channel to communicate with the triple store (Virtuoso). This new release uses the SPARQL HTTP endpoints of the triple store to send queries to it. This is the only changes that occurs in this new version, but as you will see bellow, this is a major one.

Why switching to HTTP?

The problem with using ODBC as the primary communication channel between the OSF web services and the triple store is that it was adding a lot of complexity into OSF. Because the UnixODBC drivers that are shipped with Ubuntu had issues with Virtuoso, we had to use the iODBC drivers to make sure that everything was working properly. This situation forced us to recompile PHP5 such that it uses iODBC instead of UnixODBC as the ODBC drivers for PHP5.

This was greatly complexifying the deployment of OSF since we couldn’t use the default PHP5 packages that shipped with Ubuntu, but had to maintain our own ones that were working with iODBC.

The side effect of this is that system administrators couldn’t upgrade their Ubuntu instances normally since PHP5 needed to be upgraded using particular packages created for that purpose.

Now that OSF doesn’t use ODBC to communicate with the triple store, all this complexity goes away since no special handling is now required. All of the default Ubuntu packages can be used like system administrators normally do.

With this new version, the installation and deployment of a OSF instance has been greatly simplified.

Supports New Triple Stores

Another problem with using ODBC is that it was limiting the number of different triple stores that could be used for operating OSF. In fact, people could only use Virtuoso with their OSF instance.

This new release opens new opportunities. OSF still ships with Virtuoso Open Source as its default triple store, however any triple store that has the following characteristics could replace Virtuoso in OSF:

  1. It has a SPARQL HTTP endpoint
  2. It supports SPARQL 1.1 and SPARQL Update 1.1
  3. It supports SPARQL Update queries that can be sent to the SPARQL HTTP endpoint
  4. It supports the SPARQL 1.1 Query Results JSON Format
  5. It supports the SPARQL 1.1 Graph Store HTTP Protocol via a HTTP endpoint (optional, only required by the Datasets Management Tool)

Deploying a new OSF 3.2 Server

Using the OSF Installer

OSF 3.2 can easily be deployed on a Ubuntu 14.04 LTS server using the osf-installer application. It can easily be done by executing the following commands in your terminal:

mkdir -p /usr/share/osf-installer/

cd /usr/share/osf-installer/

wget https://raw.github.com/structureddynamics/Open-Semantic-Framework-Installer/3.2/install.sh

chmod 755 install.sh

./install.sh

./osf-installer --install-osf -v

Using a Amazon AMI

If you are an Amazon AWS user, you also have access to a free AMI that you can use to create your own OSF instance. The full documentation for using the OSF AMI is available here.

Upgrading Existing Installations

It is not possible to automatically upgrade previous versions of OSF to OSF 3.2. It is possible to upgrade a older instance of OSF to OSF version 3.2, but only manually. If you have this requirement, just let me know and I will write about the upgrade steps that are required to upgrade these instances to OSF version 3.2.

Security

Now that the triple store’s SPARQL HTTP endpoint requires it to be enabled with SPARQL Update rights, it is more important than ever to make sure that the SPARQL HTTP endpoint of the triple store is only available to the OSF web services.

This can be done by properly configuring your firewall or proxy such that only local traffic, or traffic coming from the OSF web service processes, can reach the endpoint.

The SPARQL endpoint that should be exposed to the outside World is OSF’s SPARQL endpoint, which adds an authentication layer above the triple store’s endpoint, and restricts potentially armful SPARQL queries.

Conclusion

This new version of the Open Semantic Framework greatly simplifies its deployment and its maintenance. It also enables other triple stores that exist on the market to be used for OSF instead of Virtuoso Open Source.

New UMBEL Concept Noun Tagger Web Service & Other Improvements

Last week, we released the UMBEL Concept Plain Tagger web service endpoint. Today we are releasing the UMBEL Concept Noun Tagger. umbel_ws

This noun tagger uses UMBEL reference concepts to tag an input text, and is based on the plain tagger, except as noted below.

The noun tagger uses the plain labels of the reference concepts as matches against the nouns of the input text. With this tagger, no manipulations are performed on the reference concept labels nor on the input text except if you specify the usage of the stemmer. Also, there is NO disambiguation performed by the tagger if multiple concepts are tagged for a given keyword.

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow.

Stemming Option

This web service endpoint does have a stemming option. If the option is specified, then the input text will be stemmed and the matches will be made against an index where all the preferred and alternative labels have been stemmed as well. Then once the matches occurs, the tagger will recompose the text such that unstemmed versions of the input text and the tagged reference concepts are presented to the user.

Depending on the use case. users may prefer turning on or off the stemming option on this web service endpoint.

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the noun tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.

The Online Tool

We also provide an online tagging tool that people can use to experience interacting with the web service.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

The tagged concepts can be clicked to have access to their full description.

umbel_tagger_noun

Other UMBEL Website Improvements

We also did some more improvements to the UMBEL website.

Search Autocompletion Mode

First, we created a new autocomplete option on the UMBEL Search web service endpoint. Often people know the concept they want to look at, but they don’t want to go to a search results page to select that concept. What they want is to get concept suggestions instantly based on the letters they are typing in a search box.

Such a feature requires a special kind of search which we call an “autocompletion search”. We added that special mode to the existing UMBEL search web service endpoint. Such a search query takes about 30ms to process. Most of that time is due to the latency of the network since the actual search function takes about 0.5 millisecond the complete.

To use that new mode, you only have to append /autocomplete to the base search web service endpoint URL.

Search Autocompletion Widget

Now that we have this new autocomplete mode for the Search endpoint, we also leveraged it to add autocompletion behavior on the top navigation search box on the UMBEL website.

Now, when you start typing characters in the top search box, you will get a list of possible reference concept matches based on the preferred labels of the concepts. If you select one of them, you will be redirected to their description page.

concept_autocomplete

Tagged Concepts Within Concept Descriptions

Finally, we improved the quality of the concept description reading experience by linking concepts that were mentioned in the descriptions to their respective concept pages. You will now see hyperlinks in the concept descriptions that link to other concepts.

linked_concepts

New UMBEL Concept Tagger Web Service

We just released a new UMBEL web service endpoint and online tool: the Concept Tagger Plain. umbel_ws

This plain tagger uses UMBEL reference concepts to tag an input text. The OBIE (Ontology-Based Information Extraction) method is used, driven by the UMBEL reference concept ontology. By plain we mean that the words (tokens) of the input text are matched to either the preferred labels or alternative labels of the reference concepts. The simple tagger is merely making string matches to the possible UMBEL reference concepts.

This tagger uses the plain labels of the reference concepts as matches against the input text. With this tagger, no manipulations are performed on the reference concept labels nor on the input text (like stemming, etc.). Also, there is NO disambiguation performed by the tagger if multiple concepts are tagged for a given keyword.

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the initial input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow (see conclusion).

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.

The Online Tool

We also provide an online tagging tool that people can use to experience interacting with the web service.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

The tagged concepts can be clicked to have access to their full description.

reference_concept_tagger_uiEDN and ClojureScript

An interesting thing about this user interface is that it has been implemented in ClojureScript and the data serialization exchanged between this user interface and the tagger web service endpoint is in EDN. What is interesting about that is that when the UI receives the resultset from the endpoint, it only has to evaluate the EDN code using the ClojureScript reader (cljs.reader/read-string) to consider the output of the web service endpoint as native data to the application.

No parsing of non-native data format is necessary, which makes the code of the UI simpler and makes the data manipulation much more natural to the developer since no external API is necessary.

What is Next?

This is the first of a series of tagging web service endpoints that will be released. Our intent is to release UMBEL tagging services that have different level of sophistication. Depending on how someone wants to use UMBEL, he will have access to different tagging services that he could use and supplement with their own techniques to end up with their desired results.

The next taggers (not in order) that are planned to be released are:

  • Plaintagger – no weighting or classification except by occurrence count
    • Entity plain tagger (using the Wikidata dictionary)
    • Scones plain tagger – concept + entity
  • Nountagger – with POS, only tags the nouns; generally, the preferred, simplest baselinetagger
    • Concept noun tagger
    • Entity noun tagger
    • Scones noun tagger
  • N-gramtagger – a phrase-basedtagger
    • Concept n-gram tagger
    • Entity n-gram tagger
    • Scones n-gram tagger
  • Completetagger – combinations of above with different machine learning techniques
    • Concept complete tagger
    • Entity complete tagger
    • Scones complete tagger.

So, we welcome you to try out the system online and we welcome your comments and suggestions.




This blog is a regularly updated collection of my thoughts, tips, tricks and ideas about data mining, data integration, data publishing, the semantic Web, my researches and other related software development.


RSS Twitter LinkedIN


Follow

Get every new post on this blog delivered to your Inbox.

Join 87 other followers:

Or subscribe to the RSS feed by clicking on the counter:




RSS Twitter LinkedIN