Archive for the 'Clojure' Category

Open Semantic Framework 3.1 Released

Structured Dynamics is happy to announce the immediate availability of the Open Semantic Framework version 3.1. This new version includes a set of fixes made to different components of the framework over the last few months. The biggest change is the deployment of OSF using Virtuoso Open Source version 7.1.0.

We also created a new API for Clojure developers called clj-osf. Finally, we created a new Open Semantic Framework web portal that better describes the project and is hopefully easier to use and more modern.

Quick Introduction to the Open Semantic Framework

What is the Open Semantic Framework?

The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components. OSF is designed as an integrated content platform accessible via the Web, which provides needed knowledge management capabilities to enterprises. OSF is made available under the Apache 2 license.

OSF can integrate and manage all types of content – unstructured documents, semi-structured files, spreadsheets, and structured databases – using a variety of best-of-breed data indexing and management engines. All external content is converted to the canonical RDF data model, enabling common tools and methods for tagging and managing all content. Ontologies provide the schema and common vocabularies for integrating across diverse datasets. These capabilities can be layered over existing information assets for unprecedented levels of integration and connectivity. All information within OSF may be powerfully searched and faceted, with results datasets available for export in a variety of formats and as linked data.

A new Open Semantic Framework website

The OSF 3.1 release also triggered the creation of a new website for the project. We wanted something leaner and more modern, and I think that is what we delivered. We also reworked the content, wrote about a series of use cases, and better aggregated and presented the information for each web service endpoint.

A new OSF sandbox

We also created an OSF sandbox where people can test each web service endpoint and see how each piece of functionality works. All of the web services are open to users. The sandbox is not meant to be stable, considering that everybody has access to all endpoints. However, the sandbox server will be recreated on a periodic basis. If the sandbox is totally broken and users experience issues, they can always request a re-creation of the server directly on the OSF mailing list.

Each of the web service pages on the new OSF portal has a Sandbox section where you see some code examples of how to use the endpoint and how to send requests to the sandbox. Here are the instructions to use the sandbox server.

A new OSF API for Clojure: clj-osf

The OSF release 3.1 also includes a new API for Clojure developers: clj-osf.

clj-osf is a Domain Specific Language (DSL) that should lower the threshold to use the Open Semantic Framework.

To use the DSL, you only have to configure your application to use a specific OSF endpoint. Here is an example of how to do this for the Sandbox server:

;; Define the OSF Sandbox credentials (or your own):
(require '[clj-osf.core :as osf])

(osf/defosf osf-test-endpoint {:protocol :http
                               :domain "sandbox.opensemanticframework.org"
                               :api-key "EDC33DA4D977CFDF7B90545565E07324"
                               :app-id "administer"})

(osf/defuser osf-test-user {:uri "http://sandbox.opensemanticframework.org/wsf/users/admin"})

Then you can send simple OSF web service queries. Here is an example that sends a search query to return records of type foaf:Person that also match the keyword “bob”:

(require '[clj-osf.search :as search])

(search/search
 (search/query "bob")
 (search/type-filters ["http://xmlns.com/foaf/0.1/Person"]))

A complete set of clj-osf examples is available on the OSF wiki.

Finally the complete clj-osf DSL documentation is available here.

A community effort

This new release of the OSF Installer is another effort of the growing Open Semantic Framework community. The upgrade of the installer to deploy the OSF stack using Virtuoso Open Source version 7.1.0 has been created by William (Bill) Anderson.

Deploying a new OSF 3.1 Server

Using the OSF Installer

OSF 3.1 can easily be deployed on an Ubuntu 14.04 LTS server using the osf-installer application, by executing the following commands in your terminal:

mkdir -p /usr/share/osf-installer/

cd /usr/share/osf-installer/

wget https://raw.github.com/structureddynamics/Open-Semantic-Framework-Installer/3.1/install.sh

chmod 755 install.sh

./install.sh

./osf-installer --install-osf -v

Using an Amazon AMI

If you are an Amazon AWS user, you also have access to a free AMI that you can use to create your own OSF instance. The full documentation for using the OSF AMI is available here.

Upgrading Existing Installations

Existing OSF installations can be upgraded using the OSF Installer. However, note that the upgrade won’t deploy Virtuoso Open Source 7.1.0 for you. All the code will be upgraded, but Virtuoso will remain at the version you were last using on your instance. All the code of OSF 3.1 is compatible with previous versions of Virtuoso, but you won’t benefit from the latest improvements to Virtuoso (in terms of performance) or from its latest SPARQL 1.1 implementation. If you want to upgrade Virtuoso to version 7.1.0 on an existing OSF instance, you will have to do this by hand.

To upgrade the OSF codebase, the first thing is to upgrade the installer itself:

# Upgrade the OSF Installer
/usr/share/osf-installer/upgrade.sh

Then you can upgrade the components using the following commands:

# Upgrade the OSF Web Services
/usr/share/osf-installer/osf --upgrade-osf-web-services="3.1.0"

# Upgrade the OSF WS PHP API
/usr/share/osf-installer/osf --upgrade-osf-ws-php-api="3.1.0"

# Upgrade the OSF Tests Suites
/usr/share/osf-installer/osf --upgrade-osf-tests-suites="3.1.0"

# Upgrade the Datasets Management Tool
/usr/share/osf-installer/osf --upgrade-osf-datasets-management-tool="3.1.0"

# Upgrade the Data Validator Tool
/usr/share/osf-installer/osf --upgrade-osf-data-validator-tool="3.1.0"

clj-turtle: A Clojure Domain Specific Language (DSL) for RDF/Turtle

Some of my recent work led me to use Clojure heavily to develop all kinds of new capabilities for Structured Dynamics. Those who know us know that everything we do is related to RDF and OWL ontologies. All this work with Clojure is no exception.

Recently, while developing a Domain Specific Language (DSL) for using the Open Semantic Framework (OSF) web service endpoints, I did some research to try to find a simple Clojure DSL that I could use to generate RDF data (in any well-known serialization). After some time, I figured out that no such thing existed in the Clojure ecosystem, so I chose to create my own simple DSL for creating RDF data.

The primary goal of this new project was to have a DSL that users could use to create RDF data that could be fed to the OSF web service endpoints such as the CRUD: Create or CRUD: Update endpoints.

What I chose to do is to create a new project called clj-turtle that generates RDF/Turtle code from Clojure code. The Turtle code that is produced by this DSL is currently quite verbose. This means that all the URIs are expanded, that triple quotes are used for literals, and that every triple is fully described.

This new DSL is meant to be a really simple and easy way to create RDF data. It could even be used by non-Clojure coders to create RDF/Turtle compatible data. New services could easily be created that take the DSL code as input and output the RDF/Turtle code. That way, no Clojure environment would be required to use the DSL for generating RDF data.

Installation

For people used to Clojure and Leiningen, you can easily install clj-turtle using Leiningen. The only thing you have to do is to add [clj-turtle "0.1.3"] as a dependency to your project.clj.

Then make sure that you downloaded this dependency by running the lein deps command.

API

The whole DSL is composed of just six operators:

  • rdf/turtle
    • Used to generate RDF/Turtle serialized data from a set of triples defined by clj-turtle.
  • defns
    • Used to create/instantiate a new namespace that can be used to create the clj-turtle triples
  • rei
    • Used to reify a clj-turtle triple
  • iri
    • Used to refer to a URI where you provide the full URI as an input string
  • literal
    • Used to refer to a literal value
  • a
    • Used to specify the rdf:type of an entity being described

Usage

Working with namespaces

The core of this DSL is the defns operator. This operator lets you create the namespaces you want to use to describe your data. Every time you use a namespace, it will generate a URI reference in the triple(s) that will be serialized in Turtle.

However, it is not necessary to create a new namespace every time you want to serialize Turtle data. In some cases, you may not even know what the namespace is since you already have the full URI in hand. This is why there is the iri function, which lets you serialize a full URI without having to use a namespace.

Namespaces are just shorthand versions of full URIs that are meant to make your code cleaner and easier to read and maintain.
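To make the idea concrete, here is a minimal sketch of what a namespace boils down to. This is not clj-turtle's actual implementation: make-namespace and local-name->str are hypothetical helpers used only for illustration, and the sketch assumes a namespace is simply a function that appends a local name to a base URI.

```clojure
;; Hypothetical helper (not part of clj-turtle): turn a keyword, string or
;; number into the local part of a URI.
(defn local-name->str
  [x]
  (if (keyword? x) (name x) (str x)))

;; Hypothetical stand-in for what a namespace does: return a function that
;; expands a local name into a full URI reference in Turtle syntax.
(defn make-namespace
  [base-uri]
  (fn [local-name]
    (str "<" base-uri (local-name->str local-name) ">")))

(def foo (make-namespace "http://purl.org/ontology/foo#"))

(foo :bar)  ;; => "<http://purl.org/ontology/foo#bar>"
(foo "bar") ;; => "<http://purl.org/ontology/foo#bar>"
```

Keywords and strings lead to the same URI reference, which is why either form can be used interchangeably with a namespace.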

Syntactic rules

Here are the general syntactic rules that you have to follow when creating triples in a (rdf) or (turtle) statement:

  1. Wrap all the code using the (rdf) or the (turtle) operator
  2. Every triple needs to be explicit. This means that every time you want to create a new triple, you have to mention the subject, the predicate and the object of the triple
  3. A fourth “reification” element can be added using the rei operator
  4. The first parameter of any function can be any kind of value: keywords, strings, integers, doubles, etc. They will be properly serialized as strings in Turtle.
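As a rough illustration of rule 2, the sketch below groups a flat sequence of already-serialized terms into subject/predicate/object triples and emits one Turtle statement per group. It is an assumption about the general idea, not clj-turtle's code: triples->turtle is a hypothetical name, the terms are plain strings, and the optional fourth rei element is deliberately ignored.

```clojure
;; Hypothetical sketch: every three consecutive terms form one explicit triple.
(defn triples->turtle
  [& terms]
  (->> (partition 3 terms)
       (map (fn [[s p o]] (str s " " p " " o " .\n")))
       (apply str)))

(print
 (triples->turtle
  "<http://purl.org/ontology/foo#bar>" "<http://purl.org/ontology/iron#prefLabel>" "\"\"\"Bar\"\"\""
  "<http://purl.org/ontology/foo#bar>" "<http://purl.org/ontology/iron#altLabel>" "\"\"\"Foo\"\"\""))
```

Because the subject is repeated in each group, the output contains two complete statements, one per line.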

Strings and keywords

As specified in the syntactic rules, you can at any time use a string, an integer, a double, a keyword or any other kind of value as input to the defined namespaces or the other API calls. Simply use whichever form is most convenient or cleanest for your taste.

More about reification

Note: RDF reification is quite a different concept from Clojure’s reify macro, so read this section carefully to understand the meaning of the concept in this context.

In RDF, reifying a triple means that we want to add additional information about a specific triple. Let’s take this example:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar"))

In this example, we have a triple that specifies the preferred label of the :bar entity. Now, let’s say that we want to add “meta” information about that specific triple, such as the date when this triple was added to the system.

That additional information is considered the “fourth” element of a triple. It is defined like this:

(rdf
    (foo :bar) (iron :prefLabel) (literal "Bar") (rei
                                                   (foo :date) (literal "2014-10-25" :type :xsd:dateTime)))

What we do here is to specify additional information regarding the triple itself. In this case, it is the date when the triple got added into our system.

So, reification statements are really “meta information” about triples. Also note that reification statements don’t change the semantics of the description of the entities.
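In plain RDF terms, a reified triple is described by an rdf:Statement resource that points back to the subject, predicate and object of the original triple and then carries the extra properties. The sketch below shows that expansion on already-serialized terms. It is only an illustration under stated assumptions: reify-triple and statement are hypothetical helpers, and the statement identifier is passed in by the caller, since how clj-turtle derives its own rei identifiers is not covered here.

```clojure
(def rdf-ns "http://www.w3.org/1999/02/22-rdf-syntax-ns#")

;; Hypothetical helper: serialize one triple as a Turtle statement.
(defn statement
  [s p o]
  (str s " " p " " o " .\n"))

;; Hypothetical sketch of the expansion: the original triple, the four
;; standard rdf:Statement triples, then one triple per meta [predicate object] pair.
(defn reify-triple
  [rei-id s p o meta-pairs]
  (let [rei      (str "<rei:" rei-id ">")
        rdf-term (fn [local] (str "<" rdf-ns local ">"))]
    (apply str
           (statement s p o)
           (statement rei (rdf-term "type") (rdf-term "Statement"))
           (statement rei (rdf-term "subject") s)
           (statement rei (rdf-term "predicate") p)
           (statement rei (rdf-term "object") o)
           (map (fn [[meta-p meta-o]] (statement rei meta-p meta-o)) meta-pairs))))

(print
 (reify-triple "example-id"
               "<http://purl.org/ontology/foo#bar>"
               "<http://purl.org/ontology/iron#prefLabel>"
               "\"\"\"Bar\"\"\""
               [["<http://purl.org/ontology/foo#date>" "\"\"\"2014-10-25\"\"\""]]))
```

The original triple is left untouched; everything said about it hangs off the separate statement resource, which is why reification does not alter the description of the entity itself.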

Examples

Here is a list of examples of how this DSL can be used to generate RDF/Turtle data:

Create a new namespace

The first thing we have to do is define the namespaces we will want to use in our code.

(defns iron "http://purl.org/ontology/iron#")
(defns foo "http://purl.org/ontology/foo#")
(defns owl "http://www.w3.org/2002/07/owl#")
(defns rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
(defns xsd "http://www.w3.org/2001/XMLSchema#")

Create a simple triple

The simplest example is to create a simple triple. What this triple does is to define the preferred label of a :bar entity:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar"))

Output:

<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Bar""" .

Create a series of triples

This example shows how we can describe more than one attribute for our bar entity:

(rdf
  (foo :bar) (a) (owl :Thing)
  (foo :bar) (iron :prefLabel) (literal "Bar")
  (foo :bar) (iron :altLabel) (literal "Foo"))

Output:

<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Bar""" .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#altLabel> """Foo""" .

Note: we prefer having one triple per line. It is possible to put all the triples on one line, but this produces less readable code.

Just use keywords

It is possible to use keywords everywhere, even in (literal):

(rdf
  (foo :bar) (a) (owl :Thing)
  (foo :bar) (iron :prefLabel) (literal :Bar)
  (foo :bar) (iron :altLabel) (literal :Foo))

Output:

<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Bar""" .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#altLabel> """Foo""" .

Just use strings

It is possible to use strings everywhere, even in namespaces:

(rdf
  (foo "bar") (a) (owl "Thing")
  (foo "bar") (iron :prefLabel) (literal "Bar")
  (foo "bar") (iron :altLabel) (literal "Foo"))

Output:

<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Bar""" .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#altLabel> """Foo""" .

Specifying a datatype in a literal

It is possible to specify a datatype for every (literal) you are defining. You only have to use the :type option and specify an XSD datatype as its value:

(rdf
  (foo "bar") (foo :probability) (literal 0.03 :type :xsd:double))

Equivalent forms are:

(rdf
  (foo "bar") (foo :probability) (literal 0.03 :type (xsd :double)))
(rdf
  (foo "bar") (foo :probability) (literal 0.03 :type (iri "http://www.w3.org/2001/XMLSchema#double")))

Output:

<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/foo#probability> """0.03"""^^xsd:double .

Specifying a language for a literal

It is possible to specify a language string using the :lang option. The language tag should be a compatible ISO 639-1 language tag.

(rdf
  (foo "bar") (iron :prefLabel) (literal "Robert" :lang :fr))

Output:

<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Robert"""@fr .

Defining a type using the a operator

It is possible to use the (a) predicate as a shortcut to define the rdf:type of an entity:

(rdf
  (foo "bar") (a) (owl "Thing"))

Output:

<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .

This is a shortcut for:

(rdf
  (foo "bar") (rdf :type) (owl "Thing"))

Specifying a full URI using the iri operator

It is possible to define a subject, a predicate or an object using the (iri) operator if you want to define them using the full URI of the entity:

(rdf
  (iri "http://purl.org/ontology/foo#bar") (iri "http://www.w3.org/1999/02/22-rdf-syntax-ns#type") (iri "http://www.w3.org/2002/07/owl#Type"))

Output:

<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Type> .

Simple reification

It is possible to reify any triple using the (rei) operator as the fourth element of a triple:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar") (rei
                                                 (foo :date) (literal "2014-10-25" :type :xsd:dateTime)))

Output:

<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Bar""" .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://purl.org/ontology/foo#bar> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://purl.org/ontology/iron#prefLabel> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> """Bar""" .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://purl.org/ontology/foo#date> """2014-10-25"""^^xsd:dateTime .

Reify multiple properties

It is possible to add multiple reification statements:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar") (rei
                                                 (foo :date) (literal "2014-10-25" :type :xsd:dateTime)
                                                 (foo :bar) (literal 0.37)))

Output:

<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Bar""" .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://purl.org/ontology/foo#bar> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://purl.org/ontology/iron#prefLabel> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> """Bar""" .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://purl.org/ontology/foo#date> """2014-10-25"""^^xsd:dateTime .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://purl.org/ontology/foo#bar> """0.37""" .

Using clj-turtle with clj-osf

clj-turtle is meant to be used in Clojure code to simplify the creation of RDF data. Here is an example of how clj-turtle can be used to generate RDF data to feed to the OSF CRUD: Create web service endpoint via the clj-osf DSL:

;; `link` (the URI of the record) and the `bibo` namespace are assumed
;; to be defined elsewhere in your code
(require '[clj-osf.crud :as crud])

(crud/create
  (crud/dataset "http://test.com/datasets/foo")
  (crud/document
    (rdf
      (iri link) (a) (bibo :Article)
      (iri link) (iron :prefLabel) (literal "Some article")))
  (crud/is-rdf-n3)
  (crud/full-indexation-mode))

Using the turtle alias operator

Depending on your taste, it is possible to use the (turtle) operator instead of the (rdf) one to generate the RDF/Turtle code:

(turtle
  (foo "bar") (iron :prefLabel) (literal "Robert" :lang :fr))

Output:

<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Robert"""@fr .

Merging clj-turtle

Depending on the work you have to do in your Clojure application, you may have to generate the Turtle data using a more complex flow of operations. This is not an issue for clj-turtle, since the only thing you have to do is concatenate the triples you are creating. You can do so with a simple call to the str function, or you can build more complex processing using loops, maps, etc. that ends with an (apply str) to generate the final Turtle string.

(str
  (rdf
    (foo "bar") (a) (owl "Thing"))
  (rdf
    (foo "bar") (iron :prefLabel) (literal "Bar")
    (foo "bar") (iron :altLabel) (literal "Foo")))

Output:

<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> """Bar""" .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#altLabel> """Foo""" .

Conclusion

As you can see, this is a really simple DSL for generating RDF/Turtle code, and I find it quite effective precisely because of that simplicity. Even with a minimal number of operators, it is flexible enough to describe any kind of RDF data. Also, thanks to Clojure, it is possible to write all kinds of code that generates DSL code, which is then executed to generate the RDF data. For example, we can easily create some code that iterates over a collection to produce one triple per item of the collection like this:

(->> ["a" "b" "c"]
     (map
      (fn [label]
        (rdf
         (foo :bar) (iron :prefLabel) (literal label))))
     (apply str))

That code would generate 3 triples (or more if the input collection is bigger). Starting with this simple example, we can see how much more complex processes can leverage clj-turtle for generating RDF data.

A future enhancement to this DSL would be to add a syntactic rule that lets the user specify the subject of a triple only the first time it is introduced, mimicking the semicolon of the Turtle syntax.
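To give an idea of what such a rule could expand to, here is a small sketch that takes one subject followed by predicate/object pairs and emits a full triple for each pair. It is purely illustrative: with-subject is a hypothetical name, it works on already-serialized terms, and it is not an existing or planned clj-turtle operator.

```clojure
;; Hypothetical sketch: state the subject once, then list predicate/object pairs.
(defn with-subject
  [subject & predicate-object-pairs]
  (->> (partition 2 predicate-object-pairs)
       (map (fn [[p o]] (str subject " " p " " o " .\n")))
       (apply str)))

(print
 (with-subject "<http://purl.org/ontology/foo#bar>"
   "<http://purl.org/ontology/iron#prefLabel>" "\"\"\"Bar\"\"\""
   "<http://purl.org/ontology/iron#altLabel>" "\"\"\"Foo\"\"\""))
```

The generated Turtle stays fully expanded, so only the DSL input gets shorter, not the output.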

New UMBEL Concept Tagger Web Service

We just released a new UMBEL web service endpoint and online tool: the Concept Tagger Plain.

This plain tagger uses UMBEL reference concepts to tag an input text. The OBIE (Ontology-Based Information Extraction) method is used, driven by the UMBEL reference concept ontology. By plain we mean that the words (tokens) of the input text are matched to either the preferred labels or alternative labels of the reference concepts. The simple tagger is merely making string matches to the possible UMBEL reference concepts.

This tagger uses the plain labels of the reference concepts as matches against the input text. With this tagger, no manipulations are performed on the reference concept labels nor on the input text (like stemming, etc.). Also, there is NO disambiguation performed by the tagger if multiple concepts are tagged for a given keyword.

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the initial input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow (see conclusion).

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.
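The plain matching described above can be sketched in a few lines. This is only an illustration under several assumptions: the label index and its example.org URIs are made up (they are not UMBEL data), labels are single tokens, the input is split on whitespace, matching is exact and case-sensitive, and the shape of the result is hypothetical rather than the endpoint's actual resultset.

```clojure
(require '[clojure.string :as string])

;; Hypothetical label index: label -> concepts having that label.
;; A label shared by several concepts is returned as-is, with no disambiguation.
(def label-index
  {"bank"  ["http://example.org/rc/Bank" "http://example.org/rc/RiverBank"]
   "river" ["http://example.org/rc/River"]})

;; For every token that exactly matches a label, record the matching
;; concepts and the positions of the token in the input text.
(defn tag-plain
  [index text]
  (->> (string/split text #"\s+")
       (map-indexed vector)
       (filter (fn [[_ token]] (contains? index token)))
       (reduce (fn [acc [position token]]
                 (-> acc
                     (assoc-in [token :concepts] (get index token))
                     (update-in [token :positions] (fnil conj []) position)))
               {})))

(def tags (tag-plain label-index "river bank near the bank"))

(get-in tags ["bank" :positions])       ;; => [1 4]
(count (get-in tags ["bank" :concepts])) ;; => 2
```

The number of matches for a token is simply the length of its :positions vector, and the two concepts returned for "bank" show why a later disambiguation step is needed.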

The Online Tool

We also provide an online tagging tool that people can use to experience interacting with the web service.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

The tagged concepts can be clicked to have access to their full description.

EDN and ClojureScript

An interesting thing about this user interface is that it has been implemented in ClojureScript, and the data exchanged between the user interface and the tagger web service endpoint is serialized in EDN. This means that when the UI receives the resultset from the endpoint, it only has to read the EDN using the ClojureScript reader (cljs.reader/read-string) to treat the output of the web service endpoint as data native to the application.

No parsing of non-native data format is necessary, which makes the code of the UI simpler and makes the data manipulation much more natural to the developer since no external API is necessary.
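Here is a minimal sketch of that round trip. It runs on the JVM with clojure.edn/read-string, the counterpart of cljs.reader/read-string used in the UI. The response body and its keys are hypothetical and only stand in for whatever the endpoint actually returns.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical EDN payload, as it could arrive in the body of an HTTP response.
(def response-body
  "{:matches [{:token \"bank\" :positions [1 4]}]}")

;; Reading the string yields ordinary maps, vectors, keywords and strings.
(def resultset (edn/read-string response-body))

(get-in resultset [:matches 0 :token])     ;; => "bank"
(get-in resultset [:matches 0 :positions]) ;; => [1 4]
```

Once read, the resultset is manipulated with the same core functions as any other data structure in the application.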

What is Next?

This is the first of a series of tagging web service endpoints that will be released. Our intent is to release UMBEL tagging services that have different levels of sophistication. Depending on how they want to use UMBEL, users will have access to different tagging services that they can use and supplement with their own techniques to end up with their desired results.

The next taggers (not in order) that are planned to be released are:

  • Plain tagger – no weighting or classification except by occurrence count
    • Entity plain tagger (using the Wikidata dictionary)
    • Scones plain tagger – concept + entity
  • Noun tagger – with POS, only tags the nouns; generally, the preferred, simplest baseline tagger
    • Concept noun tagger
    • Entity noun tagger
    • Scones noun tagger
  • N-gram tagger – a phrase-based tagger
    • Concept n-gram tagger
    • Entity n-gram tagger
    • Scones n-gram tagger
  • Complete tagger – combinations of above with different machine learning techniques
    • Concept complete tagger
    • Entity complete tagger
    • Scones complete tagger

So, we welcome you to try out the system online and we welcome your comments and suggestions.



