Big Structures: Where the Semantic Web Meets Artificial Intelligence

Mike Bergman just published the second part of his series of blog posts summarizing the evolution of the Semantic Web over the last decade, and how our experience from the last seven years of research in that field led to these observations.

The second part of that series is: Big Structure: At The Nexus of Knowledge Bases, the Semantic Web and Artificial Intelligence.

He continues to outline some issues with the Semantic Web but, more importantly, shows how it fits into a much broader ecosystem, namely KBAI (Knowledge-Based AI). He explains the difference between data integration and data interoperability, and how these problems could benefit from leveraging the subset of the Artificial Intelligence domain related to data interoperability.


These two blog posts lay the foundation for where Structured Dynamics is heading in the coming years: where we will focus our research projects, and how we will help our clients with their data integration and interoperability issues.

We welcome hearing from you!

New UMBEL Concept Noun Tagger Web Service & Other Improvements

Last week, we released the UMBEL Concept Plain Tagger web service endpoint. Today we are releasing the UMBEL Concept Noun Tagger.

This noun tagger uses UMBEL reference concepts to tag an input text, and is based on the plain tagger, except as noted below.

The noun tagger matches the plain labels of the reference concepts against the nouns of the input text. With this tagger, no manipulations are performed on the reference concept labels or on the input text unless you enable the stemmer. Also, the tagger performs NO disambiguation if multiple concepts are tagged for a given keyword.

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow.

Stemming Option

This web service endpoint has a stemming option. If the option is specified, the input text is stemmed and the matches are made against an index in which all the preferred and alternative labels have been stemmed as well. Once the matches occur, the tagger recomposes the text so that the unstemmed versions of the input text and of the tagged reference concepts are presented to the user.

Depending on the use case, users may prefer to turn the stemming option of this web service endpoint on or off.
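To make the stem, match and recompose flow more concrete, here is a minimal sketch in Clojure. It is not the tagger's actual implementation: the toy stemmer, the label-to-concept table and the concept identifiers are all made up for illustration, and only single-word labels are handled.

```clojure
(require '[clojure.string :as str])

;; Toy stemmer, for illustration only: lowercases the word and strips a
;; trailing "s". A real service would use a proper stemming algorithm.
(defn toy-stem [word]
  (let [w (str/lower-case word)]
    (if (and (> (count w) 3) (str/ends-with? w "s"))
      (subs w 0 (dec (count w)))
      w)))

;; Hypothetical labels and concept identifiers (not real UMBEL data).
(def labels {"Mammals" "umbel-rc:Mammal"
             "Music"   "umbel-rc:Music"})

;; Index keyed by the stemmed label. The original label is kept so that the
;; unstemmed form can be shown to the user once a match is found.
(def stemmed-index
  (into {} (map (fn [[label concept]]
                  [(toy-stem label) {:label label :concept concept}])
                labels)))

;; Stems each token of the input, looks it up in the stemmed index, and
;; returns each match together with the original (unstemmed) token.
(defn tag [text]
  (for [token (re-seq #"\w+" text)
        :let [match (get stemmed-index (toy-stem token))]
        :when match]
    (assoc match :token token)))

(tag "The mammal enjoys music")
```

Here the input token "mammal" matches the label "Mammals" only because both are reduced to the same stem; with stemming turned off, that match would be missed.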

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the noun tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.

The Online Tool

We also provide an online tagging tool that people can use to experience interacting with the web service.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

The tagged concepts can be clicked to have access to their full description.


Other UMBEL Website Improvements

We also made some further improvements to the UMBEL website.

Search Autocompletion Mode

First, we created a new autocomplete option on the UMBEL Search web service endpoint. Often people know the concept they want to look at, but they don’t want to go to a search results page to select that concept. What they want is to get concept suggestions instantly based on the letters they are typing in a search box.

Such a feature requires a special kind of search which we call an “autocompletion search”. We added that special mode to the existing UMBEL search web service endpoint. Such a search query takes about 30ms to process. Most of that time is due to network latency, since the actual search function takes about 0.5 milliseconds to complete.

To use that new mode, you only have to append /autocomplete to the base search web service endpoint URL.

Search Autocompletion Widget

Now that we have this new autocomplete mode for the Search endpoint, we also leveraged it to add autocompletion behavior on the top navigation search box on the UMBEL website.

Now, when you start typing characters in the top search box, you will get a list of possible reference concept matches based on the preferred labels of the concepts. If you select one of them, you will be redirected to its description page.


Tagged Concepts Within Concept Descriptions

Finally, we improved the quality of the concept description reading experience by linking concepts that were mentioned in the descriptions to their respective concept pages. You will now see hyperlinks in the concept descriptions that link to other concepts.


New UMBEL Web Services

I am happy to announce the immediate availability of a brand new UMBEL website and a new set of eight UMBEL web services.

UMBEL (Upper Mapping and Binding Exchange Layer) is a general reference structure of 28,000 concepts, which provides a scaffolding to link and interoperate other datasets and domain vocabularies. This project is now six years old.

I would recommend that you read Mike’s blog post about this new release if you want more background information about UMBEL and a better understanding of how it can help you integrate, manage, publish and reason over your data.

In this blog post, I will focus on the technical aspects of this new web site and the new set of web service endpoints.

Toward a Better Web Experience

The Web is changing fast. Techniques for developing websites are constantly and quickly evolving. People use all kinds of devices, with different screen sizes, to consume Web content. Websites are becoming more and more responsive thanks to clever architecture and simpler user interfaces. This is the kind of website we wanted to create for the new UMBEL website.

Clojure Web Service Endpoints at the Core

The core of the new UMBEL website is the new set of web services. As soon as you perform a search, or look at the description of a reference concept or a super type, your browser makes a series of asynchronous queries to the UMBEL web service endpoints.

The average query time is about 60 milliseconds for any web service query. This means that a web page is fully loaded within 300 to 500 milliseconds, with most of that time spent downloading the web files (the JavaScript, CSS, HTML and image files) rather than querying the web service endpoints. Bearing in mind that the website currently runs on a small server with a single core and 1.8 GB of RAM, these are really good performance figures.

We are initially releasing 8 web service endpoints (with more to follow). They have been created to help developers quickly start using the reference structure without having to download and deploy the entire structure on their own infrastructure. The 8 web services are:

  1. Search concept
  2. Get concept
  3. Get super type
  4. Get narrower concepts
  5. Get broader concepts
  6. Get sub-classes
  7. Get super-classes
  8. Degree

All these web services calculate their results at runtime. For example, if you want to find the degree between two reference concepts, then the degree is calculated at runtime. It is the same for all the web services that do inferencing, like the Get narrower concepts or Get broader concepts web service endpoints.

To get these performance figures, we used Clojure as the programming language for the new web service endpoints, and we defined the UMBEL structure itself as Clojure code.

Each web service endpoint is composed of simple pure functions that perform calculations on the UMBEL graph of 28,000 nodes. None of the functions is more than 30 lines of code (per endpoint), which greatly simplifies their creation, debugging, maintenance and optimization. We then use contributed libraries such as Ring and Compojure to manage the creation of the web service endpoints, and Clucy/Lucene for the search engine.

The web services can easily be scaled horizontally since everything is self-contained in a single WAR file that can be deployed on new servers in a few clicks. The new servers can then participate in a cluster of UMBEL web service servers.

Another advantage of using this technology stack for creating the UMBEL web service endpoints is that UMBEL is not just a reference structure or a set of web service endpoints. It is also a programming API that can be used in any Clojure or Java application. The UMBEL reference structure, along with all the functions that use it, will be available as a JAR file. That way, UMBEL becomes portable. It can be used as a library in any JVM application without requiring it to send queries to external web services, or to create complex stacks to deploy and use the UMBEL reference structure in different applications.

Bootstrap as the HTML/CSS/JavaScript Framework

The previous UMBEL website used Drupal 6. For those who used it, it was sometimes clunky, not very responsive and heavyweight. The problem is that we did not need a full CMS to build a simple, purely informational UMBEL website.

We wanted a responsive experience for the UMBEL user. We wanted the fastest experience possible, and we wanted it on any kind of device: desktop computers, tablets, mobile phones, etc.

This is why we chose to develop the new UMBEL website using Twitter’s Bootstrap HTML, CSS and JavaScript framework. This is a framework that anybody can use to quickly create simple, beautiful and modern websites. It uses a grid system to create responsive user interfaces on any kind of device (screen size). That way, UMBEL users have the same kind of experience whether they are using a normal desktop screen, a tablet or their mobile phone.

This choice enabled us to create a simple, modern, nice looking and responsive website for UMBEL.

Introduction to the UMBEL Web Services

Now let’s take the time to introduce each of the UMBEL web service endpoints. The first thing to know is that the UMBEL web service endpoints are free to use, have no usage limits and are not throttled.

Search Concept Web Service

The Search Web service is used to find UMBEL reference concepts that match a search string. This is the primary tool for finding available concepts in the reference structure. It supports the Lucene query syntax and search queries can be constrained on different fields like the preferred label, alternative labels, descriptions and URI.

Get Concept Web Service

The Get Concept Web service is used to get the full description of a UMBEL Reference Concept. By querying this Web service endpoint, you will get the preferred label, all the alternative labels (namely, the items in the semset), the sub/super classes of the concept, the broader/narrower concepts and the description of that concept.

This is the Web service endpoint that should be used to get the direct relationships with any other reference concept.

Reference concept descriptions are available as N-Triples, RDF+XML, structJSON or Clojure code.

Get Super Type Web Service

The Get Super Type Web service is used to get the full description of a UMBEL Super Type. By querying this Web service endpoint, you will get the preferred label, all of the alternative labels, the description, and the disjoint super types of a target super type.

Get Narrower Concept Web Service

The Get Narrower Concept Web service is used to get the list of all the narrower concepts of a given reference concept. This processing is done by inference, which means that if A -> B -> C are narrower concepts, then the narrower concepts of A are both B and C, which is what will be returned by the endpoint.
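The inference described above amounts to following the direct links transitively. The following is a minimal sketch of that idea, not the endpoint's actual code: the graph is a hypothetical map from each concept to its direct narrower concepts, and `all-narrower` is a made-up name.

```clojure
;; Hypothetical direct "narrower" links: A -> B -> C
(def narrower {:A #{:B}
               :B #{:C}
               :C #{}})

(defn all-narrower
  "Returns the set of every concept reachable from `concept` by following
  the direct narrower links of `graph` any number of times."
  [graph concept]
  (loop [pending (vec (get graph concept))
         seen    #{}]
    (if-let [c (peek pending)]
      (if (seen c)
        ;; Already visited: skip it (this also guards against cycles)
        (recur (pop pending) seen)
        ;; New concept: remember it and queue its own narrower concepts
        (recur (into (pop pending) (get graph c)) (conj seen c)))
      seen)))

(all-narrower narrower :A) ; => #{:B :C}
```

The same function covers the broader, sub-class and super-class cases if it is given a graph whose links point in the corresponding direction.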

Get Broader Concept Web Service

The Get Broader Concept Web service is used to get the list of all the broader concepts for a given reference concept. This processing is done by inference, which means that if A -> B -> C are broader concepts, then the broader concepts of C are both A and B, which is thus what will be returned by the endpoint.

The broader reference concepts do not include the super type as their top concept (use the Get Super-Class-Of web service endpoint for that).

Get Sub Classes Web Service

The Get Sub Classes Web service is used to get the list of all the sub classes of a given reference concept. This processing is done by inference, which means that if A -> B -> C are sub classes, then the sub classes of A are both B and C, which is what will be returned by the endpoint.

Get Super Classes Web Service

The Get Super Classes Web service is used to get the list of all the super classes of a given reference concept. This processing is done by inference, which means that if A -> B -> C are super classes, then the super classes of C are both A and B, which is what will be returned by the endpoint.

The super classes do include the super types as their top concept.

Degree Web Service

The Degree Web service is used to get the degree (measure of distance) between two UMBEL reference concepts by following the path of a transitive property.
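One way to read "degree" is as the number of hops needed to get from one concept to the other along a given property. The sketch below computes that with a breadth-first search. It assumes this hop-count reading; the graph, the concept names and the `degree` function are illustrative and are not taken from the UMBEL code base.

```clojure
;; Hypothetical direct links for one transitive property: C -> B -> A
(def broader {:C #{:B}
              :B #{:A}
              :A #{}})

(defn degree
  "Number of hops needed to go from `from` to `to` by following the links of
  `graph`, or nil when `to` cannot be reached."
  [graph from to]
  (loop [frontier #{from}
         seen     #{from}
         dist     0]
    (cond
      (contains? frontier to) dist
      (empty? frontier)       nil
      :else
      (let [next-frontier (->> frontier
                               (mapcat #(get graph %))
                               (remove seen)
                               set)]
        (recur next-frontier (into seen next-frontier) (inc dist))))))

(degree broader :C :A) ; => 2
(degree broader :A :C) ; => nil, since the links only point one way
```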

Conclusion

This new website, along with these new web service endpoints, still uses the UMBEL reference structure version 1.05. However, in the coming month or two, a new version of the reference structure should be released. The structure itself won’t change much apart from the introduction of a few new reference concepts, but new mechanisms (mostly related to attributes) will be introduced. It will also come with brand new mappings to external data schemas and data sources such as Schema.org, Wikipedia, etc.

On my side, I will start writing more about UMBEL. New web service endpoints will be released over time. The API available to use, manage and leverage the structure will constantly expand.

I will also write about how the UMBEL reference structure can be used, and how it can be leveraged to integrate data sources, to expand search queries, etc.

Revision of Serializing RDF Data as Clojure Code Specification

In my previous blog post, RDF Code: Serializing RDF Data as Clojure Code, I outlined a first version of what a RDF serialization could look like if it were expressed as Clojure code. However, after working with this proposal for two weeks, I found that a few of my initial assumptions turned out to be bad design decisions in terms of Clojure code.

This blog post will discuss these issues, and I will update the initial set of rules that I defined in my previous blog post. Going forward, I will use the current rules as the way to serialize RDF data as Clojure code.

What Was Wrong

After two weeks of using the previous set of serialization rules and developing all kinds of functions that use that code in the context of UMBEL graph traversal and analysis, I found the following issues:

  1. Keys and values should be Vars
  2. Ontologies should all be in the same namespace (and not in different namespaces)
  3. The prefix/entity separator for the RDF resources should be a colon and not a slash

These are the three serialization rules that changed after working with the previous version of the proposal. Now, let’s see what caused these changes to occur.

Keys and Values as Vars

The major change is that when we serialize RDF data as Clojure map structures, the keys, and values that are not strings, should be Vars.

There are three things that I didn’t properly evaluate when I first outlined the specification:

  1. The immutable nature of the Clojure data structures
  2. The dependency between ontologies
  3. The non-cyclical namespaces dependency rule imposed by Clojure

In the previous proposal, every RDF property was a Clojure function, and these functions were also the keys of the Clojure maps used to serialize the RDF resources. That worked well. However, there was a side effect to this decision: everything was fine until the function’s internal ID changed.

The issue here is that when we work with Clojure maps, we are working with immutable data structures. This means that even if I create a RDF record like this:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def mike {uri "http://foo.com/datasets/people/mike"
           rdf/type foaf/+person
           iron/pref-label "Mike"
           foaf/knows ["http://foo.com/datasets/people/fred"]})[/raw]
[/cc]

And if, somewhere in the compilation process, the RDF ontology file gets recompiled, then the internal ID of the rdf/type property (function) will change. That means that if I create another record like this:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def mike-2 {uri "http://foo.com/datasets/people/mike"
             rdf/type foaf/+person
             iron/pref-label "Mike"
             foaf/knows ["http://foo.com/datasets/people/fred"]})[/raw]
[/cc]

that uses the same rdf/type function, then these two records would refer to different rdf/type functions, since the function changed between the time I created the mike and the mike-2 resources. That may not look like an issue, since both functions do exactly the same thing. However, it is an issue because many tasks that manipulate and query RDF data rely on comparing these keys (that is, these functions). That means that unexpected behaviors can happen, and they may even look random.

The issue here was that we were not referring to the Var that points to the function, but to the function itself. By using Vars as the keys and values of the map, we fix this inconsistency. All the immutable data structures we create now refer to the Var that points to the function. That way, when we evaluate the Var, we get a reference to the same function regardless of when it was created (before or after the creation of mike and/or mike-2). Here is what the mike record looks like with this modification:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def mike {#'uri "http://foo.com/datasets/people/mike"
           #'rdf/type #'foaf/+person
           #'iron/pref-label "Mike"
           #'foaf/knows ["http://foo.com/datasets/people/fred"]})[/raw]
[/cc]

We use the #' reader macro to specify that we use the Var as the key and values of the map, and not the actual functions or other values referenced by that Var.
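The difference between holding a function and holding its Var can be reproduced in a few lines. This is a small self-contained sketch with a made-up `type-prop` function standing in for a property such as rdf/type; evaluating the same `defn` twice simulates the recompilation of the ontology file.

```clojure
;; A stand-in for an RDF property function (hypothetical name)
(defn type-prop [] "rdf:type")

(def fn-before  type-prop)    ; captures the function object itself
(def var-before #'type-prop)  ; captures the Var that points to the function

;; Simulate a recompilation: same source, but a brand new function object
(defn type-prop [] "rdf:type")

;; Functions are compared by identity, so the old and new ones differ
(def same-fn?  (= fn-before type-prop))     ; false

;; The Var is the same object before and after the redefinition
(def same-var? (= var-before #'type-prop))  ; true

;; A map keyed by the function captured earlier no longer finds its entry
;; when queried with the current function, while a map keyed by the Var does
(get {fn-before  :found} type-prop)    ; => nil
(get {var-before :found} #'type-prop)  ; => :found
```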

The second and third issues I mentioned are tightly related. In the RDF & OWL world, there are multiple examples of ontologies that re-use external ontologies to describe their own semantics. There are cases where an ontology A uses classes and properties from an ontology B, and where ontology B uses classes and properties from ontology A. They cross-use each other. Such usage cycles exist in RDF & OWL and are not that uncommon.

The problem is that, at first, I assumed that each OWL ontology defined as Clojure code would live in its own Clojure namespace. However, if you are a Clojure coder, you can see the issue coming: if two ontologies cross-use each other, then you have to create a namespace dependency cycle in your Clojure code… and you know that this is not possible because the compiler forbids it. This means that everything works fine until this happens.

To overcome that issue, we have to consider that all the ontologies belong to the same namespace (like clojure.core). In my next blog post, which will focus on the description of these ontologies, I will show how we can split the ontologies into multiple files while keeping them in the same namespace.

Now that all the ontologies have to be in the same namespace, and that we cannot use Clojure’s namespaced symbols anymore, I decided to use the more conventional way of writing namespaced properties and classes found in other RDF serializations, which is to delimit the ontology’s prefix with a colon, like this:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def mike {#'uri "http://foo.com/datasets/people/mike"
           #'rdf:type #'foaf:+person
           #'iron:pref-label "Mike"
           #'foaf:knows ["http://foo.com/datasets/people/fred"]})[/raw]
[/cc]

Revision of the RDF Code Rules

Now let’s revise the set of rules that I defined in the previous blog post:

  1. A RDF resource is defined as a Clojure map where:
    1. Every key is a Var that points to a function
    2. Every value is a:
      1. string
        1. A string is considered a literal if the key is a owl:DatatypeProperty
        2. A string is considered a URI if the key is a owl:ObjectProperty
      2. map
        1. A map represents a literal if the value key is present
        2. A map represents a reference to another resource if the uri key is present
        3. A map is invalid if it has neither a uri nor a value key
      3. vector
        1. A vector refers to multiple values. Values of a vector can be strings, maps, symbols or Vars
      4. symbol
        1. A symbol can be created to simplify the serialization. However, these symbols have to reference a string or a var object
      5. var
        1. A var references another entity

In addition to these rules, there are some more specific rules such as:

  1. The value of a uri key is always a string
  2. If the #'rdf:type key is not defined for a resource, then the resource is considered to be of type #'owl:+thing (since everything is at least an instance of the owl:Thing class in OWL)

Finally, there are two additional classes and datatypes creation conventions:

  1. The name of the classes starts with a + sign, like: #'owl:+thing
  2. The name of the datatypes starts with a * sign, like: #'xsd:*string

As you can see, the rules that govern the serialization of RDF data as Clojure code are minimal and should be simple to understand for someone who is used to Clojure code and who has tried to write a few resource examples using this format. Now, let’s apply these rules with a series of examples.
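As a first small illustration, the three rules about map values can be written as a single function. This is only a sketch: `classify-map` is a hypothetical helper, and `uri`, `value` and `lang` are defined here as local placeholders for the Vars that belong to the rdf.core namespace.

```clojure
;; Local stand-ins for the Vars of rdf.core (placeholder values, assumption)
(def uri   :placeholder)
(def value :placeholder)
(def lang  :placeholder)

(defn classify-map
  "Applies the map rules: a map is a literal when it has the value key, a
  reference to another resource when it has the uri key, and invalid when
  it has neither."
  [m]
  (cond
    (contains? m #'value) :literal
    (contains? m #'uri)   :resource-reference
    :else                 :invalid))

(classify-map {#'value "Frederick" #'lang "en"})            ; => :literal
(classify-map {#'uri "http://foo.com/datasets/people/bob"}) ; => :resource-reference
(classify-map {#'lang "en"})                                ; => :invalid
```

The value key is checked first here; the rules do not say which key wins when a map has both, so that ordering is a choice made for this sketch.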

Note 1: in the examples of this blog post, I am referring to Vars like #'uri, #'value, #'lang, #'datatype, etc. To make the rules simpler to read and understand, consider that these Vars are defined in the user namespace. In reality, they are Vars defined in the rdf.core namespace, which will be made publicly available later.

Note 2: All the property and class resource Vars have been defined in the same namespace. They should be included with :require or :use, like (:use [ontologies.core]), from the ns form of the Clojure source code file that defines this RDF resource. We will discuss these namespaces in a subsequent blog post.

Revision of Serializing RDF Code in N-Triples

The serialize-ntriples function got modified to comply with the new set of rules:

[cc lang=’lisp’ line_numbers=’false’]
[raw](declare serialize-ntriples-map-value serialize-ntriples-string-value is-datatype-property?)

(defn serialize-ntriples
  [resource]
  (let [n3 (atom "")
        iri (get resource #'rdf.core/uri)]
    (doseq [[property prop-vals] resource]
      (let [property-uri (get (meta property) #'rdf.core/uri)]
        ; Don't do anything with the "uri" key
        (if (not= property #'rdf.core/uri)
          (if (vector? prop-vals)
            ; Here the value is a vector of maps or values
            (doseq [v prop-vals]
              (let [val (if (var? v) @v v)]
                (if (map? val)
                  ; The value of the vector is a map
                  (reset! n3 (str @n3 (serialize-ntriples-map-value val iri property-uri)))
                  (if (string? val)
                    ; The value of the vector is a string
                    (reset! n3 (str @n3 (serialize-ntriples-string-value val iri property-uri property)))))))
            (let [vals (if (var? prop-vals) @prop-vals prop-vals)]
              (if (map? vals)
                ; The value of the property is a map
                (reset! n3 (str @n3 (serialize-ntriples-map-value vals iri property-uri)))
                (if (string? vals)
                  ; The value of the property is some kind of literal
                  (reset! n3 (str @n3 (serialize-ntriples-string-value vals iri property-uri property))))))))))
    @n3))

(defn- serialize-ntriples-map-value
  [m iri property-uri]
  (if (not (nil? (get m #'rdf.core/uri)))
    ; The value is a reference to another resource
    (format "<%s> <%s> <%s> .\n" iri property-uri (get m #'rdf.core/uri))
    (if (not (nil? (get m #'rdf.core/value)))
      ; The value is some kind of literal
      (let [value (get m #'rdf.core/value)
            lang (if (get m #'rdf.core/lang) (str "@" (get m #'rdf.core/lang)) "")
            datatype (if (get m #'rdf.core/datatype) (str "^^<" (get (deref (get m #'rdf.core/datatype)) #'rdf.core/uri) ">") "")]
        (format "<%s> <%s> \"\"\"%s\"\"\"%s%s .\n" iri property-uri value lang datatype))
      (if (string? m)
        ; The value of the vector is some kind of literal
        (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri m)))))

(defn- serialize-ntriples-string-value
  [s iri property-uri property]
  ; The value of the vector is a string
  (if (true? (is-datatype-property? property))
    ; The property referring to this value is a owl:DatatypeProperty
    (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri s)
    ; The property referring to this value is a owl:ObjectProperty
    (format "<%s> <%s> <%s> .\n" iri property-uri s)))

(defn is-datatype-property?
  [property]
  ; True when the rdf:type of the property is owl:DatatypeProperty
  (= (-> property
         meta
         (get #'ontologies.core/rdf:type)
         deref
         (get #'rdf.core/uri))
     (-> #'ontologies.core/owl:+datatype-property
         deref
         (get #'rdf.core/uri))))
[/raw]
[/cc]

Serializing a RDF Resource

Now let’s serialize a new RDF resource using the new set of rules:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {#'uri "http://foo.com/datasets/people/fred"
           #'rdf:type [#'foaf:+person #'owl:+thing]
           #'iron:pref-label "Fred"
           #'iron:alt-label {#'value "Frederick"
                             #'lang "en"}
           #'foaf:skypeID {#'value "frederick.giasson"
                           #'datatype #'xsd:*string}
           #'foaf:knows [{#'uri "http://foo.com/datasets/people/bob"}
                         mike
                         "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]

One drawback of these new rules (even if they are essential) is that they complicate the writing of RDF resources because of the (heavy) usage of the #' reader macro.

On the other hand, they may look more familiar to people used to RDF serializations, because a colon instead of a slash separates the ontology prefix from the ending of the URI.

What we have above is how the RDF data is represented in Clojure. However, it is possible to make this serialization less verbose by creating a macro that transforms the input map and automatically injects the #' reader macro into the map structures that define the RDF resources.

Here is the r macro (“r” stands for Resource) that does exactly this:

[cc lang=’lisp’ line_numbers=’false’]
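[raw](require '[clojure.walk :as walk])

(defmacro r
  [form]
  ; Wrap every symbol whose value is not a string into a (var ...) call
  (walk/postwalk
    (fn [x]
      (if (and (symbol? x) (-> x eval string? not))
        `(var ~x)
        x))
    form))[/raw]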
[/cc]

Then you can use it to define all the RDF resources you want to create:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred (r {uri "http://foo.com/datasets/people/fred"
              rdf:type [foaf:+person owl:+thing]
              iron:pref-label "Fred"
              iron:alt-label {value "Frederick"
                              lang "en"}
              foaf:skypeID {value "frederick.giasson"
                            datatype xsd:*string}
              foaf:knows [{uri "http://foo.com/datasets/people/bob"}
                          mike
                          "http://foo.com/datasets/people/teo"]}))[/raw]
[/cc]

That structure is equivalent to the other one because the r macro will add the #' reader macro calls to change the input map before creating the resource’s Var.
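The equivalence rests on the fact that #' is only reader shorthand for the var special form. A quick check at the REPL shows it; `foo` is an arbitrary symbol that does not need to be defined, since nothing is evaluated here.

```clojure
;; The reader expands #'foo into the list (var foo), which is exactly the
;; kind of form a macro can build with `(var ~x)
(def reads-the-same? (= (read-string "#'foo") '(var foo)))

reads-the-same? ; => true
```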

By using the r macro, we can see that the serialization is made much simpler and that, in the end, it is more natural to people used to other RDF serializations.

Conclusion

I used the initial specification in the context of creating a new series of web services for the UMBEL project. This heavy usage of this kind of RDF data led me to discover the issues I covered in this blog post. Now that these issues are resolved, I am confident that we can move forward with the series of blog posts that covers how (and why!) to use Clojure code to serialize RDF data.

The next blog post will cover how to manage the ontologies used to instantiate these RDF resources.

RDF Code: Serializing RDF Data as Clojure Code

RDF Code is a specification to serialize RDF data as Clojure code1. This blog post introduces the first version of this new RDF serialization format. I will outline the rules that specify how such RDF data should be serialized using the Clojure programming language.

This specification may change over time. However, this is the specification that will be used for the future blog posts that I will write about this subject, and for the code that will be released.

I am also expecting feedback and proposals to make this serialization easier to use, simpler to define and cleaner.

What we do with this serialization is write RDF resources as Clojure maps. This is not about defining a DSL to manipulate this data, but about defining the core Clojure structure that will be manipulated by Clojure functions and applications. Eventually one or more DSLs could be created to help users and developers use this RDF data in their Clojure applications. But this is not the current focus.

A Complete RDF Resource

Before outlining all the rules to create well-formed RDF data as Clojure code, let’s take a look at a resource that uses all the serialization features2. Note that this is as complex as it can get. Below we will see how we can normalize the usage of the serialization rules to end up with clean and easy to read RDF data as Clojure code.

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label "Fred"
           iron/alt-label {value "Frederick"
                           lang "en"}
           foaf/skypeID {value "frederick.giasson"
                         datatype xsd/*string}
           foaf/knows [{uri "http://foo.com/datasets/people/bob"}
                       mike
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]

This code shows how to:

  • Serialize a single resource as a Clojure map
  • Define a URI for that resource
  • Define one or multiple rdf:type for a resource
  • Define one or multiple values for a owl:DatatypeProperty
  • Define one or multiple values for a owl:ObjectProperty
  • Define the language of a Literal
  • Define the datatype of a Literal

As you can see, such a RDF serialization format is expressive enough to be able to express any RDF triples. It also has syntactic rules that help reading and writing RDF data in that format.

Note that since this RDF data is also Clojure code, the serialization format has been highly influenced by Clojure’s own syntax and coding style3. RDF data serialized using this format also needs to be valid Clojure code. Now, let’s outline the rules that govern the creation of such RDF data, then explain all these rules using simple RDF code examples.

RDF Code Rules

Here is the list of all the rules that govern the creation of RDF data serialized as Clojure code:

  1. A RDF resource is defined as a Clojure map where:
    1. Every key is a symbol that references a function
    2. Every value is a:
      1. string
        1. A string is considered a literal if the key is a owl:DatatypeProperty
        2. A string is considered a URI if the key is a owl:ObjectProperty
      2. map
        1. A map represents a literal if the value key is present
        2. A map represents a reference to another resource if the uri key is present
        3. A map is invalid if it has neither a uri nor a value key
      3. vector
        1. A vector refers to multiple values. Values of a vector can be strings, maps or symbols
      4. symbol
        1. A symbol can be created to simplify the serialization. However, these symbols have to reference a string or a map object
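
The value rules above can be sketched as a small classification function. This is only an illustration under stated assumptions: the keywords :uri and :value stand in for the uri and value symbols of the rdf.core namespace, the name value-kind is hypothetical, and the caller is assumed to already know whether the key is an owl:DatatypeProperty. When a map carries both keys, the sketch checks the uri key first.

```clojure
;; Hypothetical helper: classifies one evaluated value of a resource map.
;; :uri and :value are stand-ins for the rdf.core symbols used in this post.
(defn value-kind
  [v datatype-property?]
  (cond
    ;; A string is a literal or a URI depending on the kind of property
    (string? v) (if datatype-property? :literal :uri-reference)
    ;; A map needs a uri key or a value key, otherwise it is invalid
    (map? v)    (cond
                  (contains? v :uri)   :uri-reference
                  (contains? v :value) :literal
                  :else                :invalid)
    ;; A vector holds multiple values, each classified on its own
    (vector? v) (mapv #(value-kind % datatype-property?) v)
    :else       :invalid))

(value-kind "Fred" true)
(value-kind ["http://foo.com/datasets/people/bob"
             {:uri "http://foo.com/datasets/people/mike"}]
            false)
```

Symbols do not appear in the sketch because they are already resolved to strings or maps by the time the map is evaluated.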

In addition to these rules, there are some more specific rules such as:

  1. The value of a uri key is always a string
  2. If the rdf/type key is not defined for a resource, then the resource is considered to be of type owl:Thing (since everything is at least an instance of the owl:Thing class in OWL)
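
The default-type rule can be sketched as follows. The keyword :rdf/type is a stand-in for the rdf/type symbol, and the function name resource-types is hypothetical; how the actual library applies this default is not shown in this post.

```clojure
;; URI of owl:Thing, used as the default type
(def owl-thing "http://www.w3.org/2002/07/owl#Thing")

;; Hypothetical helper: returns the types of a resource as a vector,
;; falling back to owl:Thing when no type is defined.
(defn resource-types
  [resource]
  (let [t (get resource :rdf/type)]
    (cond
      (nil? t)    [owl-thing]
      (vector? t) t
      :else       [t])))

(resource-types {:uri "http://foo.com/datasets/people/fred"})
```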

Finally, there are two additional naming conventions for classes and datatypes:

  1. The name of a class starts with a + sign, like: owl/+thing
  2. The name of a datatype starts with a * sign, like: xsd/*string

As you can see, the rules that govern the serialization of RDF data as Clojure code are minimal and should be simple to understand for anyone who is used to Clojure code and has tried writing a few resource examples in this format. Now, let’s apply these rules with a series of examples.

Note 1: The examples in this blog post refer to symbols like uri, value, lang, datatype, etc. To keep the rules simple to read and understand, consider these symbols to be defined in the user namespace. In practice, they are defined in the rdf.core namespace, which will be made publicly available later.

Note 2: The examples also refer to namespaced symbols like rdf/type, iron/pref-label, etc. These symbols are defined in their respective namespaces, which are aliased with a clause such as (:require [ontologies.rdf :as rdf] [ontologies.iron :as iron]) in the ns form of the Clojure source file that defines the RDF resource. We will discuss these namespaces in a subsequent blog post.

Serializing RDF Code in N-Triples

Before going through the examples, let’s define a Clojure function that we will use to convert the RDF code to N-Triples [4]. N-Triples is just a list of <subject> <predicate> <object> triples that describe RDF resources. It is the simplest and most verbose RDF serialization that currently exists. The serialize-ntriples function takes a Clojure map that represents an RDF resource and returns a string containing the serialized N-Triples.

What is important about that code is that it shows how the rules outlined above are implemented to serialize RDF code as N-Triples. Similar serializer functions could be created for the Turtle and RDF/XML serializations as well. Note that you won’t be able to run the serialize-ntriples function because the ontology files are missing. I will make them available in a subsequent blog post that will explain how properties, classes and datatypes are created and used in this context.

[cc lang=’lisp’ line_numbers=’false’]
[raw](declare serialize-ntriples-map-value serialize-ntriples-string-value is-datatype-property?)

(defn serialize-ntriples
  "Takes a map that represents an RDF resource and returns its triples as a string."
  [resource]
  (let [n3 (atom "")
        iri (get resource rdf.core/uri)]
    (doseq [[property vals] resource]
      (let [property-uri (get (meta property) rdf.core/uri)]
        ; Don't do anything with the "uri" key
        (if (not= property rdf.core/uri)
          (if (vector? vals)
            ; Here the value is a vector of maps or values
            (doseq [val vals]
              (if (map? val)
                ; The value of the vector is a map
                (swap! n3 str (serialize-ntriples-map-value val iri property-uri))
                (if (string? val)
                  ; The value of the vector is a string
                  (swap! n3 str (serialize-ntriples-string-value val iri property-uri property)))))
            (if (map? vals)
              ; The value of the property is a map
              (swap! n3 str (serialize-ntriples-map-value vals iri property-uri))
              (if (string? vals)
                ; The value of the property is a string (literal or URI)
                (swap! n3 str (serialize-ntriples-string-value vals iri property-uri property))))))))
    @n3))

(defn- serialize-ntriples-map-value
  [m iri property-uri]
  (if (not (nil? (get m rdf.core/uri)))
    ; The value is a reference to another resource
    (format "<%s> <%s> <%s> .\n" iri property-uri (get m rdf.core/uri))
    (if (not (nil? (get m rdf.core/value)))
      ; The value is some kind of literal
      (let [value (get m rdf.core/value)
            lang (if (get m rdf.core/lang) (str "@" (get m rdf.core/lang)) "")
            datatype (if (get m rdf.core/datatype) (str "^^<" (get (get m rdf.core/datatype) rdf.core/uri) ">") "")]
        (format "<%s> <%s> \"\"\"%s\"\"\"%s%s .\n" iri property-uri value lang datatype))
      (if (string? m)
        ; The value of the vector is some kind of literal
        (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri m)))))

(defn- serialize-ntriples-string-value
  [s iri property-uri property]
  (if (is-datatype-property? property)
    ; The property referring to this value is an owl:DatatypeProperty
    (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri s)
    ; The property referring to this value is an owl:ObjectProperty
    (format "<%s> <%s> <%s> .\n" iri property-uri s)))

(defn is-datatype-property?
  [property]
  ; True when the rdf/type found in the property's metadata is owl:DatatypeProperty
  (= (get (get (meta property) rdf/type) rdf.core/uri)
     (get ontologies.owl/+datatype-property rdf.core/uri)))[/raw]
[/cc]
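
Since the function above depends on ontology files that are not yet available, here is a stand-alone sketch of the same idea that can be run as-is. It is not the code of this post: properties are modeled as plain maps carrying a :uri and a :datatype-property? flag instead of functions with metadata, keywords replace the rdf.core symbols, and the names pref-label, knows, object->nt and resource->ntriples are all hypothetical.

```clojure
;; Stand-in property definitions (assumption: plain maps, not functions)
(def pref-label {:uri "http://purl.org/ontology/iron#prefLabel" :datatype-property? true})
(def knows      {:uri "http://xmlns.com/foaf/0.1/knows" :datatype-property? false})

(defn object->nt
  "Serializes one value (string or map) as the object part of a triple."
  [v datatype-property?]
  (cond
    ;; A map with a uri key is a reference to another resource
    (and (map? v) (:uri v)) (str "<" (:uri v) ">")
    ;; Any other map is treated as a literal with optional lang and datatype
    (map? v) (str "\"\"\"" (:value v) "\"\"\""
                  (when-let [l (:lang v)] (str "@" l))
                  (when-let [d (:datatype v)] (str "^^<" (:uri d) ">")))
    ;; A plain string is a literal or a URI depending on the property
    datatype-property? (str "\"\"\"" v "\"\"\"")
    :else (str "<" v ">")))

(defn resource->ntriples
  "Returns one triple line per value of every property of the resource."
  [resource]
  (let [iri (:uri resource)]
    (apply str
           (for [[property vals] (dissoc resource :uri)
                 v (if (vector? vals) vals [vals])]
             (format "<%s> <%s> %s .\n"
                     iri
                     (:uri property)
                     (object->nt v (:datatype-property? property)))))))

(print (resource->ntriples
        {:uri "http://foo.com/datasets/people/fred"
         pref-label {:value "Fred" :lang "en"}
         knows ["http://foo.com/datasets/people/bob"
                {:uri "http://foo.com/datasets/people/mike"}]}))
```

Wrapping single values in a one-element vector lets the same code path handle both the single-value and the multiple-values cases, which removes the need for an atom.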

The Simplest Resource

Here is the simplest resource that can be written:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type "http://xmlns.com/foaf/0.1/Person"
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]

It produces these triples:

[raw]

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

[/raw]

In this example, all the references to classes and to other resources are made using strings that represent URIs, and all the literal values are plain strings as well. The vector is composed of a list of URIs, without mixing different types of values.
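
The literals in the output above are wrapped in triple quotes, a form that Turtle and N3 parsers accept. The N-Triples grammar itself only defines literals enclosed in a single pair of double quotes, with backslash escapes. If strict N-Triples output is needed, an escaping helper could look like the sketch below; the names escape-literal and literal-triple are hypothetical and not part of the code of this post.

```clojure
(require '[clojure.string :as str])

(defn escape-literal
  "Escapes backslashes, double quotes and line breaks for a quoted literal."
  [s]
  (-> s
      (str/replace "\\" "\\\\")   ; backslashes first, so later escapes are kept intact
      (str/replace "\"" "\\\"")
      (str/replace "\n" "\\n")
      (str/replace "\r" "\\r")))

(defn literal-triple
  "Formats one triple whose object is a plain literal."
  [subject predicate value]
  (format "<%s> <%s> \"%s\" .\n" subject predicate (escape-literal value)))

(print (literal-triple "http://foo.com/datasets/people/fred"
                       "http://purl.org/ontology/iron#prefLabel"
                       "Fred"))
```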

Using Classes Symbols

In this example, we will use the symbol that references a class resource we have defined in an ontology:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type foaf/+person
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]

The same set of triples as the previous example will be generated by serialize-ntriples.

What is interesting about this new example is that it is more appealing to human readers. Instead of a full URI string, we see a symbol, which is easier on the eyes.

But the real benefit is not that it is easier on the eyes, but that the symbol really refers to something: a class resource. This means that we have a docstring for that class, that we can check the code that describes the class to see all its characteristics, that we can auto-complete it in an IDE, etc.
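
To illustrate why a symbol carries more than a string does, here is a sketch of how a class resource could be bound to a documented var. How classes are actually defined is the topic of a later post, so the shape of this definition (a map with a :uri keyword) and its docstring are assumptions made for the example.

```clojure
;; Hypothetical class definition: a var with a docstring that holds the class URI
(def +person
  "The class of persons (foaf:Person)."
  {:uri "http://xmlns.com/foaf/0.1/Person"})

;; The docstring travels with the var, so tools can display it
(:doc (meta #'+person))

;; The value itself is what ends up in the resource map
(:uri +person)
```

Because the docstring is stored in the var’s metadata, the REPL and IDE tooling can show it wherever the symbol is used.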

Multiple Types

It is possible to define multiple types for a resource. The only thing you have to do is to use a vector as the value of the rdf/type key:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]

This code will generate the following triples:

[raw]

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

[/raw]

Specifying a Datatype for a Literal

It is also possible to define a datatype for a literal. What you have to do is to use a map with a value and a datatype key:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            datatype xsd/*string}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]

The triples that will be generated are:

[raw]

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred"""^^<http://www.w3.org/2001/XMLSchema#string> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

[/raw]

Specifying a Language Tag for a Literal

It is also possible to define a language tag for a literal. What you have to do is to use a map with a value and a lang key:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]

This code will produce the following triples:

[raw]

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred"""@en .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

[/raw]

Referring to a URI Using a Map

It is possible to make explicit the fact that a value is a URI reference. This is done by using a map and specifying the uri key. Some people may prefer that approach because it makes the fact that a value is a URI explicit without having to know the nature of the key (i.e. whether the property is a datatype or an object property):

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows [{uri "http://foo.com/datasets/people/bob"}
                       {uri "http://foo.com/datasets/people/mike"}
                       {uri "http://foo.com/datasets/people/teo"}]})[/raw]
[/cc]

The same triples will be generated as the example above.

Mixing Values in a Vector

It is possible to mix the values in a vector. In the following example, we are using strings as URIs, and maps that refer to URIs as well. The rules permit this kind of value mixing within a vector, and the software that manipulates this kind of RDF data should be agnostic to it:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       {uri "http://foo.com/datasets/people/mike"}
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]

The same triples will be generated as the example above.
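
One way for software to stay agnostic to this mixing is to normalize the values of an object property before processing them. The sketch below assumes that the keyword :uri stands in for the uri symbol and that the vector belongs to an owl:ObjectProperty, so that plain strings are URIs; the name ->reference is hypothetical.

```clojure
;; Hypothetical helper: turns a URI string into the explicit map form,
;; and leaves values that are already maps untouched.
(defn ->reference
  [v]
  (if (string? v) {:uri v} v))

(mapv ->reference ["http://foo.com/datasets/people/bob"
                   {:uri "http://foo.com/datasets/people/mike"}
                   "http://foo.com/datasets/people/teo"])
```

After this step every element has the same shape, so downstream code only needs to handle the map form.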

Using Symbols to Get Cleaner Code

Because of the way Clojure works, you can define new symbols that refer to the values to add to this structure. Let’s go wild and define a symbol for each value of the resource we are describing:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred-uri "http://foo.com/datasets/people/fred")

(def foaf_person foaf/+person)

(def owl_thing owl/+thing)

(def fred-label {value "Fred"
                 lang "en"})

(def fred-skype-id "frederick.giasson")

(def bob-uri "http://foo.com/datasets/people/bob")

(def mike-uri {uri "http://foo.com/datasets/people/mike"})

(def teo-uri "http://foo.com/datasets/people/teo")

(def fred {uri fred-uri
           rdf/type [foaf_person owl_thing]
           iron/pref-label fred-label
           foaf/skypeID fred-skype-id
           foaf/knows [bob-uri
                       mike-uri
                       teo-uri]})[/raw]
[/cc]

The triples generated will be the same as the example above.

What happens here is that Clojure resolves the symbols to the actual things they refer to when the map is evaluated.

This is why this works and why it is still valid RDF code according to the rules outlined above. The serialize-ntriples function is not even aware that this data structure has been defined that way, because the map it receives as input has already been evaluated. This means that what it gets as input are the objects referenced by these symbols, and not the symbols themselves.
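
This can be checked at the REPL with a reduced example. The keywords :uri and :knows are stand-ins for the uri and foaf/knows symbols, and the var names are made up for the illustration:

```clojure
(def bob-uri "http://foo.com/datasets/people/bob")
(def mike-ref {:uri "http://foo.com/datasets/people/mike"})

;; Map written with symbols
(def with-symbols {:knows [bob-uri mike-ref]})

;; Same map written with the values inlined
(def inlined {:knows ["http://foo.com/datasets/people/bob"
                      {:uri "http://foo.com/datasets/people/mike"}]})

;; Once evaluated, both maps are the same value
(= with-symbols inlined) ; => true
```

Any function that receives either map sees exactly the same data, which is why a serializer needs no special handling for symbols.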

This mechanism can be leveraged to make the RDF code even more readable if you have patterns that are repeated constantly in a dataset file you are creating.

Conclusion

This is all good, but why yet another serialization for RDF? As I started to outline in my two previous blogposts on this topic, this is not just about serializing RDF in another format. It is about having RDF data that can be evaluated as code (in this case, Clojure code).

Considering RDF properties as functions opens up a world of possibilities. First, it means that by evaluating and compiling this kind of RDF code, you have a data structure that is able to validate itself according to the way the properties are defined in the OWL ontologies used to define their behaviors and semantics.

This blog post focused on how to serialize instance data. In the next blog posts, I will cover:

  • How to reify triples using this serialization
  • How to create the classes, properties and datatypes used by the serialization
  • How to validate the data structure
  • How to manage ontologies as Clojure packages

Feel free to comment on this blog post and propose changes to the serialization format or the serialize-ntriples function.