Archive for the 'Clojure' Category

clj-turtle: A Clojure Domain Specific Language (DSL) for RDF/Turtle

Some of my recent work led me to use Clojure heavily to develop all kinds of new capabilities for Structured Dynamics. Those who know us know that everything we do is related to RDF and OWL ontologies, and this work with Clojure is no exception.

Recently, while developing a Domain Specific Language (DSL) for using the Open Semantic Framework (OSF) web service endpoints, I looked for a simple Clojure DSL that I could use to generate RDF data (in any well-known serialization). After some time, I realized that no such thing existed in the Clojure ecosystem, so I chose to create my own simple DSL for creating RDF data.

The primary goal of this new project was to have a DSL that users could use to create RDF data that could be fed to OSF web service endpoints such as CRUD: Create or CRUD: Update.

What I chose to do was create a new project called clj-turtle that generates RDF/Turtle code from Clojure code. The Turtle code produced by this DSL is currently quite verbose: all the URIs are written out in full, triple quotes are used for literals, and every triple is fully described.

This new DSL is meant to be a really simple and easy way to create RDF data. It could even be used by non-Clojure coders to create RDF/Turtle data. New services could easily be created that take the DSL code as input and output the corresponding RDF/Turtle code. That way, no Clojure environment would be required to use the DSL to generate RDF data.


If you are used to Clojure and Leiningen, you can easily install clj-turtle with Leiningen. All you have to do is add [clj-turtle "0.1.3"] as a dependency to your project.clj.

Then make sure the dependency is downloaded by running the lein deps command.
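For reference, a minimal project.clj declaring this dependency might look like the following sketch; the project name (my-rdf-app) and the Clojure version are placeholders, not requirements of clj-turtle.

```clojure
;; Hypothetical project definition; only the clj-turtle coordinate comes from the text above.
(defproject my-rdf-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clj-turtle "0.1.3"]])
```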


The whole DSL is composed of just six operators:

  • rdf/turtle
    • Used to generate RDF/Turtle serialized data from a set of triples defined by clj-turtle.
  • defns
    • Used to create/instantiate a new namespace that can be used to create the clj-turtle triples
  • rei
    • Used to reify a clj-turtle triple
  • iri
    • Used to refer to a URI where you provide the full URI as an input string
  • literal
    • Used to refer to a literal value
  • a
    • Used to specify the rdf:type of an entity being described


Working with namespaces

The core of this DSL is the defns operator. It lets you create the namespaces you want to use to describe your data. Every time you use a namespace, it generates a URI reference in the triple(s) that will be serialized in Turtle.

However, it is not necessary to create a new namespace every time you want to serialize Turtle data. In some cases, you may not even know what the namespace is because you already have the full URI in hand. This is why the iri function exists: it lets you serialize a full URI without having to use a namespace.

Namespaces are just shorthand versions of full URIs, meant to make your code cleaner and easier to read and maintain.
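To make the idea concrete, here is a small self-contained sketch of what a namespace amounts to: a function that expands a local name (keyword, string or number) into a full URI reference, plus a helper for URIs you already have in full. The names make-ns, local-name and iri-ref and the example.org base URI are hypothetical; this is not clj-turtle's actual implementation.

```clojure
;; Hypothetical helpers illustrating what defns and iri produce; not clj-turtle's source.

(defn local-name
  "Turn a keyword, symbol, string or number into the local part of a URI."
  [x]
  (if (or (keyword? x) (symbol? x))
    (name x)
    (str x)))

(defn make-ns
  "Return a function that expands a local name against base-uri and
   wraps the result as a Turtle URI reference."
  [base-uri]
  (fn [local]
    (str "<" base-uri (local-name local) ">")))

(defn iri-ref
  "Wrap an already complete URI as a Turtle URI reference."
  [uri]
  (str "<" uri ">"))

;; Example namespace with a made-up base URI.
(def foo (make-ns "http://example.org/foo#"))

(foo :bar)                             ; => "<http://example.org/foo#bar>"
(foo "bar")                            ; => "<http://example.org/foo#bar>"
(iri-ref "http://example.org/foo#bar") ; => "<http://example.org/foo#bar>"
```

Because the namespace is just a closure over its base URI, keywords and strings produce the same reference, which is why the DSL accepts either form.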

Syntactic rules

Here are the general syntactic rules that you have to follow when creating triples in a (rdf) or (turtle) statement:

  1. Wrap all the code using the (rdf) or the (turtle) operator
  2. Every triple needs to be explicit. This means that every time you want to create a new triple, you have to mention the subject, predicate and object of the triple
  3. A fourth “reification” element can be added using the rei operator
  4. The first parameter of any function can be any kind of value: keywords, strings, integers, doubles, etc. They will be properly serialized as strings in Turtle.
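Rule 2 means the body of an (rdf) call is always a flat sequence of subject, predicate and object terms. The following sketch, using a hypothetical triples->turtle helper that works on already-serialized terms, shows why that keeps serialization simple: the terms are consumed three at a time and each group becomes one Turtle statement.

```clojure
;; Hypothetical sketch; not clj-turtle's source.
(defn triples->turtle
  [& terms]
  ;; every triple must be fully explicit, so the term count must be a multiple of 3
  (when-not (zero? (mod (count terms) 3))
    (throw (IllegalArgumentException.
            (str "Expected subject/predicate/object triples, got " (count terms) " terms"))))
  (->> (partition 3 terms)
       (map (fn [[s p o]] (str s " " p " " o " .\n")))
       (apply str)))

(triples->turtle "<s>" "<p>" "<o>") ; => "<s> <p> <o> .\n"
```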

Strings and keywords

As specified in the syntactic rules, you can use a string, an integer, a double, a keyword or any other kind of value as input to the defined namespaces or the other API calls. Use whichever form is most convenient for you or reads most cleanly to your taste.

More about reification

Note: RDF reification is quite a different concept from Clojure’s reify macro, so read this section carefully to understand what the concept means in this context.

In RDF, reifying a triple means that we want to add additional information about a specific triple. Let’s take this example:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar"))

In this example, we have a triple that specifies the preferred label of the :bar entity. Now, let’s say that we want to add “meta” information about that specific triple, such as the date when this triple was added to the system.

That additional information is considered the “fourth” element of a triple. It is defined like this:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar") (rei
                                                 (foo :date) (literal "2014-10-25" :type :xsd:dateTime)))

What we do here is to specify additional information regarding the triple itself. In this case, it is the date when the triple got added into our system.

So, reification statements are really “meta information” about triples. Also note that reification statements do not change the semantics of the entity descriptions.


Here is a list of examples of how this DSL can be used to generate RDF/Turtle data:

Create a new namespace

The first thing we have to do is define the namespaces we will want to use in our code.

(defns iron "")
(defns foo "")
(defns owl "")
(defns rdf "")
(defns xsd "")

Create a simple triple

The simplest example is a single triple. This one defines the preferred label of a :bar entity:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar"))


<> <> """Bar""" .

Create a series of triples

This example shows how we can describe more than one attribute for our bar entity:

(rdf
  (foo :bar) (a) (owl :Thing)
  (foo :bar) (iron :prefLabel) (literal "Bar")
  (foo :bar) (iron :altLabel) (literal "Foo"))


<> <> <> .
<> <> """Bar""" .
<> <> """Foo""" .

Note: we prefer having one triple per line. It is possible to put all the triples on a single line, but the code is less readable.

Just use keywords

It is possible to use keywords everywhere, even in (literal):

(rdf
  (foo :bar) (a) (owl :Thing)
  (foo :bar) (iron :prefLabel) (literal :Bar)
  (foo :bar) (iron :altLabel) (literal :Foo))


<> <> <> .
<> <> """Bar""" .
<> <> """Foo""" .

Just use strings

It is possible to use strings everywhere, even in namespaces:

(rdf
  (foo "bar") (a) (owl "Thing")
  (foo "bar") (iron :prefLabel) (literal "Bar")
  (foo "bar") (iron :altLabel) (literal "Foo"))


<> <> <> .
<> <> """Bar""" .
<> <> """Foo""" .

Specifying a datatype in a literal

It is possible to specify a datatype for every (literal) you define. Use the :type option and specify an XSD datatype as its value:

(rdf
  (foo "bar") (foo :probability) (literal 0.03 :type :xsd:double))

Equivalent forms are:

(rdf
  (foo "bar") (foo :probability) (literal 0.03 :type (xsd :double)))
(rdf
  (foo "bar") (foo :probability) (literal 0.03 :type (iri "")))


<> <> """0.03"""^^xsd:double .

Specifying a language for a literal

It is possible to specify a language using the :lang option. The language tag should be an ISO 639-1 language code.

(rdf
  (foo "bar") (iron :prefLabel) (literal "Robert" :lang :fr))


<> <> """Robert"""@fr .
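The :type and :lang options only change the suffix written after the quoted value. A minimal sketch of that serialization logic follows; literal-sketch is a hypothetical stand-in, not clj-turtle's literal function, and it assumes a :type given as a keyword is written as a prefixed name while any other value is written verbatim.

```clojure
;; Hypothetical sketch of literal serialization; not clj-turtle's literal function.
(defn literal-sketch
  [value & {dtype :type lang :lang}]
  (let [lexical (if (or (keyword? value) (symbol? value)) (name value) (str value))
        quoted  (str "\"\"\"" lexical "\"\"\"")]
    (cond
      ;; keyword datatypes such as :xsd:double become prefixed names
      dtype (str quoted "^^" (if (keyword? dtype) (name dtype) dtype))
      lang  (str quoted "@" (name lang))
      :else quoted)))

(literal-sketch 0.03 :type :xsd:double) ; => "\"\"\"0.03\"\"\"^^xsd:double"
(literal-sketch "Robert" :lang :fr)     ; => "\"\"\"Robert\"\"\"@fr"
```

Turtle does not allow a literal to carry both a datatype and a language tag, so the sketch simply gives :type precedence.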

Defining a type using the a operator

It is possible to use the (a) predicate as a shortcut to define the rdf:type of an entity:

(rdf
  (foo "bar") (a) (owl "Thing"))


<> <> <> .

This is a shortcut for:

(rdf
  (foo "bar") (rdf :type) (owl "Thing"))

Specifying a full URI using the iri operator

It is possible to define a subject, a predicate or an object using the (iri) operator if you want to define it using the full URI of the entity:

(rdf
  (iri "") (iri "") (iri ""))


<> <> <> .

Simple reification

It is possible to reify any triple using the (rei) operator as the fourth element of a triple:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar") (rei
                                                 (foo :date) (literal "2014-10-25" :type :xsd:dateTime)))


<> <> """Bar""" .
<rei:6930a1f93513367e174886cb7f7f74b7> <> <> .
<rei:6930a1f93513367e174886cb7f7f74b7> <> <> .
<rei:6930a1f93513367e174886cb7f7f74b7> <> <> .
<rei:6930a1f93513367e174886cb7f7f74b7> <> """Bar""" .
<rei:6930a1f93513367e174886cb7f7f74b7> <> """2014-10-25"""^^xsd:dateTime .
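The output above appears to follow the standard RDF reification vocabulary: the reified statement gets an identifier, is typed as rdf:Statement, and points to the subject, predicate and object of the original triple, so the meta triples can hang off that identifier. Here is a self-contained sketch of that expansion. It assumes the identifier is an MD5 hash of the concatenated triple terms; how clj-turtle actually derives its rei: identifiers is not documented here, so the hashes will not necessarily match its output.

```clojure
(import 'java.security.MessageDigest)

(defn md5-hex
  "Hex-encoded MD5 digest of a string."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "MD5") (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

;; Standard W3C RDF vocabulary namespace
(def rdf-ns "http://www.w3.org/1999/02/22-rdf-syntax-ns#")

(defn rdf-term [local] (str "<" rdf-ns local ">"))

(defn reify-triple
  "Hypothetical sketch: serialize triple [s p o], describe it as an rdf:Statement
   with a hash-based identifier, then attach each [meta-p meta-o] pair to it."
  [s p o & meta-pairs]
  (let [id   (str "<rei:" (md5-hex (str s p o)) ">")
        line (fn [subj pred obj] (str subj " " pred " " obj " .\n"))]
    (apply str
           (line s p o)
           (line id (rdf-term "type") (rdf-term "Statement"))
           (line id (rdf-term "subject") s)
           (line id (rdf-term "predicate") p)
           (line id (rdf-term "object") o)
           (map (fn [[mp mo]] (line id mp mo)) (partition 2 meta-pairs)))))
```

Hashing the triple makes the identifier deterministic: reifying the same triple twice yields the same rei: node, which is what lets several meta statements (as in the next example) attach to one statement.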

Reify multiple properties

It is possible to add multiple reification statements:

(rdf
  (foo :bar) (iron :prefLabel) (literal "Bar") (rei
                                                 (foo :date) (literal "2014-10-25" :type :xsd:dateTime)
                                                 (foo :bar) (literal 0.37)))


<> <> """Bar""" .
<rei:6930a1f93513367e174886cb7f7f74b7> <> <> .
<rei:6930a1f93513367e174886cb7f7f74b7> <> <> .
<rei:6930a1f93513367e174886cb7f7f74b7> <> <> .
<rei:6930a1f93513367e174886cb7f7f74b7> <> """Bar""" .
<rei:6930a1f93513367e174886cb7f7f74b7> <> """2014-10-25"""^^xsd:dateTime .
<rei:6930a1f93513367e174886cb7f7f74b7> <> """0.37""" .

Using clj-turtle with clj-osf

clj-turtle is meant to be used in Clojure code to simplify the creation of RDF data. Here is an example of how clj-turtle can be used to generate RDF data to feed to the OSF CRUD: Create web service endpoint via the clj-osf DSL:

(require '[clj-osf.crud :as crud])

  (crud/dataset "")
      (iri link) (a) (bibo :Article)
      (iri link) (iron :prefLabel) (literal "Some article")))

Using the turtle alias operator

Depending on your taste, it is possible to use the (turtle) operator instead of the (rdf) one to generate the RDF/Turtle code:

(turtle
  (foo "bar") (iron :prefLabel) (literal "Robert" :lang :fr))


<> <> """Robert"""@fr .

Merging clj-turtle

Depending on the work you have to do in your Clojure application, you may have to generate the Turtle data using a more complex flow of operations. This is not an issue for clj-turtle, since all you have to do is concatenate the triples you create. You can do so with a simple call to the str function, or with more complex processing using loops, mappings, etc. that ends with (apply str) to generate the final Turtle string.

(str
  (rdf
    (foo "bar") (a) (owl "Thing"))
  (rdf
    (foo "bar") (iron :prefLabel) (literal "Bar")
    (foo "bar") (iron :altLabel) (literal "Foo")))


<> <> <> .
<> <> """Bar""" .
<> <> """Foo""" .


As you can see, this is a really simple DSL for generating RDF/Turtle code. Even though it is simple and has a minimal number of operators, I find it quite effective, and it is flexible enough to describe any kind of RDF data. Also, thanks to Clojure, it is possible to write code that generates DSL code, which is then executed to generate the RDF data. For example, we can easily write code that iterates over a collection to produce one triple per item of the collection, like this:

(->> ["a" "b" "c"]
     ;; one (rdf ...) string per label
     (map (fn [label]
            (rdf
              (foo :bar) (iron :prefLabel) (literal label))))
     ;; concatenate everything into a single Turtle string
     (apply str))

That code would generate 3 triples (or more if the input collection is bigger). Starting with this simple example, we can see how much more complex processes can leverage clj-turtle for generating RDF data.
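The same pattern can be tried without clj-turtle. This self-contained sketch replaces the (rdf ...) call with a hypothetical pref-label-triple function and made-up example.org URIs, maps it over a collection, and joins the results with (apply str).

```clojure
;; Hypothetical stand-in for (rdf (foo :bar) (iron :prefLabel) (literal label)); URIs are made up.
(defn pref-label-triple
  [subject label]
  (str "<" subject "> <http://example.org/iron#prefLabel> \"\"\"" label "\"\"\" .\n"))

(defn labels->turtle
  "One prefLabel triple per label, concatenated into a single Turtle string."
  [subject labels]
  (->> labels
       (map #(pref-label-triple subject %))
       (apply str)))

;; three lines, one per label
(labels->turtle "http://example.org/foo#bar" ["a" "b" "c"])
```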

A future enhancement to this DSL would be a syntactic rule that lets users specify the subject of a triple only the first time it is introduced, mimicking the semicolon of the Turtle syntax.
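One way such a rule could be implemented is as a post-processing step. The following sketch, with hypothetical names and already-serialized terms, groups consecutive triples that share a subject and writes the subject once, separating its predicate/object pairs with Turtle's ';'. It only groups adjacent triples, so a subject that reappears later starts a new block.

```clojure
(require '[clojure.string :as string])

(defn subject-blocks
  "Serialize [s p o] triples, writing each subject once for a run of
   consecutive triples and separating its pairs with ' ;'."
  [triples]
  (->> (partition-by first triples)
       (map (fn [block]
              (str (ffirst block) " "
                   (string/join " ;\n    " (map (fn [[_ p o]] (str p " " o)) block))
                   " .\n")))
       (apply str)))

(subject-blocks [["<s>" "<p1>" "<o1>"]
                 ["<s>" "<p2>" "<o2>"]
                 ["<t>" "<p>" "<o>"]])
;; => "<s> <p1> <o1> ;\n    <p2> <o2> .\n<t> <p> <o> .\n"
```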

New UMBEL Concept Tagger Web Service

We just released a new UMBEL web service endpoint and online tool: the Concept Tagger Plain.

This plain tagger uses UMBEL reference concepts to tag an input text. It uses the OBIE (Ontology-Based Information Extraction) method, driven by the UMBEL reference concept ontology. By plain we mean that the words (tokens) of the input text are matched to either the preferred labels or the alternative labels of the reference concepts: the tagger merely makes string matches against the possible UMBEL reference concepts.

This tagger matches the plain labels of the reference concepts against the input text. No manipulation (such as stemming) is performed on either the reference concept labels or the input text. Also, the tagger performs NO disambiguation if multiple concepts are tagged for a given keyword.
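The plain matching strategy can be sketched in a few lines: split the text on whitespace, look each token up in the preferred-label and alternative-label dictionaries, and record the positions where it occurs. The dictionaries and concept identifiers below are made up, and this is not the service's implementation; it only mirrors the behavior described above (exact string matching, no stemming, no disambiguation).

```clojure
(require '[clojure.string :as string])

;; Hypothetical label dictionaries: label -> set of reference concept IDs (made up).
(def pref-labels {"bank" #{"rc:Bank"} "river" #{"rc:River"}})
(def alt-labels  {"bank" #{"rc:RiverBank"}})

(defn tokenize
  "Plain whitespace tokenization; no stemming or case folding."
  [text]
  (string/split (string/trim text) #"\s+"))

(defn tag-plain
  "Map each matching token to its positions and to every concept whose preferred
   or alternative label equals it exactly. All candidates are kept (no disambiguation)."
  [text]
  (reduce (fn [acc [pos token]]
            (let [pref (get pref-labels token)
                  alt  (get alt-labels token)]
              (if (or pref alt)
                (-> acc
                    (update-in [token :positions] (fnil conj []) pos)
                    (assoc-in [token :pref] (or pref #{}))
                    (assoc-in [token :alt] (or alt #{})))
                acc)))
          {}
          (map-indexed vector (tokenize text))))

(tag-plain "the bank of the river bank")
;; "bank" is found at positions 1 and 5 with both its preferred and alternative matches
```

The number of matches for a token is simply the length of its :positions vector.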

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the initial input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow (see conclusion).

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.

The Online Tool

We also provide an online tagging tool that people can use to try out the web service interactively.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

The tagged concepts can be clicked to access their full description.

EDN and ClojureScript

An interesting thing about this user interface is that it has been implemented in ClojureScript, and the data exchanged between the user interface and the tagger web service endpoint is serialized in EDN. This means that when the UI receives the resultset from the endpoint, it only has to read the EDN string with the ClojureScript reader (cljs.reader/read-string) to turn the web service output into native data for the application.

No parsing of a non-native data format is necessary, which makes the UI code simpler and makes data manipulation more natural for the developer, since no external API is needed.
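On the JVM, the counterpart of cljs.reader/read-string for EDN is clojure.edn/read-string. The sketch below reads a made-up response body; the real endpoint's field names may differ. Once read, the resultset is an ordinary Clojure map that can be navigated with get-in, with no separate parsing API involved.

```clojure
(require '[clojure.edn :as edn])

;; A made-up response body; the endpoint's actual field names are an assumption.
(def response-body
  "{:pref-label-matches {\"bank\" {:concepts [\"rc:Bank\"] :positions [1 5]}}}")

(def result (edn/read-string response-body))

(get-in result [:pref-label-matches "bank" :positions]) ; => [1 5]
```

The clojure.edn reader is intended for data exchange and does not evaluate code while reading, which makes it a better fit than clojure.core/read-string for data coming from a remote service.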

What is Next?

This is the first of a series of tagging web service endpoints that will be released. Our intent is to release UMBEL tagging services with different levels of sophistication. Depending on how they want to use UMBEL, users will have access to different tagging services that they can use and supplement with their own techniques to reach their desired results.

The next taggers (not in order) that are planned to be released are:

  • Plain tagger – no weighting or classification except by occurrence count
    • Entity plain tagger (using the Wikidata dictionary)
    • Scones plain tagger – concept + entity
  • Noun tagger – with POS, only tags the nouns; generally, the preferred, simplest baseline tagger
    • Concept noun tagger
    • Entity noun tagger
    • Scones noun tagger
  • N-gram tagger – a phrase-based tagger
    • Concept n-gram tagger
    • Entity n-gram tagger
    • Scones n-gram tagger
  • Complete tagger – combinations of the above with different machine learning techniques
    • Concept complete tagger
    • Entity complete tagger
    • Scones complete tagger

We invite you to try out the system online, and we welcome your comments and suggestions.

Validating RDF Data by Evaluating RDF/Clojure Code

I recently started to investigate different ways to serialize RDF triples as Clojure code [1] [2] [3]. I had at least two goals in mind. The first was to end up with an RDF serialization format that is valid Clojure code and that can easily be manipulated using core Clojure functions. The second was to be able to “execute” the code to validate the data according to the semantics of the ontologies used to define it.

This blog post focuses on showing how the second goal can be implemented.

Before doing so, let’s take some time to explore what the sayings ‘Code as Data’ and ‘Data as Code’ may mean in this context.

Code as Data, Data as Code

What is Code as Data? It means that the program code you write is also data that can be manipulated by a program. In other words, the code you are writing can be used as input [to a macro], which can then be transformed and then evaluated. The code is considered to be data to be manipulated by a macro system to output executable code. The code itself becomes data that can be manipulated with some internal mechanism in the language. But the result of these manipulations is still executable code.

What is Data as Code? It means that you can use a programming language’s code to embed (serialize) data. It means that you can specify your own sublanguage (DSL), translate it into code (using macros) and execute the resulting code.

The initial goal of an RDF/Clojure serialization is to specify a way to write RDF triples (data) as Clojure (code). That code is data that can be manipulated by macros to produce executable code. Evaluating the resulting code validates the data structures (the graph defined by the triples) according to the semantics defined in the ontologies. In other words, the graph can be validated by evaluating the resulting code (that is, by running the functions).
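Both ideas fit in a few lines of Clojure. In this illustrative sketch, a quoted form is inspected and rewritten like any list before being evaluated (code as data), and a small hypothetical macro, validate-all, turns a flat list of check/value pairs into executable calls (data as code).

```clojure
;; Code as data: a quoted form is an ordinary list that can be inspected and rewritten.
(def form '(+ 1 2 3))
(first form)                          ; => +
(def rewritten (cons '* (rest form)))
(eval rewritten)                      ; => 6

;; Data as code: a hypothetical macro that expands [check value] pairs into calls.
(defmacro validate-all
  [& pairs]
  `(vector ~@(map (fn [[check v]] `(~check ~v)) (partition 2 pairs))))

(validate-all pos? 3
              string? "x")            ; => [true true]
```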

Ontology Creation

In my previous blog posts about serializing RDF data as Clojure code, I noted that the properties, classes and datatypes I was referring to were to be defined elsewhere in the Clojure application, and that I would cover them in another blog post. Here it is.

All of the ontology properties, classes and datatypes that we are using to serialize the RDF data are defined as Clojure code. They can be defined in a library, directly in your application’s code or even as data that gets emitted by a web service endpoint that you evaluate at runtime (for data that has not yet been evaluated).

In the tests I am doing, I define RDF properties as Clojure functions; the RDF classes and datatypes are normal records that comply with the same RDF serialization rules as defined for the instance records.

Some users may wonder: why is everything defined as a map except the properties? Although each property’s RDF description is available as a map, we attach it to the property’s function as Clojure metadata. We consider properties to be functions, not maps: as you will see below, these functions are used to validate the RDF data serialized as Clojure code. That is why they are represented as Clojure functions rather than as maps like everything else.
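Here is a minimal sketch of that idea: a property defined as a function whose RDF description lives in its var's metadata and which validates the values passed to it. For brevity it uses namespaced keywords (:rdf/uri, :owl/cardinality and so on) as metadata keys instead of the vars used in this post, and the URI is a made-up example.

```clojure
;; Hypothetical property: its description is var metadata, its body is a validator.
(defn ^{:rdf/uri "http://example.org/foo#phone"
        :iron/pref-label "phone number"
        :owl/cardinality 1}
  foo:phone
  "Validate the values of the foo:phone property against its cardinality."
  [values]
  (let [expected (:owl/cardinality (meta #'foo:phone))
        actual   (if (sequential? values) (count values) 1)]
    (when (and expected (not= expected actual))
      (throw (IllegalStateException.
              (format "property foo:phone has %d values and was expecting %d"
                      actual expected))))
    values))

(:iron/pref-label (meta #'foo:phone)) ; => "phone number"
(foo:phone ["1-421-353-9057"])        ; => ["1-421-353-9057"]
```

Keeping the description on the var means the same object is both the ontology entry (readable with meta) and the executable check.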

Someone could easily use the RDF/Clojure serialization without worrying about the ontologies: they could get the triples that describe the records without caring about the semantics of the data as represented by the ontologies. However, if that same person wants to reason over the data (for example, to make sure it is valid and coherent), then they will need the ontology descriptions.

Now let’s see how these ontologies are being generated.

Creating OWL Classes

As I said above, an OWL class is nothing but another record. It is described using the same rules as previously defined [4], except that it is described using the OWL language and therefore carries specific semantics. Creating such a class is easy: we just have to follow the semantics of the OWL language and the rules of the RDF/Clojure serialization. For example, here is how to create a simple FOAF Person class:

(def foaf:+person
  "The class of all the persons."
  {#'uri ""
   #'rdf:type #'owl:+class
   #'rdfs:label "Person"
   #'rdfs:comment "The class of all the persons."})

As you can see, we are describing the class the same way we were defining normal instance records. However, we are doing it using the OWL language.

Creating OWL Datatypes

Datatypes are also serialized like normal RDF/Clojure records; that is, just like classes. However, since the datatypes are fairly static in the way we define them, I created a simple macro called gen-datatype that can be used to generate datatypes:

(defmacro gen-datatype
  "Create a new datatype that represents an OWL datatype class.
   [name] is the name of the datatype to create.
   Optional parameters are:
     [:uri] the URI of the datatype to create
     [:base] the URI of the base XSD datatype of this new datatype
     [:pattern] a regex pattern used to validate that a given string
                represents a value that belongs to that datatype
     [:docstring] the docstring to use when creating this datatype"

  [name & {:keys [uri base pattern docstring]}]
  `(def ~name
     ~(str docstring)
     (merge {#'rdf:type ""}
            (if ~uri {#'rdf.core/uri ~uri})
            (if ~pattern {#'xsp:pattern ~pattern})
            (if ~base {#'xsp:base ~base}))))

You can use this macro like this:

(gen-datatype *full-us-phone-number
              :uri ""
              :pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"
              :base ""
              :docstring "Datatype representing a full US phone number")

And it will generate a datatype like this:

{#'ontologies.core/xsp:base ""
 #'ontologies.core/xsp:pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"
 #'rdf.core/uri ""
 #'ontologies.core/rdf:type ""}

This datatype defines a class of literals representing full US phone numbers. I will explain below how such a datatype is used to validate RDF data records.

Creating OWL Properties

Properties are different from classes and datatypes. They are represented as functions in the RDF/Clojure serialization. I created another simple macro called gen-property to generate these OWL properties:

(defmacro gen-property
  "Create a new property that represents an OWL property.
     [name] is the name of the property/function to create. This is the name that will be
            used in your Clojure code.
     [:uri] the URI of the property to create
     [:label] the preferred label of the property to create
     [:description] the description of the property to create
     [:domain] the domain of the property to create. The domain is represented by one or multiple
               classes. If there is more than one class in the domain,
               you can specify the ^intersection-of or the ^union-of meta-data to specify whether the classes
               should be interpreted as a union or an intersection of the set of classes.
     [:range] the range of the property to create. The range is represented by one or multiple
              classes. If there is more than one class in the range,
              you can specify the ^intersection-of or the ^union-of meta-data to specify whether the classes
              should be interpreted as a union or an intersection of the set of classes.
     [:sub-class-of] one or multiple classes that are super-classes of this class
     [:equivalent-property] one or multiple properties that are equivalent to this property
     [:is-object-property] true if the property being created is an object property
     [:is-datatype-property] true if the property being created is a datatype property
     [:is-annotation-property] true if the property being created is an annotation property
     [:cardinality] exact cardinality of the property"

  [name & {:keys [uri label description domain range sub-class-of equivalent-property
                  is-object-property is-datatype-property is-annotation-property cardinality]}]
  (let [vals (gensym "label-")
        docstring (if description
                    (str description ".\n [" vals "] is the preferred label to specify.")
                    (str ""))
        ;; pick the OWL property type from the is-*-property flags
        type (cond is-object-property ""
                   is-datatype-property ""
                   is-annotation-property "")
        metadata (merge (if uri {#'rdf.core/uri uri})
                        (if type {#'rdf:type type})
                        (if label {#'iron:pref-label label})
                        (if description {#'iron:description description})
                        (if range {#'rdfs:range range})
                        (if domain {#'rdfs:domain domain})
                        (if cardinality {#'owl:cardinality cardinality}))]
     ;; the generated property function validates whatever values it receives
     `(defn ~(with-meta name metadata)
        ~(str docstring)
        [~vals]
        (validate-property #'~name ~vals))))

Note that this macro currently accommodates only a subset of the OWL language. For example, only an exact cardinality can be specified; minimum and maximum cardinalities are not supported. I only implemented what was required for writing this blog post.

You can then use this macro to create new properties like this:

(gen-property foo:phone
              :is-datatype-property true
              :label "phone number"
              :uri ""
              :range *full-us-phone-number
              :domain #'owl:+thing
              :cardinality 1)

(gen-property foo:knows
              :is-object-property true
              :label "a person that knows another person"
              :uri ""
              :range #'umbel.ref/umbel-rc:+person
              :domain #'umbel.ref/umbel-rc:+person)

Some other Classes, Datatypes and Properties

So, here is the list of classes, datatypes and properties that will be used later in this blog post for demonstrating how validation occurs in such a framework:

(in-ns 'rdf.core)
(import 'java.net.URI)

(defn uri
  [s]
  (try
    ;; the java.net.URI constructor throws on malformed URIs
    (URI. ^String s)
    (catch Exception e
      (throw (IllegalStateException. (str "Invalid URI: \"" s "\""))))))

(defn datatype
  [s]
  (if (var? s)
    (if (not= (get @s #'ontologies.core/rdf:type) "")
      (throw (IllegalStateException. (str "Provided value for datatype is not a datatype: \"" s "\""))))
    (throw (IllegalStateException. (str "Provided value for datatype is not a datatype: \"" s "\"")))))

(in-ns 'ontologies.core)

(def owl:+thing
  "The class of OWL individuals."
  {#'uri ""
   #'rdf:type #'rdfs:+class
   #'rdfs:label "Thing"
   #'rdfs:comment "The class of OWL individuals."})

;; owl:+thing is defined first because iron:pref-label refers to #'owl:+thing
(gen-property iron:pref-label
              :uri ""
              :label "Preferred label"
              :description "Preferred label for describing a resource"
              :domain #'owl:+thing
              :range #'rdfs:*literal
              :is-datatype-property true)

(gen-datatype xsd:*string
              :uri ""
              :docstring "Datatype that represents all the XSD strings")

Concluding with Ontologies

Ontologies are easy to write in RDF/Clojure, and a simple set of macros can help create the ontology classes, properties and datatypes. In the future, I anticipate creating a library that would use the OWLAPI to take any OWL ontology and serialize it using these rules. The output could be Clojure code like this, or JAR libraries. I will also investigate more idiomatic Clojure projects such as Phil Lord’s Tawny-OWL.

RDF Data Instantiation Using Clojure Code

Now that we have the classes, datatypes and properties defined in our Clojure application, we can start defining data records like this:

(def valid-record (r {uri ""
                      rdf:type owl:+thing
                      foo:phone ["1-421-353-9057"]
                      iron:pref-label {value "Test cardinality validation"
                                       lang "en"
                                       datatype xsd:*string}}))

Data Validation

Now that all of the ontologies are defined, let’s look at the record called valid-record above, which describes something with a phone number and a preferred label. The data is there and available to you. But what if you want to do a bit more than this: what if you want to validate it?

Validating such a record is as easy as evaluating it. What does that mean? Each key of the map that describes the record refers to a function (the property), so validating a value means calling that function with the value, as specified in the record’s description. We then iterate over the whole map to validate all of the triples.
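To see the evaluate-to-validate loop in isolation, here is a self-contained sketch. The two property functions and their checks are hypothetical stand-ins for properties generated by gen-property; validate-record calls every key that is bound to a function and lets the first failing check throw.

```clojure
;; Hypothetical property functions; each throws IllegalStateException on an invalid value.
(defn phone-property
  [v]
  (when-not (and (string? v) (re-matches #"[0-9]-[0-9]{3}-[0-9]{3}-[0-9]{4}" v))
    (throw (IllegalStateException. (str "Invalid phone number: " v)))))

(defn label-property
  [v]
  (when-not (string? v)
    (throw (IllegalStateException. (str "Invalid label: " v)))))

(defn validate-record
  "Evaluate a record by calling the function bound to each var key on its value.
   Returns true when no check throws."
  [record]
  (doseq [[property value] record]
    (when (fn? @property)
      (@property value)))
  true)

(validate-record {#'phone-property "1-421-353-9057"
                  #'label-property "Test"})   ; => true
```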

To perform this kind of process, we can create a validate-resource function that looks like:

(defn validate-resource [resource]
  (doseq [[property value] resource]
    (println (str "validating resource property: " property))
    ;; only vars bound to functions (the properties) are called on their values
    (when (fn? @property)
      (@property value))))

You can use it like this:

(validate-resource valid-record)

If no exceptions are thrown, then the record is considered valid according to the ontology specifications. Easy, no? Now let’s take a look at how this works.

If you look at the gen-property macro, you will notice that every time a property function is called, it calls validate-property with the property’s var and the given value(s). This function validates the property’s value(s) according to the description of the property in the ontology. validate-property looks like this:

(defn validate-property
  "Validate that the values of the property are valid according to the description of that property
   [property] should be the reference to the function, like #'foo:phone
   [values] are the actual values of that property"

  [property values]
  (validate-owl-cardinality property values)
  (validate-rdfs-range property values))

It runs a series of other functions to validate different characteristics of a property. In this blog post, we demonstrate how the following characteristics are validated:

  1. Cardinality of a property
  2. URI validation
  3. Datatype validation
  4. Range validation when the range is a class.

Cardinality Validation

Validating the cardinality of a property means that we check if the number of values of a given property is as specified in the ontology. In this example, we validate the exact cardinality of a property. It could be extended to validate the maximum and minimum cardinalities as well.

The function that validates the cardinality is the validate-owl-cardinality function that is defined as:

(defn validate-owl-cardinality
  [property values]
  (doseq [[meta-key meta-val] (seq (meta property))]
    ; Only validate if there is a owl/cardinality property defined in the metadata
    (if (= meta-key #'ontologies.core/owl:cardinality)
      ; If the value is a string, a var or a map, we check if the cardinality is 1
      (if (or (string? values) (map? values) (var? values))
        (if (not= meta-val 1)
          (throw (IllegalStateException.
                  (format "CARDINALITY VALIDATION ERROR: property %s has 1 values and was expecting %d values" property meta-val))))
        ; If the value is an array, we validate the expected cardinality
        (if (not= (count values) meta-val )
          (throw (IllegalStateException.
                  (format "CARDINALITY VALIDATION ERROR: property %s has %d values and was expecting %d values" property (count values) meta-val))))))))

For each property, it checks to see if the owl:cardinality property is defined. If it is, then it makes sure that the number of values for that property is valid according to what is defined in the ontology. If there is a mismatch, then the validation function will throw an exception and the validation process will stop.

Here is an example of a record that has a cardinality validation error according to the definition of the foo:phone property (see its description above):

(def card-validation-test (r {uri ""
                              rdf:type owl:+thing
                              foo:phone ["1-421-353-9057" "(1)-(412)-342-3246"]
                              iron:pref-label {value "Test cardinality validation"
                                               lang "en"
                                               datatype xsd:*string}}))
user> (validate-resource card-validation-test)
IllegalStateException CARDINALITY VALIDATION ERROR: property #'dataset-test.core/foo:phone has 2 values and was expecting 1 values (property.clj:36)
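As mentioned at the beginning of this section, the exact-cardinality check could be extended to minimum and maximum cardinalities. Here is a minimal, self-contained sketch of that extension. It does not use the RDF/Clojure vars: the restrictions are passed as a plain map with the hypothetical keys :cardinality, :min-cardinality and :max-cardinality, and the hypothetical helper value-count treats a single value (string, map or var) as one value, like validate-owl-cardinality does.

```clojure
;; Hypothetical helper: a vector holds several values; any other value
;; (a string, a map or a var) counts as a single value.
(defn value-count
  [values]
  (if (vector? values)
    (count values)
    1))

;; Hypothetical validator: `restrictions` is a plain map that may contain
;; :cardinality (exact), :min-cardinality and :max-cardinality.
;; Returns true, or throws an IllegalStateException on the first violation.
(defn validate-cardinalities
  [property values {:keys [cardinality min-cardinality max-cardinality]}]
  (let [n    (value-count values)
        fail (fn [expected]
               (throw (IllegalStateException.
                       (format "CARDINALITY VALIDATION ERROR: property %s has %d value(s) and was expecting %s"
                               property n expected))))]
    (when (and cardinality (not= n cardinality))
      (fail (str "exactly " cardinality)))
    (when (and min-cardinality (< n min-cardinality))
      (fail (str "at least " min-cardinality)))
    (when (and max-cardinality (> n max-cardinality))
      (fail (str "at most " max-cardinality)))
    true))
```

For instance, (validate-cardinalities "foo:phone" ["a" "b" "c"] {:max-cardinality 2}) throws, while the same call with two values returns true.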

URI Validation

Everything you define in RDF/Clojure has a URI. However, not every string is a valid URI, so all the URIs you define can be validated as well. URIs are specified using the #'rdf.core/uri function, which is defined as:

(defn uri
  [s]
  (try
    ; java.net.URI (imported in the namespace) throws if s is not a valid URI
    (URI. ^String s)
    (catch Exception e
      (throw (IllegalStateException. (str "Invalid URI: \"" s "\""))))))

As you can see, the function relies on the java.net.URI constructor to validate the URI you are defining for your records, classes, properties and datatypes. If you make a mistake when writing a URI, an exception is thrown and the validation process stops.

Here is an example of a record that has an invalid URI:

(def uri-validation-test (r {uri "-"
                             rdf:type owl:+thing
                             foo:phone "1-421-353-9057"
                             iron:pref-label {value "Test URI validation"
                                              lang "en"
                                              datatype xsd:*string}}))
user> (validate-resource uri-validation-test)
IllegalStateException Invalid URI: "-"  rdf.core/uri (core.clj:16)
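The same check can also be written as a predicate that returns a boolean instead of throwing, which is handy when you want to collect every invalid URI of a dataset rather than stop at the first one. This is a minimal sketch; the names valid-uri? and absolute-uri? are hypothetical. It only relies on java.net.URI: its constructor throws a URISyntaxException for malformed strings, and isAbsolute tells whether the URI has a scheme. Since java.net.URI also accepts relative references, requiring an absolute URI is a stricter check.

```clojure
(import '(java.net URI URISyntaxException))

;; Hypothetical predicate: true when java.net.URI can parse the string.
(defn valid-uri?
  [^String s]
  (try
    (URI. s)
    true
    (catch URISyntaxException _ false)
    (catch NullPointerException _ false)))

;; Hypothetical stricter predicate: the string must also be an absolute
;; URI, i.e. it must start with a scheme such as "http:".
(defn absolute-uri?
  [^String s]
  (and (valid-uri? s)
       (.isAbsolute (URI. s))))
```

For example, (valid-uri? "http://example.com/a b") returns false because of the space, and (absolute-uri? "records/test") returns false because the reference has no scheme.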

Datatype Validation

In OWL, a datatype property is used to refer to literal values that belong to classes of literals (datatype classes). A datatype class represents all the literal values that conform to the definition of the datatype. For example, the *full-us-phone-number datatype we described above defines the class of all the literals that are full US phone numbers.

Validating the value of a property according to its datatype means making sure that the literal value(s) belong to that datatype. Most of the time, people use the XSD datatypes. Custom datatypes are based on one of the XSD datatypes, with a regex pattern (xsp:pattern) that specifies how the literal should be constructed.

; The helper functions are defined further below, so declare them first
(declare validate-map-properties validate-range-pattern validate-range-object)

(defn validate-rdfs-range
  [property values]
  ; If the value is a map, then validate the "value", "lang" and "datatype" assertions
  (when (map? values)
    (validate-map-properties values))
  (doseq [[meta-key ranges] (meta property)]
    ; Make sure a range is defined for this property
    (when (= meta-key #'ontologies.core/rdfs:range)
      ; A single range is wrapped into a vector flagged as an intersection-of
      (let [ranges (if (vector? ranges)
                     ranges
                     ^:intersection-of [ranges])]
        (if (true? (:intersection-of (meta ranges)))
          ; Consider that all the values of the range are an intersection-of
          (doseq [range ranges]
            (if (is-datatype-property? property)
              ; We are checking the range of a datatype property
              ; @TODO here we have to change that portion to call a function that will do the validation
              ;       according to the existing XSD types, or any custom datatype based on these core
              ;       XSD datatypes. Just like the DVT (Dataset Validation Tool)
              ;       For now, we simply test using a datatype that has a pattern defined.
              (let [pattern (get range #'ontologies.core/xsp:pattern)]
                ; Only validate if a pattern has been defined for this datatype
                (when pattern
                  (if (vector? values)
                    ; Validate all the values of the property according to this datatype
                    (doseq [v values]
                      (validate-range-pattern v pattern ranges))
                    ; Validate the single value according to the datatype
                    (validate-range-pattern values pattern ranges))))
              ; We are checking the range of an object property
              (if (vector? values)
                (doseq [v values]
                  (validate-range-object v range property))
                (validate-range-object values range property))))
          ; Consider that all the values of the range are a union-of
          (println "@TODO Ranges union validation"))))))

(defn- validate-range-pattern
  [v pattern range]
  (if (string? v)
    (if (nil? (re-seq (java.util.regex.Pattern/compile pattern) v))
      (throw (IllegalStateException.
              (format "Value \"%s\" invalid according to the definition of the datatype \"%s\""  v range))))
    (if (and (map? v) (nil? (validate-map-properties v)))
      (if (nil? (re-seq (java.util.regex.Pattern/compile pattern) (get v 'value)))
        (throw (IllegalStateException.
                (format "Value \"%s\" invalid according to the definition of the datatype \"%s\""  v range)))))))

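(defn- validate-map-properties
  [m]
  ; Each key is expected to be a var (value, lang, datatype, etc.); when the
  ; var holds a function, call it on the associated value to validate it
  (doseq [[p v] m]
    (when (fn? @p)
      (@p v))))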

The validate-rdfs-range function validates the range of a property. It first checks which kind of value exists for the input property according to the RDF/Clojure specification (a string, a map, a vector, a var, etc.). Then, for each range defined for the property, it checks whether the property is a datatype property or an object property. For a datatype property, the value(s) are validated against the regex pattern of the datatype defined in the range of the property. For an object property, each referenced record is checked by the validate-range-object function.
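The datatype part of this process boils down to matching each literal against the regex pattern of the datatype. Here is a simplified, self-contained sketch of that step, outside of the RDF/Clojure vars. The datatype is represented as a plain map with hypothetical :label and :pattern keys, and a property value is either a single string or a vector of strings, as in the records of this post. The pattern is the phone number pattern that appears in the validation error shown below.

```clojure
;; Hypothetical datatype description: a plain map instead of a var.
(def full-us-phone-number
  {:label   "Full US phone number"
   :pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"})

;; Throws when the literal `v` does not match the pattern of `datatype`.
(defn validate-literal
  [v datatype]
  (when-not (re-find (re-pattern (:pattern datatype)) v)
    (throw (IllegalStateException.
            (format "Value \"%s\" invalid according to the definition of the datatype \"%s\""
                    v (:label datatype)))))
  true)

;; A property value is either a single literal or a vector of literals.
(defn validate-literals
  [values datatype]
  (doseq [v (if (vector? values) values [values])]
    (validate-literal v datatype))
  true)
```

With this sketch, "1-421-353-9057" passes while "1-421-353-90573" throws, because its last group has five digits.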

Here are a few records that have different datatype validation errors:

(def datatype-validation-test (r {uri ""
                                  rdf:type owl:+thing
                                  foo:phone "1-421-353-90573"
                                  iron:pref-label {value "Test cardinality validation"
                                                   lang "en"
                                                   datatype xsd:*string}}))
(def datatype-validation-test-2 (r {uri ""
                                    rdf:type owl:+thing
                                    foo:phone "1-421-353-9057"
                                    iron:pref-label {value "Test datatype validation"
                                                     lang "en"
                                                     datatype "not-a-datatype"}}))

(def xsd:*string-not-a-datatype)

(def datatype-validation-test-3 (r {uri ""
                                    rdf:type owl:+thing
                                    foo:phone "1-421-353-9057"
                                    iron:pref-label {value "Test datatype validation"
                                                     lang "en"
                                                     datatype xsd:*string-not-a-datatype}}))

(def datatype-validation-test-4 (r {uri ""
                                    rdf:type owl:+thing
                                    foo:phone [{value "1-421-353-9057"
                                                datatype xsd:*string-not-a-datatype}]
                                    iron:pref-label {value "Test datatype validation"
                                                     lang "en"
                                                     datatype xsd:*string}}))
user> (validate-resource datatype-validation-test)
IllegalStateException Value "1-421-353-90573" invalid according to the definition of the datatype "[{#'ontologies.core/xsp:pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$", #'rdf.core/uri "", #'ontologies.core/rdf:type ""}]" (property.clj:150)

user> (validate-resource datatype-validation-test-2)
IllegalStateException Provided value for datatype is not a datatype: "not-a-datatype"  rdf.core/datatype (core.clj:31)

user> (validate-resource datatype-validation-test-3)
IllegalStateException Provided value for datatype is not a datatype: "#'dataset-test.core/xsd:*string-not-a-datatype"  rdf.core/datatype (core.clj:30)

user> (validate-resource datatype-validation-test-4)
IllegalStateException Provided value for datatype is not a datatype: "#'dataset-test.core/xsd:*string-not-a-datatype"  rdf.core/datatype (core.clj:30)

As you can see, the datatype validation performed by validate-rdfs-range is incomplete. I am still updating this function so that it validates all the existing XSD datatypes. Custom datatypes also need to be validated better, for example by taking their xsp:base type into account. The code to write is similar to the one I created for the Data Validation Tool (which is written in PHP).
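To give an idea of what taking the xsp:base type into account could look like, here is a minimal sketch where a custom datatype is first validated against its XSD base type, and then against its pattern. Only two base types are handled, and the :base and :pattern keys, the validator table and the age-in-years datatype are all hypothetical; a complete implementation would have to cover every XSD datatype with its exact lexical rules.

```clojure
;; Hypothetical table of XSD base type validators. Each function returns
;; true when the literal belongs to the lexical space of the type
;; (simplified: only two base types are covered here).
(def xsd-base-validators
  {:xsd/string  (fn [_] true)
   :xsd/integer (fn [s] (boolean (re-matches #"[+-]?[0-9]+" s)))})

;; Hypothetical custom datatype: an xsd:integer restricted by a pattern.
(def age-in-years
  {:base    :xsd/integer
   :pattern "^[0-9]{1,3}$"})

;; True when the literal `s` is valid for its base type and, if a pattern
;; is defined, also matches that pattern.
(defn valid-custom-literal?
  [s {:keys [base pattern]}]
  (let [base-valid? (get xsd-base-validators base)]
    (when-not base-valid?
      (throw (IllegalArgumentException. (str "Unsupported base datatype: " base))))
    (and (base-valid? s)
         (or (nil? pattern)
             (some? (re-find (re-pattern pattern) s))))))
```

Here "4a" fails at the base type level, while "1234" is a valid xsd:integer that is rejected by the pattern of the custom datatype.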

Range validation when the range is a class

Finally, let’s show how the range of an object property can be validated. Validating the range of an object property means making sure that the record referenced by the object property belongs to the class defined as the range of the property.

For example, consider a property foo:knows whose range specifies that all the values of foo:knows need to belong to the class umbel-rc:+person. This means that, for any record, every value of the foo:knows property must refer to a record of type umbel-rc:+person. Otherwise, there is a validation error.

Here is an example of a record where the foo:knows property is not properly used:

(def wrench (r {uri ""
               rdf:type umbel.ref/umbel-rc:+product
               iron:pref-label "The biggest wrench ever"}))

(def object-range-validation-test (r {uri ""
                                      rdf:type umbel.ref/umbel-rc:+person
                                      foo:knows wrench
                                      iron:pref-label {value "Test object range validation"
                                                       lang "en"
                                                       datatype xsd:*string}}))

Remember that we defined the foo:knows property with a range of umbel-rc:+person. However, in this example, the property references a wrench record of type umbel-rc:+product. Thus, we get a validation error:

user> (validate-resource object-range-validation-test)
IllegalStateException The resource "" referenced by the property "#'dataset-test.core/foo:knows" does not belong to the class "#'umbel.ref/umbel-rc:+person" as defined by the range of the property (property.clj:142)

The function that validates the ranges of the object properties is defined as:

(defn- validate-range-object
  [r range property]
  ; Requires [clj-http.client] in the namespace declaration
  (let [r (cond
            (var? r) (deref r)
            (map? r) r
            ; @TODO get the resource's description from a dataset index
            ;       when the record is referenced by its URI (a string)
            (string? r) r)
        ; URI of the class (rdf:type) of the referenced record
        uri (get (deref (get r #'ontologies.core/rdf:type)) #'rdf.core/uri)
        ; Last segment of the class URI, used to query the super-classes endpoint
        uri-ending (if (> (.lastIndexOf uri "/") -1)
                     (subs uri (inc (.lastIndexOf uri "/")))
                     "")
        ; The web service returns the list of super-classes as Clojure code
        super-classes (try
                        (read-string (:body (clj-http.client/get (str "" uri-ending)
                                                                 {:headers {"Accept" "application/clojure"}
                                                                  :throw-exceptions false})))
                        (catch Exception _
                          nil))
        range-uri (get @range #'rdf.core/uri)]
    ; The class defined in the range must be one of the super-classes
    (when-not (some #{range-uri} super-classes)
      (throw (IllegalStateException.
              (str "The resource \"" uri "\" referenced by the property \"" property
                   "\" does not belong to the class \"" range "\" as defined by the range of the property"))))))

Normally, this kind of validation should be done using the descriptions of the loaded ontologies. However, for the purpose of this blog post, I used a different approach. I purposefully used some UMBEL Reference Concepts as the types of the records I described. The object range validation function then leverages the UMBEL super-classes web service endpoint to get the super-classes of a given class.

In short, the function gets the type of the record(s) referenced by the foo:knows property. Then it validates whether that type is the same as, or is included in, the class defined in the range of the foo:knows property.

In our example, the range is #'umbel-rc:+person. This means that the foo:knows property can only refer to umbel-rc:+person records. In the example with the validation error, the type of the wrench record is umbel-rc:+product. The validation function gets the list of all the super-classes of the umbel-rc:+product class and checks whether umbel-rc:+person is one of them, that is, whether umbel-rc:+product is a sub-class of umbel-rc:+person. It is not, so an error is thrown.
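As mentioned above, this check would normally rely on the loaded ontologies rather than on a web service. Here is a minimal sketch of that approach, assuming the sub-class relationships have already been loaded into a local hierarchy. It uses Clojure's own hierarchy functions (make-hierarchy, derive and isa?) instead of the UMBEL endpoint; the :ex/ class names are hypothetical and are not taken from UMBEL.

```clojure
;; Hypothetical class hierarchy built with Clojure's hierarchy functions:
;; (derive h child parent) records that `child` is a sub-class of `parent`.
(def class-hierarchy
  (-> (make-hierarchy)
      (derive :ex/product :ex/artifact)
      (derive :ex/artifact :ex/thing)
      (derive :ex/person :ex/agent)
      (derive :ex/agent :ex/thing)))

;; Throws when a record of type `record-type` cannot be a value of a
;; property whose range is `range-class`.
(defn validate-object-range
  [record-type range-class property]
  ;; isa? is true when both classes are equal, or when range-class is a
  ;; (transitive) super-class of record-type in the hierarchy
  (when-not (isa? class-hierarchy record-type range-class)
    (throw (IllegalStateException.
            (format "A record of type %s referenced by the property %s does not belong to the class %s"
                    record-type property range-class))))
  true)
```

Because isa? also returns true when the two classes are equal, this version accepts a record whose type is exactly the range class, as well as records typed with any of its sub-classes.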

What is interesting with this example is that the UMBEL super-classes web service endpoint returns the list of super-classes as Clojure code. We then use the read-string function to read that list into Clojure data, and manipulate it as if it were part of the application’s code.
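Since the response body comes from a remote service, a safer alternative may be clojure.edn/read-string. It reads the same kind of data (lists, vectors, strings, etc.) but never evaluates code, whereas clojure.core/read-string can evaluate code embedded with the #= reader form when *read-eval* is true, which is its default value. Here is a minimal sketch; the function name parse-super-classes is hypothetical.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper: parse the body returned by a super-classes endpoint
;; into a collection of class URIs. Returns nil when the body is missing or
;; cannot be parsed as EDN (for example, if it contains a #= eval form).
(defn parse-super-classes
  [body]
  (try
    (edn/read-string body)
    (catch Exception _ nil)))
```

The result can be used with (some #{range-uri} super-classes) exactly like the value returned by read-string in validate-range-object.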


What is elegant with this kind of RDF/Clojure serialization is that validating the RDF data is the same as evaluating the underlying code (Data as Code). If the data is invalid, exceptions are thrown and the validation process aborts.

One thing that I have yet to investigate with such an RDF/Clojure serialization is how the semantics of the properties, classes and datatypes could be embedded into the RDF/Clojure records, so that we end up with stateful RDF records that embed their own semantics at a specific point in time. This would mean that even if an ontology changes in the future, the records would still be valid according to the original ontology that was used to describe them at a specific point in time (when they got written, when they got emitted by a web service endpoint, etc.).

Also, as some of my readers pointed out about my previous blog post on this subject, the fact that I use vars to serialize the RDF triples means that the serialization won’t produce valid ClojureScript code, since vars don’t exist in ClojureScript. Paul Gearon proposed to use keywords as the keys instead of vars, and to use a lookup index to call the functions, to get the same effect as with the vars. This avenue will be investigated as well and should be the topic of a future blog post about this RDF/Clojure serialization.
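Here is a rough sketch of what the keyword-based approach could look like. It only illustrates the idea and is not the serialization that will eventually be adopted: record keys are plain keywords, and a hypothetical lookup index (validators) maps each keyword to the function that validates its values. Since it uses no vars and throws with ex-info rather than a Java exception class, the same code should be portable to ClojureScript, although that has not been checked here.

```clojure
;; Hypothetical lookup index: property keyword -> validation function.
(def validators
  {:foo/phone       (fn [v]
                      (when-not (re-find #"^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$" v)
                        (throw (ex-info "Invalid phone number"
                                        {:property :foo/phone :value v}))))
   :iron/pref-label (fn [v]
                      (when-not (string? v)
                        (throw (ex-info "The preferred label must be a string"
                                        {:property :iron/pref-label :value v}))))})

;; Validates every property of a record that has a registered validator;
;; a vector is treated as multiple values of the same property.
(defn validate-record
  [record]
  (doseq [[property values] record
          :let [validate (get validators property)]
          :when validate
          v (if (vector? values) values [values])]
    (validate v))
  true)
```

Properties without a registered validator, such as :rdf/type in {:rdf/type :owl/thing :foo/phone ["1-421-353-9057"]}, are simply skipped.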
