Clojure, Semantic Web, Structured Dynamics

Validating RDF Data by Evaluating RDF/Clojure Code

I recently started to investigate different ways to serialize RDF triples using Clojure code [1][2][3]. I had at least two goals in mind. The first was to end up with an RDF serialization format that is valid Clojure code and that can easily be manipulated using core Clojure functions. The second was to be able to “execute” the code to validate the data according to the semantics of the ontologies used to define it.

This blog post focuses on showing how the second goal can be implemented.

Before doing so, let’s take some time to explore what the expressions ‘Code as Data’ and ‘Data as Code’ may mean in this context.

Code as Data, Data as Code

What is Code as Data? It means that the program code you write is also data that can be manipulated by a program. In Clojure, code is made of ordinary lists, vectors, maps and symbols, so the code you write can be used as input to a macro, which transforms it before it is evaluated. The code itself becomes data that the language can manipulate, but the result of these manipulations is still executable code.

What is Data as Code? It means that you can use a programming language’s code to embed (serialize) data. You can specify your own sublanguage (a domain-specific language, or DSL), translate it into code using macros, and execute the resulting code.
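To make the two ideas concrete, here is a minimal sketch that uses only clojure.core. A quoted form is inspected and rewritten as ordinary data, then evaluated as code. Then check-all, a hypothetical macro that is not part of the RDF/Clojure code in this post, turns a tiny data DSL of predicate/value pairs into executable validation code.

```clojure
;; Code as Data: a quoted form is an ordinary list that core functions can inspect.
(def form '(+ 1 2 3))

;; Rewrite the data: swap the operator, keep the operands.
(def rewritten (cons '* (rest form)))   ; the list (* 1 2 3)

;; The manipulated data is still executable code.
(def result (eval rewritten))

;; Data as Code: a hypothetical DSL where predicate/value pairs are expanded
;; by a macro into executable checks that throw on the first failure.
(defmacro check-all
  [& pairs]
  `(do
     ~@(for [[pred v] (partition 2 pairs)]
         `(when-not (~pred ~v)
            (throw (IllegalStateException. (str "Check failed: " '~pred " " ~v)))))
     true))

;; Expands to (do (when-not (string? "Person") ...) (when-not (pos? 3) ...) true)
(check-all string? "Person" pos? 3)
```

The point is the same one the RDF/Clojure serialization relies on: the data is written once, and "running" it is what checks it.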

The initial goal of an RDF/Clojure serialization is to specify a way to write RDF triples (data) as Clojure (code). That code is data that can be manipulated by macros to produce executable code. Evaluating the resulting code validates the data structures (the graph defined by the triples) according to the semantics defined in the ontologies. In other words, the graph can be validated simply by evaluating the resulting code (running the functions).

Ontology Creation

In my previous blog posts about serializing RDF data as Clojure code, I noted that the properties, classes and datatypes I was referring to were defined elsewhere in the Clojure application and that I would cover them in another blog post. Here it is.

All of the ontology properties, classes and datatypes that we use to serialize the RDF data are defined as Clojure code. They can be defined in a library, directly in your application’s code, or even as code emitted by a web service endpoint that you evaluate at runtime (if it has not already been evaluated).

In the tests I am doing, I define RDF properties as Clojure functions; the RDF classes and datatypes are normal records that comply with the same RDF serialization rules as defined for the instance records.

Some users may wonder: why is everything defined as a map except the properties? Each property’s RDF description is available as a map too, but we attach it to the property’s function as Clojure metadata. We consider properties to be functions, not maps. As you will see below, these functions are used to validate the RDF data serialized in Clojure code. That is why they are represented as Clojure functions rather than as maps like everything else.
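To illustrate the idea, here is a minimal sketch of a property that is both a function and a carrier of its RDF description. It is a hypothetical, hand-written stand-in for what gen-property generates: it uses keyword metadata keys (:rdf/uri, :rdfs/label) instead of vars so that it runs on its own, and its validation rule is deliberately simple.

```clojure
;; Hypothetical stand-in for a generated property: the var's metadata holds the
;; RDF description, and the var's value is the validation function.
(defn ^{:rdf/uri    "http://purl.org/ontology/foo#phone"
        :rdfs/label "phone number"}
  foo:phone
  "Validate the value(s) of foo:phone: every value must be a string."
  [values]
  (let [vs (if (sequential? values) values [values])]
    (when-not (every? string? vs)
      (throw (IllegalStateException. (str "Invalid value(s) for foo:phone: " values))))
    values))

;; The RDF description is read back from the var's metadata...
(:rdf/uri (meta #'foo:phone))

;; ...while calling the property validates data and returns it unchanged.
(foo:phone ["1-421-353-9057"])
```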

Someone could easily use the RDF/Clojure serialization without worrying about the ontologies: they could get the triples that describe the records while ignoring the semantics of the data as represented by the ontologies. However, if that same person wants to reason over the data (for example, to make sure it is valid and coherent), then they will need the ontology descriptions.

Now let’s see how these ontologies are being generated.

Creating OWL Classes

As I said above, an OWL class is just another record. It is described using the same rules as previously defined [4], but with the OWL language, which gives it specific semantics. Creating such a class is easy: we just have to follow the semantics of the OWL language and the rules of the RDF/Clojure serialization. For example, the following code creates a simple FOAF person class:

(def foaf:+person
  "The class of all the persons."
  {#'uri "http://xmlns.com/foaf/0.1/Person"
   #'rdf:type #'owl:+class
   #'rdfs:label "Person"
   #'rdfs:comment "The class of all the persons."})

As you can see, we are describing the class the same way we were defining normal instance records. However, we are doing it using the OWL language.

Creating OWL Datatypes

Datatypes are serialized like normal RDF/Clojure records too, just like classes. However, since datatypes are all defined in much the same way, I created a simple macro called gen-datatype that can be used to generate them:

(defmacro gen-datatype
  "Create a new datatype that represents an OWL datatype class.
   [name] is the name of the datatype to create.
   Optional parameters are:
     [:uri] this is the URI of the datatype to create
     [:base] this is the URI of the base XSD datatype of this new datatype
     [:pattern] this is a regex pattern used to validate that
                a given string represents a value that belongs to that datatype
     [:docstring] the docstring to use when creating this datatype"

  [name & {:keys [uri base pattern docstring]}]
  `(def ~name
     ~(str docstring)
     (merge {#'rdf:type "http://www.w3.org/TR/rdf-schema#Datatype"}
            (if ~uri {#'rdf.core/uri ~uri})
            (if ~pattern {#'xsp:pattern ~pattern})
            (if ~base {#'xsp:base ~base}))))

You can use this macro like this:

(gen-datatype *full-us-phone-number
              :uri "http://purl.org/ontology/foo#phone-number"
              :pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"
              :base "http://www.w3.org/2001/XMLSchema#string"
              :docstring "Datatype representing a full US phone number")

And it will generate a datatype like this:

{#'ontologies.core/xsp:base "http://www.w3.org/2001/XMLSchema#string"
 #'ontologies.core/xsp:pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"
 #'rdf.core/uri "http://purl.org/ontology/foo#phone-number"
 #'ontologies.core/rdf:type "http://www.w3.org/TR/rdf-schema#Datatype"}

This datatype defines a class of literals that represents the full version of a US phone number. I will explain below how such a datatype is used to validate RDF data records.

Creating OWL Properties

Properties are different from classes and datatypes. They are represented as functions in the RDF/Clojure serialization. I created another simple macro called gen-property to generate these OWL properties:

(defmacro gen-property
  "Create a new property that represents an OWL property.
     [name] is the name of the property/function to create. This is the name that will be
            used in your Clojure code.
     [:uri] this is the URI of the property to create
     [:label] this is the preferred label of the property to create
     [:description] this is the description of the property to create
     [:domain] this is the domain of the property to create. The domain is represented by one or multiple
               classes. If more than one class represents the domain, you can specify the
               ^intersection-of or the ^union-of meta-data to specify whether the classes
               should be interpreted as an intersection or a union of the set of classes.
     [:range] this is the range of the property to create. The range is represented by one or multiple
              classes. If more than one class represents the range, you can specify the
              ^intersection-of or the ^union-of meta-data to specify whether the classes
              should be interpreted as an intersection or a union of the set of classes.
     [:sub-property-of] one or multiple properties that are super-properties of this property
     [:equivalent-property] one or multiple properties that are equivalent to this property
     [:is-object-property] true if the property being created is an object property
     [:is-datatype-property] true if the property being created is a datatype property
     [:is-annotation-property] true if the property being created is an annotation property
     [:cardinality] exact cardinality of the property"

  [name & {:keys [uri
                  label
                  description
                  domain
                  range
                  sub-property-of
                  equivalent-property
                  is-object-property
                  is-datatype-property
                  is-annotation-property
                  cardinality]}]
  (let [vals (gensym "label-")
        docstring (if description
                    (str description ".\n [" vals "] is the preferred label to specify.")
                    (str ""))
        type (if is-object-property
               #'owl:+object-property
               (if is-annotation-property
                 #'owl:+annotation-property
                 #'owl:+datatype-property))
        metadata (merge (if uri {#'rdf.core/uri uri})
                        (if type {#'rdf:type type})
                        (if label {#'iron:pref-label label})
                        (if description {#'iron:description description})
                        (if range {#'rdfs:range range})
                        (if domain {#'rdfs:domain domain})
                        (if cardinality {#'owl:cardinality cardinality}))]
     `(defn ~(with-meta name metadata)
        ~(str docstring)
        [~vals]
        (rdf.property/validate-property #'~name ~vals))))

Note that this macro currently supports only a subset of the OWL language. For example, the :sub-property-of and :equivalent-property options are accepted but ignored, and only an exact cardinality can be specified. I only implemented what was required for writing this blog post.

You can then use this macro to create new properties like this:

(gen-property foo:phone
              :is-datatype-property true
              :label "phone number"
              :uri "http://purl.org/ontology/foo#phone"
              :range *full-us-phone-number
              :domain #'owl:+thing
              :cardinality 1)

(gen-property foo:knows
              :is-object-property true
              :label "a person that knows another person"
              :uri "http://purl.org/ontology/foo#knows"
              :range #'umbel.ref/umbel-rc:+person
              :domain #'umbel.ref/umbel-rc:+person)

Some other Classes, Datatypes and Properties

So, here is the list of classes, datatypes and properties that will be used later in this blog post for demonstrating how validation occurs in such a framework:

(in-ns 'rdf.core)
(import 'java.net.URI)

(defn uri
  [s]
  (try
    (URI. ^String s)
    (catch Exception e
      (throw (IllegalStateException. (str "Invalid URI: \"" s "\""))))))

(defn datatype
  [s]
  ; a valid datatype is a var whose value is typed as a Datatype
  (when-not (and (var? s)
                 (= (get @s #'ontologies.core/rdf:type) "http://www.w3.org/TR/rdf-schema#Datatype"))
    (throw (IllegalStateException. (str "Provided value for datatype is not a datatype: \"" s "\"")))))

(in-ns 'ontologies.core)

; owl:+thing is defined before iron:pref-label, whose :domain refers to #'owl:+thing
(def owl:+thing
  "The class of OWL individuals."
  {#'uri "http://www.w3.org/2002/07/owl#Thing"
   #'rdf:type #'rdfs:+class
   #'rdfs:label "Thing"
   #'rdfs:comment "The class of OWL individuals."})

(gen-property iron:pref-label
              :uri "http://purl.org/ontology/iron#prefLabel"
              :label "Preferred label"
              :description "Preferred label for describing a resource"
              :domain #'owl:+thing
              :range #'rdfs:*literal
              :is-datatype-property true)

(gen-datatype xsd:*string
              :uri "http://www.w3.org/2001/XMLSchema#string"
              :docstring "Datatype that represents all the XSD strings")

Concluding with Ontologies

Ontologies are easy to write in RDF/Clojure, and a simple set of macros can be used to help create the ontology classes, properties and datatypes. In the future, I plan to create a library that would use the OWLAPI to take any OWL ontology and serialize it using these rules. The output could be Clojure code like this, or JAR libraries. I will also investigate more idiomatic Clojure projects, such as Phil Lord’s Tawny-OWL project.

RDF Data Instantiation Using Clojure Code

Now that we have the classes, datatypes and properties defined in our Clojure application, we can start defining data records like this:

(def valid-record (r {uri "http://foo-bar.com/test/"
                      rdf:type owl:+thing
                      foo:phone ["1-421-353-9057"]
                      iron:pref-label {value "Test cardinality validation"
                                       lang "en"
                                       datatype xsd:*string}}))

Data Validation

The valid-record record defined above describes something that has a phone number and a preferred label. The data is there and available to you. But what if we would like to do more than this: what if we would like to validate it?

Validating such a record is as easy as evaluating it. What does that mean? Each key of the map that describes the record is a var that refers to a property function. Validating the record therefore means calling each property’s function with the value(s) specified in the description of the record. We iterate over the whole map to validate all of the triples.

To perform this kind of process, we can create a validate-resource function that looks like:

(defn validate-resource
  [resource]
  (doseq [[property value] resource]
    (println (str "validating resource property: " property))
    ; every key is a var; when it is bound to a property function, call it on the value(s)
    (when (fn? @property)
      (@property value))))

You can use it like this:

(validate-resource valid-record)

If no exceptions are thrown, then the record is considered valid according to the ontology specifications. Easy, no? Now let’s take a look at how this works.
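If you would rather collect the outcome than let the exception escape, a small wrapper can turn it into data. A minimal sketch, assuming a hypothetical foo:phone function and a simplified copy of validate-resource without the logging; validation-report is an illustrative helper, not part of the original code:

```clojure
;; Hypothetical property function standing in for a generated one.
(defn foo:phone
  [v]
  (when-not (re-matches #"[0-9]-[0-9]{3}-[0-9]{3}-[0-9]{4}" v)
    (throw (IllegalStateException. (str "Invalid phone number: " v)))))

;; A var whose value is not a function: validate-resource skips it.
(def note "not a property function")

;; Simplified copy of validate-resource, without the logging.
(defn validate-resource
  [resource]
  (doseq [[property value] resource]
    (when (fn? @property)
      (@property value))))

(defn validation-report
  "Hypothetical helper: validate a resource and return the outcome as data
   instead of letting the exception propagate."
  [resource]
  (try
    (validate-resource resource)
    {:valid? true}
    (catch IllegalStateException e
      {:valid? false :error (.getMessage e)})))
```

For example, (validation-report {#'foo:phone "1-421-353-90573"}) returns a map with :valid? false and the exception message, which is easier to aggregate over many records.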

If you look at the gen-property macro, you will notice that every time a property function is called, it calls the #'rdf.property/validate-property function. This function validates the property against the specified value(s), according to the description of the property in the ontology specification. The validate-property function looks like this:

(defn validate-property
  "Validate that the values of the property are valid according to the description of that property
   [property] should be the reference to the function, like #'foo:phone
   [values] are the actual values of that property"

  [property values]
  (validate-owl-cardinality property values)
  (validate-rdfs-range property values))

It runs a series of other functions to validate different characteristics of a property. In this blog post, we demonstrate how the following characteristics are validated:

  1. Cardinality of a property
  2. URI validation
  3. Datatype validation
  4. Range validation when the range is a class

Cardinality Validation

Validating the cardinality of a property means that we check if the number of values of a given property is as specified in the ontology. In this example, we validate the exact cardinality of a property. It could be extended to validate the maximum and minimum cardinalities as well.
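A minimal sketch of that extension, assuming the constraints are passed as a plain map with hypothetical :cardinality, :min-cardinality and :max-cardinality keys (standing in for owl:cardinality, owl:minCardinality and owl:maxCardinality) instead of being read from the property's metadata. It follows roughly the same counting convention as the function below: a vector (or other sequential collection) holds several values, and anything else counts as one.

```clojure
(defn value-count
  "Number of values: a sequential collection holds several values;
   a string, map or var is a single value."
  [values]
  (if (sequential? values) (count values) 1))

(defn validate-cardinality
  "Sketch: check exact, minimum and maximum cardinality constraints.
   The constraint keys are hypothetical. Returns the number of values."
  [property-name {:keys [cardinality min-cardinality max-cardinality]} values]
  (let [n    (value-count values)
        fail (fn [expectation]
               (throw (IllegalStateException.
                       (format "CARDINALITY VALIDATION ERROR: property %s has %d value(s) and was expecting %s"
                               property-name n expectation))))]
    (when (and cardinality (not= n cardinality))
      (fail (str "exactly " cardinality)))
    (when (and min-cardinality (< n min-cardinality))
      (fail (str "at least " min-cardinality)))
    (when (and max-cardinality (> n max-cardinality))
      (fail (str "at most " max-cardinality)))
    n))
```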

The function that validates the cardinality is the validate-owl-cardinality function that is defined as:

(defn validate-owl-cardinality
  [property values]
  (doseq [[meta-key meta-val] (seq (meta property))]
    ; Only validate if there is a owl/cardinality property defined in the metadata
    (if (= meta-key #'ontologies.core/owl:cardinality)
      ; If the value is a string, a var or a map, we check if the cardinality is 1
      (if (or (string? values) (map? values) (var? values))
        (if (not= meta-val 1)
          (throw (IllegalStateException.
                  (format "CARDINALITY VALIDATION ERROR: property %s has 1 values and was expecting %d values" property meta-val))))
        ; If the value is an array, we validate the expected cardinality
        (if (not= (count values) meta-val )
          (throw (IllegalStateException.
                  (format "CARDINALITY VALIDATION ERROR: property %s has %d values and was expecting %d values" property (count values) meta-val))))))))

For each property, it checks to see if the owl:cardinality property is defined. If it is, then it makes sure that the number of values for that property is valid according to what is defined in the ontology. If there is a mismatch, then the validation function will throw an exception and the validation process will stop.

Here is an example of a record that has a cardinality validation error according to the definition of the foo:phone property (see the gen-property call for foo:phone above, which sets :cardinality 1):

(def card-validation-test (r {uri "http://foo-bar.com/test/"
                              rdf:type owl:+thing
                              foo:phone ["1-421-353-9057" "(1)-(412)-342-3246"]
                              iron:pref-label {value "Test cardinality validation"
                                               lang "en"
                                               datatype xsd:*string}}))
user> (validate-resource card-validation-test)
IllegalStateException CARDINALITY VALIDATION ERROR: property #'dataset-test.core/foo:phone has 2 values and was expecting 1 values  rdf.property/validate-owl-cardinality (property.clj:36)

URI Validation

Everything you define in RDF/Clojure has a URI. However, not every string is a valid URI. All of the URIs you may define can be validated as well. When you define a URI, you use the #'rdf.core/uri function to specify the URI. That function is defined as:

(defn uri
  [s]
  (try
    (URI. ^String s)
    (catch Exception e
      (throw (IllegalStateException. (str "Invalid URI: \"" s "\""))))))

As you can see, we use the java.net.URI constructor to validate the URI you are defining for your records, classes, properties and datatypes. If you make a mistake when writing a URI, a validation error is thrown and the validation process stops.
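One caveat: java.net.URI also accepts relative references such as "foo-bar", so a string that was never meant to be a relative URI can still pass. A sketch of a stricter, hypothetical variant (absolute-uri is not part of rdf.core) that additionally requires an absolute URI, i.e. one with a scheme:

```clojure
(import 'java.net.URI 'java.net.URISyntaxException)

(defn absolute-uri
  "Parse s with java.net.URI and also reject relative references.
   Throws IllegalStateException on any failure, like rdf.core/uri."
  [^String s]
  (let [u (try
            (URI. s)
            (catch URISyntaxException _
              (throw (IllegalStateException. (str "Invalid URI: \"" s "\"")))))]
    (when-not (.isAbsolute ^URI u)
      (throw (IllegalStateException. (str "URI is not absolute: \"" s "\""))))
    u))
```

Both "-http://foo-bar.com/test/" (a scheme cannot start with "-") and "foo-bar" (no scheme at all) are rejected, while "http://foo-bar.com/test/" is returned as a URI object.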

Here is an example of a record that has an invalid URI:

(def uri-validation-test (r {uri "-http://foo-bar.com/test/"
                             rdf:type owl:+thing
                             foo:phone "1-421-353-9057"
                             iron:pref-label {value "Test URI validation"
                                              lang "en"
                                              datatype xsd:*string}}))
user> (validate-resource uri-validation-test)
IllegalStateException Invalid URI: "-http://foo-bar.com/test/"  rdf.core/uri (core.clj:16)

Datatype Validation

In OWL, a datatype property refers to literal values that belong to classes of literals (datatype classes). A datatype class represents the set of all the literal values allowed by the datatype. For example, the *full-us-phone-number datatype we described above defines the class of all the literals that are full US phone numbers.

Validating the value of a property according to its datatype means that we make sure that the literal value(s) belong to that datatype. Most of the time, people will use the XSD datatypes. If custom datatypes are created, then they will be based on one of the XSD datatypes, and a regex pattern will be defined to specify how the literal should be constructed.
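At its core, checking a literal against a custom datatype is a whole-string regex match. The validate-range-pattern function below relies on re-seq, which finds matches anywhere in the string, so it is the ^ and $ anchors in the datatype's pattern that make it reject "1-421-353-90573". A minimal sketch that does not depend on those anchors uses re-matches, which only succeeds when the entire literal matches. The plain-map datatype and the function names are hypothetical:

```clojure
(def full-us-phone-number
  "Hypothetical plain-map version of the *full-us-phone-number datatype."
  {:uri     "http://purl.org/ontology/foo#phone-number"
   :base    "http://www.w3.org/2001/XMLSchema#string"
   :pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"})

(def compiled-pattern
  "Compile each pattern string once and reuse the java.util.regex.Pattern."
  (memoize re-pattern))

(defn literal-in-datatype?
  "True when the whole literal matches the datatype's pattern.
   A datatype without a pattern accepts any literal."
  [datatype literal]
  (if-let [pattern (:pattern datatype)]
    (boolean (re-matches (compiled-pattern pattern) literal))
    true))
```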

; the helper functions are defined after validate-rdfs-range, so declare them first
(declare validate-range-pattern validate-map-properties validate-range-object)

(defn validate-rdfs-range
  [property values]
  (do
    ; If the value is a map, then validate the "value", "lang" and "datatype" assertions
    (if (map? values)
      (validate-map-properties values))
    (doseq [[meta-key ranges] (seq (meta property))]
      ; make sure a range is defined for this property
      (if (= meta-key #'ontologies.core/rdfs:range)
        (let [ranges (if (vector? ranges)
                       ranges
                       ^:intersection-of [ranges])]
          (if (true? (:intersection-of (meta ranges)))
            ; consider that all the values of the range are an intersection-of
            (doseq [range ranges]
              (if (is-datatype-property? property)
                ; we are checking the range of a datatype property
                ; @TODO here we have to change that portion to call a function that will do the validation
                ;       according to the existing XSD types, or any custom datatype based on these core
                ;       XSD datatypes. Just like the DVT (Dataset Validation Tool)
                ;
                ;       For now, we simply test using a datatype that has a pattern defined.
                (let [pattern (get range #'ontologies.core/xsp:pattern)]
                  (if pattern
                    ; a validation pattern has been defined for this value
                    (if (vector? values)
                      ; Validate all the values of the property according to this Datatype
                      (doseq [v values]
                        (validate-range-pattern v pattern ranges))
                      ; Validate the value according to the datatype
                      (validate-range-pattern values pattern ranges))))
                ; we are checking the range of an object property
                (if (vector? values)
                  (doseq [v values]
                    (validate-range-object v range property))
                  (validate-range-object values range property))))
            ; consider that all the values of the range are a union-of
            (println "@TODO Ranges union validation")))))))

(defn- validate-range-pattern
  [v pattern range]
  (if (string? v)
    (if (nil? (re-seq (java.util.regex.Pattern/compile pattern) v))
      (throw (IllegalStateException.
              (format "Value \"%s\" invalid according to the definition of the datatype \"%s\""  v range))))
    (if (and (map? v) (nil? (validate-map-properties v)))
      (if (nil? (re-seq (java.util.regex.Pattern/compile pattern) (get v 'value)))
        (throw (IllegalStateException.
                (format "Value \"%s\" invalid according to the definition of the datatype \"%s\""  v range)))))))

(defn- validate-map-properties
  [m]
  (doseq [[p v] m]
    (if (fn? @p)
      (@p v))))

This function validates the range of a property. If the value is a map, it first validates the map’s own assertions (its value, lang and datatype). Then, if a range is defined for the property, it checks whether the property is a datatype property or an object property. For a datatype property, it validates each value (a string, a map, or a vector of them) against the pattern of the datatype defined in the range; for an object property, it validates that each referenced record belongs to the range class. Ranges interpreted as a union are not validated yet.
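The union case is still a TODO in validate-rdfs-range. A minimal sketch of how the two interpretations differ, assuming a caller-supplied member? predicate that tells whether a value belongs to one range. Here, hypothetical regexes stand in for datatypes; nothing below comes from the original code except the :intersection-of metadata convention.

```clojure
(defn satisfies-ranges?
  "With ^:intersection-of metadata on the ranges vector, the value must belong to
   every range; otherwise the ranges are a union and one match is enough."
  [member? ranges value]
  (if (:intersection-of (meta ranges))
    (every? #(member? value %) ranges)
    (boolean (some #(member? value %) ranges))))

;; Hypothetical membership test: regexes standing in for datatypes.
(defn matches? [value re] (boolean (re-matches re value)))

(def digits-and-three-chars (with-meta [#"[0-9]+" #".{3}"] {:intersection-of true}))
(def digits-or-three-chars [#"[0-9]+" #".{3}"])
```

With these definitions, "12" satisfies the union (it is all digits) but not the intersection (it is not three characters long).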

Here is an example of a few records that have different datatype validation errors:

(def datatype-validation-test (r {uri "http://foo-bar.com/test/"
                                  rdf:type owl:+thing
                                  foo:phone "1-421-353-90573"
                                  iron:pref-label {value "Test cardinality validation"
                                                   lang "en"
                                                   datatype xsd:*string}}))
(def datatype-validation-test-2 (r {uri "http://foo-bar.com/test/"
                                  rdf:type owl:+thing
                                  foo:phone "1-421-353-9057"
                                  iron:pref-label {value "Test datatype validation"
                                                   lang "en"
                                                   datatype "not-a-datatype"}}))

(def xsd:*string-not-a-datatype)

(def datatype-validation-test-3 (r {uri "http://foo-bar.com/test/"
                                    rdf:type owl:+thing
                                    foo:phone "1-421-353-9057"
                                    iron:pref-label {value "Test datatype validation"
                                                     lang "en"
                                                     datatype xsd:*string-not-a-datatype}}))

(def datatype-validation-test-4 (r {uri "http://foo-bar.com/test/"
                                    rdf:type owl:+thing
                                    foo:phone [{value "1-421-353-9057"
                                                datatype xsd:*string-not-a-datatype}]
                                    iron:pref-label {value "Test datatype validation"
                                                     lang "en"
                                                     datatype xsd:*string}}))
user> (validate-resource datatype-validation-test)
IllegalStateException Value "1-421-353-90573" invalid according to the definition of the datatype "[{#'ontologies.core/xsp:pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$", #'rdf.core/uri "http://purl.org/ontology/foo#phone-number", #'ontologies.core/rdf:type "http://www.w3.org/TR/rdf-schema#Datatype"}]"  rdf.property/validate-range-pattern (property.clj:150)

user> (validate-resource datatype-validation-test-2)
IllegalStateException Provided value for datatype is not a datatype: "not-a-datatype"  rdf.core/datatype (core.clj:31)

user> (validate-resource datatype-validation-test-3)
IllegalStateException Provided value for datatype is not a datatype: "#'dataset-test.core/xsd:*string-not-a-datatype"  rdf.core/datatype (core.clj:30)

user> (validate-resource datatype-validation-test-4)
IllegalStateException Provided value for datatype is not a datatype: "#'dataset-test.core/xsd:*string-not-a-datatype"  rdf.core/datatype (core.clj:30)

As you can see, the validate-rdfs-range function is still incomplete regarding datatype validation. I am still updating it to make sure that all the existing XSD datatypes are validated. Custom datatypes also need better validation that takes their xsp:base type into account. The code to write is similar to the one I created for the Data Validation Tool (which is written in PHP).
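As one possible direction for the missing XSD support, here is a sketch that checks a literal against the lexical form of its datatype's base type before applying the pattern. The lookup table covers only three XSD types and all names are hypothetical; a real implementation would need the full set of XSD lexical rules.

```clojure
(def xsd "http://www.w3.org/2001/XMLSchema#")

(def xsd-lexical-checks
  "Hypothetical, deliberately partial table from XSD datatype URIs to
   checks on the lexical form of a literal."
  {(str xsd "string")  (constantly true)
   (str xsd "integer") #(boolean (re-matches #"[+-]?[0-9]+" %))
   (str xsd "boolean") #(contains? #{"true" "false" "1" "0"} %)})

(defn valid-literal?
  "A literal is valid for a custom datatype when it is a valid lexical form of
   the datatype's base type and, if a pattern is present, matches it entirely.
   Base types missing from the table are rejected."
  [{:keys [base pattern]} literal]
  (let [base-ok? (get xsd-lexical-checks base (constantly false))]
    (and (string? literal)
         (base-ok? literal)
         (or (nil? pattern)
             (boolean (re-matches (re-pattern pattern) literal))))))
```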

Range Validation When the Range Is a Class

Finally, let’s show how the range of an object property can be validated. Validating the range of an object property means making sure that the record referenced by the object property belongs to the class defined as the range of the property.

For example, consider a property foo:knows whose range specifies that all the values of foo:knows need to belong to the class umbel-rc:+person. This means that every value of the foo:knows property, in any record, needs to refer to a record of type umbel-rc:+person. If that is not the case, then there is a validation error.

Here is an example of a record where the foo:knows property is not properly used:

(def wrench (r {uri "http://foo-bar.com/test/bob"
               rdf:type umbel.ref/umbel-rc:+product
               iron:pref-label "The biggest wrench ever"}))

(def object-range-validation-test (r {uri "http://foo-bar.com/test/bob"
                                      rdf:type umbel.ref/umbel-rc:+person
                                      foo:knows wrench
                                      iron:pref-label {value "Test object range validation"
                                                       lang "en"
                                                       datatype xsd:*string}}))

Remember we defined the foo:knows property with the range of umbel-rc:+person. However, in the example, the reference is to a wrench record that is of type umbel-rc:+product. Thus, we get a validation error:

user> (validate-resource object-range-validation-test)
IllegalStateException The resource "http://umbel.org/umbel/rc/Product" referenced by the property "#'dataset-test.core/foo:knows" does not belong to the class "#'umbel.ref/umbel-rc:+person" as defined by the range of the property  rdf.property/validate-range-object (property.clj:142)

The function that validates the ranges of the object properties is defined as:

(defn- validate-range-object
  [r range property]
  (let [r (cond
            (var? r) (deref r)
            (map? r) r
            ; @TODO get the resource's description from a dataset index
            (string? r) {})
        uri (get (deref (get r #'ontologies.core/rdf:type)) #'rdf.core/uri)
        uri-ending (if (> (.lastIndexOf uri "/") -1)
                     (subs uri (inc (.lastIndexOf uri "/")))
                     "")
        ; ask the UMBEL web service for the super-classes of the resource's type
        ; (requires clj-http.client to be loaded in this namespace)
        super-classes (try
                        (read-string (:body (clj-http.client/get (str "http://umbel.org/ws/super-classes/" uri-ending)
                                                                 {:headers {"Accept" "application/clojure"}
                                                                  :throw-exceptions false})))
                        (catch Exception e
                          nil))
        range-uri (get @range #'rdf.core/uri)]
    (if-not (some #{range-uri} super-classes)
      (throw (IllegalStateException. (str "The resource \"" uri "\" referenced by the property \"" property "\" does not belong to the class \"" range "\" as defined by the range of the property"))))))

Normally, this kind of validation should be done using the descriptions of the loaded ontologies. However, for the purpose of this blog post, I used a different way to perform it. I purposefully used some UMBEL Reference Concepts as the types of the records I described. The object range validation function then uses the UMBEL super-classes web service endpoint to get the super-classes of a given class.
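A minimal sketch of the ontology-based alternative, assuming the rdfs:subClassOf assertions have already been loaded into a plain map from a class to its direct super-classes. The class names are hypothetical placeholders, not UMBEL concepts. The check computes the transitive closure of super-classes (including the class itself) and tests whether the range class is in it.

```clojure
(def sub-class-of
  "Hypothetical, locally loaded rdfs:subClassOf assertions:
   class -> set of its direct super-classes."
  {"ex:Person"   #{"ex:Agent"}
   "ex:Agent"    #{"ex:Thing"}
   "ex:Product"  #{"ex:Artifact"}
   "ex:Artifact" #{"ex:Thing"}})

(defn super-classes
  "The start class plus all of its direct and indirect super-classes.
   The seen set makes the traversal terminate even if the hierarchy has cycles."
  [hierarchy start]
  (loop [todo [start]
         seen #{}]
    (if-let [[c & more] (seq todo)]
      (if (contains? seen c)
        (recur more seen)
        (recur (into (vec more) (get hierarchy c)) (conj seen c)))
      seen)))

(defn in-range?
  "True when a resource typed with resource-class belongs to range-class."
  [hierarchy resource-class range-class]
  (contains? (super-classes hierarchy resource-class) range-class))
```

With this map, a record typed "ex:Product" would fail a range of "ex:Person", mirroring the wrench example, without any network call.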

This function looks up the type of the record(s) referenced by the foo:knows property. What needs to be validated is whether that type is the same as, or is a sub-class of, the class defined in the range of the foo:knows property.

In our example, the range is #'umbel-rc:+person. This means that the foo:knows property can only refer to umbel-rc:+person records. In the example where we have a validation error, the type of the wrench record is umbel-rc:+product. What the validation function does is to get the list of all the super classes of the umbel-rc:+product class, and check if it is a sub-class of the umbel-rc:+person class. In this case, it is not, thus an error is thrown.

What is interesting in this example is that the UMBEL super-classes web service endpoint returns the list of super-classes as Clojure code. We then use the read-string function to read that response into a Clojure list, which we manipulate as if it were part of the application’s code.
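One caution: clojure.core/read-string can evaluate code embedded with the #= reader macro when *read-eval* is true (its default), so (read-string "#=(+ 1 2)") returns 3. That is risky for a response coming from a remote endpoint. clojure.edn/read-string reads the same literal data without that capability. A minimal sketch, with a hypothetical response body and placeholder URIs:

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical response body from a super-classes endpoint (placeholder URIs).
(def response-body
  "[\"http://example.org/rc/Artifact\" \"http://example.org/rc/Thing\"]")

;; edn/read-string parses literal data only; it never evaluates code.
(def super-classes (edn/read-string response-body))

;; Same membership test as in validate-range-object.
(def person-in-range? (some #{"http://example.org/rc/Person"} super-classes))
```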

Conclusion

What is elegant with this kind of RDF/Clojure serialization is that validating the RDF data is the same as evaluating the underlying code (Data as Code). If the data is invalid, then exceptions are thrown and the validation process aborts.

One thing I have yet to investigate with such an RDF/Clojure serialization is how the semantics of the properties, classes and datatypes could be embedded into the RDF/Clojure records, so that we end up with stateful RDF records that embed their own semantics as of a specific point in time. This would mean that even if an ontology changes in the future, the records would still be valid according to the original ontology that was used to describe them at that point in time (when they were written, when they were emitted by a web service endpoint, etc.).

Also, as some readers pointed out about my previous blog post on this subject, the fact that I use vars to serialize the RDF triples means that the serialization won’t produce valid ClojureScript code, since vars don’t exist in ClojureScript. Paul Gearon proposed using keywords as the keys instead of vars, and then using a lookup index to call the functions, which would give the same effect as with vars. This avenue will be investigated as well and should be the topic of a future blog post about this RDF/Clojure serialization.
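A minimal sketch of the keyword-based alternative, assuming a hypothetical registry map from property keywords to validation functions (none of these names come from an existing library). Nothing in it depends on vars, and ex-info is available in both Clojure and ClojureScript.

```clojure
(def property-registry
  "Hypothetical lookup index from property keywords to validation functions."
  {:foo/phone
   (fn [v]
     (when-not (re-matches #"[0-9]-[0-9]{3}-[0-9]{3}-[0-9]{4}" v)
       (throw (ex-info "Invalid phone number" {:property :foo/phone :value v}))))
   :iron/pref-label
   (fn [v]
     (when-not (string? v)
       (throw (ex-info "Preferred label must be a string" {:property :iron/pref-label :value v}))))})

(defn validate-resource
  "Keyword-keyed counterpart of validate-resource: look each key up in the
   registry and call the function found there. Keys without an entry are skipped."
  [registry resource]
  (doseq [[property value] resource
          :let [validate (get registry property)]
          :when validate]
    (validate value))
  true)
```

The trade-off is one level of indirection: the record stays plain data, and the registry becomes the place where the ontology's semantics are attached.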


Revision of Serializing RDF Data as Clojure Code Specification

In my previous blog post, RDF Code: Serializing RDF Data as Clojure Code, I outlined a first version of what an RDF serialization written as Clojure code could look like. However, after working with this proposal for two weeks, I found that a few of my initial assumptions turned out to be bad design decisions in terms of Clojure code.

This blog post discusses these issues and updates the initial set of rules that I defined in my previous blog post. Going forward, I will use these revised rules to serialize RDF data as Clojure code.

What Was Wrong

After two weeks of using the previous set of serialization rules, and of developing all kinds of functions that use that code for UMBEL graph traversal and analysis, I found the following issues:

  1. Keys and values should be Vars
  2. Ontologies should all be in the same namespace (and not in different namespaces)
  3. The prefix/entity separator for the RDF resources should be a colon and not a slash

These are the three serialization rules that changed after working with the previous version of the proposal. Now, let’s see what caused these changes to occur.

Keys and Values as Vars

The major change is that when we serialize RDF data as Clojure maps, the keys, and the values that are not strings, should be Vars.

There are three things that I didn't properly evaluate when I first outlined the specification:

  1. The immutable nature of the Clojure data structures
  2. The dependency between ontologies
  3. The rule, enforced by Clojure, that namespace dependencies cannot be cyclical

In the previous proposal, every RDF property was a Clojure function, and these functions were also the keys of the Clojure maps used to serialize the RDF resources. That worked well, but the decision had a side effect: everything was fine until a property's function object (and thus its identity) changed.

The issue here is that when we work with Clojure maps, we are working with immutable data structures. Suppose I create an RDF record like this:

(def mike {uri "http://foo.com/datasets/people/mike"
           rdf/type foaf/+person
           iron/pref-label "Mike"
           foaf/knows ["http://foo.com/datasets/people/fred"]})

And suppose that, at some point, the RDF ontology file gets re-compiled (for example, when its namespace is reloaded): the rdf/type property will then be bound to a new function object. Now, if I create another record like this:

(def mike-2 {uri "http://foo.com/datasets/people/mike"
             rdf/type foaf/+person
             iron/pref-label "Mike"
             foaf/knows ["http://foo.com/datasets/people/fred"]})

that uses the same rdf/type property, these two records will refer to different rdf/type functions, since the function changed between the time I created the mike and the mike-2 resources. That may not look like an issue, since both functions do exactly the same thing. However, Clojure compares functions by identity, and many of the tasks that manipulate and query RDF data rely on comparing these keys (that is, these functions). Unexpected behaviors can then happen, and they may even look random.

The problem was that we were referring to the function itself, not to the Var that points to the function. Using the Var as the keys and values of the map fixes this inconsistency: all the immutable data structures we create refer to the Var, which points to the function. A Var keeps its identity when it is redefined, so dereferencing it always gives us the current function, regardless of whether the record was created before or after the re-compilation (before or after mike and/or mike-2). Here is what the mike record looks like with this modification:

(def mike {#'uri "http://foo.com/datasets/people/mike"
           #'rdf/type #'foaf/+person
           #'iron/pref-label "Mike"
           #'foaf/knows ["http://foo.com/datasets/people/fred"]})

We use the #' reader macro to specify that the Var itself is used as the key or value of the map, and not the function or other value referenced by that Var.
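The difference is easy to observe at a REPL. In this sketch, a second defn of the same name stands in for the ontology file being re-compiled (an assumption for illustration; reloading the namespace has the same effect on the function object). The rdf-type name is hypothetical. Clojure compares functions by identity, whereas the Var interned for a name stays the same object across redefinitions.

```clojure
;; Hypothetical stand-in for the rdf/type property function.
(defn rdf-type [] "http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

(def mike   {rdf-type "foaf:Person"})    ; key is the function object
(def mike-v {#'rdf-type "foaf:Person"})  ; key is the Var

;; Simulate a re-compilation: a new function object is bound to the same Var.
(defn rdf-type [] "http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

(def mike-2   {rdf-type "foaf:Person"})
(def mike-2-v {#'rdf-type "foaf:Person"})

(= mike mike-2)        ;; => false, the keys are two different functions
(= mike-v mike-2-v)    ;; => true, both keys are the same Var
((key (first mike-v))) ;; => "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
```

Invoking the Var (last line) calls whatever function it currently holds, which is why Var keys keep working after a reload.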

The second and third issues I mentioned are tightly related. In the RDF and OWL world, many ontologies re-use external ontologies to describe their own semantics. In some cases, an ontology A uses classes and properties from an ontology B, and ontology B uses classes and properties from ontology A: they cross-use each other. Such usage cycles exist in RDF and OWL, and they are not uncommon.

The problem is that, at first, I was planning to put each OWL ontology defined as Clojure code in its own Clojure namespace. If you are a Clojure coder, you can see the issue coming: if two ontologies cross-use each other, you have to create a cyclic dependency between their namespaces, and Clojure does not allow cyclic load dependencies. Everything works fine until that happens.

To overcome that issue, we have to put all the ontologies in the same namespace (like clojure.core). In my next blog post, which will focus on describing these ontologies, I will show how we can split them into multiple files while keeping them in the same namespace.

Since all the ontologies now live in the same namespace, we can no longer use Clojure's namespaced symbols to tell them apart. I therefore decided to use the convention of other RDF serializations, which is to delimit the ontology's prefix with a colon, like this:

(def mike {#'uri "http://foo.com/datasets/people/mike"
           #'rdf:type #'foaf:+person
           #'iron:pref-label "Mike"
           #'foaf:knows ["http://foo.com/datasets/people/fred"]})
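Clojure symbols may contain a colon (as long as it does not start or end the name and is not doubled), so names like rdf:type can be interned as ordinary Vars in a single namespace. A minimal sketch, assuming plain maps as stand-ins for the real property and class definitions, which are covered in a later post; everything here lives in the current namespace rather than in rdf.core and ontologies.core:

```clojure
;; Hypothetical stand-ins for rdf.core/uri and the ontologies.core terms.
(def uri nil)
(def rdf:type        {#'uri "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"})
(def iron:pref-label {#'uri "http://purl.org/ontology/iron#prefLabel"})
(def foaf:knows      {#'uri "http://xmlns.com/foaf/0.1/knows"})
(def foaf:+person    {#'uri "http://xmlns.com/foaf/0.1/Person"})

(def mike {#'uri "http://foo.com/datasets/people/mike"
           #'rdf:type #'foaf:+person
           #'iron:pref-label "Mike"
           #'foaf:knows ["http://foo.com/datasets/people/fred"]})

;; The prefix is simply part of the Var's name:
(:name (meta #'rdf:type))            ;; => rdf:type
;; Dereferencing the class Var gives the class definition:
(get @(get mike #'rdf:type) #'uri)   ;; => "http://xmlns.com/foaf/0.1/Person"
```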

Revision of the RDF Code Rules

Now let’s revise the set of rules that I defined in the previous blog post:

  1. An RDF resource is defined as a Clojure map where:
    1. Every key is a Var that points to a function
    2. Every value is a:
      1. string
        1. A string is considered a literal if the key is an owl:DatatypeProperty
        2. A string is considered a URI if the key is an owl:ObjectProperty
      2. map
        1. A map represents a literal if the value key is present
        2. A map represents a reference to another resource if the uri key is present
        3. A map is invalid if it has neither a uri nor a value key
      3. vector
        1. A vector refers to multiple values. Values of a vector can be strings, maps, symbols or Vars
      4. symbol
        1. A symbol can be created to simplify the serialization. However, these symbols have to reference a string or a Var object
      5. Var
        1. A Var references another entity
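Because symbols are resolved when the map is evaluated, a function consuming a resource only ever sees strings, maps, vectors and Vars. Here is a minimal sketch of how such a function might dispatch on a value according to these rules. The value-kind function and the stand-in Vars are hypothetical, and since whether a plain string is a literal or a URI depends on the property, it is reported as :string here.

```clojure
;; Hypothetical stand-ins for the rdf.core Vars and a class Var.
(def uri nil)
(def value nil)
(def lang nil)
(def owl:+thing {#'uri "http://www.w3.org/2002/07/owl#Thing"})

(defn value-kind
  "Classifies an (already evaluated) property value according to the rules."
  [v]
  (cond
    (var? v)    :entity-reference
    (string? v) :string
    (vector? v) (mapv value-kind v)
    (map? v)    (cond
                  (contains? v #'uri)   :resource-reference
                  (contains? v #'value) :literal
                  :else (throw (ex-info "Map has neither a uri nor a value key"
                                        {:value v})))
    :else (throw (ex-info "Unsupported value" {:value v}))))

(value-kind ["http://foo.com/datasets/people/bob"
             {#'uri "http://foo.com/datasets/people/mike"}
             {#'value "Fred" #'lang "en"}
             #'owl:+thing])
;; => [:string :resource-reference :literal :entity-reference]
```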

In addition to these rules, there are some more specific rules such as:

  1. The value of a uri key is always a string
  2. If the #'rdf:type key is not defined for a resource, then the resource is considered to be of type #'owl:+thing (since everything is at least an instance of the owl:Thing class in OWL)
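The default-type rule can be applied when reading a resource. A minimal sketch, where resource-types is a hypothetical helper and the Vars are stand-ins for the rdf.core and ontologies.core definitions:

```clojure
;; Hypothetical stand-ins for rdf.core/uri and the ontologies.core terms.
(def uri nil)
(def rdf:type nil)
(def owl:+thing   {#'uri "http://www.w3.org/2002/07/owl#Thing"})
(def foaf:+person {#'uri "http://xmlns.com/foaf/0.1/Person"})

(defn resource-types
  "Returns the resource's types as a vector of class Vars, falling back
  to #'owl:+thing when the resource has no #'rdf:type key."
  [resource]
  (let [t (get resource #'rdf:type #'owl:+thing)]
    (if (vector? t) t [t])))

(resource-types {#'uri "http://foo.com/datasets/people/mike"})
;; returns a vector containing only #'owl:+thing
```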

Finally, there are two additional naming conventions for classes and datatypes:

  1. The name of a class starts with a + sign, like: #'owl:+thing
  2. The name of a datatype starts with a * sign, like: #'xsd:*string
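Since the convention is carried by the Var's name, a tool can tell classes, datatypes and properties apart from the name alone. A minimal sketch; term-kind and the stand-in Vars are hypothetical:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in Vars following the naming conventions.
(def owl:+thing {})
(def xsd:*string {})
(def foaf:knows {})

(defn term-kind
  "Classifies an ontology Var from its name: the part after the prefix
  starts with + for classes and * for datatypes; anything else is a property."
  [v]
  (let [n     (name (:name (meta v)))
        local (last (str/split n #":" 2))]
    (case (first local)
      \+ :class
      \* :datatype
      :property)))

(map term-kind [#'owl:+thing #'xsd:*string #'foaf:knows])
;; => (:class :datatype :property)
```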

As you can see, the rules that govern the serialization of RDF data as Clojure code are minimal, and they should be simple to understand for someone who is used to Clojure and has tried writing a few resources in this format. Now, let's apply these rules with a series of examples.

Note 1: in the examples of this blog post, I am referring to Vars like #'uri, #'value, #'lang, #'datatype, etc. To make the rules simpler to read and understand, consider that these Vars are defined in the user's namespace. In reality, they are defined in the rdf.core namespace, which will be made publicly available later.

Note 2: All the property and class Vars have been defined in the same namespace. They should be included with :require or :use, like (:use [ontologies.core]), in the ns form of the Clojure source file that defines the RDF resource. We will discuss these namespaces in a subsequent blog post.

Revision of Serializing RDF Code in N-Triples

The serialize-ntriples function got modified to comply with the new set of rules:

(declare serialize-ntriples-map-value serialize-ntriples-string-value is-datatype-property?)

(defn serialize-ntriples
  "Serializes a resource map (with Var keys) as a string of N-Triples."
  [resource]
  (let [n3 (atom "")
        iri (get resource #'rdf.core/uri)]
    (doseq [[property prop-vals] resource]
      (let [property-uri (get (meta property) #'rdf.core/uri)]
        ; Don't do anything with the "uri" key
        (when (not= property #'rdf.core/uri)
          (if (vector? prop-vals)
            ; Here the value is a vector of maps, strings or Vars
            (doseq [v prop-vals]
              ; Vars (e.g. classes) are dereferenced to get the entity they point to
              (let [val (if (var? v) @v v)]
                (cond
                  ; The value of the vector is a map
                  (map? val)
                  (swap! n3 str (serialize-ntriples-map-value val iri property-uri))
                  ; The value of the vector is a string
                  (string? val)
                  (swap! n3 str (serialize-ntriples-string-value val iri property-uri property)))))
            (let [vals (if (var? prop-vals) @prop-vals prop-vals)]
              (cond
                ; The value of the property is a map
                (map? vals)
                (swap! n3 str (serialize-ntriples-map-value vals iri property-uri))
                ; The value of the property is some kind of literal
                (string? vals)
                (swap! n3 str (serialize-ntriples-string-value vals iri property-uri property))))))))
    @n3))

(defn- serialize-ntriples-map-value
  [m iri property-uri]
  (cond
    ; The value is a reference to another resource
    (some? (get m #'rdf.core/uri))
    (format "<%s> <%s> <%s> .\n" iri property-uri (get m #'rdf.core/uri))
    ; The value is some kind of literal, with an optional language tag or datatype
    (some? (get m #'rdf.core/value))
    (let [value (get m #'rdf.core/value)
          lang (if-let [l (get m #'rdf.core/lang)] (str "@" l) "")
          datatype (if-let [d (get m #'rdf.core/datatype)]
                     (str "^^<" (get @d #'rdf.core/uri) ">")
                     "")]
      (format "<%s> <%s> \"\"\"%s\"\"\"%s%s .\n" iri property-uri value lang datatype))))

(defn- serialize-ntriples-string-value
  [s iri property-uri property]
  (if (is-datatype-property? property)
    ; The property referring to this value is an owl:DatatypeProperty
    (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri s)
    ; The property referring to this value is an owl:ObjectProperty
    (format "<%s> <%s> <%s> .\n" iri property-uri s)))

(defn is-datatype-property?
  "Returns true if the property Var's rdf:type metadata is owl:DatatypeProperty."
  [property]
  (= (some-> property
             meta
             (get #'ontologies.core/rdf:type)
             deref
             (get #'rdf.core/uri))
     (-> #'ontologies.core/owl:+datatype-property
         deref
         (get #'rdf.core/uri))))
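is-datatype-property? expects each property Var to carry its URI and its rdf:type in its metadata, keyed by the #'rdf.core/uri and #'ontologies.core/rdf:type Vars. The ontologies themselves are only covered in a later post, so here is a hedged sketch of one way such metadata could be attached with alter-meta!. All the Vars below are hypothetical stand-ins defined in the current namespace, and datatype-property? is a local copy of the same check written against them.

```clojure
;; Hypothetical stand-ins for rdf.core/uri, ontologies.core/rdf:type and the OWL classes.
(def uri nil)
(def rdf:type nil)
(def owl:+datatype-property {#'uri "http://www.w3.org/2002/07/owl#DatatypeProperty"})
(def owl:+object-property   {#'uri "http://www.w3.org/2002/07/owl#ObjectProperty"})

;; Property functions whose Vars carry their RDF description as metadata.
(defn foaf:skypeID "Hypothetical property function." [v] v)
(alter-meta! #'foaf:skypeID assoc
             #'uri "http://xmlns.com/foaf/0.1/skypeID"
             #'rdf:type #'owl:+datatype-property)

(defn foaf:knows "Hypothetical property function." [v] v)
(alter-meta! #'foaf:knows assoc
             #'uri "http://xmlns.com/foaf/0.1/knows"
             #'rdf:type #'owl:+object-property)

(defn datatype-property?
  "Same check as is-datatype-property?, written against the local stand-ins."
  [property]
  (= (some-> property meta (get #'rdf:type) deref (get #'uri))
     (get @#'owl:+datatype-property #'uri)))

(datatype-property? #'foaf:skypeID) ;; => true
(datatype-property? #'foaf:knows)   ;; => false
```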

Serializing a RDF Resource

Now let’s serialize a new RDF resource using the new set of rules:

(def fred {#'uri "http://foo.com/datasets/people/fred"
           #'rdf:type [#'foaf:+person #'owl:+thing]
           #'iron:pref-label "Fred"
           #'iron:alt-label {#'value "Frederick"
                             #'lang "en"}
           #'foaf:skypeID {#'value "frederick.giasson"
                           #'datatype #'xsd:*string}
           #'foaf:knows [{#'uri "http://foo.com/datasets/people/bob"}
                         mike
                         "http://foo.com/datasets/people/teo"]})

One drawback of these new (though essential) rules is that they make RDF resources more cumbersome to write, because of the heavy use of the #' reader macro.

On the other hand, they may look more familiar to people used to other RDF serializations, because a colon, rather than a slash, now separates the ontology prefix from the rest of the URI.

What we have above is how the RDF data is represented in Clojure. However, we can make this serialization less verbose by creating a macro that rewrites the input map and automatically injects the #' reader macro into the map structures that define the RDF resources.

Here is the r macro (“r” stands for Resource) that does exactly this:

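(require '[clojure.walk :as walk])

(defmacro r
  "Walks the resource map and wraps every symbol that does not evaluate
  to a string in (var ...), which is what the #' reader macro expands to."
  [form]
  (walk/postwalk
   (fn [x]
     (if (and (symbol? x)
              (not (string? (eval x))))
       `(var ~x)
       x))
   form))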

Then you can use it to define all the RDF resources you want to create:

(def fred (r {uri "http://foo.com/datasets/people/fred"
               rdf:type [foaf:+person owl:+thing]
               iron:pref-label "Fred"
               iron:alt-label {value "Frederick"
                               lang "en"}
               foaf:skypeID {value "frederick.giasson"
                             datatype xsd:*string}
               foaf:knows [{uri "http://foo.com/datasets/people/bob"}
                            mike
                            "http://foo.com/datasets/people/teo"]})

That structure is equivalent to the previous one: the r macro rewrites the input map, wrapping each symbol that does not evaluate to a string in (var ...), which is what #' expands to, before the resource's Var is created.
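The equivalence can be checked directly. The sketch below is self-contained, so the r macro is repeated; the other Vars are hypothetical placeholders for the rdf.core and ontologies.core definitions. Note that mike-uri evaluates to a string, so the macro leaves that symbol alone, while every other symbol is wrapped in (var ...).

```clojure
(require '[clojure.walk :as walk])

(defmacro r
  "Wraps every symbol that does not evaluate to a string in (var ...)."
  [form]
  (walk/postwalk
   (fn [x]
     (if (and (symbol? x) (not (string? (eval x))))
       `(var ~x)
       x))
   form))

;; Hypothetical stand-ins for the rdf.core and ontologies.core Vars.
(def uri nil)
(def rdf:type nil)
(def iron:pref-label nil)
(def foaf:+person {})
(def mike-uri "http://foo.com/datasets/people/mike")

(def mike-r (r {uri mike-uri
                rdf:type foaf:+person
                iron:pref-label "Mike"}))

(def mike-explicit {#'uri "http://foo.com/datasets/people/mike"
                    #'rdf:type #'foaf:+person
                    #'iron:pref-label "Mike"})

(= mike-r mike-explicit) ;; => true
```

Because the macro calls eval at expansion time, every symbol in the map must already resolve when the r form is compiled.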

The r macro makes the serialization much simpler to write, and the result is more natural to people used to other RDF serializations.

Conclusion

I used the initial specification while creating a new series of web services for the UMBEL project. This heavy usage of this kind of RDF data led me to discover the issues covered in this blog post. Now that these issues are resolved, I am confident that we can move forward with the series of blog posts that covers how (and why!) to use Clojure code to serialize RDF data.

The next blog post will cover how to manage the ontologies used to instantiate these RDF resources.

Clojure, Programming, Semantic Web, Structured Dynamics

RDF Code: Serializing RDF Data as Clojure Code

RDF Code is a specification to serialize RDF data as Clojure code 1. This blog post introduces the first version of this new RDF serialization format. I will outline the rules that specify how such RDF data should be serialized using the Clojure programming language.

This specification may change over time. However, this is the specification that will be used for the future blog posts that I will write about this subject, and for the code that will be released.

I also welcome feedback and proposals to make this serialization easier to use, simpler to define and cleaner.

With this serialization, we write RDF resources as Clojure maps. This is not about defining a DSL to manipulate this data, but about defining the core Clojure structure that will be manipulated by Clojure functions and applications. Eventually, one or more DSLs could be created to help users and developers use this RDF data in their Clojure applications, but this is not the current focus.

A Complete RDF Resource

Before outlining all the rules to create well-formed RDF data as Clojure code, let's take a look at a resource that uses all the serialization features 2. Note that this is as complex as it gets. Below, we will see how to normalize the usage of the serialization rules to end up with clean and easy to read RDF data as Clojure code.

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label "Fred"
           iron/alt-label {value "Frederick"
                           lang "en"}
           foaf/skypeID {value "frederick.giasson"
                         datatype xsd/*string}
           foaf/knows [{uri "http://foo.com/datasets/people/bob"}
                       mike
                       "http://foo.com/datasets/people/teo"]})

This code shows how to:

  • Serialize a single resource as a Clojure map
  • Define a URI for that resource
  • Define one or more rdf:type values for a resource
  • Define one or more values for an owl:DatatypeProperty
  • Define one or more values for an owl:ObjectProperty
  • Define the language of a literal
  • Define the datatype of a literal

As you can see, such an RDF serialization format is expressive enough to express any RDF triple. It also has syntactic rules that help with reading and writing RDF data in that format.

Note that since this RDF data is also Clojure code, the serialization format has been highly influenced by Clojure's own syntax and coding style 3. RDF data serialized using this format also needs to be valid Clojure code. Now, let's outline the rules that govern the creation of such RDF data, and then explain them using simple RDF code examples.

RDF Code Rules

Here is the list of all the rules that govern the creation of RDF data serialized as Clojure code:

  1. An RDF resource is defined as a Clojure map where:
    1. Every key is a symbol that references a function
    2. Every value is a:
      1. string
        1. A string is considered a literal if the key is an owl:DatatypeProperty
        2. A string is considered a URI if the key is an owl:ObjectProperty
      2. map
        1. A map represents a literal if the value key is present
        2. A map represents a reference to another resource if the uri key is present
        3. A map is invalid if it has neither a uri nor a value key
      3. vector
        1. A vector refers to multiple values. Values of a vector can be strings, maps or symbols
      4. symbol
        1. A symbol can be created to simplify the serialization. However, these symbols have to reference a string or a map object

In addition to these rules, there are some more specific rules such as:

  1. The value of a uri key is always a string
  2. If the rdf/type key is not defined for a resource, then the resource is considered to be of type owl:Thing (since everything is at least an instance of the owl:Thing class in OWL)

Finally, there are two additional naming conventions for classes and datatypes:

  1. The name of a class starts with a + sign, like: owl/+thing
  2. The name of a datatype starts with a * sign, like: xsd/*string

As you can see, the rules that govern the serialization of RDF data as Clojure code are minimal, and they should be simple to understand for someone who is used to Clojure and has tried writing a few resources in this format. Now, let's apply these rules with a series of examples.

Note 1: in the examples of this blog post, I am referring to symbols like uri, value, lang, datatype, etc. To make the rules simpler to read and understand, consider that these symbols are defined in the user's namespace. In reality, they are defined in the rdf.core namespace, which will be made publicly available later.

Note 2: I am also referring to namespaced symbols like rdf/type, iron/pref-label, etc. These symbols are defined in their respective namespaces, which are required with something like (:require [ontologies.rdf :as rdf] [ontologies.iron :as iron]) in the ns form of the Clojure source file that defines the RDF resource. We will discuss these namespaces in a subsequent blog post.

Serializing RDF Code in N-Triples

Before listing the examples, let's define a Clojure function that we will use to convert the RDF code to N-Triples 4. N-Triples is simply a list of <subject> <predicate> <object> triples that describe the RDF resources. It is the simplest, and most verbose, RDF serialization that currently exists. The serialize-ntriples function takes a Clojure map that represents an RDF resource and returns a string containing the serialized N-Triples.

What is important with this code is that it shows how the rules outlined above are implemented to serialize RDF code as N-Triples. Similar serializer functions could be created for the Turtle and RDF/XML serializations. Note that you won't be able to use the serialize-ntriples function yet, because you are missing the ontology files. I will make them available in a subsequent blog post that will explain how properties, classes and datatypes are created and used in this context.

(declare serialize-ntriples-map-value serialize-ntriples-string-value is-datatype-property?)

(defn serialize-ntriples
  "Serializes a resource map (with property-function keys) as N-Triples."
  [resource]
  (let [n3 (atom "")
        iri (get resource rdf.core/uri)]
    (doseq [[property prop-vals] resource]
      (let [property-uri (get (meta property) rdf.core/uri)]
        ; Don't do anything with the "uri" key
        (when (not= property rdf.core/uri)
          (if (vector? prop-vals)
            ; Here the value is a vector of maps or strings
            (doseq [v prop-vals]
              (cond
                ; The value of the vector is a map
                (map? v)
                (swap! n3 str (serialize-ntriples-map-value v iri property-uri))
                ; The value of the vector is a string
                (string? v)
                (swap! n3 str (serialize-ntriples-string-value v iri property-uri property))))
            (cond
              ; The value of the property is a map
              (map? prop-vals)
              (swap! n3 str (serialize-ntriples-map-value prop-vals iri property-uri))
              ; The value of the property is some kind of literal
              (string? prop-vals)
              (swap! n3 str (serialize-ntriples-string-value prop-vals iri property-uri property)))))))
    @n3))

(defn- serialize-ntriples-map-value
  [m iri property-uri]
  (cond
    ; The value is a reference to another resource
    (some? (get m rdf.core/uri))
    (format "<%s> <%s> <%s> .\n" iri property-uri (get m rdf.core/uri))
    ; The value is some kind of literal, with an optional language tag or datatype
    (some? (get m rdf.core/value))
    (let [value (get m rdf.core/value)
          lang (if-let [l (get m rdf.core/lang)] (str "@" l) "")
          datatype (if-let [d (get m rdf.core/datatype)]
                     (str "^^<" (get d rdf.core/uri) ">")
                     "")]
      (format "<%s> <%s> \"\"\"%s\"\"\"%s%s .\n" iri property-uri value lang datatype))))

(defn- serialize-ntriples-string-value
  [s iri property-uri property]
  (if (is-datatype-property? property)
    ; The property referring to this value is an owl:DatatypeProperty
    (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri s)
    ; The property referring to this value is an owl:ObjectProperty
    (format "<%s> <%s> <%s> .\n" iri property-uri s)))

(defn is-datatype-property?
  "Returns true if the property's rdf/type metadata is owl:DatatypeProperty."
  [property]
  (= (get (get (meta property) rdf/type) rdf.core/uri)
     (get ontologies.owl/+datatype-property rdf.core/uri)))

The Simplest Resource

Here is the simplest resource that can be written:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type "http://xmlns.com/foaf/0.1/Person"
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

It produces these triples:

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

In this example, all the references to classes and to other resources are made using strings that represent URIs, and all the literal values are plain strings as well. The vector is composed of a list of URIs, without mixing different types of values.

Using Class Symbols

In this example, we will use a symbol that references a class resource defined in an ontology:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type foaf/+person
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

serialize-ntriples will generate the same set of triples as in the previous example.

What is interesting with this new example is that it is more appealing to human readers. Instead of a full URI string, we see a symbol, which is more pleasant to the eye.

But the real benefit is not that it is more pleasant to the eye, but that it actually refers to something: a class resource. This means that we have a docstring for that class, that we can check the code that describes the class to see all its characteristics, that we can auto-complete it in an IDE, etc.
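For example, a class Var can carry a docstring that REPL tools and IDEs can display. A minimal sketch; the docstring, the +person definition and the uri stand-in are hypothetical, since the actual ontology definitions are only covered in a later post:

```clojure
;; Hypothetical stand-in for rdf.core/uri, used here as a map key.
(def uri :uri)

(def +person
  "The class of people (foaf:Person). Hypothetical docstring for this sketch."
  {uri "http://xmlns.com/foaf/0.1/Person"})

(:doc (meta #'+person)) ; the docstring, as shown by (clojure.repl/doc +person)
(get +person uri)       ;; => "http://xmlns.com/foaf/0.1/Person"
```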

Multiple Types

It is possible to define multiple types for a resource. The only thing you have to do is to use a vector as the value of the rdf/type key:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

This code will generate the following triples:

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

Specifying a Datatype for a Literal

It is also possible to define a datatype for a literal. What you have to do is to use a map with a value and a datatype key:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            datatype xsd/*string}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

The triples that will be generated are:

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred"""^^<http://www.w3.org/2001/XMLSchema#string> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

Specifying a Language Tag for a Literal

It is also possible to define a language tag for a literal. What you have to do is to use a map with a value and a lang key:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})

This code will produce the following triples:

<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred"""@en .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .

Referring to a URI Using a Map

It is possible to make it explicit that a string value is a URI reference by using a map with the uri key. Some people may prefer that approach because it makes the URI explicit without having to know the nature of the key (i.e. whether the property is a datatype or an object property):

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows [{uri "http://foo.com/datasets/people/bob"}
                       {uri "http://foo.com/datasets/people/mike"}
                       {uri "http://foo.com/datasets/people/teo"}]})

The same triples as in the example above will be generated.

Mixing Values in a Vector

It is possible to mix values in a vector. In the following example, we use both plain strings and maps to refer to URIs. The rules permit this kind of mixing within a vector, and software that manipulates this kind of RDF data should be agnostic to it:

(def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       {uri "http://foo.com/datasets/people/mike"}
                       "http://foo.com/datasets/people/teo"]})

The same triples as in the example above will be generated.
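One way for a consuming function to stay agnostic is to normalize every value of an object property into the same shape before processing it. A minimal sketch with a hypothetical normalize-object-values function and a keyword stand-in for uri:

```clojure
;; Hypothetical stand-in for rdf.core/uri, used as a map key.
(def uri :uri)

(defn normalize-object-values
  "Turns every value of an owl:ObjectProperty into a {uri ...} map,
  whether it was written as a URI string or already as such a map."
  [v]
  (mapv (fn [x] (if (string? x) {uri x} x))
        (if (vector? v) v [v])))

(normalize-object-values ["http://foo.com/datasets/people/bob"
                          {uri "http://foo.com/datasets/people/mike"}
                          "http://foo.com/datasets/people/teo"])
;; => [{:uri "http://foo.com/datasets/people/bob"}
;;     {:uri "http://foo.com/datasets/people/mike"}
;;     {:uri "http://foo.com/datasets/people/teo"}]
```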

Using Symbols to Get Cleaner Code

Because of the way Clojure works, you can define new symbols that refer to the values to add to this structure. Let's go wild and define a symbol for each value of the resource we are describing:

(def fred-uri "http://foo.com/datasets/people/fred")

(def foaf_person foaf/+person)

(def owl_thing owl/+thing)

(def fred-label {value "Fred"
                 lang "en"})

(def fred-skype-id "frederick.giasson")

(def bob-uri "http://foo.com/datasets/people/bob")

(def mike-uri {uri "http://foo.com/datasets/people/mike"})

(def teo-uri "http://foo.com/datasets/people/teo")

(def fred {uri fred-uri
           rdf/type [foaf_person owl_thing]
           iron/pref-label fred-label
           foaf/skypeID fred-skype-id
           foaf/knows [bob-uri
                       mike-uri
                       teo-uri]})

The generated triples are the same as in the example above.

When the map is evaluated, Clojure resolves each symbol to the value of the var it names. The reader only reads the symbols; the substitution happens at evaluation time.

This is why this works, and why it is still valid RDF code according to the rules outlined above. The serialize-ntriples function is not even aware that this data structure was defined this way. By the time the map reaches it as input, the map has already been evaluated. What it receives are the values referenced by these symbols, not the symbols themselves.
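A quoted form makes the difference visible between the code as written and the data the function receives. In this sketch, `uri` is again hypothetically bound to the keyword `:uri`, and `:knows` is a stand-in key for foaf/knows:

```clojure
;; Hypothetical stand-ins for vars that the real serialization defines elsewhere.
(def uri :uri)
(def fred-uri "http://foo.com/datasets/people/fred")
(def bob-uri "http://foo.com/datasets/people/bob")

;; Evaluated when `fred` is defined: keys and values are already resolved.
(def fred {uri fred-uri
           :knows [bob-uri]})

(get fred :uri)
;; => "http://foo.com/datasets/people/fred"

;; The quoted form is the same code kept as data: it still holds symbols.
(def fred-form '{uri fred-uri :knows [bob-uri]})

(symbol? (get fred-form 'uri))
;; => true

;; Evaluating the form resolves the symbols, producing the same map as `fred`.
(= fred (eval fred-form))
;; => true
```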

This mechanism can be leveraged to make the RDF code even more readable if some patterns are repeated constantly in a dataset file you are creating.

Conclusion

This is all good, but why yet another serialization for RDF? As I started to outline in my two previous blog posts on this topic, this is not just about serializing RDF in another format. It is about having RDF data that can be evaluated as code (in this case, Clojure code).

Considering RDF properties as functions opens up a world of possibilities. First, evaluating and compiling this kind of RDF code gives you a data structure that can validate itself. It does so according to how its properties are defined in the OWL ontologies that specify their behavior and semantics.

This blog post focused on how to serialize instance data. In the next blog posts, I will cover:

  • How to reify triples using this serialization
  • How to create the classes, properties and datatypes used by the serialization
  • How to validate the data structure
  • How to manage ontologies as Clojure packages

Feel free to comment on this blog post and propose changes to the serialization format or the serialize-ntriples function.

Clojure, Emacs, Semantic Web, Structured Dynamics

Investigating Options to Serialize RDF data as Clojure Code

My initial intuition is that I could serialize RDF data as Clojure code in which the OWL semantics of the RDF data are embedded, in some way. I want to test how well the general saying of homoiconic languages, “Data as Code. Code as Data.”, fits with RDF and OWL.

Another intuition I have is the concept of Portable Data: stateful RDF data that embeds its own semantics. It would not rely on external ontologies, which are mostly stateless since we can rarely rely on their stated versions. My intuition is that RDF data could be serialized so that it is self-aware of its own semantics. It would know how it can be interpreted, how it can be used, and how it should be validated. The idea is to end up with Portable Data snippets that could be exchanged between systems without requiring prior, or post, schemas (ontologies) to interpret that information. Web service endpoints such as OSF, or any other kind of application, could then emit such Portable Data structures without any subsequent ontology analysis on their part.

However, before implementing and demonstrating these intuitions, the first step is to check what such an RDF serialization may look like. This is the goal of this blog post.

Serializing RDF Data as Clojure Code

Where to start? There are probably multiple ways to do this. Do we want to use a map, a struct map, a record, or…? At least initially, I wanted a basic data structure that would give me the flexibility I need to serialize RDF data. It had to be a core structure that existing Clojure developers could easily manipulate with the Clojure functions and techniques they are used to.

The collection I chose to start with is the map. This key/value structure is ideal for serializing RDF data. It looks like JSON, but is even simpler since its syntax requires neither commas nor colons.

The crux of the map structure is that its keys can be any Clojure value: keywords, symbols, strings, characters, booleans, numbers, and even collections or nil. Regular expressions can technically be used as keys too, but they make impractical keys: two patterns are only equal when they are the same object. What should be stressed here is that symbols can be a lot of different things, since they are names for vars, functions, etc.

This opens a world of possibilities for serializing RDF data as Clojure code. Since symbols can name virtually anything, the keys of the map can refer to virtually anything, which is almost too good to be true!
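As a small illustration, keys of different types can coexist in one map. The symbol key has to be quoted here. Unquoted, `iron/prefLabel` would be evaluated and resolved as a var, which is precisely the behavior the symbol-based serializations below rely on. The values are arbitrary examples:

```clojure
;; Keys of different types can live side by side in the same map.
;; The symbol key is quoted; otherwise it would be resolved as a var.
(def m {:iron/prefLabel "keyword key"
        'iron/prefLabel "symbol key (quoted, so it is not evaluated)"
        "http://purl.org/ontology/iron#prefLabel" "string key"
        42 "number key"})

(get m :iron/prefLabel)  ;; => "keyword key"
(get m 'iron/prefLabel)  ;; => "symbol key (quoted, so it is not evaluated)"
(count m)                ;; => 4
```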

In the remainder of this blog post, we will investigate different ways to serialize RDF data as Clojure code. These are the initial tests I ran to check my intuitions. All of them work, but only the last one really opens up a world of possibilities and enables me to implement my early intuitions.

Quick Introduction to RDF Data

RDF is nothing more than a set of triples of the form:

  • <subject> <predicate> <object>

Here, the <subject> is the thing (resource, record, entity, etc.) being described, and the <predicate> is the property (attribute, etc.) that describes the subject. The <object> is the value of the predicate, which can be a reference to another subject, a literal value, etc.

Each <subject> should have at least one type. A type is nothing more than a class of things defined in an RDFS schema or an OWL ontology.

If you wire these triples together, you get a directed graph, which we often refer to as a dataset. It is as simple as that. However, I won’t claim that RDF is necessarily simple, since its expressivity (a double-edged sword) can make things much more complex.
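To make the idea concrete, a set of triples can itself be plain Clojure data. The sketch below wires [subject predicate object] vectors into a nested map that represents the directed graph. The names `->graph` and `graph` are hypothetical:

```clojure
;; Each triple is a [subject predicate object] vector; URIs are plain strings.
(def knows "http://xmlns.com/foaf/0.1/knows")

(def triples
  [["http://foo.com/datasets/people/fred" knows "http://foo.com/datasets/people/bob"]
   ["http://foo.com/datasets/people/fred" knows "http://foo.com/datasets/people/teo"]
   ["http://foo.com/datasets/people/bob" knows "http://foo.com/datasets/people/mike"]])

(defn ->graph
  "Wire triples into a directed graph: subject -> predicate -> set of objects."
  [ts]
  (reduce (fn [g [s p o]] (update-in g [s p] (fnil conj #{}) o)) {} ts))

(def graph (->graph triples))

;; Follow the foaf:knows edges out of Fred: returns the set of Bob's and Teo's URIs.
(get-in graph ["http://foo.com/datasets/people/fred" knows])
```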

The semantics of the data lie in the <predicate> and the type. The predicate and the type tell us how to interpret and use the data, and they are also what is used to validate it, for example. That is exactly where Clojure, and its map structure, can help us create this kind of portable data.

As you will see below, the serialization of RDF data as Clojure code looks like structJSON, the RDF serialization format developed by Structured Dynamics and used at the core of the Open Semantic Framework. This is not a coincidence. That simple structure has been highly effective for serializing and transmitting RDF information between OSF web services and other applications, such as OSF for Drupal and various JavaScript applications.

Leveraging Serialization’s Hierarchy to Create Triples

Before jumping into Clojure, let’s take a quick look at a really simple structJSON record. What I want to show you is how triples can be extracted from such a data structure. The same principle will be used to extract triples from the Clojure serialization:

"subject": [
  {
    "uri": "http://dataset1.com/record-a/",
    "predicate": [
      {
        "rdf:type": "http://umbel.org/umbel/rc/Person"
      },
      {
        "iron:prefLabel": "Bob"
      },
      {
        "foaf:knows": {
          "uri": "http://dataset2.com/record-b/"
        }
      }
    ]
  }
]

What we leverage here to extract triples is the hierarchical nature of the serialization. The "subject" key introduces an array of objects. Each object has a "uri" key, which is the identifier (the <subject> of a triple). The "predicate" key then introduces a series of attributes for that record. Each element of that array is a predicate whose key is the prefixed version of the RDF <predicate>, followed by the value for that predicate. The documentation describes one more level: the reification of the triple, used to define extra information related to a triple statement. It is not to be confused with Clojure’s reification mechanism. That structJSON code would produce the following N-Triples:

<http://dataset1.com/record-a/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://umbel.org/umbel/rc/Person> .
<http://dataset1.com/record-a/> <http://purl.org/ontology/iron#prefLabel> "Bob" .
<http://dataset1.com/record-a/> <http://xmlns.com/foaf/0.1/knows> <http://dataset2.com/record-b/> .
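The same extraction can be sketched in Clojure on a trimmed-down copy of the record, with its type statement omitted for brevity. This is an illustrative walk of the hierarchy, not OSF code. `record->triples` is a hypothetical name, and the reification level is ignored:

```clojure
;; A trimmed-down version of the record above as Clojure data (string keys, as in JSON).
(def record
  {"subject"
   [{"uri" "http://dataset1.com/record-a/"
     "predicate" [{"iron:prefLabel" "Bob"}
                  {"foaf:knows" {"uri" "http://dataset2.com/record-b/"}}]}]})

(defn record->triples
  "Walk the hierarchy: each subject's \"uri\" is the triple subject, and each
  key/value pair under \"predicate\" yields one triple. A nested map with a
  \"uri\" key is a reference to another resource; anything else is a literal."
  [rec]
  (vec
   (for [subject  (get rec "subject")
         pred-map (get subject "predicate")
         [p v]    pred-map]
     [(get subject "uri")
      p
      (if (map? v) {:uri (get v "uri")} v)])))

(record->triples record)
;; => [["http://dataset1.com/record-a/" "iron:prefLabel" "Bob"]
;;     ["http://dataset1.com/record-a/" "foaf:knows" {:uri "http://dataset2.com/record-b/"}]]
```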

Serializing RDF using Maps and Keyword Keys

The most intuitive way to serialize RDF data as Clojure maps would be to create a map where all the keys are keywords. An initial test would be:

(def resource {:uri "http://foo.com/1"
               :rdf/type [foaf/Person owl/Thing]
               :iron/prefLabel {:value "Fred"
                                :lang nil
                                :datatype xsd/string}
               :foaf/knows [{:uri "http://foo.com/2"
                             :rei [{:iron/prefLabel [{:value "Bob"
                                                      :lang "en"}
                                                     {:value "Robert"
                                                      :lang "fr"}]}]}
                            {:uri "http://foo.com/3"
                             :rei [:iron/prefLabel "Mike"]}]})

What we did here is define a map bound to the symbol resource. This map is composed of a series of keys and values, where the keys are keywords and the values can be strings, vectors or maps. The basic serialization rules are:

  • Each map has a :uri key that defines the URI of the resource being described
  • Each property key is a namespaced keyword whose namespace is the prefix of the ontology in which the <predicate> or type is defined
  • If the predicate is an owl:DatatypeProperty, then its value can be:
    • A vector with one or more maps and/or strings
    • A map which can have four keys:
      • :value which specifies the actual string value
      • :lang which specifies the language of that string
      • :datatype which specifies the datatype of the string
      • :rei which specifies reification statements for the triple
    • A string which is the actual value, without any additional information about that literal
  • If the predicate is an owl:ObjectProperty, then its value can be:
    • A vector with one or more maps, strings and/or symbols
    • A map which can have two keys:
      • :uri which specifies the actual URI of the referenced resource
      • :rei which specifies reification statements for the triple
    • A string which represents the URI of the resource to be referenced
    • A symbol which represents the URI string of the resource to be referenced
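The three accepted forms for a datatype property value can be folded into a single canonical shape before processing. A minimal sketch follows; the function name `normalize-literals` is hypothetical:

```clojure
(defn normalize-literals
  "Hypothetical helper: turn any allowed datatype-property value (a string,
  a {:value ...} map, or a vector mixing both) into a vector of maps."
  [v]
  (let [one (fn [x]
              (cond
                (string? x) {:value x}
                (map? x)    x
                :else       (throw (ex-info "Unsupported literal value" {:value x}))))]
    (if (vector? v) (mapv one v) [(one v)])))

(normalize-literals "Fred")
;; => [{:value "Fred"}]

(normalize-literals [{:value "Bob" :lang "en"} "Robert"])
;; => [{:value "Bob", :lang "en"} {:value "Robert"}]
```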

Namespacing Keywords

One important notion is that the keywords used as map keys are namespaced: each keyword is qualified by a namespace. This is an essential requirement for an RDF serialization. We reuse multiple ontologies that may share the same name for some of their predicates, and we don’t want these keywords to clash. That is why, by convention, we qualify each of these keywords with its ontology’s namespace. An ontology namespace is defined as the prefix used to refer to the ontology. For example, the Bibliographic Ontology’s prefix is bibo, so :bibo/shortTitle would be the key referring to the property http://purl.org/ontology/bibo/shortTitle.
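Because a keyword carries its namespace, turning a key back into a full property URI only needs a table from prefixes to ontology URIs. In this sketch, the `prefixes` map and the `expand` function are assumptions. The two base URIs are the ones used in this post:

```clojure
;; Hypothetical prefix table mapping ontology prefixes to their base URIs.
(def prefixes {"bibo" "http://purl.org/ontology/bibo/"
               "foaf" "http://xmlns.com/foaf/0.1/"})

(defn expand
  "Expand a namespaced keyword such as :bibo/shortTitle into a full URI,
  using the keyword's namespace as the ontology prefix."
  [k]
  (if-let [base (get prefixes (namespace k))]
    (str base (name k))
    (throw (ex-info "Unknown prefix" {:key k}))))

(expand :bibo/shortTitle)
;; => "http://purl.org/ontology/bibo/shortTitle"

(expand :foaf/knows)
;; => "http://xmlns.com/foaf/0.1/knows"
```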

Usage

Now let’s see how we can work with such a structure in Clojure:

;; Assumes `resource` (defined above) and a var owl/Thing defined elsewhere
(require '[clojure.data :refer [diff]])

;; Return the values of the rdf/type property
(:rdf/type resource)
(resource :rdf/type)
(get resource :rdf/type)

;; Return all the keys (the :uri and the properties) of the resource
(keys resource)

;; Get the URI of the first person known by Fred
(:uri (first (:foaf/knows resource)))

;; Get the French name of the first person known by Fred
(:value (second (:iron/prefLabel (first (:rei (first (:foaf/knows resource)))))))

;; Update the name of Fred to Frederick
(update-in resource [:iron/prefLabel :value] str "erick")

;; Output the difference between the original resource and the updated one
(diff resource (update-in resource [:iron/prefLabel :value] str "erick"))

;; Find the entry (key/value pair) for a key
(find resource :iron/prefLabel)

;; Select values of multiple keys
(select-keys resource [:iron/prefLabel :foaf/knows])

;; Merge a resource into another resource. Where both define the same key
;; (here :uri and :rdf/type), the value of the latter resource is kept
(def res-1 {:uri "http://foo.com/datasets/test/1"
            :rdf/type owl/Thing
            :iron/prefLabel "Preferred Label"})

(def res-2 {:uri "http://foo.com/datasets/test/2"
            :rdf/type owl/Thing
            :iron/altLabel "Alternative Label"})

(merge res-1 res-2)

That is all good and easy: we use Clojure’s core functions and mechanisms to manipulate RDF data in our application.

However, does this implement the intuitions I started with? Definitely not. This is a conventional serialization format for RDF, just like structJSON. We may want to validate this data, or want the data to be self-aware of its own semantics. Keywords cannot support that: a keyword is only a name and carries nothing else. We would need external mechanisms to create that map structure, then to check what it refers to (the properties, the types, etc.). Then we would have to look them up in their respective ontologies. Finally, we would have to re-process that map structure to validate it against what these ontologies say.

This is not quite what I had in mind and what my intuition was telling me.

Serializing RDF using Maps and Symbol Keys

Let’s push this idea further. What if the keys of the map that represent our RDF data are not keywords, but symbols? Symbols in Clojure name things like vars, functions, etc. Initially, let’s use symbols that refer to the URI (string) of the <predicate> and the types.

The serialization would look like:

(def resource {uri "http://foo.com/1"
               rdf/type [foaf/Person owl/Thing]
               iron/prefLabel {value "Fred"
                               datatype xsd/string}
               foaf/knows [{uri "http://foo.com/2"
                            rei [{iron/prefLabel [{value "Bob"
                                                   lang "en"}
                                                  {value "Robert"
                                                   lang "fr"}]}]}
                           {uri "http://foo.com/3"
                            rei [iron/prefLabel "Mike"]}]})

Now our resource is defined with the same structure, except that the keys are now symbols. In this second iteration, we consider that each of these symbols names a var whose value is a string: the URI of the predicate or type.

The real advantage of using symbols over keywords for this RDF serialization is that a symbol (more precisely, the var it names) can:

  • Have a docstring
  • Have metadata
  • Be evaluated, which yields the actual full URI of the predicate or type

These are obvious enhancements over using keywords. First, docstrings let us document these properties and types. Clojure IDEs can then display that documentation while you are writing or editing RDF data in Clojure.
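Here is what that looks like for a single property. The var below is a hypothetical stand-in for iron/prefLabel; the real property vars are defined elsewhere in the application:

```clojure
;; Hypothetical var for iron:prefLabel; the real serialization defines its
;; property vars elsewhere.
(def pref-label
  "The preferred label of a resource (iron:prefLabel)."
  "http://purl.org/ontology/iron#prefLabel")

;; The docstring is stored in the metadata of the var named by the symbol;
;; this is what clojure.repl/doc and Clojure IDEs display.
(:doc (meta #'pref-label))
;; => "The preferred label of a resource (iron:prefLabel)."

;; Evaluating the symbol yields the full URI of the property.
pref-label
;; => "http://purl.org/ontology/iron#prefLabel"
```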

Clojure’s metadata system will be heavily leveraged in the final candidate serialization format that I will cover in another blog post, so I won’t discuss it further for the moment.

Finally, once we evaluate such a map, we get the map with all its properties and types resolved to their full URIs. The evaluation of such a structure [(eval resource)] looks like:

{uri "http://foo.com/1", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" ["http://xmlns.com/foaf/0.1/Person" "http://www.w3.org/2002/07/owl#Thing"], "http://purl.org/ontology/iron#prefLabel" {value "Fred", datatype "http://www.w3.org/2001/XMLSchema#string"}, "http://xmlns.com/foaf/0.1/knows" [{uri "http://foo.com/2", rei [{"http://purl.org/ontology/iron#prefLabel" [{value "Bob", lang "en"} {value "Robert", lang "fr"}]}]} {uri "http://foo.com/3", rei ["http://purl.org/ontology/iron#prefLabel" "Mike"]}]}

As you can see, we can get the full description of this resource with the full expansion of the URIs referenced by the symbols.

The same serialization rules defined in the previous section apply to this new format that uses symbols instead of keywords. The same comments regarding namespaces apply here too.

The usage is nearly identical, except that these symbols evaluate to strings, which, unlike keywords, cannot be called as functions. This means that you cannot get the value of a key like this when the key is a symbol:

(rdf/type resource)

Instead, you have to use one of the following two methods:

(resource rdf/type)
(get resource rdf/type)

We improved upon using keywords as keys. Even so, we still don’t have the embedded semantics or auto-validation capabilities my intuition was pointing to. It remains the same kind of structure, without significant improvement.

Serializing RDF using Maps and Symbol Keys Referring to Functions

Let’s change perspective and evolve this idea of symbols: what if the symbols we use in the map referred to functions instead of strings?

What!?!?

A function could be the key of a map in Clojure?

Well, not directly, but yes. In Clojure, symbols name different things, such as functions. This is quite an important feature of Clojure: it distinguishes how things are named from the actual things.

This means that what we write as keys in our map structure are symbols, and each symbol happens to refer to a function. So in the source code, the key is not the function itself but the thing that refers to it: the symbol.

However, the result is the same. When the map is evaluated, each symbol is resolved, and the keys of the resulting map are the functions themselves. That is exactly what we were looking for: that little gem, hanging around, just waiting to be picked up.

This opens up an overwhelming number of possibilities: we have a data structure that evaluates to a series of functions that can be executed. That is exactly what should enable us to define that Portable [RDF] Data serialization format.

This means that we will not only be able to define RDF triples as Clojure code. We could also execute that code to do different things with the data, such as having it validate itself.
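A minimal sketch of this idea follows. It shows that an evaluated map whose keys are functions can be “run” against its own values. The property functions `pref-label` and `knows`, their checks, and the `valid?` runner are all hypothetical:

```clojure
(require '[clojure.string :as str])

;; Hypothetical property functions: each checks the value(s) it is given.
(defn pref-label
  "iron:prefLabel (sketch): the value must be a non-blank string."
  [v]
  (and (string? v) (not (str/blank? v))))

(defn knows
  "foaf:knows (sketch): every value must look like an HTTP URI."
  [vs]
  (every? #(and (string? %) (str/starts-with? % "http://")) vs))

;; In the source, the keys are symbols; once evaluated, they are the functions.
(def fred {pref-label "Fred"
           knows ["http://foo.com/datasets/people/bob"
                  "http://foo.com/datasets/people/teo"]})

(defn valid?
  "Run the data: call each key (a function) on its value."
  [resource]
  (every? (fn [[property value]] (property value)) resource))

(every? fn? (keys fred)) ;; => true
(valid? fred)            ;; => true
(valid? {pref-label ""}) ;; => false
```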

Finally, what if we consider RDF predicates as Clojure functions? Predicates have all kinds of properties and semantics. They can be restricted to describing only certain kinds of resources, or to referring to specific types of values. Predicates can be symmetric, functional, transitive, etc. What if we simply implement these characteristics as Clojure functions? This is what this whole thing is meant to be. When evaluating and “running” that RDF map structure, we would simply execute the functions that define the semantics and characteristics of these predicates. That is exactly where my intuitions lie. We would end up with an RDF serialization format that “embeds” its own semantics and that can validate itself by executing the structure. That is what I would refer to as Portable Data: stateful data with embedded stateful semantics.
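One way to sketch such characteristics is a function that builds property functions from a value check plus options. Hypothetically, only a functional characteristic is modeled here; in OWL, a functional property has at most one value per subject. The names `property`, `skype-id` and `knows` are illustrative:

```clojure
(defn- as-values
  "Treat a single value and a vector of values uniformly."
  [v]
  (if (vector? v) v [v]))

(defn property
  "Hypothetical constructor: build a property function from a check applied
  to each value and an optional OWL-like characteristic, :functional?, which
  limits the property to at most one value."
  [check & {:keys [functional?]}]
  (fn [v]
    (let [vs (as-values v)]
      (and (every? check vs)
           (or (not functional?) (<= (count vs) 1))))))

(def skype-id (property string? :functional? true))
(def knows (property string?))

(skype-id "frederick.giasson")        ;; => true
(skype-id ["fred.one" "fred.two"])    ;; => false (functional: at most one value)
(knows ["http://foo.com/datasets/people/bob"
        "http://foo.com/datasets/people/teo"]) ;; => true
```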

The initial version of this revision of the RDF serialization as Clojure code will be outlined in the next blog post, since its discussion warrants a full blog post of its own. However, I think you can start to see where I am heading with these intuitions and why I am using Clojure to test them.

Once an initial version of this serialization is outlined, we will see how it can be used and what its benefits are. We will also see how the idea of Portable Data could be leveraged, and how it can help create and manage data using traditional IDEs such as Emacs. Once the basics are outlined, we will have plenty of time to explore the benefits of this concept.

Clojure, Programming, Semantic Web, Structured Dynamics

Data as Code. Code as Data: Tighter Semantic Web Development Using Clojure

I have been working professionally in the field of the Semantic Web for more than 7 years now. I have been developing all kinds of ontologies. I have been integrating all kinds of datasets from various sources. I have been working with all kinds of tools and technology stacks. I have been developing services and user interfaces of all kinds. I developed a set of 27 web services packaged as the Open Semantic Framework. I also re-implemented the core Drupal modules to work with RDF data as I wanted them to. I wrote hundreds of thousands of lines of code with one goal in mind: leveraging the ideas and concepts of the Semantic Web so that I, other developers, ontologists and data scientists could work more accurately and efficiently with any kind of data.

However, even after doing all that, I still felt a void: a disconnection between how I was thinking about data and how I was manipulating it. That manipulation went through the programming languages I was using, the libraries I was leveraging and the web services I was developing. Everything works, and works really well; I gained a lot of productivity over all these years. However, I still felt that void, that disconnection between the data and the programming language.

Every time I want to work with data, I have to get that data serialized in some format, then parse it using a parser available in the language I am working with. The parser then converts the data into an internal structure. Then I have to use all kinds of specialized APIs to work with the data represented by that structure. If I want to validate the data I am working with, I will probably have to use another library that performs the validation for me. That may force me to migrate the data to another system that makes it available to these reasoners and validators. Etc, etc, etc…

All this works: I have been doing it for years. However, the level of interaction between all these systems is high, and the integration takes time and resources. Is there a way to do things differently?

The Pink Book

Once I realized that, I started a quest to try to change that situation. I had no idea where I was heading, or what I would find, but I had to change my mind and my viewpoint, and start being influenced by new ideas and concepts.

What I realized is how disconnected mainstream programming languages may be from the data I was working with. That made it a natural first step for my investigation. I turned my chair and started to stare at my bookshelves. Then, like the One Ring, there was this little pink (really pink) book staring at me: Lambda-calcul types et modèles (“Lambda-calculus, types and models”). I bought that book probably 10 years ago, then forgot about it. I always found its cover page weird, and its color awkward. But because of these uncommon features, I was drawn to it.

Re-reading about lambda-calculus opened my eyes. It led me to take a particular interest in homoiconic programming languages such as Lisp and some of its dialects.
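For readers new to the idea, here is homoiconicity in a few lines of Clojure. A quoted list is ordinary data that can be inspected and transformed, and the same list can be evaluated as code. The example is generic, not taken from the project:

```clojure
;; Quoted, this list is data: a list whose first element is the symbol +.
(def form '(+ 1 2 3 4))

(first form)                  ;; => +
(rest form)                   ;; => (1 2 3 4)

;; Evaluated, the same data is code.
(eval form)                   ;; => 10

;; Being data, it can be transformed with ordinary list functions first.
(eval (cons '* (rest form)))  ;; => 24
```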

Code as Data. Data as Code.

Is this not what I was looking for? Could this not fill the void I was feeling? Is this not where my intuition was heading?

What if the “data” I manipulate is the same as the code I am writing? What if the data that I publish could be the code of a module of an application? What if writing code is no different than creating data? What if data could be self-aware of its own semantics? What if by evaluating data structures, I would validate that data at the same time? What if “parsing” my data is in fact evaluating the code of my application? What if I could reuse the tools and IDEs I use for programming, but for creating, editing and validating data? Won’t all these things make things simpler and make me even more productive to work with data?

My intuition tells me: yes!

We have a saying at Structured Dynamics: the right tool for the right job.

That seems to be the kind of tool I need to fill the void I was feeling. I had the feeling that the distinction between code and data should be as minimal as possible, and homoiconic languages seem to be the right tool for that job.

Code as Data. Data as Code.

That is all good, but what does that really mean? What are the advantages and benefits?

That is the start of a journey, and this is what we will discover in the coming weeks and months. Structured Dynamics is starting to invest resources into this new project. We chose to do our work using Clojure instead of other Lisp dialects such as Common Lisp, for many reasons. It compiles to JVM bytecode, which means that you can reuse any of this code in any other Java application, and use any Java library natively from Clojure. We also chose it for its native way of handling concurrency and parallelism and its unique way of managing metadata within data structures. Its meta-programming capabilities were another reason, since its macro system enables us to create DSLs, etc.

The goal was to create a new serialization format for RDF and to serialize RDF data as Clojure code. The intuition was that RDF data would then become an integral part of Clojure applications because the data would be the code as well.

The data would be self-aware of its own semantics, which means that by evaluating the Clojure “RDF” code, it would also auto-validate itself using its embedded semantics. The RDF data would in itself be a [Clojure] application that would be self-aware of its own semantics and would know how to validate itself.

That is the crux of my thinking. Then, how could this be implemented?

That is what I will cover in the coming weeks and months. We chose Clojure because it seems to be a perfect fit for this job; we will discover the reasons over time. However, the goal of these blog posts is to show how RDF can be serialized as [Clojure] code and the benefits of doing so. It is not about showing all the neat features of Clojure and the wonderful thinking behind it. For that, I strongly suggest getting started with Clojure by reading the material covered in Tips for Clojure Beginners. In particular, take a few hours to listen to Rich Hickey’s great videos.