RDF Code is a specification for serializing RDF data as Clojure code. This blog post introduces the first version of this new RDF serialization format. I will outline the rules that specify how RDF data should be serialized using the Clojure programming language.
This specification may change over time. However, this is the specification that will be used for the future blog posts that I will write about this subject, and for the code that will be released.
I am also expecting feedback and proposals to make this serialization easier to use, simpler to define and cleaner.
What we do with this serialization is write RDF resources as Clojure maps. This is not about defining a DSL to manipulate this data, but about defining the core Clojure structure that will be manipulated by Clojure functions and applications. Eventually one or more DSLs could be created to help users and developers use this RDF data in their Clojure applications, but this is not the current focus.
A Complete RDF Resource
Before outlining all the rules to create well-formed RDF data as Clojure code, let’s take a look at a resource that uses all the serialization features. Note that this is as complex as it gets. Below we will see how we can normalize the usage of the serialization rules to end up with clean and easy to read RDF data as Clojure code.
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label "Fred"
           iron/alt-label {value "Frederick"
                           lang "en"}
           foaf/skypeID {value "frederick.giasson"
                         datatype xsd/*string}
           foaf/knows [{uri "http://foo.com/datasets/people/bob"}
                       mike
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]
This code shows how to:
- Serialize a single resource as a Clojure map
- Define a URI for that resource
- Define one or multiple rdf:type for a resource
- Define one or multiple values for an owl:DatatypeProperty
- Define one or multiple values for an owl:ObjectProperty
- Define the language of a literal
- Define the datatype of a literal
As you can see, such an RDF serialization format is expressive enough to express any RDF triple. It also has syntactic rules that help with reading and writing RDF data in that format.
Note that since this RDF data is also Clojure code, the serialization format has been highly influenced by Clojure’s own syntax and coding style. RDF data serialized using this format also needs to be valid Clojure code. Now, let’s outline the rules that govern the creation of such RDF data, then explain these rules using simple RDF code examples.
RDF Code Rules
Here is the list of all the rules that govern the creation of RDF data serialized as Clojure code:
- An RDF resource is defined as a Clojure map where:
  - Every key is a symbol that references a function
  - Every value is a:
    - string
      - A string is considered a literal if the key is an owl:DatatypeProperty
      - A string is considered a URI if the key is an owl:ObjectProperty
    - map
      - A map represents a literal if the value key is present
      - A map represents a reference to another resource if the uri key is present
      - A map is invalid if it has neither a uri nor a value key
    - vector
      - A vector refers to multiple values. Values of a vector can be strings, maps or symbols
    - symbol
      - A symbol can be created to simplify the serialization. However, these symbols have to reference a string or a map object
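To make these rules more concrete, here is a minimal sketch of how a value could be classified. It is not part of the specification: classify-value is a hypothetical helper, and the uri and value symbols are local stand-ins for the ones that will live in the rdf.core namespace. A map that has a uri key is treated as a reference first, which is also the order used by the serializer shown later in this post.

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Assumption: local stand-ins for the rdf.core uri and value symbols.
(def uri 'uri)
(def value 'value)

(defn classify-value
  "Hypothetical helper: returns how the rules interpret v.
  datatype-property? tells whether the key that v is attached to
  is an owl:DatatypeProperty."
  [v datatype-property?]
  (cond
    (string? v) (if datatype-property? :literal :uri)
    (vector? v) :multiple-values
    (map? v)    (cond
                  (contains? v uri)   :resource-reference
                  (contains? v value) :literal
                  :else               :invalid)
    :else       :invalid))

(classify-value "Fred" true)
(classify-value {uri "http://foo.com/datasets/people/bob"} false)
(classify-value {} false)[/raw]
[/cc]

Symbols do not appear in this sketch because they are already replaced by the string or map they refer to by the time such a function is called.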
In addition to these rules, there are some more specific rules such as:
- The value of a uri key is always a string
- If the rdf/type key is not defined for a resource, then the resource is considered to be of type owl:Thing (since everything is at least an instance of the owl:Thing class in OWL)
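The default type rule could be sketched as follows. This is only an illustration: resource-types is a hypothetical helper, and uri, rdf-type and owl-thing are local stand-ins for rdf.core/uri, rdf/type and owl/+thing.

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Assumption: local stand-ins for rdf.core/uri, rdf/type and owl/+thing.
(def uri 'uri)
(def rdf-type 'rdf/type)
(def owl-thing {uri "http://www.w3.org/2002/07/owl#Thing"})

(defn resource-types
  "Hypothetical helper: returns the types of a resource as a vector,
  falling back to owl:Thing when no rdf/type key is defined."
  [resource]
  (let [types (get resource rdf-type)]
    (cond
      (nil? types)    [owl-thing]
      (vector? types) types
      :else           [types])))

;; No rdf/type key: the resource is considered an owl:Thing
(resource-types {uri "http://foo.com/datasets/people/fred"})[/raw]
[/cc]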
Finally, there are two additional naming conventions for classes and datatypes:
- The name of a class starts with a + sign, like: owl/+thing
- The name of a datatype starts with a * sign, like: xsd/*string
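As an illustration of these two conventions, here is a sketch of what such definitions could look like. The serializer shown later reads the uri key of a datatype, so I assume here that classes and datatypes are maps with a uri key; the real definitions will be covered in a subsequent blog post and may differ. The symbols are defined in the current namespace instead of the owl and xsd namespaces to keep the sketch self-contained.

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Assumption: local stand-ins for the rdf.core symbols.
(def uri 'uri)
(def value 'value)
(def datatype 'datatype)

;; A class: its name starts with a + sign
(def +thing {uri "http://www.w3.org/2002/07/owl#Thing"})

;; A datatype: its name starts with a * sign
(def *string {uri "http://www.w3.org/2001/XMLSchema#string"})

;; A typed literal refers to its datatype through the symbol
(def typed-label {value "Fred"
                  datatype *string})

;; Getting back the URI of the datatype of the literal
(get (get typed-label datatype) uri)[/raw]
[/cc]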
As you can see, the rules that govern the serialization of RDF data as Clojure code are minimal and should be simple to understand for someone who is used to Clojure code and who has tried to write a few resource examples using this format. Now, let’s apply these rules with a series of examples.
Note 1: in the examples of this blog post, I am referring to symbols like uri, value, lang, datatype, etc. To make the rules simpler to read and understand, consider that these symbols are defined in the user namespace. In reality, they are defined in the rdf.core namespace that will be made publicly available later.
Note 2: I am also referring to namespaced symbols like rdf/type, iron/pref-label, etc. These symbols are defined in their respective namespaces, which are required with something like (:require [ontologies.rdf :as rdf] [ontologies.iron :as iron]) in the ns form of the Clojure source file that defines this RDF resource. We will discuss these namespaces in a subsequent blog post.
Serializing RDF Code in N-Triples
Before starting to list all the examples, let’s define a Clojure function that we will use to convert the RDF code to N-Triples. N-Triples is just a list of <subject> <predicate> <object> triples that describe RDF resources. It is among the simplest and most verbose RDF serializations that currently exist. What this serialize-ntriples function does is take a Clojure map that represents an RDF resource and return a string that contains the serialized N-Triples.
What is important with that code is that it shows how the rules we outlined above are implemented to serialize RDF code as N-Triples. Similar serializer functions could be created for the Turtle and RDF/XML serializations as well. Note that you won’t be able to run the serialize-ntriples function because you are missing the ontology files. I will make them available in a subsequent blog post that will explain how properties, classes and datatypes are created and used in this context.
[cc lang=’lisp’ line_numbers=’false’]
[raw](declare serialize-ntriples-map-value serialize-ntriples-string-value is-datatype-property?)

(defn serialize-ntriples
  [resource]
  (let [n3 (atom "")
        iri (get resource rdf.core/uri)]
    (doseq [[property vals] resource]
      (let [property-uri (get (meta property) rdf.core/uri)]
        ;; Don't do anything with the "uri" key
        (if (not= property rdf.core/uri)
          (if (vector? vals)
            ;; Here the value is a vector of maps or values
            (doseq [val vals]
              (if (map? val)
                ;; The value of the vector is a map
                (reset! n3 (str @n3 (serialize-ntriples-map-value val iri property-uri)))
                (if (string? val)
                  ;; The value of the vector is a string
                  (reset! n3 (str @n3 (serialize-ntriples-string-value val iri property-uri property))))))
            (if (map? vals)
              ;; The value of the property is a map
              (reset! n3 (str @n3 (serialize-ntriples-map-value vals iri property-uri)))
              (if (string? vals)
                ;; The value of the property is a string
                (reset! n3 (str @n3 (serialize-ntriples-string-value vals iri property-uri property)))))))))
    @n3))

(defn- serialize-ntriples-map-value
  [m iri property-uri]
  (if (not (nil? (get m rdf.core/uri)))
    ;; The value is a reference to another resource
    (format "<%s> <%s> <%s> .\n" iri property-uri (get m rdf.core/uri))
    (if (not (nil? (get m rdf.core/value)))
      ;; The value is some kind of literal
      (let [value (get m rdf.core/value)
            lang (if (get m rdf.core/lang) (str "@" (get m rdf.core/lang)) "")
            datatype (if (get m rdf.core/datatype) (str "^^<" (get (get m rdf.core/datatype) rdf.core/uri) ">") "")]
        (format "<%s> <%s> \"\"\"%s\"\"\"%s%s .\n" iri property-uri value lang datatype))
      (if (string? m)
        ;; The value of the vector is some kind of literal
        (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri m)))))

(defn- serialize-ntriples-string-value
  [s iri property-uri property]
  (if (is-datatype-property? property)
    ;; The property referring to this value is an owl:DatatypeProperty
    (format "<%s> <%s> \"\"\"%s\"\"\" .\n" iri property-uri s)
    ;; The property referring to this value is an owl:ObjectProperty
    (format "<%s> <%s> <%s> .\n" iri property-uri s)))

(defn is-datatype-property?
  [property]
  ;; A property is a datatype property when its rdf/type is owl:DatatypeProperty
  (= (get (get (meta property) rdf/type) rdf.core/uri)
     (get ontologies.owl/+datatype-property rdf.core/uri)))[/raw]
[/cc]
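Since the function above depends on namespaces that are not published yet, here is a reduced sketch that can be evaluated as-is to experiment with the rules. Everything in it is an assumption made for illustration: the keys are plain quoted symbols, make-property is a hypothetical constructor that attaches a URI and a kind to a symbol as metadata, and mini-serialize-ntriples is a simplified variant, not the serialize-ntriples function itself.

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Assumption: local stand-ins for the rdf.core symbols.
(def uri 'uri)
(def value 'value)
(def lang 'lang)
(def datatype 'datatype)

(defn make-property
  "Hypothetical constructor: a symbol carrying its URI and its kind
  (:datatype or :object) as metadata."
  [sym property-uri kind]
  (with-meta sym {uri property-uri :kind kind}))

(def pref-label (make-property 'iron/pref-label "http://purl.org/ontology/iron#prefLabel" :datatype))
(def knows (make-property 'foaf/knows "http://xmlns.com/foaf/0.1/knows" :object))

(defn- literal [s]
  (str "\"\"\"" s "\"\"\""))

(defn- object-term
  "Returns the object part of a triple, or nil if v is not a valid value."
  [property v]
  (cond
    ;; A map with a uri key is a reference to another resource
    (and (map? v) (get v uri))   (str "<" (get v uri) ">")
    ;; A map with a value key is a literal, with optional lang and datatype
    (and (map? v) (get v value)) (str (literal (get v value))
                                      (when-let [l (get v lang)] (str "@" l))
                                      (when-let [d (get v datatype)] (str "^^<" (get d uri) ">")))
    ;; A string is a literal or a URI depending on the kind of property
    (string? v)                  (if (= :datatype (:kind (meta property)))
                                   (literal v)
                                   (str "<" v ">"))))

(defn mini-serialize-ntriples
  [resource]
  (apply str
         (for [[property vals] resource
               :when (not= property uri)
               v (if (vector? vals) vals [vals])
               :let [term (object-term property v)]
               :when term]
           (format "<%s> <%s> %s .\n" (get resource uri) (get (meta property) uri) term))))

(def fred {uri "http://foo.com/datasets/people/fred"
           pref-label {value "Fred"
                       lang "en"}
           knows ["http://foo.com/datasets/people/bob"
                  {uri "http://foo.com/datasets/people/mike"}]})

(print (mini-serialize-ntriples fred))[/raw]
[/cc]

The sketch builds the result with a for comprehension instead of an atom, which is only a design preference; the rules that are applied are the same.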
The Simplest Resource
Here is the simplest resource that can be written:
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type "http://xmlns.com/foaf/0.1/Person"
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]
It produces these triples:
[raw]
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .
[/raw]
In this example, all the references to classes and to other resources are made using strings that represent URIs. All the literal values are plain strings as well. The vector is composed of a list of URIs without mixing different types of values.
Using Class Symbols
In this example, we will use the symbol that references a class resource we have defined in an ontology:
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type foaf/+person
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]
The same set of triples as in the previous example will be generated by serialize-ntriples.
What is interesting with this new example is that it is more appealing to human readers. Instead of a full URI string, we see a symbol, which is more pleasant to the eyes.
But the real benefit is not that it is more pleasant to the eyes, but that it really refers to something: a class resource. This means that we have a docstring for that class, that we can check the code that describes the class to see all its characteristics, that we can auto-complete it in an IDE, etc.
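The docstring part can be sketched with plain Clojure. The definition below is hypothetical (the way classes are really created will be explained in a later blog post) and uri is a local stand-in for rdf.core/uri; it only shows that a symbol defined with def can carry documentation that tools can look up.

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Assumption: local stand-in for rdf.core/uri.
(def uri 'uri)

;; Hypothetical class resource defined with a docstring
(def +person
  "The class of persons, as described by the FOAF vocabulary."
  {uri "http://xmlns.com/foaf/0.1/Person"})

;; The docstring is attached to the var, which is where (doc +person)
;; and IDE tooling read it from
(:doc (meta #'+person))[/raw]
[/cc]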
Multiple Types
It is possible to define multiple types for a resource. The only thing you have to do is to use a vector as the value of the rdf/type key:
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label "Fred"
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]
This code will generate the following triples:
[raw]
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .
[/raw]
Specifying a Datatype for a Literal
It is also possible to define a datatype for a literal. What you have to do is to use a map with a value and a datatype key:
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            datatype xsd/*string}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]
The triples that will be generated are:
[raw]
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred"""^^<http://www.w3.org/2001/XMLSchema#string> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .
[/raw]
Specifying a Language Tag for a Literal
It is also possible to define a language tag for a literal. What you have to do is to use a map with a value and a lang key:
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       "http://foo.com/datasets/people/mike"
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]
This code will produce the following triples:
[raw]
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://foo.com/datasets/people/fred> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://foo.com/datasets/people/fred> <http://purl.org/ontology/iron#prefLabel> """Fred"""@en .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/skypeID> """frederick.giasson""" .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/bob> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/mike> .
<http://foo.com/datasets/people/fred> <http://xmlns.com/foaf/0.1/knows> <http://foo.com/datasets/people/teo> .
[/raw]
Referring to a URI Using a Map
It is possible to make explicit the fact that a value is a URI reference by using a map that specifies the uri key. Some people may prefer that approach because it makes the fact that a value is a URI explicit without having to know the nature of the key (i.e. whether the property is a datatype property or an object property):
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows [{uri "http://foo.com/datasets/people/bob"}
                       {uri "http://foo.com/datasets/people/mike"}
                       {uri "http://foo.com/datasets/people/teo"}]})[/raw]
[/cc]
The same triples will be generated as in the example above.
Mixing Values in a Vector
It is possible to mix the values in a vector. In the following example, we are using strings as URIs, and maps that refer to URIs as well. The rules permit this kind of value mixing within a vector, and the software that manipulates this kind of RDF data should be agnostic to it:
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred {uri "http://foo.com/datasets/people/fred"
           rdf/type [foaf/+person owl/+thing]
           iron/pref-label {value "Fred"
                            lang "en"}
           foaf/skypeID "frederick.giasson"
           foaf/knows ["http://foo.com/datasets/people/bob"
                       {uri "http://foo.com/datasets/people/mike"}
                       "http://foo.com/datasets/people/teo"]})[/raw]
[/cc]
The same triples will be generated as in the example above.
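One way for software to stay agnostic to this mixing is to normalize every reference to a plain URI string before using it. The following is a small sketch of that idea, where reference-uri is a hypothetical helper and uri is a local stand-in for rdf.core/uri:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Assumption: local stand-in for rdf.core/uri.
(def uri 'uri)

(defn reference-uri
  "Hypothetical helper: returns the URI string of a reference,
  whether it is written as a string or as a map with a uri key."
  [v]
  (if (map? v)
    (get v uri)
    v))

(def mixed ["http://foo.com/datasets/people/bob"
            {uri "http://foo.com/datasets/people/mike"}
            "http://foo.com/datasets/people/teo"])

;; Every value is now a plain URI string
(mapv reference-uri mixed)[/raw]
[/cc]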
Using Symbols to Get Cleaner Code
Because of the way Clojure works, you can define new symbols that refer to the values to add to this structure. Let’s go wild, and define a symbol for each value of the resource we are describing:
[cc lang=’lisp’ line_numbers=’false’]
[raw](def fred-uri "http://foo.com/datasets/people/fred")
(def foaf_person foaf/+person)
(def owl_thing owl/+thing)
(def fred-label {value "Fred"
                 lang "en"})
(def fred-skype-id "frederick.giasson")
(def bob-uri "http://foo.com/datasets/people/bob")
(def mike-uri {uri "http://foo.com/datasets/people/mike"})
(def teo-uri "http://foo.com/datasets/people/teo")

(def fred {uri fred-uri
           rdf/type [foaf_person owl_thing]
           iron/pref-label fred-label
           foaf/skypeID fred-skype-id
           foaf/knows [bob-uri
                       mike-uri
                       teo-uri]})[/raw]
[/cc]
The triples generated will be the same as in the example above.
What happens here is that Clojure resolves the symbols to the values they refer to when the map is evaluated. This is why this works, and why it is still valid RDF code according to the rules outlined above. The serialize-ntriples function is not even aware that this data structure has been defined that way, because the map that it receives as input has already been evaluated. This means that what it gets as input are the objects referenced by these symbols, and not the symbols themselves.
This mechanism can be leveraged to make the RDF code even more readable if you have patterns that are repeated constantly in a dataset file you are creating.
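This can be checked with a few lines of plain Clojure. In this sketch, uri and knows are local stand-ins for rdf.core/uri and foaf/knows; the point is only that a map written with symbols and a map written with inline values are the same value once evaluated:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Assumption: local stand-ins for rdf.core/uri and foaf/knows.
(def uri 'uri)
(def knows 'foaf/knows)

(def bob-uri "http://foo.com/datasets/people/bob")
(def mike-ref {uri "http://foo.com/datasets/people/mike"})

(def with-symbols {uri "http://foo.com/datasets/people/fred"
                   knows [bob-uri mike-ref]})

(def inlined {uri "http://foo.com/datasets/people/fred"
              knows ["http://foo.com/datasets/people/bob"
                     {uri "http://foo.com/datasets/people/mike"}]})

;; Both maps hold the same values, so a serializer cannot tell them apart
(= with-symbols inlined)[/raw]
[/cc]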
Conclusion
This is all good, but why yet another serialization for RDF? As I started to outline in my two previous blog posts on this topic, this is not just about serializing RDF in another format. It is about having RDF data that can be evaluated as code (in this case, Clojure code).
Considering RDF properties as functions opens up a world of possibilities. First, it means that by evaluating and compiling this kind of RDF code, you get a data structure that is able to validate itself according to the way the properties are defined in the OWL ontologies that specify their behaviors and semantics.
This blog post focused on how to serialize instance data. In the next blog posts, I will cover:
- How to reify triples using this serialization
- How to create the classes, properties and datatypes used by the serialization
- How to validate the data structure
- How to manage ontologies as Clojure packages
Feel free to comment on this blog post and propose changes to the serialization format or the serialize-ntriples function.