In my previous blog post, RDF Code: Serializing RDF Data as Clojure Code, I outlined a first version of what an RDF serialization could look like if RDF data were serialized as Clojure code. However, after working with this proposal for two weeks, I found a few issues with the initial assumptions I made that turned out to be bad design decisions in terms of Clojure code.
This blog post will discuss these issues, and I will update the initial set of rules that I defined in my previous blog post. Going forward, I will use the current rules as the way to serialize RDF data as Clojure code.
What Was Wrong
After two weeks of using the previous set of serialization rules, and of developing all kinds of functions that use that code in the context of UMBEL graph traversal and analysis, I found the following issues:
- The keys and values of the map structures should be Vars (and not the functions or values they point to)
- Ontologies should all be in the same namespace (and not in different namespaces)
- The prefix/entity separator for the RDF resources should be a colon and not a slash
These are the three serialization rules that changed after working with the previous version of the proposal. Now, let’s see what caused these changes to occur.
Keys and Values as Vars
The major change is that when we serialize RDF data as Clojure map structures, the keys, and the values that are not strings, should be Vars.
There are three things that I didn't properly evaluate when I first outlined the specification:
- The immutable nature of the Clojure data structures
- The dependency between ontologies
- The non-cyclical namespaces dependency rule imposed by Clojure
In the previous proposal, every RDF property was a Clojure function, and these functions were also the keys of the Clojure maps used to serialize the RDF resources. That worked well. However, there was a side effect to this decision: everything was fine until the function's internal ID changed.
The issue here is that when we work with Clojure maps, we are working with immutable data structures. This means that if I create a RDF record like this:
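The original code example is not reproduced here; below is a self-contained sketch of what such a record looked like under the previous rules. The rdf-type stub, the URI and the literal value are illustrative stand-ins for the real ontology definitions:

```clojure
;; Stub standing in for the ontology's rdf/type property function
;; (hypothetical; the real one lived in its own rdf namespace).
(defn rdf-type [resource] (get resource rdf-type))

;; Previous serialization rules: the property *function* itself is
;; used as the key of the map.
(def mike {"uri"    "http://foo.com/datasets/people/mike"
           rdf-type "foaf:+person"})
```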
If, somehow, the RDF ontology file gets re-compiled during the compilation process, then the internal ID of the rdf/type property (function) will change. That means that if I create another record like this:
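Continuing the same self-contained sketch (names are illustrative), re-evaluating the defn simulates the ontology file being re-compiled, so the second record captures a brand-new function object as its key:

```clojure
;; The same stub property function; if the ontology namespace gets
;; re-compiled between the creation of mike and mike-2, this defn
;; produces a brand-new function object.
(defn rdf-type [resource] (get resource rdf-type))

(def mike-2 {"uri"    "http://foo.com/datasets/people/mike-2"
             rdf-type "foaf:+person"})
```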
that uses the same rdf/type function, then these two records refer to different rdf/type functions, since the function changed between the time I created the mike resource and the time I created the mike-2 resource. That may not look like an issue, since both functions do exactly the same thing. However, it is an issue, because multiple tasks that manipulate and query RDF data rely on comparing these keys (so, these functions). That means that unexpected behaviors can happen, and they may even look random.
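The problem can be reproduced in a few lines, independently of RDF (a stand-alone sketch):

```clojure
;; A map keyed by a function object stops matching that key once the
;; function is re-defined: re-compilation creates a new function object.
(defn f [x] x)
(def m {f "value"})

(defn f [x] x)   ; simulates the re-compilation of the namespace

(get m f)        ; f now evaluates to a different function object,
                 ; so the lookup no longer finds the key
```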
The issue here was that we were not referring to the Var that points to the function, but to the function itself. By using the Var as the keys and values of the map, we fix this inconsistency issue. What happens is that all the immutable data structures we create refer to the Var, which points to the function. That way, when we evaluate the Var, we get a reference to the same function whenever it got created (before or after the creation of mike-2). Here is what the mike record looks like with this modification:
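Sketched in the same self-contained style (stub names again), the revised record uses the #' reader macro so the key is the Var rather than the function it holds:

```clojure
;; Stub property function (hypothetical name); the key below is now
;; the Var #'rdf-type, which survives re-compilation.
(defn rdf-type [resource] (get resource #'rdf-type))

(def mike {"uri"      "http://foo.com/datasets/people/mike"
           #'rdf-type "foaf:+person"})

;; Even if rdf-type is re-defined, #'rdf-type is still the same Var,
;; so lookups in previously created records keep working.
```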
We use the #' reader macro to specify that we use the Var as the keys and values of the map, and not the actual functions or other values referenced by those Vars.
The second and third issues I mentioned are tightly related. In the RDF & OWL world, there are multiple examples of ontologies that re-use external ontologies to describe their own semantics. There are cases where an ontology A uses classes and properties from an ontology B, and where the ontology B uses classes and properties from the ontology A. They cross-use each other. Such usage cycles exist in RDF & OWL, and they are not that uncommon either.
The problem is that, at first, I was considering that each OWL ontology defined as Clojure code would live in its own Clojure namespace. However, if you are a Clojure coder, you can see the issue that is coming: if two ontologies cross-use each other, then you have to create a namespace dependency cycle in your Clojure code… and you know that this is not possible, because it is forbidden by the compiler. This means that everything works fine until such a cycle happens.
To overcome that issue, we have to consider that all the ontologies belong to the same namespace (like clojure.core). In my next blog post, which will focus on the description of these ontologies, I will show how we can split the ontologies into multiple files while keeping them in the same namespace.
Now that all the ontologies are in the same namespace, and that we cannot use Clojure's namespaced symbols anymore, I decided to use the more conventional way of writing namespaced properties and classes found in other RDF serializations, which is to delimit the ontology's prefix with a colon, like this:
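For example (a sketch): under the old rules a property like rdf/type lived in an rdf namespace, while under the new rules everything sits in one shared namespace and the prefix becomes part of the Var's own name:

```clojure
;; All ontology Vars now live in a single namespace, so the ontology
;; prefix is encoded in the name itself, separated by a colon:
;;   old:  rdf/type   (namespace-qualified symbol)
;;   new:  rdf:type   (plain symbol, colon inside the name)
(def rdf:type {"uri" "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"})
```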
Revision of the RDF Code Rules
Now let’s revise the set of rules that I defined in the previous blog post:
- A RDF resource is defined as a Clojure Var that points to a map
- A key that is not a string is considered a RDF property and has to be a Var
- A value that is a string is considered a literal
- A value that is a map either defines a literal value or refers to another resource:
- the map defines a literal value if a value key is present
- the map represents a reference to another resource if a uri key is present
- a map is invalid if it doesn't have a value or a uri key
- A value that is a vector refers to multiple values. Values of a vector can be strings, maps, symbols or Vars
- A symbol can be created to simplify the serialization. However, these symbols have to reference a string or a map
- A Var references another entity (normally another RDF resource)
In addition to these rules, there are some more specific rules such as:
- The value of a uri key is always a string
- If the #'rdf:type key is not defined for a resource, then the resource is considered to be of type #'owl:+thing (since everything is at least an instance of the owl:Thing class in OWL)
Finally, there are two additional classes & datatypes creation conventions:
- The name of the classes starts with a + sign, like: #'owl:+thing
- The name of the datatypes starts with a * sign, like: #'xsd:*string
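A quick self-contained sketch of these naming conventions (the URIs are the standard OWL and XSD ones; the map contents are simplified):

```clojure
;; Classes are prefixed with "+", datatypes with "*".
(def owl:+thing  {"uri" "http://www.w3.org/2002/07/owl#Thing"})
(def xsd:*string {"uri" "http://www.w3.org/2001/XMLSchema#string"})
```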
As you can see, the rules that govern the serialization of RDF data as Clojure code are minimal, and should be simple to understand for someone who is used to Clojure code and who has tried to write a few resource examples using this format. Now, let's apply these rules in a series of examples.
Note 1: in the examples of this blog post, I am referring to Vars like #'datatype, etc. To make the rules simpler to read and understand, consider that these Vars are defined in the user's namespace. However, they are Vars that are defined in the rdf.core namespace, which will be made publicly available later.
Note 2: all the properties and classes resource Vars have been defined in the same namespace. They should be included with (:use [ontologies.core]) in the ns declaration of the Clojure source file that defines the RDF resource. We will discuss these namespaces in a subsequent blog post.
Revision of Serializing RDF Code in N-Triples
The serialize-ntriples function got modified to comply with the new set of rules:
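The function itself is not reproduced here; below is a minimal, self-contained sketch of what a serialize-ntriples compatible with the new rules can look like. The plain string "uri" key and the helper logic are simplifications of the actual rdf.core code:

```clojure
(defn serialize-ntriples
  "Sketch: serializes a resource map that follows the revised rules
   into a N-Triples string. Property keys are Vars whose maps carry
   a \"uri\" entry; string values are literals; Vars and maps with a
   \"uri\" key are references to other resources."
  [resource]
  (let [subject (get resource "uri")]
    (apply str
           (for [[k v] resource
                 :when (var? k)                    ; skip the "uri" entry
                 object (if (vector? v) v [v])]    ; vectors = many values
             (let [predicate (get @k "uri")]
               (cond
                 ;; a Var refers to another resource
                 (var? object)
                 (str "<" subject "> <" predicate "> <" (get @object "uri") "> .\n")

                 ;; a map with a uri key is a reference to another resource
                 (and (map? object) (contains? object "uri"))
                 (str "<" subject "> <" predicate "> <" (get object "uri") "> .\n")

                 ;; a map with a value key is a literal value
                 (map? object)
                 (str "<" subject "> <" predicate "> \"" (get object "value") "\" .\n")

                 ;; a plain string is a literal
                 :else
                 (str "<" subject "> <" predicate "> \"" object "\" .\n")))))))
```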
Serializing a RDF Resource
Now let’s serialize a new RDF resource using the new set of rules:
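The original example is not reproduced here; this is a self-contained sketch where the ontology Vars and URIs are illustrative stand-ins for the real ontologies.core definitions, and the plain string "uri" key is a simplification:

```clojure
;; Stub ontology Vars; normally these come from ontologies.core.
(def rdf:type       {"uri" "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"})
(def foaf:+person   {"uri" "http://xmlns.com/foaf/0.1/Person"})
(def iron:prefLabel {"uri" "http://purl.org/ontology/iron#prefLabel"})

;; Revised rules: every key and every non-string value is a Var,
;; written with the #' reader macro; string values are literals.
(def fred {"uri"             "http://foo.com/datasets/people/fred"
           #'rdf:type        #'foaf:+person
           #'iron:prefLabel  "Fred"})
```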
One drawback with these new rules (even if they are essential) is that they complicate the writing of the RDF resources because of the (heavy) usage of the #' reader macro. On the other hand, they may look more familiar to people used to other RDF serializations, because of the usage of the colon instead of the slash to separate the ontology prefix from the ending of the URI.
What we have above is how the RDF data is represented in Clojure. However, it is possible to make this serialization less verbose by creating a macro that changes the input map and automatically injects the #' reader macro into the map structures that define the RDF resources.
Here is the
r macro (“r” stands for Resource) that does exactly this:
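The macro is not reproduced from the original post; here is a minimal sketch of how such an r macro can be written, using clojure.walk to wrap every resolvable symbol of the literal map in the var special form (which is what #' expands to). The rdf:type stub below is illustrative:

```clojure
(require '[clojure.walk :as walk])

(defmacro r
  "Sketch: rewrites the literal resource map so that every symbol
   that resolves to a Var is wrapped in (var ...), sparing the user
   from writing #' by hand."
  [resource-map]
  (walk/postwalk
   (fn [form]
     (if (and (symbol? form) (resolve form))
       (list 'var form)
       form))
   resource-map))

;; Usage: no #' needed anymore in the resource definition.
(def rdf:type {"uri" "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"})
(def mike-3 (r {"uri"    "http://foo.com/datasets/people/mike-3"
                rdf:type "foaf:+person"}))
```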
Then you can use it to define all the RDF resources you want to create.
That structure is equivalent to the other one, because the r macro adds the #' reader macro calls to the input map before creating the resource's map.
By using the r macro, we can see that the serialization is made much simpler and, in the end, more natural to people used to other RDF serializations.
I used the initial specification in the context of creating a new series of web services for the UMBEL project. The heavy usage of this kind of RDF data led me to discover the issues I covered in this blog post. Now that these issues are resolved, I am confident that we can move forward with the series of blog posts that covers how (and why!) to use Clojure code to serialize RDF data.
The next blog post will cover how to manage the ontologies used to instantiate these RDF resources.