I recently started to investigate different ways to serialize RDF triples using Clojure code. I had at least two goals in mind: first, to end up with an RDF serialization format that is valid Clojure code and that can easily be manipulated using core Clojure functions; second, to be able to “execute” the code to validate the data according to the semantics of the ontologies used to define it.
This blog post focuses on showing how the second goal can be implemented.
Before doing so, let’s take some time to explore what the phrases ‘Code as Data’ and ‘Data as Code’ may mean in this context.
Code as Data, Data as Code
Code as Data? It means that the program code you write is also data that can be manipulated by a program. The code you are writing can be used as input to a macro, transformed, and then evaluated; the code becomes data to be manipulated by the language’s macro system. But the result of these manipulations is still executable code.
Data as Code? It means that you can use a programming language’s code to embed (serialize) data. It means that you can specify your own sublanguage (DSL), translate it into code (using macros) and execute the resulting code.
The initial goal of an RDF/Clojure serialization is to specify a way to write RDF triples (data) as Clojure (code). That code is data that can be manipulated by macros to produce executable code. Evaluating the resulting code validates the data structures (the graph defined by the triples) according to the semantics defined in the ontologies. In other words, validating the graph amounts to evaluating the resulting code (and running the functions).
In my previous blog posts about serializing RDF data as Clojure code, I noted that the properties, classes and datatypes that I was referring to in those blog posts were to be defined elsewhere in the Clojure application and that I would cover it in another blog post. Here it is.
All of the ontology properties, classes and datatypes that we use to serialize the RDF data are defined as Clojure code. They can be defined in a library, directly in your application’s code, or even as data emitted by a web service endpoint that you evaluate at runtime.
In the tests I am doing, I define RDF properties as Clojure functions; the RDF classes and datatypes are normal records that comply with the same RDF serialization rules as defined for the instance records.
Some users may wonder: why is everything defined as a map except the properties? Though each property’s RDF description is available as a map, we use that map as Clojure metadata for the property’s function. We consider properties to be functions, not maps. As you will see below, these functions are used to validate the RDF data serialized in Clojure code; that is why they are represented as Clojure functions and not as maps like everything else.
Someone could easily leverage the RDF/Clojure serialization without worrying about the ontologies: they could get the triples that describe the records without worrying about the semantics of the data as represented by the ontologies. However, if that same person wants to reason over the data presented to them, to make sure the data is valid and coherent, then they will require the ontology descriptions.
Now let’s see how these ontologies are being generated.
Creating OWL Classes
As I said above, an OWL class is nothing but another record. It is described using the same rules as previously defined. However, it is described using the OWL language and refers to specific semantics. Creating such a class is really easy: we just have to follow the semantics of the OWL language and the rules of the RDF/Clojure serialization. For example, here is how we can create a simple FOAF person class:
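As a minimal sketch, assuming stand-in key names (`uri`, `rdf-type`, `iron-pref-label`) in place of the serialization’s actual vocabulary, such a class record could look like:

```clojure
;; Stand-in keys so the sketch evaluates on its own; in the real
;; serialization these are vars defined in dedicated RDF namespaces.
(def uri :uri)
(def rdf-type :rdf/type)
(def iron-pref-label :iron/prefLabel)
(def iron-description :iron/description)

;; A FOAF Person class: the same record structure as an instance
;; record, but described using the OWL language.
(def +person
  {uri              "http://xmlns.com/foaf/0.1/Person"
   rdf-type         :owl/Class
   iron-pref-label  "Person"
   iron-description "A person, as defined by the FOAF ontology."})
```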
As you can see, we are describing the class the same way we were defining normal instance records. However, we are doing it using the OWL language.
Creating OWL Datatypes
Datatypes are also serialized like normal RDF/Clojure records; that is, just like classes. However, since datatypes are fairly static in the way we define them, I created a simple macro called gen-datatype to generate them.
You can use this macro to generate datatypes like this:
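Here is a sketch of what such a macro, its usage and its expansion could look like; the `:uri`, `:xsp/base` and `:xsp/pattern` keys, and the exact regex, are assumptions:

```clojure
;; Sketch of a gen-datatype macro: it expands a short specification
;; into a datatype record (a plain Clojure map).
(defmacro gen-datatype
  [name & {:keys [uri base pattern]}]
  `(def ~name
     {:uri         ~uri
      :xsp/base    ~base
      :xsp/pattern ~pattern}))

;; Usage: the class of literals that are full US phone numbers.
(gen-datatype *full-us-phone-number
  :uri     "http://purl.org/ontology/foo#full-us-phone-number"
  :base    :xsd/string
  :pattern "^(\\+?1[-. ]?)?\\(?[0-9]{3}\\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}$")

;; ...which generates a datatype record equivalent to:
;; {:uri         "http://purl.org/ontology/foo#full-us-phone-number"
;;  :xsp/base    :xsd/string
;;  :xsp/pattern "^(\\+?1[-. ]?)?\\(?[0-9]{3}\\)?..."}
```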
This datatype defines the class of literals that represent the full version of a US phone number. I will explain below how such a datatype is used to validate RDF data records.
Creating OWL Properties
Properties are different from classes and datatypes: they are represented as functions in the RDF/Clojure serialization. I created another simple macro called gen-property to generate these OWL properties.
Note that this macro currently only accommodates a subset of the OWL language. For example, there is no way to use the macro to specify cardinality, etc. I only created what was required for writing this blog post.
You can then use this macro to create new properties like this:
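Here is a sketch of both the macro and a use of it. In this sketch the property description is stored as metadata on the generated function’s var, and the function body delegates to a validate-property stand-in; every name and key below is an assumption, not the post’s exact code:

```clojure
(defn validate-property
  "Stand-in for #'rdf.property/validate-property: the real function
  validates the values against the property's description."
  [property values]
  values)

;; Sketch of gen-property: defines a function that carries its RDF
;; description as var metadata and validates its values when called.
(defmacro gen-property
  [name description]
  `(defn ~name
     {:rdf ~description}
     [& values#]
     (validate-property (var ~name) values#)))

;; Usage: the foo:knows object property, with a range of umbel-rc:+person.
(gen-property knows
  {:uri   "http://purl.org/ontology/foo#knows"
   :type  :owl/ObjectProperty
   :range "http://umbel.org/umbel/rc/Person"})
```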
Some other Classes, Datatypes and Properties
So, here is the list of classes, datatypes and properties that will be used later in this blog post for demonstrating how validation occurs in such a framework:
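The listing below is a sketch of these definitions; the URIs, helper names and exact shapes are assumptions, but the names match the ones used in the rest of the post:

```clojure
;; Classes (UMBEL Reference Concepts), serialized as records:
(def +person  {:uri "http://umbel.org/umbel/rc/Person"  :rdf/type :owl/Class})
(def +product {:uri "http://umbel.org/umbel/rc/Product" :rdf/type :owl/Class})

;; Datatype: the class of full US phone-number literals.
(def *full-us-phone-number
  {:uri         "http://purl.org/ontology/foo#full-us-phone-number"
   :xsp/base    :xsd/string
   :xsp/pattern "^(\\+?1[-. ]?)?\\(?[0-9]{3}\\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}$"})

;; Properties, generated as functions (simplified stand-ins):
(defn knows      [& vs] vs) ; foo:knows, object property, range umbel-rc:+person
(defn phone      [& vs] vs) ; foo:phone, datatype property, range *full-us-phone-number
(defn pref-label [& vs] vs) ; iron:prefLabel, preferred-label annotation property
```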
Concluding with Ontologies
Ontologies are easy to write in RDF/Clojure: a simple set of macros helps create the ontology classes, properties and datatypes. However, in the future I anticipate creating a library that would use the OWLAPI to take any OWL ontology and serialize it using these rules. The output could be Clojure code like this, or JAR libraries. Additionally, some investigation will be done into more idiomatic Clojure projects like Phil Lord’s Tawny-OWL.
RDF Data Instantiation Using Clojure Code
Now that we have the classes, datatypes and properties defined in our Clojure application, we can start defining data records. Let’s start with a record called valid-record that describes something with a phone number and a preferred label. The data is then there and available to you. Now, what if I would like to do a bit more than this? What if I would like to validate it?
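Such a record might look like the following sketch (the property stand-ins, names and URIs are assumptions):

```clojure
;; Stand-ins for the generated property functions; the real ones
;; validate their values when called.
(defn pref-label [& vs] vs)
(defn phone      [& vs] vs)

;; A record with a preferred label and a phone number. The keys are
;; vars pointing to the property functions.
(def valid-record
  {:uri         "http://example.org/people/bob"
   #'pref-label "Bob"
   #'phone      "(613) 555-0188"})
```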
Validating such a record is as easy as evaluating it. What does that mean? It means that each value of the map that describes the record will be evaluated by Clojure. Since each key refers to a function, evaluating each value means calling that function with the value(s) specified in the record’s description. We then iterate over the whole map to validate all of the triples.
To perform this kind of process, we can create a validate-resource function. Here is what it looks like, and how you can use it:
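A minimal sketch of such a function, under the assumption that a record is a map whose validatable keys are vars pointing to property functions:

```clojure
;; Sketch of validate-resource: walk the record map and, for every
;; key that is a var pointing to a property function, call that
;; function with the value(s). Invalid data makes a property throw.
(defn validate-resource
  [record]
  (doseq [[property value] record
          :when (var? property)]
    (apply property (if (sequential? value) value [value])))
  true)

;; Usage (assuming a valid-record map as sketched earlier):
;; (validate-resource valid-record)
;; => true, or an exception if one of the property validations fails.
```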
If no exceptions are thrown, then the record is considered valid according to the ontology specifications. Easy, no? Now let’s take a look at how this works.
If you check the gen-property macro, you will notice that every time a generated property function is evaluated, the #'rdf.property/validate-property function is called. What this function does is perform the validation of the property given the specified value(s), according to the description of the property in the ontology specification. Such a validate-property function looks like:
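As a sketch, validate-property reads the property’s description from the var’s metadata and runs each validation over the values; the helper names mirror the validations discussed below, but the exact signatures are assumptions:

```clojure
(declare validate-owl-cardinality validate-uris validate-rdfs-range)

;; Sketch of validate-property: each helper throws when the values do
;; not satisfy the corresponding part of the property's description.
(defn validate-property
  [property values]
  (let [description (:rdf (meta property))]
    (validate-owl-cardinality property description values)
    (validate-uris description values)
    (validate-rdfs-range property description values)
    values))
```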
So what it does is run a series of other functions that validate different characteristics of a property. For this blog post, we demonstrate how the following characteristics are validated:
- Cardinality of a property
- URI validation
- Datatype validation
- Range validation when the range is a class
Validating the cardinality of a property means that we check if the number of values of a given property is as specified in the ontology. In this example, we validate the exact cardinality of a property. It could be extended to validate the maximum and minimum cardinalities as well.
The function that validates the cardinality is the validate-owl-cardinality function, which is defined as:
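A sketch of that function, assuming the property’s description is a map carrying an :owl/cardinality key:

```clojure
;; Sketch of validate-owl-cardinality: when the property's description
;; specifies an exact :owl/cardinality, the number of values must
;; match it; otherwise an exception stops the validation process.
(defn validate-owl-cardinality
  [property description values]
  (when-let [cardinality (:owl/cardinality description)]
    (when (not= cardinality (count values))
      (throw (ex-info (str "Cardinality validation error on " property)
                      {:expected cardinality
                       :actual   (count values)}))))
  values)
```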
For each property, it checks whether the owl:cardinality property is defined. If it is, then it makes sure that the number of values for that property is valid according to what is defined in the ontology. If there is a mismatch, the validation function throws an exception and the validation process stops.
Here is an example of a record that has a cardinality validation error as defined by the property (see the description of the property below):
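A sketch of such a record; the stand-in property simulates a generated property declared with owl:cardinality 1:

```clojure
;; Stand-in for a foo:phone property declared with owl:cardinality 1.
(defn phone
  [& values]
  (when (not= 1 (count values))
    (throw (ex-info "Cardinality validation error on #'phone"
                    {:expected 1 :actual (count values)})))
  values)

;; Two phone numbers where exactly one is allowed: validating this
;; record calls (phone "(613) 555-0188" "(613) 555-0199"), which throws.
(def invalid-cardinality-record
  {:uri    "http://example.org/people/bob"
   #'phone ["(613) 555-0188" "(613) 555-0199"]})
```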
Everything you define in RDF/Clojure has a URI. However, not every string is a valid URI, so all of the URIs you define can be validated as well. When you define a URI, you use the #'rdf.core/uri function to specify it. That function is defined as:
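A sketch of what that function could look like; constructing a java.net.URI throws a URISyntaxException on a malformed URI:

```clojure
;; Sketch of #'rdf.core/uri: constructing a java.net.URI validates the
;; string's syntax; a malformed URI throws and aborts the validation.
(defn uri
  [^String s]
  (java.net.URI. s)
  s)
```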
As you can see, we are using the java.net.URI constructor to validate the URIs you define for your records, classes, properties and datatypes. If you make a mistake when writing a URI, a validation error will be thrown and the validation process will stop.
Here is an example of a record that has an invalid URI:
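For example (a sketch; the space in the URI makes the java.net.URI constructor throw):

```clojure
(defn uri [^String s] (java.net.URI. s) s)

;; Evaluating this record definition throws a URISyntaxException
;; because of the space in the URI:
;;
;; (def invalid-uri-record
;;   {:uri (uri "http://example.org/people/bob smith")})
```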
In OWL, a datatype property is used to refer to literal values that belong to classes of literals (datatype classes). For example, the *full-us-phone-number datatype we described above defines the class of all literals that are full US phone numbers.
Validating the value of a property according to its datatype means making sure that the literal value(s) belong to that datatype. Most of the time, people will use the XSD datatypes. If custom datatypes are created, they will be based on one of the XSD datatypes, with a regex pattern that specifies how the literal should be constructed.
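A sketch of the datatype side of this validation, assuming a datatype record carries its regex under :xsp/pattern and a property description carries its :type and :range:

```clojure
;; Sketch: a literal is valid for a datatype when it matches the
;; datatype's pattern.
(defn validate-datatype
  [datatype value]
  (when-let [pattern (:xsp/pattern datatype)]
    (when-not (re-matches (re-pattern pattern) (str value))
      (throw (ex-info "Datatype validation error"
                      {:value value :datatype (:uri datatype)})))))

;; Sketch of the datatype branch of validate-rdfs-range: for a
;; datatype property with a declared range, validate every literal
;; against the datatype defined as that range.
(defn validate-rdfs-range
  [property description values]
  (when (and (= :owl/DatatypeProperty (:type description))
             (:range description))
    (doseq [v values]
      (validate-datatype (:range description) v)))
  values)
```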
What this function does is validate the range of a property. It checks what kind of values exist for the input property according to the RDF/Clojure specification (is it a var, etc.?). Then it checks whether the property is an object property or a datatype property. If it is a datatype property, it checks whether a range has been defined for it. If so, it validates the value(s) according to the datatype defined in the range of the property.
Here is an example of a few records that have different datatype validation errors:
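A sketch of the failing cases, using the phone-number pattern assumed earlier; re-matches returning nil marks an invalid literal:

```clojure
;; The pattern assumed for the *full-us-phone-number datatype:
(def phone-pattern
  #"^(\+?1[-. ]?)?\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}$")

;; "555-0188" is missing its area code and "not a number" is not
;; numeric: neither matches, so their records fail datatype validation.
(re-matches phone-pattern "555-0188")        ; => nil (invalid)
(re-matches phone-pattern "not a number")    ; => nil (invalid)
(re-matches phone-pattern "(613) 555-0188")  ; matches (valid)
```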
As you can see, the validate-rdfs-range function is incomplete regarding datatype validation. I am still updating it to make sure that we validate all of the existing XSD datatypes. We also have to better validate the custom datatypes, to make sure that we consider their xsp:base type, etc. The code to be created is similar to what I wrote for the Data Validation Tool (which is written in PHP).
Range validation when the range is a class
Finally, let’s show how the range of an object property can be validated. Validating the range of an object property means making sure that the record referenced by the property belongs to the class specified as the property’s range.
For example, consider a property foo:knows whose range specifies that all of its values need to belong to the class umbel-rc:+person. This means that every value of the foo:knows property, for any record, needs to refer to a record of type umbel-rc:+person. If that is not the case, there is a validation error.
Here is an example of a record where the foo:knows property is not properly used:
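A sketch of such a record (names and URIs assumed):

```clojure
;; Stand-in for the foo:knows property function.
(defn knows [& vs] vs)

;; A wrench record typed as umbel-rc:+product...
(def wrench
  {:uri      "http://example.org/products/wrench"
   :rdf/type "http://umbel.org/umbel/rc/Product"})

;; ...referenced from foo:knows, whose range is umbel-rc:+person.
;; Validating this record triggers a range validation error.
(def invalid-range-record
  {:uri    "http://example.org/people/bob"
   #'knows wrench})
```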
Remember that we defined the foo:knows property with a range of umbel-rc:+person. However, in the example, the reference is to a wrench record that is of type umbel-rc:+product. Thus, we get a validation error.
The function that validates the ranges of the object properties is defined as:
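A sketch of that function; the endpoint URL, the response shape (a list of class URIs serialized as Clojure code) and the keys are assumptions:

```clojure
;; Sketch: fetch the super-classes of a class from the UMBEL
;; super-classes web service endpoint. The endpoint returns Clojure
;; code, which read-string turns into a data structure.
(defn super-classes
  [class-uri]
  (-> (str "http://umbel.org/ws/super-classes/" class-uri)
      slurp
      read-string
      set))

;; Sketch of the object-property range validation: the referenced
;; record's type must be the range class itself, or a class whose
;; super-classes include the range class.
(defn validate-object-range
  [property range-uri referenced-record]
  (let [record-type (:rdf/type referenced-record)]
    (when-not (or (= range-uri record-type)
                  (contains? (super-classes record-type) range-uri))
      (throw (ex-info (str "Range validation error on " property)
                      {:expected range-uri :actual record-type})))))
```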
Normally, this kind of validation should be done using the descriptions of the loaded ontologies. However, for the purposes of this blog post, I performed the validation differently: I purposefully used some UMBEL Reference Concepts as the types of the records I described, so the object range validation function can leverage the UMBEL super-classes web service endpoint to get the super-classes of a given class.
So what this function does is check the type(s) of the record(s) referenced by the foo:knows property. What needs to be validated is whether the type of a referenced record is the same as, or is a sub-class of, the class defined in the range of the property.
In our example, the range is umbel-rc:+person. This means that the foo:knows property can only refer to umbel-rc:+person records. In the example where we have a validation error, the type of the wrench record is umbel-rc:+product. What the validation function does is get the list of all the super-classes of the umbel-rc:+product class and check whether umbel-rc:+person is among them. In this case, it is not, so an error is thrown.
What is interesting with this example is that the UMBEL super-classes web service endpoint returns the list of super-classes as Clojure code. We then use the read-string function to read that list into a data structure and manipulate it as if it were part of the application’s code.
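For instance, if the endpoint’s response body is a list of class URIs written as Clojure code, read-string turns it directly into a list the application can work with (the response content here is illustrative):

```clojure
;; A response body containing Clojure code as a string:
(def response
  "(\"http://umbel.org/umbel/rc/Product\"
    \"http://umbel.org/umbel/rc/Artifact\")")

;; read-string reads it back as a Clojure list of strings:
(read-string response)
;; => ("http://umbel.org/umbel/rc/Product"
;;     "http://umbel.org/umbel/rc/Artifact")
```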
What is elegant with this kind of RDF/Clojure serialization is that validating RDF data is the same as evaluating the underlying code (Data as Code). If the data is invalid, exceptions are thrown and the validation process aborts.
One thing that I have yet to investigate with such an RDF/Clojure serialization is how the semantics of the properties, classes and datatypes could be embedded into the RDF/Clojure records themselves, so that we end up with stateful RDF records that carry their own semantics at a specific point in time. This would mean that even if an ontology changes in the future, the records would still be valid according to the original ontology that was used to describe them at a specific point in time (when they were written, when they were emitted by a web service endpoint, etc.).
Also, as some of my readers pointed out about my previous blog post on this subject, the fact that I use vars to serialize the RDF triples means that the serialization won’t produce valid ClojureScript code, since vars don’t exist in ClojureScript. Paul Gearon proposed using keywords as the keys instead of vars and then, to get the same effect as with the vars, using a lookup index to call the functions. This avenue will be investigated as well and should be the topic of a future blog post about this RDF/Clojure serialization.