I have been working professionally in the field of the Semantic Web for more than 7 years now. I have developed all kinds of ontologies. I have integrated all kinds of datasets from various sources. I have worked with all kinds of tools and technology stacks. I have developed services and user interfaces of all kinds. I have developed a set of 27 web services packaged as the Open Semantic Framework, and re-implemented the core Drupal modules to work with RDF data as I wanted them to. I have written hundreds of thousands of lines of code with one goal in mind: leveraging the ideas and concepts of the Semantic Web to help me, other developers, ontologists and data scientists work more accurately and efficiently with any kind of data.
However, even after doing all that, I still felt a void: a disconnect between how I was thinking about data and how I was manipulating it with the programming languages I was using, the libraries I was leveraging and the web services I was developing. Everything works, and works really well; I have gained a lot of productivity over these years. But I still felt that void, that disconnect between the data and the programming language.
Every time I want to work with data, I have to get that data serialized in some format, then parse it using a parser available in the language I am working with. The parser then converts the data into an internal structure. Then I have to use all kinds of specialized APIs to work with the data represented by that structure. Then, if I want to validate the data I am working with, I will probably have to use yet another library to perform the validation, which may force me to migrate the data to another system that makes it available to those reasoners and validators. Etc., etc., etc…
All this works: I have been doing it for years. However, the level of interaction between all these systems is significant, and the integration takes time and resources. Is there a way to do things differently?
The Pink Book
Once I realized that, I started a quest to change the situation. I had no idea where I was heading or what I would find, but I had to change my mind, to change my viewpoint, to start being influenced by new ideas and concepts.
What I realized is how disconnected mainstream programming languages can be from the data I was working with. That made a natural first step for my investigation. I turned my chair and started to stare at my bookshelves. Then, like the One Ring, there was this little pink (really pink) book staring back at me: Lambda-calcul, types et modèles. I had bought that book probably 10 years earlier, then forgotten about it. I had always found its cover weird and its color awkward. But because of these uncommon features, I was attracted to it.
Re-reading about the lambda calculus opened my eyes. It led me to take a particular interest in homoiconic programming languages such as Lisp and some of its dialects.
Code as Data. Data as Code.
Is this not what I was looking for? Could this not fill the void I was feeling? Is this not where my intuition was heading?
What if the “data” I manipulate were the same as the code I write? What if the data I publish could be the code of a module of an application? What if writing code were no different from creating data? What if data could be aware of its own semantics? What if, by evaluating data structures, I validated that data at the same time? What if “parsing” my data were in fact evaluating the code of my application? What if I could reuse the tools and IDEs I use for programming for creating, editing and validating data? Wouldn’t all these things make things simpler and make me even more productive in working with data?
My intuition tells me: yes!
We have a saying at Structured Dynamics: the right tool for the right job.
That seems to be the kind of tool I need to fill that void. I had the feeling that the distinction between code and data should be as small as possible, and homoiconic languages seem to be the right tool for that job.
Code as Data. Data as Code.
That is all good, but what does that really mean? What are the advantages and benefits?
That is the start of a journey, and this is what we will discover in the coming weeks and months. Structured Dynamics is starting to invest resources in this new project. We chose to do our work in Clojure rather than other Lisp dialects such as Common Lisp. We chose Clojure for many reasons: it compiles to JVM bytecode, which means you can reuse this code in any other Java application, and you can use any Java library natively from Clojure. We also chose it for its native way of handling concurrency and parallelism, its unique way of managing metadata within data structures, and its meta-programming capabilities via its macro system, which enables us to create DSLs, etc.
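One of the features mentioned above, metadata attached to data structures, can be sketched in a few lines. The keys `:source` and `:retrieved` are hypothetical examples, not part of any fixed vocabulary:

```clojure
;; A minimal sketch of Clojure's metadata facility.
;; :source and :retrieved are hypothetical example keys.
(def dataset
  (with-meta
    {:name "Fred" :knows "Mike"}          ; the data itself
    {:source "http://example.org/people"  ; metadata attached to the value
     :retrieved "2014-03-10"}))

(meta dataset)   ;; => {:source "http://example.org/people", :retrieved "2014-03-10"}
(dataset :name)  ;; => "Fred"  (the metadata does not affect the value)
```

Note that the metadata travels with the value itself, without polluting the data it describes.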
The goal was to create a new serialization format for RDF: to serialize RDF data as Clojure code. The intuition was that RDF data would then become an integral part of Clojure applications, because the data would also be the code.
The data would be aware of its own semantics, which means that evaluating the Clojure “RDF” code would also auto-validate it using its embedded semantics. The RDF data would itself be a [Clojure] application that is aware of its own semantics and knows how to validate itself.
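As a hypothetical sketch of that intuition (not the actual serialization this series will introduce), a resource could be written as a Clojure map, with a validation function driven by its declared type:

```clojure
;; Hypothetical sketch only: a resource as a plain Clojure map,
;; plus a validation function driven by the embedded :type.
(def fred
  {:uri        "http://example.org/fred"
   :type       :foaf/Person
   :foaf/name  "Fred"
   :foaf/knows ["http://example.org/mike"]})

;; Evaluating the data validates it at the same time.
(defn valid-person? [r]
  (and (= (:type r) :foaf/Person)
       (string? (:foaf/name r))))

(valid-person? fred) ;; => true
```

In a real serialization, the validation logic would come from the ontology rather than being hand-written per class, but the principle is the same: the data carries enough semantics to check itself.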
That is the crux of my thinking. Then, how could this be implemented?
That is what I will cover in the coming weeks and months. We chose to use Clojure because it seems to be a perfect fit for the job; we will discover the reasons over time. However, the goal of these blog posts is to show how RDF can be serialized into [Clojure] code and the benefits of doing so. It is not about showing all the neat features of Clojure, or the wonderful thinking behind it. For that, I strongly suggest getting started with Clojure by reading the material covered in Tips for Clojure Beginners, and in particular taking a few hours to listen to Rich Hickey’s great videos.
13 thoughts on “Data as Code. Code as Data: Tighter Semantic Web Development Using Clojure”
I have had the same idea for RDF/OWL – so I wish you well in this. I worked with Lisp in the late ’80s and early ’90s. But now I will come back to it with Clojure – which I am starting to learn. (Retiring next year and will pursue this next – as a fun second career.) It became clear to me when looking at Hiccup for HTML… that this idea has merit. Both Clojure and RDF/OWL are now quite robust… and likely a powerful development stack.
Well, I am certainly sold on RDF & OWL. I like their concepts and the thinking behind them. For me (and Structured Dynamics), everything is a triple and everything is (or should be) driven by ontologies. It may appear dogmatic, but this is simply the best (and most natural for us) way we have found to work with data. So I agree that these technologies are mature and quite robust.
However, I am always looking to find or create new tools that leverage the RDF & OWL concepts (and make us more productive). That is the goal, and that is the goal behind this new effort. Clojure happens to be a really good fit for RDF, and it happens to be fun to work with (which is essential when you work with something 8 hours per day, day after day after day 🙂 ).
I am just starting my investigation, and many more blog posts about this idea will be released in the coming weeks and months. I will also re-implement the UMBEL web services using these new principles, and they will be released along with the next UMBEL version.
I hope you continue to read these posts, and suggest changes or ideas if you find any issues with what I present here.
Disclo[j]ure: I am learning my way with Clojure at the same time, so the code I am writing may not be optimal, but it certainly works.
Hi Frederick – I appreciate your dogmatism regarding ontologies. I am leaving the life sciences field. There, we have data-integration requirements that ontologies are a perfect match for. However, we also need to visualize the “data”… with COTS tools requiring a SQL table as the interface. So sometimes we have to migrate triples to structured storage… also for performance and transactions. As an alternative to this I am interested in the Datomic DBMS. In fact – in the end – the marriage of RDF/OWL & Datomic feels even more right to me. It can be programmed with Datalog and has an incestuous relationship with Clojure. I suggest you might look at Datomic as well as Clojure… my two cents. I will be following your progress with much interest.
Take a look at my talk about the architecture of Graphity (a generic processor for declarative Linked Data applications) and see if it rings any bells 🙂
Graphity – generic Linked Data platform for interactive Web applications
I agree with some of your premises, like when you say that we need things that remove abstraction layers, like the inflexible objects that encapsulate the RDF meaning. I totally agree with that, and this is quite important. However, I would push it further: having RDF data fully defined in native core data structures such as Clojure maps should make RDF data much easier to manipulate for regular developers who don’t have much, if any, knowledge of RDF/OWL technologies and specifications. They could use and manipulate RDF data with the tools they are used to (like all the map processing functions and macros, etc.). This means that instead of using something like SPIN for developing these rules, they could use simple Clojure code.
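To illustrate that point with a hypothetical map layout (not a fixed format): a developer who knows nothing about RDF APIs can still add a triple using plain core map functions:

```clojure
;; Hypothetical layout: a resource as a plain Clojure map.
(def resource
  {:uri        "http://example.org/fred"
   :rdf/type   :foaf/Person
   :foaf/knows #{"http://example.org/mike"}})

;; Adding a triple with ordinary map functions,
;; no RDF-specific library required:
(def updated
  (update resource :foaf/knows conj "http://example.org/jane"))

(:foaf/knows updated)
;; => #{"http://example.org/mike" "http://example.org/jane"}
```

Everything here is `update`, `conj` and keyword lookup, exactly the tools any Clojure developer already uses every day.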
Also, you mention “functional code” on a few slides, but you never push it further (and Graphity doesn’t seem to take advantage of it either). So I am not quite sure where you were heading with this in your presentation and how it fits Graphity.
Frederick, I think we’re on the same page 🙂 What I don’t understand, however, is why you would still prefer source code over (RDF) data? Even though that code might also function as data, as in the case of Lisp or similar languages.
In my mind it is much more flexible to manipulate RDF than code — you can put it in a triplestore, you can query it, you can provide a user interface for it. Source code ties you to a platform while RDF does not. Native RDF support in programming languages might also help its adoption, but I think it is less realistic and more time-consuming.
I think that in the long term, development of custom (semantic) code should shrink and be replaced by generic data-management applications.
Graphity is written in Java, but it is functional in the sense that all class members are final, there are no setter methods, and the system keeps no state. It would be interesting to port it to Clojure; it shouldn’t be too hard.
Please join our Declarative Linked Data Apps Community Group: http://www.w3.org/community/declarative-apps/
I need to post something there soon 🙂 Maybe we can develop these ideas further there.
These are good questions 🙂 First of all, let’s keep in mind that this is a research project at the moment, and we are investigating the potential of “Code as Data; Data as Code”.
However, what we are finding at the moment is quite promising.
You will learn more in the coming weeks with the future blog posts I will write on this blog; however, let’s mention a few things here.
First, this “RDF Clojure serialization” is nothing other than yet another RDF serialization. Any RDF library could parse it, just like N3, RDF/XML, etc. So that is really not a problem. If you prefer working with another serialization, you can convert it into one of those other serializations, and convert it back afterward. It is no different.
However, because this kind of serialization is also code, it means much more (at least to Clojure developers). It means you can easily manipulate this data as core Clojure structures. It means you can easily write your own code to manipulate the graph structure without having to rely on other specifications and libraries (whose inner workings you may not know, etc.). For example, one could write a Clojure function to infer everything related to transitive OWL/RDFS properties in about 10 lines of code. That is fast, effective and really performant.
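As a rough sketch of that “about 10 lines” claim, under the assumption that triples are plain `[subject property object]` vectors (an illustrative layout, not the serialization discussed in these posts), here is a transitive closure over one property:

```clojure
;; Sketch: transitive inference over plain [s p o] triple vectors.
(defn infer-transitive
  "Returns the set of all objects reachable from `node` by
   repeatedly following the transitive property `prop`."
  [triples prop node]
  (let [step (fn [n] (for [[s p o] triples
                           :when (and (= s n) (= p prop))]
                       o))]
    (loop [frontier (set (step node)), seen #{}]
      (if (empty? frontier)
        seen
        (let [n (first frontier)]
          (recur (into (disj frontier n)
                       (remove (conj seen n) (step n)))
                 (conj seen n)))))))

(def triples [["a" :broader "b"] ["b" :broader "c"] ["c" :broader "d"]])
(infer-transitive triples :broader "a")
;; => #{"b" "c" "d"}
```

Nothing here depends on an RDF toolkit: it is ordinary sequence and set manipulation over the data itself.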
Another advantage is that at any time, you can tell your data structure to evaluate itself and find any kind of error: serialization errors, semantic errors (related to classes, datatypes, etc.).
Then you can leverage a range of IDEs that help you create and manage this data via existing tooling such as contextual documentation, auto-completion, etc.
These are the things we will discover in the coming blog posts.
Hi Frederick –
I’m wondering if you’re familiar with the work of Phillip Lord and his Tawny-OWL project, which uses Clojure to build OWL ontologies? GitHub here: https://github.com/phillord/tawny-owl; online journal here: http://www.russet.org.uk/blog/. I’ve been keeping an eye on it for a while, eager to try it out when I get a chance – now I want to take a look at your work as well.
Sure I am 🙂 Phillip’s work is well recognized, and many people have already reached out to me about it. In fact, as a Semantic Web scientist, the first thing I did when I started this work was to check what already existed related to RDF & OWL in Clojure, and I obviously came across his work.
However, the goals are different, I think (though certainly not exclusive). Tawny-OWL is really a DSL, created in Clojure, for building and managing OWL ontologies. Right now, my own focus is on serializing RDF triples as Clojure code (which also includes the triples that define OWL ontologies). The goal is to be able to easily manipulate RDF triples (and thus information) using core Clojure functions, and then to have that data evaluate itself to “validate itself” against the descriptions in the ontologies. That is what I am doing with the new release of the UMBEL web services, for example.
Is there a status update to your investigations?
David, if you check the “Clojure” category on this blog you will see subsequent blog posts about that. It worked, but the complexity of the serialization was probably too great. Right now we are experimenting with an EDN serialization of RDF that we use internally for the same purpose.
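For readers unfamiliar with EDN: it is plain data readable with `clojure.edn`, with no code evaluation involved. A hypothetical sketch of what an EDN serialization of RDF triples could look like (the layout shown is an assumption, not the format actually used internally):

```clojure
;; Hypothetical sketch of RDF triples carried as EDN data.
(require '[clojure.edn :as edn])

(def rdf-edn
  "{:prefixes {:foaf \"http://xmlns.com/foaf/0.1/\"}
    :triples  [[\"http://example.org/fred\" :foaf/name \"Fred\"]
               [\"http://example.org/fred\" :foaf/knows \"http://example.org/mike\"]]}")

;; Reading EDN yields ordinary Clojure data structures,
;; without evaluating anything as code.
(def graph (edn/read-string rdf-edn))

(count (:triples graph)) ;; => 2
```

Because nothing is evaluated, EDN trades the self-validating “data as code” property for a much simpler and safer serialization, which matches the complexity concern mentioned above.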
Hey Frederick, I’m working with Tawny. Did Datomic ever surface as a possibility for serialization?
You mean, with Transit? I did some more experiments with EDN and Transit, but nothing that great. Transit was kind of slow for serialization/deserialization because of all the handlers that were triggered in the process.