I have been professionally working in the field of the Semantic Web for more than 7 years now. I have been developing all kinds of ontologies. I have been integrating all kinds of datasets from various sources. I have been working with all kinds of tools and technologies across all kinds of technology stacks. I have been developing services and user interfaces of all kinds. I have developed a set of 27 web services packaged as the Open Semantic Framework and re-implemented the core Drupal modules to work with RDF data as I wanted them to. I wrote hundreds of thousands of lines of code with one goal in mind: leveraging the ideas and concepts of the Semantic Web to help me, other developers, ontologists and data scientists work more accurately and efficiently with any kind of data.
However, even after doing all that, I was still feeling a void: a disconnection between how I was thinking about data and how I was manipulating it with the programming languages I was using, the libraries I was leveraging and the web services I was developing. Everything was working, and working really well; I gained a lot of productivity over all these years. However, I still felt that void, that disconnection between the data and the programming language.
Every time I want to work with data, I have to get that data serialized in some format, then parse it using a parser available in the language I am working with. The parser then converts the data into an internal structure. Then I have to use all kinds of specialized APIs to work with the data represented by that structure. Then, if I want to validate the data I am working with, I will probably have to use another library to perform the validation, which may force me to migrate the data to another system that makes it available to these reasoners and validators. Etc., etc., etc.
All this works: I have been doing it for years. However, the amount of interaction between all these systems is high, and integrating them takes time and resources. Is there a way to do things differently?
The Pink Book
Once I realized that, I started a quest to change that situation. I had no idea where I was heading or what I would find, but I had to change my mindset, to change my viewpoint, and to start getting influenced by new ideas and concepts.
What I realized is how disconnected mainstream programming languages may be from the data I was working with. That made them a natural first place to start my investigation. I turned my chair and started to stare at my bookshelves. Then, like the One Ring, there was this little Pink (really pink) book staring at me: Lambda-calcul types et modèles (Lambda-Calculus, Types and Models). I bought that book probably 10 years ago, then forgot about it. I always found its cover weird, and its color awkward. But because of these uncommon features, I was drawn to it.
Code as Data. Data as Code.
Is this not what I was looking for? Could this not fill the void I was feeling? Is this not where my intuition was heading?
What if the “data” I manipulate is the same as the code I am writing? What if the data that I publish could be the code of a module of an application? What if writing code is no different from creating data? What if data could be self-aware of its own semantics? What if, by evaluating data structures, I would validate that data at the same time? What if “parsing” my data is in fact evaluating the code of my application? What if I could reuse the tools and IDEs I use for programming to create, edit and validate data? Wouldn’t all these things make things simpler and make me even more productive when working with data?
My intuition tells me: yes!
We have a saying at Structured Dynamics: the right tool for the right job.
That seemed to be the kind of tool I needed to fill that void. I had the feeling that the distinction between code and data should be as minimal as possible, and homoiconic languages (languages whose programs are represented using the language’s own data structures) seemed to be the right tool for that job.
Code as Data. Data as Code.
That is all good, but what does that really mean? What are the advantages and benefits?
This is the start of a journey, and this is what we will discover in the coming weeks and months. Structured Dynamics is starting to invest resources in this new project. We chose to do our work using Clojure instead of other Lisp dialects such as Common Lisp. We chose Clojure for many reasons. It compiles to JVM bytecode, which means that you can reuse any of this code in other Java applications, and that you can use any Java library natively from Clojure. But we also chose it for its native way of handling concurrency and parallelism, its unique way of managing metadata within data structures, its meta-programming capabilities through its macro system that enable us to create DSLs, etc.
The goal was to create a new serialization format for RDF and to serialize RDF data as Clojure code. The intuition was that RDF data would then become an integral part of Clojure applications because the data would be the code as well.
The data would be self-aware of its own semantics, which means that by evaluating the Clojure “RDF” code, it would also validate itself using its embedded semantics. The RDF data would itself be a [Clojure] application that would be self-aware of its own semantics and would know how to validate itself.
That is the crux of my thinking. Then, how could this be implemented?
That is what I will cover in the coming weeks and months. We chose Clojure because it seems to be a perfect fit for that job; we will discover the reasons over time. However, the goal of these blog posts is to show how RDF can be serialized into [Clojure] code and the benefits of doing so. It is not about showing all the neat features of, and the wonderful thinking behind, Clojure. For that, I strongly suggest getting started with Clojure by reading the material covered in Tips for Clojure Beginners, and particularly taking a few hours to listen to Rich Hickey’s great videos.