Using Clojure in Org-mode and Implementing Asynchronous Processing

I recently became interested in Org-mode, which was still unknown to me just a few weeks ago, until I read this great article from Howard Abrams about literate programming using Org-mode. Initially I was wondering what this Emacs package was really about: it does all kinds of things, like document outlining (à la Markdown), task management and planning, agenda generation and time clocking, and it has a series of features related to literate programming that let you embed and run code blocks using sub-processes and display the results directly in the Org-mode [text] document.

What I was really interested in are the code block features of Org-mode. Initially I wanted to test Org-mode as a notebook application, but I also wanted to start coding in a literate programming format again. I will expand on the latter in my next blog post; for now I will concentrate on why I want to use Org-mode as a notebook style programming user interface. Since everything I code these days is in the Clojure programming language, I wanted to be able to use Org-mode’s code blocks with Clojure.

Finally, I will describe a few issues I ran into in the process and how I updated the Org-babel-clojure package to fix them.

Notebook Creations Using Org-mode

My partner Mike Bergman got me interested in notebook style programming user interfaces maybe a year ago. We wanted to find a way to easily experiment with the different data management structures and frameworks we are developing at Structured Dynamics. The idea behind a notebook is quite interesting: run code snippets anywhere in a document, see the results within that document and, finally, document the process. Then if something changes in the data or in the code, each code snippet within the notebook can be rerun at any time and its results updated. This is a great way to experiment, to keep track of the tests you are doing and to document the whole process.

The idea is really interesting for the kind of work we are doing. I tested Gorilla REPL, which is an implementation of this style of user interface in Clojure. Other such interfaces exist for other programming languages, like IPython, Wolfram, etc. However, I always had an issue with what I was using: I had a hard time re-purposing the content I was creating, and I couldn’t easily export it to different formats (blog posts, papers, etc.). Saving, reloading and re-running in different environments was often too much trouble, until I found Org-mode.

I am not sure why I didn’t come across Org-mode before, maybe because it is not advertised as a “notebook style programming user interface”, but this is really what it is (mostly) all about, at least to me. As far as I know, this is the only such software that lets you work with any kind of programming language in the same notebook. It can also export notebooks to virtually any format (several formats are supported by Org-mode itself, others can be produced using Pandoc).

This being said, I started experimenting with Org-mode to create different kinds of notebooks using Clojure. I am using notebooks that show how to use the different APIs we are creating, how different data processing workflows actually work, or how some structures (like UMBEL) have been created and how they can be leveraged. I am also creating notebooks to research and experiment with different kinds of algorithms that we are trying to implement in our products, or to write bug investigation reports for our clients, or… the possibilities are probably endless. But the core idea is almost always the same: communication. We write these notebooks to communicate information for other people to consume (or, more importantly, for our future selves).

Given the kind of tasks that I perform in a notebook, I often have to run procedures that may take minutes or even hours to complete. However, as you will see below, running procedures that take minutes to complete is a show stopper with the current Org-babel-clojure (ob-clojure.el) package, which is what lets Org-mode run Clojure code.

Installing & Configuring Org-mode

Before outlining the issues I had with the current implementation of the Org-babel-clojure package, let me explain how I installed and configured Org-mode locally.

First of all I installed Org-mode contribs from ELPA, then I configured it this way in my .emacs file. Note that I made multiple little changes here and there to end up with the kind of editor I am comfortable with. So this is about installing, enabling and tweaking Org-mode in Emacs:

;; Configure Org-mode with Cider

;; Load Org-mode
(add-to-list 'load-path "~/.emacs.d/lib/org-mode/")
(require 'org)

;; Here I specify the languages I want to be able to use with Org-babel.
(org-babel-do-load-languages
 'org-babel-load-languages
 '((clojure . t)
   (sh . t)
   (emacs-lisp . t)))

;; Specify the Clojure back-end we want to use in Org-mode.
;; I personally use Cider, but one could specify Slime
(setq org-babel-clojure-backend 'cider)

;; Let's have pretty source code blocks
(setq org-edit-src-content-indentation 0
      org-src-tab-acts-natively t
      org-src-fontify-natively t
      org-confirm-babel-evaluate nil
      org-support-shift-select 'always)

;; Useful keybindings when using Clojure from Org
(org-defkey org-mode-map "\C-x\C-e" 'cider-eval-last-sexp)
(org-defkey org-mode-map "\C-c\C-d" 'cider-doc)      

(require 'cider)

;; Remove the markup characters, i.e., "/text/" becomes (italicized) "text"
(setq org-hide-emphasis-markers t)

;; No timeout when executing calls on Cider via nrepl
(setq org-babel-clojure-nrepl-timeout nil)

;; Turn on visual-line-mode for Org-mode only
;; Note: you have to install "adaptive-wrap" from elpa
(add-hook 'org-mode-hook 'turn-on-visual-line-mode)

;; Enable Confluence export (or any other contributed export formats)
(require 'ox-confluence)

Note that most of this configuration comes from the Org-babel-clojure web page.

Timeout issues

The first issue I encountered appeared when I started to run code that took longer than 10 seconds. Every time I ran such code, I got the following error:

“nrepl-send-sync-request: Sync nREPL request timed out”

What this error means is that the synchronous request to nREPL (the Clojure back-end that runs the actual code written in Org-mode) timed out. I was really not expecting a query to time out that way. This led me to start reading the Org-babel-clojure code to see where such an error may be coming from. However, I have to add a disclaimer here: I had never really looked into Elisp code until now. The only other work I had done with Elisp was to configure Emacs, so be indulgent with me and report any awkward code I may be writing here.

My journey started with ob-clojure.el, which is the file that makes the bridge between Org-mode and the Clojure back-end (Cider/nREPL in this case). It is after reading that code that I noticed the following function: org-babel-execute:clojure, which appeared to be what is run when we execute a Clojure code block in Org-mode. Then I noticed the call to the function nrepl-sync-request:eval. That had to be the culprit, the thing sending this sync timeout error. I found this function in the Cider code. But then I found this other function that is called by the latter: nrepl-send-sync-request. It is when I read this function that I noticed the nrepl-sync-request-timeout variable. Looking back at org-babel-execute:clojure, I couldn’t see where I could define this timeout parameter. It looked like it was not possible to define it, which was a big issue for me since I needed to be able to run procedures that take minutes to run.

It is at that time that I chose to hack the ob-clojure.el code to expose that timeout setting so that I could set it properly for my own needs. The code I created for that purpose is:

; Addition of the org-babel-clojure-nrepl-timeout setting
(defvar org-babel-clojure-nrepl-timeout nil)

(defun org-babel-execute:clojure (body params)
  "Execute a block of Clojure code with Babel."
  (let ((expanded (org-babel-expand-body:clojure body params))
        result)
    (case org-babel-clojure-backend
      (cider
       (require 'cider)
       (let ((result-params (cdr (assoc :result-params params))))
         (setq result
               (nrepl-dict-get
                ; Addition of the org-babel-clojure-nrepl-timeout setting
                (let ((nrepl-sync-request-timeout org-babel-clojure-nrepl-timeout))
                  (nrepl-sync-request:eval
                   expanded (cider-current-connection) (cider-current-session)))
                (if (or (member "output" result-params)
                        (member "pp" result-params))
                    "out"
                  "value")))))
      (slime
       (require 'slime)
       (with-temp-buffer
         (insert expanded)
         (setq result
               (slime-eval
                `(swank:eval-and-grab-output
                  ,(buffer-substring-no-properties (point-min) (point-max)))
                (cdr (assoc :package params)))))))
    (org-babel-result-cond (cdr (assoc :result-params params))
      result
      (condition-case nil (org-babel-script-escape result)
        (error result)))))

What I modified in this code is the addition of a new global setting, org-babel-clojure-nrepl-timeout. If this setting is nil then there won’t be any timeout; otherwise the timeout value is in seconds. All I did was bind its value to the nREPL setting nrepl-sync-request-timeout and be done with it.

That solved this issue. After I updated ob-clojure.el accordingly, I could run Clojure code that takes several minutes in Org-mode! That was fantastic, but it was not optimal.

In fact, when I am running workflows that may take 30 minutes to finalize, I normally output processing steps in the REPL such that I know where the process is and what it is currently processing.

The problem with the current implementation of Org-babel-clojure is that it uses the synchronous API of nREPL. What I want is to be able to run Clojure code asynchronously so that I can get some feedback (via the REPL) from the procedure I am running. This opened a kind of Pandora’s box, and something that looked simple turned out to be more complex than anticipated for someone without any knowledge of Elisp or of the internal mechanisms and APIs of Emacs.

Making Org-babel-clojure Asynchronous

The next goal I had was to try to make Org-babel-clojure asynchronous. What I wanted was to be able to get, somehow, the output of a Clojure procedure while that procedure was printing something to the REPL. My second journey started after reading John Kitchin’s blog post about asynchronously running Python code in Org-mode code blocks. What I found out is that the Python code was run via a sub-process which runs the Python interpreter. John’s solution was to use a local file to store what the interpreter outputs and then to feed that output to a new window created by John’s function.

I took that example as a given, and then I tried to implement the same solution, but for Clojure (without knowing what I was really doing). It is in this process that I found that the Clojure solution to that problem would be quite different from John’s. There is an asynchronous API in nREPL; it is just not used in Org-babel-clojure. What I ended up using from John’s example is not his code but his core idea: using a new window to display the output of the asynchronous process, then killing it once the processing is finished and before populating the #+RESULTS section of the Org-mode file.

After much testing and debugging I ended up with the following solution to my problem:

(defun org-babel-execute:clojure (body params)
  "Execute a block of Clojure code with Babel."
  (lexical-let* ((expanded (org-babel-expand-body:clojure body params))
                 ; name of the buffer that will receive the async output
                 (sbuffer "*Clojure Sub Buffer*")
                 ; determine if the :async option is specified for this block
                 (async (if (assoc :async params) t nil))
                 ; generate the full response from the REPL
                 (response (cons 'dict nil))
                 ; keep track of the status of the output in async mode
                 status
                 ; result to return to Babel
                 result)
    (case org-babel-clojure-backend
      (cider
       (require 'cider)
       (let ((result-params (cdr (assoc :result-params params))))
         ; Check if the user want to run code asynchronously
         (when async
           ; Create a new window with the async output buffer
           (switch-to-buffer-other-window sbuffer)

           ; Run the Clojure code asynchronously in nREPL
           (nrepl-request:eval
            expanded 
            (lambda (resp) 
              (when (member "out" resp)
                ; Print the output of the nREPL in the async output buffer
                (princ (nrepl-dict-get resp "out") (get-buffer sbuffer)))
              (nrepl--merge response resp)
              ; Update the status of the nREPL output session
              (setq status (nrepl-dict-get response "status")))
            (cider-current-connection) 
            (cider-current-session))

           ; Wait until the nREPL code finished to be processed
           (while (not (member "done" status))
             (nrepl-dict-put response "status" (remove "need-input" status))
             (accept-process-output nil 0.01)
             (redisplay))

           ; Delete the async buffer & window when the processing is finalized
           (let ((wins (get-buffer-window-list sbuffer nil t)))
             (dolist (win wins)
               (delete-window win))
             (kill-buffer sbuffer))

           ; Put the output or the value in the result section of the code block
           (setq result (nrepl-dict-get response 
                                        (if (or (member "output" result-params)
                                                (member "pp" result-params))
                                            "out"
                                          "value"))))
         ; Check if user want to run code synchronously
         (when (not async)
           (setq result
                 (nrepl-dict-get
                  (let ((nrepl-sync-request-timeout 
                         org-babel-clojure-nrepl-timeout))
                    (nrepl-sync-request:eval
                     expanded (cider-current-connection) (cider-current-session)))
                  (if (or (member "output" result-params)
                          (member "pp" result-params))
                      "out"
                    "value"))))))
      (slime
       (require 'slime)
       (with-temp-buffer
         (insert expanded)
         (setq result
               (slime-eval
                `(swank:eval-and-grab-output
                  ,(buffer-substring-no-properties (point-min) (point-max)))
                (cdr (assoc :package params)))))))
    (org-babel-result-cond (cdr (assoc :result-params params))
      result
      (condition-case nil (org-babel-script-escape result)
        (error result)))))

The first thing this code does is to expose a new #+BEGIN_SRC option called :async. If the new :async option is specified in a code block for the Clojure language, then that code block will be processed asynchronously. What this means is that a new window will be created in Emacs, it will be populated with anything that is printed to the REPL, and then it will be closed once the processing is finished.

Here is an example of a code block that would use that new option:

#+BEGIN_SRC clojure :results output :async

(dotimes [n 10]
  (println n ".")
  (Thread/sleep 500))

#+END_SRC

This code would output “0 .”, “1 .”, etc. into a new window, close that window once the tenth iteration is done, and then populate the #+RESULTS section with the output of the code.

This code works with the :results options output, value and silent. If output is specified, then everything that was printed in the window will be added to the results section of the code block. If value is specified, then all output will still be displayed in the window, but only the resulting value will be added to the results section of the code block. If silent is specified, then all the output will still be displayed in the window, but nothing will be added to the results section of the code block.
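The difference between the output and value options can be sketched in plain Clojure. This is only an illustration of the two channels, not what ob-clojure.el actually runs: run-block is a hypothetical helper, and the :out and :value keys merely mirror the "out" and "value" entries read from the nREPL response.

```clojure
;; Hypothetical helper: call a function and keep both what it printed
;; (the "out" channel) and what it returned (the "value" channel).
(defn run-block [thunk]
  (let [result (atom nil)
        out    (with-out-str (reset! result (thunk)))]
    {:out out :value @result}))

(def response
  (run-block (fn []
               (dotimes [n 3]
                 (println n "."))
               :done)))

(:out response)   ;; the printed text, what ":results output" would insert
(:value response) ;; => :done, what ":results value" would insert
```

With silent, both pieces would still be produced, but neither would be written back into the document.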

If :async is omitted, then the normal behavior of Org-babel-clojure will be used, with the new timeout setting org-babel-clojure-nrepl-timeout.

Call for help!

As I mentioned above, this is my first attempt at coding something for Emacs using Elisp. There are certainly things that should be done differently. So if you have any Elisp and/or Cider/nREPL knowledge, and if you have some time to review this code, I am sure we could improve this function. The only thing I know is that such asynchronous capabilities for Clojure code blocks are essential.

There is one major area of improvement that I noted. Right now, the results come asynchronously, but we still can’t use the Emacs instance to do other things (like writing in the Org-mode file) while the process is running in the background and its results are reported in this other buffer. Until this issue is resolved, I don’t think we can say that this makes Org-babel-clojure 100% asynchronous. If this can be done (I did not have time to look into it yet), then I think the :async feature would be fully and properly integrated, but I am not yet sure if this is possible.

Sources

For those interested in this update of Org-babel-clojure, here are:

  • The Org-mode file of this blogpost which you can run to test the updated org-babel-execute:clojure function
  • The diff file if you want to update your local ob-clojure.el file

Investigating Options to Serialize RDF data as Clojure Code

My initial intuition is that I could serialize RDF data into Clojure code where the OWL semantics of the RDF data are embedded, in some way, into that code. I want to test how the general saying of homoiconic languages, “Data as Code, Code as Data”, fits with RDF & OWL.

Another intuition I have is the concept of Portable Data: stateful RDF data which embeds its own semantics and which doesn’t rely on external ontologies (mostly stateless, since we can rarely rely on their stated versions). My intuition is that it would be possible to serialize RDF data in such a way that it would be self-aware of its own semantics, which means that it would know how it can be interpreted, how it can be used, and how it should be validated. The idea is to end up with Portable Data snippets that could be exchanged between systems without requiring prior, or post, schemas (ontologies) to interpret that information. Then web service endpoints such as OSF, or any other kind of application, could emit such Portable Data structures without requiring any subsequent ontology analysis on their part.

However, before being able to implement and demonstrate these intuitions, the first step is to check what such an RDF serialization may look like. This is the goal of this blog post.

Serializing RDF Data as Clojure Code

Where to start? There are probably multiple ways to do that. Do we want to do that using a map, a struct map, a record, or…? What I wanted to use (at least initially) is a basic data structure that would give me the flexibility I need to serialize RDF data. I wanted a core structure that existing Clojure developers could easily manipulate using the Clojure functions and techniques they are used to.

The collection I chose to start with is the map. This key/value pair structure is ideal for serializing RDF data. It looks like JSON code, but is even simpler since it requires neither commas nor colons in its syntax.

The crux of the map structure is that the keys of a map can be keywords, symbols, strings, characters, booleans and numbers. Even nil is accepted as a key; regular expressions are technically accepted too, but they make poor keys because two distinct pattern objects never compare as equal. What should be stated here is that symbols can be a lot of different things: they are names for vars, functions, etc.
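To make these key types concrete, here is a small sketch of a single map that mixes them. The name mixed-keys and its contents are only illustrative.

```clojure
;; One map mixing several kinds of keys.
(def mixed-keys
  {:keyword "a keyword key"
   'sym     "a symbol key"
   "string" "a string key"
   \c       "a character key"
   true     "a boolean key"
   42       "a number key"})

(get mixed-keys :keyword) ;; => "a keyword key"
(get mixed-keys 'sym)     ;; => "a symbol key"
(get mixed-keys \c)       ;; => "a character key"
(get mixed-keys 42)       ;; => "a number key"
```

Note that the symbol key is quoted here so that it stays a symbol; an unquoted symbol in a map literal is evaluated first, which is exactly what the later sections take advantage of.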

This opens a world of possibilities for serializing RDF data as Clojure code. In fact, the keys of the map can be virtually anything, and this is almost too nice to be true!

What we will investigate in the remainder of this blog post are different ways to serialize RDF data as Clojure code. These are the initial tests I did to check my intuitions. All of them work, but only the last one really opens up a world of possibilities and enables me to implement my early intuitions.

Quick Introduction to RDF Data

RDF is nothing more than a bunch of triples of the form:

  • <subject> <predicate> <object>

Where the <subject> is the thing (resource, record, entity, etc.) being described, where the <predicate> is the property (attribute, etc.) that describes the subject and where the <object> is the value of the predicate which can be a reference to another subject, a literal value, etc.

Each <subject> has at least one type. A type is nothing more than a class of things which is defined in an RDFS schema or an OWL ontology.

Then if you wire these triples together, you get a directed graph which we often refer to as a dataset. It is as simple as that. However, I won’t state that RDF is necessarily simple, since its expressivity (a double-edged sword) can make things much more complex.
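As a rough illustration of triples being wired into a graph, here is a minimal sketch that stores each triple as a vector and indexes them by subject. The sample values and the objects helper are assumptions made for this example only.

```clojure
;; Each triple is a [subject predicate object] vector (sample data).
(def triples
  [["http://foo.com/1" "rdf:type"       "foaf:Person"]
   ["http://foo.com/1" "iron:prefLabel" "Fred"]
   ["http://foo.com/1" "foaf:knows"     "http://foo.com/2"]
   ["http://foo.com/2" "iron:prefLabel" "Bob"]])

;; Wiring the triples together: index the outgoing edges by subject.
(def graph (group-by first triples))

;; Hypothetical helper: the objects reachable from a subject through a
;; given predicate.
(defn objects [graph subject predicate]
  (for [[_ p o] (get graph subject)
        :when (= p predicate)]
    o))

(objects graph "http://foo.com/1" "foaf:knows")
;; => ("http://foo.com/2")
```

Because the object of the foaf:knows triple is itself the subject of another triple, following it leads from one node of the graph to the next.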

The semantics of the data lie in the <predicate> and the type. It is the predicate and the type that tell us how to interpret, and use, the data. They are what is used to validate the data, for example. That is exactly where Clojure, and its map structure, can help us create this kind of portable data.

As you will see below, the serialization of RDF data as Clojure code looks like the structJSON RDF serialization format developed by Structured Dynamics and used at the core of the Open Semantic Framework. It is not a coincidence since that simple structure has been highly effective to serialize and transmit RDF information between OSF web services and other applications such as OSF for Drupal and other JavaScript applications.

Leveraging Serialization’s Hierarchy to Create Triples

Before jumping into Clojure, let’s take a quick look at a really simple structJSON record. What I want to show you is how triples can be extracted from such a data structure. It is the same principle that will be used to extract triples from the Clojure serialization:

[cc lang=’javascript’ line_numbers=’false’]
[raw]"subject": [
  {
    "uri": "http://dataset1.com/record-a/",
    "predicate": [
      {
        "rdfs:type": "http://umbel.org/umbel/rc/Person"
      },
      {
        "iron:prefLabel": "Bob"
      },
      {
        "foaf:knows": {
          "uri": "http://dataset2.com/record-b/"
        }
      }
    ]
  }
][/raw]
[/cc]

What we leverage here to extract triples is the hierarchical nature of the serialization. Here the "subject" key introduces an array of objects. Each object has a "uri" key which is the identifier (the <subject> of a triple). Then the "predicate" key introduces a series of attributes for that record. Each element of the array is a predicate whose key is the prefixed version of the RDF <predicate>. Then you have a value for each of these predicate keys. If you read the documentation, you will see that you can get to another level, called the reification of that triple (not to be confused with Clojure’s reification mechanism), which is used to define extra information related to a triple statement. That structJSON code would produce the following triples:

[cc lang=’text’ line_numbers=’false’]
[raw]http://dataset1.com/record-a/ rdfs:type http://umbel.org/umbel/rc/Person .
http://dataset1.com/record-a/ iron:prefLabel "Bob" .
http://dataset1.com/record-a/ foaf:knows http://dataset2.com/record-b/ .[/raw]
[/cc]
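The extraction principle described above can be sketched in Clojure. The record is transcribed by hand as Clojure data, record->triples and object-of are hypothetical helper names, and reification is ignored, so this is only an outline of the idea and not the structJSON parser itself.

```clojure
;; The structJSON record above, transcribed by hand as Clojure data.
(def record
  {"uri" "http://dataset1.com/record-a/"
   "predicate" [{"rdfs:type" "http://umbel.org/umbel/rc/Person"}
                {"iron:prefLabel" "Bob"}
                {"foaf:knows" {"uri" "http://dataset2.com/record-b/"}}]})

;; A nested map with a "uri" key is a reference to another record;
;; anything else is treated as a literal value.
(defn object-of [v]
  (if (map? v) (get v "uri") v))

;; Walk the hierarchy: one subject, then every key/value pair of every
;; element of the "predicate" array.
(defn record->triples [{subject "uri" predicates "predicate"}]
  (for [attribute predicates
        [predicate value] attribute]
    [subject predicate (object-of value)]))

(record->triples record)
```

Each level of nesting supplies one part of the triple: the record gives the subject, the key of each attribute gives the predicate, and its value gives the object.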

Serializing RDF using Maps and Keywords Keys

The most intuitive way to serialize RDF data as Clojure maps would be to create a map where all the keys are keywords. An initial test would be:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def resource {:uri "http://foo.com/1"
               :rdf/type [foaf/Person owl/Thing]
               :iron/prefLabel {:value "Fred"
                                :lang nil
                                :datatype xsd/string}
               :foaf/knows [{:uri "http://foo.com/2"
                             :rei [{:iron/prefLabel [{:value "Bob"
                                                      :lang "en"}
                                                     {:value "Robert"
                                                      :lang "fr"}]}]}
                            {:uri "http://foo.com/3"
                             :rei [:iron/prefLabel "Mike"]}]})[/raw]
[/cc]

What we did here is to define a map named by the symbol resource. This map is composed of a series of keys and values where the keys are keywords, and where the values can be strings, vectors or maps. The basic serialization rules are:

  • Each map has a :uri key that defines the URI of the resource being described
  • Each key is a namespaced key where the root of the namespace is the prefix of the ontology where the <predicate> or type is defined
  • If the predicate is an owl:DatatypeProperty, then its value can be:
    • A vector with one or multiple maps and/or strings
    • A map which can have four keys:
      • :value which specifies the actual string value
      • :lang which specifies the language of that string
      • :datatype which specifies the datatype of the string
      • :rei which specifies reification statements for the triple
    • A string which is the actual value without any additional information about that literal
  • If the predicate is an owl:ObjectProperty, then its value can be:
    • A vector with one or multiple maps, strings and/or symbols
    • A map which can have two keys:
      • :uri which specifies the actual URI of the referenced resource
      • :rei which specifies reification statements for the triple
    • A string which represents the URI of the resource to be referenced
    • A symbol which represents the URI string of the resource to be referenced
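Since a datatype property value can take three shapes (a string, a map or a vector of those), code that consumes this format will probably want to normalize them first. Here is a minimal sketch of that step; literal-values is a hypothetical helper name, not part of any existing library.

```clojure
;; Hypothetical helper: bring the value of a datatype property into a
;; single shape, a vector of maps that each have a :value key.
(defn literal-values [v]
  (mapv (fn [x] (if (string? x) {:value x} x))
        (if (vector? v) v [v])))

(literal-values "Fred")
;; => [{:value "Fred"}]

(literal-values {:value "Fred" :lang "en"})
;; => [{:value "Fred", :lang "en"}]

(literal-values ["Bob" {:value "Robert" :lang "fr"}])
;; => [{:value "Bob"} {:value "Robert", :lang "fr"}]
```

The same approach would work for object properties, with :uri playing the role of :value.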

Namespacing Keywords

One important notion is that the keywords used as map keys are namespaced. This means that they are defined, and live, in their own namespace. This is an essential requirement for an RDF serialization since we re-use multiple ontologies that may share the same name for some of their predicates, and we don’t want these keywords to clash. That is why, by convention, we create each of these keywords in its respective ontology’s namespace. An ontology namespace is defined as the prefix used to refer to the ontology (for example, the Bibliographic Ontology’s prefix is bibo, so :bibo/shortTitle would be the key referring to the property http://purl.org/ontology/bibo/shortTitle).
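A short sketch of how a namespaced keyword splits into its prefix and local name, and how it could be expanded back into a full URI. The prefixes table and keyword->uri are hypothetical names, and the :dcterms/title keyword is used only to show that keys with the same local name do not clash.

```clojure
(namespace :bibo/shortTitle) ;; => "bibo"
(name :bibo/shortTitle)      ;; => "shortTitle"

;; Hypothetical prefix table mapping an ontology prefix to its base URI.
(def prefixes {"bibo" "http://purl.org/ontology/bibo/"})

;; Hypothetical helper: expand a namespaced keyword into a full URI.
(defn keyword->uri [k]
  (str (get prefixes (namespace k)) (name k)))

(keyword->uri :bibo/shortTitle)
;; => "http://purl.org/ontology/bibo/shortTitle"

;; Same local name, different namespaces: the keys stay distinct.
(= :bibo/title :dcterms/title) ;; => false
```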

Usage

Now let’s see how we can work with such a structure in Clojure:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; diff lives in clojure.data, not in clojure.core
(require '[clojure.data :refer [diff]])

;; Return the values of the rdf/type property
(:rdf/type resource)
(resource :rdf/type)
(get resource :rdf/type)

;; Return all the properties that describe the resource
(keys resource)

;; Get the URI of the first person known by Fred
(:uri (first (:foaf/knows resource)))

;; Get the French name of the first person known by Fred
(:value (second (:iron/prefLabel (first (:rei (first (:foaf/knows resource)))))))

;; Update the name of Fred to Frederick
(update-in resource [:iron/prefLabel :value] str "erick")

;; Output the difference between the original resource and the updated one
(diff resource (update-in resource [:iron/prefLabel :value] str "erick"))

;; Find the map entry (key and value) of a key
(find resource :iron/prefLabel)

;; Select values of multiple keys
(select-keys resource [:iron/prefLabel :foaf/knows])

;; Merge a resource into another resource. The URI and properties of the
;; latter resource are kept in the merged resource
(def res-1 {:uri "http://foo.com/datasets/test/1"
            :rdf/type owl/Thing
            :iron/prefLabel "Preferred Label"})

(def res-2 {:uri "http://foo.com/datasets/test/2"
            :rdf/type owl/Thing
            :iron/altLabel "Alternative Label"})

(merge res-1 res-2)[/raw]
[/cc]

That is all good and easy. We use Clojure’s core functions and mechanism to easily manipulate RDF data into our application.

However, is this implementing the intuitions I started with? Definitely not. This is more like a conventional serialization format for RDF, just like structJSON. The thing here is that if we want to do any kind of validation on this data, if we want the data to be self-aware of its own semantics, then it is not possible when the keys are keywords. We would need external mechanisms to create that map structure and then to check what it refers to (the properties, the types, etc.). Then we would have to look them up in their respective ontologies, and finally we would have to validate the data structure according to what these ontologies are saying by re-processing that map structure.

This is not quite what I had in mind and what my intuition was telling me.

Serializing RDF using Maps and Symbol Keys

Let’s push this idea further. What if the keys of the map that represents our RDF data are not keywords, but symbols? Symbols in Clojure name things like vars, functions, etc. Initially, let’s use symbols that refer to the URI (string) of the <predicate> and the types.

The serialization would look like:

[cc lang=’lisp’ line_numbers=’false’]
[raw](def resource {:uri "http://foo.com/1"
               rdf/type [foaf/Person owl/Thing]
               iron/prefLabel {:value "Fred"
                               :datatype xsd/string}
               foaf/knows [{:uri "http://foo.com/2"
                            :rei [{iron/prefLabel [{:value "Bob"
                                                    :lang "en"}
                                                   {:value "Robert"
                                                    :lang "fr"}]}]}
                           {:uri "http://foo.com/3"
                            :rei [iron/prefLabel "Mike"]}]})[/raw]
[/cc]

Now our resource is defined with the same structure, except that the predicate keys are actual symbols. In this second iteration, we will consider that the symbols we defined here represent a string, which is the URI of the predicate or the type.

The real advantage of using symbols over keywords for this RDF serialization is that a symbol can:

  • Have a docstring
  • Have meta-data
  • Evaluate to the actual full URI of the predicate or type

These are obvious enhancements over using keywords. First, being able to define docstrings means that we will be able to document these properties and types so that Clojure IDEs can display the documentation of these symbols while you are writing/editing RDF data in Clojure.

Clojure’s meta-data system will be highly leveraged in the final candidate serialization format that I will cover in another blog post, so I won’t discuss it further for the moment.

Finally, once we evaluate such a map, we get the map along with all the evaluated properties/types, which now refer to their full URIs. The evaluation of such a structure [(eval resource)] looks like:

[cc lang=’lisp’ line_numbers=’false’]
[raw]{:uri "http://foo.com/1",
 "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" ["http://xmlns.com/foaf/0.1/Person" "http://www.w3.org/2002/07/owl#Thing"],
 "http://purl.org/ontology/iron#prefLabel" {:value "Fred", :datatype "http://www.w3.org/2001/XMLSchema#string"},
 "http://xmlns.com/foaf/0.1/knows" [{:uri "http://foo.com/2", :rei [{"http://purl.org/ontology/iron#prefLabel" [{:value "Bob", :lang "en"} {:value "Robert", :lang "fr"}]}]} {:uri "http://foo.com/3", :rei ["http://purl.org/ontology/iron#prefLabel" "Mike"]}]}[/raw]
[/cc]

As you can see, we can get the full description of this resource with the full expansion of the URIs referenced by the symbols.
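Here is a minimal sketch of how such symbols could be defined so that they carry a docstring and evaluate to a full URI. It assumes that each ontology prefix is a Clojure namespace and that each predicate is a var in it; the docstrings are invented for this example, and intern/create-ns are used only to keep everything in one file.

```clojure
;; Assumption: one Clojure namespace per ontology prefix, one var per
;; predicate. The metadata on the symbol (here :doc) is adopted by the var.
(intern (create-ns 'iron)
        (with-meta 'prefLabel {:doc "The preferred label of a resource."})
        "http://purl.org/ontology/iron#prefLabel")

(intern (create-ns 'foaf)
        (with-meta 'knows {:doc "A person known by this person."})
        "http://xmlns.com/foaf/0.1/knows")

;; The symbols used as keys are evaluated to the URI strings they name.
(def resource {:uri "http://foo.com/1"
               iron/prefLabel "Fred"
               foaf/knows [{:uri "http://foo.com/2"}]})

(get resource iron/prefLabel)
;; => "Fred"

(get resource "http://purl.org/ontology/iron#prefLabel")
;; => "Fred"

(:doc (meta #'iron/prefLabel))
;; => "The preferred label of a resource."
```

In a real project each ontology would more likely live in its own file with a regular ns declaration and def forms with docstrings.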

The same parsing rules defined in the previous section apply to this new format that uses symbols instead of keywords. The same comments regarding namespaces apply here too.

The usage is nearly identical, except that such a symbol evaluates to a URI string, and a string cannot be called as a function the way a keyword can. This means that you cannot get the value of a key like this when the key is a symbol:

[cc lang=’lisp’ line_numbers=’false’]
[raw](rdf/type resource)[/raw]
[/cc]

What you have to do is to access it using one of the following two methods:

[cc lang=’lisp’ line_numbers=’false’]
[raw](resource rdf/type)
(get resource rdf/type)[/raw]
[/cc]
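To make the difference concrete, here is a minimal sketch. It assumes a hypothetical var named `rdf-type` (a stand-in for `rdf/type`, whose real definition belongs to the previous section) bound to the full URI string and documented with a docstring. The resource map is also a made-up, reduced example.

```lisp
;; Hypothetical stand-in for rdf/type: a var bound to the full URI string,
;; documented with a docstring.
(def rdf-type
  "The rdf:type predicate."
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

;; Evaluating the map literal evaluates the symbol, so the key is the URI string.
(def resource
  {rdf-type ["http://xmlns.com/foaf/0.1/Person"]})

;; A map is a function of its keys, so both lookups work:
(resource rdf-type)      ; => ["http://xmlns.com/foaf/0.1/Person"]
(get resource rdf-type)  ; => ["http://xmlns.com/foaf/0.1/Person"]

;; (rdf-type resource) would throw a ClassCastException, because rdf-type
;; evaluates to a string, and a string cannot be called as a function.

;; The docstring is stored in the metadata of the var:
(:doc (meta #'rdf-type)) ; => "The rdf:type predicate."
```

The docstring lookup on the last line is the kind of information an IDE can show while the data is being edited.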

Even though we improved upon using keywords as keys for the map, we still don’t have any kind of embedded semantics or auto-validation capabilities of the kind my intuition was pointing to. It remains the same kind of structure, without any significant improvement.

Serializing RDF using Maps and Symbol Keys Referring to Functions

Let’s shift perspective and evolve this idea of symbols: what if the symbols we define in the map actually refer to functions instead of strings?

What!?!?

A function could be the key of a map in Clojure?

Well, not directly, but yes. In Clojure, symbols name different things, such as functions. This is quite an important feature of Clojure: it makes the distinction between how things are named and the things themselves.

This means that what is really written as a key in our map structure is a symbol. However, that symbol happens to refer to a function. So it is not the function itself that is written as a key, but the thing that refers to it, which is the symbol.

However, the result is the same: if we evaluate the map, we get a series of symbols that evaluate to functions. That is exactly what we were looking for: that little gem, hanging around, just waiting to be picked up.

This opens an overwhelming number of possibilities. This means that we have a data structure that can be evaluated to a series of functions and that can be executed. That is exactly what should enable us to define that Portable [RDF] Data serialization format.

That means that we won’t only be able to define RDF triples as Clojure code, but that we could even execute that Clojure code to do different things with the data, such as auto-validating itself, etc.

Finally, what if we consider RDF predicates as Clojure functions? Predicates have all kinds of properties and semantics. They can be specified to describe only certain kinds of resources, or to refer to specific types of values. Predicates can be symmetric, functional, transitive, etc. What if we simply implement these characteristics as Clojure functions? This is what this whole thing is meant to be. When evaluating and “running” that RDF map structure, we would simply execute the functions that define the semantics and characteristics of these predicates. That is exactly where my intuition lies: we would end up with an RDF serialization format that “embeds” its own semantics and that can validate itself by executing the structure. That is what I would refer to as Portable Data: stateful data with embedded stateful semantics.
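As a rough illustration only (the actual serialization format is left for the next post), the idea of predicates as functions could be sketched as follows. The function names `pref-label`, `knows` and `valid?`, and the validation rules themselves, are all hypothetical and chosen only to show a map whose keys evaluate to functions that check their own values.

```lisp
;; Hypothetical predicate functions: each one checks the value it describes.
(defn pref-label
  "Valid when the label is a string."
  [value]
  (string? value))

(defn knows
  "Valid when the value is a vector of resource maps."
  [value]
  (and (vector? value) (every? map? value)))

;; The symbols pref-label and knows evaluate to the functions themselves,
;; which become the keys of the evaluated map.
(def resource
  {pref-label "Fred"
   knows      [{pref-label "Bob"}]})

(defn valid?
  "Run every predicate function against its own value."
  [description]
  (every? (fn [[predicate value]] (predicate value)) description))

(valid? resource)         ; => true
(valid? {pref-label 42})  ; => false
```

Here "executing the structure" simply means calling each key on its value. A real implementation would presumably carry richer semantics (domains, ranges, symmetry and so on) in those functions.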

The initial version of this other revision of the RDF serialization as Clojure code will be outlined in the next blog post, since its discussion warrants a full blog post in itself. However, I think that you can start to understand where I am heading with these intuitions and why I am using Clojure to test them.

Once an initial version of this serialization is outlined, we will see how it can be used, what the benefits are, how the idea of Portable Data could be leveraged, and how it can help in creating and managing data using traditional IDEs such as Emacs. Once the basis is outlined, we will have all the leisure to explore the benefits of this concept.

My Optimal GNU Emacs Settings for Developing Clojure (so far)

Note: this blog post has been revised with this other blog post.

In the coming months, I will start to publish a series of blog posts that will explain how RDF data can be serialized in Clojure code and more importantly what are the benefits of doing this. At Structured Dynamics, we started to invest resources into this research project and we believe that it will become a game changer regarding how people will consume, use and produce RDF data.

But I want to take a humble first step into this journey just by explaining how I ended up configuring Emacs for working with Clojure. I want to take the time to do this since it is a trial-and-error process that may be somewhat time-consuming for newcomers.

Light Table

Before discussing how I configured Emacs, I want to introduce you to a new IDE: Light Table. This new IDE is meant to be the next generation of code editor. If you are new to Clojure, and more particularly if you have never used Emacs before, I would strongly suggest that you start with this code editor. It is not only simple to use, but all the packages you will require to work with Clojure are already built in.

As you may know, GNU Emacs has been developed using Emacs Lisp (a Lisp dialect). This means that it can be extended by installing and enabling packages, and that all configuration options and behaviors can be changed, even while it is running! Light Table is no different. It has been developed in ClojureScript, and it can be extended the same way. To me, the two real innovations with Light Table are:

  • The instarepl
  • The watches

The instarepl is a way to evaluate the value of anything, while you are coding, directly inline in your code. This is really powerful and handy when prototyping and testing code. Every time you type some code, it gets evaluated in the REPL and displayed inline in the code editor.

The watches are like permanent instarepls that you place within the code. Then every time the value changes, you see the result in the watch section. This is really handy when you have to see the value of some computation while the application, or part of the application, is running. You get a live output of what is being computed, directly in your code.

The main drawback I have with Light Table is that there is no legacy REPL available (yet?). This means that if you want to evaluate something unrelated to your code, you have to write the code directly into the editor and then evaluate it with the instarepl. Another issue, for some use cases, is that the evaluation of the code can become confusing, for example when you define a Jetty server in your code. Since everything gets evaluated automatically (if the live mode is enabled), it can start the server without you knowing it. Then, to stop it, you have to write a line of code into your file and evaluate it.

Because of the nature of my work, I am a heavy user of multiple monitors (I work with six monitors daily). This means that properly handling multiple monitors is essential to my productivity. That is another issue I have with Light Table: you can create new windows that you can move to other monitors, but these windows are unconnected: they are different instances of Light Table.

Simple is beautiful, and that is why I really do like Light Table and why I think it is what beginners should use to start working with Clojure. However, it is not yet perfect for what I have to do. That is why I chose to use GNU Emacs for my daily work.

GNU Emacs

I don’t think that GNU Emacs needs any kind of introduction. It is heavy, it is unnatural, it takes time to get used to, the learning curve is steep, but… hell it is powerful for working with Lisp dialects like Clojure!

The problem with Emacs is not just learning the endless list of key bindings (even if you can go a long way with the core ones), but also configuring it to your taste. Since everything can be configured, and since there exist hundreds of packages of all kinds, it takes time to configure all the options you want and all the modules you require. This is the main reason I wrote this blog post: to share my (currently) best set of configuration options and packages for using Emacs for developing with Clojure.

I am personally developing on Windows 8, but these steps should be platform agnostic. You only have to download and install the latest GNU Emacs 24 version.

The first thing you have to do is to locate your .emacs file. All the configurations I am defining in this blog post go into that file.

Packages

Once Emacs is installed, the first thing you have to do is to install all the packages that are required to develop in Clojure or that will make your life easier for handling the code. The packages that you have to install are:

  • cider
    • Clojure Integrated Development Environment and REPL – This is like Slime for Common Lisp. This is what turns Emacs into a Clojure IDE
    • Important note: make sure that the Cider version you are installing is coming from the MELPA-Stable repository, and not the MELPA one. At the time of the publication of the blogpost, the latest stable release is 0.6.
  • clojure-mode
    • Major mode for Clojure code
  • clojure-test-mode
    • Minor mode for Clojure tests
  • auto-complete
    • Auto Completion for GNU Emacs – This is what is used to have auto-completion capabilities into your code and in the mini-buffer
  • ac-nrepl
    • Auto-complete sources for Clojure using nrepl completions – This is what is used to add auto-completion capabilities to the NREPL
  • paredit
    • Minor mode for editing parentheses – This is what will do all the Lisp-like code formatting (helping you manage all these parentheses)
  • popup
    • Visual Popup User Interface – This is what will enable popup contextual menus when using auto-completion in your code and in the NREPL
  • rainbow-delimiters
    • Highlight nested parens, brackets and braces in a different color at each depth – This is really handy to visually see where you are with your parentheses. An essential package to have
  • rainbow-mode
    • Colorize color names in buffers

Before installing them, we have to tell Emacs to use the MELPA-Stable, MELPA and Marmalade package repositories, where these packages are hosted and ready to be installed into your Emacs instance. At the top of your .emacs file, put:

[cc lang=’lisp’ line_numbers=’false’]
[raw](require 'package)

(add-to-list 'package-archives
             '("melpa-stable" . "http://melpa-stable.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/"))

;; Initialize all the ELPA packages (what is installed using the packages commands)
(package-initialize)[/raw]
[/cc]

Important note: only use the MELPA repository if you want to install non-stable modules such as the Noctilux theme. If you do not expect to use it, then I would strongly suggest that you remove it and keep only the MELPA-Stable repository in that list.

If you are editing your .emacs file directly in Emacs, you can re-evaluate the settings without restarting: move the cursor to the end of each top-level expression (after the closing parenthesis) and press C-x C-e. However, it may be faster just to close and restart Emacs to take the new settings into account. You can use either of these methods for the following sets of settings changes.
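A third option is to reload the whole file in one step. The sketch below assumes a hypothetical command name, `my-reload-init-file`. It relies on the standard `user-init-file` variable, which is not set when Emacs is started with `-q`.

```lisp
;; Hypothetical helper: reload the whole init file in one step.
(defun my-reload-init-file ()
  "Reload the user's init file (normally ~/.emacs)."
  (interactive)
  (load-file user-init-file))
```

Once it is defined, M-x my-reload-init-file re-evaluates every expression of the file in order. M-x eval-buffer does the same for the buffer you are currently editing.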

Before changing any more settings, we will first install all the required packages using the following sequence of commands:

  • M-x package-install [RET] cider [RET]
  • M-x package-install [RET] clojure-mode [RET]
  • M-x package-install [RET] clojure-test-mode [RET]
  • M-x package-install [RET] auto-complete [RET]
  • M-x package-install [RET] ac-nrepl [RET]
  • M-x package-install [RET] paredit [RET]
  • M-x package-install [RET] popup [RET]
  • M-x package-install [RET] rainbow-delimiters [RET]
  • M-x package-install [RET] rainbow-mode [RET]

Alternatively, you could have used M-x package-list-packages, then moved your cursor in the buffer to each package’s line and pressed i (for install). Once all the packages are selected, you could have pressed x (execute) to install them all at once.

In the list of commands above, M- refers to the Meta key, which is normally bound to the left Alt key on your keyboard. So, M-x usually means Alt-x.
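The same installation can also be scripted, which is handy when setting up a second machine. This is a minimal sketch. The variable name `my-clojure-packages` is hypothetical, the package names are the ones listed above, and the snippet is assumed to run after the repository settings and the `(package-initialize)` call at the top of the .emacs file.

```lisp
(require 'package)

;; Hypothetical variable name; the package names are the ones listed above.
(defvar my-clojure-packages
  '(cider clojure-mode clojure-test-mode auto-complete ac-nrepl
    paredit popup rainbow-delimiters rainbow-mode)
  "Packages to install for Clojure development.")

;; Fetch the archive listings if they have not been downloaded yet.
(unless package-archive-contents
  (package-refresh-contents))

;; Install only the packages that are missing.
(dolist (pkg my-clojure-packages)
  (unless (package-installed-p pkg)
    (package-install pkg)))
```

Note that this installs whichever version the configured repositories offer first, so the remark above about preferring MELPA-Stable for Cider still applies.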

Now that all the packages are installed, let’s take a look at how we should configure them.

Configuring Keyboard

If you are using an English/US keyboard, you can skip this section. Since I use a French Canadian layout (on an English/US Das Keyboard!), I had multiple issues getting my keys to work, since all the bindings changed in Emacs. To solve this problem, I simply had to enable the following language configuration option. Then I had to start using the right Alt key of my keyboard to type my brackets, curly brackets, etc.:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Enable my Canadian French keyboard layout
(require 'iso-transl)[/raw]
[/cc]

Configuring Fonts

Since I am growing older (and since I have a lot of screen real estate with six monitors), I need bigger fonts. I like coding using Courier New, so I just configured it to use the font size 13 instead of the default 10:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Set bigger fonts
(set-default-font "Courier New-13")[/raw]
[/cc]
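With several frames spread over multiple monitors, it can also help to register the font for the frames that are created later. The following line is a small sketch using the standard `default-frame-alist` variable. It is an addition to the setting above, not part of the original configuration.

```lisp
;; Apply the same font to every frame created from now on.
(add-to-list 'default-frame-alist '(font . "Courier New-13"))
```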

Cider and nREPL

The next step is to configure Cider and the nREPL, which are the two pieces that turn Emacs into a wonderful Clojure IDE:

[cc lang=’lisp’ line_numbers=’false’]
[raw](add-hook 'clojure-mode-hook 'turn-on-eldoc-mode)
(setq nrepl-popup-stacktraces nil)
(add-to-list 'same-window-buffer-names "*nrepl*")[/raw]
[/cc]

Auto-completion

The next step is to configure the auto-completion feature everywhere in Emacs: in any buffer, nREPL or in the mini-buffer. Then we want the auto-completion to appear in a contextual menu where the docstrings (documentation) of the functions will be displayed:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; General Auto-Complete
(require 'auto-complete-config)
(setq ac-delay 0.0)
(setq ac-quick-help-delay 0.5)
(ac-config-default)

;; ac-nrepl (Auto-complete for the nREPL)
(require 'ac-nrepl)
(add-hook 'cider-mode-hook 'ac-nrepl-setup)
(add-hook 'cider-repl-mode-hook 'ac-nrepl-setup)
(add-to-list 'ac-modes 'cider-mode)
(add-to-list 'ac-modes 'cider-repl-mode)[/raw]
[/cc]

Popping Contextual Documentation At Any Time

What is really helpful is to be able to pop up the documentation for any symbol at any time just by pressing a series of keys. What needs to be done is to configure Cider & ac-nrepl to bind this behavior to the C-c C-d sequence of keys:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Popping-up contextual documentation
(eval-after-load "cider"
  '(define-key cider-mode-map (kbd "C-c C-d") 'ac-nrepl-popup-doc))[/raw]
[/cc]

Par Edit

Par Edit is the package that will help you out by automatically formatting your Clojure code. It will balance the parentheses, automatically indent your S-expressions, etc.

[cc lang=’lisp’ line_numbers=’false’]
[raw](add-hook 'clojure-mode-hook 'paredit-mode)[/raw]
[/cc]

Show Parenthesis Mode

Another handy feature is to enable, by default, the show-paren-mode configuration option. That way, every time the cursor points to a parenthesis, the matching parenthesis will be highlighted in the user interface. This is an essential must-have with Par Edit:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Show parenthesis mode
(show-paren-mode 1)[/raw]
[/cc]

Rainbow Delimiters

This is another essential package to help you keep track of these parentheses. The rainbow delimiters will change the color of the parentheses depending on how “deep” they are in the structure. Another essential visual cue:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; rainbow delimiters
(global-rainbow-delimiters-mode)[/raw]
[/cc]
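If you would rather enable the colors only where they matter, the minor mode can be attached to a specific major mode instead of being turned on globally. This is a sketch of that alternative, not part of the original configuration. It may also be worth checking whether the version of rainbow-delimiters you install still provides the global mode.

```lisp
;; Enable rainbow delimiters in Clojure buffers only.
(add-hook 'clojure-mode-hook 'rainbow-delimiters-mode)
```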

Noctilux Theme

Did I say that I like Light Table? In fact, I really do like its dark theme. It is the best I have seen so far. I had never used a dark theme in my life since I never liked any of them. But that one is really neat, particularly for visualizing Clojure code. That is why I really wanted to get a Light Table theme for Emacs. It exists, it is called Noctilux, and it works exactly the same way with the same colors.

If you want to install it, you can get it directly from the packages archives. Type M-x package-list-packages then search and install noctilux-theme.

Then enable it by adding this setting:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; Noctilux Theme
(load-theme 'noctilux t)[/raw]
[/cc]

Binding Some Keys

Then I wanted to bind some behaviors to the F-keys. What I wanted is to be able to run Cider, to start and stop Par Edit, and to switch frames (windows within monitors) with a single key press. I also added a shortcut for starting speedbar for the current buffer, which is essential for managing project files. What I did is to bind these behaviors to these keys:

[cc lang=’lisp’ line_numbers=’false’]
[raw](global-set-key [f8] 'other-frame)
(global-set-key [f7] 'paredit-mode)
(global-set-key [f9] 'cider-jack-in)
(global-set-key [f11] 'speedbar)[/raw]
[/cc]

Fixing the Scroll

There is one thing that I really didn’t like: the default scrolling behavior of Emacs on Windows. After some searching, I found the following settings, which give a smoother scrolling behavior on Windows:

[cc lang=’lisp’ line_numbers=’false’]
[raw];; scroll one line at a time (less "jumpy" than defaults)
(setq mouse-wheel-scroll-amount '(1 ((shift) . 1))) ;; one line at a time
(setq mouse-wheel-progressive-speed nil) ;; don't accelerate scrolling
(setq mouse-wheel-follow-mouse 't) ;; scroll window under mouse
(setq scroll-step 1) ;; keyboard scroll one line at a time[/raw]
[/cc]

Complete Configuration File

Here is the full configuration file that I am using:

[cc lang=’lisp’ line_numbers=’false’]
[raw](require 'package)

(add-to-list 'package-archives
             '("melpa-stable" . "http://melpa-stable.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/"))

(add-to-list 'package-archives
             '("marmalade" . "http://marmalade-repo.org/packages/"))

;; Initialize all the ELPA packages (what is installed using the packages commands)
(package-initialize)

;; Enable my Canadian French keyboard layout
(require 'iso-transl)

;; Set bigger fonts
(set-default-font "Courier New-13")

;; Cider & nREPL
(add-hook 'clojure-mode-hook 'turn-on-eldoc-mode)
(setq nrepl-popup-stacktraces nil)
(add-to-list 'same-window-buffer-names "*nrepl*")

;; General Auto-Complete
(require 'auto-complete-config)
(setq ac-delay 0.0)
(setq ac-quick-help-delay 0.5)
(ac-config-default)

;; ac-nrepl (Auto-complete for the nREPL)
(require 'ac-nrepl)
(add-hook 'cider-mode-hook 'ac-nrepl-setup)
(add-hook 'cider-repl-mode-hook 'ac-nrepl-setup)
(add-to-list 'ac-modes 'cider-mode)
(add-to-list 'ac-modes 'cider-repl-mode)

;; Popping-up contextual documentation
(eval-after-load "cider"
  '(define-key cider-mode-map (kbd "C-c C-d") 'ac-nrepl-popup-doc))

;; paredit
(add-hook 'clojure-mode-hook 'paredit-mode)

;; Show parenthesis mode
(show-paren-mode 1)

;; rainbow delimiters
(global-rainbow-delimiters-mode)

;; Noctilux Theme
(load-theme 'noctilux t)

;; Key bindings: F8 switches frame, F7 toggles paredit, F9 starts Cider, F11 opens speedbar
(global-set-key [f8] 'other-frame)
(global-set-key [f7] 'paredit-mode)
(global-set-key [f9] 'cider-jack-in)
(global-set-key [f11] 'speedbar)

;; scroll one line at a time (less "jumpy" than defaults)
(setq mouse-wheel-scroll-amount '(1 ((shift) . 1))) ;; one line at a time
(setq mouse-wheel-progressive-speed nil) ;; don't accelerate scrolling
(setq mouse-wheel-follow-mouse 't) ;; scroll window under mouse
(setq scroll-step 1) ;; keyboard scroll one line at a time[/raw]
[/cc]

Conclusion

Now that we have the proper development environment in place, the next blog posts will really get into the heart of the matter: what are the different ways to serialize RDF data in Clojure code, how the generated code can be used, what are the benefits, how it changes the way that data (RDF in this case, but really any data) can be produced and consumed.

We think that there are profound implications into how we, as Semantic Web specialists, will work with data instances and ontologies in the future. The initial project that will embed and benefit from these new principles and techniques will be the next version of the UMBEL ontology.

Final note: there is an endless list of features and packages for Emacs. Obviously, I don’t know all of them, so if you are aware of any settings or packages that I missed here and that could improve this setup, please share them in the comments.