Creating and Running Unit Tests Directly in Source Files with Org-mode

Developing a computer program is not an easy task. The process needs a constant focus. Any interruption of that process means that time is spent re-focusing on it and errors are more prone to be introduced. The process involves analyzing a problem to solve by executing a series of steps. It involves writing about the problem we are trying to solve and writing about the solution we found to solve it. Finally it involves writing test procedures that can run to make sure that the current state of the implementation of the solution behaves as expected and that expected behavior is not altered by subsequent modifications.

Continue reading “Creating and Running Unit Tests Directly in Source Files with Org-mode”

Using Clojure in Org-mode and Implementing Asynchronous Processing

I recently started to get interested in Org-mode which was still unknown to me just a few weeks ago until I read this great article from Howard Abrams about literate programming using Org-mode. Initially I was wondering what this Emacs package was really about (it does all kind of things like document outlining (à la Markdown), tasks management and planning, agenda generation, time clocking… and it has a series of features related to literal programming that let you embed and run code blocks using sub-processes and to display results directly into the Org-mode [text] document.

What I was really interested in are the code block related features of Org-mode. Initially I wanted to test Org-mode using as a Notebook application but I also wanted to re-start trying to coding in literate programming format. I will extend on the later in my next blogpost, for now I will concentrate on why I want to use Org-mode as notebook style programming user interfaces. Since everything I code these days is in Clojure programming language, I wanted to be able to use Org-mode’s code blocks with Clojure.

Finally I will describes a few issues I experimented in the process and how I update the Org-babel-clojure package to fix those issues.

Notebook Creations Using Org-mode

My partner Mike Bergman got me interested in notebook style programming user interfaces maybe a year ago. We wanted to find a way to easily experiment with different data management structures and frameworks we are developing at Structured Dynamics. The idea behind a Notebook was quite interesting: it is to run code snippets anywhere in a document, to see the results within that document and finally to be able to document the process. Then if something changed in the data, or in the code, then each code snipped within a Notebook could be rerun at any time, and the results updated. This is a great way to do experimentation, to keep tracks of the tests your are doing and to document the whole process.

The idea is really interesting for the kind of work we are doing. I tested the Gorilla REPL which is an implementation of this style user interface in Clojure. Other such interfaces exists in other programming languages like IPython, Wolfram, etc. However, I always had an issue with what I was using: I had a hard time re-purposing the content I was creating; I couldn’t easily export this information in different format (blog posts, papers, etc.). Saving, reloading, re-running in different environment was often too much trouble: until I find Org-mode.

I am not sure why I didn’t came across Org-mode before, maybe because it was not advertised as as “notebook style programming user interface” but this is really what it is (mostly) all about, at least to me. As far as I know, this is the only such software that let you work with any kind of programming language in the same notebook. It can also export the notebooks in virtually any formats (several formats are supported by Org-mode itself, others can be exported using Pandoc).

This being said, I started experimenting with Org-mode to create different kind of Notebooks using Clojure. I am using notebooks that shows how to use different APIs we are creating, or ones that shows how different data processing workflows actually works or that shows how some structures (like UMBEL) have been created and how they can be leveraged. I am also creating notebooks to research and experiment different kind of algorithms that we are trying to implement in our products, or to do bug investigation reports for our clients, or… the possibilities are probably endless. But the core idea is almost always the same: communication. We write these notebooks to communicate (write) information for other people to consume (or more important, his future self).

Given this kind of tasks that I am performing in a notebook, I often have to run procedures that may takes minutes or even hours before their processing is finalized. However, as you will see below, running procedures that takes minutes to finalize is a show stopper with the current Org-mode Org-babel-clojure (ob-clojure.el)= package that let Org-mode to run Clojure code.

Installing & Configuring Org-mode

Before outlining the issues I had with the current implementation of the Org-babel-clojure package, let me explain how I installed and configured Org-mode locally.

First of all I installed Org-mode contribs from ELPA, then I configured it that way in my .emacs file. Note that I made multiple little changes here and there to end-up with the kind of editor I am comfortable to use. So this is about installing, enabling and tweaking Org-mode in Emacs:

;; Configure Org-mode with Cider

;; Load Org-mode
(add-to-list 'load-path "~/.emacs.d/lib/org-mode/")
(require 'org)

;; Here I specify the languages I want to be able to use with Org-babel.
(org-babel-do-load-languages
 'org-babel-load-languages
 '((clojure . t)
   (sh . t)
   (emacs-lisp . t)))

;; Specify the Clojure back-end we want to use in Org-mode.
;; I personally use Cider, but one could specify Slime
(setq org-babel-clojure-backend 'cider)

;; Let's have pretty source code blocks
(setq org-edit-src-content-indentation 0
      org-src-tab-acts-natively t
      org-src-fontify-natively t
       org-confirm-babel-evaluate nil
      org-support-shift-select 'always)

;; Useful keybindings when using Clojure from Org
(org-defkey org-mode-map "\C-x\C-e" 'cider-eval-last-sexp)
(org-defkey org-mode-map "\C-c\C-d" 'cider-doc)      

(require 'cider)

;; Remove the markup characters, i.e., "/text/" becomes (italized) "text"
(setq org-hide-emphasis-markers t)

;; No timeout when executing calls on Cider via nrepl
(setq org-babel-clojure-nrepl-timeout nil)

;; Turn on visual-line-mode for Org-mode only
;; Note: you have to install "adaptive-wrap" from elpa
(add-hook 'org-mode-hook 'turn-on-visual-line-mode)

;; Enable Confluence export (or any other contributed export formats)
(require 'ox-confluence)

Note that most of these configurations comes from the Org-babel-clojure webpage.

Timeout issues

The first issue I encountered is when I started to run code that was taking longer than 10 seconds. Every time I was running such code, I ran into the follow error:

“nrepl-send-sync-request: Sync nREPL request timed out”

What this error means is the the synchronous request to nREPL (the Clojure back-end that run the actual code written into Org-mode) timeout. I was really not expecting a query to timeout that way. This led me to start reading the Org-babel-clojure code to see where such an error may be coming from. However, I have to do a disclaimer here: I never really looked into Elisp code until now. The only other work I did with Elisp was to configure Emacs so be indulgent with me and report all awkward code I may be writing here.

My journey started with the ob-clojure.el which is the file used to make the bridge between Org-mode and the Clojure back-end (Cider/nREPL in this case). It is after reading that code that I noticed the following function: org-babel-execute:clojure which appeared to be the thing that is run when we run a Clojure code block in Org-mode. Then I noticed the call to the function nrepl-sync-request:eval. That needed to be the culprit and what sent this Sync timeout error. I found this function in the Cider code. But then I found this other function that is called by the later: nrepl-send-sync-request. It is when I read this function that I noticed the nrepl-sync-request-timeout variable. Looking back at org-babel-execute:clojure I couldn’t see where I could define this timeout parameter. I looks like it was not possible to define it, which was a big issue to me since I needed to be able to run procedure that takes minutes to run.

It is at that time that I choose to hack the ob-clojure.el code to expose that timeout setting such that I could setup it properly for my own needs. The code I created for that purpose is:

; Addition of the org-babel-clojure-nrepl-timeout setting
(defvar org-babel-clojure-nrepl-timeout nil)

(defun org-babel-execute:clojure (body params)
  "Execute a block of Clojure code with Babel."
  (let ((expanded (org-babel-expand-body:clojure body params))
        result)
    (case org-babel-clojure-backend
      (cider
       (require 'cider)
       (let ((result-params (cdr (assoc :result-params params))))
         (setq result
               (nrepl-dict-get
                ; Addition of the org-babel-clojure-nrepl-timeout setting
                (let ((nrepl-sync-request-timeout org-babel-clojure-nrepl-timeout))
                  (nrepl-sync-request:eval
                   expanded (cider-current-connection) (cider-current-session)))
                (if (or (member "output" result-params)
                        (member "pp" result-params))
                    "out"
                  "value")))))
      (slime
       (require 'slime)
       (with-temp-buffer
         (insert expanded)
         (setq result
               (slime-eval
                `(swank:eval-and-grab-output
                  ,(buffer-substring-no-properties (point-min) (point-max)))
                (cdr (assoc :package params)))))))
    (org-babel-result-cond (cdr (assoc :result-params params))
      result
      (condition-case nil (org-babel-script-escape result)
        (error result)))))

What I modified in this code is to add a new global setting org-babel-clojure-nrepl-timeout. If this setting is nil then there won’t be any timeout, otherwise the timeout value will be in seconds. What I did is simply to bind its value to the nREPL setting nrepl-sync-request-timeout and be done with it.

That solved this issue. After I updated ob-clojure.el accordingly, I could run Clojure code that may takes several minutes in Org-mode! That was fanstastic, but it was not optimal.

In fact, when I am running workflows that may take 30 minutes to finalize, I normally output processing steps in the REPL such that I know where the process is and what it is currently processing.

The problem with the current implementation of Org-babel-clojure is that it uses the synchronous API of the nREPL. What I want is to be able to run Clojure code asynchronously such that I can get some feedbacks (via the REPL) from the procedure I am running. This opened a kind of a Pandora box, and something that looked simple turned out to be more complex than anticipated for someone without any knowledge into Elisp, internal mechanisms and APIs of Emacs.

Making Org-babel-clojure “Asynchrone”

The next goal I had is to try to make Org-babel-clojure asynchrone. What I wanted is to be able to get, somehow, was the output of a Clojure procedure when that procedure was outputing something to the REPL. My second journey started after reading John Kitchin’s blog post about Asynchronously running Python code into Org-mode code blocks. What I found out is that Python code was run via a sub-process which run the Python interpreter. John’s solution was to use a local file to write what the interpreter is outputing and then to feed that output to a new window that got created by John’s function.

I took that example as a given, and then I tried to implement the same solution, but for Clojure (without knowing what I was really doing). It is in this process that I found that the Clojure solution to that problem would be quite different than John’s. There is an asynchronous API in nREPL, it is just that it is not used in Org-babel-clojure. What I ended-up using from John’s example is not his code, but his core idea: using a new window to output the asynchrone process and then to kill it once the processing is finalized and before populating #+RESULSTS section of the Org-mode file.

After much testing and debugging I ended-up with the following solution to my problem:

(defun org-babel-execute:clojure (body params)
  "Execute a block of Clojure code with Babel."
  (lexical-let* ((expanded (org-babel-expand-body:clojure body params))
                 ; name of the buffer that will receive the asyn output
                 (sbuffer "*Clojure Sub Buffer*")
                 ; determine if the :async option is specified for this block
                 (async (if (assoc :async params) t nil))
                 ; generate the full response from the REPL
                 (response (cons 'dict nil))
                 ; keep track of the status of the output in async mode
                 status
                 ; result to return to Babel
                 result)
    (case org-babel-clojure-backend
      (cider
       (require 'cider)
       (let ((result-params (cdr (assoc :result-params params))))
         ; Check if the user want to run code asynchronously
         (when async
           ; Create a new window with the async output buffer
           (switch-to-buffer-other-window sbuffer)

           ; Run the Clojure code asynchronously in nREPL
           (nrepl-request:eval
            expanded 
            (lambda (resp) 
              (when (member "out" resp)
                ; Print the output of the nREPL in the asyn output buffer
                (princ (nrepl-dict-get resp "out") (get-buffer sbuffer)))
              (nrepl--merge response resp)
              ; Update the status of the nREPL output session
              (setq status (nrepl-dict-get response "status")))
            (cider-current-connection) 
            (cider-current-session))

           ; Wait until the nREPL code finished to be processed
           (while (not (member "done" status))
             (nrepl-dict-put response "status" (remove "need-input" status))
             (accept-process-output nil 0.01)
             (redisplay))

           ; Delete the async buffer & window when the processing is finalized
           (let ((wins (get-buffer-window-list sbuffer nil t)))
             (dolist (win wins)
               (delete-window win))
             (kill-buffer sbuffer))

           ; Put the output or the value in the result section of the code block
           (setq result (nrepl-dict-get response 
                                        (if (or (member "output" result-params)
                                                (member "pp" result-params))
                                            "out"
                                          "value"))))
         ; Check if user want to run code synchronously
         (when (not async)
           (setq result
                 (nrepl-dict-get
                  (let ((nrepl-sync-request-timeout 
                         org-babel-clojure-nrepl-timeout))
                    (nrepl-sync-request:eval
                     expanded (cider-current-connection) (cider-current-session)))
                  (if (or (member "output" result-params)
                          (member "pp" result-params))
                      "out"
                    "value"))))))
      (slime
       (require 'slime)
       (with-temp-buffer
         (insert expanded)
         (setq result
               (slime-eval
                `(swank:eval-and-grab-output
                  ,(buffer-substring-no-properties (point-min) (point-max)))
                (cdr (assoc :package params)))))))
    (org-babel-result-cond (cdr (assoc :result-params params))
      result
      (condition-case nil (org-babel-script-escape result)
        (error result)))))

The first thing this code does, is to expose a new #+BEGIN_SRC option called :async. If the new :async option is specified in a block code for the Clojure language, then that code block will be processed asynchronously. What this means is that a new window will be created in Emacs, it will be populated with anything that is outputted to the REPL and then it will be closed once the processing will be finalized.

Here is an example of a code block that would use that new option:

#+BEGIN_SRC clojure :results output :async

(dotimes [n 10]
  (println n ".")
  (Thread/sleep 500))

#+END_SRC

This code would output “1. 2.” etc into a new window and would close that window when it reaches 10 and then populate the #+RESULTS section with the output of the code.

This code works with the :results options output, value and silent. If output is specified, then everything that was outputted into the window will be added into the results section of the code block. If value is specified, then all output will still be displayed into the window, but only the resulting value will be added to the results section of the code block. If silent is specified, then all the output will still be displayed into the window, but nothing will be displayed in the results section of the code block.

If the :async is omitted, then the normal behavior of Org-babel-clojure will be used, with the new timeout setting org-babel-clojure-nrepl-timeout.

Call for help!

As I mentioned above, this is my attempt in coding something for Emacs using Elisp. There are certainly things that should be done differently. So if you have any Elisp and/or Cider/nREPL knowledge, and if you have some time to review this code, I am sure we could improve the usage of this function. The only thing I know is that such asynchronous capabilities of the Clojure code blocks is essential.

There is one major area of improvement that I noted. Right now, the results comes asynchronously, but we still can’t use the Emacs instance to do other things (like writing in the Org-mode file while the process is running in background and results reported in this other buffer. Until this other issue is resolved, I don’t think we can say that this really makes Org-babel-clojure really 100% asynchronous. If this can be done (I did not have time to look into this yet), then I think the :async feature would be fully and properly integrated, but I am not yet sure if this is possible.

 Sources

For the ones interested in this update of Org-babel-clojure, here is:

  • The Org-mode file of this blogpost which you can run to test the updated org-babel-execute:clojure function
  • The diff file if you want to update your local ob-clujure.el file

clj-fst: Finite State Transducers (FST) for Clojure

clj-fst is a Clojure wrapper around the Lucene FST API. Finite state transducers are finite state machines with two tapes: an input and an output tape. The automaton maps an input string to an output. The output can be a vector of strings or a vector of integers. There are more profound mathematical implications to FSTs, but those are the basics for now.

Why Use FSTs?

Considering that basic definition of a FST, one could legitimately wonder why he should care about FSTs. FSTs could be seen as simple Clojure maps, so why bother with FSTs?

Everything is a matter of scale. Using a map, or such generic structures, for efficiently handling millions or billions of values is far from effective, if even possible.

That is why we need some specialized structures like FSTs: to be able to create such huge associative structures that are lightning fast to query and that use a minimum of memory.

There are two general use cases for using FSTs:

  1. When you want to know if an instance A exists in a really huge set X (where the set X is the FST)
  2. When you want to get a list of outputs from a given input from a really huge set.

Lucene FSTs

There are multiple FSTs implementations out there, however I choose to go with Lucene’s implementation development by Micheal McCandless. The main reason for using the Lucene FST API is because of their implementation of the FST. It implements the work of Stoyan Mihov and Denis Maurel1 to create a minimal unweighted FST from pre-sorted inputs. The implementation results in lightning fast querying of the structure with a really efficient use of memory. Considering the size of the structures we manipulate at Structured Dynamics, these were the two main characteristics to look for and the reason why we choose that implementation.

Limitations

However, there are two things to keep in mind when working with FSTs:

  1. The FSTs are static. This means that you cannot add to them once they are created. You have to re-create them from the beginning if you want to change their content.
  2. The entries have to be pre-sorted. If your entries are not sorted when you create the FST ,then unexpected results will happen.

clj-fst

The clj-fst project is nothing more than a wrapper around the Lucene FST API. However, one of the goals of this project is to make this specific Lucene function outstanding and to liberalize its usage in Clojure.

If you take the time to analyze the clj-fst wrapper, and the Lucene API code, you will notice that not all the of functionality of the API is wrapped. The thing is that the API is somewhat complex and doesn’t have much documentation. What clj-fst tries to do is to simplify the usage of the API and to create more documentation and code usage examples around it. Finally, it tries create an abstraction layer over the API to manipulate the FSTs in the Clojure way…

Basic Usage

Creating an FST is really simple, it has 3 basic, and one optional, steps:

  1. Create the FST builder
  2. Populate the FST using the builder
  3. Create the actual FST from the builder
  4. Optionally, save the FST on the file system to reload it later in memory.

Note that the complete clj-fst documentation is available here.

The simplest code looks like:

[cc lang=”lisp”]
[raw]
;; The first thing to do is to create the Builder
(def builder (create-builder! :type :int))

;; This small sorted-map defines the things
;; to add to the FST
(def values (into (sorted-map) {“cat” 1
“dog” 2
“mice” 3}))

;; Populate the FST using that sorted-map
(doseq [[input output] values]
(add! builder {input output}))

;; Creating a new FST
(def fst (create-fst! builder))

;; Save a FST on the file system
(save! “resources/fst.srz” fst)
[/raw]
[/cc]

Once the FST is saved on the file system, you can easily reload it later:

[cc lang=”lisp”]
[raw]
;; Load a FST from the file system
(load! “resources/fst.srz)
[/raw]
[/cc]

You can easily get the output related to an input:

[cc lang=”lisp”]
[raw]
;; Query the FST
(get-output “cat” fst)
[/raw]
[/cc]

You can iterate the content of FST:

[cc lang=”lisp”]
[raw]
;; Create the FST enumeration
(def enum (create-enum! fst))

;; Get the first item in the FST
(next! enum)

;; Get the current FST item pointed by the enumerator
(current! enum)
[/raw]
[/cc]

Finally you have other ways to query the FST using the enumerator:

[cc lang=”lisp”]
[raw]
;; Search for different input terms
(get-ceil-term! “cat” enum)

(get-floor-term! “cat” enum)

(get-exact-term! “cat” enum)
[/raw]
[/cc]

More Complex Example

Let’s take a look at a more complex example. What we will be doing here is to create a FST that will be used as a high performance inference index for UMBEL reference concepts (classes). What we are doing is to query the UMBEL super classes web service endpoint to populate the super-types index.

The process is:

  1. Get the number of concepts in the UMBEL structure
  2. Get the list of all the UMBEL concepts using the UMBEL search endpoint
  3. Sort the list of UMBEL concepts URIs
  4. Get the super-classes, by inference, for each of the concepts
  5. Populate the FST with the concepts as input and its super-classes as output
  6. Save the FST on the file system.

To simplify the example, I simply list all of the UMBEL reference concepts in a CSV file. However, you could have created that list using the UMBEL search web service endpoint.

The function that creates the UMBEL reference concepts super-classes index is:

[cc lang=”lisp”]
[raw]
(ns foo.core
(:require [clojure.string :as string]
[clj-http.client :as http]
[clojure.data.csv :as csv]
[clojure.java.io :as io]
[clj-fst.core :as fst]))

(defn get-umbel-reference-concepts []
(->> (with-open [in-file (io/reader “https://fgiasson.com/blog/wp-content/uploads/2015/04/umbel-reference-concepts.csv”)]
(doall
(csv/read-csv in-file)))
flatten
(into [])))

(defn create-umbel-super-classes-fst []
(let [ref-concepts (->> (get-umbel-reference-concepts)
(map (fn [ref-concept]
[(string/replace ref-concept “http://umbel.org/umbel/rc/” “”)]))
(apply concat)
(into [])
distinct
sort)
builder (fst/create-builder! :type :char :pack true)]
(doseq [ref-concept ref-concepts]
(println ref-concept)
(let [resultset (http/get (str “http://umbel.org/ws/super-classes/” ref-concept)
{:accept “application/clojure”
:throw-exceptions false})]
(when (= (get resultset :status) 200)
(doseq [super-class (->> resultset
:body
read-string
(into []))]
(fst/add! builder {(str “http://umbel.org/umbel/rc/” ref-concept) super-class})))))
(let [fst (fst/create-fst! builder)]
(fst/save! “resources/umbel-super-classes.fst” fst))))
[/raw]
[/cc]

After running the (create-umbel-super-classes-fst) function, a umbel-super-classes.fst file will be created in the resources/ folder of your project. This process should take about 5 to 10 minutes to complete. All the latency comes from the fact that you have to issue a web service query for every concept. From the standpoint of the FST, you could populate one with millions of inputs within a few seconds.

Eventually you will be able to reload that index in any context:

[cc lang=”lisp”]
[raw]
(def umbel-super-classes (fst/load! “resources/umbel-super-classes.fst”))
[/raw]
[/cc]

 

Conclusion

As you can see, an FST is a really interesting structure that lets you query really huge arrays in an effective way. The goal of this new Clojure library is to make its usage as simple as possible. It is intended to be used by any developer that has to query very large sets of data with a computational- and memory-effective way.

Open Semantic Framework 3.1 Released

Structured Dynamics is happy to announce the immediate availability of the Open Semantic Framework version 3.1. This new version includes a set of fixes to different components of the framework in the last few months. The biggest change is deployment of OSF using Virtuoso Open Source version 7.1.0. triple_120

We also created a new API for Clojure developers called: clj-osf. Finally we created a new Open Semantic Framework web portal that better describes the project and is hopefully easier to use and more modern.

Quick Introduction to the Open Semantic Framework

What is the Open Semantic Framework?

The Open Semantic Framework (OSF) is an integrated software stack using semantic technologies for knowledge management. It has a layered architecture that combines existing open source software with additional open source components. OSF is designed as an integrated content platform accessible via the Web, which provides needed knowledge management capabilities to enterprises. OSF is made available under the Apache 2 license.

OSF can integrate and manage all types of content – unstructured documents, semi-structured files, spreadsheets, and structured databases – using a variety of best-of-breed data indexing and management engines. All external content is converted to the canonical RDF data model, enabling common tools and methods for tagging and managing all content. Ontologies provide the schema and common vocabularies for integrating across diverse datasets. These capabilities can be layered over existing information assets for unprecedented levels of integration and connectivity. All information within OSF may be powerfully searched and faceted, with results datasets available for export in a variety of formats and as linked data.

A new Open Semantic Framework website

The OSF 3.1 release also triggered the creation of a new website for the project. We wanted something leaner and more modern and that is what I think we delivered. We also reworked the content, we wrote about a series of usecases 1 2 3 4 5 6 and we better aggregated and presented information for each web service endpoint.

A new OSF sandbox

We also created an OSF sandbox where people can test each web service endpoint and test how each functionality works. All of the web services are open to users. The sandbox is not meant to be stable considering that everybody have access to all endpoints. However, the sandbox server will be recreated on a periodic basis. If the sandbox is totally broken and users experiment issues, they can always request a re-creation of the server directly on the OSF mailing list.

Each of the web service pages on the new OSF portal has a Sandbox section where you see some code examples of how to use the endpoint and how to send requests to the sandbox. Here are the instructions to use the sandbox server.

A new OSF API for Clojure: clj-osf

The OSF release 3.1 also includes a new API for Clojure developers: clj-osf.

clj-osf is a Domain Specific Language (DSL) that should lower the threshold to use the Open Semantic Framework.

To use the DSL, you only have to configure your application to use a specific OSF endpoint. Here is an example of how to do this for the Sandbox server:

[cc lang=”lisp”]
[raw]
;; Define the OSF Sandbox credentials (or your own):
(require ‘[clj-osf.core :as osf])

(osf/defosf osf-test-endpoint {:protocol :http
:domain “sandbox.opensemanticframework.org”
:api-key “EDC33DA4D977CFDF7B90545565E07324”
:app-id “administer”})

(osf/defuser osf-test-user {:uri “http://sandbox.opensemanticframework.org/wsf/users/admin”})
[/raw]
[/cc]

Then you can send simple OSF web service queries. Here is an example that sends a search query to return records of type foaf:Person that also match the keyword “bob”:

[cc lang=”lisp”]
[raw]
(require ‘[clj-osf.search :as search])

(search/search
(search/query “bob”)
(search/type-filters [“http://xmlns.com/foaf/0.1/Person”]))
[/raw]
[/cc]

A complete set of clj-osf examples is available on the OSF wiki.

Finally the complete clj-osf DSL documentation is available here.

A community effort

This new release of the OSF Installer is another effort of the growing Open Semantic Framework community. The upgrade of the installer to deploy the OSF stack using Virtuoso Open Source version 7.1.0 has been created by William (Bill) Anderson.

Deploying a new OSF 3.1 Server

Using the OSF Installer

OSF 3.1 can easily be deployed on a Ubuntu 14.04 LTS server using the osf-installer application. It can easily be done by executing the following commands in your terminal:

[cc lang=”bash”]
[raw]
mkdir -p /usr/share/osf-installer/

cd /usr/share/osf-installer/

wget https://raw.github.com/structureddynamics/Open-Semantic-Framework-Installer/3.1/install.sh

chmod 755 install.sh

./install.sh

./osf-installer –install-osf -v
[/raw]
[/cc]

Using a Amazon AMI

If you are an Amazon AWS user, you also have access to a free AMI that you can use to create your own OSF instance. The full documentation for using the OSF AMI is available here.

Upgrading Existing Installations

Existing OSF installations can be upgraded using the OSF Installer. However, note that the upgrade won’t deploy Virtuoso Open Source 7.1.0 for you. All the code will be upgraded, but Virtuoso will remain the version you were last using on your instance. All the code of OSF 3.1 is compatible with previous versions of Virtuoso, but you won’t benefit the latest improvements to Virtuoso (in terms of performances) and its latest SPARQL 1.1 implementations. If you want to upgrade Virtuoso to version 7.1.0 on an existing OSF instance you will have to do this by hands.

To upgrade the OSF codebase, the first thing is to upgrade the installer itself:

[cc lang=”bash”]
[raw]
# Upgrade the OSF Installer
./usr/share/osf-installer/upgrade.sh
[/raw]
[/cc]

Then you can upgrade the components using the following commands:

[cc lang=”bash”]
[raw]
# Upgrade the OSF Web Services
./usr/share/osf-installer/osf –upgrade-osf-web-services=”3.1.0″

# Upgrade the OSF WS PHP API
./usr/share/osf-installer/osf –upgrade-osf-ws-php-api=”3.1.0″

# Upgrade the OSF Tests Suites
./usr/share/osf-installer/osf –upgrade-osf-tests-suites=”3.1.0″

# Upgrade the Datasets Management Tool
./usr/share/osf-installer/osf –upgrade-osf-datasets-management-tool=”3.1.0″

# Upgrade the Data Validator Tool
./usr/share/osf-installer/osf –upgrade-osf-data-validator-tool=”3.1.0″
[/raw]
[/cc]

clj-turtle: A Clojure Domain Specific Language (DSL) for RDF/Turtle

Some of my recent work leaded me to heavily use Clojure to develop all kind of new capabilities for Structured Dynamics. The ones that knows us, knows that every we do is related to RDF and OWL ontologies. All this work with Clojure is no exception.

Recently, while developing a Domain Specific Language (DSL) for using the Open Semantic Framework (OSF) web service endpoints, I did some research to try to find some kind of simple Clojure DSL that I could use to generate RDF data (in any well-known serialization). After some time, I figured out that no such a thing was currently existing in the Clojure ecosystem, so I choose to create my simple DSL for creating RDF data.

The primary goal of this new project was to have a DSL that users could use to created RDF data that could be feed to the OSF web services endpoints such as the CRUD: Create or CRUD: Update endpoints.

What I choose to do is to create a new project called clj-turtle that generates RDF/Turtle code from Clojure code. The Turtle code that is produced by this DSL is currently quite verbose. This means that all the URIs are extended, that the triple quotes are used and that the triples are fully described.

This new DSL is mean to be a really simple and easy way to create RDF data. It could even be used by non-Clojure coder to create RDF/Turtle compatible data using the DSL. New services could easily be created that takes the DSL code as input and output the RDF/Turtle code. That way, no Clojure environment would be required to use the DSL for generating RDF data.

Installation

For people used to Clojure and Leinengen, you can easily install clj-turtle using Linengen. The only thing you have to do is to add Add [clj-turtle "0.1.3"] as a dependency to your project.clj.

Then make sure that you downloaded this dependency by running the lein deps command.

API

The whole DSL is composed of simply six operators:

  • rdf/turtle
    • Used to generate RDF/Turtle serialized data from a set of triples defined by clj-turtle.
  • defns
    • Used to create/instantiate a new namespace that can be used to create the clj-turtle triples
  • rei
    • Used to reify a clj-turtle triple
  • iri
    • Used to refer to a URI where you provide the full URI as an input string
  • literal
    • Used to refer to a literal value
  • a
    • Used to specify the rdf:type of an entity being described

Usage

Working with namespaces

The core of this DSL is the defns operator. What this operator does is to give you the possibility to create the namespaces you want to use to describe your data. Every time you use a namespace, it will generate a URI reference in the triple(s) that will be serialized in Turtle.

However, it is not necessary to create a new namespace every time you want to serialize Turtle data. In some cases, you may not even know what the namespace is since you have the full URI in hands already. It is why there is the iri function that let you serialize a full URI without having to use a namespace.

Namespaces are just shorthand versions of full URIs that mean to make your code cleaner an easier to read and maintain.

Syntactic rules

Here are the general syntactic rules that you have to follow when creating triples in a (rdf) or (turtle) statement:

  1. Wrap all the code using the (rdf) or the (turtle) operator
  2. Every triple need to be explicit. This means that every time you want to create a new triple, you have to mention the subject, predicate and the object of the triple
  3. A fourth “reification” element can be added using the rei operator
  4. The first parameter of any function can be any kind of value: keywords, strings, integer, double, etc. They will be properly serialized as strings in Turtle.

Strings and keywords

As specified in the syntactic rules, at any time, you can use a string, a integer, a double a keyword or any other kind of value as input of the defined namespaces or the other API calls. You only have to use the way that is more convenient for you or that is the cleanest for your taste.

More about reification

Note: RDF reification is quite a different concept than Clojure’s reify macro. So carefully read this section to understand the meaning of the concept in this context.

In RDF, reifying a triple means that we want to add additional information about a specific triple. Let’s take this example:

[cc lang=”lisp”]
[raw]
(rdf
(foo :bar) (iron :prefLabel) (literal “Bar”))
[/raw]
[/cc]

In this example, we have a triple that specify the preferred label of the :bar entity. Now, let’s say that we want to add “meta” information about that specific triple, like the date when this triple got added to the system for example.

That additional information is considered the “fourth” element of a triple. It is defined like this:

[cc lang=”lisp”][raw]
(rdf
(foo :bar) (iron :prefLabel) (literal “Bar”) (rei
(foo :date) (literal “2014-10-25” :type :xsd:dateTime)))
[/raw][/cc]

What we do here is to specify additional information regarding the triple itself. In this case, it is the date when the triple got added into our system.

So, reification statements are really “meta information” about triples. Also not that reification statements doesn’t change the semantic of the description of the entities.

Examples

Here is a list of examples of how this DSL can be used to generate RDF/Turtle data:

Create a new namespace

The first thing we have to do is define the namespaces we will want to use in our code.

[cc lang=”lisp”][raw]
(defns iron “http://purl.org/ontology/iron#”)
(defns foo “http://purl.org/ontology/foo#”)
(defns owl “http://www.w3.org/2002/07/owl#”)
(defns rdf “http://www.w3.org/1999/02/22-rdf-syntax-ns#”)
(defns xsd “http://www.w3.org/2001/XMLSchema#”)
[/raw][/cc]

Create a simple triple

The simplest example is to create a simple triple. What this triple does is to define the preferred label of a :bar entity:

[cc lang=”lisp”][raw]
(rdf
(foo :bar) (iron :prefLabel) (literal “Bar”))
[/raw][/cc]

Output:

[cce]<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Bar””” .[/cce]

Create a series of triples

This example shows how we can describe more than one attribute for our bar entity:

[cc lang=”lisp”][raw]
(rdf
(foo :bar) (a) (owl :Thing)
(foo :bar) (iron :prefLabel) (literal “Bar”)
(foo :bar) (iron :altLabel) (literal “Foo”))
[/raw][/cc]

Output:

[cce]<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Bar””” .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#altLabel> “””Foo””” .
[/cce]

Note: we prefer having one triple per line. However, it is possible to have all the triples in one line, but this will produce less readable code:

Just use keywords

It is possible to use keywords everywhere, even in (literals)

[cc lang=”lisp”][raw]
(rdf
(foo :bar) (a) (owl :Thing)
(foo :bar) (iron :prefLabel) (literal :Bar)
(foo :bar) (iron :altLabel) (literal :Foo))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#> <http://www.w3.org/2002/07/owl#Thing> .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Bar””” .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#altLabel> “””Foo””” .
[/cce]

Just use strings

It is possible to use strings everywhere, even in namespaces:

[cc lang=”lisp”][raw]
(rdf
(foo “bar”) (a) (owl “Thing”)
(foo “bar”) (iron :prefLabel) (literal “Bar”)
(foo “bar”) (iron :altLabel) (literal “Foo”))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#> <http://www.w3.org/2002/07/owl#Thing> .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Bar””” .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#altLabel> “””Foo””” .
[/cce]

Specifying a datatype in a literal

It is possible to specify a datatype for every (literal) you are defining. You only have to use the :type option and to specify a XSD datatype as value:

[cc lang=”lisp”][raw]
(rdf
(foo “bar”) (foo :probability) (literal 0.03 :type :xsd:double))
[/raw][/cc]

Equivalent codes are:

[cc lang=”lisp”][raw]
(rdf
(foo “bar”) (foo :probability) (literal 0.03 :type (xsd :double)))
(rdf
(foo “bar”) (foo :probability) (literal 0.03 :type (iri “http://www.w3.org/2001/XMLSchema#double”)))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/foo#probability> “””0.03″””^^xsd:double .
[/cce]

Specifying a language for a literal

It is possible to specify a language string using the :lang option. The language tag should be a compatible ISO 639-1 language tag.

[cc lang=”lisp”][raw]
(rdf
(foo “bar”) (iron :prefLabel) (literal “Robert” :lang :fr))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Robert”””@fr .
[/cce]

Defining a type using the an operator

It is possible to use the (a) predicate as a shortcut to define the rdf:type of an entity:

[cc lang=”lisp”][raw]
(rdf
(foo “bar”) (a) (owl “Thing”))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#> <http://www.w3.org/2002/07/owl#Thing> .
[/cce]

This is a shortcut for:

[cc lang=”lisp”][raw]
(rdf
(foo “bar”) (rdf :type) (owl “Thing”))
[/raw][/cc]

Specifying a full URI using the iri operator

It is possible to define a subject, a predicate or an object using the (iri) operator if you want to defined them using the full URI of the entity:

[cc lang=”lisp”][raw]
(rdf
(iri “http://purl.org/ontology/foo#bar”) (iri “http://www.w3.org/1999/02/22-rdf-syntax-ns#type) (iri http://www.w3.org/2002/07/owl#Type))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Type> .
[/cce]

Simple reification

It is possible to reify any triple suing the (rei) operator as the fourth element of a triple:

[cc lang=”lisp”][raw]
(rdf
(foo :bar) (iron :prefLabel) (literal “Bar”) (rei
(foo :date) (literal “2014-10-25” :type :xsd:dateTime)))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Bar””” .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://purl.org/ontology/foo#bar> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://purl.org/ontology/iron#prefLabel> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> “””Bar””” .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://purl.org/ontology/foo#date> “””2014-10-25″””^^xsd:dateTime .
[/cce]

Reify multiple properties

It is possible to add multiple reification statements:

[cc lang=”lisp”][raw]
(rdf
(foo :bar) (iron :prefLabel) (literal “Bar”) (rei
(foo :date) (literal “2014-10-25” :type :xsd:dateTime)
(foo :bar) (literal 0.37)))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Bar””” .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://purl.org/ontology/foo#bar> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://purl.org/ontology/iron#prefLabel> .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> “””Bar””” .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://purl.org/ontology/foo#date> “””2014-10-25″””^^xsd:dateTime .
<rei:6930a1f93513367e174886cb7f7f74b7> <http://purl.org/ontology/foo#bar> “””0.37″”” .
[/cce]

Using clj-turtle with clj-osf

clj-turtle is meant to be used in Clojure code to simplify the creation of RDF data. Here is an example of how clj-turtle can be used to generate RDF data to feed to the OSF Crud: Create web service endpoint via the clj-osf DSL:

[cc lang=”lisp”][raw]
[require ‘[clj-osf.crud :as crud])

(crud/create
(crud/dataset “http://test.com/datasets/foo”)
(crud/document
(rdf
(iri link) (a) (bibo :Article)
(iri link) (iron :prefLabel) (literal “Some article”)))
(crud/is-rdf-n3)
(crud/full-indexation-mode))
[/raw][/cc]

Using the turtle alias operator

Depending on your taste, it is possible to use the (turtle) operator instead of the (rdf) one to generate the RDF/Turtle code:

[cc lang=”lisp”][raw]
(turtle
(foo “bar”) (iron :prefLabel) (literal “Robert” :lang :fr))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Robert”””@fr .
[/cce]

Merging clj-turtle

Depending the work you have to do in your Clojure application, you may have to generate the Turtle data using a more complex flow of operations. However, this is not an issue for clj-turtle since the only thing you have to do is to concatenate the triples you are creating. You can do so using a simple call to the str function, or you can create more complex processing using loopings, mappings, etc that end up with a (apply str) to generate the final Turtle string.

[cc lang=”lisp”][raw]
(str
(rdf
(foo “bar”) (a) (owl “Thing”))
(rdf
(foo “bar”) (iron :prefLabel) (literal “Bar”)
(foo “bar”) (iron :altLabel) (literal “Foo”)))
[/raw][/cc]

Output:

[cce]
<http://purl.org/ontology/foo#bar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#> <http://www.w3.org/2002/07/owl#Thing> .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#prefLabel> “””Bar””” .
<http://purl.org/ontology/foo#bar> <http://purl.org/ontology/iron#altLabel> “””Foo””” .
[/cce]

Conclusion

As you can see now, this is a really simple DSL for generating RDF/Turtle code. Even if simple, I find it quite effective by its simplicity. However, even if it quite simple and has a minimum number of operators, this is flexible enough to be able to describe any kind of RDF data. Also, thanks to Clojure, it is also possible to write all kind of code that would generate DSL code that would be executed to generate the RDF data. For example, we can easily create some code that iterates a collection to produce one triple per item of the collection like this:

[cc lang=”lisp”][raw]
(->> {:label “a”
:label “b”
:label “c”}
(map
(fn [label]
(rdf
(foo :bar) (iron :prefLabel) (literal label))))
(apply str))
[/raw][/cc]

That code would generate 3 triples (or more if the input collection is bigger). Starting with this simple example, we can see how much more complex processes can leverage clj-turtle for generating RDF data.

A future enhancement to this DSL would be to add a syntactic rule that gives the opportunity to the user to only have to specify the suject of a triple the first time it is introduced to mimic the semi-colon of the Turtle syntax.