Literate Programming, Programming, Clojure

Literate [Clojure] Programming: Anatomy of a Org-mode file

This blog post is the second of a series of blog posts about Literate [Clojure] Programming where I explain how I develop my [Clojure] applications using literate programming concepts and principles. In the previous blog post I outlined a project’s structure. In this blog post I will demonstrate how I normally structure an Org-mode file to discuss the problem I am trying to solve, to code it and to test it.

One of the benefits of Literate Programming is that the tools that implement its concepts (in this case Org-mode) give to the developer the possibility to write its code in the order (normally more human friendly) he wants. This is one of the aspects I will cover in this article.

If you want to look at a really simple [Clojure] literate application I created for my Creating And Running Unit Tests Directly In Source Files With Org-mode blog post, take a look at the org-mode-clj-tests-utils (for the rendered version). It should give you a good example of what a literate file that follows the structure discussed here looks like.

This blog post belong to a series of posts about Literate [Clojure] Programming:

  1. Configuring Emacs for Org-mode
  2. Project folder structure
  3. Anatomy of a Org-mode file (this post)
  4. Tangling all project files
  5. Publishing documentation in multiple formats
  6. Unit Testing

Structure

A literate programming file can have any kind of structure. Depending on the task at hand, it can take the form of a laboratory notebook or a software documentation file. The structure I will explain here is the structure I use to develop normal applications using the Clojure programming language. In other blog posts I will explain other styles, but I will stick to that one for now.

The usual structure of a literate programming file is composed of the following sections:

  1. introduction
  2. main section
    1. sub-section
      1. introduction
      2. code/explanation/…/code/explanation
      3. unit tests
    2. sub-section
  3. complete namespace definition
    1. unit tests

Each of the sub-sections has the same outline, but multiple levels of sub-sections can be created depending on the needs. Every code block is uniquely named (identified) and belongs to a section or a subsection. The portion of that outline that lets you write your application in the order more friendly to a compiler is the complete namespace definition section which is where we “reconstruct” the code to be tangled (written in a standard source code file).

Introduction

Every file starts with a title and a description of the problem you are trying to solve and an overview of how you are trying to solve it. If required, subsections can always be added to the introduction to properly describe the problem and the solution to that problem. In any case, no code blocks are defined in the introduction; only text, images, tables of data or anything else that helps define a problem and its solution are included.

Note that the title of the file is defined using the #+TITLE: Org-mode markup.

Main & Sub Sections

The main and sub-sections have the same outline. They only differ in the level of the details. You could have a series of main sections without any sub-sections in them. Or you could have a single main section with multiple levels of sub-sections. This split really depends on how you want to formulate the solution to the problem exposed in the introduction.

A section should define a portion of the solution you are developing to fix the problem. The scope of the section is only defined by the developer and it depends on how things are being solved. A more complex solution may require more refined solutions which would require subsections (or multiple levels of them).

In any case, for each of these sections, I almost always define the following portions:

  1. introduction
  2. code/explanation/…/code/explanation
  3. unit tests

For each section, I try to introduce the portion of the solution with some text, images or data tables. Then I start to code my application by adding the code of my application in code blocks intertwined with some text iteratively until that portion of the overall solution is completed. Then I define a third section where I create some unit tests where I iteratively test the functions I created in the section. The unit tests are also used to document how the API can be used by acting as usage examples.

It is also possible that you may want to define a section in your file, but that you don’t want to weave that section in the resulting documentation. This can easily be done by adding the :noexport: markup at the end of a section title.

Code Blocks

Each code block should be named with a unique name across all org-mode files of your project. A name is defined using the #NAME: markup before a code block starts. The name is quite important since it helps understand the flow of your application. It should be written as a short description of what the code block does.

Code blocks in Org-mode have numerous options. However we will only cover the few key ones here. Note that most of the other options will be defined in the Complete Namespace Definition section of the file. This is where we will reference the name of the code block and where we will order the code to tangle from this literate file.

One of the key options of a code block is the :results option which can have one of the following values: silent, value or output. Depending on what you want to output in the literate document, you can display the value returned by the code block, the output of the code processed in the code block, or you can make it silent. The value or the output of a code block’s execution will appear in an EXAMPLE block underneath the code block.

Another important header option of the code block is :export which is used to tell Org-mode how to weave the code block and its results. It has 4 options: code (default), which only exports the code box, results which only exports the results box, both which export both and none which exports nothing when weaving a document.

As I said, many other header options exist like the possibility to use the result of a code block to assign to a variable that can be passed to another block in your literate file which lets you create workflows within your literate files. However I won’t cover these options here since they are more used in a laboratory notebook style.

Unit Tests Blocks

At the end of each section, I usually define a Unit tests section which is where I define different unit tests for the code defined in a section or one of its sub-sections. These tests are defined in a named code block. They are used to unit test the functions I created in a document and they are also used as API usage examples. Each of the unit tests blocks is aggregated in a test suites in the Unit Tests sub-section of the Complete Namespace Definition section (see below).

These unit tests are executed directly into the Org-mode file while I am developing them. This means that any issues will be caught right away and fixed in the code of that section. Also, if the code is updated in the future, these unit tests will be re-executed right away and any issues will be output directly in the Org-mode file without having to switch to any other testing facilities.

Complete Namespace Definition

A the end of each literate programming file, I do create a Complete Namespace Definition section where I outline how the tangled code will be ordered in the generated source file. Generally we don’t want to export this section into the weaved document, so I define it with the :noexport: markup at the end of the section name.

This section is where I define the header of my source files (usually the namespace declaration, import statements and such), where I order code to be tangled and where we define the code block header parameters related to tangling the code into the source code files.

It is in this section that you will understand why it is important to spend some time properly naming your code blocks in your file: since it is these names that will appear in this section that will make the outline of our code understandable.

There are 3 header parameters that I normally use for that code block:

  1. :tangle ../../../
  2. :mkdirp yes
  3. :noweb yes
  4. :results silent

First we don’t want to output anything in the Org-mode file after executing the code block, so we put :results to silent. Then we want to use the WEB markup in the code block, so we define the parameter :noweb to yes. Then if one of the folders specified by :tangle is not existing, we want Org-mode to create it for us instead of failing. Finally, the :tangle parameter is defined with the path where the tangled document will be written to the file system. The location of the source file will comply with the structure of your application.

The code block of the Unit tests sub section will be tangled in the unit test folder of your project.

Structure Navigation & Conclusion

One of the benefits of writing application using Literate Programming principles is that what we end-up doing is to create a much more human readable outline of an application. We end-up creating sections, subsections, etc. just like when you write an article, or a book, where you create sections and subsections where each of the sections focus on the thing your are writing about. To me, this is a much more natural work style to solve a problem. It is also much easier to share with non-developers who need to understand how your applications behave. To these people, it is like reading a scientific article grounded into a mathematic framework: if you are not a mathematician (and even if you are but are not familiar with the concepts discussed in the article) you will most likely (at least for a first read) read the article, understand its structure and idea, but you will skip the boxes where you have the equations. Here the same minding applies, it is just that non-developers will skip the boxes were there is the code. But in any case, he should be able to understand what you are doing, the problem you are trying to solve, and how you are trying to solve it.

This is why the structure that gets created when developing applications is quite interesting and beneficial. This structure is what I really like with Org-mode, which at its core is nothing other than a plain text outliner. This means that Org-mode has several features to help you manipulate and navigate the outline structure of a text file. Like conventional programming with an IDE where you can extend and collapse blocks of code, with Org-mode you can extend and collapse the outline of the document (created by the sections and subsections of your files).

This is quite powerful since you can focus on a series of functions that solve a particular problem just by extending its section and collapsing all others. You can even display only the content of that section into the Emacs buffer by using C-x n s to focus on a Org-mode region and C-x n w to unfocus that region. This means that even if you have a single file with several thousand of lines it doesn’t really matter since you can see any section of that file like if it was its own tiny file. This may be appealing to developers that don’t like a proliferation of files in their projects (in fact, they could end up with a single master well structured Org-mode file that gets tangled as multiple source code files).

2 thoughts on “Literate [Clojure] Programming: Anatomy of a Org-mode file

Leave a Reply to alipiodigital Cancel reply