Literate Programming is a great way to write computer software, particularly in fields like data science where data processing workflows are complex and often need a lot of background information. I started to write about Literate Programming a few months ago, and now it is time to formalize how I create Literate Programming applications.
This is the first post of a series of blog posts that will cover the full workflow. I will demonstrate how I do Literate Programming for developing a Clojure application, but exactly the same workflow would work for any other programming language supported by Org-mode (Python, R, etc.). The only thing that is required is to adapt the principles to the project structures in these other languages. The series of blog posts will cover:
- Project folder structure (this post)
- Anatomy of an Org-mode file
- Tangling all project files
- Publishing documentation in multiple formats
- Unit Testing
Clojure Project Folder Structure
The structure of a programming project can vary a lot. The structure I use when developing in Clojure is the one created by Leiningen, which I use for creating and managing my Clojure projects. The structure of a simple project (in this case, the org-mode-clj-tests-utils project that I created for another blog post) looks like this:
- CHANGELOG.md
- LICENCE
- README.md
- resources
- pom.xml
- project.clj
- src
  - org_mode_clj_tests_utils
    - core.clj
- target
- test
  - org_mode_clj_tests_utils
    - core_test.clj
There are 4 main components to this structure:
- The project.clj file, which is used by Leiningen to configure the project
- The src folder, where the project’s code files [to be compiled] are located
- The target folder, where the compiled files will be available, and
- The test folder, where the unit tests for the source code are located
This kind of project outline is really simple and typical. Now let’s see what the structure would look like if this project were created using Literate [Clojure] Programming.
Literate Clojure Folder Structure
The best and cleanest way I have found to create and manage the Org-mode files is to create an org directory at the same level as the src one, and then to replicate in it the same folder structure that exists in the src folder. The names of the source files should be the same, except that they have the .org file extension. For example, the src/core.clj file would become org/core.org in the Org-mode folder, and the org/core.org file is used to tangle (create) that source file.
The new structure would look like this:
- CHANGELOG.md
- LICENCE
- README.md
- resources
- org
  - project.org
  - org_mode_clj_tests_utils
    - core.org
- pom.xml
- project.clj
- src
  - org_mode_clj_tests_utils
    - core.clj
- target
- test
  - org_mode_clj_tests_utils
    - core_test.clj
The idea here is that every project file that needs to be modified becomes an Org-mode file. Such files are the source code files, the test files, possibly other documentation files, and the project.clj file. When the Org-mode files are tangled, all the files required by the Clojure project are generated.
Anything I write for this project comes from an Org-mode file. All the development occurs in Org-mode. Anyone who wants to modify such a Literate Clojure application has to modify the Org-mode file and not the source files; otherwise the changes would be overwritten by the next tangling operation.
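As a rough illustration of how an Org-mode file relates to the source file it generates, here is a minimal sketch of what org/org_mode_clj_tests_utils/core.org could contain. It assumes Org-mode's standard :tangle and :mkdirp header arguments; the heading, the prose and the namespace form are illustrative placeholders, not the actual content of the project.

```org
# Hypothetical content for org/org_mode_clj_tests_utils/core.org
# Assumption: the :tangle path is resolved relative to this Org file's directory.

* Namespace

The prose that documents the code lives here, next to the code itself.

#+BEGIN_SRC clojure :tangle ../../src/org_mode_clj_tests_utils/core.clj :mkdirp yes
(ns org-mode-clj-tests-utils.core)
#+END_SRC
```

Tangling this file (for example with C-c C-v t, the default binding of org-babel-tangle) would write the block to src/org_mode_clj_tests_utils/core.clj, which is why manual edits to that source file would be lost on the next tangle.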
Utilities Org-mode Files
Finally, I created a series of Org-mode files that are used to perform special tasks such as:
- Tangling all project files at once, and
- Publishing documentation in multiple formats
These are Org-mode files that can be executed to perform these tasks. The file that tangles all project files at once is needed if you haven’t changed the behavior of your Emacs to automatically tangle files on save.
The second file publishes the woven documentation in multiple formats (HTML, LaTeX, etc.) as required, all at once.
These two files are located directly in the /org/ folder. I will explain how they work in a subsequent post in this series. The final structure of a Literate Clojure project is:
- CHANGELOG.md
- LICENCE
- README.md
- resources
- org
  - project.org
  - publish.org
  - tangle-all.org
  - setup.org
  - org_mode_clj_tests_utils
    - core.org
- pom.xml
- project.clj
- src
  - org_mode_clj_tests_utils
    - core.clj
- target
- test
  - org_mode_clj_tests_utils
    - core_test.clj
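To give an idea of what an executable utility file can look like, here is a minimal sketch of a block that tangles every Org file of the project at once. It is not the actual content of tangle-all.org, which is covered in a later post; it only assumes the built-in org-babel-tangle-file and directory-files-recursively functions (Emacs 25 or later), and that the file sits in the org folder.

```org
# Hypothetical sketch, not the real tangle-all.org.

* Tangle all project files

#+BEGIN_SRC emacs-lisp :results silent
(require 'ob-tangle)
;; Assumption: this block is executed from a file located in org/, so
;; default-directory is the org/ folder.
;; The regexp skips file names starting with "." (hidden and lock files).
(dolist (file (directory-files-recursively default-directory "\\`[^.].*\\.org\\'"))
  (org-babel-tangle-file file))
#+END_SRC
```

Executing the block with C-c C-c would tangle each Org file in turn; files without any :tangle target simply produce nothing.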
As you can see, a Literate Clojure application is not much different. The way to program such an application is more profound than the small changes that occur at the level of the folder structure.
There is still an open question related to publishing this kind of Literate work in repositories such as Git: should only the org folder be added to a Git repository, or should we also add the files that get tangled? In an ideal world, only the org files would need to go into the repository. However, depending on the nature of the work (work only accessible by you, work accessible by a group of people who know Org-mode, a project made public on GitHub, etc.) we may have to commit the tangled files too. In the case of an open source project, I think it is required, since many people unfamiliar with Org-mode won’t be able to use the codebase because they won’t be able to tangle it from the Org files. For this specific reason, I tend to publish the org files along with all the files that get tangled from them. That way I am sure that even if the users of the library don’t know anything about Org-mode or Literate Programming, they can still use the code. The only thing I try to take care of is to commit the Org file and the tangled file related to a specific change in the same commit, rather than creating two commits, one for each file.
The next blog post of this series will explain how the Org-mode source files are actually created, what their internal structure is, and how they are organized and used.