OrgWeb: CLI Org-Mode Environment for WEB-like development without Emacs

There is a wide range of tools and frameworks currently available for literate programming. You have the ancestors like CWEB, NOWEB and nuweb. You have full editors like Leo. And then you have more modern approaches like nbdev, PyWebTool and FSharp.Formatting.

However, most of them are specific to a programming language. Some of them, like NOWEB, are general, but they lack integration with modern IDE environments.

For the last 8 years, I have always fallen back to the same tool: Org-Mode.


Org-mode is many things, but to me its most interesting feature has always been its code blocks. Org’s syntax is clean and powerful. Org is not specific to a particular programming language: it supports dozens of programming languages and other kinds of configuration and scripting languages. Code blocks can be executed, tangled or weaved.

Its drawback: the best (and frankly only) Org-mode implementation is in Emacs. Some, myself included, will say it is great because we love working with Emacs and are happily willing to pay the cost. But we are the exception, not the norm. Emacs is wonderfully different and it doesn’t appeal to all developers. I can understand that in today’s industry, where the only incentive is to ship, ship, ship features.

But how could we get the best of Org-mode without forcing people to use Emacs? One possibility could be to develop Org-mode plugins for other IDEs; the first on the list would most likely be VS Code. But this is not a small undertaking.

Org-mode CLI

There are some modules in other editors that support Org-mode, such as for Vim and VS Code. But those are mostly syntax highlighters, or implement features mostly related to org-agenda and heading manipulation. It is a good start, but far from enough for a literate programming framework.

The goal of OrgWeb is to develop a simple tool that any developer could use to leverage the full power of doing literate programming using Org-mode and their preferred IDE. 


OrgWeb is a simple CLI tool that can be installed using this command:

pip install orgweb

The tool only has four commands:

  1. tangle: extract code from code blocks into their source files
  2. detangle: sync source files back to their original Org-mode code blocks
  3. execute: execute code blocks such that they produce their side effects
  4. monitor: monitor local file system to tangle/detangle files automatically

The tangle, detangle and execute commands can be performed on a folder (recursively) or one or multiple specific files.

Note: I am not covering all the details of how we can use Org-mode to do literate programming. You can search my blog, which has plenty of posts about that, and also refer to the Org-mode documentation to read about each and every feature available to you.

In addition to the orgweb CLI, you will need Docker available in your environment. If it is not already installed, you can follow those instructions to install it on your system.

VS Code + Org-mode

For this blog post, I will cover how Org-mode can be used in conjunction with VS Code to develop an application using literate programming. To start, you can simply clone OrgWeb’s repository and install this Org-mode module in VS Code. The general development layout is:

The bottom window is where we have the terminal instances; this is where OrgWeb commands are run. The main edit window is where the Org files, or the tangled source files, are manipulated.

You can notice that the Org-mode VS Code module does some basic syntax highlighting, even within the code blocks, using Python’s syntax highlighter. This is more than enough to easily understand and follow the Org files.


Tangling is the action of extracting code blocks from a literate file into its executable source code file.
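To make the idea concrete, here is a minimal sketch of what tangling does, written in plain Python. It is not how OrgWeb works (OrgWeb delegates to Emacs), and it only handles blocks that carry an explicit `:tangle <file>` header argument; `:tangle yes`, property drawers and noweb references are ignored. The `tangle()` function and the sample Org text are illustrative assumptions.

```python
import re
from collections import defaultdict

# Matches "#+begin_src <lang> <header args>" up to the next "#+end_src".
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src[ \t]+(?P<lang>\S+)(?P<args>[^\n]*)\n"
    r"(?P<body>.*?)"
    r"^[ \t]*#\+end_src[ \t]*$",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)
TANGLE_ARG = re.compile(r":tangle[ \t]+(\S+)")


def tangle(org_text: str) -> dict[str, str]:
    """Group the bodies of tangled code blocks by their target file."""
    files = defaultdict(list)
    for match in SRC_BLOCK.finditer(org_text):
        target = TANGLE_ARG.search(match.group("args"))
        # Blocks without a :tangle target (or with ":tangle no") stay in the document only.
        if target is None or target.group(1) == "no":
            continue
        files[target.group(1)].append(match.group("body"))
    return {path: "\n".join(bodies) for path, bodies in files.items()}


example = """\
* Greeting
The program defines a greeting.

#+begin_src python :tangle hello.py
def greet(name):
    return f"Hello, {name}!"
#+end_src

Then it calls the function.

#+begin_src python :tangle hello.py
print(greet("Org"))
#+end_src

#+begin_src sh
echo "not tangled"
#+end_src
"""

sources = tangle(example)
print(sources["hello.py"])
```

The two Python blocks end up concatenated, in document order, in a single hello.py source, while the shell block without a target is left out.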

Once ready to tangle the Org file, this command will tangle that specific file:

orgweb tangle . --file

It asks OrgWeb to tangle the current directory . but to only tangle the file. It will find all the Org files recursively, and tangle only the ones specified. If no files are specified, it will tangle all the Org files it finds. Then the file will be generated from all the code blocks from


Developers will often end up working on the source files that have been generated from Org files. There are all kinds of reasons for that, such as modifying a source file while debugging an application. When this happens, the literate Org files and the source files get desynchronized. Changes could be copied and pasted into the Org files, but there is a much easier way to do it: detangling.

Detangling synchronizes any tangled code blocks from source files back to their original Org file:

orgweb detangle . --file

It asks OrgWeb to detangle the current directory . but to only detangle the file. It will find all the Org files recursively, and detangle only the ones specified. Then the file will be generated from all the code blocks from


The execute command is like the tangle command, but instead of moving code into source files, it executes the code blocks that need to be executed. Code blocks that get executed produce side effects. It is those side effects that we want to force with the execute command.

One example is the PlantUML code blocks in the file. When we execute them, the schema images are generated.


This is all good, but it is still inconvenient to have to run commands in the terminal every time you want to tangle or detangle some files.

This is why the monitor command exists:

The orgweb monitor . command will keep monitoring the specified folder. Every file that changes within that folder (recursively) will potentially be tangled or detangled by the running orgweb instance. If a .orgwebignore file exists in the target folder, then everything listed in that file will be ignored by the monitoring process.

In the envisioned development workflow, developers will simply run the monitoring in the background such that every Org and source file automatically gets tangled and detangled every time it is saved. That way, developers will be sure that both files are always in sync.
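The decision the monitor has to make for each changed file can be sketched as follows. This is only an illustration under assumptions: the post does not document the .orgwebignore format, so I assume one glob pattern per line with `#` comments, and I assume that every non-ignored source file is a detangle candidate. The function names are hypothetical and are not part of OrgWeb.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Optional


def load_ignore_patterns(ignore_text: str) -> list[str]:
    """Parse ignore-file content: one glob per line, '#' starts a comment (assumed format)."""
    patterns = []
    for line in ignore_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def action_for(path: str, patterns: list[str]) -> Optional[str]:
    """Return 'tangle', 'detangle', or None (ignored) for a changed file."""
    if any(fnmatchcase(path, pattern) for pattern in patterns):
        return None
    # A saved Org file is tangled; any other saved file is synced back to its Org file.
    if PurePosixPath(path).suffix == ".org":
        return "tangle"
    return "detangle"


patterns = load_ignore_patterns("# generated files\nbuild/*\n\n*.tmp\n")
for changed in ["docs/design.org", "src/app.py", "build/out.py"]:
    print(changed, "->", action_for(changed, patterns))
```

A real monitor would wrap this in a file-system watcher and only detangle files that were actually generated from an Org file.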

How does it work?

As we know, orgweb is designed in a way that developers can use all the power of Org-mode, in any IDE they like, without having to rely on Emacs directly.

To do that, it leverages Docker to build an image where Emacs is properly installed and configured to implement the commands that are exposed via the command line tool.

If the orgweb Docker image does not currently exist in the environment, then orgweb will request Docker to build the image using the Dockerfile. The build process will install and configure all the components required to implement all orgweb commands.

orgweb checks whether the image exists every time it is invoked from the command line. The build will happen any time the image is not available in the environment.

If the image exists in the environment, then the following will happen.

orgweb will ask Docker to create a container based on the image. Once the container is running, it will execute a command on the container’s terminal to run Emacs. Emacs is used directly from the command line by evaluating ELisp code it gets as input.

Every time an orgweb command is executed, a new container is created, and when the command finishes, the container gets deleted.
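As a rough sketch of that flow, the function below builds (without running) the kind of one-shot Docker command that could tangle a single file. The image name `orgweb`, the `/work` mount point and the exact ELisp are assumptions made for illustration; they are not taken from OrgWeb’s source. The pieces it relies on are standard: `docker run --rm` removes the container on exit, `emacs --batch --eval` evaluates ELisp without a UI, and `org-babel-tangle-file` is Org’s tangling function.

```python
import shlex


def build_docker_command(org_file: str, workdir: str, image: str = "orgweb") -> list[str]:
    """Build a one-shot container command that tangles one Org file."""
    # ELisp evaluated by Emacs in batch mode: load the tangle library, then tangle the file.
    elisp = f"(progn (require 'ob-tangle) (org-babel-tangle-file \"{org_file}\"))"
    return [
        "docker", "run",
        "--rm",                     # delete the container when the command finishes
        "-v", f"{workdir}:/work",   # expose the project folder inside the container
        "-w", "/work",              # run Emacs from the project folder
        image,
        "emacs", "--batch", "--eval", elisp,
    ]


command = build_docker_command("notes.org", "/home/me/project")
print(shlex.join(command))
```

Passing the list to subprocess.run() would start the container; it is only printed here so the sketch runs without Docker.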

Other possible avenues

OrgWeb works fine, but it won’t ever be as interesting as a proper IDE integration, like what is available in Emacs. Another interesting option worth investigating would be to use Emacs as an Org-mode backend for an LSP server. That way, developers of IDE modules could more easily develop fully fledged Org-mode modules for specific IDEs, and we could “easily” get the full Org-mode power within any IDE, leveraging not only code blocks but also org-agenda, org-roam, tagging, date time, org-capture, etc.

Literate Programming at the dawn of LLMs

Since the beginning of the year, the industry’s main focus seems to revolve around “prompting.” We’ve seen the emergence of new job titles, new job descriptions, and even the introduction of “prompting wizards,” all of which are essentially part of branding and marketing strategies.

Prompting involves articulating a problem and providing clear instructions in the hope that the person or system reading it will produce the intended outcome. The recent shift lies in the recipient of these instructions: rather than a person taking action to solve the problem and follow the instructions, it’s now a thing (currently some form of AI model) that carries out the task.

What I find amusing, after 20 years of professional experience in software development and engineering management, is that we’re finally getting engineers to generate a substantial amount of text instead of solely focusing on writing code. This appears to signify quite a significant paradigm shift to me.

Prompting and Literate Programming

I recently had something of an epiphany while investigating the current state of Literate Programming: could Literate Programming not become a powerful software development paradigm with the advent of LLMs?

I mean, for 39 years, literate programmers have been essentially doing just this: “prompting” their software development. They have been describing their problems and outlining instructions before implementing the actual code, often in the format of a book or notebook. The only difference is that they were the ones doing 100% of the coding afterward (either themselves or with the help of an implementation development team).

Intuitively, it seems that this same format and these same skills are precisely what’s needed to best leverage LLMs in coding computer software. LLMs will undoubtedly become very effective tools, but they are just that: tools that need to be learned, experimented with, and mastered to extract the best results from them.

GitHub’s Copilot

In this blog post, I aim to explore how literate programming can influence and enhance the utilization of LLMs. The current leading LLM tool for software developers is undoubtedly GitHub’s Copilot, integrated into VS Code. It boasts three main features:

  1. Code completion
  2. Completions Panel (providing up to 10 distinct auto-completion suggestions)
  3. Chat (recently made available to the general public)

With all of these capabilities integrated into an IDE like VS Code, it forms a package that significantly accelerates the software development process.

The next question arises: will Copilot grasp, and potentially benefit from, the literate programming process in the suggestions it provides? This is what I’m aiming to explore – to observe how it reacts, what proves effective, and what may not.

To put it to the test, I’ve developed a straightforward command-line tool in Python designed to function as a basic calculator. The remainder of this post comprises a series of screenshots accompanied by my comments at each step.


Before diving in, I still needed to create a new GitHub project, use nbdev_new to create a new nbdev project, and then configure it.

Before starting to develop the CLI tool, I wanted to see if GitHub Copilot was self-aware of its own capabilities:

It’s hard to discern from this interaction whether it’s generating content or not, but at the very least, it seems promising. Let’s see if we can further explore this level of contextual awareness.

The initial step I took was to compose the introduction for the tool, right here in this Jupyter notebook. It outlines the purpose of the tool and the extensive list of calculator operations we aim to implement. I obtained the imports from the prior interaction with Chat. I manually added typer as this is the library I intend to use for building the command-line utility.

Following that, I proceeded to discuss creating a Typer application and its functionalities, etc. In the subsequent code block, I deliberately refrained from writing anything, as I didn’t want Copilot to auto-generate code within this block. I was interested in evaluating if it had an understanding of the entire notebook’s context, not just within a specific code block. This is why I opened the Suggestions Panel to assess if it would suggest anything relevant given the current context.

What I received was particularly interesting, as the initial suggestion aligns perfectly with the next step. It overlooks the #| export nbdev instruction, but that’s perfectly acceptable, as it’s rather obscure.

Next, I began detailing the subsequent steps by creating a new Markdown cell. At this point, Copilot’s auto-completion capabilities come into play. This is particularly interesting, as it essentially anticipates what I was about to write, drawing from the extensive list of calculator commands I plan to implement. In this case, it starts with the first command on that list, which is addition. This suggests to me that it leverages the entire notebook as the context for its suggestions.

For context, here is the full list of operations we want to implement:

However, this was actually not the first command we wanted to implement. The first one we wanted to implement is the version of the command line tool that we display to the users if they ask for it.

Then the next step is to start implementing the long list of calculator operations, starting with addition:

Why was the quiet parameter suggested? To dig a bit further into its thought process, I decided to open the Completions Panel. Suggestion 3 sheds light on what it had in mind. However, for a basic calculator, this isn’t very useful since the outcome of adding two numbers is quite straightforward. I’ll go ahead and accept .

Now, let’s compile this command-line application to ensure it functions as intended:

By blindly accepting the code proposed by Copilot, here is how the add command works:

Let’s see if it works as intended:

Yes, it does. It’s not the most convenient method for adding two numbers; it’s a bit complex and verbose, but it will suffice for now.

Afterwards, I added the entire list of operators in the same manner, by appending code block after code block, and it successfully implemented each of them. There was a point around number 7 or 8 where it lost the order, but simply starting to type the right term got it back on track. For example, typing def si will continue with defining the Sin function accordingly. Here is the current list that has been implemented so far:

Adding Tests

Now that we have all these functions, I’d like to give Copilot a try at generating tests for each of them. To do this, I posed a very simple question to the newly released Copilot Chat while having the 00_main.ipynb file open:

I would like to add tests for each of those commands.

By “those commands”, I was referring to what was currently displayed in the Workspace on my right, hoping that it would contextualize the request within the Workspace. The result Chat provided me with is:

It is even aware that it is missing some from the list described in the introduction and continues to list them, starting at the right place (divide):

As you can see, it is fully aware of the context. It will produce one test per command, understanding that the commands print output to the terminal and that the functions do not return actual numbers. It will also attempt to use a CliRunner to execute the tests. While it doesn’t work out of the box, it’s certainly a step in the right direction.
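The core idea behind those generated tests, asserting on what a command prints rather than on what it returns, can be shown with the standard library alone. This sketch swaps Typer’s CliRunner for contextlib.redirect_stdout so that it runs without extra dependencies; the `add` function below is a simplified stand-in, not the command Copilot generated.

```python
import contextlib
import io


def add(a: float, b: float) -> None:
    """Calculator command: print the sum instead of returning it."""
    print(a + b)


def run_and_capture(command, *args) -> str:
    """Run a command and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        command(*args)
    return buffer.getvalue().strip()


def test_add():
    # The command returns None, so the test checks the printed text.
    assert run_and_capture(add, 1, 2) == "3"
    assert run_and_capture(add, 1.5, 2.0) == "3.5"


test_add()
```

With Typer, CliRunner plays the role of run_and_capture(): it invokes the application with command-line arguments and exposes the captured output for assertions.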


This concludes the tests. It’s clear that Copilot is aware of a Workspace and contextualizes its suggestions accordingly. When working in a Jupyter notebook, it takes into account every code block.

This little experiment suggests to me that adopting a literate programming workflow and its principles can lead to better and more effective suggestions from LLMs like Copilot.

For thousands of years, humans have been expressing their thoughts in a sequential manner, from top to bottom. We’ve developed highly effective systems to organize these writings (you can explore the BIBO ontology for a glimpse into this). These systems have evolved and been refined up to the present day.

To me, this is the essence of Literate Programming. It’s about developing computer software in a more natural, thoughtful, and systematic human way.

Not many people in the industry share this perspective. However, what I’ve begun to explore in this blog post is how LLMs, along with integrated tools like GitHub’s Copilot, could potentially shift that perception. How Literate Programming could emerge as one of the top programming frameworks for effectively utilizing tools like Copilot.

Profiling Python Code in Jupyter while doing Literate Programming with nbdev

As you may know if you have followed this blog in the last few weeks, I started to experiment with literate programming in Python using nbdev. This means that most of the Python code I write today is first written in a Jupyter Notebook (in VS Code), and eventually makes its way into a .py module file.

Oftentimes, I like to profile a function here and there to better understand where execution time is spent. I do this in my normal development process, not for premature optimization, but just to better understand how things work at that time.

This week I wanted to understand what would be the easiest way to quickly profile a function written in a Jupyter Notebook, without having to tangle the code blocks and work at the level of the .py module.

Line Profiler

The solution that worked best for me with my current workflow is to use the line_profiler Python library. I won’t go into detail about how it works internally; I will just show an example of how it can be used and present the results.

Let’s start with the code. Here is a piece of code that I am currently working on, that I will release most likely next week, which is related to a small experiment that I am doing on the side.

What this code does is read an RSS or Atom feed from the local file system, parse it, and return a Feed namedtuple and a list of Article namedtuples. Those will be used down the road to easily insert the data into a SQLite database using executemany().

Each of those blocks is an individual code block within the notebook, with explanatory text in between, which I omitted here.

# Imports used by the cells below (third-party: feedparser, langdetect)
import re
from collections import namedtuple

import feedparser
from langdetect import detect

from line_profiler import profile

@profile
def detect_language(text: str):
    """Detect the language of a given text"""

    # remove all HTML tags from text
    text = re.sub('<[^<]+?>', '', text)

    # remove all HTML entities from text
    text = re.sub('&[^;]+;', '', text)

    # remove all extra spaces
    text = ' '.join(text.split())

    # return if the text is too short
    if len(text) < 128:
        return ''

    # limit the text to 4096 characters to speed up the
    # language detection processing
    text = text[:4096]

    try:
        lang = detect(text)
    except:
        # if langdetect returns an error because it can't read the charset,
        # simply return an empty string to indicate that we can't detect
        # the language
        return ''

    return lang

Feed = namedtuple('Feed', ['id', 'url', 'title', 'description', 'lang', 'feed_type'])

Article = namedtuple('Article', ['feed', 'url', 'title', 'content', 'creation_date', 'lang'])

def parse_feed(feed_path: str, feed_id: str):
    """Parse a local RSS or Atom feed into a Feed and a list of Article tuples"""
    parsed = feedparser.parse(feed_path)

    feed_title = parsed.feed.get('title', '')
    feed_description = parsed.feed.get('description', '')

    feed = Feed(feed_id,
                parsed.feed.get('link', ''),
                feed_title,
                feed_description,
                detect_language(feed_title + feed_description),
                parsed.get('version', ''))

    articles = []
    for entry in parsed.entries:
        article_title = entry.get('title', '')
        if 'description' in entry:
            article_content = entry.description
        elif 'content' in entry:
            # feedparser exposes content as a list of dicts; use the first value
            article_content = entry.content[0].value
        else:
            article_content = ''
        articles.append(Article(feed_id,
                                entry.get('link', ''),
                                article_title,
                                article_content,
                                # assumption: fall back to an empty string when the
                                # entry has no publication date
                                entry.get('published', ''),
                                detect_language(article_title + article_content)))
    return feed, articles
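Since namedtuples are plain tuples, they can be handed to sqlite3 as-is. Here is a minimal sketch of the executemany() step mentioned earlier, using an in-memory database; the table names, column types and sample values are assumptions made for the example, not the schema of the actual project.

```python
import sqlite3
from collections import namedtuple

Feed = namedtuple('Feed', ['id', 'url', 'title', 'description', 'lang', 'feed_type'])
Article = namedtuple('Article', ['feed', 'url', 'title', 'content', 'creation_date', 'lang'])

# Hypothetical schema: one column per namedtuple field, in the same order.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE feeds (id TEXT PRIMARY KEY, url TEXT, title TEXT, '
             'description TEXT, lang TEXT, feed_type TEXT)')
conn.execute('CREATE TABLE articles (feed TEXT, url TEXT, title TEXT, '
             'content TEXT, creation_date TEXT, lang TEXT)')

# Sample data standing in for the output of parse_feed().
feed = Feed('example-feed', 'https://example.org', 'Example', 'A sample feed', 'en', 'rss20')
articles = [
    Article('example-feed', 'https://example.org/1', 'First', 'First body', '2023-09-13', 'en'),
    Article('example-feed', 'https://example.org/2', 'Second', 'Second body', '2023-09-14', 'en'),
]

# Each namedtuple is accepted as one row of positional parameters.
conn.execute('INSERT INTO feeds VALUES (?, ?, ?, ?, ?, ?)', feed)
conn.executemany('INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?)', articles)
conn.commit()

count = conn.execute('SELECT COUNT(*) FROM articles').fetchone()[0]
print(count)
```

Keeping the field order of the namedtuples aligned with the column order is what makes the positional placeholders work without any conversion code.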

Let’s say that we want to profile the detect_language() function when calling the parse_feed() function. To do this, the first thing we do is decorate the detect_language() function with the @profile decorator, imported with from line_profiler import profile. Once this is done, we have to load the line_profiler extension using the %load_ext magic command in Jupyter. We simply create the following Python code block and execute the cell to load the module in the current running environment:

%load_ext line_profiler

Once it is loaded, we can create another Python code block that will execute the %lprun command which is specific to Jupyter:

%lprun -f detect_language parse_feed('/Users/frederickgiasson/.swfp/feeds/https---fgiasson-com-blog-index-php-feed-/13092023/feed.xml', 'https---fgiasson-com-blog-index-php-feed-')

Once this cell is executed, line_profiler runs and profiles the detect_language() function. Once finished, the following output appears in the notebook:

Timer unit: 1e-09 s

Total time: 0.215358 s
File: /var/folders/pz/ntz31j490w950b6gn2g0j3nc0000gn/T/ipykernel_65374/
Function: detect_language at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
     3                                           @profile
     4                                           def detect_language(text: str):
     5                                               """Detect the language of a given text"""
     7                                               # remove all HTML tags from text
     8        11     136000.0  12363.6      0.1      text = re.sub('<[^<]+?>', '', text)
    10                                               # remove all HTML entities from text
    11        11      78000.0   7090.9      0.0      text = re.sub('&[^;]+;', '', text)
    13                                               # remove all extra spaces
    14        11     118000.0  10727.3      0.1      text = ' '.join(text.split())
    16                                               # return if the text is too short
    17        11      15000.0   1363.6      0.0      if len(text) < 128:
    18         1          0.0      0.0      0.0          return ''
    20                                               # limit the text to 4096 characters to speed up the 
    21                                               # language detection processing
    22        10      12000.0   1200.0      0.0      text = text[:4096]
    24        10       6000.0    600.0      0.0      try:
    25        10  214980000.0    2e+07     99.8          lang = detect(text)
    26                                               except:
    27                                                   # if langdetect returns an errors because it can't read the charset, 
    28                                                   # simply return an empty string to indicate that we can't detect
    29                                                   # the language
    30                                                   return ''
    32        10      13000.0   1300.0      0.0      return lang

As we can see, 99.8% of the time is spent detecting the language using langdetect.
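To read the table, it helps to see how the columns relate to each other. The small sketch below recomputes them from the Hits and Time values shown in the output above (Time is expressed in timer units of 1e-09 s); the `summarize()` helper is just something I wrote for this illustration, it is not part of line_profiler.

```python
TIMER_UNIT = 1e-09  # seconds per time unit, from the "Timer unit" header

# (line number, hits, time in timer units), copied from the profiler output
timings = [
    (8, 11, 136000.0), (11, 11, 78000.0), (14, 11, 118000.0),
    (17, 11, 15000.0), (18, 1, 0.0), (22, 10, 12000.0),
    (24, 10, 6000.0), (25, 10, 214980000.0), (32, 10, 13000.0),
]


def summarize(rows):
    """Recompute the 'Per Hit' and '% Time' columns from hits and time."""
    total = sum(time for _, _, time in rows)
    summary = {}
    for line, hits, time in rows:
        summary[line] = {
            "per_hit": time / hits,           # average time units per execution
            "percent": 100.0 * time / total,  # share of the function's total time
        }
    return total, summary


total, summary = summarize(timings)
print(f"Total time: {total * TIMER_UNIT:.6f} s")
print(f"detect(text): {summary[25]['percent']:.1f}% of the time, "
      f"{summary[25]['per_hit'] * TIMER_UNIT * 1000:.1f} ms per call")
```

The per-line times add up to the reported total, and the single call to detect() accounts for almost all of it, at roughly 21.5 ms per call.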


It is as simple as that thanks to line_profiler which is just simple, effective and well integrated in Jupyter. This is perfect for quickly profiling some code on the fly.

ReadNext 0.0.4: Local Embedding Model

I just released ReadNext version 0.0.4. The primary goal of this new version is to remove the dependency on the Cohere Embedding web service endpoint by using a local embedding model by default. To enable that, ReadNext got integrated with Hugging Face and currently uses the BAAI/bge-base-en model.

Local vs. Remote

This change removes the dependency on one external service, which makes ReadNext more stable. The processing time is a little bit longer with the local model, but it also depends on the capabilities of your local computer.

In terms of performance, the two systems are comparable. In my experience, about 80% of the propositions are the same, and the remaining 20% that are different yield no major difference in accuracy. However, I do like the BAAI/bge-base-en propositions a little better from what I have experienced so far.

You may want to experiment with both to see what works best for you. The only thing you have to do is to change the EMBEDDING_SYSTEM environment variable and to reload your terminal instance.

New Configurations

Two new configuration options have been added to this version:

  1. EMBEDDING_SYSTEM: This is the embedding system you want to use. One of: BAAI/bge-base-en (local) or cohere.
  2. MODELS_PATH: This is the local path where you want the model files to be saved on your local file system (ex: /Users/me/.readnext/models/)

If you already have ReadNext installed on your computer, please make sure to add those two new environment variables to your environment.
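For illustration, here is how a tool could read and validate these two environment variables. This is a sketch, not ReadNext’s actual code: the `load_settings()` function, the returned dictionary keys and the validation behaviour are assumptions; only the variable names and the two accepted values come from the list above.

```python
import os

SUPPORTED_SYSTEMS = ("BAAI/bge-base-en", "cohere")


def load_settings(env=None) -> dict:
    """Read the embedding configuration from environment variables."""
    env = os.environ if env is None else env
    # The local model is the default, as in ReadNext 0.0.4.
    system = env.get("EMBEDDING_SYSTEM", "BAAI/bge-base-en")
    if system not in SUPPORTED_SYSTEMS:
        raise ValueError(f"EMBEDDING_SYSTEM must be one of {SUPPORTED_SYSTEMS}, got {system!r}")
    return {
        "embedding_system": system,
        "is_local": system != "cohere",
        "models_path": env.get("MODELS_PATH", ""),
    }


settings = load_settings({"EMBEDDING_SYSTEM": "cohere", "MODELS_PATH": "/Users/me/.readnext/models/"})
print(settings)
```

Passing a plain dictionary instead of os.environ keeps the function easy to test; calling load_settings() with no argument reads the real environment.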

New Commands

Two new commands have been added as well. They have been added to help understand the current status of the ReadNext tool. Those two commands are:

  1. readnext version: this gives the version of ReadNext that you are currently using
  2. readnext config: this gives the configuration parameters, and their values, currently used to run that instance of ReadNext

Literate Programming

While at it, I decided to migrate ReadNext’s Python codebase to nbdev to continue its development using literate programming.

All the literate files (notebooks in this case) where the code is tangled and the documentation weaved from are accessible in the nbs folder. The tangled codebase is available in the readnext folder. Finally, the weaved documentation is available as GitHub pages here.

Literate Programming in Python using NBDev

Donald Knuth considered that, of all his work on typography, the idea of literate programming had the greatest impact on him. This is a strong and profound statement that seems to be underestimated by history.

Literate programming has grown on me in such a way that I now have a hard time developing in a framework that is not literate. I need to be able to organize my ideas, my code, and its documentation the way I want, not in the way the programming language or library designers intend. I need that flexibility to be as effective as possible in my work; otherwise, I feel that something is missing.

Since 2016, I have been practicing literate programming using Org-Mode within Emacs. As of today, I have not yet found another tool as powerful as Org-Mode within Emacs for developing literate applications. It employs a simple plain text format with clean markup, making it easy to commit and suitable for peer review. However, when used in Emacs/Org-Mode and enhanced with Babel, developers end up with one of the most robust notebook systems imaginable, capable of facilitating effective literate programming.

However, the challenge lies in the tooling, particularly Emacs. I have been fortunate enough to build teams that worked with Emacs, allowing us to undertake projects in a literate manner. Yet, this was the exception rather than the norm.


I recently invested time in exploring the latest developments in the Literate Programming tooling space. I aimed to find a solution that would bring me closer to the experience of Org-mode + Emacs, but without the friction associated with Emacs for general developers.

In 2016, all my development work was conducted in Clojure. Clojure developers naturally gravitated toward Emacs due to Cider. Nowadays, I work extensively with Python and configuration files. Consequently, I began researching the current state of the literate programming ecosystem. My search began with two keywords: Python and VS Code.

This research led me to discover a relatively new project (initiated a few years ago) called nbdev, developed by Jeremy Howard, Hamel Husain, and a few other contributors.

nbdev is an incredibly intriguing project. It leverages several existing open-source projects to build a new literate programming framework from the ground up: it employs Jupyter notebooks as the format for writing software (in contrast to a plain text format like Org-Mode). The Quarto tool is used to generate documentation from the codebase. Additionally, nbdev provides a range of tools for running tests, creating vanilla GitHub projects with built-in actions for automated deployment, and more. Due to its reliance on Jupyter, this literate workflow is Python-centric and can be developed using a simple browser or VS Code, complemented by the constantly improving Jupyter extension. There’s even an experimental nbdev extension available.

For this blog post, I will convert the en-fr-translation-service project I recently blogged about to use nbdev. Finally, based on my experience with Org-mode, I will propose some potential improvements to the project.

Creating a Vanilla nbdev (Notebook Dev) Project

The first step is to create a new vanilla literate-en-fr-translation-service GitHub repository and follow nbdev‘s End-to-End Walkthrough to create the literate version of the project. After installing jupyterlab, nbdev, and Quarto, I cloned the new repository locally and executed this command in my terminal to initialize the nbdev project:

nbdev_new


This command generated several new files in the repository:

  • .github/workflows: two GitHub actions
  • literate_en_fr_translation_service/: New module
  • nbs: where all literate notebook files reside
  • settings.ini: nbdev’s core settings file
  • …and various other auto-generated files

Once the nbdev vanilla project is complete, simply commit and push the changes to the GitHub repository:

git add .
git commit -m'Initial commit'
git push

After pushing the changes to the repository, the final step is to enable pages in your GitHub repository. Then you can verify the proper functioning of your workflows.

Development Process

The literate programming development process is straightforward yet requires a mindset shift. In the following sections, I will focus on nbdev’s specific process, which is not substantially different from other literate programming frameworks.

The entire application is developed directly within Jupyter notebooks. Each notebook defines both the application’s code and its documentation. When preparing the application, the documentation will be weaved from the Jupyter notebook and hosted as a set of GitHub Pages. Subsequently, the code will be tangled into source code files within the module’s folder:

Documentation is intertwined among code boxes, and each code box has tangling instructions (indicating whether it should be part of the codebase or documentation, etc.). All the nbdev directives are accessible here.

The first step involves writing the nbs/index.ipynb file, which serves as the project’s readme. It introduces the project’s purpose, usage instructions, and more. This file becomes the initial page of your documentation.

Next, start organizing your application into different parts. In nbdev, a part is equivalent to a chapter, and a chapter is numbered. This numbering is a naming convention specific to nbdev. For our simple application, we’ll create two chapters: nbs/00_download_models.ipynb and nbs/01_main.ipynb. As you can see, the files are prefixed with numbers, acting as “chapter numbers.” These numbers help order the generated documentation’s index and provide clarity regarding the repository’s file flow.

The final step is to write each of these notebooks, focusing on both documentation (the why) and code (the how). This will be the focus of the upcoming sections.

Developing en-fr-translation-service as literate-en-fr-translation-service

The first step I took was to copy over the requirements.txt and Dockerfile to the root of the repository. Since nbdev currently only supports Python files, only that part of the application will be literate (more about this limitation later). The only change required is adjusting the paths of some files in the Dockerfile because nbdev creates a module for our application:

COPY literate_en_fr_translation_service/ .
COPY literate_en_fr_translation_service/ .


The initial step is to create the index.ipynb file. This serves as the entry point for the generated documentation and also becomes the readme file of the repository after running the nbdev_readme command.

This file is a simple Jupyter notebook containing a single Markdown cell where we provide an introduction to the project.


The next step involves creating the 00_download_models.ipynb file. This file contains all the code and documentation related to downloading the ML models required for the translation service. Since the first task the Docker container performs upon running is downloading the translation model artifacts, I’ve prefixed the file with 00_ to signify it as the first chapter of the application.

At the top of the file, a code cell should be created for the default_exp directive. This directive tells nbdev which module file the code from subsequent export and exports directives should be tangled into:

#| default_exp download_models

In this case, all code from subsequent Python cells will be placed in the literate_en_fr_translation_service/ file.

Next, we add the import statements:

#| exports
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import os

The difference between export and exports is that exports exports the code to both the code file and the documentation (the code will be displayed in a code box in the documentation). In contrast, export only adds the code to the code file and won’t appear in the documentation. For this case, we want the exports to be displayed in the documentation.
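The rule described above can be sketched as a small classifier. This is only an illustration of the behaviour as explained in this article, not nbdev's implementation; the function names directives and destinations are hypothetical, and the sketch assumes directives sit on the first lines of a cell.

```python
def directives(cell_source):
    """Collect the `#|` directive lines found at the top of a cell."""
    found = []
    for line in cell_source.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#|"):
            break  # directives are assumed to come before any code
        found.append(stripped[2:].strip())
    return found


def destinations(cell_source):
    """Return where the cell's code ends up: the module, the docs, or both."""
    names = {d.split()[0].rstrip(":") for d in directives(cell_source) if d}
    if "exports" in names:
        return {"module", "docs"}  # exported and shown in the documentation
    if "export" in names:
        return {"module"}          # exported, source hidden from the documentation
    return {"docs"}                # plain cell: documentation only


print(destinations("#| exports\nimport os"))
print(destinations("#| export\ndef f(): pass"))
```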

Following this, we define the download_model() function:

#| export
def download_model(model_path: str, model_name: str):
    """Download a Hugging Face model and tokenizer to the specified directory"""
    # Create the target directory if it does not already exist
    if not os.path.exists(model_path):
        os.makedirs(model_path)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Save the model and tokenizer to the specified directory
    model.save_pretrained(model_path)
    tokenizer.save_pretrained(model_path)

In this case, we don’t intend for the code to appear in the documentation. Here, nbdev will document the function in textual form without directly including the code in the documentation.

Finally, we proceed to download the actual model artifacts:

#| exports
#| eval: false

download_model('models/en_fr/', 'Helsinki-NLP/opus-mt-en-fr')
download_model('models/fr_en/', 'Helsinki-NLP/opus-mt-fr-en')

This last code block is an interesting one that shows the flexibility of the code block directives, and their importance in the development flow. 

First, we export the code to the codebase and show the two lines of code in the documentation to help the user understand how it works. But we also added an eval: false directive. Why? It tells nbdev not to evaluate this code block when it tangles and weaves the notebook file. Otherwise, the code would be executed and the model artifacts would be downloaded, which would add a lot of processing time and waste network bandwidth. However, we still want this code to appear in the codebase, since the container runs that file to initialize the service with the right model artifacts.
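The effect of the eval flag can be pictured with a short sketch. It only mirrors the behaviour described here and is not taken from nbdev's source; should_execute is a hypothetical helper, and it assumes the directives are the first lines of the cell.

```python
def should_execute(cell_source):
    """Return False when the cell carries an `eval: false` directive."""
    for line in cell_source.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#|"):
            break  # past the directive lines, nothing more to inspect
        key, _, value = stripped[2:].partition(":")
        if key.strip() == "eval" and value.strip().lower() == "false":
            return False
    return True


download_cell = (
    "#| exports\n"
    "#| eval: false\n"
    "\n"
    "download_model('models/en_fr/', 'Helsinki-NLP/opus-mt-en-fr')\n"
)
print(should_execute(download_cell))
print(should_execute("#| exports\nimport os"))
```

With a check like this, the download cell is still exported and documented, but it is skipped whenever the notebook itself is run.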

The result is a very simple and clean notebook that is easy to understand:


The subsequent chapter is the core file of the translation service. It’s where the web service endpoints are defined, model file selection occurs, and the service’s entry point is specified. 

You can access the notebook here to see the result. I won’t elaborate on each section since the directives used are the same as in the previous chapter.

However, one difference lies in the addition of tests after the endpoint creation:

assert is_translation_supported('en', 'fr')
assert is_translation_supported('fr', 'en')
assert not is_translation_supported('en', 'es')
assert not is_translation_supported('es', 'en')

Those assertions are defined in their own code block. This demonstrates a crucial aspect of literate programming that I wrote about in 2016. This kind of workflow enables developers to:

  1. Create a series of unit tests directly where it matters (right below the function to test).
  2. Run the tests when it matters (continuously while developing or improving the tested function).

The developer can run that code cell within the Jupyter notebook to ensure that what they just wrote is functioning as expected. They can also execute the nbdev_test command-line application to run all the tests of an nbdev application. Finally, it will also be picked up by the tests GitHub workflow. This aspect of the development process is extremely important and powerful.
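As an illustration of this pattern, here is what a function cell followed by its test cell could look like. The body of is_translation_supported below is a hypothetical stand-in written for this example (the real one lives in the 01_main notebook and may differ); only the assertions come from the notebook.

```python
# Hypothetical stand-in for the function defined in the notebook.
SUPPORTED_PAIRS = {("en", "fr"), ("fr", "en")}


def is_translation_supported(from_lang: str, to_lang: str) -> bool:
    """Tell whether a model exists for the given language pair."""
    return (from_lang, to_lang) in SUPPORTED_PAIRS


# Test cell placed right below the function: it is neither export nor exports,
# so it stays in the notebook and runs with the rest of the cells.
assert is_translation_supported('en', 'fr')
assert is_translation_supported('fr', 'en')
assert not is_translation_supported('en', 'es')
assert not is_translation_supported('es', 'en')
```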

Everything is contextualized in the same place; there’s no need to look at 2 or 3 different places. This makes PR reviews much more effective for the reviewer: the documentation, the code, and its tests will all appear more or less on the same screen. If any of those elements are missing, the reviewer can easily address it in a comment.


So, what does it look like in the end? Here are the references to each component of the literate application:

Possible nbdev Improvements

The team has done excellent work with nbdev. I can clearly sense the same literate process that I experienced using Org-mode+Emacs, but with a completely different toolbox, which is refreshing to experience!

Here is a series of potential improvements I considered while testing nbdev. These could eventually become proposed PRs for the project when I find the time to work on them.

Save Jupyter notebook as Markdown or py:percent instead of JSON

As a long-time Org-mode user, I believe that all notebook formats should be plain text with some markup. One issue I have with Jupyter is its default serialization format, a very complex and large JSON file.

While not a problem in itself, it becomes one when reviewing notebook PRs. Therefore, whenever I had developers working with Jupyter notebooks, I always asked them to export their notebooks to Markdown or py:percent formats before committing to GitHub. This way, the notebook can be easily diffed on GitHub, and PR reviewers can add inline comments. Without this, you’d need to use a service like ReviewNB, which adds unnecessary complexity in my opinion.

I suggest that nbdev could leverage Jupyter’s internal Markdown export functionality to export each chapter into its own Markdown or py:percent file, which would then be part of the literate GitHub repository.

Another possibility, which would leave the nbdev workflow untouched, is to use jupytext to manage the synchronization.
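To show why the py:percent format diffs so well, here is a rough sketch that renders a notebook's JSON as py:percent text using only the standard library. It is a simplification for illustration, not jupytext's implementation: it ignores raw cells, outputs, and the metadata header that jupytext writes, and the helper name notebook_to_percent is an assumption.

```python
import json


def notebook_to_percent(notebook_json: str) -> str:
    """Render code and Markdown cells of a notebook as py:percent text."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # the JSON format stores a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source)
        elif cell.get("cell_type") == "markdown":
            # Markdown cells become comment lines under a "[markdown]" marker.
            commented = "\n".join(
                ("# " + line).rstrip() for line in source.splitlines()
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


example = json.dumps({
    "cells": [
        {"cell_type": "markdown",
         "source": ["# Download models\n", "Why we need them."]},
        {"cell_type": "code",
         "source": ["#| default_exp download_models"]},
    ]
})
print(notebook_to_percent(example))
```

The result is an ordinary Python file, so every cell change shows up as a readable line-level diff in a PR.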

Add .ipynb Files to .gitignore

Assuming nbdev exports all notebooks as Markdown or py:percent files, I would consider adding .ipynb files to the repository’s .gitignore. This simplifies the repository’s content (containing only plain text files) and avoids duplicates. This is possible since Markdown files can be used to recreate the original JSON Jupyter files.

Ignore All Files Generated by a Notebook During Export

If all notebooks are in Markdown format, there’s no need to commit all the exported content to the repository either.

Since everything is in these notebook files, any developer can generate all the artifacts by:

  1. cloning the repository
  2. exporting the notebook files

This would generate all the necessary files for the application’s functionality. The advantage is a streamlined repository with a collection of literate notebooks.

Support Beyond Python

This is where Org-Mode+Emacs shines. In a single notebook, I could incorporate code from various languages and formats, such as Clojure, bash curl commands, JSON outputs, Dockerfile, etc. This flexibility was possible due to Babel.

It might be possible to achieve this in Jupyter (consider jp-babel), or even in VS Code’s Jupyter extension. Nevertheless, nbdev would need updates to enable this.

Currently, nbdev assumes everything is Python. This is why the directives like #| export foo create a file in the module’s folder.

My proposal is for the export and exports directives to accept a path/file as a value, rather than a string used to create the target path and file. This would make the directive more verbose, yet considerably more flexible.

If it worked this way, I could have all my Python code interwoven into one or multiple places in the repository. Additionally, in the same notebook file, I could have multiple code blocks for creating my Dockerfile, which would then export to /Dockerfile in the repository. I would treat the Dockerfile like any other code source in my project.
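A minimal sketch of this proposal could look as follows. Everything in it is hypothetical: nbdev does not accept a path on export today, the tangle helper and the app/main.py path are made up for the example, and cells are represented as plain strings.

```python
from collections import defaultdict


def tangle(cells):
    """Group cell bodies by the path given in a proposed `#| export <path>` line."""
    files = defaultdict(list)
    for source in cells:
        lines = source.splitlines()
        if not lines:
            continue
        parts = lines[0].split()
        # Expected shape: "#|", "export" or "exports", then a repository path.
        if len(parts) == 3 and parts[0] == "#|" and parts[1] in ("export", "exports"):
            body = "\n".join(lines[1:]).strip("\n")
            files[parts[2]].append(body)
    # Cells targeting the same path are concatenated in notebook order.
    return {path: "\n\n".join(bodies) + "\n" for path, bodies in files.items()}


cells = [
    "#| export Dockerfile\nFROM python:3.11-slim",
    "#| exports app/main.py\nimport os",
    "#| export Dockerfile\nCOPY requirements.txt .",
    "print('documentation only')",
]
tangled = tangle(cells)
for path, content in tangled.items():
    print(path, repr(content))
```

Because the target is just a path, the same mechanism covers Python modules, the Dockerfile, and any other configuration file in the repository.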

This aspect is crucial to me, particularly for Machine Learning projects, as they often involve diverse configuration files (Docker, Terraform, etc.) that should be managed in a literate framework, similar to traditional source code files.

This aspect is more important than having a Babel in Jupyter (and we are lucky since it is way simpler to implement!)

New export-test directive

Having tests in the notebooks, alongside the code they test, is very valuable. However, I think they should be tangled as well, just like any other piece of the codebase. We could consider different designs; two that come to mind are:

  1. If export and exports end up supporting a path/file argument, then we would use that new behaviour to specify where the tests go (i.e. /tests/)
  2. A new directive like export-test could be created where the test would be created in the /tests/ folder like: /tests/test_[default_exp].py

I think I prefer (1) since it is more flexible and could be used for other scenarios, like the ones mentioned above.


Lastly, I’ve compiled a list of excellent references about nbdev for anyone interested in trying it out: