Literate [Clojure] Programming: Publishing Documentation In Multiple Formats

This blog post is the sixth, and the last, of a series of blog posts about Literate [Clojure] Programming in Org-mode where I explain how I develop my [Clojure] applications using literate programming concepts and principles.

This last post introduce a tool that leverage the side effect of having all the codes neatly discussed: it gives the possibility to automatically generate all different kinds of project documentation with a single key-binding. The documentation that we will generate are:

  1. Human readable HTML general project documentation
  2. Programming API documentation
  3. Book style complete project documentation in PDF

This series of blog posts about literate [Clojure] programming in Org-mode is composed of the following articles:

  1. Configuring Emacs for Org-mode
  2. Project folder structure
  3. Anatomy of a Org-mode file
  4. Tangling all project files
  5. Publishing documentation in multiple formats (this post)
  6. Unit Testing

Continue reading “Literate [Clojure] Programming: Publishing Documentation In Multiple Formats”

Disambiguating KBpedia Knowledge Graph Concepts

One of the most important natural language processing tasks is to “tag” concepts in text. Tagging a concept means determining whether words or phrases in a text document matches any of the concepts that exist in some kind of a knowledge structure (such as a knowledge graph, an ontology, a taxonomy, a vocabulary, etc.). (BTW, a similar definition and process applies to tagging an entity.) What is usually performed is that the input text is parsed and normalized in some manner. Then all of the surface forms of the concepts within the input knowledge structure (based on their preferred and alternative labels) are matched against the words within the text. “Tagging” is when a match occurs between a concept in the knowledge structure and one of its surface forms in the input text.

But here is the problem. Given the ambiguous world we live in, often this surface form, which after all is only a word or phrase, may be associated with multiple different concepts. When we identify the surface form of “bank”, does that term refer to a financial institution, the shore of a river, a plane turning, or a pool shot? identical surface forms may refer to totally different concepts. Further, sometimes a single concept will be identified but it won’t be the right concept, possibly because the right one is missing from the knowledge structure or other issues.

A good way to view this problem of ambiguity is to analyze a random Web page using the Cognonto Demo online application. The demo usea the Cognonto Concepts Tagger service to tag all of the existing KBpedia knowledge graph concepts found in the target Web page. Often, when you analyze what has been tagged by the demo, you may see some of these amgibuities or wrongly tagged concepts yourself. For instance, check out this example. If you mouse over the tagged concepts, you will notice that many of the individual “tagged” terms refer to multiple KBpedia concepts. Clearly, in its basic form, this Cognonto demo is not disambiguating the concepts.

The purpose of this article is thus to explain how we can “disambiguate” (that is, suggest the proper concept from an ambiguous list) the concepts that have been tagged. What we will do is to show how we can leverage the KBpedia knowledge graph structure as-is to perform this disambiguation. What we will do is to create graph embeddings for each of the KBpedia concepts using the DeepWalk algorithm. Then we will perform simple linear algebra equations on the graph embeddings to determine if the tagged concept(s) is the right one given its context or not. We will test multiple different algorithms and strategies to analyze the impact on the overall disambiguation performance of the system.

Continue reading “Disambiguating KBpedia Knowledge Graph Concepts”

Literate [Clojure] Programming: Tangle All in Org-mode

This blog post is the fifth of a series of blog posts about Literate [Clojure] Programming in Org-mode where I explain how I develop my [Clojure] applications using literate programming concepts and principles.

This new blog post introduce a tool that is often necessary when developing literate applications using Org-mode: the tangle all script. As I explained in a previous blog post, doing literate programming is often like writing: you write something, you review and update it… often. This means that you may end-up changing multiple files in your Org-mode project. Depending how you configured you Emacs environment and Org-mode, you may have missed to tangle a file you changed that may cause issues down the road. This is the situation I will cover in this post.

This series of blog posts about literate [Clojure] programming in Org-mode is composed of the following articles:

  1. Configuring Emacs for Org-mode
  2. Project folder structure
  3. Anatomy of a Org-mode file
  4. Tangling all project files (this post)
  5. Publishing documentation in multiple formats
  6. Unit Testing

Continue reading “Literate [Clojure] Programming: Tangle All in Org-mode”

Literate [Clojure] Programming: Anatomy of a Org-mode file

This blog post is the second of a series of blog posts about Literate [Clojure] Programming where I explain how I develop my [Clojure] applications using literate programming concepts and principles. In the previous blog post I outlined a project’s structure. In this blog post I will demonstrate how I normally structure an Org-mode file to discuss the problem I am trying to solve, to code it and to test it.

One of the benefits of Literate Programming is that the tools that implement its concepts (in this case Org-mode) give to the developer the possibility to write its code in the order (normally more human friendly) he wants. This is one of the aspects I will cover in this article.

If you want to look at a really simple [Clojure] literate application I created for my Creating And Running Unit Tests Directly In Source Files With Org-mode blog post, take a look at the org-mode-clj-tests-utils (for the rendered version). It should give you a good example of what a literate file that follows the structure discussed here looks like.

This blog post belong to a series of posts about Literate [Clojure] Programming:

  1. Configuring Emacs for Org-mode
  2. Project folder structure
  3. Anatomy of a Org-mode file (this post)
  4. Tangling all project files
  5. Publishing documentation in multiple formats
  6. Unit Testing

Continue reading “Literate [Clojure] Programming: Anatomy of a Org-mode file”

Literate [Clojure] Programming Using Org-mode

Literate Programming is a great way to write computer software, particularly in fields like data science where data processing workflows are complex and often need much background information. I started to write about Literate Programming a few months ago, and now it is the time to formalize how I create Literate Programming applications.

This blog post belong to a series of posts about Literate [Clojure] Programming:

  1. Configuring Emacs for Org-mode
  2. Project folder structure (this post)
  3. Anatomy of a Org-mode file
  4. Tangling all project files
  5. Publishing documentation in multiple formats
  6. Unit Testing

Continue reading “Literate [Clojure] Programming Using Org-mode”