Cognonto, Artificial Intelligence, Clojure

Extended KBpedia With Wikipedia Categories

A knowledge graph is an ever evolving structure. It needs to be extended to be able to cope with new kinds of knowledge; it needs to be fixed and improved in all kinds of different ways. It also needs to be linked to other sources of data and to other knowledge representations such as schemas, ontologies and vocabularies. One of the core tasks related to knowledge graphs is to extend its scope. This idea seems simple enough, but how can we extend a general knowledge graph that has nearly 40,000 concepts with potentially multiple thousands more? How can we do this while keeping it consistent, coherent and meaningful? How can we do this without spending undue effort on such a task? These are the questions we will try to answer with the methods we cover in this article.

The methods we are presenting in this article are how we can extend Cognonto‘s KBpedia Knowledge Graph using an external source of knowledge, one which has a completely different structure than KBpedia and one which has been built completely differently with a different purpose in mind than KBpedia. In this use case, this external resource is the Wikipedia categories structure. What we will show in this article is how we may automatically select the right Wikipedia categories that could lead to new KBpedia concepts. These selections are made using a SVM classifier trained over graph embedding vectors generated by a DeepWalk model based on the KBpedia Knowledge Graph structure linked to the Wikipedia categories. Once appropriate candidate categories are selected using this model, the results are then inspected by a human to take the final selection decisions. This semi-automated process takes 5% of the time it would normally take to conduct this task by comparable manual means.

Continue reading

Literate Programming, Programming, Emacs, Clojure

Literate [Clojure] Programming: Tangle All in Org-mode

This blog post is the fifth of a series of blog posts about Literate [Clojure] Programming in Org-mode where I explain how I develop my [Clojure] applications using literate programming concepts and principles.

This new blog post introduce a tool that is often necessary when developing literate applications using Org-mode: the tangle all script. As I explained in a previous blog post, doing literate programming is often like writing: you write something, you review and update it… often. This means that you may end-up changing multiple files in your Org-mode project. Depending how you configured you Emacs environment and Org-mode, you may have missed to tangle a file you changed that may cause issues down the road. This is the situation I will cover in this post.

This series of blog posts about literate [Clojure] programming in Org-mode is composed of the following articles:

  1. Configuring Emacs for Org-mode
  2. Project folder structure
  3. Anatomy of a Org-mode file
  4. Tangling all project files (this post)
  5. Publishing documentation in multiple formats
  6. Unit Testing

Continue reading

Cognonto, Artificial Intelligence, Semantic Web, Clojure

Using Cognonto to Generate Domain Specific word2vec Models

word2vec is a two layer artificial neural network used to process text to learn relationships between words within a text corpus to create a model of all the relationships between the words of that corpus. The text corpus that a word2vec process uses to learn the relationships between words is called the training corpus.

In this article I will show you how Cognonto‘s knowledge base can be used to automatically create highly accurate domain specific training corpuses that can be used by word2vec to generate word relationship models. However you have to understand that what is being discussed here is not only applicable to word2vec, but to any method that uses corpuses of text for training. For example, in another article, I will show how this can be done with another algorithm called ESA (Explicit Semantic Analysis).

It is said about word2vec that “given enough data, usage and contexts, word2vec can make highly accurate guesses about a word’s meaning based on past appearances.” What I will show in this article is how to determine the context and we will see how this impacts the results.

Continue reading

Literate Programming, Programming, Clojure

Literate [Clojure] Programming: Anatomy of a Org-mode file

This blog post is the second of a series of blog posts about Literate [Clojure] Programming where I explain how I develop my [Clojure] applications using literate programming concepts and principles. In the previous blog post I outlined a project’s structure. In this blog post I will demonstrate how I normally structure an Org-mode file to discuss the problem I am trying to solve, to code it and to test it.

One of the benefits of Literate Programming is that the tools that implement its concepts (in this case Org-mode) give to the developer the possibility to write its code in the order (normally more human friendly) he wants. This is one of the aspects I will cover in this article.

If you want to look at a really simple [Clojure] literate application I created for my Creating And Running Unit Tests Directly In Source Files With Org-mode blog post, take a look at the org-mode-clj-tests-utils (for the rendered version). It should give you a good example of what a literate file that follows the structure discussed here looks like.

This blog post belong to a series of posts about Literate [Clojure] Programming:

  1. Configuring Emacs for Org-mode
  2. Project folder structure
  3. Anatomy of a Org-mode file (this post)
  4. Tangling all project files
  5. Publishing documentation in multiple formats
  6. Unit Testing

Continue reading

Literate Programming, Programming, Emacs, Clojure

Literate [Clojure] Programming Using Org-mode

Literate Programming is a great way to write computer software, particularly in fields like data science where data processing workflows are complex and often need much background information. I started to write about Literate Programming a few months ago, and now it is the time to formalize how I create Literate Programming applications.

This blog post belong to a series of posts about Literate [Clojure] Programming:

  1. Configuring Emacs for Org-mode
  2. Project folder structure (this post)
  3. Anatomy of a Org-mode file
  4. Tangling all project files
  5. Publishing documentation in multiple formats
  6. Unit Testing

Continue reading