After More Than 10 Years in Business

I delayed this blog post for far too long.

Almost exactly one year ago I had to make a heartbreaking decision for myself and for my long-term business partner and friend Mike Bergman. I had to stop working on our business projects, Structured Dynamics and Cognonto, so that I could start bringing in income for my family again.

For more than ten years Mike and I had the good fortune to be able to spend all our time working together on all kinds of interesting projects and doing really mind-challenging research in the field of the Semantic Web and, more recently, its applications to the field of Artificial Intelligence.

Our last business project was a company called Cognonto, and more precisely its huge knowledge graph called KBpedia. This project, just like everything we did prior to it, was all about research and development, trying to push ideas, concepts and principles to the market. We put all our thoughts, energy, time and [monetary] resources into pursuing our goals.

The problem is that we were never able to monetize this new endeavor, unlike the other projects we created in the previous decade.

After more than a year and a new baby boy, and after spending all the resources I had available for the project, I had to make a decision for me and my growing family. I had to look for a new job.

That is why I have been silent for so long. I had to reorganize my time after being self-employed for about fifteen years, while also taking care of two young boys.

However, I had the good fortune to be contacted by Curbside at about the same time that I decided to look for a new job. I now lead the creation, design and development of their internal machine learning environment.

Good fortune being what it is, I really do enjoy my new work, the importance of it, the design and research I am putting into it and the wonderful team that I am part of.

We dissolved the Structured Dynamics company a few months ago. However, not everything is done. Mike is working on a really personal and important project related to KBpedia, for which we have important announcements to make in the coming months.

Lastly, I would like to take the time to say thank you to Mike Bergman for all the time we spent working together on those wonderful semantic web and knowledge base representation projects. Thank you for all those daily chats and calls we had, discussing and arguing to advance our ideas, and for all the things you taught me about research methodology, business and life. I owe you much, my friend!

KBpedia Knowledge Graph 1.40: Extended Using Machine Learning

I am proud to announce the immediate release of the KBpedia Knowledge Graph version 1.40. This new version of the knowledge graph includes 53,739 concepts, which is 14,687 more than the previous version. It also includes 251,848 new alternative labels for 20,538 concepts that already existed in version 1.20, and 542 new definitions.

This new version of KBpedia will have an impact on many knowledge graph related tasks, such as concept and entity tagging, and on most of the existing Cognonto use cases. I will be discussing these updates and their effects on the use cases in a forthcoming series of blog posts.

But the key topic of this blog post is this: how have we been able to increase the coverage of the KBpedia Knowledge Graph by 37.6% while keeping it consistent (that is, it contains no contradictory facts) and satisfiable (that is, no candidate addition violates any existing class disjointness assertion), all within roughly a single month of FTE effort?
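The 37.6% figure follows from the concept counts quoted above. A minimal check, assuming only those two numbers (the variable names are illustrative):

```python
# Concept counts quoted in the release announcement above.
new_total = 53_739   # concepts in version 1.40
added = 14_687       # concepts added since the previous version

# The previous version's size is implied by the two figures.
previous_total = new_total - added

# Coverage growth is measured relative to the previous version.
growth_pct = 100 * added / previous_total

print(f"{previous_total:,} -> {new_total:,} concepts (+{growth_pct:.1f}%)")
```

The implied size of the previous version, 39,052 concepts, matches the "nearly 40,000 concepts" mentioned elsewhere on this page.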

Continue reading “KBpedia Knowledge Graph 1.40: Extended Using Machine Learning”

Measuring the Influence of Expanded Knowledge Graphs on Machine Learning

Mike Bergman and I will release a new version 1.40 of the KBpedia Knowledge Graph in the coming month. This new version of the knowledge graph will include roughly 15,000 new concepts, plus 150,000 new alternative labels and 5,000 new definitions for existing KBpedia reference concepts. This new release will substantially increase the size of the current KBpedia Knowledge Graph.

This extension is based on a new methodology that we began to cover in the Extending KBpedia With Wikipedia Categories Cognonto use case. The extension uses graph embeddings for each KBpedia reference concept and its linkage to the Wikipedia category structure to pre-select the Wikipedia categories that are most likely to be good candidates to fill [current gaps] in the KBpedia graph structure. The new reference concept candidates scored through this automated process were then reviewed for likely selection. These selections were then analyzed by re-generating the KBpedia Knowledge Graph, which includes routines for identifying, reporting and fixing consistency and coherency issues using the KBpedia Generator. Problematic assignments are either dropped or fixed. These steps reflect the general process Cognonto follows in mapping and incorporating new schema and ontologies.

In the coming month or two, I will write a series of blog posts that will analyze the impact of several important versions of KBpedia on different machine learning models that we have previously created for the Cognonto use cases. All of the current use cases have been created using version 1.20 of KBpedia. We are about to finalize the creation of an intermediate version 1.30 (for internal analysis only). We are separately identifying thousands of reference concepts that will be temporarily removed, since they are more properly characterized as ‘aspects’ and not true sub-classes. This removal will allow us to then define a third variant for machine learning comparisons. Some of these ‘aspects’ will be re-introduced into the graph where proper parent-child relationships can be established. The next public release of KBpedia, tentatively identified as version 1.40, will include all of these updates.

Each of these three variants (versions 1.20, 1.30 and 1.40) will enable us to analyze and report on the influence that different versions of the KBpedia knowledge graph can have on different machine learning tasks. The following tasks will be covered:

  1. Creating graph embeddings to disambiguate tagged concepts
  2. Creating domain-specific training corpuses to train word embeddings
  3. Creating domain-specific training sets to classify text, and
  4. Checking relatedness between Knowledge Graph concepts and Wikipedia categories based on their graph embeddings.
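As an illustration of the last task, relatedness between two graph embedding vectors is often measured with cosine similarity. The sketch below assumes that measure (the text above does not say which one is used), and the three-dimensional vectors and the `kbpedia:` / `wikicat:` identifiers are made up for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: a zero vector is related to nothing
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; real ones would come from a trained graph model.
embeddings = {
    "kbpedia:Musician": [0.9, 0.1, 0.0],
    "wikicat:Guitarists": [0.8, 0.2, 0.1],
    "wikicat:Volcanoes": [0.0, 0.1, 0.9],
}

def rank_categories(concept, categories, vectors):
    """Sort candidate categories by decreasing similarity to the concept."""
    scored = [(c, cosine_similarity(vectors[concept], vectors[c])) for c in categories]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank_categories(
    "kbpedia:Musician", ["wikicat:Volcanoes", "wikicat:Guitarists"], embeddings
)
for category, score in ranked:
    print(f"{category}: {score:.3f}")
```

Comparing such rankings across the three KBpedia versions is one way the influence of the graph on this task could be reported.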

Our goal at Cognonto is to make the power of knowledge-based artificial intelligence (KBAI) available to any organization. Whether it is to help populate search or tagging indexes, to perform semantic query expansion, or to help with a broad series of machine learning tasks, knowledge graphs plus KBAI provide a nearly automated way of doing so. Our research and expertise are geared toward creating, linking, extending, and leveraging knowledge graphs and knowledge bases to empower new and existing systems. We will continue to report in specific detail how and with what impact knowledge graphs and knowledge bases lead to better machine learning results.

Extended KBpedia With Wikipedia Categories

A knowledge graph is an ever-evolving structure. It needs to be extended to be able to cope with new kinds of knowledge; it needs to be fixed and improved in all kinds of different ways. It also needs to be linked to other sources of data and to other knowledge representations such as schemas, ontologies and vocabularies. One of the core tasks related to a knowledge graph is to extend its scope. This idea seems simple enough, but how can we extend a general knowledge graph that has nearly 40,000 concepts with potentially multiple thousands more? How can we do this while keeping it consistent, coherent and meaningful? How can we do this without spending undue effort on such a task? These are the questions we will try to answer with the methods we cover in this article.

The methods we present in this article show how we can extend Cognonto’s KBpedia Knowledge Graph using an external source of knowledge, one which has a completely different structure than KBpedia and which has been built in a completely different way, with a different purpose in mind. In this use case, this external resource is the Wikipedia category structure. What we will show in this article is how we may automatically select the right Wikipedia categories that could lead to new KBpedia concepts. These selections are made using an SVM classifier trained over graph embedding vectors generated by a DeepWalk model based on the KBpedia Knowledge Graph structure linked to the Wikipedia categories. Once appropriate candidate categories are selected using this model, the results are then inspected by a human, who makes the final selection decisions. This semi-automated process takes 5% of the time it would normally take to conduct this task by comparable manual means.
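To give a feel for the DeepWalk part of this pipeline, here is a minimal sketch of its first step only: generating truncated random walks over a graph. In DeepWalk these walks are then treated as sentences for a skip-gram model that produces the embedding vectors, which in turn would feed the SVM classifier. The toy graph, its node names and the parameter values are assumptions made for illustration; this is not the code used to build KBpedia.

```python
import random

def random_walk(graph, start, walk_length, rng):
    """One truncated random walk with uniform neighbour selection."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = graph[walk[-1]]
        if not neighbours:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbours))
    return walk

def build_walk_corpus(graph, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus

# Hypothetical undirected graph mixing KBpedia concepts and Wikipedia
# categories, stored as adjacency lists.
toy_graph = {
    "kbpedia:Animal": ["kbpedia:Mammal", "wikicat:Animals"],
    "kbpedia:Mammal": ["kbpedia:Animal", "wikicat:Mammals"],
    "wikicat:Animals": ["kbpedia:Animal", "wikicat:Mammals"],
    "wikicat:Mammals": ["kbpedia:Mammal", "wikicat:Animals"],
}

walks = build_walk_corpus(toy_graph, walks_per_node=2, walk_length=4)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Because KBpedia concepts and Wikipedia categories live in the same linked graph, nodes that share many walks end up with similar vectors, which is what makes the vectors usable as classifier features.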

Continue reading “Extended KBpedia With Wikipedia Categories”

Create a Domain Text Classifier Using Cognonto

A common task required by systems that automatically analyze text is to classify an input text into one or multiple classes. A model needs to be created to scope the class (what belongs to it and what does not) and then a classification algorithm uses this model to classify an input text.

Multiple classification algorithms exist to perform such a task: Support Vector Machine (SVM), K-Nearest Neighbours (KNN), C4.5 and others. What is hard with any such text classification task is not so much how to use these algorithms: they are generally easy to configure and use once implemented in a programming language. The hard, and time-consuming, part is to create a sound training corpus that will properly define the class you want to predict. Further, the steps required to create such a training corpus must be repeated for each class you want to predict.

Since creating the training corpus is the time-consuming part, this is where Cognonto provides its advantages.

In this article, we will show you how Cognonto’s KBpedia Knowledge Graph can be used to automatically generate training corpuses that are used to generate classification models. First, we define (scope) a domain with one or multiple KBpedia reference concepts. Second, we aggregate the training corpus for that domain using the KBpedia Knowledge Graph and its linkages to external public datasets that are then used to populate the training corpus of the domain. Third, we use the Explicit Semantic Analysis (ESA) algorithm to create a vectorial representation of the training corpus. Fourth, we create a model using (in this use case) an SVM classifier. Finally, we predict if an input text belongs to the class (scoped domain) or not.

This use case can be used in any workflow that needs to pre-process any set of input texts where the objective is to classify relevant ones into a defined domain.

Unlike more traditional topic taggers where topics are tagged in an input text with weights provided for each of them, we will see how it is possible to use the semantic interpreter to tag main concepts related to an input text even if the surface form of the topic is not mentioned in the text. We accomplish this by leveraging ESA’s semantic interpreter.
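The idea behind the semantic interpreter can be sketched in a few lines. This is a simplified illustration, assuming TF-IDF weighting and three tiny made-up concept documents; a real ESA interpreter is built from a large corpus, and the names below are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase, whitespace-split, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]

def build_interpreter(concept_docs):
    """Inverted index {word: {concept: tf-idf weight}}."""
    n = len(concept_docs)
    tf = {c: Counter(tokenize(doc)) for c, doc in concept_docs.items()}
    df = Counter(word for counts in tf.values() for word in counts)
    index = defaultdict(dict)
    for concept, counts in tf.items():
        for word, count in counts.items():
            idf = math.log(n / df[word])
            if idf > 0:  # words found in every concept carry no signal
                index[word][concept] = count * idf
    return index

def interpret(text, index):
    """Map a text to a weighted vector of concepts."""
    scores = Counter()
    for word, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            scores[concept] += count * weight
    return scores

# Hypothetical concept descriptions standing in for a real corpus.
concept_docs = {
    "Football": "striker goal referee penalty pitch team match",
    "Cooking": "recipe oven flour simmer pan kitchen team",
    "Music": "guitar chord melody concert band stage",
}

index = build_interpreter(concept_docs)
text = "The striker scored a late goal from the penalty spot"
concept_vector = interpret(text, index)
print(concept_vector.most_common(1))
```

The input sentence never contains the word "football", yet the Football concept gets the highest weight because the words it does contain are strongly associated with that concept. Vectors like `concept_vector` are the kind of representation that a classifier such as an SVM can then be trained on.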

Continue reading “Create a Domain Text Classifier Using Cognonto”