Extended KBpedia With Wikipedia Categories

A knowledge graph is an ever evolving structure. It needs to be extended to be able to cope with new kinds of knowledge; it needs to be fixed and improved in all kinds of different ways. It also needs to be linked to other sources of data and to other knowledge representations such as schemas, ontologies and vocabularies. One of the core tasks related to knowledge graphs is to extend its scope. This idea seems simple enough, but how can we extend a general knowledge graph that has nearly 40,000 concepts with potentially multiple thousands more? How can we do this while keeping it consistent, coherent and meaningful? How can we do this without spending undue effort on such a task? These are the questions we will try to answer with the methods we cover in this article.

The methods we are presenting in this article are how we can extend Cognonto‘s KBpedia Knowledge Graph using an external source of knowledge, one which has a completely different structure than KBpedia and one which has been built completely differently with a different purpose in mind than KBpedia. In this use case, this external resource is the Wikipedia categories structure. What we will show in this article is how we may automatically select the right Wikipedia categories that could lead to new KBpedia concepts. These selections are made using a SVM classifier trained over graph embedding vectors generated by a DeepWalk model based on the KBpedia Knowledge Graph structure linked to the Wikipedia categories. Once appropriate candidate categories are selected using this model, the results are then inspected by a human to take the final selection decisions. This semi-automated process takes 5% of the time it would normally take to conduct this task by comparable manual means.

Continue reading “Extended KBpedia With Wikipedia Categories”

Leveraging KBpedia Aspects To Generate Training Sets Automatically

In previous articles I have covered multiple ways to create training corpuses for unsupervised learning and positive and negative training sets for supervised learning 1 , 2 , 3 using Cognonto and KBpedia. Different structures inherent to a knowledge graph like KBpedia can lead to quite different corpuses and sets. Each of these corpuses or sets may yield different predictive powers depending on the task at hand.

So far we have covered two ways to leverage the KBpedia Knowledge Graph to automatically create positive and negative training corpuses:

  1. Using the links that exist between each KBpedia reference concept and their related Wikipedia pages
  2. Using the linkages between KBpedia reference concepts and external vocabularies to create training corpuses out of
    named entities.

Now we will introduce a third way to create a different kind of training corpus:

  1. Using the KBpedia aspects linkages.

Aspects are aggregations of entities that are grouped according to their characteristics different from their direct types. Aspects help to group related entities by situation, and not by identity nor definition. It is another way to organize the knowledge graph and to leverage it. KBpedia has about 80 aspects that provide this secondary means for placing entities into related real-world contexts. Not all aspects relate to a given entity.

[extoc]

Continue reading “Leveraging KBpedia Aspects To Generate Training Sets Automatically”

Dynamic Machine Learning Using the KBpedia Knowledge Graph – Part 2

In the first part of this series we found the good hyperparameters for a single linear SVM classifier. In part 2, we will try another technique to improve the performance of the system: ensemble learning.

So far, we already reached 95% of accuracy with some tweaking the hyperparameters and the training corpuses but the F1 score is still around ~70% with the full gold standard which can be improved. There are also situations when precision should be nearly perfect (because false positives are really not acceptable) or when the recall should be optimized.

Here we will try to improve this situation by using ensemble learning. It uses multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. In our examples, each model will have a vote and the weight of the vote will be equal for each mode. We will use five different strategies to create the models that will belong to the ensemble:

  1. Bootstrap aggregating (bagging)
  2. Asymmetric bagging 1
  3. Random subspace method (feature bagging)
  4. Asymmetric bagging + random subspace method (ABRS) 1
  5. Bootstrap aggregating + random subspace method (BRS)

Different strategies will be used depending on different things like: are the positive and negative training documents unbalanced? How many features does the model have? etc. Let’s introduce each of these different strategies.

Note that in this article I am only creating ensembles with linear SVM learners. However an ensemble can be composed of multiple different kind of learners, like SVM with non-linear kernels, decisions trees, etc. However, to simplify this article, we will stick to a single linear SVM with multiple different training corpuses and features.

[extoc]

Continue reading “Dynamic Machine Learning Using the KBpedia Knowledge Graph – Part 2”

Dynamic Machine Learning Using the KBpedia Knowledge Graph – Part 1

In my previous blog post, Create a Domain Text Classifier Using Cognonto, I explained how one can use the KBpedia Knowledge Graph to automatically create positive and negative training corpuses for different machine learning tasks. I explained how SVM classifiers could be trained and used to check if an input text belongs to the defined domain or not.

This article is the first of two articles.In first part I will extend on this idea to explain how the KBpedia Knowledge Graph can be used, along with other machine learning techniques, to cope with different situations and use cases. I will cover the concepts of feature selection, hyperparameter optimization, and ensemble learning (in part 2 of this series). The emphasis here is on the testing and refining of machine learners, versus the set up and configuration times that dominate other approaches.

Depending on the domain of interest, and depending on the required precision or recall, different strategies and techniques can lead to better predictions. More often than not, multiple different training corpuses, learners and hyperparameters need to be tested before ending up with the initial best possible prediction model. This is why I will strongly emphasize the fact that the KBpedia Knowledge Graph and Cognonto can be used to automate fully the creation of a wide range of different training corpuses, to create models, to optimize their hyperparameters, and to evaluate those models.

[extoc]

Continue reading “Dynamic Machine Learning Using the KBpedia Knowledge Graph – Part 1”

Building and Maintaining the KBpedia Knowledge Graph

The Cognonto demo is powered by an extensive knowledge graph called the KBpedia Knowledge Graph, as organized according to the KBpedia Knowledge Ontology (KKO). KBpedia is used for all kinds of tasks, some of which are demonstrated by the Cognonto use cases. KBpedia powers dataset linkage and mapping tools, machine learning training workflows, entity and concept extractions, category and topic tagging, etc.

The KBpedia Knowledge Graph is a structure of more than 39,000 reference concepts linked to 6 major knowledge bases and 20 popular ontologies in use across the Web. Unlike other knowledge graphs that analyze big corpuses of text to extract “concepts” (n-grams) and their co-occurrences, KBpedia has been created, is curated, is linked, and evolves using humans for the final vetting steps. KBpedia and its build process is thus a semi-automatic system.

The challenge with such a project is to be able to grow and refine (add or remove relations) within the structure without creating unknown conceptual issues. The sheer combinatorial scope of KBpedia means it is not possible for a human to fully understand the impact of adding or removing a relation on its entire structure. There is simply too much complexity in the interaction amongst the reference concepts (and their different kinds of relations) within the KBpedia Knowledge Graph.

What I discuss in this article is how Cognonto creates and then constantly evolves the KBpedia Knowledge Graph. In parallel with our creating KBpedia over the years, we also have needed to develop our own build processes and tools to make sure that every time something changes in KBpedia’s structure that it remains satisfiable and coherent.

Continue reading “Building and Maintaining the KBpedia Knowledge Graph”