I am proud to announce the immediate release of the KBpedia Knowledge Graph version 1.40. This new version of the knowledge graph includes 53,739 concepts, which is 14,687 more than the previous version. It also includes 251,848 new alternative labels for 20,538 concepts that already existed in version 1.20, and 542 new definitions.
This new version of KBpedia will affect many knowledge graph related tasks, such as concept and entity tagging, as well as most of the existing Cognonto use cases. I will discuss these updates and their effects on the use cases in a forthcoming series of blog posts.
But the key topic of this blog post is this: how have we been able to increase the coverage of the KBpedia Knowledge Graph by 37.6% while keeping it consistent (that is, it contains no contradictory facts) and satisfiable (that is, no candidate addition violates any existing class disjointness assertion), all within roughly a single month of FTE effort?
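As a quick check on the arithmetic, the 37.6% figure follows from the counts quoted above. The size of the previous version is not stated directly in this post; the sketch below derives it by subtraction, which is an assumption about how the percentage was computed.

```python
# Figures quoted in the announcement above.
total_concepts_v140 = 53_739
new_concepts = 14_687

# Size of the previous version, derived by subtraction (not stated in the post).
previous_concepts = total_concepts_v140 - new_concepts

# Relative growth in coverage, as a percentage of the previous size.
growth_pct = 100 * new_concepts / previous_concepts

print(previous_concepts)     # 39052
print(round(growth_pct, 1))  # 37.6
```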
Reciprocal Mapping: Leverage Linkages
At the core of the KBpedia Knowledge Graph are hundreds of thousands of class links between the KBpedia concepts and external core and extended data sources. To generate the initial mappings, we use the Cognonto Mapper to find potential linkage matches between the tens of thousands of KBpedia concepts and the millions of entities that exist in its external sources. We then vet the narrowed pool of candidate assignments by hand.
Then, through reciprocal mapping (see related article), we leverage these initial mappings: 1) to find more alternative labels and definitions for existing KBpedia concepts; and, more importantly, 2) to extend the scope and coverage of the KBpedia Knowledge Graph structure.
Extending KBpedia’s Coverage
Extending the scope and the coverage of a knowledge graph structure that contains tens of thousands of consistent and satisfiable classes is not a simple thing to do, particularly when we want to improve its coverage by more than 37% while trying to keep it consistent and satisfiable.
We have been able to achieve these aims with this new version 1.40 of the KBpedia Knowledge Graph by:
- Leveraging existing linkages between KBpedia concepts and Wikipedia categories
- Leveraging the inner structure of KBpedia using graph embeddings
The Wikipedia Categories structure is a more-or-less consistent taxonomy used to categorize Wikipedia pages. Its structure is quite different from that of the KBpedia Knowledge Graph. Even so, the categories themselves can be used to extend KBpedia into knowledge areas it does not currently cover.
As explained in the Cognonto use case on Extending KBpedia With Wikipedia Categories, we first created graph embeddings for each of the Wikipedia categories. Then we created a classifier where the positive training examples are the graph embeddings of Wikipedia categories already linked to KBpedia concepts, and where the negative training examples are the graph embeddings of other Wikipedia categories that are known to be bad candidates for new KBpedia concepts.
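To make the training setup concrete, here is a minimal sketch of a binary classifier over category embeddings. This post does not name the embedding method or the classifier actually used, so the logistic regression below is a stand-in, and the two-dimensional vectors are invented toy data rather than real graph embeddings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(examples, labels, epochs=500, lr=0.5):
    """Fit weights and bias with plain batch gradient descent."""
    dim = len(examples[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(examples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """True when the embedding looks like a good candidate concept."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score) >= 0.5

# Toy embeddings (hypothetical). Positives stand for categories already
# linked to KBpedia concepts, negatives for known bad candidates.
positives = [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]
negatives = [[-0.9, -0.8], [-0.7, -1.0], [-0.8, -0.6]]

examples = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)
weights, bias = train_logistic(examples, labels)

print(predict(weights, bias, [0.85, 0.9]))   # True
print(predict(weights, bias, [-0.9, -0.7]))  # False
```

In practice the embeddings would have many more dimensions and any standard classifier could take the place of this one; the point is only the split between linked categories as positives and known bad candidates as negatives.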
Once the classifier is trained, we classify every sub-category of Wikipedia categories linked to KBpedia using that model. When the classification is done, a person reviews all of the positive classifications to determine the final candidates that will become new KBpedia concepts.
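The classification pass can be pictured as a walk over the sub-categories of the linked categories, keeping only the positives for human review. The sketch below assumes that only direct sub-categories are visited, which this post does not specify. The category names and the `is_positive` stand-in for the trained model are hypothetical.

```python
def collect_review_candidates(linked_categories, subcategories, is_positive):
    """Return (parent, child) pairs the classifier accepts, for human review."""
    candidates = []
    seen = set(linked_categories)  # already linked, so not new concepts
    for parent in linked_categories:
        for child in subcategories.get(parent, []):
            if child in seen:
                continue
            seen.add(child)
            if is_positive(child):
                candidates.append((parent, child))
    return candidates

# Hypothetical data: two linked categories and their direct sub-categories.
linked = ["Category:Musical instruments", "Category:Birds"]
subcats = {
    "Category:Musical instruments": [
        "Category:String instruments",
        "Category:Musical instrument stubs",
    ],
    "Category:Birds": ["Category:Birds of prey"],
}

# Stand-in for the trained model: reject maintenance categories.
def looks_like_concept(name):
    return not name.endswith("stubs")

review_queue = collect_review_candidates(linked, subcats, looks_like_concept)
for parent, child in review_queue:
    print(parent, "->", child)
```

Everything in `review_queue` still goes to a person; the classifier only narrows the pool.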
Once we have the list of vetted KBpedia concepts that we want to add to the core structure, we use the KBpedia Generator to create a new KBpedia structure and to make sure that all the facts we have added to the Knowledge Graph are consistent and satisfiable. Inconsistencies and unsatisfiable classes are fixed in an iterative process until the entire KBpedia Knowledge Graph structure is fully consistent and satisfiable with prior knowledge. Only at that point can we release the new version of KBpedia. Without the KBpedia Generator and its reasoning capabilities to check the consistency and satisfiability of the knowledge structure, we simply could not extend the structure without adding hundreds of inconsistency and unsatisfiability issues. The number of relationships between all the concepts is too large for anyone to understand all of its ramifications simply by looking at it.
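To illustrate what an unsatisfiable class looks like, the sketch below flags any class that inherits from both sides of a disjointness assertion. It is a deliberately simplified illustration, not the KBpedia Generator or its reasoner; the class names and data structures are hypothetical.

```python
def ancestors(cls, parents):
    """All superclasses of cls, including cls itself."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(parents.get(current, []))
    return found

def unsatisfiable_classes(parents, disjoint_pairs):
    """Classes that inherit from both members of a disjoint pair."""
    all_classes = set(parents)
    for supers in parents.values():
        all_classes.update(supers)
    problems = []
    for cls in sorted(all_classes):
        supers = ancestors(cls, parents)
        for a, b in disjoint_pairs:
            if a in supers and b in supers:
                problems.append((cls, a, b))
    return problems

# Hypothetical structure: child class -> list of direct superclasses.
parents = {
    "Animal": ["Thing"],
    "Artifact": ["Thing"],
    "Dog": ["Animal"],
    "RobotDog": ["Artifact", "Dog"],  # candidate addition
}
disjoint_pairs = [("Animal", "Artifact")]

problems = unsatisfiable_classes(parents, disjoint_pairs)
print(problems)  # [('RobotDog', 'Animal', 'Artifact')]
```

A real reasoner handles far more than subclass and disjointness axioms, but the iterative loop is the same in spirit: add candidates, run the check, fix what it reports, and repeat until nothing is flagged.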
We estimate that this semi-automated process takes about 5% of the time it would normally take to conduct this entire process by comparable manual means. We know, since we have been doing the manual approach for nearly a decade.
Adding Alternative Labels and Definitions
Leveraging the external linkages of KBpedia brings another gain: we can use them to extend the descriptions of existing KBpedia concepts. This works because all the linkages are of high quality, both because they have been reviewed by a human and because they have been made consistent and satisfiable by the generation framework.
Reciprocal mapping shows how we can leverage the Wikipedia page linkages. Via this rather efficient method, we also added 251,848 new alternative labels and 542 new definitions for 20,538 concepts that already existed in version 1.20. Adding that many alternative labels to the knowledge graph greatly improves the coverage of the Cognonto conceptual tagger by adding 251,848 new surface forms that were previously unknown.
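A minimal sketch of how alternative labels might flow back from linked pages to concepts is shown below. This post does not say which parts of a Wikipedia page supply the labels, so `page_labels` is a hypothetical input, as are the identifiers and the case-insensitive deduplication rule.

```python
def add_alternative_labels(concept_labels, concept_to_page, page_labels):
    """Merge labels from linked pages into concepts; return how many were added."""
    added = 0
    for concept, page in concept_to_page.items():
        known = concept_labels.setdefault(concept, set())
        known_lower = {label.lower() for label in known}
        for label in page_labels.get(page, []):
            # Skip labels the concept already has, ignoring case.
            if label.lower() not in known_lower:
                known.add(label)
                known_lower.add(label.lower())
                added += 1
    return added

# Hypothetical data: one concept, its vetted page link, and the page's labels.
concept_labels = {"kbpedia:Automobile": {"automobile", "car"}}
concept_to_page = {"kbpedia:Automobile": "wikipedia:Car"}
page_labels = {"wikipedia:Car": ["Car", "Motor car", "Motorcar", "motor car"]}

added = add_alternative_labels(concept_labels, concept_to_page, page_labels)
print(added)  # 2
print(sorted(concept_labels["kbpedia:Automobile"]))
```

Each label added this way becomes a new surface form that a tagger can match against text.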