{"id":3512,"date":"2016-11-21T06:14:06","date_gmt":"2016-11-21T11:14:06","guid":{"rendered":"http:\/\/fgiasson.com\/blog\/?p=3512"},"modified":"2016-11-18T08:15:32","modified_gmt":"2016-11-18T13:15:32","slug":"leveraging-kbpedia-aspects-to-generate-training-sets-automatically","status":"publish","type":"post","link":"https:\/\/fgiasson.com\/blog\/index.php\/2016\/11\/21\/leveraging-kbpedia-aspects-to-generate-training-sets-automatically\/","title":{"rendered":"Leveraging KBpedia Aspects To Generate Training Sets Automatically"},"content":{"rendered":"<p>In previous articles I have covered multiple ways to create training corpuses for unsupervised learning and positive and negative training sets for supervised learning <sup><a id=\"fnr.1\" class=\"footref\" href=\"#fn.1\">1<\/a><\/sup> <sup>, <\/sup><sup><a id=\"fnr.2\" class=\"footref\" href=\"#fn.2\">2<\/a><\/sup> <sup>, <\/sup><sup><a id=\"fnr.3\" class=\"footref\" href=\"#fn.3\">3<\/a><\/sup> using <a href=\"http:\/\/cognonto.com\">Cognonto<\/a> and KBpedia. Different structures inherent to a knowledge graph like KBpedia can lead to quite different corpuses and sets. Each of these corpuses or sets may yield different predictive power depending on the task at hand.<\/p>\n<p>So far we have covered two ways to leverage the KBpedia Knowledge Graph to automatically create positive and negative training corpuses:<\/p>\n<ol class=\"org-ol\">\n<li>Using the links that exist between each KBpedia reference concept and its related Wikipedia pages<\/li>\n<li>Using the linkages between KBpedia reference concepts and external vocabularies to create training corpuses out of named entities<\/li>\n<\/ol>\n<p>Now we will introduce a third way to create a different kind of training corpus:<\/p>\n<ol class=\"org-ol\">\n<li>Using the KBpedia aspects linkages<\/li>\n<\/ol>\n<p><code>Aspects<\/code> are aggregations of entities that are grouped according to characteristics other than their direct types. 
Aspects help to group related entities by situation rather than by identity or definition. They are another way to organize the knowledge graph and to leverage it. KBpedia has about <a href=\"http:\/\/cognonto.com\/docs\/80-aspects\">80 aspects<\/a> that provide this secondary means for placing entities into related real-world contexts. Not all aspects relate to a given entity.<\/p>\n<p>[extoc]<\/p>\n<p><!--more--><\/p>\n<h3>Creating New Domain Using KBpedia Aspects<\/h3>\n<p>To continue with the musical domain, there are two aspects of interest:<\/p>\n<ol class=\"org-ol\">\n<li>Music<\/li>\n<li>Genres<\/li>\n<\/ol>\n<p>What we will do first is to query the KBpedia Knowledge Graph using the <a href=\"https:\/\/www.w3.org\/TR\/sparql11-query\/\">SPARQL query language<\/a> to get the list of all of the KBpedia reference concepts that are related to the <code>Music<\/code> or the <code>Genre<\/code> aspects. Then, for each of these reference concepts, we will count the number of named entities that can be reached in the complete KBpedia structure.<\/p>\n<pre class=\"example\">prefix kko: &lt;http:\/\/kbpedia.org\/ontologies\/kko#&gt;\nprefix rdfs: &lt;http:\/\/www.w3.org\/2000\/01\/rdf-schema#&gt;\nprefix rdf: &lt;http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#&gt;\nprefix dcterms: &lt;http:\/\/purl.org\/dc\/terms\/&gt; \nprefix schema: &lt;http:\/\/schema.org\/&gt;\n\nselect distinct ?class count(distinct ?entity) as ?nb\nfrom &lt;http:\/\/dbpedia.org&gt;\nfrom &lt;http:\/\/www.uspto.gov&gt;\nfrom &lt;http:\/\/wikidata.org&gt;\nfrom &lt;http:\/\/kbpedia.org\/1.10\/&gt;\nwhere\n{\n  ?entity dcterms:subject ?category .\n\n  graph &lt;http:\/\/kbpedia.org\/1.10\/&gt;\n  {\n    {?category &lt;http:\/\/kbpedia.org\/ontologies\/kko#hasMusicAspect&gt; ?class .}\n    union\n    {?category &lt;http:\/\/kbpedia.org\/ontologies\/kko#hasGenre&gt; ?class .}\n  }\n}\norder by desc(?nb)\n<\/pre>\n<table border=\"2\" frame=\"hsides\" rules=\"groups\" cellspacing=\"0\" 
cellpadding=\"6\">\n<colgroup>\n<col class=\"org-left\" \/>\n<col class=\"org-right\" \/> <\/colgroup>\n<thead>\n<tr>\n<th class=\"org-left\" scope=\"col\">reference concept<\/th>\n<th class=\"org-right\" scope=\"col\">nb<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Album-CW<\/td>\n<td class=\"org-right\">128772<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Song-CW<\/td>\n<td class=\"org-right\">74886<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Music<\/td>\n<td class=\"org-right\">51006<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Single<\/td>\n<td class=\"org-right\">50661<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/RecordCompany<\/td>\n<td class=\"org-right\">5695<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition<\/td>\n<td class=\"org-right\">5272<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/MovieSoundtrack<\/td>\n<td class=\"org-right\">2919<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Lyric-WordsToSong<\/td>\n<td class=\"org-right\">2374<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Band-MusicGroup<\/td>\n<td class=\"org-right\">2185<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Quartet-MusicalPerformanceGroup<\/td>\n<td class=\"org-right\">2078<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Ensemble<\/td>\n<td class=\"org-right\">1438<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Orchestra<\/td>\n<td class=\"org-right\">1380<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Quintet-MusicalPerformanceGroup<\/td>\n<td class=\"org-right\">1335<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Choir<\/td>\n<td 
class=\"org-right\">754<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Concerto<\/td>\n<td class=\"org-right\">424<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Symphony<\/td>\n<td class=\"org-right\">299<\/td>\n<\/tr>\n<tr>\n<td class=\"org-left\">http:\/\/kbpedia.org\/kko\/rc\/Singing<\/td>\n<td class=\"org-right\">154<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Seventeen KBpedia reference concepts are related to the two aspects we want to focus on. The next step is to take these 17 reference concepts and create a new domain corpus with them. We will use the new version of KBpedia to create the full set of reference concepts that will scope our domain by inference.<\/p>\n<p>Next we will try to use this information to create two totally different kinds of training corpuses:<\/p>\n<ol class=\"org-ol\">\n<li>One that will rely on the links between the reference concepts and Wikipedia pages<\/li>\n<li>One that will rely on the linkages to external vocabularies to create a list of named entities that will be used as the training corpus<\/li>\n<\/ol>\n<div id=\"outline-container-org0580aad\" class=\"outline-3\">\n<h3 id=\"org0580aad\">Creating Model With Reference Concepts<\/h3>\n<div id=\"text-org0580aad\" class=\"outline-text-3\">\n<p>The first training corpus we want to test is one that uses the linkage between KBpedia reference concepts and Wikipedia pages. 
The first thing is to generate the domain training corpus with the <code>17<\/code> seed reference concepts and then to infer other related reference concepts.<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>use '<span style=\"color: #66d9ef;\">cognonto-esa.core<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>require '<span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #66d9ef;\">cognonto-owl.core<\/span> <span style=\"color: #ae81ff;\">:as<\/span> owl<span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>require '<span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #66d9ef;\">cognonto-owl.reasoner<\/span> <span style=\"color: #ae81ff;\">:as<\/span> reasoner<span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n\n<span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #f92672;\">def<\/span> <span style=\"color: #fd971f;\">kbpedia-manager<\/span> <span style=\"color: #66d9ef;\">(<\/span><span style=\"color: #66d9ef;\">owl<\/span><span style=\"color: #66d9ef;\">\/<\/span>make-ontology-manager<span style=\"color: #66d9ef;\">)<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #f92672;\">def<\/span> <span style=\"color: #fd971f;\">kbpedia<\/span> <span style=\"color: #66d9ef;\">(<\/span><span style=\"color: #66d9ef;\">owl<\/span><span style=\"color: #66d9ef;\">\/<\/span>load-ontology <span style=\"color: #e6db74;\">\"resources\/kbpedia_reference_concepts_linkage.n3\"<\/span>\n                                <span style=\"color: #ae81ff;\">:manager<\/span> kbpedia-manager<span style=\"color: #66d9ef;\">)<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #f92672;\">def<\/span> <span style=\"color: 
#fd971f;\">kbpedia-reasoner<\/span> <span style=\"color: #66d9ef;\">(<\/span><span style=\"color: #66d9ef;\">reasoner<\/span><span style=\"color: #66d9ef;\">\/<\/span>make-reasoner kbpedia<span style=\"color: #66d9ef;\">)<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #f92672;\">define-domain-corpus<\/span> <span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Album-CW\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Song-CW\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Music\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Single\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/RecordCompany\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MovieSoundtrack\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Lyric-WordsToSong\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Band-MusicGroup\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Quartet-MusicalPerformanceGroup\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Ensemble\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Orchestra\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Quintet-MusicalPerformanceGroup\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Choir\"<\/span>\n                  
     <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Symphony\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Singing\"<\/span>\n                       <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Concerto\"<\/span><span style=\"color: #66d9ef;\">]<\/span>\n  kbpedia\n  <span style=\"color: #e6db74;\">\"resources\/aspects-concept-corpus-dictionary.csv\"<\/span>\n  <span style=\"color: #ae81ff;\">:reasoner<\/span> kbpedia-reasoner<span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #ae81ff;\">(<\/span>create-pruned-pages-dictionary-csv <span style=\"color: #e6db74;\">\"resources\/aspects-concept-corpus-dictionary.csv\"<\/span>\n                                    <span style=\"color: #e6db74;\">\"resources\/aspects-concept-corpus-dictionary.pruned.csv\"<\/span> \n                                    <span style=\"color: #e6db74;\">\"resources\/aspects-corpus-normalized\/\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<p>Once pruned, we end up with a domain that has <code>108<\/code> reference concepts, which lets us create models with 108 features. 
The next step is to create the actual semantic interpreter and the SVM models:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #75715e; font-style: italic;\">;; <\/span><span style=\"color: #75715e; font-style: italic;\">Load dictionaries<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>load-dictionaries <span style=\"color: #e6db74;\">\"resources\/general-corpus-dictionary.pruned.csv\"<\/span> <span style=\"color: #e6db74;\">\"resources\/aspects-concept-corpus-dictionary.pruned.csv\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #75715e; font-style: italic;\">;; <\/span><span style=\"color: #75715e; font-style: italic;\">Create the semantic interpreter<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>build-semantic-interpreter <span style=\"color: #e6db74;\">\"aspects-concept-pruned\"<\/span> <span style=\"color: #e6db74;\">\"resources\/semantic-interpreters\/aspects-concept-pruned\/\"<\/span> <span style=\"color: #66d9ef;\">(<\/span>distinct <span style=\"color: #a6e22e;\">(<\/span>concat <span style=\"color: #e6db74;\">(<\/span>get-domain-pages<span style=\"color: #e6db74;\">)<\/span> <span style=\"color: #e6db74;\">(<\/span>get-general-pages<span style=\"color: #e6db74;\">)<\/span><span style=\"color: #a6e22e;\">)<\/span><span style=\"color: #66d9ef;\">)<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #75715e; font-style: italic;\">;; <\/span><span style=\"color: #75715e; font-style: italic;\">Build the SVM model vectors<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>build-svm-model-vectors <span style=\"color: #e6db74;\">\"resources\/svm\/aspects-concept-pruned\/\"<\/span> <span style=\"color: #ae81ff;\">:corpus-folder-normalized<\/span> <span style=\"color: #e6db74;\">\"resources\/aspects-corpus-normalized\/\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #75715e; font-style: italic;\">;; <\/span><span style=\"color: #75715e; 
font-style: italic;\">Train the linear SVM classifier<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>train-svm-model <span style=\"color: #e6db74;\">\"svm.aspects.concept.pruned\"<\/span> <span style=\"color: #e6db74;\">\"resources\/svm\/aspects-concept-pruned\/\"<\/span>\n                 <span style=\"color: #ae81ff;\">:weights<\/span> <span style=\"color: #ae81ff;\">nil<\/span>\n                 <span style=\"color: #ae81ff;\">:v<\/span> <span style=\"color: #ae81ff;\">nil<\/span>\n                 <span style=\"color: #ae81ff;\">:c<\/span> 1\n                 <span style=\"color: #ae81ff;\">:algorithm<\/span> <span style=\"color: #ae81ff;\">:l2l2<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<p>Then we have to evaluate this new model using the <a href=\"gold-standard-full.csv\">gold standard<\/a>:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>evaluate-model <span style=\"color: #e6db74;\">\"svm.aspects.concept.pruned\"<\/span> <span style=\"color: #e6db74;\">\"resources\/gold-standard-full.csv\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<pre class=\"example\">True positive:  28\nFalse positive:  0\nTrue negative:  923\nFalse negative:  66\n\nPrecision:  1.0\nRecall:  0.29787233\nAccuracy:  0.93510324\nF1:  0.45901638\n<\/pre>\n<p>Now let&#8217;s try to find better hyperparameters using grid search:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>svm-grid-search <span style=\"color: #e6db74;\">\"grid-search-aspects-concept-pruned-tests\"<\/span> \n                       <span style=\"color: #e6db74;\">\"resources\/svm\/aspects-concept-pruned\/\"<\/span> \n                       <span style=\"color: #e6db74;\">\"resources\/gold-standard-full.csv\"<\/span>\n                       <span style=\"color: #ae81ff;\">:selection-metric<\/span> <span style=\"color: #ae81ff;\">:f1<\/span>\n    
                   <span style=\"color: #ae81ff;\">:grid-parameters<\/span> <span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #a6e22e;\">{<\/span><span style=\"color: #ae81ff;\">:c<\/span> <span style=\"color: #e6db74;\">[<\/span>1 2 4 16 256<span style=\"color: #e6db74;\">]<\/span>\n                                          <span style=\"color: #ae81ff;\">:e<\/span> <span style=\"color: #e6db74;\">[<\/span>0.001 0.01 0.1<span style=\"color: #e6db74;\">]<\/span>\n                                          <span style=\"color: #ae81ff;\">:algorithm<\/span> <span style=\"color: #e6db74;\">[<\/span><span style=\"color: #ae81ff;\">:l2l2<\/span><span style=\"color: #e6db74;\">]<\/span>\n                                          <span style=\"color: #ae81ff;\">:weight<\/span> <span style=\"color: #e6db74;\">[<\/span>1 15 30<span style=\"color: #e6db74;\">]<\/span><span style=\"color: #a6e22e;\">}<\/span><span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<pre class=\"example\">{:gold-standard \"resources\/gold-standard-full.csv\"\n :selection-metric :f1\n :score 0.84444445 \n :c 1\n :e 0.001 \n :algorithm :l2l2\n :weight 30}\n<\/pre>\n<p>After running the grid search over this initial broad range of values, we found a configuration that gives an <code>F1<\/code> score of <code>0.8444<\/code>. This is the best score we have obtained so far on the full gold standard<sup><a id=\"fnr.2.100\" class=\"footref\" href=\"#fn.2\">2<\/a><\/sup><sup>, <\/sup><sup><a id=\"fnr.3.100\" class=\"footref\" href=\"#fn.3\">3<\/a><\/sup>. 
Let&#8217;s see all of the metrics for this configuration:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>train-svm-model <span style=\"color: #e6db74;\">\"svm.aspects.concept.pruned\"<\/span> <span style=\"color: #e6db74;\">\"resources\/svm\/aspects-concept-pruned\/\"<\/span>\n                 <span style=\"color: #ae81ff;\">:weights<\/span> <span style=\"color: #66d9ef;\">{<\/span>1 30.0<span style=\"color: #66d9ef;\">}<\/span>\n                 <span style=\"color: #ae81ff;\">:v<\/span> <span style=\"color: #ae81ff;\">nil<\/span>\n                 <span style=\"color: #ae81ff;\">:c<\/span> 1 \n                 <span style=\"color: #ae81ff;\">:e<\/span> 0.001\n                 <span style=\"color: #ae81ff;\">:algorithm<\/span> <span style=\"color: #ae81ff;\">:l2l2<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #ae81ff;\">(<\/span>evaluate-model <span style=\"color: #e6db74;\">\"svm.aspects.concept.pruned\"<\/span> <span style=\"color: #e6db74;\">\"resources\/gold-standard-full.csv\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<pre class=\"example\">True positive:  76\nFalse positive:  10\nTrue negative:  913\nFalse negative:  18\n\nPrecision:  0.88372093\nRecall:  0.80851066\nAccuracy:  0.972468\nF1:  0.84444445\n<\/pre>\n<p>These results are also the best balance between <code>precision<\/code> and <code>recall<\/code> that we have gotten so far<sup><a id=\"fnr.2.100\" class=\"footref\" href=\"#fn.2\">2<\/a><\/sup><sup>, <\/sup><sup><a id=\"fnr.3.100\" class=\"footref\" href=\"#fn.3\">3<\/a><\/sup>. 
Better <code>precision<\/code> can be obtained if necessary, but only at the expense of lower <code>recall<\/code>.<\/p>\n<p>Let&#8217;s take a look at the improvements we got compared to the previous training corpuses we had:<\/p>\n<ul class=\"org-ul\">\n<li>Precision: <code>+4.16%<\/code><\/li>\n<li>Recall: <code>+35.72%<\/code><\/li>\n<li>Accuracy: <code>+2.06%<\/code><\/li>\n<li>F1: <code>+20.63%<\/code><\/li>\n<\/ul>\n<p>This new training corpus based on the KBpedia aspects, after hyperparameter optimization, increased every metric we calculate. The most striking improvement is in <code>recall<\/code>, which improved by more than <code>35%<\/code>.<\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-org9fc78f9\" class=\"outline-3\">\n<h3 id=\"org9fc78f9\">Creating Model With Entities<\/h3>\n<div id=\"text-org9fc78f9\" class=\"outline-text-3\">\n<p>The next training corpus we want to test is one that uses the linkage between KBpedia reference concepts and linked external vocabularies to get a series of linked named entities as the positive training set for each of the features of the model.<\/p>\n<p>The first thing to do is to create the positive training set populated with named entities related to the reference concepts. 
We will get a random sample of ~50 named entities per reference concept:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>require '<span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #66d9ef;\">cognonto-rdf.query<\/span> <span style=\"color: #ae81ff;\">:as<\/span> query<span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>require '<span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #66d9ef;\">clojure.java.io<\/span> <span style=\"color: #ae81ff;\">:as<\/span> io<span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>require '<span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #66d9ef;\">clojure.data.csv<\/span> <span style=\"color: #ae81ff;\">:as<\/span> csv<span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>require '<span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #66d9ef;\">clojure.string<\/span> <span style=\"color: #ae81ff;\">:as<\/span> string<span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #f92672;\">defn<\/span> <span style=\"color: #a6e22e;\">generate-domain-by-rc<\/span>\n  <span style=\"color: #66d9ef;\">[<\/span>rc domain-file nb<span style=\"color: #66d9ef;\">]<\/span>\n  <span style=\"color: #66d9ef;\">(<\/span><span style=\"color: #f92672;\">with-open<\/span> <span style=\"color: #a6e22e;\">[<\/span>out-file <span style=\"color: #e6db74;\">(<\/span><span style=\"color: #66d9ef;\">io<\/span><span style=\"color: #66d9ef;\">\/<\/span>writer domain-file <span style=\"color: #ae81ff;\">:append<\/span> <span style=\"color: #ae81ff;\">true<\/span><span style=\"color: #e6db74;\">)<\/span><span style=\"color: #a6e22e;\">]<\/span>\n    <span 
style=\"color: #a6e22e;\">(<\/span><span style=\"color: #f92672;\">doall<\/span>\n     <span style=\"color: #e6db74;\">(<\/span><span style=\"color: #f92672;\">-&gt;&gt;<\/span> <span style=\"color: #fd971f;\">(<\/span><span style=\"color: #66d9ef;\">query<\/span><span style=\"color: #66d9ef;\">\/<\/span>select\n           <span style=\"color: #f92672;\">(<\/span>str <span style=\"color: #e6db74;\">\"prefix kko: &lt;<a href=\"http:\/\/kbpedia.org\/ontologies\/kko#\">http:\/\/kbpedia.org\/ontologies\/kko#<\/a>&gt;<\/span>\n<span style=\"color: #e6db74;\">                 prefix rdfs: &lt;<a href=\"http:\/\/www.w3.org\/2000\/01\/rdf-schema\">http:\/\/www.w3.org\/2000\/01\/rdf-schema<\/a>&gt;<\/span>\n<span style=\"color: #e6db74;\">                 prefix rdf: &lt;<a href=\"http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#\">http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#<\/a>&gt;<\/span>\n\n<span style=\"color: #e6db74;\">                 select distinct ?entity<\/span>\n<span style=\"color: #e6db74;\">                 from &lt;<a href=\"http:\/\/dbpedia.org\">http:\/\/dbpedia.org<\/a>&gt;<\/span>\n<span style=\"color: #e6db74;\">                 from &lt;<a href=\"http:\/\/www.uspto.gov\">http:\/\/www.uspto.gov<\/a>&gt;<\/span>\n<span style=\"color: #e6db74;\">                 from &lt;<a href=\"http:\/\/wikidata.org\">http:\/\/wikidata.org<\/a>&gt;<\/span>\n<span style=\"color: #e6db74;\">                 from &lt;<a href=\"http:\/\/kbpedia.org\/1.10\/\">http:\/\/kbpedia.org\/1.10\/<\/a>&gt;<\/span>\n<span style=\"color: #e6db74;\">                 where<\/span>\n<span style=\"color: #e6db74;\">                 {<\/span>\n<span style=\"color: #e6db74;\">                   ?entity dcterms:subject ?category .<\/span>\n<span style=\"color: #e6db74;\">                   graph &lt;<a href=\"http:\/\/kbpedia.org\/1.10\/\">http:\/\/kbpedia.org\/1.10\/<\/a>&gt;<\/span>\n<span style=\"color: #e6db74;\">                   {<\/span>\n<span style=\"color: #e6db74;\">          
           ?category ?aspectProperty &lt;\"<\/span> rc <span style=\"color: #e6db74;\">\"&gt; .<\/span>\n<span style=\"color: #e6db74;\">                   }<\/span>\n<span style=\"color: #e6db74;\">                 }<\/span>\n<span style=\"color: #e6db74;\">                 ORDER BY RAND() LIMIT \"<\/span> nb<span style=\"color: #f92672;\">)<\/span> kb-connection<span style=\"color: #fd971f;\">)<\/span>\n          <span style=\"color: #fd971f;\">(<\/span>map <span style=\"color: #f92672;\">(<\/span><span style=\"color: #f92672;\">fn<\/span> <span style=\"color: #ae81ff;\">[<\/span>entity<span style=\"color: #ae81ff;\">]<\/span>\n                 <span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #66d9ef;\">csv<\/span><span style=\"color: #66d9ef;\">\/<\/span>write-csv out-file <span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #a6e22e;\">[<\/span><span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #66d9ef;\">string<\/span><span style=\"color: #66d9ef;\">\/<\/span>replace <span style=\"color: #66d9ef;\">(<\/span><span style=\"color: #ae81ff;\">:value<\/span> <span style=\"color: #a6e22e;\">(<\/span><span style=\"color: #ae81ff;\">:entity<\/span> entity<span style=\"color: #a6e22e;\">)<\/span><span style=\"color: #66d9ef;\">)<\/span> <span style=\"color: #e6db74;\">\"http:\/\/dbpedia.org\/resource\/\"<\/span> <span style=\"color: #e6db74;\">\"\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n                                           <span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #66d9ef;\">string<\/span><span style=\"color: #66d9ef;\">\/<\/span>replace rc <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/\"<\/span> <span style=\"color: #e6db74;\">\"\"<\/span><span style=\"color: #ae81ff;\">)<\/span><span style=\"color: #a6e22e;\">]<\/span><span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span><span style=\"color: #f92672;\">)<\/span><span style=\"color: 
#fd971f;\">)<\/span><span style=\"color: #e6db74;\">)<\/span><span style=\"color: #a6e22e;\">)<\/span><span style=\"color: #66d9ef;\">)<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n\n<span style=\"color: #ae81ff;\">(<\/span><span style=\"color: #f92672;\">defn<\/span> <span style=\"color: #a6e22e;\">generate-domain-by-rcs<\/span> \n  <span style=\"color: #66d9ef;\">[<\/span>rcs domain-file nb-per-rc<span style=\"color: #66d9ef;\">]<\/span>\n  <span style=\"color: #66d9ef;\">(<\/span><span style=\"color: #f92672;\">with-open<\/span> <span style=\"color: #a6e22e;\">[<\/span>out-file <span style=\"color: #e6db74;\">(<\/span><span style=\"color: #66d9ef;\">io<\/span><span style=\"color: #66d9ef;\">\/<\/span>writer domain-file<span style=\"color: #e6db74;\">)<\/span><span style=\"color: #a6e22e;\">]<\/span>\n    <span style=\"color: #a6e22e;\">(<\/span><span style=\"color: #66d9ef;\">csv<\/span><span style=\"color: #66d9ef;\">\/<\/span>write-csv out-file <span style=\"color: #e6db74;\">[<\/span><span style=\"color: #fd971f;\">[<\/span><span style=\"color: #e6db74;\">\"wikipedia-page\"<\/span> <span style=\"color: #e6db74;\">\"kbpedia-rc\"<\/span><span style=\"color: #fd971f;\">]<\/span><span style=\"color: #e6db74;\">]<\/span><span style=\"color: #a6e22e;\">)<\/span>\n    <span style=\"color: #a6e22e;\">(<\/span><span style=\"color: #f92672;\">doseq<\/span> <span style=\"color: #e6db74;\">[<\/span>rc rcs<span style=\"color: #e6db74;\">]<\/span> <span style=\"color: #e6db74;\">(<\/span>generate-domain-by-rc rc domain-file nb-per-rc<span style=\"color: #e6db74;\">)<\/span><span style=\"color: #a6e22e;\">)<\/span><span style=\"color: #66d9ef;\">)<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #ae81ff;\">(<\/span>generate-domain-by-rcs <span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/\"<\/span>\n                         <span style=\"color: 
#e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Concerto\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/DoubleAlbum-CW\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Psychedelic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Religious\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/PunkMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/BluesMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/HeavyMetalMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/PostPunkMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/CountryRockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/BarbershopQuartet-MusicGroup\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/FolkMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Verse\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/RockBand\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Lyric-WordsToSong\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Refrain\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-GangstaRap\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Klezmer\"<\/span>\n                         <span 
style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/HouseMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-AlternativeCountry\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/PsychedelicMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/ReggaeMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/AlternativeRockBand\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/AlternativeRockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Trance\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Ensemble\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/RhythmAndBluesMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/NewAgeMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/RockabillyMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Blues\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Opera\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Choir\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/SurfMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Quintet-MusicalPerformanceGroup\"<\/span>\n                         <span style=\"color: 
#e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-JazzRock\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Country\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/CountryMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-PopRock\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Romantic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Recitative\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Chorus\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/FusionMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MovieSoundtrack\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/GreatestHitsAlbum-CW\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Christian\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/ClassicalMusic-Baroque\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-NewAge\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-TraditionalPop\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/TranceMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Celtic\"<\/span>\n                         <span style=\"color: 
#e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/LoungeMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Reggae\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Baroque\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Trio-MusicalPerformanceGroup\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Symphony\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-RockAndRoll\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/PopRockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/IndustrialMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/JazzMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalChord\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/ProgressiveRockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/GothicMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/LiveAlbum-CW\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/NewWaveMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/NationalAnthem\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/OldieSong\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Song-Sung\"<\/span>\n                         <span style=\"color: 
#e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/RockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Aria\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Disco\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/GospelMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/BluegrassMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/FolkRockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/RockAndRollMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Opera-CW\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/HitSong-CW\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Tune\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Quartet-MusicalPerformanceGroup\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/RapMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/RecordCompany\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-ACappella\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Electronica\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Music\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/GlamRockMusic\"<\/span>\n                         <span style=\"color: 
#e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/LoveSong\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Gothic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MarchingBand\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Punk\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/BluesRockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/TechnoMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/SoulMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/ChamberMusicComposition\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Requiem\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/ElectronicMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/CompositionMovement\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/StringQuartet-MusicGroup\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Riff\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Anthem\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/HardRockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-BluesRock\"<\/span>\n                         <span style=\"color: 
#e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Cyberpunk\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Industrial\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Funk\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Album-CW\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/HipHopMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Single\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Singing\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/SwingMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Song-CW\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/SalsaMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Jazz\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/ClassicalMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MilitaryBand\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/SkaMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/Orchestra\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/GrungeRockMusic\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/SouthernRockMusic\"<\/span>\n                         <span style=\"color: 
#e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/MusicalComposition-Ambient\"<\/span>\n                         <span style=\"color: #e6db74;\">\"http:\/\/kbpedia.org\/kko\/rc\/DiscoMusic\"<\/span><span style=\"color: #66d9ef;\">]<\/span> <span style=\"color: #e6db74;\">\"resources\/aspects-domain-corpus.csv\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<p>Next let&#8217;s create the actual positive training corpus and let&#8217;s normalize it:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>cache-aspects-corpus <span style=\"color: #e6db74;\">\"resources\/aspects-entities-corpus.csv\"<\/span> <span style=\"color: #e6db74;\">\"resources\/aspects-corpus\/\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<span style=\"color: #ae81ff;\">(<\/span>normalize-cached-corpus <span style=\"color: #e6db74;\">\"resources\/corpus\/\"<\/span> <span style=\"color: #e6db74;\">\"resources\/corpus-normalized\/\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<p>We end up with <code>22<\/code> features for which we can get named entities from the KBpedia Knowledge Base. These will be the 22 features of our model. 
The complete positive training set has 799 documents in it.<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>load-dictionaries <span style=\"color: #e6db74;\">\"resources\/general-corpus-dictionary.pruned.csv\"<\/span> <span style=\"color: #e6db74;\">\"resources\/aspects-entities-corpus-dictionary.pruned.csv\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #ae81ff;\">(<\/span>build-semantic-interpreter <span style=\"color: #e6db74;\">\"aspects-entities-pruned\"<\/span> <span style=\"color: #e6db74;\">\"resources\/semantic-interpreters\/aspects-entities-pruned\/\"<\/span> <span style=\"color: #66d9ef;\">(<\/span>distinct <span style=\"color: #a6e22e;\">(<\/span>concat <span style=\"color: #e6db74;\">(<\/span>get-domain-pages<span style=\"color: #e6db74;\">)<\/span> <span style=\"color: #e6db74;\">(<\/span>get-general-pages<span style=\"color: #e6db74;\">)<\/span><span style=\"color: #a6e22e;\">)<\/span><span style=\"color: #66d9ef;\">)<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #ae81ff;\">(<\/span>build-svm-model-vectors <span style=\"color: #e6db74;\">\"resources\/svm\/aspects-entities-pruned\/\"<\/span> <span style=\"color: #ae81ff;\">:corpus-folder-normalized<\/span> <span style=\"color: #e6db74;\">\"resources\/aspects-corpus-normalized\/\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n\n<span style=\"color: #ae81ff;\">(<\/span>train-svm-model <span style=\"color: #e6db74;\">\"svm.aspects.entities.pruned\"<\/span> <span style=\"color: #e6db74;\">\"resources\/svm\/aspects-entities-pruned\/\"<\/span>\n                 <span style=\"color: #ae81ff;\">:weights<\/span> <span style=\"color: #ae81ff;\">nil<\/span>\n                 <span style=\"color: #ae81ff;\">:v<\/span> <span style=\"color: #ae81ff;\">nil<\/span>\n                 <span style=\"color: #ae81ff;\">:c<\/span> 1\n                 <span style=\"color: #ae81ff;\">:algorithm<\/span> 
<span style=\"color: #ae81ff;\">:l2l2<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<p>Now let&#8217;s evaluate the model with default hyperparameters:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>evaluate-model <span style=\"color: #e6db74;\">\"svm.aspects.entities.pruned\"<\/span> <span style=\"color: #e6db74;\">\"resources\/gold-standard-full.csv\"<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<pre class=\"example\">True positive:  9\nFalse positive:  10\nTrue negative:  913\nFalse negative:  85\n\nPrecision:  0.47368422\nRecall:  0.095744684\nAccuracy:  0.906588\nF1:  0.15929204\n<\/pre>\n<p>Now let&#8217;s try to improve this F1 score using grid search:<\/p>\n<div class=\"org-src-container\">\n<pre class=\"src src-clojure\"><span style=\"color: #ae81ff;\">(<\/span>svm-grid-search <span style=\"color: #e6db74;\">\"grid-search-aspects-entities-pruned-tests\"<\/span> \n                 <span style=\"color: #e6db74;\">\"resources\/svm\/aspects-entities-pruned\/\"<\/span> \n                 <span style=\"color: #e6db74;\">\"resources\/gold-standard-full.csv\"<\/span>\n                 <span style=\"color: #ae81ff;\">:selection-metric<\/span> <span style=\"color: #ae81ff;\">:f1<\/span>\n                 <span style=\"color: #ae81ff;\">:grid-parameters<\/span> <span style=\"color: #66d9ef;\">[<\/span><span style=\"color: #a6e22e;\">{<\/span><span style=\"color: #ae81ff;\">:c<\/span> <span style=\"color: #e6db74;\">[<\/span>1 2 4 16 256<span style=\"color: #e6db74;\">]<\/span>\n                                    <span style=\"color: #ae81ff;\">:e<\/span> <span style=\"color: #e6db74;\">[<\/span>0.001 0.01 0.1<span style=\"color: #e6db74;\">]<\/span>\n                                    <span style=\"color: #ae81ff;\">:algorithm<\/span> <span style=\"color: #e6db74;\">[<\/span><span style=\"color: #ae81ff;\">:l2l2<\/span><span style=\"color: 
#e6db74;\">]<\/span>\n                                    <span style=\"color: #ae81ff;\">:weight<\/span> <span style=\"color: #e6db74;\">[<\/span>1 15 30<span style=\"color: #e6db74;\">]<\/span><span style=\"color: #a6e22e;\">}<\/span><span style=\"color: #66d9ef;\">]<\/span><span style=\"color: #ae81ff;\">)<\/span>\n<\/pre>\n<\/div>\n<pre class=\"example\">{:gold-standard \"resources\/gold-standard-full.csv\"\n:selection-metric :f1\n:score 0.44052863\n:c 4\n:e 0.001\n:algorithm :l2l2\n:weight 15}\n<\/pre>\n<p>We were able to improve the <code>F1<\/code> score substantially by tuning the hyperparameters (from about 0.159 to about 0.441), but the results are still disappointing. There are multiple ways to automatically generate training corpuses, but not all of them are created equal. This is why a pipeline that can automatically create the training corpuses, optimize the hyperparameters and evaluate the models is more than welcome: these tasks account for the bulk of the time a data scientist spends creating models.<\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-org8942a21\" class=\"outline-2\">\n<h2 id=\"org8942a21\">Conclusion<\/h2>\n<div id=\"text-org8942a21\" class=\"outline-text-2\">\n<p>After automatically creating multiple positive and negative training sets, testing multiple learning methods and optimizing hyperparameters, we found the combination of training sets, learning method and hyperparameters that yields the best initial model. It has an accuracy of <code>97.2%<\/code>, a precision of <code>88.4%<\/code>, a recall of <code>80.9%<\/code> and an overall F1 measure of <code>84.4%<\/code> on a gold standard created from real, randomly selected pieces of news from different general and specialized news sites.<\/p>\n<p>What is really interesting and innovative about this method is how a knowledge base of concepts and entities can be used to label positive and negative training sets to feed supervised learners, and how the learner can perform well 
on totally different input text data (in this case, news articles). The same is true when creating training corpuses for unsupervised learning<sup><a id=\"fnr.4\" class=\"footref\" href=\"#fn.4\">4<\/a><\/sup>.<\/p>\n<p>The most wonderful thing from an operational standpoint is that all of this searching, testing and optimizing can be performed automatically by a computer. The only tasks required of a human are to define the scope of a domain and to manually label a gold standard for performance evaluation and hyperparameter optimization.<\/p>\n<\/div>\n<\/div>\n<div id=\"footnotes\">\n<h2 class=\"footnotes\">Footnotes:<\/h2>\n<div id=\"text-footnotes\">\n<div class=\"footdef\">\n<p class=\"footpara\"><sup><a id=\"fn.1\" class=\"footnum\" href=\"#fnr.1\">1<\/a><\/sup><a href=\"https:\/\/fgiasson.com\/blog\/index.php\/2016\/10\/24\/create-a-domain-text-classifier-using-cognonto\/\">Create a Domain Text Classifier Using Cognonto<\/a><\/p>\n<\/div>\n<div class=\"footdef\">\n<p class=\"footpara\"><sup><a id=\"fn.2\" class=\"footnum\" href=\"#fnr.2\">2<\/a><\/sup><a href=\"https:\/\/fgiasson.com\/blog\/index.php\/2016\/11\/17\/dynamic-machine-learning-using-the-kbpedia-knowledge-graph-part-1\/\">Dynamic Machine Learning Using the KBpedia Knowledge Graph \u2013 Part 1<\/a><\/p>\n<\/div>\n<div class=\"footdef\">\n<p class=\"footpara\"><sup><a id=\"fn.3\" class=\"footnum\" href=\"#fnr.3\">3<\/a><\/sup><a href=\"https:\/\/fgiasson.com\/blog\/index.php\/2016\/11\/17\/dynamic-machine-learning-using-the-kbpedia-knowledge-graph-part-2\/\">Dynamic Machine Learning Using the KBpedia Knowledge Graph \u2013 Part 2<\/a><\/p>\n<\/div>\n<div class=\"footdef\">\n<p class=\"footpara\"><sup><a id=\"fn.4\" class=\"footnum\" href=\"#fnr.4\">4<\/a><\/sup><a href=\"https:\/\/fgiasson.com\/blog\/index.php\/2016\/09\/28\/using-cognonto-to-generate-domain-specific-word2vec-models\/\">Using Cognonto to Generate Domain Specific word2vec 
Models<\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>In previous articles I have covered multiple ways to create training corpuses for unsupervised learning and positive and negative training sets for supervised learning 1 , 2 , 3 using Cognonto and KBpedia. Different structures inherent to a knowledge graph like KBpedia can lead to quite different corpuses and sets. Each of these corpuses or [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[293,287,84],"tags":[263,296,289,231],"class_list":["post-3512","post","type-post","status-publish","format-standard","hentry","category-artificial-intelligence","category-cognonto","category-semantic-web","tag-ai","tag-knowledgegraph","tag-machinelearning","tag-semanticweb"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/3512","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=3512"}],"version-history":[{"count":5,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/3512\/revisions"}],"predecessor-version":[{"id":3517,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/3512\/revisions\/3517"}],"wp:attachment":[{"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=3512"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fgiass
on.com\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=3512"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=3512"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}