- Frederick Giasson’s Weblog - http://fgiasson.com/blog -
Loading DBpedia into the Open Semantic Framework
Posted By Frederick Giasson On January 22, 2014 @ 3:03 pm In Open Semantic Framework, OSF for Drupal, OSF Web Services, OSF Widgets, Semantic Web, Structured Dynamics
Then, to make things faster, we used an EC2 c3.4xlarge server with 75 GB of disk space.
In this tutorial, we are not re-configuring any passwords or settings for this vanilla instance. However, if you create an instance of your own, you should read the Creating and Configuring an Amazon EC2 AMI OSF Instance manual to configure it for your own purposes and to make it secure.
Note that most of the steps to load DBpedia into Virtuoso come from Jorn Hees’ article  about this subject.
Also note that you should make sure to patch the files changed in the following 3 commits. These issues were found while writing this blog post and haven’t (yet) made it into the AMI we use here: 88d6f1a782744a62bf83d52eceff695e0fee773b, 1389744b7dbf8f755a1bb9be468b3c51df75d6d8 and 719b4a776d43345e73847e6c785a4e9964b83a1c
The second step is to download all the DBpedia files that you want to use in your OSF instance. For this tutorial, we focus on the files where we can get the titles, abstracts, descriptions, all the mapped properties, the geolocalization of the entities, etc. You can download all these files by running the following commands:
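As a rough sketch of what such a download step can look like: the base URL and file names below are assumptions (the English DBpedia 3.9 dumps for labels, abstracts, mapped properties, geo coordinates and types), so check them against the DBpedia download page for the release you want. Nothing is downloaded unless you set DOWNLOAD=1.

```shell
# Assumed base URL and file names (DBpedia 3.9, English); adjust to your release.
DBPEDIA_BASE="http://downloads.dbpedia.org/3.9/en"
DBPEDIA_FILES="labels_en.nt.bz2 short_abstracts_en.nt.bz2 long_abstracts_en.nt.bz2 mappingbased_properties_en.nt.bz2 geo_coordinates_en.nt.bz2 instance_types_en.nt.bz2"

# Print one download URL per line.
dbpedia_urls() {
  for f in $DBPEDIA_FILES; do
    printf '%s/%s\n' "$DBPEDIA_BASE" "$f"
  done
}

# Download (resumable) and decompress only when explicitly asked to.
if [ "${DOWNLOAD:-0}" = "1" ]; then
  dbpedia_urls | while read -r url; do
    wget -c "$url"
  done
  bunzip2 *.nt.bz2
fi
```

Running `dbpedia_urls` on its own first lets you review the list before fetching several gigabytes of data.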
The next step is to use Virtuoso’s RDF Bulk Loader to load all the DBpedia triples into Virtuoso. To do so, we first have to create a new OSF dataset where the DBpedia entities will be indexed. We create it with the DMT (Datasets Management Tool), which is already installed on the OSF AMI 3.0.
Then we have to create and configure the RDF Bulk Loader. The first step is to create the procedure file that will be used to import the tables and procedures into Virtuoso:
Then create a file called VirtBulkRDFLoaderScript.vsql and add the following code in that new file:
Then we have to load it into Virtuoso using the following command:
Then we have to configure the RDF Bulk Loader. First enter in the
Then copy/paste the following SQL code into the
Then enter the isql interface again:
And copy/paste the following SQL lines:
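For orientation, a bulk load with the Virtuoso RDF Bulk Loader generally comes down to three statements: `ld_dir()` registers the files to load, `rdf_loader_run()` loads them, and `checkpoint` commits the result. The sketch below only prints those statements. The directory and the graph URI are hypothetical placeholders: the directory has to be listed in `DirsAllowed` in virtuoso.ini, and the graph should be the URI of the OSF dataset created earlier.

```shell
# Print the isql statements that register and load a directory of N-Triples files.
# $1: directory containing the .nt files (hypothetical placeholder below)
# $2: target graph URI (hypothetical placeholder below)
bulk_load_sql() {
  cat <<EOF
ld_dir('$1', '*.nt', '$2');
rdf_loader_run();
checkpoint;
EOF
}

# Example: review the statements before pasting them into isql.
bulk_load_sql /path/to/dbpedia/files http://dbpedia.org
```

Printing the statements first makes it easy to double-check the path and graph URI, since a wrong graph is tedious to undo after millions of triples have been loaded.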
The next step is to properly configure the DMT to bulk load all the DBpedia entities into OSF.
Let’s step back and explain what we are doing here. With the steps above, we used a fast method to import all 3.5 million DBpedia records into Virtuoso. What we are doing now is taking these records and indexing them in the other underlying OSF systems (namely, the Solr full text search & faceting server). The following steps load all these entities into the Solr index using the CRUD: Create web service endpoint. Once this step is finished, all the DBpedia entities will be searchable and facetable using the OSF Search endpoint.
The first step is to edit the dmt.ini file to add information about the dataset to update:
Then add the following section at the end of the file:
Now we will cover a few more configuration changes that improve the speed of indexing into OSF. You can skip these additional configuration steps, but if you do, do not index more than 200 records per slice.
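To get a feel for what the slice size means for a dataset of this size, the arithmetic can be sketched as follows. The function name is hypothetical and is not part of the DMT; it simply divides the number of records by the slice size, rounding up.

```shell
# Number of slices needed to cover a given number of records (rounds up).
# $1: total number of records, $2: records per slice
slice_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# 3.5 million DBpedia records at 200 records per slice.
slice_count 3500000 200   # prints 17500
```

At 200 records per slice, the 3.5 million records take 17,500 round trips to the web service, which is why raising the slice size is worth the extra configuration.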
First search and edit the virtuoso.ini file. Then find the ResultSetMaxRows setting and configure it for
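Before changing the setting, it can help to check its current value. Here is a small sketch that reads a key from an INI-style file; the helper name and the example path are hypothetical, so point it at wherever virtuoso.ini lives on your instance.

```shell
# Print the current value of a key in an INI-style file.
# $1: key name, $2: path to the file
ini_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2"
}

# Example (hypothetical path):
#   ini_value ResultSetMaxRows /path/to/virtuoso.ini
```

Remember that Virtuoso has to be restarted for a change in virtuoso.ini to take effect.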
Then we have to increase the maximum memory allocated for the CRUD: Create  web service endpoint. You have to edit the
Then check around line #17 and increase the memory (
Then we have to change the maximum number of URIs that the CRUD: Read web service endpoint can get as input. By default it is 64; we will raise it to 500.
500 at line #25
Before we start the process of importing the DBpedia dataset into OSF, we have to import the DBpedia Ontology into OSF so that OSF uses what is defined in the ontology to optimally index the content into the Solr index. To import the ontology, we use the OMT (Ontologies Management Tool).
This is the final step: importing the DBpedia dataset into the OSF full text search index (Solr). To do so, we will use the DMT (Datasets Management Tool) that we previously configured to fully index the DBpedia entities into OSF:
This process should take up to 24 hours on that kind of server.
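To judge whether a run is on track, it helps to translate that estimate into a rate. The sketch below (with a hypothetical function name) computes the minimum sustained indexing rate, in whole records per second, needed to finish a given number of records within a given number of hours.

```shell
# Minimum sustained rate (whole records per second, rounded down)
# needed to index $1 records within $2 hours.
min_records_per_second() {
  echo $(( $1 / ($2 * 3600) ))
}

# 3.5 million records in 24 hours.
min_records_per_second 3500000 24   # prints 40
```

If the DMT reports progress well below roughly 40 records per second, the run will take longer than a day on your setup.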
At that point, the DBpedia dataset, composed of 3.5 million entities, is fully indexed into OSF. This means that all 27 OSF web service endpoints can be used to query, manipulate and use these millions of entities.
However, there is much more that comes out-of-the-box by having DBpedia loaded into OSF. In fact, as we will see in the next article, this means that DBpedia becomes readily available to Drupal 7 if the OSF for Drupal module is installed on that Drupal 7 instance.
This means that the 3.5 million DBpedia entities can be searched via the Search API, manipulated via the Entity API, templated using the Drupal templating engine, etc. They can also be searched and faceted directly on a map using the sWebMap OSF Widget, and they can be queried via the OSF QueryBuilder, which can be used to create all kinds of complex search queries.
All this out-of-the-box.
URL to article: http://fgiasson.com/blog/index.php/2014/01/22/loading-dbpedia-into-the-open-semantic-framework/
URLs in this post:
 DBpedia: http://dbpedia.org
 Open Semantic Framework: http://opensemanticframework.org
 Drupal 7: http://drupal.org
 OSF Installer: http://wiki.opensemanticframework.org/index.php/OSF_Installer
 Creating and Configuring an Amazon EC2 AMI OSF Instance: http://wiki.opensemanticframework.org/index.php/Creating_and_Configuring_an_Amazon_EC2_AMI_OSF_Instance
 Jorn Hees’ article: http://joernhees.de/blog/2010/10/31/setting-up-a-local-dbpedia-mirror-with-virtuoso/
 88d6f1a782744a62bf83d52eceff695e0fee773b: https://github.com/structureddynamics/OSF-Datasets-Management-Tool/commit/88d6f1a782744a62bf83d52eceff695e0fee773b
 1389744b7dbf8f755a1bb9be468b3c51df75d6d8: https://github.com/structureddynamics/OSF-Datasets-Management-Tool/commit/1389744b7dbf8f755a1bb9be468b3c51df75d6d8
 719b4a776d43345e73847e6c785a4e9964b83a1c: https://github.com/structureddynamics/OSF-Web-Services/commit/719b4a776d43345e73847e6c785a4e9964b83a1c
 RDF Bulk Loader: http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader
 DMT: http://wiki.opensemanticframework.org/index.php/Datasets_Management_Tool
 OSF Search endpoint: http://wiki.opensemanticframework.org/index.php/Search
 CRUD: Create: http://wiki.opensemanticframework.org/index.php/CRUD:_Read
 DBpedia Ontology: http://wiki.dbpedia.org/Ontology
 OMT: http://wiki.opensemanticframework.org/index.php/Ontologies_Management_Tool
 27 OSF web service endpoints: http://wiki.opensemanticframework.org/index.php/Introduction_to_OSF_Web_Services
 OSF for Drupal: https://drupal.org/project/osf
 Search API: https://drupal.org/project/search_api
 Entity API: https://drupal.org/project/entity
 Drupal templating engine: https://drupal.org/phptemplate
 sWebMap OSF Widget: http://wiki.opensemanticframework.org/index.php/SWebMap
 OSF QueryBuilder: http://wiki.opensemanticframework.org/index.php/Using_the_OSF_Query_Builder