Creating an Open Semantic Framework Instance
To make things faster, we used an EC2 c3.4xlarge server with 75 GB of disk space.
In this tutorial, we are not re-configuring any passwords or settings for this vanilla instance. However, if you create an instance of your own, you should read the Creating and Configuring an Amazon EC2 AMI OSF Instance manual to configure it for your own purposes and to make it secure.
Note that most of the steps to load DBpedia into Virtuoso come from Jorn Hees’ article about this subject.
Also note that you should make sure to patch the files with the following 3 commits. These issues were found while writing this blog post, and the fixes haven’t (yet) made it into the AMI we use here: 88d6f1a782744a62bf83d52eceff695e0fee773b, 1389744b7dbf8f755a1bb9be468b3c51df75d6d8 and 719b4a776d43345e73847e6c785a4e9964b83a1c
The second step is to download all the DBpedia files that you want to use in your OSF instance. For this tutorial, we focus on the files that give us the titles, abstracts, descriptions, all the mapped properties, the geolocation of the entities, etc. You can download all these files by running the following commands:
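The exact commands below are a reconstruction assuming the DBpedia 3.9 English dumps; adjust the release number, the target directory, and the file list to the datasets you actually want:

```bash
# Illustrative download script for a subset of the DBpedia 3.9 English dumps.
mkdir -p /data/dbpedia
cd /data/dbpedia

for f in labels_en short_abstracts_en long_abstracts_en instance_types_en \
         mappingbased_properties_en geo_coordinates_en; do
  wget "http://downloads.dbpedia.org/3.9/en/${f}.nt.bz2"
  bunzip2 "${f}.nt.bz2"
done
```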
Loading DBpedia into Virtuoso
The next step is to use Virtuoso’s RDF Bulk Loader to load all the DBpedia triples into Virtuoso. But first, we have to create a new OSF dataset where the DBpedia entities will be indexed. We create the new dataset with the DMT (Datasets Management Tool), which is already installed on the OSF 3.0 AMI.
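As a sketch only, creating the dataset with the DMT plausibly looks like the following; the option names, dataset URI, and endpoint URL are all assumptions, so check `dmt --help` on the AMI for the exact syntax:

```bash
# Hypothetical invocation: every option name and value here is an assumption.
dmt -n --uri="http://dbpedia.org/datasets/dbpedia" \
       --title="DBpedia" \
       --osf-web-services="http://localhost/ws/"
```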
Then we have to create and configure the RDF Bulk Loader. The first step is to create the procedure file that will be used to import the tables and procedures into Virtuoso:
Then create a file called VirtBulkRDFLoaderScript.vsql and add the bulk loader code to that new file; the script itself can be copied from the OpenLink Virtuoso documentation.
Then we have to load it into Virtuoso using the following command:
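Assuming the default dba credentials and the isql-vt client name used by the Ubuntu virtuoso-opensource packages (both assumptions; adjust to your install):

```bash
# Load the bulk loader tables and procedures into Virtuoso.
isql-vt 1111 dba dba VirtBulkRDFLoaderScript.vsql
```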
Then we have to configure the RDF Bulk Loader. First, enter the isql interface. Then copy/paste the following SQL code into the isql session:
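The original SQL is not reproduced above; a minimal sketch of this step using the documented bulk loader calls follows. The directory and graph URI are assumptions, and the directory must be listed under DirsAllowed in virtuoso.ini:

```sql
-- Register every downloaded DBpedia N-Triples file with the bulk loader.
-- '/data/dbpedia' and the graph URI are assumptions; adjust to your setup.
ld_dir('/data/dbpedia', '*.nt', 'http://dbpedia.org');

-- Sanity check: list the files queued for loading.
SELECT * FROM DB.DBA.load_list;
```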
Then enter the isql interface again and copy/paste the following SQL lines:
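These are the documented Virtuoso calls to run the loader and persist its work:

```sql
-- Process everything registered in load_list; this is the long-running step.
rdf_loader_run();

-- Persist the loaded triples to disk.
checkpoint;
```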
Configure the Datasets Management Tool
The next step is to properly configure the DMT to bulk load all the DBpedia entities into OSF.
Let’s step back and explain what we are doing here. In the steps above, we used a fast method to import all 3.5 million DBpedia records into Virtuoso. What we are doing now is taking these records and indexing them in the other underlying OSF systems (namely, the Solr full-text search and faceting server). The following steps will load all these entities into the Solr index using the CRUD: Create web service endpoint. Once this step is finished, all the DBpedia entities will be searchable and facetable using the OSF Search endpoint.
The first step is to edit the dmt.ini file to add information about the dataset to update:
Then add the following section at the end of the file:
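The original section is not preserved in this post; the sketch below is hypothetical, so treat every key and value as an assumption and check the DMT documentation for the real names:

```ini
; Hypothetical dmt.ini section: all keys and values are illustrative only.
[DBpedia]
datasetURI           = "http://dbpedia.org/datasets/dbpedia"
baseURI              = "http://dbpedia.org/"
sliceSize            = "500"
targetOSFWebServices = "http://localhost/ws/"
```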
Other Configurations to Speed-Up the Process
Now we will cover a few more configurations that can be applied to improve the speed of indexing into OSF. You can skip these additional configuration steps, but if you do, do not index more than 200 records per slice.
First, find and edit the virtuoso.ini file. Then locate the ResultSetMaxRows setting and increase its value so that Virtuoso can return result sets larger than the number of records indexed per slice.
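For example (the exact value is an assumption; anything comfortably larger than the number of records per slice should do):

```ini
; In the [SPARQL] section of virtuoso.ini; 100000 is an illustrative value.
[SPARQL]
ResultSetMaxRows = 100000
```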
Then we have to increase the maximum memory allocated to the CRUD: Create web service endpoint. To do so, edit the endpoint’s script: check around line #17 and increase the memory limit.
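OSF web service endpoints are PHP scripts, so the change presumably looks like this (the limit shown is an illustrative value, not the one from the original configuration):

```php
// Around line #17 of the CRUD: Create endpoint script; "3000M" is an
// illustrative value, raise or lower it to fit your server's RAM.
ini_set("memory_limit", "3000M");
```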
Then we have to change the maximum number of URIs that the CRUD: Read web service endpoint can take as input. By default it is 64; we will ramp it up to 500. Edit the endpoint’s script and change that value to 500 at line #25.
Importing the DBpedia Ontology
Before we start the process of importing the DBpedia dataset into OSF, we have to import the DBpedia Ontology into OSF so that it can use what is defined in the ontology to optimally index the content into the Solr index. To import the ontology, we use the OMT (Ontologies Management Tool).
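As a sketch only: loading the ontology with the OMT plausibly looks like the following, but the option names and the ontology file path are assumptions; check `omt --help` for the actual syntax:

```bash
# Hypothetical invocation: options and the ontology path are assumptions.
omt --load="file://localhost/data/dbpedia/dbpedia_3.9.owl" \
    --osf-web-services="http://localhost/ws/"
```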
Import DBpedia Into OSF
This is the final step: importing the DBpedia dataset into the OSF full text search index (Solr). To do so, we will use the DMT (Datasets Management Tool) that we previously configured to fully index the DBpedia entities into OSF:
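The original command is not reproduced here; a hypothetical invocation, reusing the section name from the dmt.ini sketch above, could look like:

```bash
# Hypothetical: the '-s' (synchronize/index) option is an assumption;
# check `dmt --help` for the real one.
dmt -s "DBpedia"
```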
This process should take up to 24 hours on that kind of server.
At that point, the DBpedia dataset, composed of 3.5 million entities, is fully indexed into OSF. This means that all 27 OSF web service endpoints can be used to query, manipulate, and use these millions of entities.
However, there is even more that comes out-of-the-box by having DBpedia loaded into OSF. In fact, as we will see in the next article, DBpedia becomes readily available to Drupal 7 if the OSF for Drupal module is installed on that Drupal 7 instance.
This means that the 3.5 million DBpedia entities can be searched via the Search API, manipulated via the Entity API, templated using the Drupal templating engine, and more. They can also be searched and faceted directly on a map using the sWebMap OSF widget, and they can be queried via the OSF QueryBuilder to create all kinds of complex search queries.
All this out-of-the-box.