As promised some time ago, I am pleased to publish some information about mapping the Musicbrainz relational database into RDF using the Music Ontology. I know I am late on this one, but I was waiting for a few things to be released before publishing this blog post.
This is the first step toward getting a “physical” RDF dump of the Musicbrainz data: using a Virtuoso RDF View to expose the Musicbrainz relational database as an RDF triple store.
Introduction to Virtuoso RDF Views
Carl Blakeley of OpenLink Software Inc. just published a first Virtuoso RDF View tutorial called “Mapping Relational Data to RDF with Virtuoso’s RDF Views”. This article explains how to define RDF Views inside Virtuoso and how they work.
Start by reading that document, so that you understand how the mapping of the Musicbrainz data into RDF has been performed using Virtuoso.
RDF/XML presentation of the mapping
I have written an RDF/XML file explaining which parts of the Musicbrainz database schema each piece of data in the RDF View comes from. This is a good starting point to get a “feel” for how the Music Ontology can be used to express musical things such as Artists, Bands, Records, Tracks, etc., and to see how the Music Creation Workflow underpinning the Music Ontology is used in this case.
The Musicbrainz RDF View
This is the RDF View that lets the Musicbrainz relational database be queried as an RDF source using SPARQL. The view virtualizes the descriptions of mo:MusicArtist, mo:MusicGroup, mo:Record and mo:Track, as well as mo:Performance, mo:Signal, mo:Composition, etc.
Using the RDF View
Installing the Musicbrainz Database instance (the quick guide)
The first step is to download the Musicbrainz DB and to install it on a PostgreSQL server instance. Follow these steps.
Note: I will try to keep this guide as short as possible, so if there are steps that you don’t understand or that don’t work for you, please leave a comment on this blog post or send me an email.
Installing Virtuoso
To use the RDF View, you will first have to install Virtuoso 5.0 on your computer. OpenLink Virtuoso comes in two flavours: Open Source and Commercial. The difference, besides the obvious, is that the commercial versions include Virtual Database functionality, which makes the following step easier, as the relational data may remain in the PostgreSQL database.
Linking PostgreSQL tables to Virtuoso via ODBC
For the Open Source Edition:
With the Virtuoso Open Source Edition 5.0 you will have to export the data from the PostgreSQL server and import it into Virtuoso’s native DBMS.
For the Commercial Edition:
Once the Virtuoso instance is running, open a browser window and access Conductor by going to http://localhost:8890/conductor/. This is a web-based DBMS manager, like phpMyAdmin but for Virtuoso. You may then use it to attach the tables through ODBC. Note: you should have a PostgreSQL ODBC driver installed to perform the following steps.
You should see the PostgreSQL instance connection in the list. You only have to click on “connect” and enter the credentials, and the Virtuoso server should then be connected to the running PostgreSQL instance.
After that, click on “External Linked Objects” to link the remote PostgreSQL tables into Virtuoso. Pay special attention to the schemas created by these links. The remote tables should be available via the schema “DB.[ODBC driver name].[remote table name]”.
These Musicbrainz tables should be linked into Virtuoso:
track, albumjoin, album, albummeta, artist, artist_relation, artistalias, album_amazon_asin, country, l_album_url, l_artist_artist, l_artist_track, l_artist_url, l_track_track, l_album_album, l_album_artist, l_track_url, language, release, url, puid, puidjoin.
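With 22 tables to link, it is easy to miss one. The following is only a small sketch, not part of the original setup: it assumes you can copy the list of linked table names out of Conductor as text, and that they follow the “DB.[name].[remote table name]” pattern described above. The helper name `missing_tables` and the qualifier `musicbrainz_pg` are hypothetical.

```python
# Tables that the RDF View expects, as listed in this post.
REQUIRED_TABLES = [
    "track", "albumjoin", "album", "albummeta", "artist", "artist_relation",
    "artistalias", "album_amazon_asin", "country", "l_album_url",
    "l_artist_artist", "l_artist_track", "l_artist_url", "l_track_track",
    "l_album_album", "l_album_artist", "l_track_url", "language", "release",
    "url", "puid", "puidjoin",
]


def missing_tables(linked_names, qualifier):
    """Return the required tables that do not appear in linked_names.

    linked_names: iterable of fully qualified names, e.g. "DB.musicbrainz_pg.track".
    qualifier: the middle part of the name (hypothetical; use your own).
    The comparison ignores case, because the displayed case may differ.
    """
    linked = {name.strip().lower() for name in linked_names}
    return [
        table
        for table in REQUIRED_TABLES
        if f"db.{qualifier}.{table}".lower() not in linked
    ]


if __name__ == "__main__":
    # Illustrative input: everything linked except puidjoin.
    linked = [f"DB.musicbrainz_pg.{t}" for t in REQUIRED_TABLES if t != "puidjoin"]
    print(missing_tables(linked, "musicbrainz_pg"))
```

An empty result means every table named above was found under the given qualifier.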
Installing the RDF View in Virtuoso
Before continuing, you will have to make a small modification to the RDF View document: replace all occurrences of the string “DB.MO.” with “DB.[name of the DSN entry].”. This tells the RDF View where to take the relational data from (in this case, a remote PostgreSQL server instance).
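If you prefer not to do this replacement by hand, it can be scripted. This is a minimal sketch under stated assumptions: the function name `retarget_view` and the DSN name `musicbrainz_pg` are hypothetical, and the two sample lines only imitate the kind of table references found in an RDF View definition; they are not copied from the actual view document.

```python
def retarget_view(view_sql: str, dsn_name: str, old_prefix: str = "DB.MO.") -> str:
    """Replace every occurrence of old_prefix with "DB.<dsn_name>."."""
    if not dsn_name or "." in dsn_name:
        raise ValueError("dsn_name must be a non-empty name without dots")
    return view_sql.replace(old_prefix, f"DB.{dsn_name}.")


if __name__ == "__main__":
    # Illustrative fragment only, not taken from the real RDF View document.
    sample = "from DB.MO.artist as artists\nfrom DB.MO.track as tracks"
    print(retarget_view(sample, "musicbrainz_pg"))
```

In practice you would read the whole RDF View document into a string, pass it through the function, and paste the result into iSQL.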
Now click on the first item in the left sidebar menu “Interactive SQL (iSQL)”.
The next step is to copy the modified RDF View code into this iSQL window and click RUN.
After one or two minutes, the view should be defined in Virtuoso.
Testing the view
Now all that is left is to test the new RDF View. Run this simple SPARQL query inside iSQL to make sure that you get triples from the view:
sparql
define input:storage virtrdf:MBZROOT
select *
from <http://musicbrainz.org/>
where
{
?s ?p ?o.
};
From here on, you can query this RDF View as you would query any triple store using SPARQL. Check out the Music Ontology Wiki for some examples of how this RDF graph can be queried.
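As one possible next step, here is a hedged sketch of how a slightly more specific query could be composed from a script. It only builds the query text and a request URL; it does not contact any server. Several things here are assumptions and not confirmed by the view document: that artist names are exposed with foaf:name, that the SPARQL endpoint is reachable at http://localhost:8890/sparql, and the function names themselves. The storage name and graph URI are taken from the test query above.

```python
from urllib.parse import urlencode

MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"


def build_artist_query(limit: int = 10) -> str:
    """Compose a SPARQL query listing a few artists from the RDF View.

    Assumption: the view exposes artist names through foaf:name.
    The LIMIT keeps the result small, since Musicbrainz is large.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return "\n".join([
        "define input:storage virtrdf:MBZROOT",
        f"prefix mo: <{MO}>",
        f"prefix foaf: <{FOAF}>",
        "select ?artist ?name",
        "from <http://musicbrainz.org/>",
        "where { ?artist a mo:MusicArtist ; foaf:name ?name . }",
        f"limit {limit}",
    ])


def endpoint_url(query: str, base: str = "http://localhost:8890/sparql") -> str:
    """Build a GET URL for a SPARQL endpoint (the base URL is an assumption)."""
    return base + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = build_artist_query(5)
    print(query)
    print(endpoint_url(query))
```

To run the same query inside iSQL instead, prefix the text with the sparql keyword and end it with a semicolon, as in the test query above.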
Conclusions
The RDF View that converts the Musicbrainz RDB into RDF is interesting in many respects. First of all, we get a good representation of the Musicbrainz data in RDF using the Music Ontology. But this example also shows concretely how relational data can be converted into RDF with relatively little effort.
Dan Brickley
April 17, 2007 — 12:12 pm
This is pretty cool!
Hopefully someone will convert the config and show it working with D2RQ as well. The more the merrier…
Could you post some queries that run OK (reasonably fast etc)? Anything using the Advanced Relationships in MB?
Fred
April 17, 2007 — 3:04 pm
Hi Dan,
Thanks for the enthusiasm 🙂
[quote post=”802″]Hopefully someone will convert the config and show it working with D2RQ as well. The more the merrier…[/quote]
I think that Yves Raimond has thought about that possibility; you should probably contact him for more information.
Regarding D2RQ, I don’t know if it is possible, but take care with some property values: some of the values coming from the RDB need to be transformed before they are mapped into RDF. Check out the couple of SQL procedures in the middle of the RDF View document for more information about these specific cases.
[quote post=”802″]Could you post some queries that run OK (reasonably fast etc)? Anything using the Advanced Relationships in MB?[/quote]
I will, but the problem right now is that Musicbrainz is quite huge, and we have a performance issue with some long rows. Ivan is developing a fix, which should be available soon, to make this view perform well.
What I suggest is this: as soon as the fix is available, I will set up a demo server where people will be able to run a series of SPARQL queries that I will write for that purpose. This demo server should be available next week, or in three weeks if the fix is not finished next week (because I am on vacation in California from 27 April to 11 May).
So I will keep you updated via my blog. In the meantime, people will have another example of how the Music Ontology can be used 🙂
Take care,
Fred