Musicbrainz Relational Database mapped into RDF using the Music Ontology

As promised some time ago, I am pleased to publish some information about how the Musicbrainz relational database has been mapped into RDF using the Music Ontology. I know I am late on this one, but I was waiting for some things to be released before publishing this blog post.

This is the first step toward a “physical” RDF dump of the Musicbrainz data: using a Virtuoso RDF View to expose the Musicbrainz relational database as an RDF triple store.

Introduction to Virtuoso RDF Views

Carl Blakeley of OpenLink Software Inc. has just published a first Virtuoso RDF View tutorial called “Mapping Relational Data to RDF with Virtuoso’s RDF Views”. This article explains how to define RDF Views inside Virtuoso and how they work.

I suggest reading that document first, to make sure you understand how the Musicbrainz data has been mapped into RDF using Virtuoso.

RDF/XML presentation of the mapping

I have written an RDF/XML file that explains where, in the Musicbrainz database schemas, the data used to create the actual RDF View comes from. This is a good starting point to “feel” how the Music Ontology can be used to express musical things such as artists, bands, records, tracks, etc., and to see how the musical creation workflow supporting the Music Ontology is used in that case.

The Musicbrainz RDF View

This is the RDF View that lets the Musicbrainz relational database be viewed as an RDF source queryable with SPARQL. The view virtualizes the descriptions of mo:MusicArtist, mo:MusicGroup, mo:Record and mo:Track, as well as mo:Performance, mo:Signal, mo:Composition, etc.

Using the RDF View

Installing the Musicbrainz Database instance (the quick guide)

The first step is to download the Musicbrainz database and install it on a PostgreSQL server instance. Follow these steps.

Note: I will try to keep this guide as short as possible, so if there are steps that you don’t understand or that don’t work for you, please leave a comment on this blog post or send me an email.

Installing Virtuoso

To use the RDF View, you will first have to install Virtuoso 5.0 on your computer. OpenLink Virtuoso comes in two flavours: Open Source and Commercial. The difference, besides the obvious, is that the commercial version includes Virtual Database functionality, which makes the following step easier, since the relational data can remain in the PostgreSQL database.

Linking PostgreSQL tables to Virtuoso via ODBC

For the Open Source Edition:

With Virtuoso Open Source Edition 5.0, you will have to export the data from the PostgreSQL server and import it into Virtuoso’s native DBMS.

For the Commercial Edition:

Once the Virtuoso instance is running, open a browser window and access the Conductor at http://localhost:8890/conductor/. The Conductor is a web-based DBMS manager, like phpMyAdmin but for Virtuoso. You can then use it to attach the tables through ODBC.

Note: you should have a PostgreSQL ODBC driver installed to perform the following steps.

You should see the PostgreSQL instance connection in the list. You only have to click on “connect” and enter your credentials; the Virtuoso server should then be connected to the running PostgreSQL instance.

After that, click on “External Linked Objects” to link the remote PostgreSQL tables into Virtuoso. Pay special attention to the schemas created by these links: the remote tables should be available under the name “DB.[ODBC driver name].[remote table name]”.

These Musicbrainz tables should be linked into Virtuoso:

track, albumjoin, album, albummeta, artist, artist_relation, artistalias, album_amazon_asin, country, l_album_url, l_artist_artist, l_artist_track, l_artist_url, l_track_track, l_album_album, l_album_artist, l_track_url, language, release, url, puid, puidjoin.
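Before defining the view, it can help to check that every table above has actually been linked. A minimal sketch, assuming you have copied the names of the linked tables out of the Conductor yourself; `qualified_name` and `missing_tables` are hypothetical helpers, and the `DB.<owner>.<table>` pattern simply follows the naming described above.

```python
# The Musicbrainz tables that the RDF View expects to find linked into Virtuoso.
REQUIRED_TABLES = [
    "track", "albumjoin", "album", "albummeta", "artist", "artist_relation",
    "artistalias", "album_amazon_asin", "country", "l_album_url",
    "l_artist_artist", "l_artist_track", "l_artist_url", "l_track_track",
    "l_album_album", "l_album_artist", "l_track_url", "language", "release",
    "url", "puid", "puidjoin",
]


def qualified_name(owner: str, table: str) -> str:
    """Build the three-part DB.<owner>.<table> name of a linked table."""
    return f"DB.{owner}.{table}"


def missing_tables(linked: list[str]) -> list[str]:
    """Return the required tables absent from `linked` (case-insensitive)."""
    present = {name.lower() for name in linked}
    return [t for t in REQUIRED_TABLES if t not in present]
```

An empty result from `missing_tables(...)` means every table listed above is visible to Virtuoso.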

Installing the RDF View in Virtuoso

Before continuing, you will have to make a small modification to the RDF View document: replace every occurrence of the string “DB.MO.” with “DB.[name of the DSN entry].”. This tells the RDF View where to take the relational data from (in this case, a remote PostgreSQL server instance).
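If you would rather not do this search-and-replace by hand, a few lines of Python can do it. This is a sketch: the file names and the DSN name you pass in are placeholders for your own setup, and it performs the same plain string replacement described above, nothing more.

```python
from pathlib import Path


def retarget_view(sql: str, dsn: str) -> str:
    """Replace every 'DB.MO.' qualifier with 'DB.<dsn>.'."""
    return sql.replace("DB.MO.", f"DB.{dsn}.")


def retarget_file(src: Path, dst: Path, dsn: str) -> int:
    """Write a retargeted copy of the RDF View script from src to dst.

    Returns the number of 'DB.MO.' occurrences that were replaced.
    """
    text = src.read_text(encoding="utf-8")
    dst.write_text(retarget_view(text, dsn), encoding="utf-8")
    return text.count("DB.MO.")
```

For example, `retarget_file(Path("mbz_view.sql"), Path("mbz_view_pg.sql"), "my_dsn")` (hypothetical names) produces the file to paste into iSQL.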

Now click on “Interactive SQL (iSQL)”, the first item in the left sidebar menu.

The next step is to copy the fixed RDF View code into this iSQL window and then click RUN.

After one or two minutes, the view should be defined in Virtuoso.

Testing the view

Now all you have to do is test this new RDF View. Run this simple SPARQL query inside iSQL to make sure that you get triples from the view:

sparql
define input:storage virtrdf:MBZROOT
select *
from <http://musicbrainz.org/>
where
{
?s ?p ?o.
};

From there, you can query this RDF View with SPARQL just as you would query any other triple store. Check out the Music Ontology Wiki for some examples of how this RDF graph can be queried.
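Outside iSQL, the same kind of query can be sent over HTTP. The sketch below assumes the view is reachable through Virtuoso's SPARQL endpoint at `http://localhost:8890/sparql` (same host and port as the Conductor) and that the endpoint accepts the `define input:storage` pragma in the query text; it also adds a `limit` so a test run does not pull the whole graph. Results are read from the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Virtuoso's SPARQL endpoint on the local instance.
ENDPOINT = "http://localhost:8890/sparql"

QUERY = """define input:storage virtrdf:MBZROOT
select *
from <http://musicbrainz.org/>
where { ?s ?p ?o . }
limit 10"""


def build_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL-protocol GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def bindings(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} rows."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run(endpoint: str = ENDPOINT, query: str = QUERY) -> list[dict[str, str]]:
    """Send the query and return its rows (needs a running Virtuoso)."""
    req = urllib.request.Request(
        build_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return bindings(json.load(resp))
```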

Conclusions

The RDF View that converts the Musicbrainz relational database into RDF is interesting in many respects. First, it gives us a good representation of the Musicbrainz data in RDF using the Music Ontology. But this example also shows precisely how relational data can be converted into RDF fairly easily.

Why another Bibliographic Ontology?

This very good question was asked by Peter Mika on the Bibliographic Ontology Specification Group yesterday.

So, why? Peter said:

I’ve read Frederick Giasson’s call for this group on PlanetRDF.com. But before getting started on the actual topic of developing an ontology for bibliographies, my question is: why develop a new ontology? What is lacking in SWRC/BuRST or PRISM that this new ontology would add? I’m asking this, because I’m concerned by (even) more fragmentation in this space.

I am not an expert in the citations and bibliographic references domain. In fact, my knowledge of the domain is somewhat limited. However, my recent blog posts about the integration of Zotero into the semantic web raised a lot of questions related to citations and bibliographic ontologies. Bruce D’Arcus appeared from the Zotero web forum, unsatisfied with current ontologies. Bruce knows a lot about all that stuff: he is a domain expert. So I asked Bruce if he would be willing to start the development of a new Bibliographic Ontology project that would answer these needs. In fact, as I noted on my blog and on the wiki, these needs come from real problems: OpenOffice and Zotero.

From there, I set up the current communication infrastructure to start talking about these problems. In less than one day, 17 people subscribed to the mailing list, 11 comments were posted on my latest blog post, etc.

This tells me that there is a real interest in the question. Why? Possibly because current ontologies don’t work well for everybody.

In fact, it wasn’t working well for me either. When I looked at the bibliographic ontology landscape while working on that problem for Zitgist, I found a jungle. There were so many possible ways to describe them, to describe what a document was, etc. There were no best practice guides, no examples, etc.; people were doing whatever they wanted. This rendered the data useless for Zitgist. That is exactly why I am putting time into this initiative right now.

An example to illustrate the problem

I will illustrate the current problem with bibliographic ontologies with the following example:

I went to the BuRST home page and clicked on one of its examples. I checked the code and saw some SWRC terms… then I tried to dereference the URI of this ontology to get the schema explaining what these properties were. Then I tried to find the properties and classes: they were not there.

I think this simple example illustrates many of the problems out there: there is no consistency, no good documentation (I can’t find the right SWRC specification document at the moment), no examples, etc.

Next wave of users

The next wave of users for these ontologies isn’t computer science students working on academic projects; it is Web developers who have only a basic knowledge of all that stuff. What these people need is good documentation, consistent concepts and methods, good examples and a community backing the development of these projects.

This is not what I find right now.

Community driven ontology development

Answering Peter’s mail, Bruce said:

The first corresponds to a narrow range of academic users (last I looked it wouldn’t work for the humanities or law), and the second is just a series of properties, mostly already covered by DC and maintained by a fairly closed industry group not very interested in RDF.

Later, Chris Bizer wrote on my blog:

yes, it would really be nice to have a community-backed ontology for describing publications which is a bit more Semantic-Webby than Dublin Core. So developing a best practice for mixing DC, FOAF, SIOC and the event ontology would really useful.

Once you guys have developed this best practice, we are happy to change the D2R mapping of our DBLP server (http://www4.wiwiss.fu-berlin.de/dblp/) and the RDF book mashup(http://sites.wiwiss.fu-berlin.de/suhl/bizer/bookmashup/index.html) , so that they export RDF according to your best practice.

I think these two examples describe what is happening: people are now asking for open communities (could we call them open-source communities?) to develop these ontologies.

So why this ontology?

The idea here is to develop yet another bibliographic ontology. But the goal isn’t to reinvent the wheel once more. The goal is to fill in the blanks: to develop a sort of ontology framework designed so that future extension modules can easily be plugged in, and so that it interacts easily with existing ontologies. Yes, in RDF you can “theoretically” plug everything into everything, but in reality this is neither that simple nor that effective. This new ontology initiative should also act as a “best practices” guide for describing citations and bibliographic references on the Semantic Web, for developers who have little knowledge of the semantic web.

This is a question of the adoption of the semantic web by Web developers: people who just don’t have the time to check all these little “fragmented” ontologies written in OWL, RDFS or whatever, with few explicit comments and no documentation, examples, etc. This is why microformats are doing so well: they have clear documentation, good examples, etc. Like microformats or not, they got the attention of developers because there is support, documentation, examples and a strong community developing them.

Conclusion

So all these projects (the Music Ontology, the Bibliographic Ontology, the Linked-Open-Data community, etc.) make me wonder, as I write this: are the challenges the Semantic Web now faces more social than technical?

I think now is the time to show the World that these things work, and work quite well. Unfortunately for some people, we will have to ask these questions and create communities that supervise such ontology development. Entrepreneurs will tell you that the client is always right. The clients of ontologies are developers, and they won’t spend their precious time on bric-a-brac projects.

Finally, what I am proposing here is to create an open community to supervise the development of an ontology describing citations and bibliographic references. This community will be composed of domain experts; companies and organizations that want to use it; and developers and hobbyists who are interested in it. And as I said above: “The goal is to fill in the blanks: to develop a sort of ontology framework designed so that future extension modules can easily be plugged in, and so that it interacts easily with existing ontologies. Yes, in RDF you can ‘theoretically’ plug everything into everything, but in reality this is neither that simple nor that effective. This new ontology initiative should also act as a ‘best practices’ guide for describing citations and bibliographic references on the Semantic Web, for developers who have little knowledge of the semantic web.”

The Bibliographic Ontology

Zitgist, Bruce D’Arcus, the Zotero team and Michael K. Bergman have started a new initiative to develop a new ontology for citations and bibliographic references. The idea for the project came up a couple of days ago, when we tried to figure out how Zotero could be integrated into a semantic web environment. This brainstorming led us to start a new ontology development project: The Bibliographic Ontology.

References

Some things are already in place to start the collaborative development of the ontology:

Starting the development of this ontology

As the starting point for the development of this ontology, we will take the “Citation Oriented Bibliographic Vocabulary” developed by Bruce D’Arcus. It is a start, but as he pointed out in the brainstorming, there is much work to do on it to create a better citations and bibliographic ontology. Bruce also wrote an introductory mail about what he has in mind to make it a better ontology, what he thinks we should work on, etc. Keep in mind that Bruce has a strong background and a lot of experience in the domain of citations and bibliographic references.

Goals

The development of this ontology should be driven by its goals. Bruce outlined some goals for this ontology, and more could be added depending on how people expect to use it.

  1. Should be a superset of legacy formats like BibTeX, RIS, and so forth
  2. Must support the most demanding needs in the social sciences, humanities, and law, and those who deal with non-Western languages
  3. The class system must be able to map to the type system in the citation style language I [Bruce] designed. In short, it is not enough to just encode the data: it needs to be able to be formatted according to the often archaic details of citation styles
  4. Should be developer-friendly; I consider examples like DOAP and SKOS to be models here
  5. Behind all of these goals is a more concrete one: it should be perfect for use in OpenDocument/OpenOffice citation support and should handle Zotero’s needs.

In fact, for point 5, these systems will be the test cases for the development of this new ontology, just as Musicbrainz, Magnatune and other musical needs were the test cases for the development of the Music Ontology.

Users

Users can be many people or systems. To list just a few of them:

  • OpenDocument/OpenOffice citation system
  • Zotero
  • Zitgist
  • Students or professors in a social science or law department
  • Book selling systems such as Amazon.com, Alibris.com or Abebooks.com
  • Book, journals, etc. publishers
  • Authors

As you can see, many things [people or systems] are potential users of this ontology: from people without a computer background to large and complex systems such as Amazon.com, Zotero and OpenOffice.

Constraints

Users and goals define the development constraints of this ontology. We will try to take the same path that Yves Raimond and I took for the development of the Music Ontology: creating several levels of expressiveness for the ontology. Which level is used depends on the user. Does the user only need to describe a simple bibliographic reference? Then he will use level one. Does the user need to describe a collaborative work aggregating many media sources, such as writings, speeches and conferences, in many languages and within a specific timeframe? Then he will use level three. This approach has been quite successful for the Music Ontology, so we should try it for the Bibliographic Ontology too.

Reuse of existing ontologies

This ontology will probably reuse many existing ontologies. Some of them could be:

  • FRBR: as the foundation of the ontology
  • FOAF: as the way to describe authors
  • SIOC: as a way to describe everything related to the social software World: wiki pages, blog posts, mailing list threads, etc.
  • MO: as a way to describe everything related to musical things
  • DC: do I have to say why?
  • Event: as a way to describe some events like workshops, conferences, etc.
  • Timeline: as a way to describe complex temporal frameworks

Conclusion

If you are interested in this new ontology development project, I suggest you subscribe to the mailing list, create a user account on the Wiki, and start contributing your ideas and expertise to the development of the Bibliographic Ontology. What is great about this project is that it is already motivated by external projects, such as its integration into the OpenDocument/OpenOffice citation support and its use by Zotero for its integration with Ping the Semantic Web and Zitgist.

When Zotero meets the Semantic Web

Yesterday I wrote a blog post about how Zotero could be integrated into the semantic web. Today, more and more people seem interested in the idea and have started to dream about the possibilities. In fact, such an initiative could have a deeper impact than simply integrating a tool into an environment. It could certainly have an impact on how people describe documents, citations and works. It could probably help people manage documents, create and manage document portfolios, automatically generate bibliographic references, etc. It could even help people and scientists in their daily work. Am I dreaming? The World has been built by dreams.

This morning I started the discussion on Zotero’s web forum, and it continued all day long.

The Biblio Ontology

First of all, Bruce D’Arcus got the discussion going by raising some possible issues related to documents’ URIs and documents’ descriptions.

Snippets of Bruce’s Biblio Ontology are currently used by Zotero to describe citation data in RDF. He points out that much work would have to be done on that ontology before it could handle more of the issues related to document description on the semantic web.

The Zotero team got interested in the project

Dan Cohen, one of the presidents of Zotero, is quite interested in the project. A few concerns have been raised, but all in all, everybody seems to think the project is feasible.

Next development phase

Now that I know some people are interested in this project, I think it would be good to start planning the next development phases of this initiative.

I am thinking of proceeding the same way I did for the Music Ontology and Musicbrainz. We should start by creating a community around the development of an ontology. If Bruce is willing, I would suggest taking his ontology as the foundation of the project. From there, we could start thinking about what to change in it and how to upgrade it to meet Zotero’s needs as well as those of the scientific community.

In parallel, I could work with the Zotero team to integrate Zotero into the Zitgist/PingtheSemanticWeb environment.

Also in parallel, another group would develop the Virtuoso Sponger Metadata Cartridges (the equivalent of Zotero’s Translators) to enable Virtuoso server instances to process the same citation data as Zotero does.

Finally, we would work with Zotero to create another Translator that would dereference URIs to get citation RDF data. As I said in my previous post, this new Translator would be used to let Zotero be fed by Zitgist’s search results and browsing pages.
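Dereferencing a URI to get RDF usually relies on HTTP content negotiation: the client asks for an RDF serialization in its `Accept` header and follows whatever redirect the server answers with. The sketch below only illustrates that idea in Python; an actual Zotero Translator is written in JavaScript, and the media-type preference list is an assumption.

```python
import urllib.request

# Assumed preference order: RDF/XML first, Turtle next, HTML as a last resort.
RDF_TYPES = "application/rdf+xml, text/turtle;q=0.9, text/html;q=0.1"


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for RDF rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_TYPES})


def fetch_rdf(uri: str) -> tuple[str, bytes]:
    """Dereference `uri` (urlopen follows redirects) and return
    the content type the server chose together with the body."""
    with urllib.request.urlopen(rdf_request(uri)) as resp:
        return resp.headers.get_content_type(), resp.read()
```

Checking the returned content type matters: a server that ignores the `Accept` header will still answer with HTML.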

Conclusion

This is what is new with this idea. Now we should move on and consolidate the initial phase of the project: the creation of the community’s nucleus. Please send me an email or leave a comment on this blog post if you would be interested in participating in this emerging project.

Integration of Zotero in a Semantic Web environment to find, search and browse the Web’s citations

Zotero is a great Firefox add-on that lets its users find, search, edit and create citations they come across while browsing the Web. All the power of Zotero resides in its “translation modules”. These modules detect citations in various types of web pages. When Zotero detects one of these citations, it notifies the user and gives them the opportunity to save it.

What interests me is that Zotero already uses some ontologies to export users’ citation libraries as RDF. When I noticed that, I started to wonder: what could we do with Zotero now?

The Zotero vision

Zotero is the best-integrated citation tool for the Web that I know of. A phenomenal number of citations can be discovered on the Web via the Zotero user community.

Remember what we did with the Semantic Radar a couple of months ago? This Firefox add-on detects SIOC RDF documents in Web pages. I contacted Uldis Bojars to ask him to make it ping PingtheSemanticWeb.com each time a user came across an RDF file while browsing the Web. Now a good share of the RDF data pinged to PTSW comes from Semantic Radar users. This is a sort of “social semantic web discovery” technique.

What I would like to do is the same thing but for Zotero.

zotero-ptsw-zitgist.jpg

  1. Zotero users browse the Web, discover citations and save them into their personal libraries.
  2. Each time a Zotero instance discovers a citation, it would send the URL where it can be found to PingtheSemanticWeb.com.
    1. Note: the user should be made aware of this functionality via an option in Zotero that explains what the feature is all about and gives him the possibility to disable it.
    2. Note: Zotero would ping PTSW each time it detects a citation (that is, when the icon appears in Firefox’s URL bar), and not each time a user saves it.
  3. Using the Virtuoso Sponger, PingtheSemanticWeb.com will check the incoming URLs from Zotero users and look for citations in them too. If a citation is found, it will be added to PTSW’s list of known citations and its content will be archived.
  4. PingtheSemanticWeb.com will then send the new citations to Zitgist so that it can include them into its database.
    1. Note: here Zitgist could be replaced by any web service that wants them. Remember that PTSW acts as a data multiplexer.
  5. Via Zitgist (a semantic web search engine), users from around the World will be able to search and browse these citations (discovered by Zotero users).
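Steps 1 and 2, together with the two notes attached to step 2, boil down to a small piece of client-side logic. The sketch below is hypothetical (it is not Zotero's code, and `CitationPinger` and `send` are invented names): it pings when a citation is detected rather than when it is saved, does nothing when the user has switched the option off, and, as an extra assumption, skips pages it has already pinged during the session.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CitationPinger:
    """Hypothetical client-side ping logic for a citation detector."""

    send: Callable[[str], None]   # transport, e.g. a REST or XML-RPC ping
    enabled: bool = True          # the user-facing opt-out option
    _seen: set[str] = field(default_factory=set, init=False)

    def on_citation_detected(self, page_url: str) -> bool:
        """Call when a citation is detected (not when it is saved).

        Returns True if a ping was sent for this page.
        """
        if not self.enabled or page_url in self._seen:
            return False
        self._seen.add(page_url)  # assumption: one ping per page per session
        self.send(page_url)
        return True
```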

Zitgist as a Zotero citation provider

What is fantastic here is that Zitgist becomes a source of citations. If a Zitgist user has Zotero installed, he will be able to batch-save the list of results returned by Zitgist; and if he is browsing Zitgist’s citations, he will be able to add them to his Zotero library as if Zitgist were Amazon.com or any other citation web site.

That way, the data found by Zotero would be accessible to Zotero users via Zitgist, which would then become a citation provider (mainly fed by the Zotero community).

You see the interaction?

What has to be developed?

A few things have to be developed to make this vision work: nothing major, only a couple of features on each system.

Integration of Ping the Semantic Web into Zotero

The integration of Ping the Semantic Web into Zotero is quite straightforward.

Pinging PingtheSemanticWeb.com via a web service

The first step is to make Zotero notify PTSW each time it comes across a citation. It has to send the URL of the citation(s) via XML-RPC or REST.

That is it: each time Zotero detects a citation, it sends a simple ping to PTSW via an XML-RPC or REST request.
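Here is a sketch of what such a ping could look like on the wire. The REST endpoint below is an assumption (check PTSW's own documentation for the real one), and the XML-RPC variant uses the common `weblogUpdates.ping(name, url)` convention of blog ping services, which is also an assumption about PTSW.

```python
import urllib.parse
import xmlrpc.client

# Hypothetical REST endpoint: verify against PTSW's documentation.
PTSW_REST = "http://pingthesemanticweb.com/rest/"


def rest_ping_url(page_url: str) -> str:
    """REST-style ping: the page URL passed as a query parameter."""
    return PTSW_REST + "?" + urllib.parse.urlencode({"url": page_url})


def xmlrpc_ping_body(title: str, page_url: str) -> str:
    """XML-RPC request body in the weblogUpdates.ping(name, url) form."""
    return xmlrpc.client.dumps((title, page_url), methodname="weblogUpdates.ping")


def send_xmlrpc_ping(endpoint: str, title: str, page_url: str):
    """Send the ping; ServerProxy resolves the dotted method name."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.weblogUpdates.ping(title, page_url)
```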

Adding a pinging option to Zotero

Zotero would also have to add an option to its add-on that lets users disable this feature, in case they don’t want a notification sent to PTSW each time they discover a citation on a Web page while browsing.

Conversion of Zotero translators into Sponger Metadata Cartridges

The biggest development effort would be converting the Zotero translators into Virtuoso Sponger Metadata Cartridges.

Right now, Metadata Cartridges exist for Google Base, Flickr, microformats (hReview, hCalendar, etc.), etc. These cartridges are the Virtuoso Sponger’s equivalent of “Zotero translators”. By developing these cartridges, everybody running Virtuoso would be able to see these citations (from Amazon, etc.) as RDF data (mapped using some ontologies).
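To make “mapped using some ontologies” a bit more concrete: once a cartridge (or a Zotero translator) has extracted the fields of a citation from a page, the last step is to express them as RDF. The Python sketch below is not a Virtuoso cartridge; it only illustrates that mapping step, with Dublin Core elements as an assumed target vocabulary, a hypothetical field mapping, and N-Triples as the output format.

```python
# Assumed target vocabulary: Dublin Core elements (plain literal values).
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical mapping from extracted citation fields to DC properties.
FIELD_MAP = {"title": "title", "author": "creator", "date": "date", "publisher": "publisher"}


def escape(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def citation_to_ntriples(subject: str, citation: dict[str, list[str]]) -> list[str]:
    """Turn {field: [values]} into N-Triples lines about `subject`."""
    lines = []
    for field_name, values in citation.items():
        prop = FIELD_MAP.get(field_name)
        if prop is None:
            continue  # fields without a mapping are skipped in this sketch
        for value in values:
            lines.append(f'<{subject}> <{DC}{prop}> "{escape(value)}" .')
    return lines
```

A real mapping would likely use richer vocabularies (for example FOAF for authors), which is exactly what the Bibliographic Ontology effort is meant to settle.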

Documentation about how to develop these cartridges will be available in the coming days. From there, we would be able to set up an effort to convert the Zotero Translators into Sponger Metadata Cartridges.

Conclusion

This is my vision of how Zotero could be integrated into the current Semantic Web environment. Any ideas, suggestions or collaboration proposals would be warmly welcomed.

Note: a discussion about this subject has started on Zotero’s web forum.