My Personal Library and the Semantic Web

Since the last couple of years, I was constantly reminding myself to put all the titles, authors and ISBN of my library in a database so that I could say to my insurances: there is a the list of books I had prior this thing that happen in my apartment that destroyed all of them. Considering the time that it would have taken, I always pushed this work for later.

Then recently this thought restarted the haunt me, so I asked to my girlfriend: would you like to do this for me before we move to the other apartment? So naturally, she said yes 🙂 So I explained her how we would proceed to save some time and to archive the maximum number of information about these books.

I told her: “take this laptop and open FireFox. You notice the small “Z” icon in the lower right corner of the screen? This is Zotero; this software will save you much time to get the work done.” Naturally, she was dubious.

So I told her to go to Amazon.com, to get the ISBN of each book, to get to the Amazon web page of the book, and to click on the Zotero icon to save the information about the book. So, in 2 clicks, we were saving all the information describing each book: its title, its authors, its ISBN, its publisher, etc. It was taking about 30 seconds per book.

With that procedure, we archived all the information about the books in my library in about 5 hours.

Then I told me: fantastic, now I even have all this information in RDF, thanks to Zotero! I had to do something with that, so I put the Zotero RDF exportation file into a Virtuoso triple store. In less than a minute I had all the information about my books inside a triples store, ready to be queried with SPARQL.

Querying and browsing my book library

Linking books authors’ quotes to each book of the library

Then I wanted to know what the authors that wrote the books in my library already said. So I took the QuotationsBook.com quotes database and I linked it to the information I have about my library.

It is why you can read some quotes of Nietzsche at that web page. (note that I created the totally random “foaf:quote” property to add the quotes into the Zotero’s author resource)

Getting more information about the authors

Then, I needed to get more information about the authors I read. To get that additional information, I linked the information I have about the authors in the library with the dbpedia (rdf version of wikipedia) database.

The result is quite impressive. Go to the books/authors page. Then, click on the Nietzsches’s URI (rdf:#$kajXe; it’s the first line in the result table). Then click on the “Explore” link once the contextual window appears. From there, you will see a “sameAs” property, so click on the http://dbpedia.org… link. Then click on the “Get Data Set (Dereference)” link once the contextual window appears. That is it; you get all the information, available on Wikipedia, related to this author in my library.

Then, I know that Nietzsche is born 1844 and died in 1900, etc. So now, I can browse this new and enhanced dataset to know facts about authors I read in the past.

The idea here is to say that the author described by Zotero (in the exportation RDF file created by the software) is the same as the one in dbpedia. So, knowing that the entity defined in Zotero is the same as the one defined in dbpedia, tell us that the facts about the first and the later are true for both entities (because the reality is that both entities (different URIs) are the same).

Geographical data

From there, we can think about integrating the current data with any other type of datasets. One of them could be a geographical dataset such as Geonames.org.

In fact, we know that Nietzsche is death in the city of Weinar. So, if we link the goenames dataset with the dbpedia dataset (it is supposed to be done, but it seems that some things changed in the dbpedia dataset and that the links are no longer available; anyway, it can easily be done), we could have much more information about the place where Nietzsche is dead.

Conclusion

So, as you can see, in a couple of hours, I have been able to digitalize my library. Then, I have been able to get quotes by the author of each of my book. Then, I have also been able to get more information about each author I read.

This is really fantastic. That way, I only have to browse this new dataset to find new facts about authors and books that I didn’t know before, and that would have took me days to find (for my entire library). Thanks to the semantic web, everything has been possible in only a couple of hours.

We could push the experience even further and displaying on a map where the authors of my books are born. So, I could find where most of the authors I read in my life are born. Do I mostly read books wrote in Europe, United-States or Canada? Where a part of my knowledge came from? From where part of the World I have been influenced? Etc.

The Music Data Space

Kingsley is talking about Data Spaces since a long time. But what is a Data Space? Nothing is better than an example to understand something, so I will try to explain you with a single data space that has been created yesterday, the Music Data Space:

mbz_rdfview_uris.jpg

This is the Music Data Space. This Data Space contains information about musical things. These things are described mainly by using the Music Ontology, but also by using other ontologies like FOAF. Finally, things (musical things) belonging to this space are accessible, on the Web, via dereferencable URIs.

So, the Music Data Space is a place where all musical things are defined on the Semantic Web, and accessible via the Web.

That is it, and it is what we created last Monday.

Now, some of you could wonder: why on earth Amazon.com belongs to the Music Data Space?

Amazon.com also belongs to the Music Data Space too!

Amazon.com live in the Music Data space too via their API. In fact, a simple experience with the OpenLink RDF Browser clearly demonstrates that Amazon.com’s data belongs to the Music Data Space too.

Open the RDF Browser by following that link

Now you will visualize RDF information about an album called “Chore of Enchantment”. Take a look at this line:

amazon_asin: http://amazon.com/exec/obidos/ASIN/B00003XAA7/searchcom07-20

Click on the link to Amazon. A window should popup. Select the Get Data Set (dereference) option.

At this point, some magic will happens. In fact, the new information that is displayed in the RDF Browser is coming directly from Amazon.com’s web server.

This is why I assume that Amazon.com belong to the Music Data Space too.

In fact, the Virtuoso Sponger will connect to Amazon.com via their API to get some information about that album. It will convert the data into RDF and will display it to the user via the RDF browser’s interface.

One step further: the JPG file also belongs to the Music Data Space!

Yes! Information about the JPG file, hosted on Amazon.com’s web servers, also belong to the Music Data Space and there is the proof:

Open that same RDF Browser page by following that link

Click on the Image (JPG) representing the cover of this album. A window should popup. Select the Get Data Set (dereference) option.

Check the triples that have been created from this image. The Virtuoso Sponger downloaded the JPG file, it analyzed its header, RDFized everything and sent the information back to the RDF Browser so that the user can see the information available for that image.

Where is the end? I have no idea… probably at the same place where the imagination ends too.

Unifying everything

This is that simple. All data sources (relational databases, remote data accessible via APIs, native rdf data, etc.) are unified together via the Music Data Space. And this Music Data Space is accessible, via URI dereferencing, at http://zitgist.com/music/

Other Data Spaces available

Conclusion

The Music Data Space is the starting point and many other type of data spaces should emerge soon.

Browsing Musicbrainz’s dataset via URI dereferencing

Musicbrainz’s dataset can finally be browsed, node-by-node, using URI dereferencing.

What this mean?

Since the Musicbrainz relational database has been converted into RDF using the Music Ontology, all relations existing between Musicbrainz entities (an entity can be a Music Artist, a Band, an Album, a Track, etc.) are creating a musical relations graph. Each node of the graph is a resource and each arc is a property between two resources. Welcome in the World of RDF.

madonna-rdf-description.jpg

This means that from a resource “Madonna” we can browse the musical relations graph to find other entities such as Records, People, Bands, Etc.

Kingsley, inspired by Diana Ross, said: “URI Everything, and Everything is Cool!

This is cool! Now Diana Ross has her own URI on the semantic web: http://zitgist.com/music/artist/60d41417-feda-4734-bbbf-7dcc30e08a83

Paul McCarney:
http://zitgist.com/music/artist/ba550d0e-adac-4864-b88b-407cab5e76af

The Beatles:
http://zitgist.com/music/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d

Madonna:
http://zitgist.com/music/artist/79239441-bfd5-4981-a70c-55c3f15c1287

Have their own too!


URIs for Musical Things

These URIs are not only used to refer to Musicbrainz entities. In fact, these URIs are used to refer to any Musical Entities that you can describe using the Music Ontology. In a near future, the Musicbrainz data will be integrated along with data from Jamendo and Magnatune. In the future, we will be able to integrate any sort of musical data at the same place (radio stations data, user foaf profiles relations to musical things, etc.). So from a single source (http://zitgist.com/music/) all these different sources of musical data will be queriable at once.

mbz-magnatune-jemendo-rdf.jpg

URI schemes

The URI schemes are defined in the Musicbrainz Virtuoso RDF View:

  • http://zitgist.com/music/artist/*******
  • http://zitgist.com/music/artist/birth/*******
  • http://zitgist.com/music/artist/death/*******
  • http://zitgist.com/music/artist/simlink/*******
  • http://zitgist.com/music/record/*******
  • http://zitgist.com/music/performance/*******
  • http://zitgist.com/music/composition/*******
  • http://zitgist.com/music/musicalwork/*******
  • http://zitgist.com/music/sound/*******
  • http://zitgist.com/music/recording/*******
  • http://zitgist.com/music/signal/*******
  • http://zitgist.com/music/track/*******
  • http://zitgist.com/music/track/duration/*******

All these URI schemes terms refer to their Music Ontology classes’ descriptions.

Conclusion

I am getting closer and closer to the first goal I set to myself when I first started to write the Music Ontology. This first goal was to make the Musicbrainz relational database available in RDF on the Web. Months later and with the help of the Music Ontology Community (specially Yves Raimond that worked tirelessly on the project) and the OpenLink Software Inc. Team, we finally make this data available through URI dereferencing.

From there, we will build-up new music services, integrate more musical datasets into the Music Data Space, etc. It is just the beginning of something much bigger.

Free text search on Musicbrainz literals using Virtuoso RDF Views

I introduced a Virtuoso RDF View that maps the Musicbrainz relational database into RDF using the Music Ontology a couple of weeks ago. Now I will show some query examples evolving a special feature of these Virtuoso RDF Views: full text search on literals.

How RDF Views work

A Virtuoso RDF View can be seen as a layer between a relational database schemas and its conceptualization in RDF. The role of this layer is to convert relation data in its RDF conceptualization.

That is it. You can see it as a conversion tool or as a sort of lens to see RDF data out of relation data.

How full text search over literals works

Recently OpenLink Software introduced the full text feature of their Virtuoso’s SPARQL processor with the usage of the “bif:contains” operator (it is introduced into the SPARQL syntax like a FILTER).

When a user sends a SPARQL query using the bif:contains operator against a Virtuoso triple store, the parser will use the triple store’s full text index to perform the full text search over the queried literal.

With Virtuoso RDF View, instead of using the triple store’s full text index, it will use the relational database’s full text index (if the relational database is supporting full text indexes, naturally).

Some queries examples

In this section I will show you how the full text feature of the Virtuoso RDF Views can be used to increase the performance of a query against the Musicbrainz RDF View modeled using the Music Ontology

Note: if the system asks you for a login and a password to see the page, use the login name “demo” and the password “demo” to see the results of these SPARQL queries.

Example #1

A user remember that first name of the music artist is Paul, and he remember that one of the albums composed by this artists is Press Play. So this user wants to get the full name of this artist with the following SPARQL query:

sparql
define input:storage virtrdf:MBZROOT
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?artist_name ?album_title
FROM <http://musicbrainz.org/>
WHERE
{
?artist rdf:type mo:SoloMusicArtist .
?artist foaf:name ?artist_name .
?artist mo:creatorOf ?album .

?album rdf:type mo:Record .
?album dc:title ?album_title .

FILTER bif:contains(?artist_name, “Paul”) .
FILTER bif:contains(?album_title, “Press and Play”) .
};

Results of this query against the musicbrainz virtuoso rdf view

As you can notice with that query, the user will use the full text capabilities of Virtuoso over two different literals: the objects of these two properties foaf:name and dc:title.

Example #2

In this example, the user wants to know the name of the albums published by Madonna between 1990 and 2000. The answer to this question is returned by the following SPARQL query:

sparql
define input:storage virtrdf:MBZROOT
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dcterms: <http://purl.org/dc/terms/>
prefix dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?albums_titles ?creation_date
FROM <http://musicbrainz.org/>
WHERE
{
?madonna rdf:type mo:SoloMusicArtist .
?madonna foaf:name ?madonna_name .
FILTER bif:contains(?madonna_name, “Madonna”) .

?madonna mo:creatorOf ?albums .
?albums rdf:type mo:Record .
?albums dcterms:created ?creation_date .
FILTER ( xsd:dateTime(?creation_date) > “1990-01-01T00:00:00Z”^^xsd:dateTime ) .
FILTER ( xsd:dateTime(?creation_date) < “2000-01-01T00:00:00Z”^^xsd:dateTime ) .
?albums dc:title ?albums_titles .
};

Results of this query against the musicbrainz virtuoso rdf view

Here the user will use the full text capabilities of the Virtuoso RDF Views to find artists with the name Madonna and he uses two filters on xsd:dateTime objects to find the albums that have been created between 1990 and 2000.

Examples #3

In this last example, the user wants to know the name of the members of the music group U2.

sparql
define input:storage virtrdf:MBZROOT
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?band_name ?member_name
FROM <http://musicbrainz.org/>
WHERE
{
?band rdf:type mo:MusicGroup .
?band foaf:name ?band_name .
?band_name bif:contains ‘”U2″‘ .
?band foaf:member ?members .
?members rdf:type mo:SoloMusicArtist .
?members foaf:name ?member_name .
};

Results of this query against the musicbrainz virtuoso rdf view

Here the user will use the full text feature to get the name of the music group, then the name of the members related to this (these) music group(s) will be returned as well.

Special operators of a full text search

Some full texts operators can be used in the literal parameter of the bif:contains clause. The operators are the same used in the full text feature of Virtuoso’s relational database. A list and a description of the operators can be found on that page.

I would only add that the near operator is defined as +/- 100 chars from the searched literal. And the wildcard ‘*’ operator should at least be placed after the third character of the literal. So, “tes*t” or “tes*” or “test*” are legal usages of the wildcard operator, but “*test”, “t*” or “te*st” are illegal usages of the operator.

Conclusion

Finally, as you can see, the full text feature available with the Virtuoso RDF Views is a more than essential feature that people should use to increase the performance of their SPARQL queries. The only two other options they have are: (1) using a normal “literal” that as to be well written and with the good cases; in one word this option render such queries useless and (2) they can use a FILTER with a regular expression with the “I” parameter that is far too slow for normal usages.