Music Ontology revision 1.10: introduction of musical performances as events in time

I am pleased to publish revision 1.10 of the Music Ontology. As you can see in the change log, many things have changed in this revision. In fact, all the ambiguities people noticed in the previous revisions should now be fixed. Yves Raimond worked hard to integrate the Time Ontology, along with its Event and Timeline ontologies, into the Music Ontology. The result is a major step forward for the Music Ontology: different levels of expressiveness are now available through the ontology.

In fact, one can describe simple things such as MusicBrainz data and its relations, or something as specific as a gig recorded on a cell phone by two different people and published on the Web. Take a look at the updated examples to see what each of the three levels of expressiveness can describe.

As you can see, all the examples are now expressed in both N3 and RDF/XML, and they are classified into three levels of descriptiveness. Most people will describe musical things using the first level of expressiveness; however, closed and specialized systems will be able to express everything they want about music using the Music Ontology (check the level 2 and level 3 examples for a good illustration of this new expressive power).
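To give a flavour of this event-based modelling, here is a minimal N3 sketch of a gig described as an event located in time and space. It is only an illustration: the mo:Performance class, the namespace URIs and the exact Timeline Ontology properties used here are my assumptions and may differ from the terms used in the published revision 1.10 examples.

    @prefix mo:    <http://purl.org/ontology/mo/> .
    @prefix event: <http://purl.org/NET/c4dm/event.owl#> .
    @prefix tl:    <http://purl.org/NET/c4dm/timeline.owl#> .
    @prefix foaf:  <http://xmlns.com/foaf/0.1/> .
    @prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
    @prefix :      <http://example.org/gig#> .

    # A performance described as an event located in time and space.
    :gig a mo:Performance ;
        event:agent :the_band ;          # who performed
        event:place :the_venue ;         # where it happened (a venue resource, not described here)
        event:time [
            a tl:Interval ;
            tl:beginsAtDateTime "2007-02-10T21:00:00Z"^^xsd:dateTime ;   # when it started
            tl:durationXSD "PT2H0M0S"^^xsd:duration                      # how long it lasted
        ] .

    :the_band a foaf:Group ;
        foaf:name "An example band" .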

One of the last things we have to fix is how genres should be handled. Right now we are typing individuals with their genre. However, considering that genres evolve and change really quickly, and that they are strongly influenced by cultures, a suggestion has been made to create individuals out of the class mo:Genre and then describe them. We could also create a mo:subGenre property (domain: mo:Genre; range: mo:Genre) that would relate a genre to its sub-genre(s).

This idea is really great and would probably be the best way to describe genres, considering their “volatile” meaning over time. However, the question is: how do we link a mo:MusicalWork, mo:MusicalExpression, mo:MusicalManifestation or mo:Sound to its genre? If we create a mo:genre property (domain: mo:MusicalWork, etc.; range: rdfs:Resource), then people could use that property to link a MusicalWork, etc. to anything at all (anything that is a resource). Personally, I do not think it is necessarily a good idea to introduce such an unrestricted property into the ontology.

Note that this is not the same thing as the event:hasFactor property, since anything can be a factor of a musical event.
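To make the suggestion more concrete, here is a small N3 sketch of what genres-as-individuals could look like. Note that mo:subGenre and mo:genre are exactly the properties under discussion here, not terms of the published ontology, and all the URIs are invented for the example.

    @prefix mo:   <http://purl.org/ontology/mo/> .
    @prefix dc:   <http://purl.org/dc/elements/1.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix :     <http://example.org/genres#> .

    # Genres as individuals of mo:Genre, related by the proposed mo:subGenre property.
    :rock a mo:Genre ;
        rdfs:label "Rock" ;
        mo:subGenre :punk_rock .                 # proposed, not yet in the ontology

    :punk_rock a mo:Genre ;
        rdfs:label "Punk rock" ;
        rdfs:comment "A genre whose description can evolve over time." .

    # The open question: how a work points at its genre. With a range of mo:Genre the
    # property stays restricted; with a range of rdfs:Resource it could point at anything.
    :some_work a mo:MusicalWork ;
        dc:title "An example work" ;
        mo:genre :punk_rock .                    # proposed property, range under discussion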

Now that the ontology is becoming pretty stable, the next step is to start using it to describe things related to music. The first step will be to convert MusicBrainz's data into RDF using the Music Ontology. Soon enough I should make available an RDF dump of this data, along with Virtuoso PL code that will enable people to re-create this RDF dump in Virtuoso from a linked instance of a MusicBrainz PostgreSQL database.
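For illustration, a single (entirely hypothetical) MusicBrainz entry re-expressed with Music Ontology terms might look roughly like this; the identifiers are invented and the exact classes and properties the conversion will use may differ:

    @prefix mo:   <http://purl.org/ontology/mo/> .
    @prefix dc:   <http://purl.org/dc/elements/1.1/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix :     <http://example.org/musicbrainz/> .

    # One hypothetical MusicBrainz entry re-expressed with Music Ontology terms.
    :artist_0001 a mo:MusicArtist ;
        foaf:name "Example Artist" .

    :release_0002 a mo:MusicalManifestation ;    # the published record
        dc:title "Example Album" ;
        foaf:maker :artist_0001 ;
        mo:track :track_0003 .

    :track_0003 a mo:Track ;
        dc:title "Example Track" ;
        mo:track_number "1" .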

Finally, I would like to give special thanks to Yves for his hard work and involvement in the publication of this new revision of the Music Ontology.


Zitgist: a semantic web search engine

Recently I started to talk about the project I am currently working on, and people were wondering what it was: “Hey, Fred, what is Zitgist about?” – “I heard a couple of things but I can’t find any information about it” – etc. So I started to give away some information about Zitgist (pronounced “zeitgeist”): what it is, what its goals are, and so on.

Zitgist is basically a semantic web search engine. Some people may be wondering: what is a semantic web search engine? How does it differ from more traditional search engines such as Google, Yahoo! or MSN Search? It differs only in the information it aggregates, indexes and uses to answer users’ queries. Instead of using human-readable documents such as HTML, PDF or DOC files, Zitgist will use semantic web documents (RDF).

The defining characteristic of these documents is that they describe things. In fact, they can describe anything: a person (their interests, their relations with other people, etc.), objects such as books and music CDs, projects (software projects, architectural projects, etc.), geographical locations such as countries, cities and mountains, and so on. With semantic web documents, one can describe virtually anything.
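For example, a tiny semantic web document describing a (fictional) person in N3 could look like this; the URIs are invented for the example:

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix :     <http://example.org/people#> .

    # A small semantic web document describing a fictional person.
    :jane a foaf:Person ;
        foaf:name "Jane Example" ;
        foaf:interest <http://en.wikipedia.org/wiki/Writing> ;   # a page about one of her interests
        foaf:based_near <http://sws.geonames.org/2643743/> ;     # a Geonames URI (assumed here to be London)
        foaf:knows :paul .

    :paul a foaf:Person ;
        foaf:name "Paul Example" .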

Since Zitgist is aware of the meaning of all these descriptions, powerful queries can be sent by users to let them find what they want. By example, a Zitgist user could send queries such as:

  • Give me the names of the people who are interested in writing and live near London.
  • Give me the names of the groups (bands, organizations, etc.) that have Brian Smith as a member.
  • Give me the names of the software projects programmed in C++ that run on Linux or Windows.
  • Give me the names of the discussion forums related to cooking.
  • Give me the names of the cities in the UK that have more than 150,000 people.
  • Give me the names of the documents whose topic is a person named Paul.
  • Etc.

Note: these queries are not built using natural language (full sentences), but with an easy-to-use interface that helps users build the queries they want.
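To show the kind of RDF descriptions such queries would match, here is a hypothetical snippet covering the “group” and “city” queries above; the vocabulary choices (FOAF and the Geonames ontology) and all the data are mine, for illustration only:

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix gn:   <http://www.geonames.org/ontology#> .
    @prefix :     <http://example.org/data#> .

    # A group with a named member, and a populated place: the kind of descriptions
    # that the "group" and "city" queries above would match.
    :acme_collective a foaf:Group ;
        foaf:name "ACME Collective" ;
        foaf:member [ a foaf:Person ; foaf:name "Brian Smith" ] .

    :exampleton a gn:Feature ;
        gn:name "Exampleton" ;
        gn:population "172000" .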

Once a user has built and sent a query, the search engine will return results matching the criteria. If the user then clicks on a result that interests them, they will be redirected to Zitgist’s semantic web browser. This interface displays the semantic web documents known to Zitgist.

This is what is interesting about Zitgist. Since semantic web documents are intended to be used by machines, humans can’t easily read them. The semantic web browser is a user interface that displays the information held in these documents so that humans can easily and intuitively read and understand it. From a single result (say, a person), a user will be able to surf the semantic web the way they surf the Web: from link to link. That way, users will carry over the same behaviours to the semantic web that they already have on the current Web.

The main goal with Zitgist is to create a semantic web search engine for users who don’t even know the semantic web exists. A user shouldn’t have to be aware of the concepts supporting the semantic web in order to use it. Their experience should be as close as possible to the one they currently have with the Web and the search engines they use daily.

What is the current state of Zitgist? Everything I mentioned above is working. A first private version should be released in the coming months, and some demos will be given to the semantic web community as well. Slowly, over the following months, more and more will be rolled out to the public. However, be aware that I am not trying to hype the project here. I am proceeding this way to make sure the supporting architecture gives the best experience to all users. For this, we have to scale the architecture gradually, so that too many users do not make the service unusable.

In the next week, I should give more technical information and write documentation about how web developers can make their data available to optimize its indexing in Zitgist. Best-practice documents describing how web developers should create their semantic web documents will eventually be put online.

Zitgist LLC is a company founded by me and OpenLink Software Inc. I would like to specially thank the OpenLink development team, who give me all the support I need to develop a first working version of this new search engine. I wouldn’t be able to write about Zitgist today without them.


Revision 1.03 of the Music Ontology

I just published revision 1.03 of the Music Ontology. You will notice the addition of mo:Movement, mo:has_movement, mo:movementNum and mo:tempo. Half of the modifications were made to enhance the descriptiveness of classical music; the other half were made to make the distinction between production (MusicalExpression) and publication (MusicalManifestation) clearer.
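As a rough illustration of the new classical-music terms, a work with a movement might be described like this in N3; the URIs, literals and exact usage of the properties are my assumptions, not an official example:

    @prefix mo: <http://purl.org/ontology/mo/> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix :   <http://example.org/classical#> .

    # A classical work and one of its movements, using the new terms.
    :symphony_5 a mo:MusicalWork ;
        dc:title "Symphony No. 5" ;
        mo:opus "op. 67" ;
        mo:key [ a <http://purl.org/NET/c4dm/music.owl#Key> ] ;   # a Key resource rather than a literal
        mo:has_movement :first_movement .

    :first_movement a mo:Movement ;
        dc:title "Allegro con brio" ;
        mo:movementNum "1" ;
        mo:tempo "Allegro con brio" .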

All these modifications have been made possible by Ivan Herman and Yves Raimond, and I would especially like to thank them for the insightful comments and suggestions they made on the mailing list.

Here is the change log for revision 1.03:

  • Changed the range of mo:key from rdfs:Literal to http://purl.org/NET/c4dm/music.owl#Key
  • Added property mo:opus
  • Added mo:MusicalWork to the domain of mo:bpm, mo:duration, mo:key and mo:pitch
  • Added class mo:Movement
  • Added property mo:has_movement
  • Added property mo:movementNum
  • Added property mo:tempo
  • Added mo:Instrument to the range of mo:title
  • Removed mo:MusicalWork and mo:MusicalManifestation and added mo:MusicalManifestation to the domain of the properties mo:publisher, mo:producer, mo:engineer, mo:conductor, mo:arranger, mo:sampler and mo:compiler
  • Removed mo:MusicalWork and mo:MusicalManifestation and added mo:MusicalManifestation to the range of the properties mo:published, mo:produced, mo:engineering, mo:conducted, mo:arranged, mo:sampled and mo:compiled

 


Distribution of semantic web data

Three days ago I talked about the importance of both RDF data dumps and dereferenceable URIs for distributing RDF data over the Web. However, yesterday Marc from Geonames.org had some problems with an impolite semantic web crawler. In his article he points out that:

“It simply does not make sense to download a huge database record by record if a full dump is available.”

In an ideal world it wouldn’t make sense, but unfortunately this is how the Web has always worked. Think about Google, Yahoo! and MSN Search: this is exactly what they do, and it doesn’t make much sense either. The difference is that they are probably more polite. Marc did the only thing he had to do: ban the belligerent crawler.

The problem with data dumps is that they are generally not that easy to find (when available at all) on a service’s web site. So some developers will not take the time to find them and will instead fetch everything from the web server, page by page.

However, this whole story raises a question: how could we make these data dumps more visible? Ecademy.com uses a <link> element on their home page to link to a dump of the URLs of their FOAF profiles. However, if you don’t look at the HTML code of the page, you will never be aware of it. A first step would probably be to create a repository of these data dumps.

The SWEO Community Project started the “Linking Open Data on the Semantic Web” project, which is basically a list of RDF dumps from different web sites and projects.

Personally, what I will do to help people find these RDF dumps (and to make them aware of their existence) is to create a repository of them on Pingthesemanticweb.com (it should be available later this week).

That way, developers using Pingthesemanticweb.com will probably check that list and download the data they need. After that, they will only use PTSW to sync their triple store with the remote service’s database (Geonames.org, for example).


RDF dump vs. dereferencable URIs

In a recent mail thread, someone asked what is the best way to get RDF data from a source [with more than a couple of thousand documents]: an RDF dump or a list of dereferenceable URIs?

Neither is better than the other. Personally, what I prefer is to use both.

If we take the example of Geonames.org, getting all 6.4 million RDF documents from dereferenceable URIs would take weeks. On the other hand, updating your triple store with new or updated RDF documents via an RDF dump would force you to download and re-index it completely every month (or so), a task that would take some days.

So what is the best way then? Here is what I propose (and currently do):

The first time I indexed Geonames into a triple store, I requested an RDF dump from Marc. Then I asked him: would it be possible for you to ping Pingthesemanticweb.com each time a new document appears on Geonames, or each time a document is updated? In less than a couple of hours he answered my mail, and Geonames was pinging PTSW.

What does this mean? It means that I populated my triple store with Geonames data from an RDF dump the first time, which saved me one to two weeks of work. I am now updating the triple store via Pingthesemanticweb.com, which saves me two or three days each month.

So what I suggest is to use both methods. The important point here is that Pingthesemanticweb.com acts as an agent that sends you the new and updated files for a specific service (Geonames in the above example). This simple infrastructure could save precious time for many semantic web developers.
