Frederick Giasson – Page 38 – Machine Learning, Engineering & Data

Wikipedia concepts to help manage subjects described on the Semantic Web

January 22, 2007 Frederick Giasson

This article is an aggregation of thoughts I had while working on one of my recent semantic web projects. I have no idea what these ideas are worth, but I hope to open a discussion about the problem described below.

The problem

What is beautiful about the semantic web is that anyone can define anything about anything. I can define myself; I can define a project; I can define a car; etc.

What is so beautiful and powerful is also a huge problem, and probably the biggest weakness of the semantic web.

In fact, I can define myself with a FOAF profile. This FOAF profile describes relations about me. But nothing stops anybody else from defining other relations about me. One can describe good things about me, but one can describe bad things as well.

How does it happen?

Readers: you will need some understanding of semantic web concepts to read the rest of this article.

Basically, I can use any URI (Uniform Resource Identifier) to describe relations about me. But anybody can define new relations with that URI as well.

The problem arises if I get the RDF files from these two people and index them in the same triple store. I would then get many relations (defined by two different people) for the same subject: me.
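To make the situation concrete, here is a minimal sketch of what the index ends up holding, assuming each RDF file is reduced to a list of (subject, predicate, object) tuples. The URIs, file locations and the `ex:opinion` property are hypothetical and only serve the illustration; a real triple store would of course do much more.

```python
from collections import defaultdict

# Hypothetical URI identifying me, for illustration only.
ME = "http://example.org/people/fred#me"

# Each RDF file is reduced to (source URL, list of triples).
my_file = ("http://example.org/people/fred.rdf", [
    (ME, "foaf:name", "Fred"),
    (ME, "foaf:interest", "semantic web"),
])
bobs_file = ("http://example.net/bob/notes.rdf", [
    (ME, "foaf:name", "Freddy"),
    (ME, "ex:opinion", "unreliable"),  # hypothetical property
])

def index(files):
    """Merge every triple into one store, remembering which file stated it."""
    store = defaultdict(list)
    for source, triples in files:
        for subject, predicate, obj in triples:
            store[subject].append((predicate, obj, source))
    return store

store = index([my_file, bobs_file])

# Everything known about the same subject, whoever said it.
for predicate, obj, source in store[ME]:
    print(predicate, obj, "(from " + source + ")")
```

Once merged, the store cannot tell which statements I made myself unless it keeps the source of each triple, which is why the sketch records it.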

Do you see what the situation is starting to look like?

Same as Wikipedia

Yeah, the problem seems to be the same as Wikipedia’s. In fact it is the same thing, except that Wikipedia is centralized on a single infrastructure, whereas the semantic web is decentralized over the Web.

Since we can’t stop people from using a URI to describe the thing it identifies, we have to find another solution.

Some possible solutions

Possible solutions exist. For example, people indexing these documents could index only URIs that are dereferenceable (that is, URIs I can look up over HTTP to get an RDF file about the identified resource).

That way, they would prevent Bob from defining anything about my URI from a web server other than my own.

A variant of that solution would be to get documents only from a list of trusted sources (like most of the current memetrackers do).

However, all these solutions reduce the power of RDF: defining anything about anything. That power means that, if I want, I can define everything about my dog “bud” even if he is not resolvable over HTTP.
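A rough sketch of that restriction, under a simplifying assumption: instead of actually dereferencing the URI over HTTP, the indexer only keeps a triple when the subject URI and the document stating it live on the same host. The function names, URIs and the `ex:opinion` property are hypothetical.

```python
from urllib.parse import urlsplit

def same_authority(subject_uri, document_url):
    """True when the subject URI and the document share the same host."""
    return urlsplit(subject_uri).hostname == urlsplit(document_url).hostname

def accepted_triples(document_url, triples):
    """Keep only the triples whose subject is hosted where the document is."""
    return [t for t in triples if same_authority(t[0], document_url)]

# Hypothetical data: Bob's file talks about my URI and about his own.
ME = "http://example.org/people/fred#me"
bobs_triples = [
    (ME, "ex:opinion", "unreliable"),
    ("http://example.net/bob#me", "foaf:name", "Bob"),
]

kept = accepted_triples("http://example.net/bob/notes.rdf", bobs_triples)
print(kept)
```

Only Bob's statement about his own URI survives; his statement about mine is dropped because it was not published from my server.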

Another solution: a Wikipedia-like supervising authority

One solution could be to create an authority that would supervise the evolution of the description of URIs in the semantic web.

You should picture the interface as being like Wikipedia’s, except that we would use URIs instead of WikiWords to define things.

So people could register with that “authority”. They could define things about these URIs. Conflicted URIs could be tagged as such. Discussions about URI descriptions could be opened, etc.

How could such a system be used as a solution to the problem?

The authority could create an ontology to define that “meta-information” about URIs. It would then take all the information from the service and publish it in RDF using the ontology.

From that point, everybody who displays information about URIs could add the “meta-information” about those URIs to their results. That way, their users would see whether false information is included in the results, whether the things defined for the URI (the subject) are conflicted, etc.
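As an illustration only, a system displaying results could attach the authority's status to each row before showing it. No such authority or ontology exists yet, so the status values (`conflicted`, `unreviewed`), the URIs and the function name below are all assumptions.

```python
# Hypothetical statuses published by the authority, keyed by subject URI.
authority_status = {
    "http://example.org/people/fred#me": "conflicted",
}

def annotate(results, status_by_uri, default="unreviewed"):
    """Attach the authority's status for the subject to every result row."""
    return [
        {
            "subject": subject,
            "predicate": predicate,
            "object": obj,
            "status": status_by_uri.get(subject, default),
        }
        for subject, predicate, obj in results
    ]

# Hypothetical query results coming out of the triple store.
results = [
    ("http://example.org/people/fred#me", "foaf:name", "Fred"),
    ("http://example.org/dogs/bud", "foaf:name", "Bud"),
]

annotated = annotate(results, authority_status)
for row in annotated:
    print(row["subject"], row["predicate"], row["object"], "[" + row["status"] + "]")
```

The triples themselves are left untouched; the user simply sees, next to each one, what the authority says about its subject.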

Conclusion

For semantic web developers, this solution is as simple as indexing RDF data into a triple store: (1) download, (2) index, (3) query and (4) display.

This solution is not perfect, but it could help developers display meaningful information to the users of their systems while keeping the descriptive power of RDF.

Reaching at least 600 000 people with 19 contacts

January 21, 2007 (updated January 22, 2007) Frederick Giasson

Sergio Fernández and Iván Frade recently started a really interesting experiment called Futil. It is a small computer program that takes Sergio’s FOAF profile as the seed to discover new people through his relations (the friend of a friend of a friend, etc.). The experiment is to discover how many people you can find starting only from people you know. So far, Sergio’s Futil program has found about 600 000 people. I guess that it should discover around 2 500 000 people before it finishes.

The experiment is quite interesting in many ways. It gives some insight into how people are connected together and, even more important in today’s Web, how communities of users interact with the Web.

The graph below shows how Futil is discovering these profiles. The Y-axis represents the number of people in the pool that it still has to fetch from the Web, and the X-axis is the number of profiles it has fetched so far.

The first 50 000 people Futil found came from different places on the Web: a personal web page, the web page of an organization, etc. Then, eventually, Futil found a couple of links to people belonging to an online community called Tribe. People in that community only link to other people in the same community. What is interesting is that as soon as Futil started to crawl a couple of people from that community, it eventually found all 200 000 people belonging to it. The same thing is now happening with another, much bigger community called Livejournal, with about 2 million users.

Why did Futil only crawl people from the same community? The answer is easy: because these communities are closed. They don’t interact with the rest of the Web, so a user can only link to other users of the community.
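The behaviour described above can be reproduced with a small breadth-first crawl over foaf:knows links. This is only a sketch of the general technique, not Futil's actual code; the people and the tiny graph are made up for the illustration.

```python
from collections import deque

def crawl(knows, seed):
    """Breadth-first walk over foaf:knows links, starting from one profile."""
    seen = {seed}
    queue = deque([seed])  # the pool of profiles still to fetch
    while queue:
        person = queue.popleft()
        for friend in knows.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen

# Hypothetical graph: each key links to the people it says it knows.
knows = {
    "sergio": ["ana", "tribe:1"],
    "ana": ["sergio"],
    "tribe:1": ["tribe:2"],             # members of a closed community
    "tribe:2": ["tribe:3", "tribe:1"],  # only link to each other
    "tribe:3": ["tribe:1"],
    "lj:1": ["lj:2"],                   # never reached: nobody links in
}

found = crawl(knows, "sergio")
print(sorted(found))
```

A single link into the closed community is enough for the crawler to find all of its members, while a community that nobody links into stays invisible.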

How can a community be opened so that its users can interact with users of other online communities?

A first step would be to let people describe their relationships with other people outside of their community.

One example of such an online community is Talk Digger. This system lets its users import (and sync) their FOAF profile from another location on the Web. It also lets its users define their relationships with people outside of the community. For example, a user can say that he knows people X and Y on Talk Digger; but he can also specify that he knows a person Z from outside of the community, or from another online community.

In fact, if other online communities added such a feature to their systems, inter-community communication and relationships would become possible.

You can read an old blog post that explains how Talk Digger is handling FOAF profiles.

Why should online community systems open themselves?

Why will a user use one online community and not another? It depends; I would say that it mainly depends on: the topic(s) of the community, the people he knows in that community, and the user interface of that community (after all, one interface doesn’t work for everybody).

So, why shouldn’t online communities let their users interact with users of other online communities?

I think it is an error caused by the fear of losing users, and it explains why Futil behaved that way: current online communities don’t let their users interact with people from outside of the community.

Futil is pinging Pingthesemanticweb.com as well

Each time Futil discovers a new FOAF profile, it pings Pingthesemanticweb.com. So far it has pinged about 300 000 new FOAF profiles. It is a good example of how this semantic web pinging service can be used.

Now, everybody has access to these new FOAF files. The best thing would be for such online communities (like Tribe.net and Livejournal.com) to ping the service each time there is a new user, or each time a user updates his profile. But in the meantime, independent crawlers such as Futil do the job very well.

Conclusion

What I wish for now is that future online communities start to let their users interact with users from other communities. A good start in that direction would be to let them describe their relationships not only with people of the same community, but also with people from outside of the community. Meta-communities should then start to emerge.

Major revision (1.01) of the Music Ontology

January 6, 2007 (updated January 8, 2007) Frederick Giasson

This is a draft of the Music Ontology revision 1.01. I took all the proposals posted on the mailing list since I released revision 1.00 and wrote this new revision from them.

The ontology took a major shift through its deep integration with the FRBR (Functional Requirements for Bibliographic Records) and FOAF (Friend Of A Friend) ontologies.

As you will see in the class schemas below, all the MO classes related to music are subclasses of FRBR classes. The FRBR ontology is used as the foundation of musical works. So, as you will see, mo:Album is a subclass of mo:MusicalWork, and this class is a subclass of the frbr:Work class. This means that an album is ultimately a work in the sense of the FRBR ontology. The FRBR ontology is best understood by reading the Functional Requirements for Bibliographic Records – Final Report.
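The reasoning that makes an album a FRBR work is simply a walk up the subclass chain. Here is a small sketch of that walk, assuming for simplicity that each class has a single direct super-class; only the three classes named above are included, and the function name is hypothetical.

```python
# Direct subclass relations stated in this draft (child -> parent).
SUBCLASS_OF = {
    "mo:Album": "mo:MusicalWork",
    "mo:MusicalWork": "frbr:Work",
}

def superclasses(cls, subclass_of=SUBCLASS_OF):
    """Return every class reachable by following subclass links upward."""
    chain = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain

print(superclasses("mo:Album"))
```

Anything typed as a mo:Album can therefore also be treated as a mo:MusicalWork and as a frbr:Work by a system that follows these links.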

In this revision of the MO ontology, you should read these terms as:

Work: An abstract notion of an artistic or intellectual creation.

Musical work: Distinct intellectual or artistic musical creation.

Expression: A realization of a single work usually in a physical form.

Musical expression: Intellectual or artistic realization of a musical work.

Manifestation: The physical embodiment of one or more expressions.

Musical manifestation: Physical embodiment of an expression of a musical work.

Item: An exemplar of a single manifestation.

Corporate body: Organization or group of individuals and/or other organizations involved in the music market.

Music Ontology Classes Hierarchy

Here are the schemas of the Music Ontology class hierarchy. The pink bubbles are super-classes from external ontologies, green bubbles are super-classes of the MO ontology and blue bubbles are subclasses of the MO ontology.

Change log

Here are the changes I made to revision 1.00 of the MO ontology. All the changes have been written into the RDFS document describing the ontology. Please read it to see what changed for each class and property. Also note that I cleaned up and redefined all the definitions in relation to the FRBR and FOAF ontologies.

I outline the changes below:

Suppressed class mo:Type

Suppressed class mo:Other

These classes were useless and confusing.

Suppressed class mo:EP

Suppressed class mo:Longplay

Suppressed class mo:Single

I deleted these three classes because they are all albums with more or fewer tracks. Since they don’t add anything to the semantics of the ontology, I chose to remove them to focus the ontology. One should now describe each of these types of release as a mo:Album.

Changed the name of mo:Status to mo:ReleaseStatus

The class has a better meaning that way since the subclasses of mo:Status were in fact release statuses.

Added class mo:Genre

Added class mo:Classical

Added class mo:Rock

Added class mo:Jazz

Added class mo:World

Added class mo:Hiphop

Added class mo:Country

Added class mo:Blues

Added class mo:Electronica

Added class mo:Gospel

Added class mo:Funk

Added class mo:Pop

Added class mo:Melodic

Added class mo:Reggae

These classes handle a first level of musical genres. One can now extend the ontology by creating new genres and/or subgenres of the existing genres.

Added class mo:Instrument

Added class mo:String

Added class mo:Woodwind

Added class mo:Brass

Added class mo:Percussion

Added class mo:Keyboard

Added class mo:Digital

These classes handle a first level of instrumentation. One can now extend the ontology by creating classes for instruments belonging to these instrument categories.

Added class mo:MusicalManifestation

Added class mo:Dat

Added class mo:Dcc

Added class mo:Cd

Added class mo:Md

Added class mo:Dvda

Added class mo:Sacd

Added class mo:Vinyl

Added class mo:Megnetictape

Added class mo:Steam

These classes handle a first set of possible mediums of musical recording and distribution technologies.

Added class mo:MusicalWork

Added class mo:MusicalExpression

Added class mo:MusicalManifestation

Created subclasses of frbr:Work, frbr:Expression and frbr:Manifestation specifically to express musical works.

Added class mo:SoloMusicArtist

Added class mo:MusicGroup

Added class mo:CorporateBody

Added class mo:Label

These classes describe musical people, groups of people or corporate bodies in relation to the FOAF ontology. That way, the FOAF, BIO, etc. ontologies can now be used to define an artist, a group of artists, a corporate body and their relationships.

Changed the name of mo:Artist to mo:MusicArtist

Clarified the semantics of mo:Artist by renaming it to mo:MusicArtist, to make explicit the fact that the ontology talks about musicians (and not writers, etc.).

Added property mo:possess_item

Added property mo:want_item

Added property mo:sell_item

Added property mo:exchange_item

These new properties let people express, directly in their FOAF profile, that they possess an exemplar of a musical manifestation, that they want to sell or exchange it, or that they want an exemplar of a musical manifestation.

These properties will be quite useful to let people trade CDs, for example. See below for an example of how this will be implemented.

Added property mo:instrument

Links a person, a group of persons, a musical work, the expression of a musical work, the manifestation of a musical work or an exemplar of a manifestation to a musical instrument.

Added property mo:key

Added property mo:timber

Added property mo:pitch

Added property mo:lyric

Added expressiveness to describe mo:Track.

Added property mo:encoding

Added property mo:stream_url

Suppressed property mo:miscellaneoused

Suppressed property mo:performance_name

Deleted useless properties.

Finally I changed the domain and the range of most of the existing properties to reflect the changes in the way classes now work (the FRBR and FOAF ontologies).

An example of how the new mo:has_item, mo:sell_item, mo:exchange_item and mo:want_item properties work

These new properties are quite interesting since they will enable people to trade music using their FOAF profile. Here is an example of how these properties should be used. You have my FOAF profile, the properties that make explicit the fact that I have and want to sell or exchange some musical albums, and the description of these albums.

RDF document demonstrating the integration of the MO and FOAF ontologies

The resulting graph can be queried with SPARQL queries like:

This query will return the names of the people selling the album “Kill ’em All”.

PREFIX mo:      <http://purl.org/ontology/mo/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>        
SELECT ?name
WHERE 
{
    ?seller a foaf:Person;
            mo:sell_item <http://mm.Music.org/album/a89e1d92-5381-4dab-ba51-733137d0e431>;
            foaf:name ?name.
}

But what if someone doesn’t know the resource that defines an album, or if the album is defined by more than one resource?

This query will return the names of the people selling albums that have the word “Love” in their title.

PREFIX mo:      <http://purl.org/ontology/mo/>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>        
SELECT ?name ?title
WHERE 
{
    ?seller a foaf:Person;
            mo:sell_item ?album;
            foaf:name ?name.

    ?album a mo:Album;
           rdfs:label ?title.
    FILTER regex(?title, "love", "i")
}

Would you like to make people happy? Then you can easily get the names of the people selling an album along with the names of the people who want that album.


PREFIX mo:      <http://purl.org/ontology/mo/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>        
SELECT ?name_seller ?name_buyer
WHERE 
{
    {
        ?seller a foaf:Person;
                foaf:name ?name_seller;
                mo:sell_item <http://mm.Music.org/album/a89e1d92-5381-4dab-ba51-733137d0e431>.
    }
    UNION
    {
        ?seller a foaf:Person;
                foaf:name ?name_seller;
                mo:exchange_item <http://mm.Music.org/album/a89e1d92-5381-4dab-ba51-733137d0e431>.
    }
    UNION
    {    
        ?buyer a foaf:Person;
               foaf:name ?name_buyer;
               mo:want_item <http://mm.Music.org/album/a89e1d92-5381-4dab-ba51-733137d0e431>.
    }
}
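To show what a trading application could do with such results, here is a minimal sketch that pairs sellers with buyers over a handful of in-memory triples. The people (`ex:fred`, `ex:anna`, `ex:bob`) and the helper name are hypothetical; only the album URI and the property names come from the queries above.

```python
ALBUM = "http://mm.Music.org/album/a89e1d92-5381-4dab-ba51-733137d0e431"

# Hypothetical statements taken from three FOAF profiles.
triples = [
    ("ex:fred", "foaf:name", "Fred"),
    ("ex:fred", "mo:sell_item", ALBUM),
    ("ex:anna", "foaf:name", "Anna"),
    ("ex:anna", "mo:exchange_item", ALBUM),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "mo:want_item", ALBUM),
]

def names_with(triples, predicates, album):
    """Names of the people linked to the album by one of the predicates."""
    names = {s: o for s, p, o in triples if p == "foaf:name"}
    return sorted(
        names[s]
        for s, p, o in triples
        if p in predicates and o == album and s in names
    )

sellers = names_with(triples, {"mo:sell_item", "mo:exchange_item"}, ALBUM)
buyers = names_with(triples, {"mo:want_item"}, ALBUM)

# Every seller can be proposed to every buyer of the same album.
pairs = [(seller, buyer) for seller in sellers for buyer in buyers]
print(pairs)
```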

Finally… do what you want 🙂

Conclusion

Please take the time to review this draft if you are interested in the ontology. I will wait a few days before rewriting the documentation of the ontology, to make sure that people agree with the draft and that there is no major error.

First round of revisions for the Music Ontology

December 22, 2006 Frederick Giasson

After only one day, 6 people are already registered on the mailing list of the Music Ontology and many suggestions have already been made.

Considering all the enthusiasm this new Music Ontology seems to generate, and considering all the feedback I am getting from many knowledgeable people, I chose to publish a first list of possible revisions to the ontology.

You can take a look at these proposals on the mailing list. If you have any thoughts about them or have new ones, please subscribe to the group and share them with us via the mailing list.

More on the evolution of the ontology later.

The Music Ontology: a new ontology based on the MusicBrainz project

December 21, 2006 (updated March 16, 2008) Frederick Giasson

The Internet changed the music industry. At first, sharing systems like Napster allowed people to share any song they had on their computer with millions of other people. That new reality changed the music industry’s landscape for good, and many legal battles followed. However, a bigger change followed a couple of years later. Communities like MySpace started to appear. With millions of regular users, such communities helped garage bands and obscure musicians to create their musical niche: the long tail of the music industry.

This second change is more profound than the first one: now any musician has the possibility to reach their audience by sharing their work on the Web. In the meantime, a free database called MusicBrainz, archiving millions of relations between artists, albums and tracks, appeared; music recommendation services like Pandora started to appear; and Apple started to sell individual tracks for $1 with iTunes.

At that point, the music industry of the eighties, led by blockbusters, had completely changed.

Introduction

I am pleased to announce the publication of a new Music Ontology Specification. I spent the last few days writing it, with the aim of describing the new MusicBrainz metadatabase structure using RDF and, ultimately, of writing a specification that any music content creator/publisher could use to export the data they generate.

Please let me know if you find any errors in this new ontology, or if you have any suggestions to enhance it or any comments.

You can leave comments/suggestions on this blog post or on the related Google Group: http://groups.google.com/group/music-ontology-specification-group

The Music Ontology

The Music Ontology is an attempt to link all the information about musical Artists, Albums and Tracks together: from MusicBrainz to MySpace. The goal is to express all relations between musical information to help people find anything about music and musicians. It is based on the use of machine-readable information provided by any web site or web service on the Web.

Why another music ontology?

Leigh Dodds wrote an ontology based on MusicBrainz about 3 years ago called the MusicBrainz Metadata Vocabulary. At that time, the MusicBrainz database was not as developed as the one available today.

For that reason, I chose to write a new ontology, also based on the MusicBrainz project as a source of information about music. I developed this new ontology with three goals in mind:

I needed to stay as close as possible to the MusicBrainz database.
I needed to reuse the basic principles of the MusicBrainz Metadata Vocabulary.
I needed, at the same time, to develop a music ontology that people could use in their systems (MySpace, Pandora, blogs, etc.) and not just in conjunction with the MusicBrainz relational database.

The first goal explains why this new ontology is so influenced by the MusicBrainz database. In fact, most of the classes and properties came from the relations described in the database, and most of the descriptions of these relations came from the wiki of the project.

The second goal explains why the basic classes of the Music Ontology are the same as the ones in the MusicBrainz Metadata Vocabulary.

The third goal explains why the name and the namespace of the MusicBrainz Metadata Vocabulary have been changed.

What next?

From that point, I will export an RDF version of the MusicBrainz ontology using this new music ontology. Then I’ll index this new RDF data into the triple store, based on Ping the Semantic Web, that I talked about a couple of weeks ago (a first version should be released soon, by the way).

After that, people will be able to query the MusicBrainz ontology using the SPARQL endpoint. As I showed in the specification with a couple of SPARQL queries, people will have many more ways to query the database to answer their questions about music.

More Information

For more information please read the entire Music Ontology Specification.

The ontology has 19 classes and 58 properties.

The namespace of the ontology is http://purl.org/ontology/mo and the prefix I suggest using is “mo”.