Archive for the 'Web' Category

Content negotiation: bad use cases I recently observed

Print This Post Print This Post

Given the current projects I am working on, I daily see misuse of content-negotiation methodology, particularly the misuse of the Accept and Content-type HTTP header parameters.

As you will see bellow, I came across many misuse of these HTTP header parameters: potentially by their misunderstanding, or simply by forgetting to set them properly when content is negotiated between their web servers and applications requesting pages.

In any way, people should take a greater care about setting the content-negotiation properly between their servers and other applications. In fact, I saw many examples, on many web servers: from the semantic web research groups, to the hobbyists.

The principle

The principle is simple, if a requester sends a HTTP query with the Accept header:

Accept: text/html, application/rdf+xml

The web server should check the priority of the mime types that the requester is requesting and send back the requested document type with the greater priority, along with the Content-type of the document in the HTTP header answer.

The Content-type parameter is quite important, since if a user application request a list of 10 mimes having all the same priority, it should know which of them as been sent by the web server.

Trusting the Content-type parameter

It is hard.

In fact, Ping the Semantic Web do not trust any web server that returns Content-type. This parameter is so misused that it makes it useless. So I had to develop procedures to detect the type and the encoding of files it crawls.

For example, sometime, people will return the mime TEXT/HTML when it facts it’s a RDF/XML or a RDF/N3 file; this is just one example among many others.

The Q parameter

Another situation I came across recently was with the “priority” of each mime in an Accept parameter.

Ping the Semantic Web is sending this Accept parameter to any web server from which it receives a ping:

Accept: text/html, html/xml, application/rdf+xml;q=0.9, text/rdf+n3;q=0.9, application/turtle;q=0.9, application/rdf+n3;q=0.9, */*;q=0.8

The issue I came across is that one of the web servers was sending me a RDF/XML document for that Accept parameter string, even if it was able to send a TEXT/HTML document. In fact, if the server was reading “application/rdf+xml” in the Accept parameter, it was automatically sending a RDF document to it, even if it has a “lesser priority” than theTEST/HTML document.

In fact, this Accept parameter means:

Send me text/html or html/xml is possible.

If not, then send me application/rdf+xml, text/rdf+n3, application/turtle or application/rdf+n3.

If not, then send me anything; I will try to do something with it.

This is really important to consider the Q (or absence of the Q) parameter. Because its presence, or its non-presence, mean much.

Discrimination of software User-agents

Recently I faced a new kind of cyber-discrimination: discrimination based on the User-agent string of a HTTP request. In fact, even if I was sending “Accept: application/rdf+xml”, I was receiving a HTML document. So I contacted the administrator of the web server and he pointed me out to an example available on the W3C’s web site, called Best Practice Recipes for Publishing RDF Vocabularies, which explained why he has done that:

# Rewrite rule to serve HTML content from the namespace URI if requested
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^example5/$ example5-content/2005-10-31-docs/index.html [R=303]

# Rewrite rule to serve HTML content from class or prop URIs if requested
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^example5/(.+) example5-content/2005-10-31-docs/$1.html [R=303]

So, in the .htaccess file they published in the article, we can see that if the user agent is “Mozilla” it will send a HTML document.

However, it is wrote in the same document:

Note that, however, with RDF as the default response, a ‘hack’ has to be included in the rewrite directives to ensure the URIs remain ‘clickable’ in Internet Explorer 6, due to the peculiar ‘Accept:’ header field values sent by IE6. This ‘hack’ consists of a rewrite condition based on the value of the ‘User-agent:’ header field. Performing content negotiation based on the value of the ‘User-agent:’ header field is not generally considered good practice.

So no it is not a good practice, and people should really take care about this.

Conclusion

People should really take care of the Accept parameter when their server receive a request, and send back the good Content-type depending on the document they send to the requester. Content-negotiation is becoming the main way to find and access RDF data on the Web, and such behaviors should be fixed by web server administrators and developers.

Has Robert Scoble got some incentives to ‘finally’ get what the semantic web is?

Print This Post Print This Post

Everybody do errors and I have just done one.

Thanks for proving me that I was wrong.

I probably should have wrote a blog post about it instead of writing a comment, that way I would have been sure that you get it (like this blog post that created an instant reaction).

Robert, it seems you didn’t received my email 4 days ago, so I am sorry about that.

Anyway it doesn’t change the essence of this blog post, and my comment. This is not a good start, but a good way, to try to tie the link between the “Web 2.0″ (sorry but I don’t like that term ;) ) and the Semantic Web [academic] community. There are much things going on around that could benefit everyone.

The only thing I would like to say that people would remember is that the Semantic Web is not the result of one or a couple of companies, but the result of a Whole; the result of the interaction between all beings.

Robert, I hope you will continue to dig deeper to find all the things people are working on related with the Semantic Web. Sorry about that, and I hope you a beautiful day!

I am asking the question and I hope I am wrong.

Some days ago Robert Scoble wrote an enflaming post about what Radar Networks are currently developing. This “thing” (I refer to a “thing” because no body know what it really is (some type of semantic web system)) finally helped Robert to understand what the semantic web is.

At that moment I was happy to see that a “Web 2.0″ guru understood how Semantic Web technologies could help him; how they could be used to make the World a better place to live in.

Then I told myself: “Fred, help him to see what other people are doing in that direction too. Show him what you are working on; what other people are developing too; what they are writing on the subject; Etc.”

Then I wrote that comment on his blog post:

Hi Robert,

Could I suggest a couple of reading in that direction that could potentially interest you?:

 

  1. Zitgist Search Query Interface: A new search engine paradigm
  2. The Linked-Open-Data mailing list
  3. Planet RDF


From there, you will be able to dig deeper into the semantic web community, the ideas it plays with, what the Web is becoming, etc.

Hope it can helps some people to eventually understand what is going on with the semweb.

Take care,

Fred

This comment has never appeared on its blog post. It seems he rejected it by moderation. I sent him an email 3 days ago and he never replayed to me.

Why Robert rejected this innocent comment? I got my idea that lead to the topic of this blog post: “Has Robert Scoble got sone incensitives to ‘finally” get what the semantic web is?”

Does Robert rejected it because I was referring to Zitgist; and that is a possible competitor to what Radar Networks is working on right now?

I have no idea, but I am always frustrated to see when bloggers doesn’t tell to their readers they got some incentives to write articles about special things.

Otherwise, why my comment got rejected? I have no idea, but I would like to know.

At the end, these people will probably have to learn that the Semantic Web is more about cooperation between people, enterprises, other entities and honesty than a more traditional way to do things and business.

I think that the Semantic Web will change things in a major way, as long as people, societies and the way we live.

Making the bridge between the Web and the Semantic Web

Print This Post Print This Post

 

Many people think that the semantic web will never happens, at least in next few years, because there is not enough useful data published in RDF. This is fortunately a misconception. In fact, many things are already accessible in RDF, even if it doesn’t appear at the first sigh.

 

Triplr

Danny Ayers recently pointed out a new web service created by Dave Beckett called Triplr: “Stuff in, triples out”.

Triplr is a bridge between well-formed XHTML web page containing GRRDL, RSS and their RDF/XML or Turtle formatting.

Here is an example

 

Virtuoso’s Sponger

Another bridging service called the Sponger also exists. Its goal is the same as Triplr: taking different sources of data as input, and creating RDF as output.

The Virtuoso Sponger will do everything possible to find RDF triples from a given URL (via content-negotiation and checking for “link” elements in HTML files). If no RDF document is available from a URL, it will tries to convert the data source available at that URL into RDF triples. Converted data sources are: microformats, RDFa, eRDF, HTML meta data tags, HTTP headers, as well as APIs like Google Base, Flickr, Del.icio.us, etc.

 

How does it work?

  1. The first thing the Sponger is doing is trying to dereference a given URL to get RDF data from it. If it finds some, it returns it, otherwise, it continues.
  2. If the URL refers to a HTML file, the Sponger will try to find “link” elements referring to RDF documents. If he finds one or more of them, it will add their triples into a temporary RDF graph in and continue its process.
  3. If the Sponger finds microformat data into the HTML file, it will maps it using related ontologies (depending on the microformat) and will creates RDF triples from that mapping. It will add these triples to the temporary RDF graph and continues.
  4. If the Sponger finds eRDF or RDFa data into the HTML file, he will extracts them from the HTML file and add them into the RDF graph and continues.
  5. If the Sponger find that it is talking with a web service such as Google Base, it will maps the API of the web service with an ontology, creates triples from that mapping and includes the triples into the temporary RDF graph and continues.
  6. If nothing is found and that there is some HTML meta-data, it will maps them with some ontologies, creates triples and add them to the temporary RDF graph.
  7. Finally, if nothing is found, it will returns an empty graph.

The result is simple: from any URL, it is most than likely sure that you will get some RDF data related to that URL. The bridge is now made between the Web and the Semantic Web.

 

Some examples

There are some examples of data sources converted by the Sponger:

 

Conclusion

What is fantastic for a developer is that he only has to develop its system according to RDF to make its application communicating with any of these data sources. The Virtuoso Sponger will do all the job of interpreting the information for him.

This is where we really meet the Semantic Web.

With such tools, it is like looking at the semantic web in a lens.

Zitgist Search Query Interface: A new search engine paradigm

Print This Post Print This Post

 

People start to talk about Zitgist: What is it? How does it work? When will it be released? Etc. This is the first article of a series to come that will explain the first portion of the service: the search query builder user interface. As you will see bellow, there are many considerations to take into account when dealing with the development of a semantic web search query builder.

The difference between a traditional search engine like Google and a semantic web search engine like Zitgist is that the aggregated, indexed and queried data is different. Google mostly use text files such as HTML, PDF, DOC, Etc., and Zitgist use RDF files from genuine or converted data sources.

This difference has a big impact on how users build queries to answer to their questions. Google users use keywords to try to define what they are searching for. Then the search engine will check in their database to find these keywords into the texts they aggregated and indexed.

The new paradigm introduced by semantic web search engine such as Zitgist is different: users will describe the characteristics of the subjects of their search instead of using keywords.

As you will see, the difference between using keywords and describing characteristics of subjects will have a great impact on the user interface used to build these search queries.

 

A first query

 

1.jpg
[Click to enlarge]

 

The first step is to choose which type of subject a user is searching for. In the first version of Zitgist, we let users choosing among some type of subjects: musical things such as artists, bands, albums, tracks, performances, or people, groups, projects, geographical locations, documents and discussion forums.

Once the user choose what he was searching for he has to describe the characteristics of that subject.

2.jpg

[Click to enlarge]

In that first example, the user tries to find a person. As you can see, there are some characteristics describing a person that can be defined by the user. Depending on the user interface (basic or advanced) more or less characteristic will be available for the description of that subject.

So the user chooses to search for a person that has the name “Chris” and that is interested in the “Semantic Web”.

 

3.jpg

[Click to enlarge]

 

The search engine will then return results matching subjects know by Zitgist having these two characteristics.

Using Google, the user would have use the query string “chris semantic web” that has three distinct keywords: (1) “chris” (2) “semantic” and (3) “web”. The problem is that there is no relation between these keywords. Is he searching for someone named Chris that is working in the semantic web domain or that is interested in the semantic web? Is the user searching for something else? There is no way to know. The best Google can do, is putting their algorithmic magic into action to try to find what the user is searching for, and hoping it is really what he wants.

But for Zitgist, if the person [the subject of the search] defined himself as having the name Chris and having an interest in the Semantic Web (defining himself using RDF) than we will know that the results are definitely what the user is searching for.

Note: one of the next article will be dedicated to what will happens once a user get results from Zitgist.

 

Describing relationship between more than one subjects

The first example was quite simple. However Zitgist’s query builder interface take all its senses once we try to push it a little further.

How a user could easily describe a subject, with its own set of characteristics, that knows another subject, also with its own set of characteristics?

 

4.jpg

[Click to enlarge]

 

In this example, we have a user that search for a person knows as “Alice”. But he doesn’t search for any person named “Alice”, no. This user wants to find a person knows as “Alice” that know another person named “Bob”.

As you can see in the image above, it is quite easy to do using Zitgist. The user described the subject he wants to search for: “Alice”. This subject is a person with the name “Alice” that “knows” a person called “Bob”.
As you can see, the user interface changed its color when we introduced a new subject into the query ["Bob"]. That way, users can easily see which subject they are describing.

After that, the user could always add new characteristics to Bob. He could say that Bob is interesting in writing and that he lives near London for example.

 

 

In fact, the possibilities are endless.

 

One more step

What is interesting with the semantic web is that anybody can describe anything. One of these interesting example is when we start to think about Document. In fact, what are documents? What describe a Document? Etc.

A document can be described with an author, a creation data, a publication date, an editor, a publisher, its medium, etc. But its content can also be described such as its topics.

 

5.jpg

[Click to enlarge]

 

If someone describes one of the documents he created and that explicited the topic(s) of that document, Zitgist could easily find it that way:

 

6.jpg

[Click to enlarge]

 

As you can see a user can search for a “Document” that as a “Person” named “Alice” that “knows” another person named “Bob” as the “Topic” of the “Document”.

So, if someone would start to describe novels that way, we could easily search for books where its protagonist is called Alice and that is living in London. Wouldn’t that be a terrific way to find books you could like to read? The only thing we need at the moment is people starting to describe books that way: hobbyist, authors or publishers.

 

We always knows the data we are manipulating

What is fantastic with the seman

tic web is that we always know what is the data we are manipulating. As you will see in next articles, this characteristic of the system is the main one when comes the time to talk about users interfaces. However, I will introduce it in this article using the query builder interface.

 

7.jpg

[Click to enlarge]

In the example above a user tries to find a geographic location near a certain place. However the question for the user here is: how should I describe that location? By a name? Which name? By a latitude and longitude? How to? Etc.

By the fact that we know what is the type of the data the user is looking for we can try to assist it with some widgets.

In the example above we know that the user is searching for a geographical location. Ultimately, a geographical location on Earth is defined by a longitude and latitude. So what we do is showing a map widget to the user. The only thing he will have to do is to click on the map to choose the location. That is it. The user interface widget is intuitive for users, and he doesn’t have to bother about how to describe the location.

Another example:

 

8.jpg

[Click to enlarge]

 

Now the user try to find a ” Music Artist” that “composed” “Albums” between “1980″ and “1990″.

In such a case, how the user is supposed to describe that fact? Would he writes dates like “1980-01-03″? “1980-03-01″? ” 3 January 1980″? Etc…

Since Zitgist knows what the user is trying to describes, it only popup a small widget that will assists the user in the creation of its search query.

This is by far the greatest strength of the semantic web when come the time to talk about user interfaces. Since the interface knows what is the type of the data being manipulated, it can do a full of things to help users to do what he really want to. And what a user really want to do is certainly not asking questions like: how should I describe this thing, Etc.

And as you will see in next articles, this is just the beginning.

 

More information about Zitgist

There are a list of blogs post I wrote about Zitgist, explaining what the is project, its goals, its vision, its release, etc.

 

Conclusion

Zitgist’s goal is not to be a replacement to traditional search engines such as Google. In short and middle term its goal is to be complementary to traditional search engines; to be another tool in Web users’ toolkit.

As you can see by the description of this semantic web search engine query interface, the semantic web and semantic web search engines like Zitgist will be quite useful to make some order, classify and search in all the data that has been created so far and that is yet to be created on the Web.

In the next articles I will continue to roll out what Zitgist is, where we are with the project and how it integrates into the semantic web that is now emerging on the Web.

The only thing you have to do is to sit down and check the show.

No I am wrong, the only thing you have to do is board the train and continue with us by asking question, making comments and suggestions, describing your data using RDF, letting Zitgist integrating it into its database, etc.

Welcome aboard.

Dynamic Data Web Page

Print This Post Print This Post

What is a dynamic data web page? It is a shape-shifter data source. That is it. It is a source of data that will change its shape depending on the request that has been made on the data source.

 

Shapes of the data source

The data source will shape the format of its output depending on what you need. If you are a human, you would like to have something that you can read and understand like a HTML web page. However, if you are a web service, you would probably like to get the data in a shape that you could easily understand such as RDF, XML, JSON, etc.

It is as simple as that: a Dynamic Data Web Page is web page that will outputs data in different formats depending on what the requesting users wants.

There are many formats:

  1. HTML – Human readable
  2. RDF/XML
  3. RDF/N3
  4. XML
  5. JSON
  6. Others could be easily implemented if needed.

 

In Dynamic Data Web Page there is: a Web Page and Data

A DDWP is two things:

  1. A Web Page: as we saw above, it is a way to present/publish the data of the source formatted in some way.
  2. Data: as we will see bellow, it is the source of data.

This said a DDWP is nothing else than a source of data published in some ways.

 

Dissection of a Dynamic Data Web Page

 

 

0. Creation of the data source. The preliminary step for the data source (triple store) is to continually index RDF data sources. If we are talking about a generic service, then it should aggregate RDF data from everywhere: the Web, specialized databases such as Musicbrainz, Wikipedia, etc. If it is a specialized system such as the products catalogue of a company, it should constantly synch its triple store with the catalogue. This constant operation will create a valuable data source.

1. Creation of a SPARQL query. An end user wants information. This end user can be anything: a user, a developer, a web service, etc. This user will build a SPARQL query that will returns the results from the data source.

2. Saving the SPARQL query. The SPARQL query will then be saved on the web server of the service. As soon as the query will be saved on the server.

3. Assigning a URL to the SPARQL query. A URL will be assigned by the web server for that saved SPARQL query. From there, anybody will be able to access to the results of the query by looking at that URL.

4. Accessing the URL

4.a. Sending the HTTP query. In our example, a web service tries to get the results returned by the SPARQL query from the DDWP. To get them, it will send a HTTP query to the web server for that URL.

4.b. Doing content negotiation with remote server. However, the web server wants a XML representation of the results since it is the only format it understands. This request will be done via content negotiation with the web server. It is where the shapes of the DDWP are important: depending on want the user want, results of a SPARQL query will be formatted in one of the possible shapes, depending on what the user wants (content negotiation), and will send them to him.

5. Generating the DDWP according to the content negotiated. The Dynamic Data Web Page will be generated by the web server depending on the content negotiation the two parties agreed on.

6. Sending results to the web service. Finally the results, formatted to meet the user’s needs, will be returned to the user.

 

What this means?

This means that data only matters. In fact, the only thing one need now is to build a good data source. Once the data source is well built (remember, the data source can be anything here, from a search engine database to the products catalogue of a company, or even the personal web page of a 14 years old geek).

From that data source, everything can be generated for each web page (URL). If the content requested is a HTML page, then the data source can generate XML, run a XSLT skin template with and then send a HTML page: just like any other web page. However, from the same data source, a semantic web crawler could request the RDF/N3 data for the same URL. Then the DDWP would send the RDF/N3 representation of the URL.

So from one data source, you can get its data the way you want.

From that point a URL (or a web page, call it the way you want) become a presentation page web, a web service, etc; All-in-one!

 

Some examples

Everything is made simpler with examples, so there we are. All the concept of Dynamic Data Web Page is possible thanks to Virtuoso. All the examples above are using this database management system.

Okay, to illustrate the case, we will use this Google Base Jobs page for example:

Step #0

The triple store will get that Google Base Jobs page, convert it into RDF and then will index the triples into the triple store. This will be the data we will try to access.

Step #1

A user created a SPARQL query that will request all that data. The query look-like:

SPARQL
SELECT ?s ?p ?o
FROM <http://www.google.com/base/feeds/snippets/-/ jobs?start-index=30&max-results=30&key= ABQIAAAA7VerLsOcLuBYXR7vZI2NjhTRERdeAiwZ9EeJWta3L_ JZVS0bOBRIFbhTrQjhHE52fqjZvfabYYyn6A>
WHERE
{
?s ?p ?o .
}

Step #2

The user will save the SPARQL query on the web server in the directory “/DAV/home/demo/Public/Queries/DataWeb/” with the file name “google_base_jobs_dataspace.isparql”

Step #3

The web service will assign a URL to that file:
http://demo.openlinksw.com/DAV/home/demo/Public/ Queries/DataWeb/google_base_jobs_dataspace.isparql

Now the user wants to see the results of the query he just built, he can see them only by putting this URL into its web browser. Then a HTML web page will be generated and displayed so that he can easily consult it.

This is a generic html page. But what about generating XML instead of HTML and then applying a XSLT skin template to generate the HTML for the user? Yeah, you just got another way to create traditional dynamic web pages.

Step #4.a / #4.b / #5 / #6

Now what we want is showing what happen when a web service request results not in HTML but in something else like RDF/XML.

To show you how it happens, we will use the OAT RDF Browser. This is a web service that will get RDF data from somewhere on the Web and that will display it to users via a web interface.

This web service do exactly the steps 4, 5 and 6: it send a HTTP query for a URL. It will do some content negotiation with the remote web server to get RDF data, it will download the RDF data sent by the web server, consume it and display it to the user via the interface.

The result for our example is there. As you can see, from the same URL, the DDWP will send RDF/XML data instead of HTML. Then the web service will consume it, and display the same information in a different way. How different? Well, click on the Yahoo! Map Tab and you will see. You see? The same information displayed on a map that shows where the jobs are in the United-States.

 

Conclusion

Dynamic Data Web Page is not a theory. It is a reality; it is something that already exists in Virtuoso and that can be used by anyone who cares about simplifying the exchange of data between its system and other systems. It is all about Web communication. Instead of talking about language (real world) we are talking about formats (web world).

Music Ontology Revision 1.11: the music creation workflow

Print This Post Print This Post

 
A new revision of the Music Ontology has been released today. The main changes have been made to clarify the description of the music creation workflow. We also added the possibility to describe musical shows and festivals.

All in all, the revision 1.11 of the Music Ontology is certainly the most stable and crystallized incarnation of the ontology.

The change log is available here

 
New projects using the Music Ontology

New projects started to use the Music Ontology to describe musical things. There are a couple of them:

Oscar’s Pendora and Last.fm recommendation system:

Oscar also describe how to:

Yves’s mapping of Magnatune using the Music Ontology.

Also, the Musicbrainz RDF dump using the Music Ontology should be released soon too. I know that I said that it should have been available by this week, however some issues with the rdf views forced me to wait until releasing the dump. I hope having the possibility to make it available by next Monday, as long as with the Virtuoso RDF View files.

 
Description of the music creation workflow

We also worked hard to clarify the music creation workflow used by the Music Ontology. It is the backbone of the ontology: it explains how people should use the ontology to describes musical things. A complete description of the workflow is available here.

 

 

 
Describing shows and festivals

As discussed on the mailing list, we introduced two new concepts in the Music Ontology: Shows and Festivals. It is now possible to describe where and when a show or a festival will happens, as long as who will give a performance at that event.

Using these new concepts, one could easily describe:

The International Jazz Festivial of Montreal will happens between June 28 and July 8 in Montreal. There will be sub-events at the Spectrum. In fact, there will be a show at the Spectrum the 28 June at 10pm. The performer of that show will be the Dave Holland Quintet.

Then the place of International Jazz Festival of Montreal and the Spectrum will be linked with their Geonames. The Dave Holland Quintet will be linked with their Musicbrainz artist description. All that is possible thanks to the RDF data dumps of each of these services and the Music Ontology.

 
Conclusion

The Music Ontology evolved greatly in the past few months. Now we have something solid agreed by many people. Many level of descriptiveness are possible and all of them are compatible together. A garage band can now easily describes itself and their records using the Music Ontology. But also, a music expert can describe everything about Beethoven’s Work and all its incarnations played by other musicians over the centuries.

 

Give it a name if you wish: the Semantic Web; but personally I don’t care.

Print This Post Print This Post

Freebase has been made public recently; it is a sort of Google Base with the goal to “[contribute to collect] data from all over the internet to build a massive, collaboratively-edited database of cross-linked data.”

Tim O’Reilly praised it [1] [2]; he had some thoughts about it and about the semantic web; he gave some opinions that leaded to a storm of blog post on planetrdf.com; Etc.

I will not enter in that debate. I have nothing to say about the Web1.0, Web2.0, Web3.0 or the NextWebVersion.NumberSomething except that these terms make people from around the World quite… unproductive. We have a system that let anybody write, publish and share documents on a space called the Web; and that since its beginning. So let see what we can do that such a system.

What I will do is telling you where I am with my vision of the Web, and to what it could evolve to. I will describe how the projects I am currently working on could make the Web different, hoping to make it better. I will only show you a schemas, with some explanations, of how I see the environment that such projects are currently creating; how users and developers will be able to use and contribute to these systems with the only goal of making the web open and better.

The next Web environment

Click to enlarge this schemas

1. Describing resourcesAs you imagine, the first actors of the system are Web users. The goal of these Web users is to describe things (resources). This behavior is not different from the past: Web users always described things; the only difference is that they have to use new methodologies, but even that, it is not always the case as we will see later.

Users will be able to describe things such as themselves, projects they are working on, relations between musical artists, albums, Etc. by using specialized software that will help them to describe these sort of things. Systems such as Wikipedia, Musicbrainz, Talkdigger, Livejournal, blog system using a SIOC Exportation add-on, Etc. (note: one could wonder why I name Wikipedia or Musicbrainz that doesn’t export anything in RDF; for them I would redirect them to the dbpedia and Music Ontology projects for more information).

But in the future, people will also develop specialized software that will help people to describe virtually everything.

2. Save descriptions

Systems will archive all these descriptions. Dedicated system for that task, some type of portals, personal web pages, specialized blog systems, specialized wiki systems such as the Semantic Media Wiki, Etc. These systems will publish the information to anyone who request it, exactly the same way as Web server publish web page content in HTML. The only difference is that it will use RDF instead of HTML to publish the same data.

3. Notify for new/updated descriptions

Most of these archiving systems will notify “notification [pinging] systems” such as PingtheSemanticWeb.com

That way, new/updated descriptions of something will be published to a multiple of web applications, software, crawlers, software agents, Etc.; requesting new/update descriptions from PingtheSemanticWeb.com

One of these systems is called Zitgist.

What is Zitgist?

In the past, I described Zitgist as a Semantic Web Search Engine. Great you will tell me, but what it means? I will refer you to this blog post I wrote a couple of weeks ago about what Zitgist is.

4. Send references of descriptions

PingtheSemanticWeb.com will send its pings to any system requesting the list of new/updated descriptions. On of these system is Zitgist. That way, it will be able to get the latest new and updated descriptions, and this, nearly in real time.

But any system can do the same.

5. Linked-Open-Data

This is a new project started by many Semantic Web enthusiasts, researchers and companies from around the World.

The goal of this project is to create a meta-database of interlinked databases such as Wikipedia, Musicbrainz, US Census data, DBLP database, Etc.

Such meta-database will be indexed into Zitgist to extend the descriptions its knows.

6. Other database

Other database, not part of the Linked-Open-Data project will also be indexed into Zitgist. In fact, any relational database can easily be converted into RDF and then indexed into Zitgist. One of the good examples of this is the conversation of the Musicbrainz.com database into RDF using the Music Ontology.

7. Describe resources

Zitgist is at the same level as any other application in this environment. So from its interface, Web users will eventually be able to describe things, and relations between these things, directly from its user interface.

8. Search

The more interesting feature of Zitgist is that it lets Web users searching in all that data. By using Zitgist, Web users are able to send queries such as:

  • Give me the name of the albums published by Madonna between 1990 et 2000.
  • Give me the name of the people that are interested in writing leaving near London.
  • Give me the name of groups (group, organization, etc.) that has Brian Smith as member.
  • Give me the name of the computer projects programmed using C++ that work for Linux or Windows.
  • Give me the name of the discussion forums that are related to cooking.
  • Give me the name of the cities in UK that have more than 150 000 people.
  • Give me the name of the documents where its topic is a person named Paul.
  • Etc.

9. Browse

Then from the results returned by Zitgist, Web users are able to browse the information about things. Things can be a:

  • Person
  • Project
  • Geographical location
  • Music artist
  • Band
  • Album
  • Single track
  • Etc.
  • And relations between all these things

10. Possible duplications of Zitgist

None owns anything in that Web, so most of Zitgist could eventually be duplicated in other service. Why? For the only reason that most of the information it uses is available to anyone who wants to do something with it.

Examples

So, as you can see, this infrastructure enables a wide range of possibilities. In fact, a Web user could use Zitgist to find a thread in a Discussion Forum about the Semantic Web. Then he could browse the thread directly using Zitgist. If he has something to say on that thread, he could describe new facts about this particular thread directly on its blog. From there, its blogging system would publish the description to anyone who want it (like a normal HTML Webpage). After having published the data, its blogging system could ping Pingthesemanticweb.com to notify a wide range of applications that new descriptions has been published about a certain thread on a certain discussion forum. Then Zitgist would be notified of this new description and it will index it into its system.

After that other users will be able to see the new facts described by this user directly in the thread and Zitgist’s Web interface.

You could easily replace the subject of the above example by a musical artist, and change the blogging system for the Musicbrainz.com user interface.

Possibilities are endless… but the result is always the same: a distributed, open, meta-web. A sort of distributed meta-wiki created by a wide range of users, applications, services and systems.

Conclusion

This is my vision of my next web. I called it semantic web (as many other does), but you have the leisure to called it the way you want. I am developing Zitgist that should be released for a first round of users in the next few months; I am developing Pingthesemanticweb.com; I started the Musical Ontology; I am also participating to some projects like the SIOC ontology, the linked-open-data initiative; and this is the vision I have that lead my works.

I cited all the projects I am currently working on, but none of them could be possible without the tight collaboration of the semantic web community and the guys at OpenLink Software Inc. I owe them this vision and all my knowledge of the field. Thank you guys.

Music Ontology revision 1.10: introduction of musical performances as events in time

Print This Post Print This Post

I am pleased to publish the revision 1.10 of the Music Ontology. As you can see in the change log:

View the change log

many things changed in that revision of the ontology. In fact, all the ambiguities people noticed with the latest revisions should now be fixed. Yves Raimond worked hard to implement the Time Ontology along with its Event and Timeline Ontologies into the Music Ontology. The result is a major step ahead for the Music Ontology. Now different levels of expressiveness are available trough the ontology.

In fact, one can describe simple things like MusicBrainZ data and relations, or they can describe the recording of a gig on a cell phone, published on the web, by two different people. Take a look at the updated examples to see what each of these three levels of expressiveness can describe.

As you can see, all the examples are now expressed in N3 and XML. These examples are classified in three different levels of descriptiveness. Most people will describe musical things using the first level of expressiveness, however closed and specialized systems will be able to express everything they want related to music using the Musical Ontology (check examples level 2 and 3 for good examples of this new expressiveness power)

One of the last things we have to fix is how genres should be handled. Right now we are typing individuals with its genre. However considering that genres evolve and change really quickly and that they are strongly influenced by cultures, a suggestion has been made to create individuals out of the class Genre and then describing them. Also, we could create s mo:subGenre property (domain: mo:Genre; range: mo:Genre) that would relate a genre to its sub-genre(s).

This idea is really great and would probably be the best way to describe genres considering their “volatile” meaning over time. However the question is: how to link a mo:MusicalWork, mo:MusicalExpression, mo:MusicalManifestation and mo:Sound to its genre? If we create a mo:genre (domain mo:MusicalWork… etc; range rdf:resource), then people could use that property to link a MusicalWork, etc. to anything (anything that is a resource). Personally I think that it is not necessarily a good thing to introduce such a non-restricted property into the ontology.

Note that it is not the same thing as the property event:hasFactor since anything can be a factor of a musical event.

Now that the ontology is becoming pretty stable, the next step is starting to use it to describe things related to music. The first step will be to convert MusicBrainZ’s data into RDF using the Music Ontology. Soon enough I should make available a RDF dump of this data along with a Virtuoso PL that will enable people to re-create this RDF dump from a linked instance of a MusicBrainZ Postgre database into Virtuoso

Finally I would like to give a special thanks to Yves for its hard work and involvement for the publication of that new revision of the Music Ontology.

Technorati: | | | | | | | | |

Zitgist: a semantic web search engine

Print This Post Print This Post
Recently I started to talk about the project I am currently working on and people were wondering what it was. “Hey, what Zitgist is about Fred?” – “I heard a couple of things but I can’t find any information about it” – Etc. So I started to gives away some information about Zitgist (pronounced “zeitgeist”), what it was, what were the goals, etc.

Zitgist is basically a semantic web search engine. Some people could be wondering about what is a semantic web search engine? In what is it different from more traditional search engines such as Google, Yahoo! or MSN Search? It only differs in the information it aggregates, index and use to answer users queries. Instead of using human readable documents such as HTML, PDF or DOC, Zitgist will use semantic web documents (RDF).

The characteristic of these documents is that they describe things. In fact, these documents can describe anything: a person (its interests, its relations with other people, etc.), objects like books, music CDs; projects (computer projects, architectural projects, etc.), geographical locations such as countries, cities, mountains, etc. So, with semantic web documents one can describe virtually anything.

Since Zitgist is aware of the meaning of all these descriptions, powerful queries can be sent by users to let them find what they want. By example, a Zitgist user could send queries such as:

  • Give me the name of the people that are interested in writing leaving near London.
  • Give me the name of groups (group, organization, etc.) that has Brian Smith as member.
  • Give me the name of the computer projects programmed using C++ that work for Linux or Windows.
  • Give me the name of the discussion forums that are related to cooking.
  • Give me the name of the cities in UK that have more than 150 000 people.
  • Give me the name of the documents where its topic is a person named Paul.
  • Etc.

Note: these queries are not built using natural language (such as phrases), but with an easy to use user interface that help users to build the queries they want.

Once a user has built and sent these queries, the search engine will return results matching these criteria. Then if that user click on a result interesting him, he will be redirected to Zitgist’s semantic web browser. This interface display the semantic web document know by Zitgist to users.

This is what is interesting with Zitgist. Since semantic web documents are intended to be used by machines, humans can’t easily read them. This semantic web browser is a user interface that displays the information held in these semantic web documents such that humans can easily and intuitively read and understand them. So from a single result (say, a person), a user will be able to surf the semantic web like if he would be surfing the Web: from link to link. That way, users will use the same behaviors to surf the semantic web has they have when they surf the current Web.

The main goal with Zitgist is to create a semantic web search engine for users that doesn’t even know the existence of the semantic web. A user shouldn’t be aware of the concepts supporting the semantic web to use it. Their experience should be as close as the one they currently have with the current Web and the search engines they use daily.

What is the current state of Zitgist? All the things I mentioned above are working. A first private version should be released in the next months. Some demos to the semantic web community should be performed too. Slowly, in the next months, more and more things will be rolled-out to the public. However be aware that I am not hyping the project here. I will do things that way to make sure that the supporting architecture gives the best experience to all users. For this, we have to scale the architecture slowly to make sure that too much users do not make the service unusable.

In the next week, I should gives more technical information and write documentation about how web developers can make their data available to optimize their indexation into Zitgist. So best practice documents describing how web developer should create their semantic web document will eventually be put online.

Zitgist LLC. is a company founded by me and OpenLink Software Inc. I would like to specially thanks the OpenLink development team that gives me all the support I need to develop a first working version of this new search engine. I wouldn’t be able to write about Zitgist today without them.

Technorati: | | | | | | |

Revision 1.03 of the Music Ontology

Print This Post Print This Post

I just published the revision 1.03 of the Music Ontology. You can notify the addition of mo:Movement, mo:has_movement, mo:movementNum and mo:tempo. Half of the modifications has been made to enhance the descriptiveness of classical music. The other half of the modifications has been made to make the distinction between production (MusicalExpression) and publication (MusicalManifestation) clearer.

All these modifications have been made possible by Ivan Herman and Yves Raimond and I would specially like to thanks them for their insightful comments and suggestions they made via the mailing list.

There is the changes log for the revision 1.03

  • Changed the range of mo:key from a rdfs:Literal to a http://purl.org/NET/c4dm/music.owl#Key
  • Added property mo:opus
  • Added mo:MusicalWork in the domain of: mo:bpm, mo:duration, mo:key, mo: pitch
  • Added class mo:Movement
  • Added property mo:has_movement
  • Added property mo:movementNum
  • Added property mo:tempo
  • Added mo:Instrument in the range of mo:title
  • Remove mo:MusicalWork and mo:MusicalManifestation and added mo:MusicalManifestation to the property’s domain mo: publisher, mo: producer, mo:engineer, mo:conductor, mo:arranger, mo:sampler, mo:compiler
  • Remove mo:MusicalWork and mo:MusicalManifestation and added mo:MusicalManifestation to the property’s range mo: published, mo: produced, mo:engineering, mo:conducted, mo:arranged, mo:sampled, mo:compiled

 

Technorati: | | | | |