Update to the discussion about RSS 1.0 vs. RSS 2.0

Why using RDF instead of XML? [25 May 2006]

“Make everything as simple as possible, but not simpler”. – Albert Einstein.

I love that quote from Albert Einstein. A few words that say so much to designers. Make things simple for the user; make them so simple that he does not even know he is using what you designed (okay, that is a utopia). But beware: do not make it simpler. Do not compromise the capabilities of what you are designing just to make it simple (this is the whole art of design).

That said, I am currently rewriting the Talk Digger RSS feed generator for the next release, planned in a week or two. While working on it, I found that I had made exactly that error: I made it simpler while wishing to make it simple.

Let me explain the situation. Some months ago, I chose to create the feeds in RSS 2.0 instead of RSS 1.0. So what is the problem? RSS 2.0 should be much more evolved than RSS 1.0, shouldn’t it? No, it is not. RSS 2.0 is about two years younger than RSS 1.0, but much simpler. Why do I say that the file format is much simpler? Because RSS 1.0 feeds are serialized in RDF and RSS 2.0 feeds are serialized in plain XML.

Where is the problem, then? XML files are much easier to read than RDF ones; in fact, RDF files are just cluttered XML files, aren’t they? No, definitely not. It is true that RDF/XML files (I say RDF/XML because other serialization formats, such as N3, can also serialize RDF) are less intuitive for humans to read, but they are much more powerful for answering certain needs.

Personally, I see RSS 2.0 as a lesser version of RSS 1.0. Why? Because applications that support RSS 2.0 are much simpler (the thing we do not want), considering that they only have to handle XML files instead of full RDF ones.

Fred, are you telling us that RSS 1.0 is much more powerful than RSS 2.0? Yes. All the power of RSS 1.0 resides in the fact that it supports modules. This capability comes from RDF and its ability to import external RDF schemas to extend its vocabulary. What is a module? A module lets a content publisher extend the vocabulary of his feed format by importing an external RDF schema.

Okay, but what is the advantage of using these modules? I will explain it with an example using Talk Digger. I am currently thinking about creating an RDF schema that would model some semantic relations that Talk Digger will compute from the results returned by the search engines. I want to make that information publicly available to anyone who would like to access it and do something with it. I am also thinking of broadcasting that information directly in the RSS feed: I want to create a single source of information that broadcasts everything.

RSS 1.0 gives me that possibility (in fact, an RSS 1.0 web feed is a normal RDF/XML file using the RSS 1.0 schema). It is beautiful: I can make all the information I want available to anyone, in a single source. If a piece of software that reads the feed does not understand part of the information I broadcast (in reality, it does not know the RDF schema I am using), it simply skips that part, continues to read the web feed, and does what it has to do with the information it understands. I can’t do that in the same way with RSS 2.0, because it is serialized in plain XML and not RDF.

I could even add OWL elements to my feeds to model relations between the pieces of knowledge represented in them. That way an application would be able to infer knowledge from the feed! An example of a popular module is the Dublin Core module, based on the vocabulary of the Dublin Core Metadata Initiative.
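To make the “skip what you do not understand” behaviour concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `td` namespace (`http://example.org/talkdigger/ns#`) and its `relatedTo` property are hypothetical stand-ins for a future Talk Digger schema, the feed content is made up, and the reader works at the XML namespace level with the standard library rather than through a real RDF parser.

```python
import xml.etree.ElementTree as ET

# A tiny RSS 1.0 feed. The "td" namespace and td:relatedTo are hypothetical,
# standing in for a custom schema that a publisher might add.
FEED = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:td="http://example.org/talkdigger/ns#">
  <channel rdf:about="http://example.org/feed">
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>Sketch of an extended RSS 1.0 feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/post/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/post/1">
    <title>First post</title>
    <link>http://example.org/post/1</link>
    <dc:creator>Fred</dc:creator>
    <td:relatedTo rdf:resource="http://example.org/post/0"/>
  </item>
</rdf:RDF>
"""

RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

# Vocabularies this particular reader understands.
KNOWN_NAMESPACES = {RSS, DC}


def read_items(xml_text):
    """Return one (understood, skipped) pair per <item> in the feed."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.findall(f"{{{RSS}}}item"):
        understood, skipped = {}, []
        for child in item:
            # ElementTree exposes tags as "{namespace-uri}localname".
            namespace, _, local = child.tag[1:].partition("}")
            if namespace in KNOWN_NAMESPACES:
                understood[local] = (child.text or "").strip()
            else:
                # Unknown vocabulary: skip it and keep reading.
                skipped.append(child.tag)
        results.append((understood, skipped))
    return results


if __name__ == "__main__":
    for understood, skipped in read_items(FEED):
        print("understood:", understood)
        print("skipped:", skipped)
```

A reader that later learns the hypothetical `td` vocabulary only has to add its namespace to `KNOWN_NAMESPACES`; the feed itself does not change.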

You are probably thinking: yeah Fred, but readers only have to support both formats, publishers only have to support both formats as well, and everybody will be happy. Bad design thinking: do not forget that the goal is to make applications simple for their users. How do you think I will explain the difference between RSS 1.0 and RSS 2.0 to my mother? How do you think I will explain to her which one to choose if she has the possibility to subscribe to more than one feed? Will she choose RSS 0.91, RSS 0.92, RSS 1.0, RSS 2.0, Atom 0.3, or Atom 1.0 (because some websites offer them all)? Sorry, but I do not want to.

One of the current problems

One of the problems is the way applications handle all these file formats and serializations. I will illustrate it with a problem I faced today while testing the new Talk Digger RSS feed with Bloglines.

One thing I wanted was to use the Dublin Core “description” element instead of the normal “<description></description>” tag of the RSS 1.0 specification. I first thought it would be more widely understood, because the Dublin Core RDF schema is used by many, many applications over the Internet. First I tested it with RSS Bandit. It worked like a charm: all the Dublin Core elements I added to my RSS feed were handled by it. Wow! Then I tested it with Bloglines: nothing. Bloglines just doesn’t handle that Dublin Core tag. What a disappointment.

Then I declared this namespace in my RDF file: xmlns:ct="http://purl.org/rss/1.0/modules/content/". I re-tested it: nothing. Wow, it should work, shouldn’t it? Then I tried something else: I changed the namespace prefix from “ct” to “content”. It worked. What a disappointment: Bloglines does not care about the namespace behind the local prefix; it seems to parse RSS 1.0 feeds (which are in fact RDF files) by matching fixed strings. The system should know that “ct” refers to the same namespace as “content”, because both are only aliases that I bind to the namespace within my own file. It is a perfect example of a bad implementation of a specification in software.
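The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only: I do not know how Bloglines is actually implemented, so `naive_encoded_content` is merely a guess at what fixed-string matching might look like, and the `encoded` element of the content module is assumed here as the example element.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# The same item twice: only the local prefix bound to the namespace differs.
WITH_CT = (
    '<item xmlns="http://purl.org/rss/1.0/" '
    'xmlns:ct="http://purl.org/rss/1.0/modules/content/">'
    "<ct:encoded>Hello</ct:encoded></item>"
)
WITH_CONTENT = (
    '<item xmlns="http://purl.org/rss/1.0/" '
    'xmlns:content="http://purl.org/rss/1.0/modules/content/">'
    "<content:encoded>Hello</content:encoded></item>"
)


def encoded_content(item_xml):
    """Namespace-aware lookup: match on the namespace URI, not the prefix."""
    item = ET.fromstring(item_xml)
    node = item.find(f"{{{CONTENT_NS}}}encoded")
    return None if node is None else node.text


def naive_encoded_content(item_xml):
    """Hypothetical fixed-string lookup that only knows the 'content' prefix."""
    start_tag, end_tag = "<content:encoded>", "</content:encoded>"
    start = item_xml.find(start_tag)
    if start == -1:
        return None
    end = item_xml.find(end_tag, start)
    if end == -1:
        return None
    return item_xml[start + len(start_tag):end]


if __name__ == "__main__":
    for label, xml_text in (("ct", WITH_CT), ("content", WITH_CONTENT)):
        print(label, encoded_content(xml_text), naive_encoded_content(xml_text))
```

The namespace-aware function finds the element under either prefix, while the fixed-string version finds it only when the prefix happens to be “content”, which matches what I observed.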

The problem here is that Bloglines is the most popular web feed reader out there. So I have to change the way I build my feeds to cope with that fact, but I shouldn’t have to (it is really frustrating). Will I have to change the way I build my feeds each time I discover that an application is not parsing and using them properly? I hope not. I shouldn’t have to, because I follow the specification when I build them.

I hope they will look into that problem with their parser and hire somebody to develop a robust system that parses and handles the RDF specification, rather than parsing RSS 1.0 feeds as simple text files with some format… (Could I change the skill requirement “Familiarity with RSS and blogs”, cited in that job posting, to “Strong understanding of RDF, RSS and blogs”, to match the responsibility “maintain and improve RSS crawling and parsing processes”?)

I hope to show you soon, using a future version of Talk Digger, how RSS 1.0 can be extended.


One thought on “RSS 1.0, RSS 2.0: make it simple not simpler”

  1. You are absolutely right. Using 1.0 lets you extend the RSS file but, more importantly, lets you do things like reason about and translate RSS. Say you want to translate your blog into Spanish and you want to institute a semantic search engine that does a best statistical fit based on context. The keywords will not always translate as they would in the body of the text. To do that you have to be able to reason about the item.

    Can you do that in 2.0? Oh no, no, no.

    As to Microsoft, why use OWL when you can use OLAP? Just kidding here, folks: OLAP can’t reason. Besides, you can get an English to SPARQL translator. What has Microsoft got? Nothing!
    If you want to make the web friendly, try getting rid of SQL and let people use English (or Spanish, whatever), not some language that MS owns. O.K., they don’t own RSS 2.0 or OLAP, but with proprietary extensions they can slice out a much bigger piece of the pie.
