Why use RDF instead of XML?

Danny Ayers dropped a line about that post I wrote yesterday:

Fred Giasson has a nice approach to encouraging feedreaders to do more interesting stuff (one nit – it isn’t clear from this why it’s better to use RDF than Plain Old XML with extensions).

He is right: it really was not clear. The fact is, I am not sure I will be able to make it much clearer.

The thing is, we could certainly do that using Plain Old XML and it would work quite fine. However, for me it is partly a question of vision: the Semantic Web. Why not start to implement the base of the system even if the technologies that will work with that content are not yet ready for prime time? It is certainly a tradeoff: adding complexity to make the meaning of data explicit. But I think it could pay off in the long run.

Semantic Web or not, I think there are some reasons why we should use RDF instead of XML to perform that task. I will use the words of some masterminds of the Semantic Web and its technologies to sketch an answer to the question Danny asked.

I will start with Danny’s own words:

You can’t judge the benefits of RDF by judging individual applications in isolation, any more than you could judge the web by looking at a single host. Each application probably could be simpler if it used vanilla XML instead of RDF. But by using RDF they have a level of interoperability that goes far beyond what would be available otherwise. FOAF MusicBrainz data can be inserted directly into RSS 1.0 feeds and their meaning remains globally well-defined – the creator of a resource has the same meaning whether that resource is a document, blog item or piece of music. It means that the same tools, with little or no modification, can be used across all domains. The wheel doesn’t have to be reinvented for every purpose.

The main concepts here are interoperability and re-usability. For example, I could use the module I am developing for Talk Digger that handles FOAF profiles, with little or no modification, to handle FOAF information incorporated into an RSS 1.0 feed. I think this is a question of good system design (in the present case, the system we are talking about is the Web).
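
To make that re-usability concrete, here is a minimal sketch in Python using the rdflib library (my own illustration; Talk Digger itself is not written in Python): the exact same function extracts foaf:Person information whether the triples come from a standalone FOAF profile or from an RSS 1.0 feed that embeds FOAF markup.

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

def people_in(rdf_xml):
    """Yield (name, homepage) for every foaf:Person found in an RDF/XML document."""
    g = Graph()
    g.parse(data=rdf_xml, format="xml")  # same RDF/XML parser, whatever the source document
    for person in g.subjects(RDF.type, FOAF.Person):
        yield g.value(person, FOAF.name), g.value(person, FOAF.homepage)

Calling people_in() on a FOAF profile or on an RSS 1.0 feed gives the same kind of answer, because the FOAF triples keep the same globally defined meaning in both documents.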

In the same blog post, Dan Brickley, another mastermind of the Semantic Web, added in the comment section:

One way to think about this: the Resource Description Framework (RDF) is a family of XML applications who agree to make a certain tradeoff for the sake of cross-compatibility. In exchange for accepting a number of constraints on the way they use XML to write down claims about the world, they gain the ability to have their data freely mixed with that of other RDF applications.

Since many descriptive problems are inter-related, this is an attractive offer, even if the XML syntax is a little daunting. MusicBrainz can focus on describing music, RSS 1.0 on describing news channels, FOAF on describing people, Dublin Core on describing documents, RdfGeo on places and maps, RdfCal on describing events and meetings, Wordnet on classifying things using nouns, ChefMoz on restaurants, and so on.

Yet because they all bought into the RDF framework, any RDF document can draw on any of these ‘vocabularies’. So an RSS feed could contain markup describing the people and places and music associated with a concert; a calendar entry could contain information about its location and expected attendees, a restaurant review could use FOAF to describe the reviewer, or a FOAF file could use Dublin Core to describe the documents written by its author, as well as homepages and other information about those authors.

So, for any particular application, you could do it in standalone XML. RDF is designed for areas where there is a likely pay-off from overlaps and data merging, ie. the messy world we live in where things aren’t so easily parceled up into discrete jobs.

But it is a tradeoff. Adopting RDF means that you just can’t make up your XML tagging structure at random, but you have to live by the ‘encoding’ rules expected of all RDF applications.

This is so that software written this year can have some hope of doing useful things with vocabularies invented next year: an unpredictable ‘tag soup’ of arbitrary mixed XML is hard to process. RDF imposes constraints so that all RDF-flavoured XML is in broadly the same style (for example, ordering of tags is usually insignificant to what those tags tell the world). Those constraints take time to learn and understand and explain, and so adopting RDF isn’t without its costs.

And so the more of us who use RDF, the happier the cost/benefit tradeoff gets, since using RDF brings us into a larger and larger family of inter-mixable data.

Tim Berners-Lee also wrote a note at the W3C in 1998 to explain why the RDF model is different from the XML model.

Finally, Lee W. Lacy, in his book OWL: Representing Information Using the Web Ontology Language, says:

While XML provides features for representing and interchanging information, it is insufficient for supporting Semantic Web requirements. The primary reasons that XML cannot support the Semantic Web directly are that XML defines syntax, not semantics, and XML descriptions are ambiguous to a computer.

XML is a data formatting language that only provides a grammar (syntax). XML tag names provide few hints to support importing software. XML tags are no better than natural language for providing meaning. XML tag names are simply strings that may be meaningful to humans but are meaningless to software. While XML tag names may suggest meaning to a human reader, they have no meaning by themselves to parsing software. A tag name (e.g., “<orange>”). The meaning of the tags must be agreed to a priori to support interoperability. XML languages often require voluminous companion documentation to explain the meaning of tags [an important point for the discussion that currently interests us].

XML does not represent the semantics, the meaning of a concept. Software must understand how to interpret XML tags to perform more meaningful applications. However, there is no straightforward way to describe semantics in traditional XML. XML formats can be specified in DTDs and XML schemas, but these specifications do not describe the relationship between resources.

XML adds structure to data on the current web. However, it is too focused on syntax and exhibits ambiguity that makes it insufficient to support the Semantic Web itself. A common problem with using XML is that there are too many ways to describe the same thing. Semantic Web developers leverage XML’s features in different ways. They can define their own tag names, nesting structures, identifiers, and representation styles.

While XML provides a number of useful features, there are serious problems with using XML by itself to support the Semantic Web. RDF overcomes many of these challenges […]

In light of these writings, for the case that interests us, it is a question of cross-compatibility and inter-mixability of data. In the long run, it is about the emergence of the Semantic Web and having documents that are meaningful to machines.


Next step with Web Feed Readers: from Passive readers to Active users!

You can download 129 different web feed readers from download.com. Primarily, they all do the same thing: aggregate RSS and Atom feed content. After that, they differ in how they manage and present the information. That’s it.

In that case, what is the next step with Web Feed Readers… if any?

If I look at the big picture, I can find one recurrent user state: people sit in front of their web feed reader, passively reading an uninterrupted incoming flow of feed content.

From passive readers to active users!

I have been playing with this idea since I answered a comment from Hussein on one of my old blog posts about the security of the Gmail Atom feeds. Here is what I wrote:

Google is said to have tested an RSS feed service for Gmail in Google Labs in 2004. I cannot confirm whether the service is still available because I do not have a Gmail account and I cannot sign up for one. This service puts the new incoming messages of a Gmail account into an RSS feed. Then, if you subscribe to that feed, you see your new Gmail messages directly in your web feed reader. What an excellent idea! However, I was surprised to find that they used SSL to create a secure channel between the feed and the feed reader.

Then I thought about all the things we can aggregate these days: blog content, incoming emails, UPS package delivery statuses, calendar events, and so on. Then I realized: people have all that content in front of them, but what can they do with it? Some web feed readers and other services now implement a “blog this item” feature that lets the user instantly blog about a specific item. Great: users can act on aggregated content via their feed reader. Why not extend this behavior to everything else?

The email example

In a hypothetical world, I receive my incoming email in my web feed reader via an RSS feed service provided by my mail provider.

What is cool is that I receive my news, my emails, my UPS delivery statuses, my calendar events, etc., all in the same place.

So, I just received an email from Sophie. Instead of opening my email client to answer her (which would be really, really unproductive), my web feed reader detects that the incoming feed item is an email and lets me answer her directly from its interface.

Wow, one single application to do all these things.

How would it work?

The technologies to do that are already available. I will not re-open the RSS 1.0 vs. RSS 2.0 debate here, but this example is just another argument in favor of using RSS 1.0 instead of RSS 2.0.


The discussion about RSS 1.0 and RSS 2.0 continues here:

Why use RDF instead of XML? [25 May 2006]

Fred, are you telling us that RSS 1.0 is much more powerful than RSS 2.0? Yes: all the power of RSS 1.0 resides in the fact that it supports modules. This capability comes from RDF and its ability to import external RDF schemas to extend its vocabulary. What is a module? A module gives the content publisher the possibility to extend the file format’s vocabulary by importing external RDF schemas.

RSS 1.0, RSS 2.0: make it simple not simpler

What web feed readers need is to know what a particular feed item is (a sort of type). We need something that tells the feed reader that this feed item is in fact an email, not a normal feed item, and here are its characteristics.

This is what RSS 1.0 modules are all about: they are a way to extend the information about an item in a web feed.

That way, I could tell feed readers that this particular feed item is not a normal one, it is an email, and here are its characteristics (sender address, receiver address, subject, body, attached files, etc.).

What is wonderful is that if the web feed reader cannot understand the content provided by the module, it simply doesn’t have to care about it and displays the item as if it were a normal feed item. That is what is great about modules: you can act on them or just ignore them; either way, nothing breaks.
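
As a rough sketch of that behaviour (hypothetical reader code in Python, not taken from any existing feed reader): if an item carries elements from a module namespace the reader knows, it gets a dedicated view; any module it does not know is silently ignored and the item is displayed normally.

import xml.etree.ElementTree as ET

EMAIL_NS = "{http://www.wam.umd.edu/~roark/mailont.rdf#}"
RSS_NS = "{http://purl.org/rss/1.0/}"

def render(item):
    """Return a one-line view of an RSS 1.0 <item> element."""
    title = item.findtext(RSS_NS + "title", default="")
    if item.find(EMAIL_NS + "Message") is not None:
        return "[email] " + title + "  [Reply] [Forward]"  # module understood: offer email actions
    return "[item] " + title  # unknown module content is simply ignored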

The email example with RSS 1.0 modules

Now, how would it work? Simple: we could create an RSS 1.0 module that describes what an email is (a module is an ontology that describes classes (sender, receiver, etc.) and their properties (subject, from, to, etc.)).

I will use the mailont ontology used in MailSMORE for my example.

Considering this module, an RSS 1.0 feed of Gmail emails would look something like this:

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:email="http://www.wam.umd.edu/~roark/mailont.rdf#"
xmlns="http://purl.org/rss/1.0/"
>

<channel rdf:about="http://gmail.com/?_fl=rss1.0">
<title>GMail feed</title>
<link>http://gmail.com</link>
<description>GMail secure feed</description>
<items>
<rdf:Seq>
<rdf:li resource="http://gmail.com/getemail?r123" />
</rdf:Seq>
</items>
</channel>

<item rdf:about="http://gmail.com/getemail?r123">
<title>Hello Fred! Tonight’s dinner!</title>
<link>http://gmail.com/getemail?r123</link>

<!-- only used to describe the content provider, in this case GMail -->
<dc:publisher>Google</dc:publisher>
<dc:creator>GMail system</dc:creator>
<dc:rights>Copyright © 2006 Google</dc:rights>

<email:Message rdf:ID="CurrentMessage">
<email:DateTime>10:05:03 25/10/2006</email:DateTime>
<email:Subject>Hello Fred! Tonight’s dinner!</email:Subject>
<email:To>[email protected]</email:To>
<email:From>[email protected]</email:From>
<email:Cc></email:Cc>
<email:MessageId></email:MessageId>
<email:InReplyTo></email:InReplyTo>
<email:ArchiveUrl></email:ArchiveUrl>
<email:References></email:References>
<email:Body>Hello Fred! It was to let you know that it’s working for me for tonight’s dinner. Take care! Sincerely yours, Sophie xxx</email:Body>
</email:Message>

</item>

</rdf:RDF>

Now a web feed reader could act upon this meta-information if it is able to understand it.

Given this information, I could create a web feed reader that understands the “email” RSS 1.0 module (ontology) and acts on its content. So my web feed reader would not only display static information to its users, it would also let them act on that information (reply to the email, for example)!
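
Here is a sketch (Python standard library, my own code, not part of any real feed reader) of how such a reader could pull the email fields out of the feed above and pre-fill a reply:

import xml.etree.ElementTree as ET

EMAIL_NS = "{http://www.wam.umd.edu/~roark/mailont.rdf#}"
RSS_NS = "{http://purl.org/rss/1.0/}"

def email_items(feed_xml):
    """Yield a dictionary of email fields for every item that uses the email module."""
    root = ET.fromstring(feed_xml)
    for item in root.iter(RSS_NS + "item"):
        msg = item.find(EMAIL_NS + "Message")
        if msg is None:
            continue  # normal feed item: nothing special to do
        yield {
            "subject": msg.findtext(EMAIL_NS + "Subject", default=""),
            "from": msg.findtext(EMAIL_NS + "From", default=""),
            "to": msg.findtext(EMAIL_NS + "To", default=""),
            "body": msg.findtext(EMAIL_NS + "Body", default=""),
        }

def reply_draft(email):
    """Swap sender and receiver and prefix the subject, ready for a Reply button."""
    return {"to": email["from"], "from": email["to"],
            "subject": "Re: " + email["subject"], "body": ""}

A “Reply” button in the feed reader would only have to hand reply_draft() back to the mail provider: that is exactly the “act on the information” behaviour described above.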



This simple schema shows how an RSS reader acts depending on whether or not it understands the module.


EventSites: Web Services architecture pushed to the extreme

I just finished reading an article on Ajaxian that pointed me to Eventsites.

What is EventSites?



Eventsites is a mashup. It’s a web application that uses data and functionality provided by external web services.

  • Eventsites uses no database of its own
  • The only server logic is a single redirect script enabling cross-domain XHR requests
  • All the functionality of the site comes from other websites



So how does it work? Simple: it uses existing web services like EVDB, Google Maps and Flickr to create a service of its own: EventSites. EventSites is only an interface, a sort of hub that links these three web services together. EVDB archives the events’ information, Google Maps shows where the events are in the world, and Flickr hosts the events’ photos.
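
About that “single redirect script”: the write-up does not show it, but the usual way around the browser’s same-origin policy at the time was a small server-side pass-through. Here is a minimal sketch of the idea in Python (my assumption of how such a script works, not EventSites’ actual code): the page’s JavaScript calls /proxy?url=..., the script fetches the remote web service (EVDB, Flickr, etc.) and returns the response from the page’s own domain.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from urllib.request import urlopen

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /proxy?url=http://api.example.org/... -> fetch it server-side and echo it back
        target = parse_qs(urlparse(self.path).query).get("url", [None])[0]
        if not target:
            self.send_error(400, "missing url parameter")
            return
        with urlopen(target) as upstream:
            body = upstream.read()
            ctype = upstream.headers.get_content_type()
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ProxyHandler).serve_forever()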

Some problems with EventSites

One of the small problems I see with EventSites (as with too many other Ajax web sites) is that it only works in Firefox. The problem with Ajax interfaces is that you have to use tricks to make them work the same way in all browsers. This is really time consuming, and it is why some developers only make them work in Firefox and/or Internet Explorer. Personally, if I am not able to make a feature work in Opera, Safari, Firefox and Internet Explorer, I do not implement it in the web interface.

Another thing is that most of EventSites’ source code is publicly visible because it is written in JavaScript. It uses some server-side code, but not much. This is not a problem in itself, but we will see many more copycat web sites wandering around the Web than we currently see with bookmarking web sites, for example.

Sharing the functionalities

The idea of a Web built out of web services is somewhat old. However, it is only recently that web developers (most without any budget) have been able to create useful mashups using existing web services. A mix of powerful open source technologies, low-cost hardware and cheap bandwidth has created a great environment that lets us see such services emerging on the Web, and EventSites is one great example.

Sharing the content

More and more functionality is freely shared on the Web via web services. Functionality is one thing, but content is another. In the best of worlds we will have both: services that share functionality, and services that share content.

The first step in that direction has been taken: a major content publisher created an architecture to freely share its content, meaningful to machines (using ontologies and ontology languages to format the content files), over the Internet. Yup, I am talking about the BBC.

Naturally, that is the direction in which I am pushing Talk Digger (as a content-sharing web service). Using an ontology describing what a [Web] conversation is, other people will be able to retrieve, analyze and display the information in their own mashups.

I can’t wait to see which services will emerge in such a network environment.

Other resources about EventSites:

Eventsites: serverless web-development
Eventsites Architecture
Componentising the web


The CSWWS 2006 program is available

As I said, I will be at CSWWS in Quebec City on June 6, 2006. The program of the event is now available. I just took a look at it and it seems really interesting, because they have lined up speakers who will cover a wide range of Semantic Web technologies, techniques, domains and problems.

There will be two tutorials:

  1. Model-Driven Architecture (MDA) Standards for Ontology Development
  2. State of Affairs in Semantic Web Services (SWS)

and many presentations divided into four main themes:

  1. Architectures and Systems
  2. Rules, Description Logic and Uncertainty
  3. Applications
  4. Foundations

It will be really interesting to attend that workshop because there is a lot of room for great discussions about the Semantic Web.


The WayBack Machine told me that I was blogging 6 years ago and that I had developed blogging software in Perl!

I was playing with the WayBack Machine when I had the idea to look at the first domain name I bought about 8 years ago: decatomb.com

The archives from the WayBack Machine are quite impressive: from 2000 to 2005 (keep in mind that I didn’t update it from 2002 to 2005).

Then I started to look with nostalgia at the first real website I developed. I was amazed to find it still online. What was Decatomb? At the time, I defined it as:

“Decatomb Production is a web site designed to help other computer professionals and enthusiasts in their fields of expertise. We provide a database of useful information from Decatomb’s Community. We also archive and make available a selection of the best technical papers, articles and analyses. In addition, we link other useful websites, newsgroups, events and books. The website is organized in 25 sections of the most popular programming languages and subjects.”

With amazement, I realized that I had tried to create a developer community with that website. Then I started to remember things from the past: that I had about 200 subscribed users with profiles, that I had developed an IM chat system in Perl, and finally I remembered (and found while surfing the website through the WayBack Machine) that I had developed a “blogging” platform (in Perl) integrated into the layout of the Decatomb website: that was 6 years ago.

I was amazed to remember, and to find, what is now called a blog. I had just gotten out of high school at that time and I didn’t know what a blog was (I only found out about 2 years ago).

Then what I found astonished me.

1- I found that I was displaying my latest published items at the top of the page.

2- I found that each article had its own permalink, title, body, and “view comments” and “post comment” options.

3- I found that the “post comment” section was quite similar to the current comment section on my blog.

4- People even posted comments on this “blog”! What surprised me was that the friendliness of the conversation was the same as the one I have with my blog readers now. Nothing official, just people wanting to talk about something.

The only thing I can say is: thanks to the WayBack Machine for that rediscovery. It seems that I was predestined to rediscover blogs and blogging.
