Why use RDF instead of XML?

Danny Ayers dropped a line about the post I wrote yesterday:

Fred Giasson has a nice approach to encouraging feedreaders to do more interesting stuff (one nit – it isn’t clear from this why it’s better to use RDF than Plain Old XML with extensions).

He is right: it really was not clear, and I am not sure that I will be able to make it clearer.

We could certainly do this using Plain Old XML, and it would work quite well. However, for me it is partly a question of vision: the Semantic Web. Why not start implementing the base of the system, even if the technologies that would play with that content are not yet ready for prime time? It is certainly a tradeoff: we add complexity in order to make the meaning of the data explicit. But I think that it could pay off in the long run.

Semantic Web or not, I think that there are some reasons why we should use RDF instead of XML to perform that task. I will use the words of some of the masterminds of the Semantic Web and its technologies to try to sketch an answer to the question Danny asked.

I will start by using Danny’s own words:

You can’t judge the benefits of RDF by judging individual applications in isolation, any more than you could judge the web by looking at a single host. Each application probably could be simpler if it used vanilla XML instead of RDF. But by using RDF they have a level of interoperability that goes far beyond what would be available otherwise. FOAF MusicBrainz data can be inserted directly into RSS 1.0 feeds and their meaning remains globally well-defined – the creator of a resource has the same meaning whether that resource is a document, blog item or piece of music. It means that the same tools, with little or no modification, can be used across all domains. The wheel doesn’t have to be reinvented for every purpose.

The main concepts here are interoperability and re-usability. For example, I could use the module I am developing for Talk Digger, which handles FOAF profiles, with little or no modification to handle FOAF information incorporated into an RSS 1.0 feed. I think that this is a question of good system design (in the present case, the system we are talking about is the Web).
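To make the re-usability argument a little more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the graphs are modelled as plain sets of (subject, predicate, object) tuples rather than parsed from real RDF/XML, the function name foaf_names is hypothetical (it is not the Talk Digger module), and the example.org resources and names are made up. Only the namespace URIs are the real ones.

```python
# Real namespace URIs; everything under example.org is made-up sample data.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
RSS = "http://purl.org/rss/1.0/"


def foaf_names(triples):
    """Return the sorted foaf:name values of every resource typed foaf:Person.

    Hypothetical helper: it only looks at FOAF statements and ignores
    whatever else the graph contains.
    """
    people = {s for (s, p, o) in triples if p == RDF_TYPE and o == FOAF + "Person"}
    return sorted(o for (s, p, o) in triples if p == FOAF + "name" and s in people)


# A standalone FOAF profile (hypothetical data).
profile = {
    ("http://example.org/alice#me", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/alice#me", FOAF + "name", "Alice"),
}

# An RSS 1.0 item that also carries FOAF statements (hypothetical data).
feed = {
    ("http://example.org/post/1", RDF_TYPE, RSS + "item"),
    ("http://example.org/post/1", RSS + "title", "A post"),
    ("http://example.org/post/1", FOAF + "maker", "http://example.org/bob#me"),
    ("http://example.org/bob#me", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/bob#me", FOAF + "name", "Bob"),
}

# The same unmodified function handles both sources.
profile_names = foaf_names(profile)
feed_names = foaf_names(feed)
```

The function never needs to know that the second graph came from a feed: the RSS statements are simply other triples that it skips over.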

In the comment section of the same blog post, Dan Brickley, another of the minds behind the Semantic Web, added:

One way to think about this: the Resource Description Framework (RDF) is a family of XML applications who agree to make a certain tradeoff for the sake of cross-compatibility. In exchange for accepting a number of constraints on the way they use XML to write down claims about the world, they gain the ability to have their data freely mixed with that of other RDF applications.

Since many descriptive problems are inter-related, this is an attractive offer, even if the XML syntax is a little daunting. MusicBrainz can focus on describing music, RSS 1.0 on describing news channels, FOAF on describing people, Dublin Core on describing documents, RdfGeo on places and maps, RdfCal on describing events and meetings, Wordnet on classifying things using nouns, ChefMoz on restaurants, and so on.

Yet because they all bought into the RDF framework, any RDF document can draw on any of these ‘vocabularies’. So an RSS feed could contain markup describing the people and places and music associated with a concert; a calendar entry could contain information about its location and expected attendees, a restaurant review could use FOAF to describe the reviewer, or a FOAF file could use Dublin Core to describe the documents written by its author, as well as homepages and other information about those authors.

So, for any particular application, you could do it in standalone XML. RDF is designed for areas where there is a likely pay-off from overlaps and data merging, ie. the messy world we live in where things aren’t so easily parceled up into discrete jobs.

But it is a tradeoff. Adopting RDF means that you just can’t make up your XML tagging structure at random, but you have to live by the ‘encoding’ rules expected of all RDF applications.

This is so that software written this year can have some hope of doing useful things with vocabularies invented next year: an unpredictable ‘tag soup’ of arbitrary mixed XML is hard to process. RDF imposes constraints so that all RDF-flavoured XML is in broadly the same style (for example, ordering of tags is usually insignificant to what those tags tell the world). Those constraints take time to learn and understand and explain, and so adopting RDF isn’t without its costs.

And so the more of us who use RDF, the happier the cost/benefit tradeoff gets, since using RDF brings us into a larger and larger family of inter-mixable data.
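The "freely mixed" and "ordering is usually insignificant" points can be sketched in a few lines of Python. This is a simplification under stated assumptions: a graph is again just a set of (subject, predicate, object) tuples, merge and creator_names are hypothetical helper names, the example.org data is invented, and blank nodes (which make real RDF merging more subtle) are ignored.

```python
# Real vocabulary URIs; the example.org resources are invented.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Statements from a (hypothetical) music source, using Dublin Core.
music_data = [
    ("http://example.org/track/42", DC_CREATOR, "http://example.org/people/alice"),
]

# Statements from a (hypothetical) people source, using FOAF.
people_data = [
    ("http://example.org/people/alice", FOAF_NAME, "Alice"),
]


def merge(*sources):
    """Merge any number of triple sources into one graph (a set union)."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def creator_names(graph, resource):
    """Follow dc:creator from a resource, then read foaf:name of each creator."""
    creators = {o for (s, p, o) in graph if s == resource and p == DC_CREATOR}
    return sorted(o for (s, p, o) in graph if s in creators and p == FOAF_NAME)


merged = merge(music_data, people_data)

# Feeding the same statements in a different order gives the same graph.
reordered = merge(reversed(people_data), reversed(music_data))

names = creator_names(merged, "http://example.org/track/42")
```

The two sources know nothing about each other. They join only because they use the same URI for the same person, which is the kind of pay-off from overlap that the quote describes.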

Tim Berners-Lee also wrote a note at the W3C in 1998 to explain Why RDF model is different from the XML model.

Finally, Lee W. Lacy writes in his book OWL: Representing Information Using the Web Ontology Language:

While XML provides features for representing and interchanging information, it is insufficient for supporting Semantic Web requirements. The primary reasons that XML cannot support the Semantic Web directly are that XML defines syntax, not semantic, and XML descriptions are ambiguous to a computer.

XML is a data formatting language that only provides a grammar (syntax). XML tag names provide few hints to support importing software. XML tags are no better than natural language for providing meaning. XML tag names are simply strings that may be meaningful to humans but are meaningless to software. While XML tag names may suggest meaning to a human reader, they have no meaning by themselves to parsing software. A tag name (e.g., “<orange>”). The meaning of the tags must be agreed to a priori to support interoperability. XML languages often require voluminous companion documentation to explain the meaning of tags [important point for the discussion that currently interests us].

XML does not represent the semantics, the meaning of a concept. Software must understand how to interpret XML tags to perform more meaningful applications. However, there is no straightforward way to describe semantics in traditional XML. XML formats can be specified in DTDs and XML schemas, but these specifications do not describe the relationship between resources.

XML adds structure to data on the current web. However, it is too focused on syntax and exhibits ambiguity that makes it insufficient to support the Semantic Web itself. A common problem with using XML is that there are too many ways to describe the same thing. Semantic Web developers leverage XML’s features in different ways. They can define their own tag names, nesting structures, identifiers, and representation styles.

While XML provides a number of useful features, there are serious problems with using XML by itself to support the Semantic Web. RDF overcomes many of these challenges […]
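The "too many ways to describe the same thing" problem is easy to reproduce. Here is a small sketch, using Python's standard xml.etree.ElementTree module, of two made-up XML layouts that a human reads as the same fact. The tag names, the sample value and the function name_from_child are all invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Two invented layouts that state the same thing to a human reader.
as_child = "<person><name>Alice</name></person>"
as_attribute = '<person name="Alice"/>'


def name_from_child(xml_text):
    """Reader written for the child-element layout only.

    Returns the text of the <name> child, or None when there is no such child.
    """
    return ET.fromstring(xml_text).findtext("name")


# Works for the layout the reader was written for...
child_result = name_from_child(as_child)

# ...but finds nothing in the attribute layout, although both are valid XML.
attribute_result = name_from_child(as_attribute)
```

Nothing in the XML itself tells the software that the attribute and the child element mean the same thing. That agreement has to be made a priori, layout by layout, which is the cost the quote is pointing at.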

In light of these writings, the case that interests us comes down to cross-compatibility and inter-mixability of data. In the long run, it is about the emergence of the Semantic Web and having documents that are meaningful to machines.

Frederick Giasson

