Behind Oz’s Curtain

Benjamin Nowack, creator of ARC and Trice, wrote an interesting blog post about the place of Microformats and RDFa in the HTML 5 specification. I am not deep into the specification itself, and so may lack some history context. However, the most interesting point in this article is not related to Microformats, RDFx or the new HTML 5 specification.

The point is that apparently, some people believe that it is RDF or nothing. This is not new, but is that true?

People (and particularly enterprises) want the benefits of structured data, not necessarily RDF. In fact, many people don’t know about RDF, or don’t understand RDF, or just don’t care about RDF. But, is it because you don’t know, understand or care about RDF that you cannot benefit from it? No, certainly not. And I think that is what Benjamin is talking about when he mentions things such as: “[…] to get RDF to the broader developer community“, “[…] here could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers“. “[…] they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata”. RDF can be incarnated in multiple bodies, but it is still RDF. I think it is what Benjamin was suggesting, and it the path we took at Structured Dynamics.

We choose to use RDF behind Oz’s curtain. This means that at the core of any of our methodologies, systems and specifications, we use RDF. Why? Because it is the more flexible description framework available that helps us handle any other source of data. However, does that mean that we should push RDF in everybody’s face? Certainly not.

Our work with different enterprises from all kind of domains told us that we have to look beyond RDF while still using it (as paradoxically as that may appear). For example, we developed structWSF and conStruct such that people can upload (and manage) their data in different formats while being able to export it in all other different formats. At the core, these systems use RDF to manipulate all these different kind of formats, but from the outside, users simply use the format they care about, they use, or that they have available in their workflow. These users benefits from RDF without knowing it, understanding it or without caring about it. We don’t think RDF is for everyone, but everyone can benefit from RDF.

Another example of RDF behind Oz’s curtain is the irON description framework and its three serialization profiles: irJSON, irXML and commON that we developed. As stated in the Purpose section of this document, the goal was quite clear:

irON (instance record and Object Notation) is a abstract notation and associated vocabulary for specifying RDF triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF. The notation supports writing RDF and schema in JSON (irJSON), XML (irXML) and comma-delimited (CSV) formats (commON). The notation specification includes guidance for creating instance records (including in bulk), linkages to existing ontologies and schema, and schema definitions. Profiles and examples are also provided for each of the irXML, irJSON and commON serializations.

irON is premised on these considerations and observations:

RDF (Resource Description Framework) is a powerful canonical data model for data interoperability
However, most existing data is not written in RDF and many authors and publishers prefer other formats for various reasons
Many formats that are easier to author and read than RDF are variants of the attribute-value pair construct [2], which can readily be expressed as RDF, and
A common abstract notation for converting to RDF would also enable non-RDF formats to become somewhat interchangeable, thus allowing the strengths of each to be combined.

The irON notation and vocabulary is designed to allow the conceptual structure (“schema”) of datasets to be described, to facilitate easy description of the instance records that populate those datasets, and to link different structures for different schema to one another. In these manners, more-or-less complete RDF data structures and instances can be described in alternate formats and be made interoperable. irON provides a simple and naive information exchange notation expressive enough to describe most any data entity.

I think this is what Benjamin was talking about in his article, and the kind of mindset he was suggesting the RDF community to adopt. At least this is the minding we adopted at Structured Dynamics, and apparently it is the minding Benjamin adopted for his own business. I am sure there are many other people and organizations out there that are adopting the same point of view according to RDF and its role in the current data ecosystem.

Frederick Giasson

Machine Learning, Engineering & Data

Leave a Reply Cancel reply