I recently started to follow discussions evolving around the Data Portability project. It is an emerging community of people that tries to define the principles and push technologies to encourage the “portability” of data between people and systems. Other such initiative exists, such the Linking Open Data Community (that emerged from the semantic web community more than one year ago), The Open Knowledge Definition, and there are probably many others too. However DP is the one that recently got the biggest media coverage considering “support” and covering from some people and groups.
An interesting thread emerged from the mailing list that was trying to get a better definition of what “Data Portability” means.
- Data Referencing
- Data Mobility (moving data from distinct locations via Import and Export using agreed data formats)
What the Semantic Web means in this context?
What these two critical points mean in terms of semantic web concepts and technologies?
Defining the context
This discussion will be articulated in one context: the Web. The current discussion will take into consideration that all data is available on the Web. This means the use of Web technologies, protocols, standards and concepts. This could be extended to other networks, with other protocols and technologies, but we will focus the discussion on the Web.
How data referencing is handled on the semantic web? Well, much information is available about that question on the Linked Data Wikipedia page. Basically it is about referencing data (resources) using URIs (unique resources identifiers), and these URIs should ideally be “dereferencable” on the Web. What “dereferencable on the Web” means? It means that if I have a user account on a certain web service, and that I have one URI that define that account, and that this URI is in fact a URL, so that I can get data (normally a RDF document; in this example it would be a RDF document describing that user account) by looking at this URL on the Web (in this case we say that the URI is dereferencable on the Web).
This means one wonderful thing: if I get a reference (URI) to something, this means that in the best of the cases, I can also get data describing this thing by looking on the Web for its description. So, instead of getting a HTML page describing that thing (this can be the case, but is not limited to) I can get the RDF description of that thing too (via web server content negotiation). This RDF description can be use by any web service, any
software agent, or whatever, to helps me to perform specific tasks using this data (Importing/Exporting my personal data? Merging two agendas in the same calendar? Planning my next trips? And so on).
Now that I have a way to easily reference and access any data on the Web, how that accessible data can become “mobile”?
RDF and Ontologies to makes data “mobile”
RDF is a way to describe things called “resources”. These resources can be anything: people, books, places, events, etc. There exists a mechanism that let anybody describing things according to their properties (predicates). The result of this mechanism is a graph of relationships describing a thing (a resource). This mechanism do not only describes properties of a Thing, but also describe relationship between different things. For example, a person (a resource) can be described by its physical properties, but it can also be described with its relation with other people (other resources). Think about a social graph.
What is this mechanism? RDF.
Ontologies as vocabularies standards
However, RDF can’t be used alone. In order to make this thing effective, one need to use “vocabularies”, called ontologies, to describe a resource and its properties. These ontologies can be seen as a controlled vocabulary defined by a community of experts to describe some domains of things (books, music, people, networks, calendar, etc). It is much more than a controlled vocabulary, but it is easier to understand what it is that way.
FOAF is one of these vocabularies. You can use this ontology to describe a person, and its relation with other people, in RDF. So, you will say: this resource is named Fred; Fred lives near Quebec City; and Fred knows Kingsley. And so on.
By using RDF + Ontologies, the data is easily made Mobile. By using such standards that communities, people and enterprises agree to uses; systems will be able to read, understand and manage data coming from multiple different data sources.
Ontologies are standards ensuring that all the people and systems that understand these ontologies can understand the data that is described, and then accessible. It is where data becomes movable (mobility is not only about accessibility for download, it is also about understanding the transmitted data).
Data description robustness
But you know what is the beauty with RDF? It is that if one of the system doesn’t know one ontology, or do not understand all classes and properties of an ontology used to describe a resource, it will only ignore that data and concentrate its effort to understand the thing being described with the ontologies it knows. It is like if I would speak to you, in the same conversation, in French, English, Italian and Chinese. You would only understand what I say in the languages you know, and you will act considering the things you understood of the conversation. You will only discard the things you don’t understand.
Well, it is hard to put all these things in one single blog post, but I would encourage people that are not familiar with these concepts, terminologies and technologies, and that are interested in the question, to start reading what the semantic web community wrote about these things, what are the standards supported and developed by the W3C, etc. There are so many things that can change the way people use the Web today. It is just a question of time in fact!