The current problem of the Web
The current problem of the Web is that most (virtually all) of the documents it holds are formatted for humans. For example, HTML is a markup language used to present information to humans and to make documents easy for them to understand.
Why do I call this the current problem of the Web? Because these human-oriented documents are not easily processable by computers. The information is not formatted for them. They can’t easily work out what a document is about: its subject, its meaning, its semantics, etc.
A possible solution to that problem
One way to try to solve this problem is to annotate these human-processable documents with computer-processable metadata. This is exactly the purpose of newer formats like RDF and OWL. Their primary purpose is to make digital documents (a file, a photo, a video, anything digital) computer-processable.
Such a metadata document would describe the meaning and semantics of a digital document in a way that computers can easily understand. Software agents could then read these descriptions, understand them and even infer new facts and knowledge from them. This is the idea behind the Semantic Web.
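To make the idea a little more concrete, here is a minimal sketch in Python of how an agent might read such a description. It models RDF-style statements as plain (subject, predicate, object) tuples rather than using a real RDF library, and the URLs and the "ex:" property names are invented for illustration only.

```python
# Hypothetical RDF-style statements as (subject, predicate, object) tuples.
# The URLs and "ex:" property names are made up for this example.
PAGE = "http://example.org/articles/web-of-trust"
FRED = "http://example.org/people/fred"

triples = {
    (PAGE, "ex:title", "The Semantic Web of Trust"),
    (PAGE, "ex:subject", "Semantic Web"),
    (PAGE, "ex:author", FRED),
    (FRED, "ex:name", "Fred"),
}

def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def author_names(page):
    """Combine two statements (page -> author -> name) into a new fact."""
    return [name
            for author in objects(page, "ex:author")
            for name in objects(author, "ex:name")]

print(objects(PAGE, "ex:subject"))
print(author_names(PAGE))
```

The point of the sketch is only that the subject of the page and the name of its author are stated explicitly, so a program can look them up and combine them instead of guessing them from the text.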
The possible problems with such annotated metadata
Remember the early days of the Web, when people put metadata in the headers of their HTML files? Remember when search bots used this information to return relevant results to users? Remember when search bots stopped using it, because people were only using it to trick the bots into bringing visitors to their pages even when the search queries had nothing to do with the content of those pages? This is exactly why people lost faith in metadata. And it is exactly why I have doubts about social tagging (but that is another story).
The problem with the early approach to annotating documents with metadata is that people could annotate their web documents with any metadata at all, whether or not it was related to the content. In the end, web publishers were not annotating their documents with information relevant to their content, but only with information that would bring traffic to their web sites.
You are probably thinking something like this: “Fred, you said that the Semantic Web formats (RDF, OWL, or any other) are simply a sort of metadata file that can be attached to current web documents to describe them, their meaning and their semantics. So don’t you think the result would be the same as with the HTML header metadata? That people would try to trick the Semantic Web search engines, crawlers and software agents?”
The solution: Semantic-Web-Of-Trust
Below is a short description of the Web of Trust as seen by Tim Berners-Lee, the father of the Web and the Semantic Web, written in 1997.
“In cases in which a high level of trust is needed for metadata, digitally signed metadata will allow the Web to include a “Web of trust”. The Web of trust will be a set of documents on the Web that are digitally signed with certain keys and contain statements about those keys and about other documents. Like the Web itself, the Web of trust will not need to have a specific structure, such as a tree or a matrix. Statements of trust can be added in such a way as to reflect actual trust exactly. People learn to trust through experience and through recommendation. We change our minds about who we trust and for what purposes. The Web of trust must allow us to express this.”
At that time, Mr. Berners-Lee saw digital signatures as a way to establish who the author of a metadata annotation is, and so to add trust in that metadata. Some people might also think of PGP’s [PKI] Web of trust system.
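As a rough illustration of what "signed metadata" buys us, here is a small Python sketch. Note the substitution: a real Web of trust would use public-key signatures (as PGP does), but Python's standard library has no public-key signing, so this sketch uses an HMAC with a shared secret as a stand-in. The key, the metadata fields and the function names are all hypothetical.

```python
import hashlib
import hmac
import json

def sign(metadata: dict, key: bytes) -> str:
    """Return a hex tag over a canonical JSON encoding of the metadata.

    Stand-in only: an HMAC uses a shared secret, not a public/private key pair.
    """
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(metadata: dict, tag: str, key: bytes) -> bool:
    """Check that the metadata still matches the tag made with this key."""
    return hmac.compare_digest(sign(metadata, key), tag)

key = b"demo-key-not-for-real-use"  # hypothetical key
metadata = {"page": "http://example.org/store", "subject": "hardware store"}
tag = sign(metadata, key)

# Someone rewrites the annotation to attract unrelated traffic.
tampered = dict(metadata, subject="cheap flights")

print(verify(metadata, tag, key))
print(verify(tampered, tag, key))
```

The check only tells an agent that the annotation comes from the holder of the key and has not been altered. Whether that key holder deserves to be trusted is a separate question, which is where the "Web of trust" part comes in.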
Other people, like Shelley Powers, thought about attaching RDF content to links (for example, descriptive information about a link to a local hardware store), and using reification principles to infer trust in the relation: I trust him, you trust me, so you trust him.
Many studies are under way to find the best way to add trust to the Web and, in the near future, to the Semantic Web. Some techniques, like PGP’s, are tested and effective. However, could they be applied to the Semantic Web? What is the best system we can use for the Semantic Web? Has that system already been created, or is it still to be created?
One thing is sure: such a system will have to be present in the Semantic Web if we want it to succeed.
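The "I trust him, you trust me, so you trust him" inference can be sketched as a walk over a graph of trust statements. The Python below is only an illustration of that transitive step, not of RDF reification itself. The names, the `trusts` table and the `max_hops` limit are assumptions made for the example.

```python
from collections import deque

# Hypothetical direct trust statements: truster -> people they trust directly.
trusts = {
    "you": {"me"},
    "me": {"him"},
    "him": set(),
}

def trusts_transitively(source, target, max_hops=3):
    """Breadth-first search along trust statements, following at most max_hops links."""
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow chains longer than max_hops
        for trusted in trusts.get(person, ()):
            if trusted == target:
                return True
            if trusted not in seen:
                seen.add(trusted)
                queue.append((trusted, hops + 1))
    return False

print(trusts_transitively("you", "him"))
print(trusts_transitively("him", "you"))
```

The hop limit is a design choice for the sketch: trust presumably weakens as the chain gets longer, so an agent may not want to follow it forever.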
Sudar
November 13, 2005 — 11:12 am
Hi Fred,
One important difference is that in the early days of the web the metadata was created and maintained by the creator of the content itself (i.e. the person who created or maintained those HTML pages).
But now the metadata is created and maintained by the users of the content and not by the creator (i.e. by the people who tag it on del.icio.us).
So I think there is a large difference between the metadata of the past and the metadata we are going to see in the future.
Hmm, what do you feel about it, Fred?
Fred
November 13, 2005 — 2:41 pm
Hi Sudar!
Yup, you are right on that one: there is a new, social way to add metadata to content (all the social networks like del.icio.us).
But the problem I see is that the only metadata del.icio.us adds to a link is the number of people who tagged it. What I was thinking about while writing that article was adding metadata that describes a digital document semantically: a computer-understandable document attached to a digital document.
The idea of creating a socially maintained repository of metadata that annotates digital documents is interesting, but isn’t it a little bit infrastructure consuming?
However, in the Semantic Web, companies will want to add metadata to their product catalogues to help software agents retrieve and understand what the products are really all about (on a semantic level). Content publishers of any type will want to describe their digital documents in such a way that software agents can easily understand their semantics, and I think it is really important to make this possible.
So your addition is really interesting, but the question it raises is: just because something has not been delicioused or dugg, does that mean it is not trustworthy or not worth anything? We know that is not the case, and this is the problem, I think.
Salutations,
Fred