
Making the bridge between the Web and the Semantic Web

 

Many people think that the Semantic Web will never happen, at least not in the next few years, because there is not enough useful data published in RDF. Fortunately, this is a misconception. In fact, many things are already accessible in RDF, even if it doesn’t appear so at first sight.

 

Triplr

Danny Ayers recently pointed out a new web service created by Dave Beckett called Triplr: “Stuff in, triples out”.

Triplr is a bridge between well-formed XHTML web pages containing GRDDL or RSS and their RDF/XML or Turtle serializations.

Here is an example:
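A Triplr request is just a URL: you pick an output format and append the address of the page you want triples from. The exact URL layout below is an assumption based on the “stuff in, triples out” pattern; the real service may arrange its path segments differently.

```python
def triplr_url(page_url, output_format="turtle"):
    """Build a Triplr-style request URL: the desired output format,
    followed by the source URL with its scheme stripped.
    (Illustrative sketch; the actual Triplr URL scheme may differ.)"""
    bare = page_url.replace("http://", "", 1)
    return "http://triplr.org/%s/%s" % (output_format, bare)

print(triplr_url("http://example.org/blog/"))
# -> http://triplr.org/turtle/example.org/blog/
```

Fetching that URL would then return the page’s embedded data as Turtle triples.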

 

Virtuoso’s Sponger

Another bridging service called the Sponger also exists. Its goal is the same as Triplr: taking different sources of data as input, and creating RDF as output.

The Virtuoso Sponger will do everything possible to find RDF triples at a given URL (via content negotiation and by checking for “link” elements in HTML files). If no RDF document is available from a URL, it will try to convert the data source available at that URL into RDF triples. Supported data sources include microformats, RDFa, eRDF, HTML meta data tags, and HTTP headers, as well as APIs like Google Base, Flickr, Del.icio.us, etc.
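The content-negotiation step mentioned above amounts to telling the server, via the HTTP Accept header, that RDF is preferred over HTML. Here is a minimal sketch of such a request in Python; it is illustrative only, not the Sponger’s actual code.

```python
import urllib.request

def rdf_request(url):
    """Build an HTTP request that asks for RDF/XML first,
    falling back to plain HTML if the server has no RDF.
    (Minimal content-negotiation sketch, not the Sponger's real logic.)"""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/rdf+xml, text/html;q=0.5"},
    )

req = rdf_request("http://example.org/resource")
```

A server that publishes both representations can then answer with RDF/XML directly, and no conversion is needed at all.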

 

How does it work?

  1. The first thing the Sponger does is try to dereference the given URL to get RDF data from it. If it finds some, it returns it; otherwise, it continues.
  2. If the URL refers to an HTML file, the Sponger tries to find “link” elements referring to RDF documents. If it finds one or more of them, it adds their triples to a temporary RDF graph and continues.
  3. If the Sponger finds microformat data in the HTML file, it maps it to related ontologies (depending on the microformat) and creates RDF triples from that mapping. It adds these triples to the temporary RDF graph and continues.
  4. If the Sponger finds eRDF or RDFa data in the HTML file, it extracts it, adds the resulting triples to the RDF graph, and continues.
  5. If the Sponger finds that it is talking to a web service such as Google Base, it maps the API of the web service to an ontology, creates triples from that mapping, includes them in the temporary RDF graph, and continues.
  6. If nothing has been found so far but the HTML file contains some meta data, it maps it to appropriate ontologies, creates triples, and adds them to the temporary RDF graph.
  7. Finally, if nothing at all is found, it returns an empty graph.
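The steps above form a fallback cascade, which can be sketched as follows. All the extractor functions here are hypothetical placeholders standing in for the Sponger’s real cartridges; each returns a (possibly empty) list of (subject, predicate, object) triples.

```python
# Hypothetical extractors, one per step of the cascade described above.
def dereference_rdf(url): return []  # step 1: direct RDF via content negotiation
def linked_rdf(url):      return []  # step 2: RDF documents named in "link" elements
def microformats(url):    return []  # step 3: microformats mapped to ontologies
def erdf_rdfa(url):       return []  # step 4: embedded eRDF / RDFa
def api_mapping(url):     return []  # step 5: web-service APIs (e.g. Google Base)
def html_meta(url):       return []  # step 6: HTML meta data tags

def sponge(url):
    """Sketch of the Sponger's fallback logic (not the actual implementation)."""
    direct = dereference_rdf(url)
    if direct:          # step 1: real RDF found, return it as-is
        return direct
    graph = []          # temporary RDF graph accumulated by steps 2-6
    for extractor in (linked_rdf, microformats, erdf_rdfa, api_mapping, html_meta):
        graph.extend(extractor(url))
    return graph        # step 7: may be empty if nothing was found
```

The key design point is that each step only adds triples to the shared temporary graph, so partial results from several sources accumulate instead of overwriting each other.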

The result is simple: from almost any URL, you are very likely to get some RDF data related to that URL. The bridge between the Web and the Semantic Web is now made.

 

Some examples

Here are some examples of data sources converted by the Sponger:

 

Conclusion

What is fantastic for developers is that they only have to build their system around RDF to make their application communicate with any of these data sources. The Virtuoso Sponger does all the work of interpreting the information for them.

This is where we really meet the Semantic Web.

With such tools, it is like looking at the Semantic Web through a lens.

7 thoughts on “Making the bridge between the Web and the Semantic Web”

  1. Very often people talk about successful Web 2.0 technologies (Ajax, blogs, wikis) and express doubts about the Semantic Web. For example, see Stephen Downes’s recent note “Why the Semantic Web Will Fail” (http://halfanhour.blogspot.com/2007/03/why-semantic-web-will-fail.html). I consider Semantic Web technology quite complex too, but this complexity is justified.

    During my postgraduate studies I tried to solve the problem described in your note; maybe it will be interesting for someone. I worked on Stonebraker’s THALIA integration testbed (http://www.cise.ufl.edu/project/thalia.html).

    THALIA (Test Harness for the Assessment of Legacy information Integration Approaches) is a publicly available testbed and benchmark for testing and evaluating integration technologies. This Web site provides researchers and practitioners with a collection of 40 downloadable data sources representing University course catalogs from computer science departments around the world. The data in the testbed provide a rich source of syntactic and semantic heterogeneities since we believe they still pose the greatest technical challenges to the research community. In addition, this site provides a set of twelve benchmark queries as well as a scoring function for ranking the performance of an integration system.

    The THALIA testbed is represented by 40 XML/XSLT files automatically produced from 40 education sites. I tried to solve the syntactic and semantic integration problems in it by using an education ontology and SWRL rules, and to represent these files as an RDF store.
