Many people think that the Semantic Web will never happen, at least not in the next few years, because there is not enough useful data published in RDF. Fortunately, this is a misconception. In fact, many things are already accessible in RDF, even if it doesn't appear so at first sight.
Triplr
Danny Ayers recently pointed out a new web service created by Dave Beckett called Triplr: “Stuff in, triples out”.
Triplr is a bridge between well-formed XHTML web pages containing GRDDL or RSS and their RDF/XML or Turtle serializations.
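To give an idea of how simple such a bridge is to use, here is a minimal Python sketch that fetches Turtle from a Triplr-style endpoint. The URL pattern (an output-format segment followed by the source URL) and the example page are assumptions made for illustration, not a documented contract.

```python
# Minimal sketch of calling a Triplr-style "stuff in, triples out"
# service. The endpoint pattern and source page are assumptions.
from urllib.request import urlopen

SOURCE = "example.org/page.html"        # a page containing GRDDL or RSS
ENDPOINT = "http://triplr.org/turtle/"  # assumed: format segment + source URL

with urlopen(ENDPOINT + SOURCE) as response:
    print(response.read().decode("utf-8"))  # Turtle triples, if any
```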
Virtuoso’s Sponger
Another bridging service called the Sponger also exists. Its goal is the same as Triplr's: taking different sources of data as input and creating RDF as output.
The Virtuoso Sponger will do everything possible to find RDF triples at a given URL (via content negotiation and by checking for “link” elements in HTML files). If no RDF document is available from a URL, it will try to convert the data source available at that URL into RDF triples. Supported data sources include microformats, RDFa, eRDF, HTML meta data tags and HTTP headers, as well as APIs like Google Base, Flickr, Del.icio.us, etc.
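The content negotiation step amounts to asking the server for RDF via the HTTP Accept header and checking what comes back. Here is a minimal sketch of that first check; it illustrates the mechanism, not the Sponger's actual code.

```python
# Sketch of HTTP content negotiation for RDF: request RDF/XML and
# report whether the server actually served it.
from urllib.request import Request, urlopen

def fetch_rdf(url):
    """Return the response body if the URL serves RDF/XML, else None."""
    req = Request(url, headers={"Accept": "application/rdf+xml"})
    with urlopen(req) as response:
        if response.headers.get("Content-Type", "").startswith("application/rdf+xml"):
            return response.read()
    return None  # not RDF; a Sponger would fall back to conversion
```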
How does it work?
- The first thing the Sponger does is try to dereference the given URL to get RDF data from it. If it finds some, it returns it; otherwise, it continues.
- If the URL refers to an HTML file, the Sponger will try to find “link” elements referring to RDF documents. If it finds one or more of them, it will add their triples to a temporary RDF graph and continue its process.
- If the Sponger finds microformat data in the HTML file, it will map it using related ontologies (depending on the microformat), create RDF triples from that mapping, add these triples to the temporary RDF graph and continue.
- If the Sponger finds eRDF or RDFa data in the HTML file, it will extract it, add it to the temporary RDF graph and continue.
- If the Sponger finds that it is talking to a web service such as Google Base, it will map the API of the web service to an ontology, create triples from that mapping, include them in the temporary RDF graph and continue.
- If nothing is found but there is some HTML meta data, it will map it to ontologies, create triples and add them to the temporary RDF graph.
- Finally, if nothing is found, it returns an empty graph. (A rough code sketch of this cascade follows below.)
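To make the cascade concrete, here is a rough but runnable Python sketch of its first two steps using rdflib; the later steps are indicated as comments. This is my own illustration of the logic described above, not Virtuoso's implementation.

```python
# Rough sketch of the Sponger cascade: try RDF directly, then follow
# <link> elements; later conversion steps are indicated as comments.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from rdflib import Graph

class LinkFinder(HTMLParser):
    """Collect href values of <link> elements advertising RDF/XML."""
    def __init__(self):
        super().__init__()
        self.hrefs = []
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("type") == "application/rdf+xml" and "href" in a:
            self.hrefs.append(a["href"])

def sponge(url):
    graph = Graph()  # temporary RDF graph
    # Step 1: try to dereference the URL as RDF directly.
    try:
        graph.parse(url, format="xml")
        return graph
    except Exception:
        pass
    # Step 2: look for <link> elements pointing at RDF documents.
    req = Request(url, headers={"Accept": "text/html"})
    html = urlopen(req).read().decode("utf-8", errors="replace")
    finder = LinkFinder()
    finder.feed(html)
    for href in finder.hrefs:
        graph.parse(urljoin(url, href), format="xml")
    # Steps 3-6 (microformats, eRDF/RDFa, service APIs, meta tags)
    # would add further triples here; finally the graph, possibly
    # empty, is returned.
    return graph
```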
The result is simple: from almost any URL, it is more than likely that you will get some RDF data related to it. The bridge is now made between the Web and the Semantic Web.
Some examples
Here are some examples of data sources converted by the Sponger:
- RDF/XML from HTML via GRDDL (same as the Triplr example)
- Following HTML “link” elements to find linked RDF files (from my home page to my FOAF profile hosted on another website)
- From the Google web service API to RDF/XML (here is the normal web page, a feed, that the triples are generated from)
Conclusion
What is fantastic for developers is that they only have to build their systems around RDF to make their applications communicate with any of these data sources. The Virtuoso Sponger does all the work of interpreting the information for them.
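In practice, the developer's side can be as small as a single SPARQL query over whatever graph comes back, regardless of the original format. A minimal rdflib sketch (the document URL and the FOAF query are illustrative placeholders):

```python
# Once everything is RDF, one query works across all sources.
from rdflib import Graph

g = Graph()
# Whatever the original format was (microformats, RSS, an API), after
# sponging it arrives as RDF; here we simply parse an RDF/XML document.
g.parse("http://example.org/foaf.rdf", format="xml")  # placeholder URL

# One SPARQL query, regardless of where the data came from.
results = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE { ?person foaf:name ?name }
""")
for row in results:
    print(row.name)
```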
This is where we really meet the Semantic Web.
With such tools, it is like looking at the Semantic Web through a lens.
dulanov
March 28, 2007 — 4:15 pm
People often talk about successful Web 2.0 technologies (Ajax, blogs, wikis) and express doubts about the Semantic Web. For example, see Stephen Downes's recent note “Why the Semantic Web Will Fail” (http://halfanhour.blogspot.com/2007/03/why-semantic-web-will-fail.html). I too consider Semantic Web technology quite complex, but this complexity is justified.
During my postgraduate studies I tried to solve the problem described in your note; maybe it will be interesting for someone. I worked on Stonebraker's THALIA integration testbed (http://www.cise.ufl.edu/project/thalia.html).
THALIA (Test Harness for the Assessment of Legacy information Integration Approaches) is a publicly available testbed and benchmark for testing and evaluating integration technologies. This Web site provides researchers and practitioners with a collection of 40 downloadable data sources representing University course catalogs from computer science departments around the world. The data in the testbed provide a rich source of syntactic and semantic heterogeneities since we believe they still pose the greatest technical challenges to the research community. In addition, this site provides a set of twelve benchmark queries as well as a scoring function for ranking the performance of an integration system.
The THALIA testbed consists of 40 XML/XSLT files automatically produced from 40 university sites. I tried to solve the syntactic and semantic integration problems in them by using an education ontology and SWRL rules, representing these files as an RDF store.
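For instance, the mapping step boils down to asserting triples from each extracted course record against the shared ontology. A minimal rdflib sketch (the ontology namespace, property names and course data are hypothetical, and the SWRL reasoning step is not shown):

```python
# Sketch: one course record from a university catalog, expressed as
# RDF triples against an (assumed) education ontology.
from rdflib import Graph, Literal, Namespace, URIRef

EDU = Namespace("http://example.org/edu-ontology#")  # hypothetical ontology

g = Graph()
g.bind("edu", EDU)

course = URIRef("http://example.org/cmu/courses/15-441")  # hypothetical record
g.add((course, EDU.title, Literal("Computer Networks")))
g.add((course, EDU.instructor, Literal("J. Doe")))
g.add((course, EDU.credits, Literal(12)))

print(g.serialize(format="turtle"))
```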