Many people think that the Semantic Web will never happen, at least not in the next few years, because there is not enough useful data published in RDF. Fortunately, this is a misconception. In fact, many things are already accessible in RDF, even if it doesn't appear so at first sight.
Triplr
Danny Ayers recently pointed out a new web service created by Dave Beckett called Triplr: “Stuff in, triples out”.
Triplr is a bridge between well-formed XHTML web pages containing GRDDL or RSS and their RDF/XML or Turtle serializations.
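To give an idea of how simple such a bridge is to use, here is a minimal Python sketch that fetches Turtle from a Triplr-style endpoint. The URL pattern (an output-format segment followed by the source URL) and the example page are assumptions made for illustration, not a documented contract.

```python
# Minimal sketch of calling a Triplr-style "stuff in, triples out"
# service. The endpoint pattern and source page are assumptions.
from urllib.request import urlopen

SOURCE = "example.org/page.html"        # a page containing GRDDL or RSS
ENDPOINT = "http://triplr.org/turtle/"  # assumed: format segment + source URL

with urlopen(ENDPOINT + SOURCE) as response:
    print(response.read().decode("utf-8"))  # Turtle triples, if any
```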
Virtuoso’s Sponger
Another bridging service called the Sponger also exists. Its goal is the same as Triplr's: taking different sources of data as input and creating RDF as output.
The Virtuoso Sponger will do everything possible to find RDF triples at a given URL (via content negotiation and by checking for “link” elements in HTML files). If no RDF document is available from a URL, it will try to convert the data source available at that URL into RDF triples. Supported data sources include microformats, RDFa, eRDF, HTML meta data tags and HTTP headers, as well as APIs like Google Base, Flickr, Del.icio.us, etc.
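The content negotiation step amounts to asking the server for RDF via the HTTP Accept header and checking what comes back. Here is a minimal sketch of that first check; it illustrates the mechanism, not the Sponger's actual code.

```python
# Sketch of HTTP content negotiation for RDF: request RDF/XML and
# report whether the server actually served it.
from urllib.request import Request, urlopen

def fetch_rdf(url):
    """Return the response body if the URL serves RDF/XML, else None."""
    req = Request(url, headers={"Accept": "application/rdf+xml"})
    with urlopen(req) as response:
        if response.headers.get("Content-Type", "").startswith("application/rdf+xml"):
            return response.read()
    return None  # not RDF; a Sponger would fall back to conversion
```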
How does it work?
- The first thing the Sponger does is try to dereference the given URL to get RDF data from it. If it finds some, it returns it; otherwise, it continues.
- If the URL refers to an HTML file, the Sponger will try to find “link” elements referring to RDF documents. If it finds one or more of them, it will add their triples to a temporary RDF graph and continue its process.
- If the Sponger finds microformat data in the HTML file, it will map it using related ontologies (depending on the microformat), create RDF triples from that mapping, add these triples to the temporary RDF graph and continue.
- If the Sponger finds eRDF or RDFa data in the HTML file, it will extract it, add it to the temporary RDF graph and continue.
- If the Sponger finds that it is talking to a web service such as Google Base, it will map the API of the web service to an ontology, create triples from that mapping, include them in the temporary RDF graph and continue.
- If nothing is found but there is some HTML meta data, it will map it to ontologies, create triples and add them to the temporary RDF graph.
- Finally, if nothing is found, it returns an empty graph. (A rough code sketch of this cascade follows below.)
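To make the cascade concrete, here is a rough but runnable Python sketch of its first two steps using rdflib; the later steps are indicated as comments. This is my own illustration of the logic described above, not Virtuoso's implementation.

```python
# Rough sketch of the Sponger cascade: try RDF directly, then follow
# <link> elements; later conversion steps are indicated as comments.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from rdflib import Graph

class LinkFinder(HTMLParser):
    """Collect href values of <link> elements advertising RDF/XML."""
    def __init__(self):
        super().__init__()
        self.hrefs = []
    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("type") == "application/rdf+xml" and "href" in a:
            self.hrefs.append(a["href"])

def sponge(url):
    graph = Graph()  # temporary RDF graph
    # Step 1: try to dereference the URL as RDF directly.
    try:
        graph.parse(url, format="xml")
        return graph
    except Exception:
        pass
    # Step 2: look for <link> elements pointing at RDF documents.
    req = Request(url, headers={"Accept": "text/html"})
    html = urlopen(req).read().decode("utf-8", errors="replace")
    finder = LinkFinder()
    finder.feed(html)
    for href in finder.hrefs:
        graph.parse(urljoin(url, href), format="xml")
    # Steps 3-6 (microformats, eRDF/RDFa, service APIs, meta tags)
    # would add further triples here; finally the graph, possibly
    # empty, is returned.
    return graph
```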
The result is simple: from almost any URL, it is more than likely that you will get some RDF data related to it. The bridge is now made between the Web and the Semantic Web.
Some examples
Here are some examples of data sources converted by the Sponger:
- RDF/XML from HTML via GRDDL (same as the Triplr example)
- Following HTML “link” elements to find linked RDF files (from my home page to my FOAF profile hosted on another website)
- From the Google web service API to RDF/XML (here is the normal web page, a feed, that the triples are generated from)
Conclusion
What is fantastic for developers is that they only have to build their systems around RDF to make their applications communicate with any of these data sources. The Virtuoso Sponger does all the work of interpreting the information for them.
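In practice, the developer's side can be as small as a single SPARQL query over whatever graph comes back, regardless of the original format. A minimal rdflib sketch (the document URL and the FOAF query are illustrative placeholders):

```python
# Once everything is RDF, one query works across all sources.
from rdflib import Graph

g = Graph()
# Whatever the original format was (microformats, RSS, an API), after
# sponging it arrives as RDF; here we simply parse an RDF/XML document.
g.parse("http://example.org/foaf.rdf", format="xml")  # placeholder URL

# One SPARQL query, regardless of where the data came from.
results = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE { ?person foaf:name ?name }
""")
for row in results:
    print(row.name)
```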
This is where we really meet the Semantic Web.
With such tools, it is like looking at the Semantic Web through a lens.
dulanov
March 28, 2007 — 4:15 pm
People often talk about successful Web 2.0 technologies (Ajax, blogs, wikis) and express doubts about the Semantic Web. For example, see Stephen Downes's recent note “Why the Semantic Web Will Fail” (http://halfanhour.blogspot.com/2007/03/why-semantic-web-will-fail.html). I too consider Semantic Web technology quite complex, but this complexity is justified.
During my postgraduate studies I tried to solve the problem described in your note; maybe it will be interesting for someone. I worked on Stonebraker's THALIA integration testbed (http://www.cise.ufl.edu/project/thalia.html).
THALIA (Test Harness for the Assessment of Legacy information Integration Approaches) is a publicly available testbed and benchmark for testing and evaluating integration technologies. This Web site provides researchers and practitioners with a collection of 40 downloadable data sources representing University course catalogs from computer science departments around the world. The data in the testbed provide a rich source of syntactic and semantic heterogeneities since we believe they still pose the greatest technical challenges to the research community. In addition, this site provides a set of twelve benchmark queries as well as a scoring function for ranking the performance of an integration system.
The THALIA testbed consists of 40 XML/XSLT files automatically produced from 40 university sites. I tried to solve the syntactic and semantic integration problems in them by using an education ontology and SWRL rules, representing these files as an RDF store.
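For instance, the mapping step boils down to asserting triples from each extracted course record against the shared ontology. A minimal rdflib sketch (the ontology namespace, property names and course data are hypothetical, and the SWRL reasoning step is not shown):

```python
# Sketch: one course record from a university catalog, expressed as
# RDF triples against an (assumed) education ontology.
from rdflib import Graph, Literal, Namespace, URIRef

EDU = Namespace("http://example.org/edu-ontology#")  # hypothetical ontology

g = Graph()
g.bind("edu", EDU)

course = URIRef("http://example.org/cmu/courses/15-441")  # hypothetical record
g.add((course, EDU.title, Literal("Computer Networks")))
g.add((course, EDU.instructor, Literal("J. Doe")))
g.add((course, EDU.credits, Literal(12)))

print(g.serialize(format="turtle"))
```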