Ping the Semantic Web: exporting the pings list by namespaces

 

I just finished implementing a new feature for PingtheSemanticWeb that has been requested by many people: being able to get a list of pings based on the namespaces defined in an RDF document.

The motivation was that many RDF documents, like RSS 1.0 feeds, are of no use to some types of web services or software agents, so we needed a way to filter them by the namespaces they define.

 

How does it work?

Tim Finin from Swoogle sent me an associative list of the most commonly used namespace prefixes for the ontologies most commonly tracked by Swoogle.

When I receive a ping for an RDF document, I extract all the namespaces it defines, look up their “most commonly used prefixes”, and add the document to the pinging list with these prefixes.

That way, if you want the list of RDF documents that define the namespaces “foaf” or “doap”, you only have to fill the “Pings received with these namespaces” box (on the export page) with the string “foaf doap” (note: all prefixes are space-separated). You will then receive the list of all pings received by PingtheSemanticWeb having the namespaces “foaf” and “doap” defined in their RDF documents.

A crawler then only has to get the list of namespace prefixes in the “ns” attribute of an “rdfdocument” element, split the prefixes, and look them up in the associative array of namespaces and prefixes to know which ontologies are used in the RDF document (a small sketch of this follows the export example below).

 

Modification to the export format

The version of the export format is now set to “1.1”. I added a new attribute called “ns” to the “rdfdocument” element. A pings export file now looks like:

 

<pingthesemanticwebUpdate version="1.1" updated="2006-08-11 11:20:54">

<rdfdocument url="http://b4mad.net/datenbrei/index.php?sioc_type=post&amp;sioc_id=300" created="2006-08-11 11:21:00" updated="2006-08-14 09:57:26" topics="semantic web foaf technology WordPress sioc" ns="foaf rss admin dc rdfs rdf content sioc" />

</pingthesemanticwebUpdate>

 

Here “ns” is the string that contains all of the “most commonly used prefixes” for the namespaces defined in this RDF document. Note that namespace prefixes are space-separated.
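For illustration, here is a minimal Python sketch of what a crawler could do with this export file: parse it, split the “ns” attribute, and map the prefixes back to namespace URIs. The prefix-to-namespace mapping below only lists a few well-known entries as an assumption; the real associative list is the one provided by Swoogle.

import xml.etree.ElementTree as ET

# A few well-known prefix-to-namespace entries (illustrative only; the real
# list is the associative array of prefixes provided by Swoogle).
PREFIX_TO_NAMESPACE = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "doap": "http://usefulinc.com/ns/doap#",
    "dc":   "http://purl.org/dc/elements/1.1/",
}

def documents_with_prefixes(export_xml, wanted):
    """Yield (url, namespaces) for every ping whose 'ns' attribute contains all wanted prefixes."""
    root = ET.fromstring(export_xml)
    for doc in root.iter("rdfdocument"):
        prefixes = set(doc.get("ns", "").split())
        if set(wanted) <= prefixes:
            yield doc.get("url"), [PREFIX_TO_NAMESPACE.get(p) for p in wanted]

# Usage: documents_with_prefixes(open("ptsw-export.xml").read(), ["foaf", "doap"])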

 

Bugs

I reiterate my bug warning: “there are probably some bugs left. If you find inconsistencies, please send me an email at [fred] at [fgiasson.com] so I’ll fix them as soon as possible. I will not be able to fix anything between the 23rd and the 26th of August, considering that I’ll be off for my summer vacation in Maine; so it should work properly, without any problems, but you never know.”


Semantic Radar for Firefox and the Semantic Web Services environment

 

Recently I developed a web service for the semantic web called PingtheSemanticWeb. The goal of this service is to be notified when a new semantic web document is published on the Web, to archive its location, and to give that location to other web services: it is called a pinging system, a sort of hub for semantic web documents.

Many people think that the semantic web is only accessible and useful to researchers. They were probably right five years ago, but now I see the landscape of the Web changing.

More and more web services related to the semantic web are emerging. More and more of these services are starting to interact with each other. More and more people are using semantic web technologies and services in their day-to-day work.

This blog post is about an emerging architecture of semantic web services.

 

The Semantic Radar for Firefox

One of the newcomers is the Semantic Radar, written by Uldis Bojars. This plug-in for Firefox notifies you when it finds a FOAF, SIOC or DOAP RDF document on the web pages you visit.

The characteristic of semantic web documents is that they are not intended for humans, but for software agents (like search engine crawlers, personal agents such as web feed readers, etc.). The consequence is that humans do not see these documents, so nobody really knows that the Semantic Web keeps growing within the current Web.

This is the purpose of this new Semantic Radar: unveiling the Semantic Web to humans.

 

The Semantic Radar: much more than that

This plug-in is much more than that. In fact, each time it detects one of these semantic web documents, it notifies the PingtheSemanticWeb.com web service.

This is where the interaction between semantic web services and applications starts to emerge. Now web browsers detect semantic web documents and notify a web service acting as a central repository for semantic web documents.

 

The New Semantic Web Services Environment

A couple of years ago, everything was looking good on paper; now everything is starting to look good on the Web.

Below is a simple diagram describing the interaction between some technologies and web services of the Semantic Web.

This is not the whole semantic web, but a small portion of it; it is an example of how it all works together: a sort of Semantic Web mashup.

 

 

 

Semantic Web Documents (RDF)

This is what the semantic web is: a sea of documents formatted in such a way that they make the semantics of their content explicit.

  1. RDF. The most widespread type of semantic web document.
  2. FOAF. Documents describing people.
  3. SIOC. Documents describing online communities.
  4. DOAP. Documents describing projects.

 

Exporters

All these documents are generated (created) by special applications called exporters. An exporter is a program that generates a semantic web document from a source of data (normally a database).

Some of these exporters are able to ping (send a notification to) the PingtheSemanticWeb.com pinging service when they generate or update a semantic web document (like Talk Digger and ODS, or blogs using WordPress, b2Evolution or DotClear that added PingtheSemanticWeb.com to their servers to ping when they publish new articles).
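To make the idea concrete, here is a minimal Python sketch of what an exporter could look like, assuming the rdflib library and a hypothetical database record; it only shows the generation step, not the pinging.

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import FOAF, RDF

def export_profile(record):
    """Build a small FOAF document from one (hypothetical) database row."""
    g = Graph()
    g.bind("foaf", FOAF)
    person = URIRef(record["profile_uri"])
    g.add((person, RDF.type, FOAF.Person))
    g.add((person, FOAF.name, Literal(record["name"])))
    g.add((person, FOAF.homepage, URIRef(record["homepage"])))
    return g.serialize(format="xml")  # RDF/XML ready to be published and pinged

# export_profile({"profile_uri": "http://example.org/people/fred",
#                 "name": "Fred", "homepage": "http://example.org/"})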

 

Portal exporters

This is a sub-class of exporters. They are community web sites with thousands of users that export some of their content as semantic web documents.

  1. Talk Digger. This web application exports the profiles of its users as FOAF documents and its conversations (since it is a web service that searches for conversations on the Web) as SIOC documents.
  2. LiveJournal. This is a blogging community web site with 10 million registered users that exports its user profiles as FOAF documents.
  3. ODS. This is a set of Web 2.0 applications (blogs, wikis, forums, etc.) that exports all its data using FOAF, SIOC, DOAP and other RDF ontologies.

 

Individual exporters

This is another sub-class of exporters. They are generally plug-ins that individual users add to their software to let it export their data as one of these types of documents.

Good examples of such exporters are plug-ins for blogging software:

  1. WordPress SIOC exporter. It lets the WordPress blogging software export its data as SIOC documents.
  2. b2Evolution SIOC exporter. It lets the b2Evolution blogging software export its data as SIOC documents.
  3. DotClear SIOC exporter. It lets the DotClear blogging software export its data as SIOC documents.

 

Individual pings

Even if many exporters automatically ping the PingtheSemanticWeb.com service, not all of them do (for example, LiveJournal does not). Also, individual people will create and publish semantic web documents without pinging the system.

In such cases, the documents can be “invisible” to the semantic web because nobody knows they exist.

This is the reason why there is another kind of tool that lets people ping specific web pages. That way, they have the power to say: hey, I found this semantic web document; you can find it there.

  1. Semantic Radar for Firefox. This tool notifies the pinging server when the user encounters a semantic web document while surfing the Web.
  2. Bookmarklet. This is the PingtheSemanticWeb.com bookmarklet that lets a user click a bookmark to notify the service that a semantic web document is present on the page he is currently looking at.
  3. Website. This is the web interface of the service that lets people enter the URL of a document they found to include it in the service.
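All of these tools boil down to the same gesture: sending the document’s URL to the pinging service. Below is a minimal Python sketch of such a notification; the endpoint and parameter name are assumptions for illustration, and the exact ping URL is documented on PingtheSemanticWeb.com.

from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for illustration; check pingthesemanticweb.com for the real one.
PING_ENDPOINT = "http://pingthesemanticweb.com/rest/"

def ping(document_url):
    """Tell the pinging service that a semantic web document lives at document_url."""
    with urlopen(PING_ENDPOINT + "?" + urlencode({"url": document_url}), timeout=10) as response:
        return response.status  # 200 means the ping was received

# ping("http://example.org/foaf.rdf")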

 

Ping the Semantic Web.com

This semantic web service is at the center of the architecture I present today. It acts as a multiplexer for the locations of semantic web documents: it receives the location of semantic web documents from a multitude of sources, archives them, and re-distributes these locations to other web services or software agents.

This is the place where the semantic web is truly unveiled; and this is the place where people will go to know where semantic web documents live.

 

Other web services and software agents

Other applications, like Swoogle, have to crawl the Web to find semantic web documents. This is why Swoogle will integrate PingtheSemanticWeb into its infrastructure: it will have direct access to a full list of RDF documents ready to be included in its search engine.

Other web services like the SIOC explorer will import only the list of new SIOC documents.

So for all web services and software applications there is now one place where they can find a list of new semantic web documents ready to be used.

 

The future

I think the synergy created by this architecture could propel the adoption of the Semantic Web.

More people will create semantic web documents if more web services are using them.

Web services will use more semantic web documents if more people create them.

 

More web services will create semantic web documents if more people use them.

More people will use semantic web documents if more web services create them.

In either case, this interaction will be driven by one thing:

How many semantic web documents are accessible (quickly and easily).

The answer to this need is the PingtheSemanticWeb service, pinging tools like the Semantic Radar, and a set of dedicated users who find and ping semantic web documents.


Ping the Semantic Web.com: new auto-discovery method and re-indexing of the database

In the first version of the pinging service, I was only checking the namespaces to know where an RDF document belonged (to SIOC, FOAF, DOAP or another ontology). After some conversations I revised that method and upgraded the auto-discovery (classification) system of the pinging service.

    

Now I validate and classify each RDF document in three steps:

  1. I check if the RDF header is present in the “RDF document”.
  2. Then I check if the namespace of the ontology (SIOC, FOAF or DOAP) is used in the RDF document (as the base namespace of the document or via a prefix).
  3. Finally I check if I can find an instance of a class of the ontology somewhere in the RDF document.

If all three checks pass, I classify the RDF document accordingly (a file can be classified in more than one section). Otherwise, I add it to the list of pinged RDF documents (the non-categorized ones).
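As an illustration, here is a rough Python sketch of these three steps, assuming the rdflib library and a small, incomplete map of ontology namespaces; the service’s real classifier obviously differs in its details.

from rdflib import Graph, RDF

# Incomplete, illustrative map of the three explicitly tracked ontologies.
ONTOLOGIES = {
    "sioc": "http://rdfs.org/sioc/ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "doap": "http://usefulinc.com/ns/doap#",
}

def classify(raw_document):
    # Step 1: the RDF header must be present in the document.
    if "<rdf:RDF" not in raw_document:
        return []
    g = Graph()
    g.parse(data=raw_document, format="xml")
    declared = {str(uri) for _, uri in g.namespaces()}
    typed = {str(o) for o in g.objects(None, RDF.type)}
    matches = []
    for label, ns in ONTOLOGIES.items():
        # Step 2: the ontology namespace is declared or used in the document.
        # Step 3: at least one instance of a class of that ontology exists.
        if ns in declared and any(t.startswith(ns) for t in typed):
            matches.append(label)
    return matches  # an empty list means a non-categorized ping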

 

Re-indexing

I deleted all the databases and re-indexed all the RDF documents using this new classification technique. All the little classification glitches some people experienced should now be gone: you should see a ping appear in a list only if an instance of the ontology is defined in the RDF document.

 

Bugs

Naturally there are probably some bugs left. If you find inconsistencies, please send me an email at [fred] at [fgiasson.com] so I’ll fix them as soon as possible. I will not be able to fix anything this weekend considering that I’ll be off in the woods without any means of communication; so it should work properly, without any problems, but you never know 🙂

 

Open question

There is still an open question raised by Tim Finin as a comment on my last blog post. I will check what I can do next Monday; however, if someone has an idea, please leave it as a comment on this blog post.


Ping the Semantic Web.com update: now supports any type of RDF document

 

The Update

I have the pleasure of announcing that I updated the PingtheSemanticWeb.com pinging service today. Now any type of RDF document is archived and exported by the service.

I talked about it with Uldis; I thought about it; and people were right to ask me the question: why don’t you archive every RDF document?

So there we are: the pinging service will not only archive FOAF, SIOC and DOAP RDF documents, but any other RDF documents too.

However, I will keep the distinction between RDF documents in general and FOAF, SIOC and DOAP documents. The pinging service does extra work to detect, analyze and categorize these three types. The reason is simple: making them explicit creates an architecture that quickly makes them available to smaller web services that particularly need these documents, do not care about the others, and do not have the computing power to analyze everything to find the ones they want. So I will continue to single out the major ontologies that way.

 

How it works

Nothing changed, except that I now archive and export any RDF document that has an RDF/XML header (I don’t validate the RDF documents, I only check whether the RDF header is present; I leave the validation job to the crawlers. Eventually I could add that feature, but not for now; this is only a pinging system after all).
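In other words, the only test applied at ping time is something along the lines of this small Python check (a sketch, not the actual code of the service):

import re

def looks_like_rdfxml(payload):
    """Only check that an rdf:RDF opening tag is present; no full validation."""
    return re.search(r"<rdf:RDF[\s>]", payload) is not None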

Below is the updated workflow schema:

 

 

Detection of RDF document in HTML files

There are probably 1,001 ways to include RDF documents in HTML files.

Right now, I chose to only take into consideration RDF documents linked with the <link> HTML element. That way I know what type of RDF document to expect, and the only thing I have to do is make sure that the RDF document really is what the link says it is.
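For readers wondering what that auto-discovery looks like in practice, here is a minimal Python sketch that collects <link> elements advertising RDF/XML documents on a page; the class and variable names are mine for illustration, not the service’s.

from html.parser import HTMLParser
from urllib.parse import urljoin

class RDFLinkFinder(HTMLParser):
    """Collect <link> elements that advertise an RDF/XML document."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        if attrs.get("type") == "application/rdf+xml" and attrs.get("href"):
            self.rdf_links.append(urljoin(self.base_url, attrs["href"]))

# Usage with a hypothetical page:
# finder = RDFLinkFinder("http://example.org/blog/")
# finder.feed(html_source)
# print(finder.rdf_links)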

Eventually I may detect and archive RDF documents that are embedded into XHTML files using RDFa, GRDDL, or whatever else. I could also detect RDF documents embedded into HTML files using techniques like HTML comments, a <script> element with XML comments, etc. However, I don’t expect to add such features in the near future (the next few weeks) considering my lack of time.

 

Next things

Many things are coming up with the evolution of the PingtheSemanticWeb.com pinging service. Many people are interested in interacting with it (to give it some data or to use its data). Tomorrow I will write another blog post to announce some things that have happened, or will happen, in the next few weeks in relation to the service.

So far, it has been appreciated by many people.

 

Bugs and glitches

I fixed many bugs reported by users in the last few days. So if you find one, or think that something doesn’t work properly, please contact me by email at [fred ( at ) fgiasson.com] so I can investigate the issue as soon as possible.


Developments with PingtheSemanticWeb.com

PingtheSemanticWeb.com is 3 days old and many people have already started to take a look at it. The purpose of this blog post is to explain what the next steps are, what the open questions are, and what has changed since launch.

    

What changed?

In the last 3 days I fixed some little bugs, I upgraded the auto-discovery feature and I corrected the grammar (thanks to Uldis Bojars).

 

The open question: the export feature

The file format used to export the list of pings is still an open question. Right now, I am using a simple XML file (with two elements) to export the list of pings.

Some people wonder why I don’t use a custom RDF document, a Scutter Vocab document or an RSS document.

Christoph Görn proposed this implementation for the export feature.

Personally, the problem I have with exporting the list of pings in RDF instead of plain XML is the overhead (in number of characters) it adds without any significant benefit. Read the comments on this blog post to see the current positions on the question.

Please leave your comments about this question on this blog post. Personally I’ll wait until Tim Finin (of Swoogle) contacts me with the method they wish to use to make Swoogle interact with PingtheSemanticWeb.com. However, any new opinions are welcome.

 

Future developments

 

SIOC detector

Uldis is currently working on a new version of his SIOC detector. This new version will detect SIOC, DOAP and FOAF files on web pages (by the way, the SIOC detector is a Firefox plug-in). If the detector finds an instance of one of these ontologies on a web page, it will instantly ping the PingtheSemanticWeb.com pinging service.

It will be a really great (and easy) way to find new documents. For example, if 100 people install the plug-in in their Firefox browsers, each time one of them finds a SIOC, DOAP or FOAF document while surfing the web, the pinging server will be notified.

 

Thanks

I would like to thank Uldis, Alex Passant, Christoph and Harry Chen (did I forget somebody?) for their ideas, work and writing on the project.

 
