One of the problems I have found with the semantic web is how difficult it can be to find new and fresh data. Recently I was confronted with a problem: how to notify a web service that Talk Digger had new and updated semantic web data ready to be crawled (data described with the SIOC and FOAF ontologies, for people familiar with semantic web technologies).
Then I asked myself why nobody, to my knowledge, had developed a weblogs.com- or pingerati.net-style pinging service for semantic web documents. The approach has already proven itself: weblogs.com archives and exports millions of pings every day.
What is PingtheSemanticWeb.com?
PingtheSemanticWeb.com is a web service that archives the location of recently created or updated FOAF, DOAP or SIOC RDF documents on the Web. When one of those documents is updated, its author can notify the service by pinging it with the URL of the document.
Crawlers and other software agents use PingtheSemanticWeb.com to learn when and where the most recently updated FOAF, DOAP and SIOC documents can be found. They request a list of recently updated documents and use it as a starting point to crawl the semantic web.
More information about supported ontologies can be found here:
- Semantically-Interlinked Online Communities (SIOC)
- Friend-of-a-Friend (FOAF) and PingtheSemanticWeb.com
- Description of a Project (DOAP) and PingtheSemanticWeb.com
Using the Bookmarklet
I strongly encourage everyone to use pingthesemanticweb.com’s Bookmarklet. You only have to install the bookmarklet in your browser and click on it from any Web page. If a FOAF, SIOC or DOAP document is found, it will be indexed immediately by the pinging service.
It is the easiest way for anyone to help PingtheSemanticWeb.com to find new documents to index.
How to install the Bookmarklet
Read the instructions on how to install the Bookmarklet (Browser Button) into your browser.
How does it work
You can ping the PingtheSemanticWeb.com web service with the URL of either an HTML or an RDF document. If the service finds that the URL points to an HTML document, it checks whether the page links to a FOAF, DOAP or SIOC RDF document. If it finds one, it follows the link and checks the RDF document to see whether SIOC, DOAP and/or FOAF elements are defined in it. If the RDF document does contain such elements, the service archives the ping and makes it available to crawlers through the export files. Otherwise it discards the ping.
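The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service's actual code: the function names, the `fetch` callable, the use of the Content-Type to tell HTML from RDF, and the choice of `<link type="application/rdf+xml">` as the autodiscovery hook are all hypothetical. The vocabulary check simply looks at the namespaces declared in the RDF/XML document.

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

# Namespace URIs of the three supported vocabularies.
VOCABULARIES = {
    "http://xmlns.com/foaf/0.1/": "FOAF",
    "http://usefulinc.com/ns/doap#": "DOAP",
    "http://rdfs.org/sioc/ns#": "SIOC",
}


class RdfLinkFinder(HTMLParser):
    """Collects href values of <link type="application/rdf+xml"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "link"
                and attributes.get("type") == "application/rdf+xml"
                and attributes.get("href")):
            self.hrefs.append(attributes["href"])


def find_rdf_links(html_text, base_url):
    """Return absolute URLs of the RDF documents an HTML page links to."""
    finder = RdfLinkFinder()
    finder.feed(html_text)
    return [urljoin(base_url, href) for href in finder.hrefs]


def detect_vocabularies(rdf_bytes):
    """Return the supported vocabularies whose namespace is declared."""
    found = set()
    try:
        events = ET.iterparse(io.BytesIO(rdf_bytes), events=("start-ns",))
        for _event, (_prefix, uri) in events:
            if uri in VOCABULARIES:
                found.add(VOCABULARIES[uri])
    except ET.ParseError:
        return set()  # not well-formed XML: treat as "nothing found"
    return found


def handle_ping(url, fetch):
    """Archive-or-discard decision for one ping.

    `fetch` is a hypothetical callable: url -> (content_type, body_bytes).
    Returns a record to archive, or None when the ping is discarded.
    """
    content_type, body = fetch(url)
    if "html" in content_type:
        candidates = find_rdf_links(body.decode("utf-8", "replace"), url)
    else:
        candidates = [url]
    for candidate in candidates:
        rdf_body = body if candidate == url else fetch(candidate)[1]
        vocabularies = detect_vocabularies(rdf_body)
        if vocabularies:
            return {"url": candidate, "vocabularies": sorted(vocabularies)}
    return None
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; a real crawler would plug in its own downloader there.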
Custom needs, suggestions and bug reports
This service is new, so if you have any suggestions to improve it, if you find any bugs while pinging URLs or importing ping lists, or if you have any custom needs for your semantic web crawler or software agents, please contact me by email [fred ( at ) fgiasson.com]. That way I’ll be able to help you out as quickly as possible.
tim finin
August 14, 2006 — 5:13 pm
Great idea! Two questions: (1) to categorize a document as FOAF, DOAP or SIOC are you just checking to see if the appropriate namespace is declared? (2) Why discard documents not using any of those three vocabularies?
I guess one reason might be scalability, but you’ve already done the hard work of looking inside the document. We’d be interested in getting pings from all kinds of RDF documents to feed into Swoogle’s maw.
Fred
August 14, 2006 — 5:48 pm
Hi Mr. Finin,
Thanks for passing by and leaving a comment on this post.
To answer your questions:
(1) Exactly
(2) This is not really a question of scalability (since the system is fully scalable), but a question of resources and development time. So far I have spent about 2 working days (16 or 18 hours) building the version that is online now. I have to rework the autodiscovery feature in the next few days, but I also have to take the resources I currently have into consideration. Still, I’ll make it more powerful in the next few days or weeks for sure.
It is funny, because about an hour ago I was talking with Uldis Bojar (one of the creators of the SIOC ontology) about the possibility that Swoogle could use the PingtheSemanticWeb ping list as a starting point to crawl and index new RDF documents.
If you are still interested, you should contact me so we can work out the most suitable method for Swoogle to fetch the data. Right now the export is done using an XML file, but tomorrow I’ll discuss that with some other people because it is still an open question (more information about it can be found here).
Thanks for your kind words about my work on the project, and keep me posted on your interests, suggestions, or anything else.
Take care,
Salutations,
Fred
tim finin
August 14, 2006 — 6:12 pm
I’ll try to talk with Li Ding about what makes sense for how we might like to get the pings.
Fred
August 14, 2006 — 8:44 pm
Hi Mr. Finin,
Thank you for your fast answer. Great, keep me posted on your talk with Mr. Ding. Whatever you decide, I’ll take the time to develop what is needed to make it work.
That way, it will be a good proof of concept, a good example of how such systems can work together, and it will encourage people to use both services, etc.
Thanks,
Take care,
Salutations,
Fred