The Update
I am pleased to announce that I updated the PingtheSemanticWeb.com pinging service today. From now on, any type of RDF document will be archived and exported by the service.
I talked about it with Uldis, I thought about it, and people were right to ask me the question: why don't you archive all RDF documents?
So here it is: the pinging service will now archive not only FOAF, SIOC and DOAP RDF documents, but all the others too.
However, I will keep the distinction between RDF documents in general and FOAF, SIOC and DOAP documents. The pinging service will do extra work to detect, analyze and categorize these three types. The reason is simple: I want to make them explicit and build an architecture that quickly makes them available to the many smaller web services that specifically need these documents, do not care about the others, and do not have the computing power to analyze every document to find the ones they want. That is how I will make the major ontologies explicit.
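To illustrate what "categorizing" could mean, here is a minimal sketch that tags an RDF/XML document as FOAF, SIOC and/or DOAP by looking at the namespace URIs its elements and attributes actually use. This is an assumption about the approach, not the service's real code; the function name `categorize` is hypothetical, while the three namespace URIs are the vocabularies' published ones.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs of the three vocabularies the service singles out.
SPECIAL_VOCABULARIES = {
    "FOAF": "http://xmlns.com/foaf/0.1/",
    "SIOC": "http://rdfs.org/sioc/ns#",
    "DOAP": "http://usefulinc.com/ns/doap#",
}


def _namespace_of(qualified_name: str):
    """ElementTree writes qualified names as '{namespace}localname'."""
    if qualified_name.startswith("{"):
        return qualified_name[1:].split("}", 1)[0]
    return None


def categorize(rdf_xml: str) -> set:
    """Hypothetical helper: which of FOAF/SIOC/DOAP does this document use?"""
    root = ET.fromstring(rdf_xml)
    used = set()
    for element in root.iter():
        used.add(_namespace_of(element.tag))
        for attribute in element.attrib:
            used.add(_namespace_of(attribute))
    # Only namespaces that are actually used count, not mere declarations.
    return {name for name, uri in SPECIAL_VOCABULARIES.items() if uri in used}
```

A document can fall into several categories at once (a SIOC export often contains FOAF descriptions of its authors), which is why the sketch returns a set rather than a single label.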
How it works
Nothing has changed, except that I now archive and export any RDF document that has an RDF/XML header. (I don't validate the RDF documents; I only check whether the RDF header is present and leave the validation job to the crawlers. I could add that feature eventually, but not for now: this is only a pinging system, after all.)
Below are the updated workflow diagrams:
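The "header present, no validation" check can be sketched as a cheap text scan. This assumes "RDF/XML header" means an opening `RDF` root tag (under any prefix) that declares the RDF namespace; the function name and the `scan_limit` parameter are hypothetical choices for illustration.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Opening tag of an RDF/XML root element, e.g. <rdf:RDF ...>, with any prefix or none.
_RDF_ROOT = re.compile(r"<(?:[A-Za-z_][\w.-]*:)?RDF\b[^>]*>", re.DOTALL)


def has_rdf_header(document: str, scan_limit: int = 4096) -> bool:
    """Presence check only: no XML parsing and no RDF validation."""
    head = document[:scan_limit]  # the header should appear near the top
    match = _RDF_ROOT.search(head)
    # The root tag must also declare the RDF namespace itself.
    return match is not None and RDF_NS in match.group(0)
```

Because it never parses the document, a malformed file with a correct header still passes; that is the intended trade-off when validation is left to the crawlers.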
Detection of RDF document in HTML files
There are probably 1001 ways to include RDF documents in HTML files.
For now, I chose to consider only the RDF documents linked with the <link> HTML element. That way I know what type of RDF document to expect, and the only thing I have to do is make sure that the RDF document really is what the link says it is.
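A minimal sketch of `<link>`-based detection using Python's standard `html.parser`. It assumes the common autodiscovery convention `type="application/rdf+xml"` and treats the `title` attribute (e.g. "FOAF") as the hint about which document type to expect; both are assumptions, and `find_rdf_links` is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfLinkFinder(HTMLParser):
    """Collect <link> elements that advertise an RDF/XML document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []  # (absolute URL, title hint) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; <link ... /> also lands here.
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("type", "").lower() == "application/rdf+xml" and attributes.get("href"):
            absolute = urljoin(self.base_url, attributes["href"])
            self.links.append((absolute, attributes.get("title", "")))


def find_rdf_links(html: str, base_url: str) -> list:
    finder = RdfLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.links
```

Each returned URL would then be fetched, and its content checked against the title hint before the document is archived.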
Eventually, I may also detect and archive RDF documents embedded in XHTML files using RDFa, GRDDL, or other techniques. I could also detect RDF documents embedded in HTML files by other means, such as HTML comments or a <script> element containing XML comments. However, I don't expect to add these features in the near future (the next few weeks), given my lack of time.
Next things
Many things are coming up as the PingtheSemanticWeb.com pinging service evolves. Many people are interested in interacting with it (either to give it data or to use its data). Tomorrow I will write another blog post announcing some things that have happened, or will happen in the next few weeks, in relation to the service.
So far, the service has been well received by many people.
Bugs and glitches
I have fixed many bugs reported by users over the last few days. So if you find one, notice that something doesn't work properly, or run into anything else, please contact me by email at [fred ( at ) fgiasson.com] so I can investigate the issue as soon as possible.
tim finin
August 18, 2006 — 10:03 am
It looks like most of the pings are about RSS 1.0 documents. This is not surprising, since it's easy to configure virtually any blogging software to ping a new ping server (as I just did for our ebiquity blog). My guess, however, is that most consumers of your ping stream will not be interested in RSS files, even if they include the FOAF namespace. I'm not sure how best to address this, but here's an idea. When you process a file, if it's an RDF document, check its header for the 10 or 20 most common RDF namespaces. We can give you a list of these. Then add a field to the export format that includes the common names (prefixes) of the namespaces found (we can give you these too). Something like
A ping consumer can then easily filter the pings to select the desired ones and/or remove the undesired ones.
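The namespace-field idea can be sketched in two halves: the service scans a document's `xmlns` declarations against a table of common namespaces, and a consumer filters pings on the resulting prefix list. The five-entry table below is a hypothetical stand-in for the list offered above, and the ping record layout (a dict with a `"namespaces"` key) is an assumed export format, not the service's actual one.

```python
import re

# Hypothetical excerpt of a "most common namespaces" table: URI -> usual prefix.
COMMON_NAMESPACES = {
    "http://xmlns.com/foaf/0.1/": "foaf",
    "http://rdfs.org/sioc/ns#": "sioc",
    "http://usefulinc.com/ns/doap#": "doap",
    "http://purl.org/rss/1.0/": "rss",
    "http://purl.org/dc/elements/1.1/": "dc",
}

# Matches both default (xmlns="...") and prefixed (xmlns:foo="...") declarations.
_XMLNS_DECL = re.compile(r"""xmlns(?::[\w.-]+)?\s*=\s*["']([^"']+)["']""")


def namespace_field(document: str) -> list:
    """Service side: sorted common prefixes for the namespaces a document declares."""
    declared = set(_XMLNS_DECL.findall(document))
    return sorted(COMMON_NAMESPACES[uri] for uri in declared if uri in COMMON_NAMESPACES)


def select_pings(pings, wanted, unwanted=frozenset()):
    """Consumer side: keep pings using any wanted prefix and none of the unwanted ones."""
    return [
        ping for ping in pings
        if wanted & set(ping["namespaces"]) and not unwanted & set(ping["namespaces"])
    ]
```

For example, a consumer that wants FOAF data but not RSS feeds would call `select_pings(pings, {"foaf"}, {"rss"})`.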
Fred
August 21, 2006 — 8:45 am
Hi Mr. Finin,
I looked into it, as I told you I would a couple of days ago, and I think your solution is a good one. So could you please give me the mapping of the most used ontologies to their most common names (prefixes)? You suggested a list of 10 to 20, but if possible I would prefer a somewhat bigger list, say 40 to 50.
Thanks!
Take care,
Salutations,
Fred