Ping the Semantic Web.com update: now supports any type of RDF document

 

    The Update

I am pleased to announce that I updated the PingtheSemanticWeb.com pinging service today. From now on, any type of RDF document will be archived and exported by the service.

I talked about it with Uldis; I thought about it; and people were right to ask me the question: why don’t you archive every RDF document?

So there we are: the pinging service will archive not only FOAF, SIOC and DOAP RDF documents, but any others too.

However, I will keep the distinction between generic RDF documents and FOAF, SIOC and DOAP documents. The pinging service does extra work to detect, analyze and categorize these three types. The reason is simple: by making these ontologies explicit and building an architecture around them, the service can quickly make them available to the many smaller web services that specifically need them, that don’t care about the others, and that don’t have the computing power to analyze every document to find the ones they want. So I will single out the major ontologies that way.

 

How it works

Nothing changed, except that I now archive and export RDF documents that have an RDF/XML header. (I don’t validate the RDF documents; I only check that the RDF header is present. I leave the validation job to the crawlers. Eventually I could add that feature, but not for now: this is only a pinging system, after all.)
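For illustration, a minimal sketch of such a header check might look like the following. The regex and the 2 KB cutoff are my own assumptions for the sketch, not the service’s actual code:

```python
import re

def has_rdf_header(document: str) -> bool:
    """Cheap presence check: does an <rdf:RDF ...> opening tag appear
    near the top of the document?  No validation is attempted -- that
    job is left to the crawlers, as described above."""
    # Only inspect the first couple of kilobytes; the RDF/XML header,
    # if present, appears right after the XML declaration.
    head = document[:2048]
    return re.search(r"<rdf:RDF\b", head) is not None

print(has_rdf_header('<?xml version="1.0"?><rdf:RDF xmlns:rdf='
                     '"http://www.w3.org/1999/02/22-rdf-syntax-ns#">'))  # → True
print(has_rdf_header('<html><body>no RDF here</body></html>'))  # → False
```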

Below are the updated workflow schemas:

 

 

Detection of RDF documents in HTML files

There are probably 1,001 ways to include RDF documents in HTML files.

Right now, I chose to consider only RDF documents linked with the <link> HTML element. That way I know what type of RDF document to expect, and the only thing I have to do is make sure that the RDF document really is what the link says it is.
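A sketch of that auto-discovery step, using Python’s standard-library HTML parser, might look like this. The heuristic (match `<link>` elements with `type="application/rdf+xml"`) is an assumption for illustration; the service’s exact rules are not published here:

```python
from html.parser import HTMLParser

class RDFLinkFinder(HTMLParser):
    """Collect <link> elements that advertise an RDF/XML document."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("type") == "application/rdf+xml" and "href" in a:
            # The title hints at what document type to expect
            # (e.g. "FOAF"); the href still has to be fetched and
            # verified, as explained above.
            self.rdf_links.append((a.get("title", ""), a["href"]))

finder = RDFLinkFinder()
finder.feed('<html><head><link rel="meta" type="application/rdf+xml" '
            'title="FOAF" href="http://example.org/foaf.rdf"></head></html>')
print(finder.rdf_links)  # → [('FOAF', 'http://example.org/foaf.rdf')]
```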

Eventually I may detect and archive RDF documents that are embedded in XHTML files using RDFa, GRDDL, or other techniques. I could also detect RDF documents embedded in HTML files using techniques such as HTML comments, a <script> element with XML comments, and so on. However, I don’t expect to add such features in the near future (the next weeks), considering my lack of time.

 

Next things

Many things are coming up with the evolution of the PingtheSemanticWeb.com pinging service. Many people are interested in interacting with it (giving it some data, or using its data). Tomorrow I will write another blog post to announce some things that happened, or will happen, in the next few weeks in relation to the service.

So far, it has been appreciated by many people.

 

Bugs and glitches

I fixed many bugs reported by users in the last few days. So if you find one, or think that something doesn’t work properly, please contact me by email at [fred ( at ) fgiasson.com] so I can investigate the issue as soon as possible.


Developments with PingtheSemanticWeb.com

PingtheSemanticWeb.com is 3 days old and many people have already started to take a look at it. The purpose of this blog post is to explain what the next steps are, what the open questions are, and what has changed since launch.

    

What changed?

In the last 3 days, I fixed some little bugs, upgraded the auto-discovery feature, and corrected the grammar (thanks to Uldis Bojars).

 

The open question: the export feature

The file format used to export the list of pings is still an open question. Right now, I use a simple XML file (with two elements) to export it.
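As a hypothetical illustration of how a crawler might consume such an export, here is a sketch using Python’s standard XML library. The element names (`pings`, `ping`) and attributes below are my own guesses; the post only says the export is “a simple XML file (with two elements)”:

```python
import xml.etree.ElementTree as ET

# Hypothetical shape of the export file -- the real element names
# and attributes may differ from what is shown here.
export = """<?xml version="1.0"?>
<pings>
  <ping url="http://example.org/foaf.rdf" updated="2006-09-26T12:00:00Z"/>
  <ping url="http://example.org/sioc.rdf" updated="2006-09-26T12:05:00Z"/>
</pings>"""

root = ET.fromstring(export)
urls = [p.get("url") for p in root.findall("ping")]
print(urls)  # → ['http://example.org/foaf.rdf', 'http://example.org/sioc.rdf']
```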

Some people wonder why I don’t use a custom RDF document, a Scutter Vocab document or an RSS document.

Christoph Görn proposed this implementation for the export feature.

Personally, the problem I have with exporting the list of pings in RDF instead of XML is the overhead (in number of characters) it adds, without any significant benefit in return. Read the comments on this blog post to be aware of the current positions on the question.

Please leave your comments about this question on this blog post. Personally, I’ll wait until Tim Finin (of Swoogle) contacts me with the method they wish to use to make Swoogle interact with PingtheSemanticWeb.com. However, any new opinions are welcome.

 

Future developments

 

SIOC detector

Uldis is currently working on a new version of his SIOC detector (a Firefox plug-in). This new version will detect SIOC, DOAP and FOAF files on web pages. If the detector finds an instance of one of these ontologies on a web page, it will instantly ping the PingtheSemanticWeb.com pinging service.

It will be a really great (and easy) way to find new documents. For example, if 100 people install the plug-in in their Firefox browsers, then each time one of them comes across a SIOC, DOAP or FOAF document while surfing the web, the pinging server will be notified.

 

Thanks

I would like to thank Uldis, Alex Passant, Christoph and Harry Chen (did I forget anybody?) for their ideas, work and writing on the project.

 


Ping the Semantic Web.com: a pinging service for the Semantic Web

 

    One of the problems I found with the semantic web is how difficult it can be to find new and fresh data. Recently I was confronted with a problem: how to notify a web service that Talk Digger had new and updated semantic web data ready to be crawled (data using the SIOC and FOAF ontologies, for people familiar with semantic web technologies).

Then I asked myself why nobody, to my knowledge, had developed a sort of weblogs.com or pingerati.net pinging service for semantic web documents. This kind of solution has already proven that it works, considering that weblogs.com archives and exports millions of pings every day.

 

What is PingtheSemanticWeb.com?

PingtheSemanticWeb.com is a web service that archives the locations of recently created or updated FOAF, DOAP and SIOC RDF documents on the Web. When one of these documents is updated, its author can notify the service by pinging it with the URL of the document.
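A ping like that could be sent with a single HTTP request. The sketch below only builds the request URL; the endpoint path and the `url` parameter name are assumptions for illustration, so check the service’s documentation for the actual REST API:

```python
from urllib.parse import urlencode

# Assumed endpoint for this sketch -- not confirmed by the post.
PING_ENDPOINT = "http://pingthesemanticweb.com/rest/"

def ping_url(document_url: str) -> str:
    """Build the (hypothetical) REST URL used to notify the service
    that an RDF document was created or updated."""
    return PING_ENDPOINT + "?" + urlencode({"url": document_url})

print(ping_url("http://example.org/foaf.rdf"))
# → http://pingthesemanticweb.com/rest/?url=http%3A%2F%2Fexample.org%2Ffoaf.rdf
```

Actually sending the ping would then be a plain GET on that URL (for example with `urllib.request.urlopen`).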

PingtheSemanticWeb.com is used by crawlers and other types of software agents to know when and where the latest updated FOAF, DOAP and SIOC documents can be found. They request the list of recently updated documents as a starting location for crawling the semantic web.

More information about the supported ontologies can be found here:

 

Using the Bookmarklet

I strongly suggest that everyone use pingthesemanticweb.com’s Bookmarklet. You only have to install the bookmarklet in your browser, then click on it from any Web page. If a FOAF, SIOC or DOAP document is found, it will be immediately indexed by the pinging service.

It is the easiest way for anyone to help PingtheSemanticWeb.com to find new documents to index.

 

How to install the Bookmarklet

Read the instructions on how to install the Bookmarklet (Browser Button) into your browser.

 

How does it work

You can use the URL of either an HTML or an RDF document when pinging the PingtheSemanticWeb.com web service. If the service finds that the URL points to an HTML document, it checks whether it can find a link to a FOAF, DOAP or SIOC RDF document. If it finds one, it follows the link and checks the RDF document to see whether SIOC, DOAP and/or FOAF elements are defined in it. If they are, the service archives the ping and makes it available to crawlers via the export files. Otherwise it discards the ping.
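The final classification step could be sketched as a simple namespace check. This is a deliberate simplification for illustration (the service’s real detection logic is not published here), but the three namespace URIs are the well-known ones for these ontologies:

```python
from typing import Optional

# Well-known namespace URIs for the three recognised ontologies.
FOAF_NS = "http://xmlns.com/foaf/0.1/"
SIOC_NS = "http://rdfs.org/sioc/ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"

def classify(rdf_document: str) -> Optional[str]:
    """Return which recognised ontology the document uses, if any.

    A simplified stand-in for the check described above: look for one
    of the three namespace URIs in the document text."""
    for name, uri in (("FOAF", FOAF_NS), ("SIOC", SIOC_NS), ("DOAP", DOAP_NS)):
        if uri in rdf_document:
            return name
    return None  # no recognised ontology: the ping would be discarded

print(classify('<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"/>'))  # → FOAF
print(classify('<rdf:RDF/>'))  # → None
```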

 

 

Custom needs, suggestions and bug reports

This service is new, so if you have any suggestions to improve it, if you find any bugs while pinging URLs or importing ping lists, or if you have any custom needs for your semantic web crawler or software agents, please contact me by email at [fred ( at ) fgiasson.com], and I’ll help you out as quickly as possible.


Supervised Search Indexing with Yahoo! Search Builder

Yahoo! Search Builder: the idea is great: the power of Yahoo!’s search engine and its colossal database, with all the advantages (no spam) of supervised indexing. Niche networks (groups of people) will probably use this new service to build search engines for their niche domains, meticulously adding new crawlable sources over time. That way, no spam websites will be indexed, the results will be much more accurate and useful, and users will spend less time searching.

Other search engines [Rollyo and Eurekster] already do this. The main difference is that they developed “social” features around the search results and Yahoo! didn’t. Some people think that is sad, but personally I think Yahoo! just doesn’t care. Social features are cool, but only for some purposes, not for everything. For me, the big difference is Yahoo!’s database compared to Rollyo’s and Eurekster’s.
