By: Fred

Fred — Sun, 04 Feb 2007 17:20:03 +0000

Hi Marc,

Yeah, I read the story. As I said in my last blog post, I will add a repository of available RDF data dumps to PTSW, hoping it will prevent such situations in the future. However, you did the only thing there was to do: banning the IP from crawling geonames. Unfortunately, this is the only thing that will really work (semweb or not ;) ).

Take care,

Fred

By: marc

marc — Sat, 03 Feb 2007 19:21:03 +0000

Frédérick,

For huge datasets, a dump is not only preferable from a crawler’s point of view, it also eases the strain on the data provider’s resources. Fetching a database with millions of documents row by row forces the provider to spend a lot of resources creating and delivering each document. A semantic web crawler may thus have the effect of a denial-of-service attack. More about a recent episode of a semantic web crawler DDOS on my blog:

http://geonames.wordpress.com/2007/02/03/friendly-fire-semantic-web-crawler-ddos/

Marc

Comments on: RDF dump vs. dereferencable URIs
