Freshmeat.net now available in DOAP: 43 000 new DOAP projects

Three weeks ago, Rob Cakebread contacted me about some possible issues with the PingtheSemanticWeb.com web service. Rob is involved in the development of Gentoo Linux, and he wanted to come up with a method to let their development teams know when new packages are released for certain projects. Since Freshmeat.net’s RSS feeds are limited in length and do not support DOAP, he started to think about a way to convert Freshmeat.net’s project data into RDF using the DOAP ontology. That way, he could easily create services to track the release of new packages and thereby increase the effectiveness of their development teams.

Rob wrote:

I came across ptsw.com when trying to determine how many package indexes are using DOAP right now (so far I’ve only found Python Package Index and O’Reilly’s Code Zoo).

I’m a developer for Gentoo Linux. A website I created (Meatoo) checks Freshmeat twice a day and finds packages that have new releases available. We have a database of all our maintainers grouped into ‘herds’, a Python herd, desktop herd, PHP herd etc. The developers in the herds can query my website by a command-line client that uses XML-RPC, or subscribe to RSS feeds by herd or package name, or read the website itself and see which packages have new releases.

DOAP fits into this because I was thinking about creating DOAP records for each release from each package index and making this available so people can write tools to find out information about software packages easily.

That’s how we got in contact. Rob had a practical problem, tried to find a way to solve it, and wanted to help other people solve it too; and that’s how he found PingtheSemanticWeb.com and other Semantic Web related projects and communities (such as the Linking Open Data project).

Freshmeat.net in DOAP

Then, a couple of days ago, Rob contacted me again to let me know that the descriptions of Freshmeat.net’s 43 000 projects are now available in RDF.

He created a prototype service that converts the data he has aggregated from Freshmeat.net into DOAP. His project is called DOAPSpace. The idea is to make the Freshmeat.net projects available in DOAP, then to ping PingtheSemanticWeb.com to ultimately make them available on DOAPStore.org, which is fed by PTSW.
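
To give a rough idea of what such a conversion produces, here is a minimal sketch that builds a DOAP record as RDF/XML using only the Python standard library. The project name, homepage, and description below are illustrative values, not Rob's actual output, and the helper function is hypothetical:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"

def make_doap_record(name, homepage, description):
    """Build a minimal DOAP description of one project as RDF/XML."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("doap", DOAP)
    root = ET.Element(f"{{{RDF}}}RDF")
    project = ET.SubElement(root, f"{{{DOAP}}}Project")
    ET.SubElement(project, f"{{{DOAP}}}name").text = name
    # homepage is a resource reference, not a literal
    ET.SubElement(project, f"{{{DOAP}}}homepage",
                  {f"{{{RDF}}}resource": homepage})
    ET.SubElement(project, f"{{{DOAP}}}shortdesc").text = description
    return ET.tostring(root, encoding="unicode")

record = make_doap_record("amanda", "http://www.amanda.org/",
                          "Network backup and archiving software")
print(record)
```

A real converter would of course map many more Freshmeat fields (releases, maintainers, download pages) onto the corresponding DOAP properties.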

People can get the DOAP description of each project by going to:

http://doapspace.gentooexperimental.org/doap/<project_name>

Here are some example URIs:

http://doapspace.gentooexperimental.org/doap/0verkill
http://doapspace.gentooexperimental.org/doap/amanda
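
Given that URI pattern, a client that wants to consume these records could be sketched like this (the `doap_url` and `fetch_doap` helpers are my own illustration, and the service itself may no longer be online, so the fetch is not exercised here):

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE = "http://doapspace.gentooexperimental.org/doap/"

def doap_url(project_name):
    """Build the DOAP URI for a given Freshmeat project name."""
    return BASE + quote(project_name)

def fetch_doap(project_name):
    """Fetch the DOAP description of a project (network sketch only)."""
    req = Request(doap_url(project_name),
                  headers={"Accept": "application/rdf+xml"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")

print(doap_url("0verkill"))
# http://doapspace.gentooexperimental.org/doap/0verkill
```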

RDF dump

I was really pleased to see how Rob managed to generate that data. So I asked him whether an RDF dump of that precious data would eventually be available for download. That is exactly what he is doing at the moment, and as soon as he sends me the dump, I will make it available via PingtheSemanticWeb.com. Then it will be ready to be integrated into the Linking Open Data project.

Content Negotiation

At the same time, Rob added a new feature to his service: a user only has to append the “?zg=1” parameter to the URL to get redirected to the Zitgist Browser. It was really nice of him to think of that; I really appreciated it.

However, I showed him how he could use content negotiation to do that and to make his service compatible with other tools, such as other RDF browsers. So I pointed him to the How to Publish Linked Data on the Web draft document so that he can get a better understanding of the content-negotiation process.
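
The idea behind content negotiation here is simple: instead of a special-purpose redirect parameter, the server inspects the HTTP `Accept` header and serves RDF to RDF-aware clients and HTML to browsers. A minimal server-side decision function might look like this (a sketch of the general mechanism, not Rob's actual code; the `?zg=1` switch from the post is kept for backward compatibility):

```python
def choose_representation(accept_header, query_params):
    """Pick a representation for a project resource.

    Returns 'zitgist' to redirect to the Zitgist Browser,
    'rdf' to serve RDF/XML, or 'html' for a plain web page.
    """
    # The explicit ?zg=1 switch described above takes priority.
    if query_params.get("zg") == "1":
        return "zitgist"
    # Content negotiation: honour an RDF-capable client's Accept header.
    if "application/rdf+xml" in accept_header:
        return "rdf"
    return "html"

print(choose_representation("application/rdf+xml", {}))              # rdf
print(choose_representation("text/html", {"zg": "1"}))               # zitgist
print(choose_representation("text/html,application/xhtml+xml", {}))  # html
```

A production implementation would parse the full `Accept` header with its quality values rather than doing a substring check, but this is the shape of the decision.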

Linking Open Data Community and Early Developers

Rob is certainly an early adopter of the Semantic Web. He is a developer who wants to solve problems with methods and technologies. He had the intuition that the DOAP ontology and other Semantic Web principles and technologies could help him solve one of his problems. This intuition led him to discover what the Semantic Web community could do to help him.

It’s this kind of user we have to take care of, and whom we have to help release their projects. It’s people like Rob who will make the Semantic Web a reality. Without such early adopters from outside the Semantic Web community, the Semantic Web is probably doomed. We are there now: ready to help developers integrate Semantic Web technologies into their projects, to publish their data as RDF, and to link it with other data sources. That is the goal of communities like the Linking Open Data community, and it’s what we are setting out to do.

4 Responses to “Freshmeat.net now available in DOAP: 43 000 new DOAP projects”


  1. Tim Berners-Lee Aug 5th, 2007 at 2:03 pm

    Nice dataset … at the moment, though, the RDF files above http://doapspace.gentooexperimental.org/doap/0verkill return content-type text/html, it seems, so I can’t browse them with the Tabulator.

    Tim

  2. Fred Aug 5th, 2007 at 3:37 pm

    Hi Tim,

    Yeah, it’s a nice one, thanks to Rob for it. However, as I said in this post, the next step for Rob is to fix the content-negotiation part of his prototype. I already explained some things to him, and he told me that he would fix that as soon as he finishes the dump.

    So, I think that this should be fixed sometime this week if Rob has some time.

    Take care,

    Fred

  3. Leo Breebaart Aug 7th, 2007 at 11:13 am

    Perhaps a naive question: one of my Freshmeat-listed projects already had a manually created DOAP-record (accessible from its home page). So now there are two DOAP records for that project floating around the semantic web, only one of which I would consider to be authoritative. How can people (or software) browsing the semantic web (a) discover that the Doapspace record is not the only one out there, and (b) decide which one to trust in case of conflicting information? Or is it simply too early to be asking these kinds of provenance/synchronisation questions?

  4. Fred Aug 8th, 2007 at 8:30 am

    Hi Leo,

    Well, the idea is the following: something or someone (a system or you) has to make the link between the two. In fact, an owl:sameAs relation has to be defined to specify that the two resources are the same.

    Anyway, I agree that multiple resources describing the same thing is an issue at the moment. In fact, it’s not necessarily an issue, since they never really have the same descriptions; it just tells me that the two entities have been described by two different persons (so with different perceptions of this Thing).

    But that is where we are now: linking (inter-linking) RDF data (and this interlinking also applies to finding and linking identical entities).

    take care,

    Fred
