Ping the Semantic Web version 3: a brand new system!

Pinging and receiving lists of newly created and updated RDF resources has never been easier! I am pleased to announce the release of the latest version of Ping the Semantic Web.

In this brand new system you have access to:

  1. Validated RDF resources
  2. Simplified pings list export system
  3. Faster pinging infrastructure
  4. Brand new user interface
  5. New statistics

1. Validated RDF resources

In version 2.0, PTSW performed only a pseudo-validation of RDF files. In version 3.0, it fully validates RDF documents. This means that all the pings the service exports are valid RDF documents.

This is a major upgrade to the system: all agents requesting pings from PTSW now know that each of them is a valid RDF document. They will save time and bandwidth since they won’t try to process bad RDF documents.

2. Simplified pings list export system

All ping consumers now need to be registered with the PTSW web service. This simple registration makes it much easier to consume pings coming from PTSW. Here are the steps to follow to get pings from the server:

  1. The user has to register an account on pingthesemanticweb.com
  2. He has to register the IP address of the server that will download the XML file listing all the latest pings received by the system
  3. Additionally, he has to set up his ping retrieval preferences in the user account section
  4. The registered web server has to request the XML file at: http://pingthesemanticweb.com/export/

What has improved is the way applications get pings. Now a web server only has to request the XML file, and PTSW takes care of creating the XML file according to the user’s preferences.

Finally, PTSW records the time of the user’s latest request. The next time the user’s server requests this document, it will receive an XML file with all the pings received by PTSW since its last request.

This is a major improvement: if the user’s web server is down for two days for some reason, it won’t lose any pings, since PTSW will send it all the pings received by the service during those two days.
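As a rough illustration of the consumer side, the sketch below requests the export file and lists the pinged documents. The export URL is the one given in the steps above; the XML layout (one `rdfdocument` element with a `url` attribute per ping) and the function names are assumptions made for this example and should be checked against the file the service actually returns.

```python
import urllib.request
import xml.etree.ElementTree as ET

EXPORT_URL = "http://pingthesemanticweb.com/export/"  # from the steps above


def parse_pings(xml_data):
    """Return the document URLs listed in an export file.

    Assumption: each ping is an <rdfdocument url="..."/> element.
    Accepts either str or bytes.
    """
    root = ET.fromstring(xml_data)
    return [el.get("url") for el in root.iter("rdfdocument") if el.get("url")]


def fetch_pings(url=EXPORT_URL, timeout=30):
    """Download the export file (from the registered IP) and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_pings(response.read())


# Offline example using a made-up document in the assumed layout.
SAMPLE = """<pings>
  <rdfdocument url="http://example.org/foaf.rdf"/>
  <rdfdocument url="http://example.org/doap.rdf"/>
</pings>"""

sample_urls = parse_pings(SAMPLE)
print(sample_urls)
```

Because PTSW keeps track of the time of the last request, a consumer written this way does not need to store any state of its own: calling `fetch_pings()` on a schedule should be enough to pick up where it left off.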

Note: all current Ping the Semantic Web ping consumers have to create an account and change their application accordingly.

3. Faster pinging infrastructure

The web service is now hosted on a much bigger server. We also switched from MySQL to Virtuoso. These changes result in a more powerful service that I estimate can handle up to 5 million pings per day (in the best case, with fast remote web servers delivering the RDF content). In any case, it is probably enough for the next year’s expansion.

4. Brand new user interface

We also spent some time refreshing the user interface of the web service. This new interface will help us easily integrate new features and sections into the service’s web site while keeping it appealing to users.

5. New statistics

New statistics on the state of the service are now available.

  1. All stats about Namespaces. This is the list of namespaces used to describe entities in RDF. For example, if an RDF document has an entity typed as a sioc:Post, then the SIOC namespace will be added and its stat counter will be incremented by one. There are currently 347 used namespaces known by PTSW.
  2. All stats about Types. This is the number of typed entities defined in each RDF document known by PTSW. For example, if an RDF document has four foaf:Person defined, then four will be added to the counter. If the same entity (URI) is defined in two different RDF documents, the type of the entity will be counted twice. So take these numbers as a good approximation, but not as an absolute truth. There are currently 2773 types known by PTSW.
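To make the counting rules concrete, here is a small sketch with made-up data. It is not the code PTSW runs: the document contents are invented, types are written as prefix:Name for brevity, and counting a namespace once per document that uses it is only my reading of the rule above.

```python
from collections import Counter

# Each document maps to its (entity URI, type) pairs. All data is made up.
documents = {
    "http://example.org/a.rdf": [
        ("http://example.org/people#alice", "foaf:Person"),
        ("http://example.org/people#bob", "foaf:Person"),
        ("http://example.org/posts/1", "sioc:Post"),
    ],
    "http://example.org/b.rdf": [
        ("http://example.org/people#alice", "foaf:Person"),
    ],
}


def count_stats(docs):
    """Return (namespace counter, type counter) for a set of documents."""
    namespaces = Counter()
    types = Counter()
    for typed_entities in docs.values():
        # Assumption: a namespace is counted once per document using it.
        namespaces.update({t.split(":", 1)[0] for _, t in typed_entities})
        # Every typed entity counts, even if its URI appears in another document.
        types.update(t for _, t in typed_entities)
    return namespaces, types


namespace_counts, type_counts = count_stats(documents)
print(namespace_counts["foaf"], namespace_counts["sioc"])
print(type_counts["foaf:Person"], type_counts["sioc:Post"])
```

Here alice is described in both documents, so foaf:Person ends up at three rather than two, which is exactly why the numbers should be read as an approximation.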

Some people will notice that the current numbers in the sidebar are completely different from the ones that were on the old website of the service. They are right, and here is the reason: I pruned the geonames.org and talkdigger.com pings from PTSW.

In fact, when I started the web service, I added these two RDF data dumps to PTSW. At that time, initiatives such as the Linking Open Data Community didn’t exist and people didn’t know how to export their RDF data dumps, so I chose to include them in the PTSW system. Since then, methods have improved and things have changed. RDF data dumps are now available directly from these web sites, data dump repositories exist, and people don’t use PTSW for that purpose. Instead, they use the PTSW export feature to sync their services, not to get complete datasets. That said, I pruned all these 7 000 000 documents from the system, leaving about 845 000 “wild” RDF documents.

It is the inclusion of these complete data sources that inflated the old stats compared to today’s.

Conclusion

When I created Ping the Semantic Web more than one year ago, I hoped developers would use the service to easily find RDF data without crawling the entire Web. I hoped that this web service would be a vector of semantic web application development. I think that it succeeded in some ways when I think about services such as SIndice and DOAPStore that emerged from the PTSW initiative.

This new version of Ping the Semantic Web tries to go further in that direction: making things even simpler for RDF data consumers and giving them a more powerful RDF discovery service.

Note: make sure to refresh the DNS cache of your desktops and servers so that you see the new, and not the old, PTSW web site.

Describing Documents, Articles, Series, Volumes and Conferences using the Bibliographic Ontology

The Bibliographic Ontology lets you describe all these things, and much more, in RDF. In the last months the community developing BIBO has been quite productive. Many questions have been asked, many have been answered, and things are slowly taking shape.

That is why I started to create some more examples using the ontology, trying to see how people will use it. I wanted to see if I could easily describe two articles I wrote in the past few years: (1) an accepted article in a proceeding and (2) a rejected article submitted to a conference. I was wondering if the current state of the ontology could cope with some weird cases. As you will notice below, it nicely handled the weird cases that I encountered while describing these articles.

First example: Describing a Series, with volumes and articles

I wanted to describe an article I wrote with Uldis Bojars, Alexandre Passant and John Breslin. This article is part of a proceeding that is published in a series, as a volume (248). The series has an ISSN; however, it is only published online (no paper version is available).

Here is how BIBO describes such a case:

A Complex series + proceeding + article use case in RDF/XML

The series is a bibo:Series. This series has a title, a short title and an ISSN. It is also related to its publisher and has a status (published). Finally, this series is related to its volume and to a web document (a web page) that is a manifestation of the series.

This is something to keep in mind for the remainder of this blog post: in BIBO, a web page is a document, like any other document. The only difference between a paper book and a web page is their identifier (locator): a published paper book will have an ISBN, and a web page will have a URL. That said, we can easily relate different formats of a document using dcterms:relation. That way, we make explicit the relation between two documents, even if their only difference is their format (printed on paper, HTML, PDF, etc.).

Next, I described the proceeding that has been published. It is a bibo:Proceeding that has several properties, notably a bibo:volume property that describes its position in the series. Finally, the editors of the proceeding are described and related to the proceeding they edited via a bibo:Contribution.

Contributions are at the core of the ontology; they are defined as:

“The contribution a person, group or organization makes to the creation or realization of a work.”

So, an editor and an author are contributors to the creation or realization of a work (a document).

Finally I described the article that is a bibo:Article. I described its properties, its authors, and the relation between the authors and the article. I also described its status: it has been peer-reviewed and has been published.

The links between the series, the proceeding and the article are made by re-using the properties dcterms:hasPart and dcterms:isPartOf.
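The containment structure can be sketched with plain triples. This is only an illustration, not the RDF/XML file linked above: the example.org identifiers are hypothetical, and only the bibo: and dcterms: terms named in this post are used.

```python
# Hypothetical identifiers; only the bibo:/dcterms: terms come from the post.
SERIES = "http://example.org/series"
PROCEEDING = "http://example.org/series/vol-248"
ARTICLE = "http://example.org/series/vol-248/article-1"

triples = [
    (SERIES, "rdf:type", "bibo:Series"),
    (SERIES, "dcterms:hasPart", PROCEEDING),
    (PROCEEDING, "rdf:type", "bibo:Proceeding"),
    (PROCEEDING, "bibo:volume", "248"),
    (PROCEEDING, "dcterms:isPartOf", SERIES),
    (PROCEEDING, "dcterms:hasPart", ARTICLE),
    (ARTICLE, "rdf:type", "bibo:Article"),
    (ARTICLE, "dcterms:isPartOf", PROCEEDING),
]


def containers(resource, graph):
    """Follow dcterms:isPartOf upward and return the chain of containers."""
    chain = []
    current = resource
    while True:
        parents = [o for s, p, o in graph
                   if s == current and p == "dcterms:isPartOf"]
        # Stop at the top of the hierarchy, or if the data loops back.
        if not parents or parents[0] in chain:
            return chain
        current = parents[0]
        chain.append(current)


article_containers = containers(ARTICLE, triples)
print(article_containers)
```

Stating both dcterms:hasPart and dcterms:isPartOf lets a consumer walk the hierarchy in either direction, from the series down to the article or from the article up to the series.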

Second example: a rejected article submitted to a conference

For this second example, I wanted to describe an article I wrote a couple of years ago, which I submitted to a conference and which was rejected. So, I had to describe the article, the conference, and the fact that it was rejected after peer review.

Here is how BIBO describes this use case:

Rejected article submitted to a conference in RDF/XML

This is basically the same thing as above: describing a document with its authors.

However, in this case, I had to describe a conference. The Bibliographic Ontology uses The Event Ontology to describe such things. The conference event is described using the event:Event class, along with event:agent, which relates the event to the organization that created it, and event:place, which locates the event in the world.

However, the description of conference events will change in the next few weeks, since Yves Raimond and I will create an extension module for this ontology to specifically describe conference events (so we will talk about event:Conference, event:organizer, event:sponsors, etc.).

Finally, I had something to say about this article I wrote. To say it, I created another type of document, a bibo:Note, to annotate this document with some comments. A bibo:Note is a document of its own, like a bibo:Article. I relate the two documents (the bibo:Note and the bibo:Article) using the bibo:annotates property. That way, I describe the fact that one document is an annotation of another document.
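The same plain-triple style can illustrate the annotation and event descriptions. Again, this is a sketch under assumptions: all example.org identifiers are hypothetical, and how the article is linked to the conference is deliberately left out because it is not spelled out in this post.

```python
# Hypothetical identifiers; the bibo:/event: terms are the ones named above.
ARTICLE = "http://example.org/articles/rejected-paper"
NOTE = "http://example.org/notes/1"
CONFERENCE = "http://example.org/events/some-conference"

triples = [
    (ARTICLE, "rdf:type", "bibo:Article"),
    (NOTE, "rdf:type", "bibo:Note"),
    (NOTE, "bibo:annotates", ARTICLE),
    (CONFERENCE, "rdf:type", "event:Event"),
    (CONFERENCE, "event:agent", "http://example.org/orgs/organizer"),
    (CONFERENCE, "event:place", "http://example.org/places/venue"),
]


def notes_about(document, graph):
    """Return every resource that annotates the given document."""
    return [s for s, p, o in graph
            if p == "bibo:annotates" and o == document]


article_notes = notes_about(ARTICLE, triples)
print(article_notes)
```

Because the note is a document in its own right, it can carry its own authors and dates, and several notes can annotate the same article.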

Conclusion

These two examples show how The Bibliographic Ontology can be used to describe some complex bibliographic use cases. It is just a start, and many questions have yet to be answered by the bibliographic ontology. However, many things are moving forward, and if this demonstration interested you, I can only suggest that you join the community supporting BIBO’s development and help it evolve.

The Music Ontology revision 1.12

The Music Ontology is much easier to read with the new documentation and the normalized terms. In fact, Yves Raimond worked hard on this new release with some help from Chris, me and other people on the mailing list.

The list of major changes is available on Yves’s blog post about the release. Also, the complete list of changes is available in the change log.

Some things related to this new revision have yet to be finished. We have to update the examples on the wiki, and I have to modify the Musicbrainz RDF view so that the generated RDF documents reflect these changes.

Finally, this new revision is a major upgrade in terms of the user-friendliness of the ontology. Terms, descriptions and documentation of the ontology should be much clearer now.

Freshmeat.net now available in DOAP: 43 000 new DOAP projects

Three weeks ago, Rob Cakebread contacted me about some possible issues with the PingtheSemanticWeb.com web service. Rob is involved with the development of Gentoo Linux and he wanted to come up with a method to let their development teams know when new packages are released for some projects. Since the RSS feeds of Freshmeat.net are limited in length and do not support DOAP, he started to think about a way to convert Freshmeat.net’s project data into RDF using the DOAP ontology. That way, he could easily create services to track the release of new packages and thereby increase the effectiveness of their development teams.

Rob wrote:

I came across ptsw.com when trying to determine how many package indexes are using DOAP right now (so far I’ve only found Python Package Index and O’Reilly’s Code Zoo).

I’m a developer for Gentoo Linux. A website I created (Meatoo) checks Freshmeat twice a day and finds packages that have new releases available. We have a database of all our maintainers grouped into ‘herds’, a Python herd, desktop herd, PHP herd etc. The developers in the herds can query my website by a command-line client that uses XML-RPC, or subscribe to RSS feeds by herd or package name, or read the website itself and see which packages have new releases.

DOAP fits into this because I was thinking about creating DOAP records for each release from each package index and making this available so people can write tools to find out information about software packages easily.

That is how we got in contact. Rob had a practical problem, so he tried to find a way to solve it and to help other people solve it too; and that is how he found PingtheSemanticWeb.com and other semantic web related projects and communities (such as the Linking Open Data community).

Freshmeat.net in DOAP

Then a couple of days ago, Rob contacted me again to let me know that the descriptions of Freshmeat.net’s 43 000 projects are now available in RDF.

He created a prototype service that converts the data he has aggregated from Freshmeat.net into DOAP. His project is called DOAPSpace. The idea is to make the Freshmeat.net projects available in DOAP, then to ping PingtheSemanticWeb.com, and ultimately to make them available on Doapstore.org, which is fed by PTSW.

People can get the DOAP description of each project by going to:

http://doapspace.gentooexperimental.org/doap/<project_name>

Here are some examples of URIs:

http://doapspace.gentooexperimental.org/doap/0verkill
http://doapspace.gentooexperimental.org/doap/amanda
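For a script that needs these descriptions, the URL pattern above can be wrapped in a small helper. This is a minimal sketch: the base URL is the one given above, while the function names and the choice to percent-encode the project name are mine.

```python
import urllib.request
from urllib.parse import quote

DOAP_BASE = "http://doapspace.gentooexperimental.org/doap/"  # pattern above


def doap_url(project_name):
    """Build the DOAP URL for a Freshmeat.net project name."""
    # Percent-encode the name so unusual characters stay inside the path.
    return DOAP_BASE + quote(project_name, safe="")


def fetch_doap(project_name, timeout=30):
    """Download the raw DOAP document for a project (needs network access)."""
    with urllib.request.urlopen(doap_url(project_name), timeout=timeout) as resp:
        return resp.read()


example_urls = [doap_url(name) for name in ("0verkill", "amanda")]
print(example_urls)
```

`fetch_doap` returns the bytes as served; parsing them is left to whichever RDF library the consumer prefers.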

RDF dump

I was really pleased to see how Rob managed to generate that data. I then asked him if an RDF dump of that precious data would eventually be available for download. That is exactly what he is working on at the moment, and as soon as he sends me the dump, I will make it available via PingtheSemanticWeb.com. Then it will be ready to be integrated into the Linking Open Data project.

Content Negotiation

At the same time, Rob added a new feature to his service: a user only has to append the “?zg=1” parameter to the URL to get redirected to the Zitgist Browser. It was really nice of him to think about that; I really appreciated it.

However, I showed him how he could use content negotiation to do that and to make his service compatible with other tools, such as other RDF browsers. I pointed him to the How to Publish Linked Data on the Web draft document so that he can get a better understanding of the content negotiation process.
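To give an idea of what content negotiation involves on the server side, here is a simplified sketch that picks a representation from the HTTP Accept header. It is not Rob’s implementation: the function names are hypothetical, only two media types are offered, and wildcard handling is reduced to “fall back to the default”.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # the quality value defaults to 1 when omitted
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_representation(accept_header, default="text/html"):
    """Pick RDF/XML or HTML for a request (simplified negotiation)."""
    offered = ("application/rdf+xml", "text/html")
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in offered:
            return media_type
        if media_type == "*/*":
            return default
    return default


print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

With this approach an RDF browser asking for application/rdf+xml and a web browser asking for text/html can use the same project URI, with no special query parameter needed.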

Linking Open Data Community and Early Developers

Rob is certainly an early adopter of the Semantic Web. He is a developer who wants to solve problems with methods and technologies. He had the intuition that the DOAP ontology and other semantic web principles and technologies could help him solve one of his problems. This intuition led him to discover what the semantic web community could do to help him.

It’s the kind of user we have to take care of, and that we have to help release their projects. It’s people like Rob who will make the Semantic Web a reality. Without such early adopters from outside the Semantic Web community, the semantic web is probably doomed. We are there now, ready to help developers integrate semantic web technologies into their projects, generate their data in RDF and link it with other data sources. That is the goal of communities like the Linking Open Data Community, and it’s what we are about to do.

News at Zitgist: the Browser, PTSW, the Bibliographic Ontology and the Query Service

Things did not stop at Zitgist just because we had some issues with the Zitgist Browser’s server. In fact, many projects evolved at the same time, and I outline some of these developments below.

New version of the Zitgist Browser

A new version of the browser is already on the way. In fact, the pre-release version of the browser was a use case, a prototype. Now that we know that it works and that we have faced most of the issues that have to be taken into account to develop such a service, we hired Christopher Stewart to work on the next version of the browser. He is already well into the problem, so you can expect a release of this new version sooner than you might think. At first, there won’t be many modifications at the user interface level; however, many things will be introduced in this new version that will help us push the service to another level in the future.

New version of Ping the Semantic Web

Version 3.0 of the PingtheSemanticWeb.com web service should be put online next week. It will be a totally new version of the service. It won’t use MySQL anymore; Virtuoso has replaced it. The service will now fully validate RDF files before including them in the index. More stats will be available too. It is much faster (as long as remote servers are fast too) and I estimate that this single server could handle between 5 and 10 million pings per day (enough for the next year’s expansion). With this, the service moves to another level and is ready for more serious traffic. After its release, a daily dump of all links will be produced as well.

The first draft of the Bibliographic Ontology

The Bibliographic Ontology Specification Group is on fire. We now have 55 members and generated 264 posts in July alone. Many things are going on there and the ontology is well underway. We expect to release a first draft of the ontology sometime in August. If you are interested in bibliographic things, I think it’s a good place to be.

The Zitgist Semantic Web Query Service

Finally, Zitgist’s Semantic Web Query Service should be available to subscribed alpha users sometime in September. You can register to get your account here. Also, take a look at what I wrote about this search module (many things have evolved since, but it’s a good introduction to the service).

Conclusion

So, many things are going on at Zitgist and many exciting things should happen this autumn, so stay tuned!