A New Home for UMBEL Web Services

Eight months ago we announced the dissolution of Zitgist LLC. That event led to the creation of a sandbox server to keep all of the company's online assets alive. Since this sandbox server was not owned by Structured Dynamics, it was becoming hard for us to update UMBEL and its online services. That is why we took the time to move the services onto our new servers.

A New Home

Structured Dynamics LLC now hosts a new version of the UMBEL Web services. From the main menu at the SD Web site you can access these services under the “umbel ws” menu option (you can also bookmark the Web services site at umbel.structureddynamics.com or ws.umbel.org).

Moving UMBEL’s Web services to this new home will make future upgrades of UMBEL easier, and it will make maintaining the Web services endpoints easier as well. With this move, I am pleased to announce the release of five initial Web services and one visualization tool:

Lookup Web Services:

Inference Engine Web Services:

SPARQL endpoint Web Service:

Visual Tool:

Note that the visual tool uses Moritz Stefaner’s Relation Browser.


Ping the Semantic Web

Additionally, the Ping the Semantic Web RDF pinging service is now the property of OpenLink Software Inc. OpenLink is now hosting, maintaining and developing the service.

Ping the Semantic Web version 3: a brand new system!

Pinging and receiving lists of newly created and updated RDF resources has never been easier! I am pleased to announce the release of the latest version of Ping the Semantic Web.

This brand new system gives you access to:

  1. Validated RDF resources
  2. Simplified pings list export system
  3. Faster pinging infrastructure
  4. Brand new user interface
  5. New statistics

1. Validated RDF resources

In version 2.0, PTSW did only a pseudo-validation of RDF files. In version 3.0, it fully validates RDF documents. This means that every ping the service exports points to a valid RDF document.

This is a major upgrade to the system: agents requesting pings from PTSW now know that each one points to a valid RDF document, so they save time and bandwidth by not trying to process bad RDF documents.
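
To make the idea concrete, here is a minimal sketch of this kind of full validation in Python with the rdflib library. The helper name and sample URL are mine, and PTSW itself runs on Virtuoso, so treat this only as an illustration of the principle:

    import rdflib

    def is_valid_rdf(url):
        """Accept a ping only if the document at `url` parses as RDF/XML."""
        graph = rdflib.Graph()
        try:
            # A full parse, not a pseudo-validation: any syntax error rejects the ping
            graph.parse(url, format="xml")
            return True
        except Exception:
            return False

    if is_valid_rdf("http://example.org/foaf.rdf"):
        print("ping accepted")
    else:
        print("ping rejected")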

2. Simplified pings list export system

All ping consumers now need to be registered with the PTSW web service. This simple registration greatly simplifies consuming pings from PTSW. Here are the steps to follow to get pings from the server:

  1. The user has to register an account on pingthesemanticweb.com.
  2. He has to register the IP address of the server that will download the XML file listing the latest pings received by the system.
  3. Additionally, he has to set up his ping retrieval preferences in the user account section.
  4. The registered web server has to request the XML file at: http://pingthesemanticweb.com/export/

What has improved is the way applications get pings. Now a web server only has to request the XML file, and PTSW takes care of creating it according to the user’s preferences.

Finally, PTSW archives the time of the user’s latest request. The next time the user’s server requests this document, it will receive the XML file with all the pings received by PTSW since that last request.

This is a major improvement: if the user’s web server goes down for two days, for whatever reason, it won’t lose any pings, since PTSW will send it all the pings received by the service during those two days.
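
For a consumer, the whole retrieval loop boils down to one HTTP request. Here is a minimal Python sketch, assuming the export file is plain XML with one element per pinged URL; the element name below is an assumption, and the exact schema is documented on pingthesemanticweb.com:

    import urllib.request
    import xml.etree.ElementTree as ET

    EXPORT_URL = "http://pingthesemanticweb.com/export/"

    # The requesting server's IP must already be registered in the PTSW
    # user account; PTSW remembers when we last asked and returns only
    # the pings received since that request.
    with urllib.request.urlopen(EXPORT_URL) as response:
        tree = ET.parse(response)

    for ping in tree.iter("ping"):  # assumed element name
        print(ping.text)            # URL of a validated RDF document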

Note: all current Ping the Semantic Web ping consumers have to create an account and change their applications accordingly.

3. Faster pinging infrastructure

The web service is now hosted on a much bigger server. We also switched from MySQL to Virtuoso. These changes result in a more powerful service that I estimate can handle up to 5 million pings per day (in the best case, with fast remote web servers delivering the RDF content). In any case, that is probably enough for the next year’s expansion.

4. Brand new user interface

We also spent some time refreshing the user interface of the web service. This new interface will help us easily integrate new features and sections into the service’s web site while keeping it appealing to users.

5. New statistics

New statistics on the state of the service are now available.

  1. All stats about namespaces. This is the list of namespaces used to describe entities in RDF. For example, if an RDF document has an entity typed as sioc:Post, then the SIOC namespace will be added and its counter incremented by one. There are currently 347 namespaces known to PTSW.
  2. All stats about types. This is the number of typed entities defined in each RDF document known to PTSW. For example, if an RDF document defines four foaf:Person entities, then four will be added to the counter. If the same entity (URI) is defined in two different RDF documents, its type will be counted twice. So take these numbers as a good approximation, not as absolute truth. There are currently 2773 types known to PTSW. (See the sketch below for how these counters work.)
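
Here is a rough sketch, in Python with rdflib, of how I understand these two counters to work; the namespace extraction rule is a simplification of mine:

    from collections import Counter
    import rdflib
    from rdflib.namespace import RDF

    namespace_counts = Counter()
    type_counts = Counter()

    def tally(doc_url):
        """Count typed entities, and their namespaces, in one RDF document."""
        graph = rdflib.Graph()
        graph.parse(doc_url, format="xml")
        for _, _, rdf_type in graph.triples((None, RDF.type, None)):
            uri = str(rdf_type)
            type_counts[uri] += 1
            # Simplification: the namespace is the type URI up to its last '#' or '/'
            cut = max(uri.rfind("#"), uri.rfind("/")) + 1
            namespace_counts[uri[:cut]] += 1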

Some people will notice that the current numbers in the sidebar are completely different from the ones on the old web site of the service. They are right, and here is the reason: I pruned the geonames.org and talkdigger.com pings from PTSW.

In fact, when I started the web service, I added these two RDF data dumps to PTSW. At that time, initiatives such as the Linking Open Data community didn’t exist and people didn’t know how to export their RDF data dumps, so I chose to include them in the PTSW system. Since then, methods have improved and things have changed. RDF data dumps are now available directly from these web sites, data dump repositories exist, and people no longer use PTSW for that purpose. In fact, they use the PTSW export feature to sync their services, not to get complete datasets. That said, I pruned all 7 000 000 of these documents from the system, leaving about 845 000 “wild” RDF documents.

It was the inclusion of these complete data sources that inflated the old stats compared to today’s.

Conclusion

When I created Ping the Semantic Web more than a year ago, I hoped developers would use the service to easily find RDF data without crawling the entire Web. I hoped this web service would be a vector of semantic web application development. I think it has succeeded in some ways, when I think about services such as Sindice and DOAPStore that emerged from the PTSW initiative.

This new version of Ping the Semantic Web tries to go further in that direction: making things even simpler for RDF data consumers and giving them a more powerful RDF discovery service.

Note: make sure to refresh the DNS cache of your desktops and servers so that you see the new, and not the old, PTSW web site.

Freshmeat.net now available in DOAP: 43 000 new DOAP projects

Three weeks ago, Rob Cakebread contacted me about some possible issues with the PingtheSemanticWeb.com web service. Rob is involved with the development of Gentoo Linux, and he wanted to come up with a method to let their development teams know when new packages are released for certain projects. Since the RSS feeds of Freshmeat.net are limited in length and do not support DOAP, he started to think about a way to convert Freshmeat.net’s project data into RDF using the DOAP ontology. That way, he could easily create services to track the release of new packages, thereby increasing the effectiveness of their development teams.

Rob wrote:

I came across ptsw.com when trying to determine how many package indexes are using DOAP right now (so far I’ve only found Python Package Index and O’Reilly’s Code Zoo).

I’m a developer for Gentoo Linux. A website I created (Meatoo) checks Freshmeat twice a day and finds packages that have new releases available. We have a database of all our maintainers grouped into ‘herds’, a Python herd, desktop herd, PHP herd etc. The developers in the herds can query my website by a command-line client that uses XML-RPC, or subscribe to RSS feeds by herd or package name, or read the website itself and see which packages have new releases.

DOAP fits into this because I was thinking about creating DOAP records for each release from each package index and making this available so people can write tools to find out information about software packages easily.

That is how we got in contact. Rob had a practical problem, tried to find a way to solve it and to help other people solve it too, and that is how he found PingtheSemanticWeb.com and other semantic web projects and communities (such as Linking Open Data).

Freshmeat.net in DOAP

Then, a couple of days ago, Rob re-contacted me to let me know that descriptions of Freshmeat.net’s 43 000 projects are now available in RDF.

He created a prototype service that converts the data he has aggregated from Freshmeat.net into DOAP. His project is called DOAPSpace. The idea is to make the Freshmeat.net projects available in DOAP, then to ping PingtheSemanticWeb.com to ultimately make them available on Doapstore.org, which is fed by PTSW.

People can get the DOAP description of each project by going to:

http://doapspace.gentooexperimental.org/doap/<project_name>

Here are some example URIs:

http://doapspace.gentooexperimental.org/doap/0verkill
http://doapspace.gentooexperimental.org/doap/amanda
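
Since each of these documents is plain RDF, consuming them takes only a few lines of work. A small sketch with Python’s rdflib (the DOAP namespace URI is the standard one; the project URL follows the pattern above):

    import rdflib
    from rdflib import Namespace
    from rdflib.namespace import RDF

    DOAP = Namespace("http://usefulinc.com/ns/doap#")

    graph = rdflib.Graph()
    graph.parse("http://doapspace.gentooexperimental.org/doap/0verkill")

    # Print the name and homepage of every doap:Project in the document
    for project in graph.subjects(RDF.type, DOAP.Project):
        for name in graph.objects(project, DOAP.name):
            print(name)
        for homepage in graph.objects(project, DOAP.homepage):
            print(homepage)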

RDF dump

I was really pleased to see how Rob managed to generate that data. Then I asked him whether an RDF dump of that precious data would eventually be available for download. That is exactly what he is working on at the moment, and as soon as he sends me the dump, I will make it available via PingtheSemanticWeb.com. Then it will be ready to be integrated into the Linking Open Data project.

Content Negotiation

At the same time, Rob added a new feature to his service: a user only has to append the “?zg=1” parameter to the URL to get redirected to the Zitgist Browser. It was really nice of him to think of that; I really appreciated it.

However, I showed him how he could use content negotiation to do that and to make his service compatible with other tools, such as other RDF browsers. I pointed him to the How to Publish Linked Data on the Web draft document so that he could get a better understanding of the content negotiation process.
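
The idea, following that draft document, is to serve RDF or HTML from the same URI depending on the Accept header the client sends. A minimal WSGI sketch of the pattern (the two helper functions are hypothetical placeholders, not Rob’s actual code):

    def app(environ, start_response):
        accept = environ.get("HTTP_ACCEPT", "")
        project = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]

        if "application/rdf+xml" in accept:
            # RDF-aware agents (browsers, crawlers) get the DOAP document itself
            body = load_doap_document(project)  # hypothetical helper
            start_response("200 OK", [("Content-Type", "application/rdf+xml")])
            return [body]

        # Ordinary web browsers are redirected, with the 303 See Other the
        # Linked Data draft recommends, to an HTML view of the same data
        location = html_view_url(project)  # hypothetical helper
        start_response("303 See Other", [("Location", location)])
        return [b""]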

Linking Open Data Community and Early Developers

Rob is certainly an early adopter of the Semantic Web. He is a developer who wants to solve problems with methods and technologies. He had the intuition that the DOAP ontology and other semantic web principles and technologies could help him solve one of his problems. This intuition led him to discover what the semantic web community could do to help him.

These are the kind of users we have to take care of, and whom we have to help release their projects. It’s people like Rob who will make the Semantic Web a reality. Without such early adopters from outside the Semantic Web community, the semantic web is probably doomed. We are there now: ready to help developers integrate semantic web technologies into their projects, to generate their data in RDF and to link it with other data sources. That is the goal of communities like the Linking Open Data community, and it’s what we are about to do.

News at Zitgist: the Browser, PTSW, the Bibliographic Ontology and the Query Service

The issues we had with the Zitgist Browser’s server don’t mean that things have stopped at Zitgist. In fact, many projects evolved at the same time, and I outline some of these evolutions below.

New version of the Zitgist Browser

A new version of the browser is already on the way. In fact, the pre-release version of the browser was a use case, a prototype. Now that we know it works, and having faced most of the issues that have to be taken into account to develop such a service, we hired Christopher Stewart to work on the next version of the browser. He is already well into the problem, so you can expect a release of this new version sooner than you might think. At first there won’t be many modifications at the user interface level; however, many things will be introduced in this new version that will help us push the service to another level in the future.

New version of Ping the Semantic Web

Version 3.0 of the PingtheSemanticWeb.com web service should go online next week. It will be a totally new version of the service. It won’t use MySQL anymore; Virtuoso has replaced it. The service will now fully validate RDF files before including them in the index. More stats will be available too. It is much faster (as long as remote servers are fast too), and I estimate that this single server could handle between 5 and 10 million pings per day (enough for the next year’s expansion). That said, the service will be pushed to another level and be ready for more serious traffic. After its release, a daily dump of all links will be produced as well.

The first draft of the Bibliographic Ontology

The Bibliographic Ontology Specification Group is on fire. We are now 55 members and generated 264 posts in July alone. Many things are going on and the ontology is well underway. We expect to release a first draft of the ontology sometime in August. If you are interested in bibliographic things, I think it’s a good place to be.

The Zitgist Semantic Web Query Service

Finally, Zitgist’s Semantic Web Query Service should be available to subscribed alpha users sometime in September. You can register to get your account here. Also, take a look at what I wrote earlier about this search module (many things have evolved since, but it’s a good introduction to the service).

Conclusion

So, many things are going on at Zitgist, and many exciting things should happen this autumn, so stay tuned!

Integration of Zotero in a Semantic Web environment to find, search and browse the Web’s citations

Zotero is a great Firefox add-on that lets its users find, search, edit and create citations they come across on the Web while browsing it. All the power of Zotero resides in its “translation modules”. These modules detect citations in various types of web pages. When Zotero detects one of these citations, it notifies its users to give them the opportunity to save it.

What interests me is that Zotero already uses some ontologies to export users’ citation libraries in RDF. When I noticed that, I started to wonder: what could we do with Zotero now?

The Zotero vision

Zotero is the best-integrated citation tool for the Web that I know of. A phenomenal number of citations can be discovered on the Web via the Zotero user community.

Remember what we did with the Semantic Radar a couple of months ago? This Firefox add-on detects SIOC RDF documents in web pages. I contacted Uldis Bojar to ask him to ping PingtheSemanticWeb.com each time a user detected an RDF file while browsing the Web. Now a good share of the RDF data pinged to PTSW comes from Semantic Radar users. This is a sort of “social semantic web discovery” technique.

What I would like to do is the same thing but for Zotero.

[Figure: diagram of the Zotero, PingtheSemanticWeb.com and Zitgist workflow described below.]

  1. Zotero users browse the Web, discover citations and save them into their personal libraries.
  2. Each time a Zotero instance discovers a citation, it would send the URL where the citation can be found to PingtheSemanticWeb.com.
    1. Note: the user should be made aware of this functionality via an option in Zotero that explains what the feature is all about and gives him the possibility to disable it.
    2. Note: Zotero would ping PTSW each time it detects a citation (i.e., when the icon appears in Firefox’s URL bar), not each time a user saves one.
  3. Via the Virtuoso Sponger, PingtheSemanticWeb.com would check the incoming URLs from Zotero users for citations. If a citation is found, it would be added to the list of known citations and its content archived.
  4. PingtheSemanticWeb.com would then send the new citations to Zitgist so that it can include them in its database.
    1. Note: here Zitgist could be replaced by any web service wanting them. Remember that PTSW acts as a data multiplexer.
  5. Via Zitgist (which is a semantic web search engine), users from around the world would be able to search among these citations (discovered by Zotero users) and browse them.

Zitgist as a Zotero citation provider

What is fantastic here is that Zitgist becomes a source of citations. If a Zitgist user has Zotero installed, he will be able to batch-save the list of results returned by Zitgist; and if the user is browsing Zitgist’s citations, he will be able to include them in his Zotero library, as if Zitgist were Amazon.com or any other citation web site.

That way, data found by Zotero users would be accessible to other Zotero users via Zitgist, which would then become a citation provider (mainly fed by the Zotero community).

You see the interaction?

What has to be developed?

A few things have to be developed to make that vision work: no major development, only a couple of features on each system.

Integration of Ping the Semantic Web into Zotero

The integration of Ping the Semantic Web into Zotero is quite straightforward.

Pinging PingtheSemanticWeb.com via a web service

The first step is to make Zotero notify PTSW each time it comes across a citation. It has to send the URL of the citation(s) via XML-RPC or REST.

That is it. Each time Zotero detects a citation, it sends a simple ping to PTSW via an XML-RPC or REST request.
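
A sketch of the REST variant in Python. I believe the service’s REST endpoint takes the URL as a query parameter roughly as shown, but treat the exact path as an assumption and check the PTSW documentation:

    import urllib.parse
    import urllib.request

    def ping_ptsw(page_url):
        """Notify PTSW that RDF data (here, a citation) lives at `page_url`."""
        query = urllib.parse.urlencode({"url": page_url})
        endpoint = "http://pingthesemanticweb.com/rest/?" + query  # assumed path
        with urllib.request.urlopen(endpoint) as response:
            return response.status == 200

    ping_ptsw("http://example.org/page-with-citations.html")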

Adding a pinging option to Zotero

Another thing the Zotero team would have to add to their add-on is an option giving users the possibility to disable this feature, in case they don’t want to send a notification to PTSW each time they discover a citation on a web page while browsing.

Development of Zotero translators into Sponger Metadata Cartridges

The biggest development effort would be converting the Zotero translators into Virtuoso Sponger Metadata Cartridges.

Right now, Metadata Cartridges exist for Google Base, Flickr, microformats (hReview, hCalendar, etc.) and more. These cartridges are the same thing as “Zotero translators”, but for the Virtuoso Sponger. By developing these cartridges, everybody running Virtuoso will be able to see these citations (from Amazon, etc.) as RDF data (mapped using various ontologies).

Documentation about how to develop these cartridges will be available in the coming days. From there, we would be able to set up an effort to convert the Zotero translators into Sponger Metadata Cartridges.

Conclusion

This is my vision of integrating Zotero into the Semantic Web environment that exists today. Any ideas, suggestions or collaboration proposals are warmly welcome.

Note: a discussion about this subject has started on Zotero’s web forum.