Past Projects – Page 2 – Frederick Giasson

Ping the Semantic Web version 3: a brand new system!

August 20, 2007 Frederick Giasson

Pinging, and receiving lists of newly created and updated RDF resources, has never been easier! I am pleased to announce the release of the latest version of Ping the Semantic Web.

In this brand new system you have access to:

Validated RDF resources
Simplified pings list export system
Faster pinging infrastructure
Brand new user interface
New statistics

1. Validated RDF resources

In version 2.0, PTSW performed only a pseudo-validation of RDF files. In version 3.0, it fully validates RDF documents. This means that all the pings the service exports are valid RDF documents.

This is a major upgrade to the system: all agents requesting pings from PTSW now know that each of them is a valid RDF document. That way, they save time and bandwidth since they won’t try to process bad RDF documents.
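To give an idea of what a consumer no longer has to do on its side, here is a minimal Python sketch of a shallow check: it only tests that a document is well-formed XML with an rdf:RDF root element. This is an illustration under my own assumptions (the function name is hypothetical), it is not how PTSW validates documents, and it is far weaker than full RDF validation.

```python
import xml.etree.ElementTree as ET

# The standard RDF syntax namespace.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def looks_like_rdfxml(document: str) -> bool:
    """Shallow check: well-formed XML whose root element is rdf:RDF.

    Hypothetical helper for illustration only; a full validator would
    also check the RDF/XML grammar, which this does not attempt.
    """
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # Not even well-formed XML, so certainly not usable RDF/XML.
        return False
    return root.tag == f"{{{RDF_NS}}}RDF"
```

A check like this would reject an HTML page or a truncated file, but it would accept many documents that a real RDF parser refuses, which is why receiving only validated pings saves work.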

2. Simplified pings list export system

Now all ping consumers need to be registered with the PTSW web service. This simple registration makes it much easier to consume pings coming from PTSW. Here are the steps to follow to get pings from the server:

The user has to register an account on pingthesemanticweb.com
The user has to register the IP address of the server that will download the XML file listing all the latest pings received by the system
Additionally, the user has to set up ping retrieval preferences in the user account section.
The registered web server has to request the XML file at: http://pingthesemanticweb.com/export/
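The last step can be sketched in a few lines of Python. This is only a sketch: the export URL is the one given above, but the function name, the timeout, and the injectable `opener` parameter are assumptions made for illustration, and the structure of the returned XML is not described in this post, so the bytes are returned unparsed.

```python
import urllib.request

# Export address as given in the post; it only answers registered IP addresses.
EXPORT_URL = "http://pingthesemanticweb.com/export/"


def fetch_export(url=EXPORT_URL, opener=urllib.request.urlopen, timeout=30):
    """Download the export file and return its raw bytes.

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urllib's urlopen.
    """
    with opener(url, timeout=timeout) as response:
        return response.read()
```

A registered server would typically call `fetch_export()` on a schedule and hand the result to its own XML processing.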

What has improved is the way applications get pings. Now a web server only has to request the XML file, and PTSW takes care of creating the file according to the user’s preferences.

Finally, PTSW archives the time of each user’s latest request. The next time the user’s server requests this document, it will receive an XML file with all the pings received by PTSW since its last request.

This is a major improvement: if the user’s web server is down for 2 days, for some reason, it won’t lose any pings, since PTSW will send it all the pings received by the service during those 2 days.
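The "since its last request" behaviour can be illustrated with a small in-memory Python sketch. The class and method names are hypothetical, and PTSW's real storage is not shown in this post; the sketch also assumes that a consumer asking for the first time receives every ping recorded so far.

```python
from datetime import datetime, timedelta


class PingLog:
    """Toy model of a ping store that remembers each consumer's last request."""

    def __init__(self):
        self._pings = []          # list of (received_at, url) tuples
        self._last_request = {}   # consumer id -> time of its last export

    def record(self, url, received_at):
        """Store one ping together with the time it was received."""
        self._pings.append((received_at, url))

    def export_for(self, consumer, now):
        """Return the pings received after the consumer's previous request."""
        since = self._last_request.get(consumer)
        self._last_request[consumer] = now
        return [url for at, url in self._pings
                if (since is None or at > since) and at <= now]
```

With this model, a consumer that stays away for two days simply receives two days of pings on its next call to `export_for`, and nothing on an immediate second call.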

Note: all current Ping the Semantic Web ping consumers have to create an account and change their application accordingly.

3. Faster pinging infrastructure

The web service is now hosted on a much bigger server. We also switched from MySQL to Virtuoso. These changes result in a more powerful service that I estimate can handle up to 5 million pings per day (in the best of worlds, with fast remote web servers delivering the RDF content). In any case, it is probably enough for the next year’s expansion.

4. Brand new user interface

We also spent some time refreshing the user interface of the web service. This new interface will help us to easily integrate new features and sections into the service’s web site while keeping it appealing to users.

5. New statistics

New statistics on the state of the service are now available.

All stats about Namespaces. This is the list of namespaces used to describe entities in RDF. For example, if an RDF document has an entity typed as a sioc:Post, then the SIOC namespace will be added and its stat counter will be incremented by one. There are currently 347 used namespaces known by PTSW.
All stats about Types. This is the number of typed entities defined in each RDF document known by PTSW. For example, if an RDF document has four foaf:Person entities defined, then four will be added to the counter. If the same entity (URI) is defined in two different RDF documents, the type of the entity will be counted twice. So take these numbers as a good approximation, but not as an absolute truth. There are currently 2773 types known by PTSW.
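The two counters can be sketched in Python as follows. This is an illustration, not PTSW's code: triples are modelled as plain (subject, predicate, object) tuples of URI strings, the function names are hypothetical, and the sketch assumes that a namespace counter goes up by one per document in which the namespace is used for a type, which is one possible reading of the description above.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def namespace_of(uri):
    """Return the URI up to and including its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1]


def update_stats(triples, type_counts, namespace_counts):
    """Update both counters from the triples of one RDF document."""
    seen_namespaces = set()
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            # Every typed entity in the document adds one to its type.
            type_counts[obj] += 1
            seen_namespaces.add(namespace_of(obj))
    # Assumption: one increment per namespace per document.
    for namespace in seen_namespaces:
        namespace_counts[namespace] += 1
```

Feeding the same entity through `update_stats` from two different documents counts its type twice, which is exactly why the numbers are an approximation.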

Some people will notice that the current numbers in the sidebar are completely different from the ones that were on the old website of the service. They are right, and here is the reason: I pruned the geonames.org and talkdigger.com pings from PTSW.

In fact, when I started the web service, I added these two RDF data dumps to PTSW. At that time, initiatives such as the Linking Open Data Community didn’t exist and people didn’t know how to export their RDF data dumps. So I chose to include them in the PTSW system. Since then, methods have improved and things have changed. Now RDF data dumps are available directly from these web sites, data dump repositories exist, and people don’t use PTSW for that purpose. In fact, they use the PTSW export feature to sync their service, and not to get complete datasets from it. That said, I pruned all these 7 000 000 documents from the system, leaving about 845 000 “wild” RDF documents in the system.

It is the inclusion of these complete data sources that inflated the old stats compared to today’s stats.

Conclusion

When I created Ping the Semantic Web more than one year ago, I hoped developers would use the service to easily find RDF data without crawling the entire Web. I hoped that this web service would be a vector of semantic web application development. I think that it succeeded in some ways when I think about services such as SIndice and DOAPStore that emerged from the PTSW initiative.

This new version of Ping the Semantic Web tries to go further in that direction: making things even simpler for RDF data consumers and giving them a more powerful RDF discovery service.

Note: make sure to refresh the DNS cache of your desktops and servers so that you see the new, and not the old, PTSW web site.

Freshmeat.net now available in DOAP: 43 000 new DOAP projects

August 4, 2007 Frederick Giasson

Three weeks ago, Rob Cakebread contacted me about some possible issues with the PingtheSemanticWeb.com web service. Rob is involved with the development of Gentoo Linux and he wanted to come up with a method to let their development teams know when new packages are released for some projects. Since the RSS feeds of Freshmeat.net are limited in length, and since they do not support DOAP, he started to think about a way to convert Freshmeat.net’s project data into RDF using the DOAP ontology. That way, he could easily create services to track the release of new packages and thereby increase the effectiveness of their development teams.

Rob wrote:

I came across ptsw.com when trying to determine how many package indexes are using DOAP right now (so far I’ve only found Python Package Index and O’Reilly’s Code Zoo).

I’m a developer for Gentoo Linux. A website I created (Meatoo) checks Freshmeat twice a day and finds packages that have new releases available. We have a database of all our maintainers grouped into ‘herds’, a Python herd, desktop herd, PHP herd etc. The developers in the herds can query my website by a command-line client that uses XML-RPC, or subscribe to RSS feeds by herd or package name, or read the website itself and see which packages have new releases.

DOAP fits into this because I was thinking about creating DOAP records for each release from each package index and making this available so people can write tools to find out information about software packages easily.

That is how we got in contact. Rob had a practical problem, then tried to find a way to resolve it and to help other people resolve it too; that is how he found PingtheSemanticWeb.com and other semantic web related projects and communities (such as the Linking Open Data community).

Freshmeat.net in DOAP

Then a couple of days ago, Rob contacted me again to let me know that the descriptions of Freshmeat.net’s 43 000 projects are now available in RDF.

He created a prototype service that converts the data he has aggregated from Freshmeat.net into DOAP. His project is called DOAPSpace. The idea is to make the Freshmeat.net projects available in DOAP, then to ping PingtheSemanticWeb.com so that they ultimately become available on Doapstore.org, which is fed by PTSW.

People can get the DOAP description of each project by going to:

http://doapspace.gentooexperimental.org/doap/<project_name>

Here are some example URIs:

http://doapspace.gentooexperimental.org/doap/0verkill
http://doapspace.gentooexperimental.org/doap/amanda

RDF dump

I was really pleased to see how Rob managed to generate that data. Then I asked him whether an RDF dump of that precious data would eventually be available for download. It is exactly what he is working on at the moment, and as soon as he sends me the dump, I will make it available via PingtheSemanticWeb.com. Then, it will be ready to be integrated into the Linking Open Data project.

Content Negotiation

At the same time, Rob added a new feature to his service: a user only has to append the “?zg=1” parameter to the URL to get redirected to the Zitgist Browser. It was really nice of him to think about that; I really appreciated it.

However, I showed him how he could use content negotiation to do that and to make his service compatible with other tools such as other RDF browsers. So I pointed him to the How to Publish Linked Data on the Web draft document so that he could get a better understanding of the content-negotiation process.
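From the client side, content negotiation simply means sending an Accept header that states which format is wanted. Here is a minimal Python sketch that builds such a request for one of the example URIs; the helper name is hypothetical, and whether DOAPSpace actually honours this header is an assumption, not something confirmed here.

```python
import urllib.request


def rdf_request(uri):
    """Build a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(
        uri,
        headers={"Accept": "application/rdf+xml"},
    )


# Build (but do not send) a request for one of the example project URIs.
request = rdf_request("http://doapspace.gentooexperimental.org/doap/amanda")
```

Passing `request` to `urllib.request.urlopen` would send it. A server that supports content negotiation can then answer the same URI with RDF for tools and with HTML for people, without any special URL parameter.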

Linking Open Data Community and Early Developers

Rob is certainly an early adopter of the Semantic Web. He is a developer who wants to solve problems with methods and technologies. He had the intuition that the DOAP ontology and other semantic web principles and technologies could help him to solve one of his problems. This intuition led him to discover what the semantic web community could do to help him.

It’s the kind of user we have to take care of, and that we have to help to release their projects. It is people like Rob who will make the Semantic Web a reality. Without such early adopters, from outside of the Semantic Web Community, the semantic web is probably doomed. We are there now: ready to help developers to integrate semantic web technologies into their projects, to generate their data in RDF, and to link it with other data sources. It’s the goal of communities like the Linking Open Data Community and it is what we are about to do.

News at Zitgist: the Browser, PTSW, the Bibliographic Ontology and the Query Service

July 31, 2007 (updated August 1, 2007) Frederick Giasson

Just because we had some issues with the Zitgist Browser’s server does not mean that things stopped at Zitgist. In fact, many projects evolved at the same time, and I outline some of these developments below.

New version of the Zitgist Browser

A new version of the browser is already on the way. In fact, the pre-release version of the browser was a use case; a prototype. Now that we know that it works and that we have faced most of the issues that have to be taken into account to develop such a service, we have hired Christopher Stewart to work on the next version of the browser. He is already well into the problem, so you can expect a release of this new version sooner than you might think. At first, there won’t be many modifications at the user interface level; however, many things will be introduced in this new version that will help us to push the service to another level in the future.

New version of Ping the Semantic Web

Version 3.0 of the PingtheSemanticWeb.com web service should be put online next week. It will be a totally new version of the service. It won’t use MySQL anymore; Virtuoso has replaced it. The service will now fully validate RDF files before including them in the index. More stats will be available too. It is much faster (as long as remote servers are fast too) and I estimate that this single server could handle between 5 and 10 million pings per day (enough for the next year’s expansion). With this, the service will be pushed to another level and be ready for more serious traffic. After its release, a daily dump of all links will be produced as well.

The first draft of the Bibliographic Ontology

The Bibliographic Ontology Specification Group is on fire. We now have 55 members and generated 264 posts in July alone. Many things are going on here and the ontology is well underway. We expect to release a first draft of the ontology sometime in August. If you are interested in bibliographic things, I think it’s a good place to be.

The Zitgist Semantic Web Query Service

Finally, Zitgist’s Semantic Web Query Service should be available to subscribed alpha users sometime in September. You can register to get your account here. Also, take a look at what I wrote about this search module (many things have evolved since, but it’s a good introduction to the service).

Conclusion

So, many things are going on at Zitgist and many exciting things should happen this autumn, so stay tuned!

Zitgist Browser’s server stabilized

July 31, 2007 Frederick Giasson

Five weeks ago I introduced the Zitgist Browser on this blog. At that time, I talked about a pre-release of the service. These two little words probably help to explain what happened in the weeks that followed.

In fact, some of you probably noticed that the Zitgist Browser was down half of the time for a couple of weeks. We found many issues at many levels that rendered the browser’s server unstable. In the last weeks, we performed a battery of tests to fix all the issues that appeared. Now, about three weeks later, the server is stable again. At least, it has been online for the last couple of days without any issues.

Thanks to the OpenLink Software Inc. development team, we have been able to stabilize the service; and it wouldn’t have been possible without their help and expertise.

Finally, stay tuned for the next release of this service (more information about the next version in the next blog post). Please continue to use it and to report any issues that you encounter while browsing the semantic web, and sorry for the frustrations you may have had when you used the unstable version of the service.

Zitgist’s RDF Browser: Browse the Semantic Web

June 20, 2007 Frederick Giasson

I am pleased to announce the pre-release of the Zitgist RDF Browser. This new tool from Zitgist will help users to browse the information available on the Semantic Web. As you will see below, this tool is a sort of information shape-shifter. Depending on the data available for a given Thing (a resource), it shapes its user interface so that the data is best displayed for a better understanding of its semantics and for a better browsing experience.

This pre-release version is usable by anybody; however, I would appreciate it if you reported any bugs, issues or suggestions to me so that I can enhance the browser to meet people’s expectations.

Introducing Zitgist’s RDF Browser

The Templating system

The core of this new RDF browser is its templating system. This system will enhance the RDF browsing experience of users along with their understanding of the information displayed to them. People can see it as a typical web browser such as Internet Explorer or Firefox, but instead of reading and displaying HTML, it displays RDF data. Users only have to enter the URI of a resource (it can be a URL where the browser can find RDF information about this Thing), then press the “browse” button.

Then, depending on the information available about this Thing, the RDF browser will shape its interface to optimize users’ browsing experience with the data.

Sources of data

Data displayed in the Zitgist RDF Browser can come from many different data sources:

Zitgist’s internal RDF datastore
URI dereferencing
On-the-fly conversion of data sources such as:
- Microformats
- RDFa
- eRDF
- HTML meta tags
- API data source such as: Amazon.com, Google Base, etc.

So, depending on what information is available for a given URI, the browser will mash up these data sources and display the information to the user.

First example of the templating system

This first example shows how the browser will create a web page out of an RDF data source. In this case, the data source is a URI where Madonna’s latest album “Confessions on a Dance Floor” is described.

The browser will check for that URI: http://zitgist.com/music/record/d7929b28-5812-4b8f-a99f-1800983c71fb
No information is available in its data store, so it will dereference the URI to get the RDF triples describing the album.
All in all, 15 different URIs will be dereferenced to create the web page.
The browser will detect that the type of the entity related to this URI is a mo:Album, so it will trigger the “moAlbum” template to skin the data source so that the user can easily see and understand the information we have about this resource (music album).
Then the skinned information is displayed to the user.
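The type-to-template step can be pictured as a simple dispatch table. The following Python sketch is only an illustration of that idea, not the browser's actual code: the function names, the dictionary-based data, and the fallback to a generic template for unknown types are all assumptions.

```python
def render_album(data):
    """Hypothetical stand-in for the moAlbum template."""
    return f"Album: {data.get('title', '(untitled)')}"


def render_generic(data):
    """Hypothetical fallback used when no specific template matches."""
    return "Resource: " + ", ".join(f"{k}={v}" for k, v in sorted(data.items()))


# Maps an entity type to the template that skins it.
TEMPLATES = {"mo:Album": render_album}


def skin(rdf_type, data):
    """Pick the template for the entity's type and apply it to the data."""
    template = TEMPLATES.get(rdf_type, render_generic)
    return template(data)
```

Supporting a new kind of entity then only means registering one more entry in `TEMPLATES`.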

The templating system in action

Now we will see the templating system in action. In fact, the RDF browser does much more than skinning a single data source.

If you put that URI in the browser, you will see Sebastian’s profile. The browser will fire the foafPerson template, and his profile will be skinned according to this template.

However, what is interesting in that example is not only Sebastian’s profile, but the entities it links to. In fact, if you take a closer look and go down the page a little bit, you will notice the “Current projects” section of his profile. Then you will see a list of projects.

The first project is a musical group described as a foaf:Group. So, the browser checks the URI that Sebastian’s profile links to, gets information about it, skins it according to the foafGroup template, and embeds the result within Sebastian’s profile page.

Since we could embed such entities ad infinitum, the browser restricts this automatic browsing to 3 levels deep in the graph.
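A depth limit like this one can be sketched as a small recursive function in Python. The names are hypothetical, the link graph is a plain dictionary, and the sketch assumes that the starting resource counts as level 1; it also skips resources already on the current path so that cycles cannot loop.

```python
MAX_DEPTH = 3  # the starting resource is assumed to be level 1


def embed(uri, links, depth=1, seen=None):
    """Return a nested (uri, children) tree, at most MAX_DEPTH levels deep.

    `links` maps a URI to the list of URIs it links to.
    """
    seen = (seen or set()) | {uri}
    children = []
    if depth < MAX_DEPTH:
        for target in links.get(uri, []):
            if target not in seen:  # avoid following a cycle back up
                children.append(embed(target, links, depth + 1, seen))
    return (uri, children)
```

Anything beyond the last level is not embedded automatically; the user has to look it up explicitly.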

Finally, we can “lookup” an individual embedded item by clicking on the lookup icon at the upper right corner of each entity.

Sidebar Navigator

In some cases a generated web page can be quite large, so a navigation widget has been developed to help users navigate generated documents. The navigation of a document is based on the entities displayed in it.

For example, if we run the Zitgist RDF Browser for that URI: http://www.macosxhints.com, we will notice that the information displayed is many pages long. So, to help us navigate this long document, we will use the entity navigator widget.

All the types available in that web page are listed in the sidebar, and for each type you have all the instances available.

In that example, you can easily browse the web feed of that web page. In a click, you can see all Posts, Feeds and Authors.
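The data behind such a sidebar is essentially an index from type to instances. Here is a minimal Python sketch of that grouping; the function name and the (label, type) pairs are assumptions made for illustration, not the widget's real data model.

```python
from collections import defaultdict


def group_by_type(entities):
    """Map each type to its list of instance labels, keeping input order."""
    index = defaultdict(list)
    for label, rdf_type in entities:
        index[rdf_type].append(label)
    return dict(index)
```

For a web feed, the resulting keys would be types such as Post, Feed and Author, each with its instances listed underneath.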

Interesting examples

Here is a list of starting points to see the Zitgist RDF Browser in action:

http://www.macosxhints.com/
- Browsing a web feed converted into RDF.

http://swaml.berlios.de/doap.rdf
- The generic template used to display the description of a doap:Project

http://homepages.cwi.nl/~ivan/AboutMe/CV/publist.rdf
- Ivan Herman’s list of publications.

http://b4mad.net/2006/05/30/googlegroups-sioc-dev.rdf
- Google group described using SIOC.

http://iswc2004.semanticweb.org/posters/metadata.rdf
- Poster abstracts of the ISWC2004 conference.

And all the examples above.

Bookmarklet

The Zitgist RDF Browser can process any URI. So, from any web page, a user can launch the browser to see what semantic web information is available for that URI. Then, all the information the browser can find/generate out of that data source will be displayed to the user.

To help users, I developed this really simple bookmarklet that gets the URI of the current web page, sends it to the browser, and then redirects the user to the browser’s generated page.

Zitgist RDF Browser’s Bookmarklet
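The bookmarklet itself is a piece of JavaScript, but the logic it performs is just building a browser address that carries the current page's URI as an encoded parameter. Here is a Python sketch of that step; the endpoint `http://browser.example/` and the parameter name `uri` are placeholders, since the real ones are not given in this post.

```python
from urllib.parse import urlencode


def browser_url(page_uri, endpoint="http://browser.example/"):
    """Build the address that opens `page_uri` in an RDF browser.

    Both the endpoint and the "uri" parameter name are hypothetical.
    """
    return endpoint + "?" + urlencode({"uri": page_uri})
```

The encoding matters: without it, a page URI that contains its own query string would be cut off at the first `&`.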

Conclusion

As you noticed above, this new RDF browser is a sort of information shape-shifter. Depending on the information available for a given URI, it will skin it to make it easier to browse and understand for users.

