First web service developed using Ping the Semantic Web data

 

A couple of days ago, the first web service (that I am aware of) developed using Ping the Semantic Web data was made public.

Doap:store is a search, browsing and visualization tool for DOAP (Description Of A Project) documents. It was developed by Alexandre Passant, one of the most active contributors to the SIOC ontology.

 

From Alex’s blog, here is how doap:store is described:

Then, doap:store provides a common search engine and browsing interface for these decentralized project descriptions, while authors keep control over their data. Data is updated each time PTSW has a new ping for it (in the future, PTSW should store new pings only if the document has changed, so updates will be made only for real document updates).

That is it: a web service that helps people find projects according to various criteria. At the time I wrote this article, the web service focused mainly on software projects involving programming languages. If you check the bottom of the page, you will see project categories related to the programming language(s) used in each project.

These categories are dynamically generated from the DOAP documents the service aggregates. So, as soon as new documents from Ping the Semantic Web are aggregated by doap:store, they affect this feature of the service according to the languages defined in the project descriptions.

 

Doap:store and Ping the Semantic Web

This new web service is the perfect example of the utility of Ping the Semantic Web. With the data in hand, a developer can create wonderful systems and user interfaces to manipulate, manage and search that data.

In this example, Alex didn’t have to worry about where to find DOAP documents; he only had to retrieve the list of the latest created or updated DOAP documents from the Ping the Semantic Web service (he does that every hour).
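To illustrate, here is a rough Python sketch of what such an hourly import could look like. The export URL and its parameters below are only placeholders for the idea, not the exact interface of the service:

import time
import urllib.request

# Placeholder export URL; the real parameters are documented on pingthesemanticweb.com
EXPORT_URL = "http://pingthesemanticweb.com/export/?type=DOAP&timeframe=last_hour"

def fetch_latest_doap_pings():
    # The export is assumed here to be a plain list of document URLs, one per line
    with urllib.request.urlopen(EXPORT_URL) as response:
        return response.read().decode("utf-8").splitlines()

while True:
    for url in fetch_latest_doap_pings():
        print("New or updated DOAP document:", url)  # aggregate it into the local store
    time.sleep(3600)  # poll once an hour, like doap:store does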

 

 

 

Keeping control over one’s data

This is one of Alex’s more important observations. Such an information infrastructure lets users do what they want with the data they generate.

This is the idea I had in mind when I wrote a blog post called “Communities’ websites should use FOAF profiles to help users managing their online persona”, and this is the reason why I created an import/export feature for user profiles using FOAF documents: to give the power back to users.

 

Conclusion

Doap:store really shows how Ping the Semantic Web can be used by any developer. I hope other people will start to use the service the same way Alex does. Ultimately, I hope that Ping the Semantic Web will become a vector of development for semantic web projects.


Ping the Semantic Web: a new pings exportation feature

 

    The “pings exportation” feature of Ping the Semantic Web was a little bit messy, and I was really not satisfied with it. So I took the time to rework it, and I think I came up with something much better (probably what people were expecting from the beginning).

 

The new way to request pings

The new way to request a list of pings from Ping the Semantic Web is quite simple. You have a set of pings (all the pings received by the service so far), and you apply constraints to that set to get the subset of pings you really want for your application.

There are 7 different constraints you can apply:

  1. Constrain pings to a specific type of RDF document: SIOC, FOAF, DOAP, RDFS or OWL
  2. Constrain pings to a specific serialization language: XML or N3
  3. Constrain pings to a time frame: last hour, yesterday or any time
  4. Constrain pings to a number of results: 0 to x
  5. Constrain pings to a specific domain name, for example: getting all the pings from www.talkdigger.com
  6. Constrain pings to a specific namespace, for example: getting all the pings for documents that use the namespace “http://purl.org/dc/elements/1.1/”

This new method is much more powerful: you can easily get the specific subset of pings that fits the specialized needs of your web service or software agent.
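As an illustration, here is a rough Python sketch of building such a constrained request. The parameter names and the endpoint path are placeholders, not the exact syntax of the export interface:

from urllib.parse import urlencode
import urllib.request

# Illustrative constraint values; parameter names are assumptions for the sketch
constraints = {
    "type": "FOAF",                               # 1. type of RDF document
    "serialization": "xml",                       # 2. serialization language
    "timeframe": "last_hour",                     # 3. time frame
    "nbresults": 100,                             # 4. number of results
    "domain": "www.talkdigger.com",               # 5. domain name
    "ns": "http://purl.org/dc/elements/1.1/",     # 6. namespace
}

url = "http://pingthesemanticweb.com/export/?" + urlencode(constraints)  # placeholder path
with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))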

 

The new way to handle namespaces

Reworking this feature led me to rework the way Ping the Semantic Web handles namespaces.

Now, all the namespaces used in the RDF documents aggregated by the service are tracked by the service as well.

This means two things:

  1. You can retrieve RDF documents that use a specific namespace
  2. You can take a look at the list of namespaces known by Ping the Semantic Web

For the moment the service knows about 400 namespaces, but it is discovering new ones at a rapid pace.

 

Conclusion

I am stabilizing the system right now, and the redevelopment of this feature resulted from that stabilization work. My updates are mostly finished, and soon enough a first version of a SPARQL endpoint (and user interface) should be publicly available.


Ping the Semantic Web: call for names of web services exporting RDF documents

 

    More and more web services are starting to export some of the data archived in their databases using RDF. Some of them have a specific goal in mind; others only do it “just in case” someone would need it (like livejournal.com does). In any case, these RDF documents are sitting somewhere on the Web, waiting to be read and used.

Ping the Semantic Web’s goal is to act as a central point in that environment: aggregating these RDF documents and then sending them to other services (software) that need them.

More and more people are starting to ping the service. It now gets about 5000 ping requests each day, and that number is constantly growing.

In the last couple of days, I contacted some people who are developing systems that export RDF data. I asked them two things:

 

  1. Would it be possible for you to make your system ping Ping the Semantic Web each time it creates or updates an RDF document?
  2. Would it be possible for you to send me a list of URLs where Ping the Semantic Web could find the existing RDF documents generated by your service?

 

Today I am asking for your help:

If you know a web service that exports RDF documents, would it be possible for you to contact me with the name of that service?

Then what I’ll do is contact them to ask the two questions above. I will also help them (technically) to implement the feature in their system if they encounter any problem.
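To show how little work is involved, here is a rough Python sketch of such a ping, assuming the service accepts the usual weblogs.com-style weblogUpdates.ping XML-RPC call at rpc.pingthesemanticweb.com (the exact interface may differ; just ask me):

import xmlrpc.client

def ping_ptsw(document_title, document_url):
    # weblogs.com-style ping, as blog systems already send it
    server = xmlrpc.client.ServerProxy("http://rpc.pingthesemanticweb.com/")
    return server.weblogUpdates.ping(document_title, document_url)

# Call this right after an RDF document is created or updated, e.g.:
# ping_ptsw("My FOAF profile", "http://example.org/people/peter/foaf.rdf")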

Currently the service knows about 57,000 RDF documents. I predict that in one year it will know millions of RDF documents. What I hope is that many of them will be serialized not in XML but in N3 instead (though I should certainly wait for a wider adoption of SPARQL before seeing that happen).

 

People I already contacted:

  • D2R Server publishing the DBLP Bibliography Database (Richard sent me 1.2 million URLs to crawl, and he should start to ping PTSW in January; yeah, this is partly why I said I would have millions of RDF documents 😉)
  • Revyu (Tom should start to ping PTSW soon)
  • FOAFMap
  • FOAFNaut
  • Geonames (just when I was about to publish this article, Marc contacted me to tell me that Geonames now pings PTSW each time a geoname is created or changed. Also, he sent me a list of 6.2 million geonames to include).
  • Tribe
  • Semantic MediaWiki

 

Conclusion

Finally, what I am requesting is help finding as many web services that export data using RDF as possible. That way everybody will benefit: these services will increase the visibility of the data they generate, and PTSW will see its database of RDF documents grow and grow for the benefit of the community.


Ping the Semantic Web and its future SPARQL endpoint

Soon enough I’ll add a SPARQL endpoint to the Ping the Semantic Web service. What does that mean?

It means that anybody will be able to send SPARQL queries (SPARQL looks like the SQL query language but is used to query RDF graphs) to retrieve information from the RDF documents known by the web service. As soon as someone pings pingthesemanticweb.com with an RDF document’s URL, other people will be able to search it using the SPARQL endpoint.

 

How will it work?

Users will have access to a web interface where they will be able to write and send SPARQL queries to the triple store (this is the name given to the type of database system that archives RDF graphs).

For example, they will be able to send queries like:

 

PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
WHERE
{
  GRAPH ?graph
  {
    ?s rdf:type sioc:Post
  }
}

 

That query to the triple store will return all the resources (things) that have been typed as a sioc:Post (a blog post, a forum post, etc.).
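If you would rather query the endpoint programmatically than through the web interface, a rough Python sketch following the standard SPARQL protocol could look like this (the endpoint URL is only a placeholder, since the service is not live yet):

from urllib.parse import urlencode
import urllib.request

ENDPOINT = "http://pingthesemanticweb.com/sparql"  # placeholder URL

query = """
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT * WHERE { GRAPH ?graph { ?s rdf:type sioc:Post } }
"""

# Standard SPARQL protocol: the query travels in the "query" parameter
request = urllib.request.Request(
    ENDPOINT + "?" + urlencode({"query": query}),
    headers={"Accept": "application/sparql-results+xml"},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))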

 

How to visualize the triple store?

Creating this SPARQL endpoint will be somewhat easy to do. In fact, the current structure will remain the same, but we will add one new server: a SPARQL endpoint that gives access to an RDF triple store.

Here is how one could imagine a triple store working:

 


Figure 1

 

 

 


Figure 2

 

If we take a look at the schemas, each RDF document is a graph in itself. An RDF graph is composed of relations between resources: <subject, verb, object>. For example, a relation could be <peter, hair-color, brown>: Peter’s hair color is brown, i.e. the resource “Peter” has the value “brown” for the property “hair-color”.

With the triple store, we have the possibility to merge two RDF graphs together. That way, we create a sort of meta-graph with all the relations between one graph and the other.
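To make the idea concrete, here is a tiny Python sketch using the rdflib library (only an illustration of graph merging, not the software I will use for PTSW; the two document URLs are made up):

from rdflib import Graph

# Each pinged RDF document is a graph on its own (hypothetical URLs)
doc1 = Graph().parse("http://example.org/peter/foaf.rdf")
doc2 = Graph().parse("http://example.org/blog/sioc.rdf")

# Merging them produces the "meta-graph": relations from one document can now
# connect to resources described in the other
merged = Graph()
merged += doc1
merged += doc2

print(len(merged), "triples in the merged graph")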

This is where things are getting interesting.

Ping the Semantic Web’s graph will be created by merging the graphs of all the RDF documents it knows about (via pinging).

That way, users will have the possibility to search this sort of meta-graph of relationships between resources by querying it with SPARQL.

 

You could call this the semantic web in a nutshell.

 

Virtuoso to create the RDF triple store

I’ll use a database management system called Virtuoso to create this RDF triple store.

 

A first prototype version

Consider the first version of the triple store as a prototype. In fact, the RDF triple store feature of Virtuoso is relatively new. It is still in development, and some things have to be created (to enhance the functionality) and upgraded. It is fine for a couple of hundred million triples (relations), but when we reach a billion triples, it is possible that some queries to the system will become unworkable. At that time, I may be obliged to restrict the kinds of requests users can make, to ensure that the system always works at its full potential.

In any case, the triple store and the SPARQL endpoint will “live” on another server, so the performance of the current pinging system will not be affected by the performance of the endpoint; they are two totally different entities in our system.

 

Why a triple store with a SPARQL endpoint?

At first: for research and education purposes. People will have the possibility to query a system that aggregates RDF documents “from the wild”. Eventually, such an initiative could lead to the development of more interesting technologies (user interfaces, anything) that could be used by a broader range of people.

With this system in hand, one could search the triple store to extract statistics about the RDF documents it knows, for research purposes.

Also, it is a way for OpenLink to debug, upgrade and enhance its service, which will ultimately benefit everyone (since an open source version of Virtuoso is available).

 

Conclusion

Keep me posted if you have any thoughts about this new development of the Ping the Semantic Web service.


How to participate in Web 3.0 using your blog: participating in the Semantic Web to enhance your blog’s visibility

 

Do you like my catchy title? (Update: okay, I agree with Danny: “Web 3.0 love secrets of the French” is a catchier title.) It is a little bit ironic considering all the brouhaha (1) (2) (3) (4) (5) (6) (and way too much more) generated by this New York Times article written by John Markoff. Web 3.0… semantic web… semantic web 3.0… call it what you like, I don’t really care: really. What is fantastic is that more and more people are getting interested in what many people have been working on for about 12 years: the Web of Data.

Setting aside all the recent hype (and misunderstanding), some people could ask themselves how they could easily participate in the idea of the Semantic Web: the Web of Data.

Is it possible for ordinary mortals? Yeah, even my mom could (at least if she had a blog).

If you have a blog, you can easily participate in the semantic web by installing a simple add-on for your blogging system and by pinging a server called Ping the Semantic Web each time you publish a new blog post.

The idea here is to take the articles you wrote (and will write) and publish them on the Web not only as web pages, but also as documents for the semantic web. You can picture the Web like this:

 

 

At the top, you have a source of data: the articles you wrote on your blog, for example.

Then, with that same source of information, you can participate in two different Webs:

  1. On the left, you have the “web of humans”: the Web that can easily be understood by humans when they look at the screen. This is your blog.
  2. On the right, you have the “web of machines”: the Web that can easily be read and processed by machines. This is another version of your blog, but for machines.

Well, it seems complex, so how the hell is my mom supposed to be able to participate in the semantic web?!

Easy. In a hypothetical world, my mom uses WordPress for her cooking blog, Dotclear for her design blog, b2Evolution for her family blog and Drupal for her new French mothers’ community website.

The only thing she has to do is install the add-on available for each of these blogging systems.

 

   

The instructions to install the add-on on WordPress are simple:

1. Copy the following files to the WordPress wp-content/plugins/ directory:

2. Enable “SIOC Plugin” in the WordPress admin interface (Admin -> Plugins -> action “Activate”)

 

 

    For Dotclear, the installation package can be found here, and the source code of the add-on can be found here.

 

 

    For b2Evolution: Copy the following files to the /xmlsrv/ directory of your b2Evolution installation folder:

 

 

    For the Drupal add-on, all the information can be found here.

 

As soon as she installs these add-ons, she starts participating in the semantic web.

 

Why should people take the time to install these add-ons? What is the advantage?

Increasing the visibility of your blog

 

By doing so, you are exposing your blog’s content to many other web crawlers (web crawlers of a new generation, propelled by the adoption of the semantic web).

From that point, you only have to ping a new pinging service called Ping the Semantic Web to make sure that your blog is visible to these new web services. The process is the same as pinging weblogs.com or technorati.com for your web feed (RSS or Atom), except that you are pinging pingthesemanticweb.com: a specialized pinging service for the semantic web.

Doing that helps you to increase your visibility on the Web.

How can you set up your blog system to automatically ping this pinging service?

Simple: the process is the same for each system described above. For example, if you are using WordPress, you only have to:

  1. Log into your WordPress Dashboard
  2. Select Options
  3. Then select the Writing tab
  4. Near the bottom you should see a space labeled “Update Services”: Add “http://rpc.pingthesemanticweb.com/” on a new line in this space
  5. Finally press the Update Options button

So, you only have to make your system ping http://rpc.pingthesemanticweb.com/.

 

Conclusion

In two simple steps, (1) installing an add-on and (2) adding a service to ping, a blogger can get more visibility for his blog and start to participate in the semantic web.

 
