Ping the Semantic Web and its future SPARQL endpoint

Soon enough I’ll add a SPARQL endpoint to the Ping the Semantic Web service. What does that mean?

It means that anybody will be able to send SPARQL queries (SPARQL looks like the SQL query language, but is used to query RDF graphs) to retrieve information from the RDF documents known by the web service. As soon as someone pings pingthesemanticweb.com with an RDF document’s URL, other people will be able to search it using the SPARQL endpoint.

 

How will it work?

Users will have access to a web interface where they will be able to write and send SPARQL queries to the triple store (this is the name given to the type of database system that archives RDF graphs).

For example, they will be able to send queries like:

 

SPARQL
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
WHERE
{
    GRAPH ?graph
    {
        ?s rdf:type sioc:Post
    }
}

 

That query will return, from the triple store, all the resources (things) that have been described (typed) as a sioc:Post (a blog post, a forum post, etc.).
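
What that query does can be sketched in plain Python over an in-memory list of triples. This is a toy illustration with made-up data and URLs, not how Virtuoso actually evaluates SPARQL:

```python
# A toy "quad store": (graph, (subject, predicate, object)) tuples.
# The graph tells us which pinged RDF document a triple came from.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC_POST = "http://rdfs.org/sioc/ns#Post"

store = [
    ("http://example.org/doc1.rdf",
     ("http://example.org/post/1", RDF_TYPE, SIOC_POST)),
    ("http://example.org/doc1.rdf",
     ("http://example.org/post/1",
      "http://purl.org/dc/elements/1.1/title", "Hello")),
    ("http://example.org/doc2.rdf",
     ("http://example.org/post/2", RDF_TYPE, SIOC_POST)),
]

# Rough equivalent of:
#   SELECT * WHERE { GRAPH ?graph { ?s rdf:type sioc:Post } }
def sioc_posts(store):
    return [(graph, s) for graph, (s, p, o) in store
            if p == RDF_TYPE and o == SIOC_POST]
```

Here `sioc_posts(store)` binds a `?graph`/`?s` pair for each post, whichever document it was pinged from.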

 

How to visualize the triple store?

Creating this SPARQL endpoint will be somewhat easy to do. In fact, the structure will remain the same, but we will add one new server: a SPARQL endpoint that gives access to an RDF triple store.

Here is how one could imagine the triple store working:

 


Figure 1

Figure 2

 

If we take a look at the schemas, each RDF document is a graph in itself. An RDF graph is composed of relations between resources: <subject, verb, object>. For example, a relation could be <peter, hair-color, brown>: the resource “Peter” has the property “hair-color” with the value brown (so Peter’s hair color is brown).

With the triple store, we can merge two RDF graphs together. That way, we create a sort of meta-graph containing all the relations from both graphs.
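
That merge can be sketched as a set union over triples. The data below just extends the hypothetical hair-color example from above:

```python
# Each RDF graph is a set of (subject, property, value) triples.
graph_a = {
    ("peter", "hair-color", "brown"),
    ("peter", "knows", "mary"),
}
graph_b = {
    ("mary", "hair-color", "black"),
    ("mary", "knows", "peter"),
}

# Merging two RDF graphs is, at its simplest, a set union: triples
# about the same resources now live in one queryable meta-graph.
merged = graph_a | graph_b

# All relations touching "mary", regardless of the source document:
about_mary = {t for t in merged if "mary" in (t[0], t[2])}
```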

This is where things are getting interesting.

Ping the Semantic Web’s graph will be created by merging the graphs of all the RDF documents it knows (via pinging).

That way, users will be able to search this sort of meta-graph of relationships between resources by querying it with SPARQL.

 

One could call it the semantic web in a nutshell.

 

Virtuoso to create the RDF triple store

I’ll use a database management system called Virtuoso to create this RDF triple store.

 

A first prototype version

Consider the first version of the triple store as a prototype. In fact, the RDF triple store feature of Virtuoso is relatively new: it is still in development, and some things have yet to be created (to enhance the functionality) and upgraded. It is perfectly capable of handling a couple of hundred million triples (relations), but when we reach a billion triples, it is possible that some queries to the system will become unworkable. At that point, I may have to restrict users’ query possibilities to ensure that the system always works at its full potential.

In any case, the triple store and the SPARQL endpoint will “live” on another server, so the performance of the current pinging system will not be affected by the performance of the endpoint; they are two totally different entities in our system.

 

Why a triple store with a SPARQL endpoint?

At first, for research and education purposes. People will be able to query a system that aggregates RDF documents “from the wild”. Eventually, such an initiative could lead to the development of more interesting technologies (user interfaces, anything) that could be used by a broader range of people.

Having this system in hand, one could search the triple store to extract statistics on the RDF documents it knows, for research purposes.

Also, it is a way for OpenLink to debug, upgrade and enhance its software, which will ultimately benefit everyone (since an open source version of Virtuoso is available).

 

Conclusion

Let me know if you have any thoughts about this new development of the Ping the Semantic Web service.


How to participate in Web 3.0 using your blog: joining the Semantic Web to enhance your blog’s visibility

 

Do you like my catchy title? (Update: okay, I agree with Danny: “Web 3.0 love secrets of the French” is a catchier title.) It is a little bit ironic considering all the brouhaha (1) (2) (3) (4) (5) (6) (and way too much more) generated by the New York Times article written by John Markoff. Web 3.0… semantic web… semantic web 3.0… call it what you like, I don’t really care. What is fantastic is that more and more people are getting interested in what many people have been working on for about 12 years: the Web of Data.

Leaving aside all the recent hype (and misunderstanding), some people could ask themselves how they could easily participate in the idea of the Semantic Web: the Web of Data.

Is it possible for ordinary mortals? Yeah, even my mom could do it (at least if she had a blog).

If you have a blog, you can easily participate in the semantic web by installing a simple add-on for your blog system and by starting to ping a server called Ping the Semantic Web each time you publish a new blog post.

The idea here is to take the articles you wrote (and will write) and publish them on the web not only as web pages, but also as documents for the semantic web. You can picture the Web like this:

 

 

At the top, you have a source of data: for example, the articles you wrote on your blog.

Then, with that same source of information, you can participate in two different Webs:

  1. On the left, you have the “web of humans”: the Web that can easily be understood by humans when they look at the screen. This is your blog.
  2. On the right, you have the “web of machines”: the Web that can easily be read and processed by machines. This is another version of your blog, but for machines.

Well, it seems complex, so how the hell is my mom supposed to be able to participate in the semantic web?!

Easy. In a hypothetical world, my mom uses WordPress for her blog on cooking, Dotclear for her blog about design, b2Evolution for her family blog and Drupal for her new French mothers’ community website.

The only thing she has to do is install one of the add-ons available for each of these blogging systems.

 

   

The instructions to install the add-on on WordPress are simple:

1. Copy the following files to the WordPress wp-content/plugins/ directory:

2. Enable “SIOC Plugin” in the WordPress admin interface (Admin -> Plugins -> action “Activate”)

 

 

    For Dotclear, the installation package can be found here, and the source code of the add-on can be found here.

 

 

    For b2Evolution: Copy the following files to the /xmlsrv/ directory of your b2Evolution installation folder:

 

 

    For the Drupal add-on, all the information can be found here.

 

As soon as she has installed these add-ons, she is participating in the semantic web.

 

Why should people take the time to install these add-ons? What is the advantage?

Increasing the visibility of your blog

 

By doing so, you are exposing your blog’s content to many other web crawlers (web crawlers of a new generation, propelled by the adoption of the semantic web).

From that point, you only have to ping a new pinging service called Ping the Semantic Web to make sure that your blog is visible to these new web services. The process is the same as pinging weblogs.com or technorati.com for your web feed (RSS or Atom), except that you are pinging pingthesemanticweb.com: a specialized pinging service for the semantic web.

Doing this helps you increase your visibility on the Web.

How can you set up your blog system to automatically ping this pinging service?

Simple: the process is the same for each system described above. For example, if you are using WordPress you only have to:

  1. Log into your WordPress Dashboard
  2. Select Options
  3. Then select the Writing tab
  4. Near the bottom you should see a space labeled “Update Services”: Add “http://rpc.pingthesemanticweb.com/” on a new line in this space
  5. Finally press the Update Options button

So, you only have to make your system ping http://rpc.pingthesemanticweb.com/
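
Under the hood, WordPress sends a standard weblogUpdates.ping XML-RPC call to each service listed under “Update Services”. A minimal sketch of what that request looks like, built (but not sent) with Python’s standard library; the blog name and URL are placeholders:

```python
import xmlrpc.client

# Build the XML-RPC body a blog engine would POST to
# http://rpc.pingthesemanticweb.com/ after publishing a post.
def build_ping(blog_name, blog_url):
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")

payload = build_ping("My blog", "http://example.org/blog/")

# To actually send the ping, one would do something like:
#   proxy = xmlrpc.client.ServerProxy("http://rpc.pingthesemanticweb.com/")
#   proxy.weblogUpdates.ping("My blog", "http://example.org/blog/")
```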

 

Conclusion

In two simple steps, (1) installing an add-on and (2) adding a service to ping, a blogger can get more visibility for his blog and can start to participate in the semantic web.

 


Discussion about mime types: changing the RSS 1.0 mime type and other considerations

Recently I serialized the SIOC and FOAF RDF documents generated by Talk Digger using N3. I also enabled Ping the Semantic Web to detect and archive pings of RDF documents serialized using N3.

These two modifications got me thinking about a few things:

 

  • Why wasn’t I using the “application/rdf+xml” mime type to describe the RSS 1.0 web feeds generated by Talk Digger?
  • Why was I not serializing the RSS 1.0 RDF documents using N3 too?
  • Why is the mime type for the N3 serialization “text/rdf+n3” instead of “application/rdf+n3”?

 

Why not use the “application/rdf+xml” mime type to describe the RSS 1.0 web feeds?

To try to answer this question I had to re-read the RSS 1.0 specification, last modified on 30 May 2001. In “Section 5: Core Syntax” of the document, I can read:

Mime Type
The current mime-type recommendation for an RSS 1.0 document is application/xml. However, work is currently being done to register a mime-type for RDF (and possibly RSS). The RDF (or preferably RSS) mime-type should be used once it has been registered.

Then I remembered: the application/rdf+xml mime type was accepted by the IANA in September 2004.

So I propose to update the specification accordingly. RSS 1.0 files are RDF documents, so the specification should reflect that fact by using the proper mime type.

Also, if other developers, like me, use the application/xml type instead of application/rdf+xml, web services like Ping the Semantic Web will simply ignore these precious documents.
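
Concretely, a server emitting these documents has to pick one Content-Type per serialization. A small sketch of the mapping discussed in this post; the helper function and its fallback are illustrative, not from any spec:

```python
# Content-Type per RDF serialization, as discussed above.
# "application/rdf+xml" has been registered with the IANA since
# September 2004; "text/rdf+n3" is the type currently proposed for N3.
RDF_MIME_TYPES = {
    "rdf/xml": "application/rdf+xml",
    "n3": "text/rdf+n3",
}

def content_type(serialization):
    # Fall back to "application/xml", the RSS 1.0 spec's original
    # recommendation, for anything we don't recognize.
    return RDF_MIME_TYPES.get(serialization, "application/xml")
```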

 

Why not serialize the RSS 1.0 RDF documents using N3 too?

It is a good question that I can’t answer right now.

Is it because N3 is unsuitable for RSS 1.0? Is it because N3 is not popular enough among developers? Is it because the RSS 1.0 specification is too old?
Personally, I think that RSS 1.0 could benefit from adding a reference to the possibility of serializing RSS 1.0 documents using N3 and not only XML.

So I would propose adding the possibility for people to serialize RSS 1.0 documents using N3 with the mime type “text/rdf+n3” (even if I would certainly prefer “application/rdf+n3”, but I will come back to this issue with the next question).

 

Why is the mime type for the N3 serialization “text/rdf+n3” instead of “application/rdf+n3”?

I checked the Notation 3 design issues document to answer that question. The reason given is:

The type application/n3 was applied for at one point (2002?) but I have no trace of any correspondence. It should not be used, as part of the point of N3 is to be human readable, and so the text tree is indicated. The application for text/rdf+n3 with the IANA registry is pending as of 2006-02 as IANA #5004. While registration is pending, applications should use the proposed type in anticipation of registration, not an x- type.

In the Notation 3 Primer document, I can read:

The world of the semantic web, as based on RDF, is really simple at the base. This article shows you how to get started. It uses a simplified teaching language — Notation 3 or N3 — which is basically equivalent to RDF in its XML syntax, but easier to scribble when getting started.

If this notation is simpler for teaching purposes, it is probably also simpler for development purposes (at least I found it so). For that reason, I think it is important to consider it when we think about mime types.

In fact, it seems that N3 was designed for teaching purposes because it is simpler to express RDF relations using N3 than XML. I agree.

However, I think that the semantic web community will benefit from that fact not only for teaching, but for development purposes too (thus spreading the use of RDF as a way to describe resources).

But I see a paradox in the use of the “text/rdf+n3” mime type instead of “application/rdf+n3”. The stated reason is that “N3 is to be human readable”. If we extend that reasoning, we could certainly say that XML is meant to be human readable too (at least I am able to read and understand some of it).

My question is: are RDF documents intended for humans or for machines? I have always seen RDF documents as documents intended for machines. In that case, for me, both serializations are intended for machines, not really for humans. I think the question we have to ask is “Who will consume the document?” instead of “Is the document human readable?” So yes, the content is human readable, but it is meant to be consumed by machines.

If one agrees with that, shouldn’t we think about using the “application/rdf+n3” mime type instead of “text/rdf+n3”? After all, are mime types intended for humans or for machines?

 

Conclusion

Finally, I suggest updating the RSS 1.0 spec with the “application/rdf+xml” mime type, and I suggest adding a reference to the possibility of using N3 to serialize RSS 1.0 RDF documents. Also, I (re?)open the discussion about the use of the “text/rdf+n3” mime type (instead of application/rdf+n3).

Please tell me if I missed something while thinking about these things, if there are considerations I am not aware of, or anything else.

 


Show your relations with other web sites directly on your blog using Talk Digger and Grazr

 

 

What about showing the relationships your blog, or web page, has with other websites? Why not use the power of Talk Digger and the beauty of Grazr to let your readers discover the people that talk about you, and the people you are talking about?

This is what Talk Digger and Grazr let you do.

 

What is Grazr?

Grazr is an OPML and RSS outliner: it lets you browse these types of files in a simple and beautiful user interface, directly from a web site.

 


 

You have three view modes: slider, outliner and three panes. It is simple, fast, and it integrates beautifully into any blog or web page.

 

What are Talk Digger relations?

Talk Digger does not only track conversations evolving on the Web. It also makes relations between conversations explicit (so, relations between web pages).

Three types of relations are made explicit by Talk Digger:

  1. Web pages that are talking about the current Web page.
  2. Web pages, from the same domain name, that are talking about the current Web page.
  3. Web pages that the current Web page is referring to.

 

Talk Digger and Grazr

If you put Talk Digger and Grazr together, you will be able to browse Web sites effortlessly by their relationships.

 

Why add Talk Digger’s Grazr widget to your blog or Web site?

Blog readers like reading blogs not only because they like what the blog author writes, but also because they can discover new things of interest and new people through the links created by the author.

This is why putting Talk Digger’s Grazr widget on your blog is really interesting: it helps your blog readers discover who links to your blog, and whom you are linking to. In both cases, these links are of interest to your readers and will help them discover new and interesting things on the Web.

What is also important in the digital world is your online reputation and the trust people have in you. Readers can trust people who write under their real name, who put up their photo, who write about themselves in a personal way, who write about their job, etc. But your online reputation also grows when other people start to talk about you, when they start to link to your personal web site. Showing the relations between you (your blog or personal web site) and others can help you build your online persona and increase your online reputation and the trust people have in you.

 

How to get your Talk Digger – Grazr widget for your blog

It is simple. You only have to go to the Talk Digger – Grazr widget generator page.

From that web page, you only have to:

  1. Put the URL of your blog or web page in the box labeled “To create your own TalkDigger Grazr enter your site URL here:” and press the “update” button.
  2. Go to step #2 and customize the look-and-feel of the widget.
  3. Finish with step #3 by putting the generated code in your blog or web page.

 


Talk Digger now serializes its SIOC and FOAF RDF documents using N3

 

   

A couple of weeks ago I made Ping the Semantic Web detect and index RDF documents serialized using N3. Yesterday I took part of the day to serialize Talk Digger’s content using N3 as well.

So Talk Digger now exports most of the relations it knows in RDF using 10 ontologies: SIOC, FOAF, GEO, BIO, DC, CONTENT, DCTERMS, DC, ADMIN, RSS, serialized with two languages: XML and N3.

Look at the bottom of each conversation page, or user page, and you will see SIOC and FOAF RDF documents serialized in both XML and N3.

   

I started to play with N3 serialization when I implemented it in Ping the Semantic Web. At first I was telling myself: why another serialization method? Why confuse users and developers with yet another way to write things?

Then I found my answer: N3 is basically a simplified teaching language, developed by Sir Tim Berners-Lee, used to express (so, to serialize) RDF documents. Once you get the basics of the language, you can easily read and write RDF documents in an elegant way. Parsing N3 documents is also much easier than parsing their counterpart (XML).
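
To see why, compare the same single fact in both serializations, and note how little machinery a naive N3 statement parser needs. This is a toy sketch for simple one-line statements only; real N3 is much richer (the @prefix declaration for foaf is omitted for brevity):

```python
# The same fact, serialized two ways.
rdf_xml = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/peter">
    <foaf:name>Peter</foaf:name>
  </rdf:Description>
</rdf:RDF>"""

n3 = '<http://example.org/peter> foaf:name "Peter" .'

# A simple one-line N3 statement is just subject, predicate, object, dot;
# no XML parser required.
def parse_simple_n3(line):
    subject, predicate, obj = line.rstrip(" .").split(None, 2)
    return (subject, predicate, obj)
```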

This serialization language deserves to be better known, and its adoption would certainly encourage the use of RDF, since developers could concentrate their efforts on the RDF documents themselves instead of the way they are serialized (there are so many ways to serialize something in RDF using XML; sometimes I wonder whether the number is bounded or boundless…).

 

Here are some links to get started with N3:

Primer: Getting into RDF & Semantic Web using N3
Notation 3: A readable language for data on the Web
Turtle – Terse RDF Triple Language
