Archive for September, 2006

Talk Digger now supports characters from any language

 

   

Talk Digger was recently featured by a big Japanese blog named 100shiki and by Internet.com Japan. This brought a lot of new users into the system. However, most of these users were not creating their profiles or searching with English words; they were interacting with the system in Japanese.

 

Talk Digger’s demographics

In fact, around 50% of the people using the system are non-Western: Japanese, Chinese, Middle Eastern, Taiwanese, Russian, etc.

I had to make sure that they could interact with Talk Digger in their own language without being frustrated by bugs related to their language characters.

   

 

Handling the UTF-8 charset properly

I have done what I should have done before: made sure that all the characters manipulated by the system are encoded in UTF-8. What does that mean? It means that all the underlying systems have UTF-8 as their default charset, that all the functions I developed manipulate UTF-8, that all the URLs I play with (Ajax) are encoded in UTF-8, that all the data in the database is in UTF-8, etc. I would say that 90% of the system was right, but the remaining 10% was spoiling the user experience.

So I took the bull by the horns and I fixed everything.

Now you should be able to write anything, anywhere, in any language, and the system should support it without any problems.

You should be able to use a non-alphanumeric username, you should be able to write your password in Chinese, you should be able to search conversations, comments, etc. in Japanese or Russian, etc.
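As an illustration only, here is a minimal Python sketch of the kind of checks involved: encoding a search term as UTF-8 before putting it in a URL, decoding it again, and verifying that stored bytes are valid UTF-8. The function names are hypothetical and this is not Talk Digger's actual code; it only shows the idea of keeping every layer in UTF-8.

```python
from urllib.parse import quote, unquote


def encode_query(term: str) -> str:
    """Percent-encode a search term as UTF-8 so it can travel in a URL."""
    return quote(term, safe="", encoding="utf-8")


def decode_query(raw: str) -> str:
    """Decode a percent-encoded term, failing loudly if it is not UTF-8."""
    return unquote(raw, encoding="utf-8", errors="strict")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes (e.g. read from a database) are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    term = "会話"  # a Japanese search term
    encoded = encode_query(term)
    print(encoded)
    print(decode_query(encoded) == term)
```

If any layer (browser, web server, database) silently uses another charset, the round trip above stops returning the original term, which is the kind of bug described in this post.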

 

The next step

Seeing that about 50% of Talk Digger users are non-Western told me that I had to do something about it. This was the first step; the next step is to create a multi-language version of the service. That way, even users who don’t speak English will be able to interact with the interface in their own language.

 

Bugs

Please report any bugs you encounter related to this issue as soon as possible. Everything should work just fine, but you never know.


Implementing the SIOC v1.08 ontology into Talk Digger

 

Many months ago I chose to export Talk Digger’s entire dataset (and the relations between that data) using RDF. At that time I had to choose the ontologies that would best make explicit the relationships (semantics) between Talk Digger content. This is why I chose the FOAF and SIOC ontologies. I needed to make explicit the relationships between Talk Digger users (FOAF), the relationships between conversations (SIOC), and finally the relationships between users and conversations (SIOC and FOAF).

This document is about the implementation of version 1.08 of the SIOC ontology, its relation with FOAF documents, and the semantic web as well.

 

New version of the SIOC ontology: v1.08

To create a good ontology you have to proceed by iteration: refining the ontology through testing and peer reviews.

Implementing an existing ontology in a system (such as Talk Digger) follows the same process: generating an RDF file according to the ontology, then figuring out how to link everything together (defining URI classes, defining resources, linking to resources, etc.) to optimize the graph, that is, to make the relations between the resources as meaningful (and as easy to query) as possible.

This is also what has happened since the last time I implemented the SIOC ontology in Talk Digger (about 4 months ago). Many changes have been made to the ontology since then, which is why I am re-implementing it in the system and refreshing this documentation.

 

Mapping SIOC classes and properties to Talk Digger functionalities

The first step is to map the SIOC ontology terms to the Talk Digger web site’s entities (functionalities, concepts, etc.). The schemas below make explicit the mapping I have done between the ontology and Talk Digger. On the left you have the SIOC ontology classes and properties (I only included the properties that create relations between classes; properties like sioc:topic, sioc:description, etc. are left out to keep the schemas clear). On the right you have the Talk Digger system. In the middle you have the relations between the SIOC ontology classes and Talk Digger entities.

 

 

[Figure: schemas mapping SIOC classes and properties to Talk Digger entities]

 

Description of the schemas

  • The sioc:Site instance is Talk Digger’s web site (talkdigger.com)
  • A sioc:Forum is a Talk Digger conversation page. I consider a conversation page to be a forum. Each time a new URL is tracked by Talk Digger, a new “forum” is also created. Forums are interlinked, so if URLs A and B are tracked by the system and the web document at URL B links to URL A, we will have: [sioc:Forum(A) -- sioc:parent_of --> sioc:Forum(B)] AND [sioc:Forum(B) -- sioc:has_parent --> sioc:Forum(A)]
  • A sioc:Post is a comment written by a Talk Digger user on a conversation page. So each time a user writes a comment, a sioc:Post is created in the sioc:Forum.
  • A sioc:Topic is a tag used to track a conversation. Each time a user starts tracking a conversation on Talk Digger, he has the possibility to tag it with some keywords. So each time a tag is used to describe a conversation, a sioc:Topic is created to describe the sioc:Forum and sioc:Post topics.
  • A sioc:User is a Talk Digger user. A Talk Digger user is defined by his unique username. The personal description of the sioc:User is related (via the rdfs:seeAlso property) to its FOAF profile (archived in the Talk Digger system).
  • Each time a conversation page is created in the system, a related sioc:Usergroup is also created. Each time a user starts to track a conversation using Talk Digger, he also subscribes to the related sioc:Usergroup. So: [sioc:User(A) -- sioc:member_of --> sioc:Usergroup(conversation)]
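The relations in this list can be sketched as plain (subject, predicate, object) triples. The Python below is only an illustration under stated assumptions: the helper names `link_forums` and `subscribe` are hypothetical, triples are represented as simple tuples rather than with an RDF library, and the property names are the ones listed above in the SIOC namespace.

```python
from __future__ import annotations

# SIOC namespace; the property names below are the ones used in the list above.
SIOC = "http://rdfs.org/sioc/ns#"


def link_forums(linked_uri: str, linking_uri: str) -> list[tuple[str, str, str]]:
    """Triples created when the document behind `linking_uri` links to `linked_uri`.

    Hypothetical helper: the linked conversation (A) becomes the parent of the
    linking conversation (B), and the inverse relation is stated as well.
    """
    return [
        (linked_uri, SIOC + "parent_of", linking_uri),
        (linking_uri, SIOC + "has_parent", linked_uri),
    ]


def subscribe(user_uri: str, usergroup_uri: str) -> tuple[str, str, str]:
    """Triple created when a user starts tracking a conversation."""
    return (user_uri, SIOC + "member_of", usergroup_uri)


if __name__ == "__main__":
    a = "http://www.talkdigger.com/conversations/grazr.com#conversation"
    b = "http://www.talkdigger.com/conversations/blog.grazr.com#conversation"
    for triple in link_forums(a, b):
        print(triple)
    print(subscribe("http://www.talkdigger.com/users/fgiasson",
                    "http://www.talkdigger.com/conversations/grazr.com#usergroup"))
```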

 

Relations between conversations

Two sioc:Forum instances are linked together if URLs A and B are both tracked by Talk Digger and the web document at URL B links to URL A.

But what happens if URL A links to URL B too?

 

 

There is a loop in the model: both sioc:Forum instances are children and parents of each other.

In the context of Talk Digger, it tells us that A is part of the conversation started by B and B is also part of the conversation started by A. We could probably infer that A and B belong to a set and that this set is the conversation.
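A minimal sketch of that inference in Python, assuming the parent relations are available as a set of (parent, child) pairs; the function name `mutual_conversations` and the data layout are hypothetical, not part of Talk Digger or SIOC.

```python
from __future__ import annotations


def mutual_conversations(parent_of: set[tuple[str, str]]) -> set[frozenset[str]]:
    """Return the pairs of forums that are each other's parent.

    `parent_of` holds (parent, child) pairs. When both (A, B) and (B, A)
    are present, A and B are reported as one set, which can be read as
    "these two forums belong to the same conversation".
    """
    return {
        frozenset((a, b))
        for (a, b) in parent_of
        if a != b and (b, a) in parent_of
    }


if __name__ == "__main__":
    relations = {("A", "B"), ("B", "A"), ("A", "C")}
    print(mutual_conversations(relations))  # only A and B form a loop
```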

 

sioc:reply_of and sioc:has_reply to recreate the course of events

The sioc:reply_of and sioc:has_reply properties of the sioc:Post class are really great in the context of Talk Digger (and blog comments) because systems will be able to re-create the course of events, without needing dates, only by following the graph created by these relations.
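As a rough sketch of that idea, the Python below rebuilds the order of a thread from reply relations alone. It assumes the relations have already been extracted into a dictionary mapping each post to the post it replies to; the function name and the post identifiers are hypothetical. Note that the graph only orders a reply after the post it answers; the relative order of two replies to the same post is not determined without dates.

```python
from __future__ import annotations

from collections import defaultdict


def thread_order(posts: list[str], reply_of: dict[str, str]) -> list[str]:
    """Order posts so that every reply comes after the post it replies to.

    `reply_of[p]` is the post that `p` replies to; posts absent from
    `reply_of` are treated as thread starters.
    """
    replies = defaultdict(list)  # the inverse relation, i.e. has_reply
    for post in posts:
        if post in reply_of:
            replies[reply_of[post]].append(post)

    ordered: list[str] = []

    def visit(post: str) -> None:
        ordered.append(post)
        for reply in replies[post]:
            visit(reply)

    for post in posts:
        if post not in reply_of:
            visit(post)
    return ordered


if __name__ == "__main__":
    # Posts listed in arbitrary order, with no dates attached.
    posts = ["comment-3", "comment-1", "comment-2"]
    reply_of = {"comment-2": "comment-1", "comment-3": "comment-2"}
    print(thread_order(posts, reply_of))
```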

 

 

 

Implementation using RDF/XML

Now that the mapping between the system (Talk Digger) and the ontology (SIOC) is done, what we have to do is implement the ontology using RDF serialized in XML. What does that mean? It means that Talk Digger will export its dataset in RDF/XML according to the SIOC (and FOAF) ontologies.

 

Implementation procedure using IsaViz

The tool I used to implement the SIOC and FOAF ontologies in Talk Digger is an RDF editor/visualization tool called IsaViz.

The procedure was simple:

  1. Generating the RDF/XML files (according to SIOC and FOAF) from Talk Digger’s content database.
  2. Importing the RDF/XML file in IsaViz.
  3. Visualizing and analyzing the resulting graphs.
  4. Checking all the relations between the resources and trying to figure out if it was possible to cut/add some of them to simplify/optimize the resulting graph.
  5. Checking all the anonymous nodes (bNodes) of the graph and checking if it was possible to relate them to an existing resource.
  6. Repeating these 5 steps until I was satisfied with the resulting graph.

 

Playing with URIs and xml:base

What is great is that I can distribute Talk Digger’s content from anywhere on the Web (with different URLs) and a crawler can download all these snippets of content (FOAF profiles, conversation content and relationships, etc.), aggregate them and merge them into a single RDF graph. That way the crawler gets its hands on all the relations that exist in Talk Digger and can then query the RDF graph in useful and meaningful ways.

All that magic is made possible by the fact that we can define a different URI for a given RDF/XML document using the xml:base attribute. That way I can:

  • Host an RDF/XML document at the URL http://talkdigger.com.com/a
  • Define the xml:base with the URI “http://talkdigger.com.com/db/”
  • Host an RDF/XML document at the URL http://talkdigger.com.com/b
  • Also define the xml:base with the URI “http://talkdigger.com.com/db/”

Then if a crawler downloads both RDF documents “a” and “b”, it can merge them to recreate the single RDF document defined at “http://talkdigger.com.com/db/”. For example, this merged RDF document would be the graph of all relations defined in Talk Digger.
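The effect of xml:base can be sketched with a few lines of Python. This is a simplified illustration, not a real RDF parser: it only resolves the rdf:about attribute of top-level elements against the document's xml:base, and the two sample documents and the example.org base URI are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Two hypothetical documents, hosted at different URLs but sharing one xml:base.
DOC_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioc="http://rdfs.org/sioc/ns#"
         xml:base="http://example.org/db/">
  <sioc:User rdf:about="users/alice"/>
</rdf:RDF>"""

DOC_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioc="http://rdfs.org/sioc/ns#"
         xml:base="http://example.org/db/">
  <sioc:Forum rdf:about="conversations/example"/>
</rdf:RDF>"""


def subjects(document: str) -> set:
    """Resolve each top-level rdf:about against the document's xml:base."""
    root = ET.fromstring(document)
    base = root.get(XML_BASE, "")
    return {
        urljoin(base, element.get(RDF_ABOUT))
        for element in root
        if element.get(RDF_ABOUT) is not None
    }


if __name__ == "__main__":
    # Both documents contribute resources to the same URI space.
    merged = subjects(DOC_A) | subjects(DOC_B)
    for uri in sorted(merged):
        print(uri)
```

Because the relative URIs are resolved against the shared base rather than against the URL each file was downloaded from, the resources from both files end up in the same graph.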

 

Talk Digger’s URI classes

I refer to a “URI class” when I talk about a “part” of a URI that is general to many URI “instances”. I refer to a “URI instance” when I talk about a URI that refers to a resource.

For example, the “URI class” of Talk Digger users is:

http://www.talkdigger.com/users/

But an “instance” of that “URI class” would be the URI that describes a particular Talk Digger user:

http://www.talkdigger.com/users/fgiasson

In that example, this “instance” refers to a resource that is the Talk Digger subscribed user called “fgiasson”.

Here is the list of “URI classes” defined in Talk Digger:

  • URI class referring to a conversation container (works as a container for the components of a conversation)

http://www.talkdigger.com/conversations/[url]

  • URI class referring to a conversation

http://www.talkdigger.com/conversations/[url]#conversation

  • URI class referring to a comment in a conversation

http://www.talkdigger.com/conversations/[url]#comment-x

  • URI class referring to a usergroup (a group of users tracking that conversation)

http://www.talkdigger.com/conversations/[url]#usergroup

  • URI class referring to a subscribed user

http://www.talkdigger.com/users/[username]
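The step from a “URI class” to a “URI instance” can be sketched as a handful of small Python functions. The function names are hypothetical, and the sketch assumes that [url] and [username] are inserted as-is (as in the grazr.com examples on this page); whether Talk Digger escapes them in some way is not stated here.

```python
BASE = "http://www.talkdigger.com"


def conversation_container_uri(url: str) -> str:
    """Container for the components of a conversation."""
    return f"{BASE}/conversations/{url}"


def conversation_uri(url: str) -> str:
    """The conversation itself."""
    return f"{conversation_container_uri(url)}#conversation"


def comment_uri(url: str, number: int) -> str:
    """A comment in a conversation (the x in #comment-x)."""
    return f"{conversation_container_uri(url)}#comment-{number}"


def usergroup_uri(url: str) -> str:
    """The group of users tracking the conversation."""
    return f"{conversation_container_uri(url)}#usergroup"


def user_uri(username: str) -> str:
    """A subscribed Talk Digger user."""
    return f"{BASE}/users/{username}"


if __name__ == "__main__":
    print(conversation_uri("grazr.com"))
    print(comment_uri("grazr.com", 1))
    print(usergroup_uri("grazr.com"))
    print(user_uri("fgiasson"))
```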

 

Visualizing relationships between Talk Digger users and conversations

It is now time to visualize what is going on. What we will do is import and merge some SIOC and FOAF documents into IsaViz directly from talkdigger.com.

The example will be performed using two SIOC document files, and one FOAF document file.

 

Step 1

The first step is to get a conversation tracked by Talk Digger and to visualize it in IsaViz.

  1. Open IsaViz
  2. Select the “IsaViz RDF Editor”, click on the menu [File-> Import -> Replace -> RDF/XML from url...]
  3. Copy this url into the box that appeared: http://www.talkdigger.com/sioc/grazr.com
  4. Press enter

Now you can visualize the relationships of the conversation about Grazr.

Pay special attention to the following resources:

  • http://www.talkdigger.com/conversations/grazr.com#conversation
  • http://www.talkdigger.com/conversations/grazr.com#usergroup
  • http://www.talkdigger.com/users/fgiasson
  • http://www.talkdigger.com/foaf/fgiasson

Check how these resources are related to other resources (what are the properties that describe them).

 

Step 2

Now it is time to add some content to that graph. What we will do is merge in the SIOC document of another conversation that is “talking about” this conversation.

  1. Select the “IsaViz RDF Editor”, click on the menu [File-> Import -> Merge -> RDF/XML from url...]
  2. Copy this url into the box that appeared: http://www.talkdigger.com/sioc/blog.grazr.com
  3. Press Enter

Now you can visualize the relationships between two conversations: Grazr and Grazr’s blog.

Pay special attention to the following resources:

  • http://www.talkdigger.com/conversations/grazr.com#conversation
  • http://www.talkdigger.com/conversations/blog.grazr.com#conversation

Check how these two resources are related to each other (the “blog.grazr.com” conversation is talking about the “grazr.com” conversation, so “blog.grazr.com” has a “parent” relation with “grazr.com”).

 

Step 3

Now it is time to merge a FOAF document into that graph. That way, we will have more information about the user (fgiasson) who is interacting in these conversations.

  1. Select the “IsaViz RDF Editor”, click on the menu [File-> Import -> Merge -> RDF/XML from url...]
  2. Copy this url into the box that appeared: http://www.talkdigger.com/foaf/fgiasson
  3. Press Enter

Pay special attention to the following resources:

  • http://www.talkdigger.com/users/fgiasson
  • http://www.talkdigger.com/foaf/fgiasson

Check how a User (a person defined by his FOAF profile) is related to his User Account (a user account on Talk Digger defined by a SIOC document).

 

Extending this method to any Talk Digger conversation

Above I explained how to visualize two conversations and a user profile using IsaViz. Note that this method can be used to visualize any conversation known by Talk Digger.

You only have to follow the same steps as I described above with other documents. If you check at the bottom of any web page of Talk Digger, you will see a “Semantic Web Ready” logo. At the right of this logo, you will have some icons that link to RDF documents available from that web page. So you only have to click on them, copy the URL of the document, and import it in IsaViz.

 

The big picture

All this belongs to a bigger schema. A couple of years ago, the Semantic Web was looking good on paper; now it is starting to look good on the Web.

As you can see in the schema below, RDF documents and the SIOC and FOAF ontologies are just some of the stones the Semantic Web is built from. The schema below is not the Semantic Web; it is a small portion of it. It is an example of how it all works together: a sort of Semantic Web mashup.

 

 

As described in one of my recent blog posts, Semantic Radar for FireFox and the Semantic Web Services environment, an infrastructure supporting the ideas behind the Semantic Web is starting to emerge.

The implementation of the SIOC ontology in Talk Digger is only a small step. Another small step is the development of the Ping the Semantic Web web service, which aggregates lists of RDF documents and exports them to other web services and software agents. Other steps consist of the development of RDF data exporters like SIOC plug-ins for blogging systems, browser plug-ins like the Semantic Radar, etc.

 

The final word

In a recent discussion I had with Daniel Lemire, he wrote:

“Here is where we disagree:

“Everything is changing, and everything should explode… soon!”

I honestly do not see the Semantic Web being about to take off.”

Then I answered:

“So, will the semantic web explode in the next 5 years? My intuition tells me yes. Do I have a 6th sense like mothers? No. So, what the future reserve us? I hope it will be the semantic web (beware, I never said that we will be able to infer trusts, deploy search system with the power of Google, resolve the problem of the evolution of ontologies (versioning, etc), etc, etc, etc.) But I think that in 5 years from now, we will have enough data, knowledge, and services (that use that data) to say that we can do something useful (saving time, etc) so that we will be able to say: the semantic web is now living.”

I hope that I will be right and that Daniel will be wrong. I have the intuition that Daniel hopes the same thing.


The Semantic Web landscape is changing

 

Today I read two really interesting blog posts about RDF, ontologies and the Semantic Web. I’ll start with Daniel Lemire’s Do not ask me to be a keynote speaker on ontologies and inference engines. In his article, Daniel said:

 

“Before I become interested in anything that has to do with web ontologies, I need to be convinced that, at least, RDF is a useful idea. So, first take Tim Bray’s RDF challenge:

“To the first person or organization that presents me with an RDF-based app that I actually want to use on a regular basis (at least once per day), and which has the potential to spread virally, I hereby promise to sign over the domain name RDF.net.”

But see, the rdf.net domain name is still down. Tim Bray, who can be seen as one of the initiators and early promoters of RDF, is still waiting for a useful RDF application. So am I.”

 

The challenge is not an easy one, but I don’t think it is an impossible one. After all, “Things are only impossible until they’re not”. As we will see later in this article, things are changing and such an application could become possible in the fairly near future. At least, many other people and I are working toward that goal.

 

Semantic Web based research isn’t working

After that, I read Ingrid Und Leo’s blog post: quote: semantic web based research isn’t working

 

“1. Researchers need to stop thinking of themselves as researchers and start thinking of themselves as implementors.

2. Research institutes need to join forces with emerging businesses looking to adopt semantic technology. This breaks the current model of business / research institute collaboration since startups do not have money to contribute to fund research, but tough noogies.

3. Researchers need to build their tools in real-world development environments, i.e. as modules for LAMP web-publishing tools such as Drupal and WordPress. They need to find more organizational partners to deploy their solutions. They need to do something other than build widgets.”

 

Update 18-09-2006: A much better answer from Harry Chen: Struggling with the Semantic Web

 

From researchers to implementers

From what I know, I totally agree with both articles.

The proof that both RDF and web ontologies are useful has yet to be made.

The good news is that RDF “researchers” are becoming more and more RDF “implementers”. If I base my observations on the SIOC ontology development team, I can see that all the improvements to the ontology are made after implementing it in blogging software like WordPress or in community portals like Talk Digger or ODS, and after seeing how it interacts with other ontologies like FOAF, DC, etc.

They also work hard to push SIOC’s adoption by content creators around the Web. So far, they have developed exporter plugins for blogging systems, I implemented the SIOC ontology in Talk Digger, OpenLink implemented the SIOC ontology in their ODS solution, I created a pinging system to aggregate and export RDF files to crawlers and software agents, and much more has been done, as John Breslin enumerates on his blog.

So, as you can see, the biggest part of their work is not as researchers, but as implementers.

 

The landscape is changing.

The landscape of the Semantic Web is changing. We are currently at a crucial moment in the Semantic Web’s development: from an academic goodie to a commercial venture.

More and more web developers and companies are starting to see how all these technologies and techniques could increase the power of their software, their infrastructure, etc. The problem is that they did not have the data to play with. It was impossible for them to justify such an investment considering that there was no structured data (RDF) available on the Web.

Everything is changing, and everything should explode… soon!

Everything is in place; we had to have a context in place before hoping to see the semantic web appear on the Internet. We had to reach a critical mass of RDF data, so people would see the need to start developing applications using that structured data. But we also had to reach a critical mass of applications, so people would see the need to export their content in RDF. Do you see the pattern? Each one needs the other in order to happen.

 

Early adopters

The only hope we have is to get a critical mass of early adopters, and that is what we are slowly reaching. The early adopters will create applications, ontologies and data. Swoogle reports having already aggregated about 1.7 million RDF documents. Ping the Semantic Web reached 32 000 in less than a month, without crawling the Web as Swoogle does. All these stats and projects tell me that we can hope to see real semantic web applications in the near future.

 

The Semantic Web Revisited

After writing this article, I checked my list of things to read. An item I put on my list about two months ago caught my attention: Harry Chen wrote about the article The Semantic Web Revisited. Read this article to know which milestone the Semantic Web has currently reached. It covers all the topics around the Semantic Web, and some of them, like the importance of early adopters in helping the Semantic Web to be widely adopted, are directly related to what I said in this article.





This blog is a regularly updated collection of my thoughts, tips, tricks and ideas about my semantic Web researches and related software development.

