How Talk Digger should evolve in the coming months

 

Talk Digger will roll out slowly. What it needs is more content, more users and more interaction: relations between users, comments, and the crawling of conversation pages by other search engines like Google, MSN Search and Yahoo! (this has already slowly started). Then more patterns will emerge, conversations will build up, and people will get in contact.

However, critics of Talk Digger are right: it is not perfect; it lacks indexed data, and a few bad results pop up in conversations from time to time.

 

Validating the vision

Critics are right, but at the same time people are slowly validating the vision I had, the seed idea underlying the current implementation of Talk Digger.

When I read other people's articles about this new service, I find that they understand what it is all about, how it could help them in their day-to-day work, and ultimately how it could be enhanced to meet their specific needs.

 

Current state of the system

I started to check how people were using the features of this new system. As expected, I get far more page views with this new version than with the previous one. People search, browse and track conversations.

One thing that surprised me is that far more people than I initially thought were using the “Related conversation” tab of each Conversation Page.

I thought I was alone, but it seems that many people use it not only to find the relations between conversations (and so between web pages, and ultimately between people), but also to discover new and interesting things from a starting point. So the process of browsing conversations through their relations is becoming a process of finding new and interesting things.

 

Improving the system

Considering that the vision I had of the system is slowly being validated by other people's reviews, a lot of new things will come in the next few months.

 

Crawling for more data

One of the biggest updates is also the least time-consuming (from the point of view of development time): indexing more and more data into the system (and so crawling more and more conversations). It will happen by itself, as time goes by, through users' searches and tracked conversations.

 

Parsing out bad results

Another big improvement to the system will be filtering out bad results (irrelevant links from one web page to another web page, i.e. to a conversation). The system will check the context of each link, assess its relevance, rate it accordingly, and ultimately refuse to index it if it is too irrelevant.
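To make the idea concrete, here is a minimal sketch (not the actual implementation) of one way such context checking could work: score a link by how much the text surrounding it overlaps with the conversation's topic terms, and refuse to index links that fall below a threshold. The names and the threshold are hypothetical:

def link_relevance(context_text, topic_terms):
    # Fraction of the topic terms that appear in the text around the link.
    words = set(context_text.lower().split())
    if not topic_terms:
        return 0.0
    return len(words & {t.lower() for t in topic_terms}) / len(topic_terms)

def should_index(context_text, topic_terms, threshold=0.2):
    # Refuse to index links whose context is too irrelevant.
    return link_relevance(context_text, topic_terms) >= threshold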

 

Automatic conversation topics finder (auto-tagging)

Many users are searching for tags. The problem is that there are few of them: the system is young, so not many conversations are "tracked" by users yet (tags come from the terms users enter when they start tracking a conversation).

So the system could eventually extract conversation topics automatically. That way, the "search by tag" feature could become a "search by topic" feature.

Ultimately, these topics could be used to cluster search results and to find new relationships between conversations (relationships via topics instead of document links).
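No extraction method has been chosen yet, but as a rough illustration, topic extraction could start as simply as counting the most frequent content words across a conversation's pages; everything in this sketch is hypothetical:

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that"}

def extract_topics(page_texts, n=5):
    # Return the n most frequent content words across a conversation's pages.
    words = []
    for text in page_texts:
        words += [w for w in re.findall(r"[a-z]+", text.lower())
                  if w not in STOPWORDS and len(w) > 2]
    return [word for word, count in Counter(words).most_common(n)]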

 

The future

This is only a part of the future; other ideas are incubating in my mind. This is just the beginning. Much more is to come…


New Talk Digger website now Online

 

I talked a lot about it in the past few months, and now here it is: the new generation of Talk Digger is publicly available. I just upgraded it from closed alpha testing to public beta testing. I will monitor it and fix bugs as they come in over the next days and weeks. This version is much more stable than it was one month ago, but probably less stable than it will be in a month. I chose to make it public because I needed more people using it, so here we are.

Use it, play with it and please report anything that doesn’t seem right here.

I don't want to talk much about it on my blog right now. What I want is to watch how people use it and hear what they have to say. So I am opening this thread to get comments from users (via comments on this blog post, email or Skype), to talk with them, and to learn how they feel about it.

Hope you like what you see.


Ping the Semantic Web: exporting pings list by Namespaces

 

I just finished implementing a new feature for PingtheSemanticWeb that has been requested by many people: the ability to get a list of pings based on the namespaces defined in an RDF document.

The motivation is that many RDF documents, like RSS 1.0 feeds, can be of no use to some types of web services or software agents. So we needed a way to filter them by their defined namespaces.

 

How does it work?

Tim Finin from Swoogle sent me an associative list of the most commonly used namespace prefixes for the ontologies most commonly tracked by Swoogle.

When I receive a ping for an RDF document, I extract all the namespaces it defines, look up their "most commonly used prefixes", and add the document to the pinging list with these prefixes.

That way, if you want the list of RDF documents that define the namespaces "foaf" or "doap", you only have to fill in the "Pings received with these namespaces" box (on the export page) with the string "foaf doap" (note: all prefixes are space-separated). You will then receive the list of all pings received by PingtheSemanticWeb whose RDF documents define the "foaf" and "doap" namespaces.
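To give an idea of that step, here is a minimal sketch (not the actual implementation) of extracting the declared namespaces from an RDF/XML document and mapping them to their common prefixes; the prefix table below is a small illustrative excerpt, not Swoogle's real list:

import re

# Illustrative excerpt of an associative list: namespace URI -> common prefix.
COMMON_PREFIXES = {
    "http://xmlns.com/foaf/0.1/": "foaf",
    "http://rdfs.org/sioc/ns#": "sioc",
    "http://usefulinc.com/ns/doap#": "doap",
    "http://purl.org/rss/1.0/": "rss",
    "http://purl.org/dc/elements/1.1/": "dc",
}

def prefixes_for(rdf_source):
    # Collect every namespace URI declared with an xmlns attribute,
    # then keep the common prefix of each known namespace.
    uris = set(re.findall(r'xmlns(?::\w+)?="([^"]+)"', rdf_source))
    return {COMMON_PREFIXES[u] for u in uris if u in COMMON_PREFIXES}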

Then a crawler only has to get the list of namespace prefixes from the "ns" attribute of an "rdfdocument" element, split the prefixes, and look them up in the associative array of namespaces and prefixes to know which ontologies are used in the RDF document (a sketch of such a crawler follows the export example below).

 

Modification to the export format

The version of the export format is now set to "1.1". I added a new attribute called "ns" to the "rdfdocument" element. A pings export file now looks like this:

 

<pingthesemanticwebUpdate version="1.1" updated="2006-08-11 11:20:54">
  <rdfdocument url="http://b4mad.net/datenbrei/index.php?sioc_type=post&amp;sioc_id=300" created="2006-08-11 11:21:00" updated="2006-08-14 09:57:26" topics="semantic web foaf technology WordPress sioc" ns="foaf rss admin dc rdfs rdf content sioc" />
</pingthesemanticwebUpdate>

 

Here "ns" is the string containing all the "most commonly used prefixes" for the namespaces defined in this RDF document. Note that namespace prefixes are space-separated.
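As a quick illustration of the crawler side, here is a minimal sketch that reads such an export file and keeps only the documents using a given ontology; the file name is hypothetical:

import xml.etree.ElementTree as ET

# Parse a pings export file (format version 1.1) and print the URL of
# every document whose "ns" attribute contains the "sioc" prefix.
tree = ET.parse("pings.xml")  # hypothetical local copy of the export
for doc in tree.getroot().findall("rdfdocument"):
    prefixes = doc.get("ns", "").split()  # prefixes are space-separated
    if "sioc" in prefixes:
        print(doc.get("url"))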

 

Bugs

I reiterate my bug warning: "there are probably some bugs left. If you find inconsistencies, please send me an email at [fred] at [fgiasson.com] so I can fix them as soon as possible. I will not be able to fix anything between the 23rd and the 26th of August, since I'll be off for my summer vacation in Maine; everything should work properly, but we never know."


Semantic Radar for Firefox and the Semantic Web Services environment

 

Recently I developed a web service for the semantic web called PingtheSemanticWeb. The goal of this service is to be notified when a new semantic web document is published on the Web, to archive its location, and to give its location to other web services: this is called a pinging system, a sort of hub for semantic web documents.

Many people think that the semantic web is only accessible and useful to researchers. They were probably right five years ago, but now I see the landscape of the Web changing.

More and more web services related to the semantic web are emerging. More and more of these services are starting to interact with each other. More and more people are using semantic web technologies and services in their day-to-day work.

This blog post is about an emerging architecture of semantic web services.

 

The Semantic Radar for Firefox

One of the newcomers is the Semantic Radar, written by Uldis Bojars. This plug-in for Firefox notifies you when it finds a FOAF, SIOC or DOAP RDF document on the web pages you surf.

The characteristic of semantic web documents is that they are not intended for humans but for software agents (search engine crawlers, personal agents like web feed readers, etc.). The consequence is that humans do not see these documents, so nobody really knows that the Semantic Web keeps growing on the current Web.

This is the purpose of this new Semantic Radar: unveiling the Semantic Web to humans.

 

The Semantic Radar: much more than that

This plug-in is much more than that: each time it detects one of these semantic web documents, it notifies the PingtheSemanticWeb.com web service.

This is where the interaction between semantic web services and applications starts to emerge: web browsers now detect semantic web documents and notify a web service acting as a central repository for their locations.
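I do not know the plug-in's exact internals, but this kind of detection is typically based on RDF auto-discovery links: the <link> elements of type "application/rdf+xml" that pages use to advertise their FOAF, SIOC or DOAP documents. A minimal sketch of such detection:

import re

LINK_RE = re.compile(r'<link[^>]+type="application/rdf\+xml"[^>]*>', re.IGNORECASE)
HREF_RE = re.compile(r'href="([^"]+)"')

def find_rdf_documents(html):
    # Return the URLs of RDF documents advertised by an HTML page
    # through its auto-discovery <link> elements.
    urls = []
    for link in LINK_RE.findall(html):
        match = HREF_RE.search(link)
        if match:
            urls.append(match.group(1))
    return urls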

 

The New Semantic Web Services Environment

A couple of years ago, everything was looking good on paper; now everything is starting to look good on the Web.

Below is a simple schema describing the interactions between some technologies and web services of the Semantic Web.

This is not the whole semantic web, but a small portion of it; it is an example of how it all works together: a sort of Semantic Web mashup.

 

 

 

Semantic Web Documents (RDF)

This is what the semantic web is: a sea of documents formatted in such a way that they make the semantics of their content explicit.

  1. RDF. The most widespread type of semantic web document.
  2. FOAF. Documents describing a person (a minimal example follows this list).
  3. SIOC. Documents describing online communities.
  4. DOAP. Documents describing a project.
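For example, a minimal FOAF document (in RDF/XML) describing a person looks like this; the person's details are of course illustrative:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Jane Doe</foaf:name>
    <foaf:homepage rdf:resource="http://example.org/" />
  </foaf:Person>
</rdf:RDF>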

 

Exporters

All these documents are generated by special applications called exporters. An exporter is a program that generates a semantic web document from a source of data (normally a database).

Some of these exporters are able to ping (send a notification to) the PingtheSemanticWeb.com pinging service when they generate or update a semantic web document (like Talk Digger and ODS, or blogs using WordPress, b2Evolution or DotClear that added PingtheSemanticWeb.com support to their servers so that a ping is sent when new articles are published).

 

Portal exporters

This is a sub-class of exporters: community web sites with thousands of users that export some of their content as semantic web documents.

  1. Talk Digger. This web application exports the profiles of its users as FOAF documents, and exports its conversations (it is a web service that searches for conversations on the Web) as SIOC documents.
  2. LiveJournal. A blogging community web site with 10 million registered users that exports its user profiles as FOAF documents.
  3. ODS. A set of Web 2.0 applications (blogs, wikis, forums, etc.) that exports all its data as FOAF, SIOC, DOAP and other RDF documents.

 

Individual exporters

This is another sub-class of exporters: generally plug-ins that individual users add to their software to export their data as one of these types of documents.

A good example of such exporters is plug-ins for blogging software:

  1. WordPress SIOC exporter. It lets the WordPress blogging software export its data as SIOC documents.
  2. b2Evolution SIOC exporter. It lets the b2Evolution blogging software export its data as SIOC documents.
  3. DotClear SIOC exporter. It lets the DotClear blogging software export its data as SIOC documents.

 

Individual pings

Even if many exporters automatically ping the PingtheSemanticWeb.com service, not all of them do (for example, LiveJournal does not). Also, individuals create and publish semantic web documents without pinging the system.

In such cases, the documents can be "invisible" to the semantic web, because nobody knows they exist.

This is why there is another kind of tool that lets people ping specific web pages. That way, they have the power to say: Hey! I found a semantic web document; you can find it there. (A minimal sketch of such a ping follows the list below.)

  1. Semantic Radar for Firefox. This tool notifies the pinging server when the user encounters a semantic web document while surfing the Web.
  2. Bookmarklet. The PingtheSemanticWeb.com bookmarklet lets a user click a bookmark to notify the service that a semantic web document is present on the page he is currently looking at.
  3. Website. The web interface of the service lets people enter the URL of a document they found so that it is included in the service.
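I am not documenting the service's exact REST interface here, but a manual ping boils down to a simple HTTP request carrying the document's URL. A minimal sketch, where the endpoint path is an assumption and not the documented API:

import urllib.parse
import urllib.request

def ping(document_url):
    # Notify the pinging service that a semantic web document lives at
    # document_url. The endpoint path below is an assumption.
    query = urllib.parse.urlencode({"url": document_url})
    endpoint = "http://pingthesemanticweb.com/rest/?" + query
    return urllib.request.urlopen(endpoint).read()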

 

Ping the Semantic Web.com

This semantic web service is at the center of the architecture I present today. It acts as a multiplexer for the locations of semantic web documents: it receives the locations of semantic web documents from a multitude of sources, archives them, and re-distributes them to other web services and software agents.

This is the place where the semantic web is truly unveiled, and the place where people will go to find out where semantic web documents live.

 

Other web services and software agents

Applications like Swoogle have to crawl the Web to find semantic web documents. This is why Swoogle will integrate PingtheSemanticWeb into its infrastructure: it will have direct access to a full list of RDF documents ready to be included in its search engine.

Other web services like the SIOC explorer will import only the list of new SIOC documents.

So every web service or software application has one place where it can find a list of new semantic web documents ready to be used.

 

The future

What I think is that the synergy created by this architecture could propel the adoption of the Semantic Web.

More people will create semantic web documents if more web services use them.

Web services will use more semantic web documents if more people create them.

More web services will create semantic web documents if more people use them.

More people will use semantic web documents if more web services create them.

One way or another, this interaction will be driven by one thing:

how many semantic web documents are accessible, quickly and easily.

The answer to this need is the PingtheSemanticWeb service, pinging tools like the Semantic Radar, and a set of dedicated users who find and ping semantic web documents.


Ping the Semantic Web.com: new auto-discovery method and re-indexing of the database

In the first version of the pinging service, I was only checking the namespaces to know where an RDF document belonged (to SIOC, FOAF, DOAP or another ontology). After some conversations, I revised the method and upgraded the auto-discovery (classification) system of the pinging service.

    

Now I validate and classify each RDF document in three steps:

  1. I check that the RDF header is present in the RDF document.
  2. Then I check that the namespace of the ontology (SIOC, FOAF or DOAP) is used in the RDF document (as the base namespace or via a prefix).
  3. Finally, I check that an instance of a class of the ontology appears somewhere in the RDF document.

If all three checks pass, I classify the RDF document accordingly (a file can be classified in more than one section). Otherwise, I add it to the list of non-categorized pinged RDF documents. A minimal sketch of these checks follows.
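For illustration, here is a sketch of the three checks for one ontology (SIOC); the class list is partial and the code is mine, not the service's actual implementation:

import re

SIOC_NS = "http://rdfs.org/sioc/ns#"
SIOC_CLASSES = ["Post", "Forum", "User"]  # partial list, for illustration

def is_sioc_document(rdf_source):
    # 1. The RDF header (the root rdf:RDF element) is present.
    if "<rdf:RDF" not in rdf_source:
        return False
    # 2. The SIOC namespace is declared (as base namespace or with a prefix).
    if SIOC_NS not in rdf_source:
        return False
    # 3. An instance of a SIOC class appears somewhere in the document,
    #    either as an element like <sioc:Post> or as an rdf:type reference.
    for cls in SIOC_CLASSES:
        if re.search(r"<\w+:%s[\s/>]" % cls, rdf_source) or (SIOC_NS + cls) in rdf_source:
            return True
    return False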

 

Re-indexing

I deleted all the databases and re-indexed all the RDF documents using this new classification technique. The little classification glitches some people experienced should now be gone: a ping now appears in a list only if an instance of the ontology is defined in the RDF document.

 

Bugs

Naturally, there are probably some bugs left. If you find inconsistencies, please send me an email at [fred] at [fgiasson.com] so I can fix them as soon as possible. I will not be able to fix anything this weekend, since I'll be off in the woods without any means of communication; everything should work properly, but we never know 🙂

 

Open question

There is still an open question, raised by Tim Finin in a comment on my last blog post. I will see what I can do about it next Monday; in the meantime, if someone has an idea, please leave it as a comment on this blog post.
