Frederick Giasson

Talk Digger Beta 2.0: a totally new system and interface

December 12, 2005 | May 21, 2006 | Frederick Giasson

I talked about it in my previous blog posts, and I worked on it during the last two months. Now the new Talk Digger website is released.

I will call this version Beta 2.0. In fact, I could call it Beta 1.0, considering that the first version of Talk Digger was in reality an Alpha. Everything is new: the underlying system, the interface, the design, the RSS feed, etc. Why did I re-program and re-design everything? Because I wanted to get rid of the early mistakes I made in the previous version, and I wanted to design it so that it would be a good base to extend into a new type of service (which I will develop in the coming months).

So, what is new in this version?

1. I designed a more traditional search engine layout. This one is much simpler than the previous one. I wanted to make Talk Digger simple (but not simpler!). I tried to make it more intuitive for new users.

2. Many more results are displayed by Talk Digger (between 10 and 20, depending on the search engine).

3. Some new search engines: Google Blog and Yahoo!

4. I added a really great feature to the system (thanks to Tom Sherman for the idea): the PageRank of each result returned by Talk Digger is displayed beside the title of the item. This is really great because it gives trustworthy information about each result: its popularity and credibility on the Internet.

5. New options have been added to the system. Now you can specify the maximum number of results you want to view per search engine. You can sort the results with the most recent entries first or with the highest PageRank first.

6. I created a hotkeys system that helps users with the usability and navigation of the website.

7. The tracking RSS feed system is now formatted using RSS 1.0 instead of RSS 2.0. I briefly explained why in that previous post.

8. Duplicate results (the same article returned by two different search engines) are removed, so only one copy is displayed. You also have the option to exclude results that have the same domain name as the searched URL.

9. It works on IE/Firefox/Safari/Opera on both PC and Mac. The entire website is XHTML 1.0 Strict and CSS validated.

10. A new slogan: “You talk, we dig!” (thanks to Bora Ung)
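Items 5 and 8 above (the per-engine cap, the two sort orders, duplicate removal, and the same-domain exclusion) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Result` shape, the `merge_results` function, and its parameter names are hypothetical and are not taken from the actual Talk Digger code.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class Result:
    """One hit returned by a search engine (hypothetical shape)."""
    url: str
    engine: str
    published: date
    pagerank: int


def domain(url: str) -> str:
    """Return the host part of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def merge_results(results, searched_url, max_per_engine=10,
                  exclude_same_domain=False, sort_by="date"):
    """Cap, deduplicate, filter, and sort results coming from several engines."""
    kept, seen_urls, per_engine = [], set(), {}
    for r in results:
        # Option: maximum number of results taken from each search engine.
        if per_engine.get(r.engine, 0) >= max_per_engine:
            continue
        per_engine[r.engine] = per_engine.get(r.engine, 0) + 1
        # The same article returned by two engines is kept only once.
        if r.url in seen_urls:
            continue
        # Option: drop results hosted on the same domain as the searched URL.
        if exclude_same_domain and domain(r.url) == domain(searched_url):
            continue
        seen_urls.add(r.url)
        kept.append(r)
    # Option: highest PageRank first, or most recent entries first.
    if sort_by == "pagerank":
        return sorted(kept, key=lambda r: r.pagerank, reverse=True)
    return sorted(kept, key=lambda r: r.published, reverse=True)
```

Here duplicates are detected by exact URL match, which is the simplest possible rule; a real system would probably normalize URLs first.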

I also created a “Tour” section that shows how Talk Digger works.

What is the near future of Talk Digger?

During the next month, I will work on improving Talk Digger with the feedback from users, but I will also look into the possibility of broadcasting the TD results in RDF and/or OWL. That way, other services would be able to gather and understand the computed results returned by Talk Digger and do what they want with the information (a first step into the semantic web…). I am currently designing the RDF Schemas (and the OWL ontology) that will describe the Talk Digger results. However, I am not sure that I will open such a service right now (considering the network infrastructure it would need and my current lack of money).

In fact, it would be the first step to test Talk Digger as a Semantic Web service. The next phase of its development will go even further in that direction (it’s the goal).

What is the future of Talk Digger?

Two lines of research: (1) semantic web and (2) semantic analysis/management of web documents.

So, this is what is happening right now with Talk Digger.

Do not hesitate to contact me if you have any questions, comments or suggestions about this new version of Talk Digger: it is always greatly appreciated.

I would like to thank Tom Sherman, Jeff Nolan and Matthew Hurst for their comments and suggestions about this new version of Talk Digger. I would also like to thank Suzanne Morel at Les Graphoides for this totally new graphical design and the time she spent working, re-working and re-re-working on the graphics as my mind changed.

I hope you like this new Talk Digger version and find it as useful as I do.

Why Microsoft seems to reinvent the wheel with RSS?

December 8, 2005 | May 22, 2006 | Frederick Giasson

I cannot understand why Microsoft seems to try to reinvent the wheel with RSS 2.0. Okay, I am a little bit late with that one, but I just discovered that they talked about an “extension” to RSS 2.0 called “Simple List Extensions Specification” at Gnomedex 2005.

Well, what is this SLES all about? “The Simple List Extensions are designed as extensions to existing feed formats to make exposing ordered lists of items easier and more accessible to users”.

Then I was lost…

Why does Microsoft publish such a specification for RSS 2.0? RSS 1.0, supported by XML Namespaces and RDF, already uses such an ordered list, called an “rdf:Seq”, to do exactly the same thing. This capability is provided directly by RDF.
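To make the point concrete, here is a minimal Python sketch that reads the ordered list of items out of an RSS 1.0 channel. The feed content and the `ordered_items` helper are made up for illustration; only the two namespace URIs and the `rdf:Seq` / `rdf:li` structure come from the RSS 1.0 and RDF vocabularies.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of RDF and RSS 1.0; the short prefixes are local to this script.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Illustrative feed only: the example.org URLs are placeholders.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>Illustrative only</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/first"/>
        <rdf:li rdf:resource="http://example.org/second"/>
      </rdf:Seq>
    </items>
  </channel>
</rdf:RDF>"""


def ordered_items(feed_xml):
    """Return the item URLs in the order given by the channel's rdf:Seq."""
    root = ET.fromstring(feed_xml)
    seq = root.find("rss:channel/rss:items/rdf:Seq", NS)
    if seq is None:
        return []
    resource_attr = "{%s}resource" % NS["rdf"]
    return [li.get(resource_attr) for li in seq.findall("rdf:li", NS)]


print(ordered_items(FEED))
```

The order of the `rdf:li` elements inside the `rdf:Seq` is the order of the list, which is the capability the Simple List Extensions add to RSS 2.0.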

I already wrote about the difference between RSS 1.0 and RSS 2.0 and I really do not understand why Microsoft develops modules for RSS 2.0 instead of implementing everything using RSS 1.0 and RDF.

I read somewhere that Microsoft has no plans to develop an RDF parser for the .NET framework. It is probably one of the reasons why they do not use RSS 1.0: they do not have any tool to implement it and do not plan to develop one.

Why? Could someone help me with that one?

Right now I think that my greatest wish is to have the Jena framework developed in C#. I think that I can’t rely on Microsoft for that one.

Finally, it seems that I am not the only person who has questions about this move in relation to RSS 1.0.

tags technorati : Rss rss1.0 rdf micrsoft sles gnomedex2005 c# .net jena framework

I save time with new technologies: the result is that I do more things with that time.

December 6, 2005 | May 21, 2006 | Frederick Giasson

In the past, 30 or 40 years ago, people were saying: in the future, with all the new technologies, we will work 20 hours a week and all the rest of our time will be spent on leisure.

In fact, 30 or 40 years later, people are doing twice the work in the same time. The new technologies let us do many more things in much less time. Some people will tell me that it is not the case, but I would say that if it is not, at least the quality is greater.

The problem is that new dynamics follow these new technologies and these new working techniques. Well, if we can do more in less time, then why is the situation not what was expected decades ago? Because new geo-demo-politico-dynamics are emerging at the same time. The world is changing, and everything goes faster and faster. New democracies are emerging, new populations want their part of the cake, information is democratizing with the evolution of the Internet, etc. We have to learn, to assess, and to act quickly to be able to cope with this new and constantly changing world.

It is in that vision that new products and technologies emerge every week. Most of these products try to help you cope with these new dynamics. They try to automatically assess your environment, they try to help you find relevant things in the constant incoming flow of information, and they try to make things easier for you. But the result seems to be that they only help you do many more things in the same time.

Is it our human nature to work endlessly? Is it our social structure that is pushing us in that direction? Is it the result of cultural interactions? Why do we use that saved time only to try to do more things?

RSS 1.0, RSS 2.0: make it simple not simpler

November 30, 2005 | May 25, 2006 | Frederick Giasson

Update to the discussion about RSS 1.0 vs. RSS 2.0

Why using RDF instead of XML? [25 May 2006]

“Make everything as simple as possible, but not simpler”. – Albert Einstein.

I love that quote from Albert Einstein. A few words that say so much to designers. Make things simple for the user, to the point that he does not even know that he is using what you designed (okay, it is a utopia); but beware: do not make it simpler, do not compromise on the capabilities of what you are designing to make it simple (this is the whole art of design).

This said, I am currently rewriting the Talk Digger RSS feed generator for the next release, planned in a week or two. While working on it, I found that I had made that very error: I made it simpler while wishing to make it simple.

Let me explain the situation. Some months ago, I chose to create the feeds in RSS 2.0 instead of RSS 1.0. But what is the problem then? RSS 2.0 should be much more evolved than RSS 1.0, shouldn’t it? No, it is not. RSS 2.0 is about 2 years younger than RSS 1.0, but much simpler. Why do I say that the file format is much simpler? Because RSS 1.0 feeds are serialized in RDF and RSS 2.0 feeds are serialized in XML.

Where is the problem then? XML serialized files are much easier to read than RDF serialized ones; in fact, RDF files are only cluttered XML files, aren’t they? No, definitely not. It is true that RDF/XML serialized files (I say RDF/XML because other serialization formats, like N3, can also serialize RDF) are less intuitive for humans to read, but they are much more powerful for answering some needs.

Personally I see RSS 2.0 as a lesser version of RSS 1.0. Why? Because applications that support RSS 2.0 are much simpler (a thing that we do not want), considering that they only have to handle XML files instead of full RDF ones.

Fred, are you telling us that RSS 1.0 is much more powerful than RSS 2.0? Yes, all the power of RSS 1.0 resides in the fact that it supports modules. This capability is given by RDF and its ability to import external RDF schemas to extend its vocabulary. What is a module? A module gives the content publisher the possibility to extend his file format’s vocabulary by importing external RDF schemas.

Okay, but what is the advantage of using these modules? I will explain it with an example using Talk Digger. I am currently thinking about creating an RDF schema that would model some semantic relations that Talk Digger will compute from the results returned by the search engines. Personally I want to make that information publicly available to anyone who would like to have access to it and do something with it. This said, I am also thinking about broadcasting the information directly in the RSS feed: I want to create only one source of information that would broadcast everything. RSS 1.0 gives me that possibility (in fact, an RSS 1.0 web feed is a normal RDF/XML file using the RSS 1.0 schema). It is beautiful: I can make all the information I want available to anyone, in a single source. If a piece of software that reads the feed does not understand a part of the information I broadcast (in reality, it does not know the RDF schema I am using), it simply skips it, continues to read the source of information (the web feed), and does what it has to do with the information it understands. I can’t do that with RSS 2.0 because it is serialized in XML and not RDF. I could even add OWL elements in my feeds to model some relations between the knowledge represented in the web feeds. That way an application could infer knowledge from it! An example of a popular module is the Dublin Core metadata initiative.
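The "skip what you do not understand" behaviour described above can be sketched as follows. This is a simplified illustration that walks the XML tree rather than using a real RDF parser, and the `td:` namespace URI, the `td:citedBy` element, and the `read_item` helper are hypothetical: no Talk Digger schema has been published.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

# Vocabularies this (hypothetical) reader knows how to interpret.
KNOWN_NAMESPACES = {RSS, DC}

# Illustrative item: the td: namespace and td:citedBy are invented for this example.
ITEM = """<item xmlns="http://purl.org/rss/1.0/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:td="http://example.org/talkdigger-schema#"
      rdf:about="http://example.org/post">
  <title>A post</title>
  <link>http://example.org/post</link>
  <dc:creator>Fred</dc:creator>
  <td:citedBy>http://example.org/other</td:citedBy>
</item>"""


def read_item(item_xml):
    """Collect the properties from known vocabularies and skip the rest."""
    item = ET.fromstring(item_xml)
    understood = {}
    for child in item:
        if not child.tag.startswith("{"):
            continue  # element without a namespace: not part of any known schema
        # ElementTree expands tags to the form "{namespace-uri}local-name".
        namespace, _, local_name = child.tag[1:].partition("}")
        if namespace in KNOWN_NAMESPACES:
            understood[(namespace, local_name)] = (child.text or "").strip()
        # Anything else (here td:citedBy) is silently ignored.
    return understood


for (namespace, name), value in read_item(ITEM).items():
    print(namespace, name, value)
```

A reader that later learns the extra schema only has to add its namespace to `KNOWN_NAMESPACES`; the feed itself does not change.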

You are probably thinking: yeah Fred, but readers only have to support both formats, and publishers only have to support both formats as well, and everybody will be happy. Bad design thinking: do not forget that the goal is to make applications simple. How do you think I will explain the difference between RSS 1.0 and RSS 2.0 to my mother? How do you think I will explain to her which one to choose if she has the possibility to subscribe to more than one feed? Will she choose RSS 0.91, RSS 0.92, RSS 1.0, RSS 2.0, ATOM 0.9, or ATOM 1.0 (because some websites offer them all)? Sorry, but I do not want to.

One of the current problems

One of the problems is the way applications handle all these file formats and serializations. I will explain it with a problem I faced today while testing the new RSS feed of Talk Digger with Bloglines.

One thing I wanted was to use the Dublin Core element “Description” instead of the normal “<description></description>” tag of the RSS 1.0 specification. I first thought that it would scale much better because the Dublin Core RDF schema is widely used by many, many applications over the Internet. First I tested it using RSS Bandit. It worked like a charm. All the Dublin Core elements I added to my RSS feed were handled by it. Wow! Then I tested it with Bloglines: nothing. Bloglines just doesn’t handle that Dublin Core tag: disappointment.

Then I included this namespace in my RDF file: “xmlns:ct="http://purl.org/rss/1.0/modules/content/"”. Then I re-tested it: nothing. Wow, it should work, shouldn’t it? Then I tested something else: I changed the namespace alias from “ct” to “content”, and it worked. What a disappointment: Bloglines does not care about the local namespace alias; in fact, it seems that it parses RSS 1.0 feeds (which are RDF files) with fixed strings. The system should know that “ct” refers to the same namespace as “content”, because both are just aliases that I use to declare the namespace in my local file. It is a perfect example of a bad implementation of specifications in software.
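The difference between the two behaviours can be shown with a small Python sketch. The namespace-aware reader resolves the alias to its URI, so “ct” and “content” give the same result; the fixed-string reader is only a guess at the kind of shortcut that would produce the behaviour observed, not Bloglines’ actual code. The function names are made up for the example.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def make_item(prefix):
    """Build a tiny RSS 1.0 item whose content module uses the given alias."""
    return (
        f'<item xmlns="http://purl.org/rss/1.0/" xmlns:{prefix}="{CONTENT_NS}">'
        f"<{prefix}:encoded>Hello</{prefix}:encoded>"
        "</item>"
    )


def read_by_namespace(xml_text):
    """Correct approach: look the element up by namespace URI, not by alias."""
    item = ET.fromstring(xml_text)
    node = item.find(f"{{{CONTENT_NS}}}encoded")
    return None if node is None else node.text


def read_by_fixed_string(xml_text):
    """Fragile approach (hypothetical): search for a hard-coded 'content:' alias."""
    start, end = "<content:encoded>", "</content:encoded>"
    if start not in xml_text:
        return None
    return xml_text.split(start, 1)[1].split(end, 1)[0]


for alias in ("content", "ct"):
    doc = make_item(alias)
    print(alias, read_by_namespace(doc), read_by_fixed_string(doc))
```

With the “ct” alias, only the namespace-aware reader finds the element, even though both documents mean exactly the same thing.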

The problem here is that Bloglines is the most popular web feed reader out there. So I have to change the way I build my feeds to handle that fact, but I shouldn’t have to (it is really frustrating). Will I have to change the way I build my feeds each time I discover that an application is not parsing and using them properly? I hope not; I shouldn’t have to, because I follow the specification to build them.

I hope they will check that problem with their parser and hire somebody to develop a robust system that will parse and handle the RDF specification, and not only parse RSS 1.0 feeds as simple text files with some format… (Could I change the skill requirement “Familiarity with RSS and blogs” to “Strong understanding of RDF, RSS and blogs”, cited in that job posting, to answer the responsibility “maintain and improve RSS crawling and parsing processes”?)

I hope to be able to show you how RSS 1.0 could be extended, using a future version of Talk Digger, soon.

The reasons why am I quiet these days

November 29, 2005 | May 21, 2006 | Frederick Giasson

My posting rate is lower and lower as days pass. It is not an illusion, it is the sad reality.

What is happening? Many things! It is the reason why I am quiet these days. I am currently working hard on the next version of Talk Digger. In fact, I am not really working on a new version; I am working on a totally new piece of software.

What is new?

Everything and nothing. Nothing, because these are the same services, without any real innovation. But everything, because it is a much more robust system, the Ajax interface works on Internet Explorer / Firefox / Opera / Safari on both Windows and Mac, there are many… many more results, the pages are XHTML 1.0 Strict for maximal compatibility, validation is done everywhere in the system to ensure that nothing goes wrong, and the layout is totally new and much simpler.

Why?

For several reasons. First, I was ashamed of the current version of Talk Digger. In fact, it was not really a Beta version, it was an Alpha prototype. Second, I wanted a robust system that I could easily extend to implement all the ideas I have to upgrade and innovate with the service. Third, I wanted a system where I could easily broadcast the computed information to other web services that could use it to do something (the current problem with that idea is the computer/network infrastructure); that way it would be a first step to bring Talk Digger into the Semantic Web: making it a semantic web service oriented toward the conversations that evolve on the Internet.

This post is neither complete nor descriptive, and it is probably not very clear. However, keep in mind that this new version of Talk Digger will be available in a week or two, and I will start to talk about it and its future at that time.

During that time, are there things that you would like to see in that new version of Talk Digger? Some wishes?

Technorati: Talkdigger | semantic | semanticweb |

Machine Learning, Engineering & Data

Author: Frederick Giasson
