Past Projects – Page 12 – Frederick Giasson

Talk Digger now supports Digg.com

January 19, 2006 | May 21, 2006 | Frederick Giasson

A few days ago, digg.com changed its search feature: we can now enter a URL to find the digs related to that URL. Digg.com is not a traditional search engine, so if I dig “fgiasson.com” on Talk Digger, for example, the results I receive from digg.com are the digs related to that URL, not the URLs of the web pages that link to it (which is how all the other search engines supported by Talk Digger work).

Still, I wanted to try digg.com with Talk Digger. The idea behind integrating digg.com into Talk Digger is to make it possible to read what digg.com users say about a URL. The digg.com results are not useful in the same way as those of the other search engines: they let us read comments, good and bad, from the users of a constantly growing community.

So, if you click on a result’s link, you will be redirected to digg.com, where you can read what digg.com users say about that URL. To save browsing time, I added the number of comments to the title of each result.

This is a test I am running right now. I have some doubts about its true usefulness, but the only way to find out whether it is useful is to implement it on talkdigger.com and see whether Talk Digger users like it.

So I would appreciate knowing what you think about this new implementation: how I could improve it, how I could make it even more useful, etc.

Also, you will see that I have disabled two search engines: Feedster and Icerocket. The Feedster linkback feature is being redeveloped, so it is unavailable for the moment. Icerocket always ends up banning the IP address of talkdigger.com. I regularly ask Blake Rhodes to remove it from their blacklist, but I will wait until talkdigger.com is put on their green list before re-enabling it.

Spreading the Word about Talk Digger: how users developed the service, and how they spread the word

January 5, 2006 | May 21, 2006 | Frederick Giasson

I am certain that Talk Digger would not be what it is without its users. In fact, I owe everything to them. They spread the word about Talk Digger everywhere. They also give their feedback about the service: what they like but, even more important, what they don’t like and what they would like to be able to do with it.

When I sit back and look at the evolution of Talk Digger, I am confronted with a reality: I didn’t really develop anything; my users did! This is a fantastic story that I will remember all my life.

The first version of Talk Digger was the implementation of one idea: a meta search engine that compares the results of the linkback features of certain search engines. That’s it.

Then something wonderful happened: I started to talk with the new users and to find out how they were using it, how they saw its utility, and what it could and couldn’t be good at.

The evolution of Talk Digger by users:

The first post about Talk Digger.
The first thing that a user suggested for Talk Digger was a Bookmarklet. The idea was great: users were able to know who was linking to a webpage in a single click, on any web browser.
Following this, Ivan A. Illyn, a hardcore user and now evangelist of Talk Digger on LiveJournal, told me how he saw Talk Digger: as a way to find, follow and join conversations evolving around a specific URL. Talk Digger was born.
One day I was talking with Anouar El Haji over Skype, and then the idea of broadcasting the results of Talk Digger using RSS was born.

Version 1.0 was complete and working. I made many mistakes with that version; the service matured, and I wanted to reflect that maturation with a totally new user interface and new functionality.

Then version 2.0 was born. I thought about the new user interface, the new design, and the new and improved architecture. Then users came up with some more ideas:

I sent a pre-release for testing to a couple of old Talk Digger users and Tom Sherman came up with some ideas including the Page Rank feature.
Jeff Nolan suggested creating an option to exclude results with the searched domain name.
Bora Ung came up with a new slogan: “You Talk, we Dig!”.
Recently, David Jones suggested a feature to show where an article comes from, and so the regional view option was born.

All these incredible ideas came from Talk Digger users. The only thing I did was read what they had to say, ask them questions, and develop their ideas. It happened through blogging, Skype, email, and phone.

Finally, hundreds of users spread the word about Talk Digger everywhere on the Internet.

So, what do you think Talk Digger would be without its users? The answer is simple: nothing. Talk Digger would not be anything without you, the users. I can’t thank you enough for that. In fact, since the release of the new version of Talk Digger, the number of unique users per day has doubled. This result was only possible thanks to my interaction with Talk Digger users. It is something that I will remember throughout my professional career.

New Talk Digger feature: Regional view of results

December 31, 2005 | May 21, 2006 | Frederick Giasson

I just released a new feature for Talk Digger. The idea came from a wish of a PR worker, David Jones, who wanted to be able to see and sort results by region (country). It was important for him because his client cared more about comments from people in its target markets than about the others. I found the feature essential for Talk Digger, not just for marketing and PR workers but for everybody. What I like about this idea is that it puts a human touch on the conversations that are dug up. It gives users a new metric for analyzing who is talking in a conversation. So I took the last two days to develop and release the new feature.

What is this new feature all about? It is called the Regional setting. This setting lets you enable the regional view of each result. If the feature is enabled (it is disabled by default), the flag of the country where the server hosting the resulting web page is located will appear. This option is useful when you are trying to find people living in a specific country who talk about a URL. It is especially helpful for marketing and PR workers who have to do regional searches for their clients’ products.

How should the flags be interpreted? The flag appearing beside a title shows the country where the web page is hosted. If a Japanese blogger hosts his blog in America, you will see the flag of the United States, unless he uses a country domain name (.jp) rather than a generic one (.com, .net, etc.). However, people generally use their country’s domain name, or at least host their web pages with a local web host. Considering this, I would estimate that about 70% of the displayed flags represent the country where the creator of the result lives.

If you enable that feature, you will be able to sort the incoming results by country. For example, here the first results will be the Canadian pages and all the others will be grouped by country.
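The flag rule and the country sort described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code running on talkdigger.com: the function names are hypothetical, the hosting country is assumed to come from some IP lookup that is not shown, and a two-letter top-level domain is simply treated as a country code.

```python
from urllib.parse import urlparse


def flag_country(url, hosting_country):
    """Pick the country shown beside a result (hypothetical helper).

    A two-letter top-level domain such as .jp is treated as a country
    domain and wins; otherwise the country where the page is hosted
    is used.
    """
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    if len(tld) == 2 and tld.isalpha():
        return tld.upper()
    return hosting_country


def sort_by_country(results, first_country):
    """Put results from first_country first, then group the rest by country."""
    return sorted(
        results,
        key=lambda r: (r["country"] != first_country, r["country"]),
    )


if __name__ == "__main__":
    # Made-up results: (URL, country of the hosting server).
    raw = [
        ("http://example.com/post", "US"),
        ("http://blog.example.jp/entry", "US"),
        ("http://example.net/article", "CA"),
    ]
    results = [{"url": u, "country": flag_country(u, c)} for u, c in raw]
    for r in sort_by_country(results, "CA"):
        print(r["country"], r["url"])
```

Because Python's sort is stable, results from the same country keep the order in which the search engines returned them.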

I hope that you will find this new feature, another one born from my interaction with Talk Digger users, as useful as I do.

Happy New Year!

Preliminary analysis: some results of the topic-extracting module of Talk Digger

December 21, 2005 | May 21, 2006 | Frederick Giasson

As soon as I finished and released the latest version of Talk Digger, I started to work on a new prototype module that tries to extract topics from the results returned by search engines. What are these topics? They are the topics that evolve in a conversation (and a conversation is a set of articles, returned by different search engines, that link to a specific URL).

I am releasing these preliminary results because I find them somewhat interesting (so they could possibly interest someone else too; in fact, I read this blog post by Anjo Anjewierden yesterday while I was developing these tests, so I thought I could write a little something on the subject).

These results are based on the results returned by Technorati for a search on a recent BBC article. The set of raw texts returned is defined by:

DOCUMENT 3

URL: http://www.jnoelbell.me.uk/2005/12/21/so-much-news-so-little-time/

be resolved shortly. I shudder to think about the people trying to get to the airports in a day or two. secondly, thank god people are coming to their senses . “intelligent design” is just a pretext for promoting religion, which has no place in the public schools. you don’t like it?
[…]

DOCUMENT 7

URL: http://godcountryyale.blogspot.com/2005/12/suck-it-fox-news.html

wild glory days when I got linked to from Not Even Wrong… Anyway, big news today is that the “intelligent design” case in Dover got struck down ( BBC , CNN), an event that was made all the merrier due to the fact that I first heard about it on Fox News while channel surfing. If you
[…]

Click here to see the whole set

The next step is to perform some lexical analysis techniques to ‘purify’ the raw text. The resulting set of purified texts is:

DOCUMENT 3

resolve shudder think people try airport sense design pretext religion place public school

[…]

DOCUMENT 7

wild glory day link wrong today design case dove down event fact first while channel surf

[…]

Click here to see the whole set

As you can notice, only nouns remain. The reason is simple: I assume that the words that carry the greatest semantic meaning for describing topics are nouns. In the next steps, verbs, adverbs and adjectives will possibly be added to these sets because of their possible semantic relations with these nouns in other conceptual domains.
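The post does not say which lexical analysis techniques are used, so the following Python sketch only illustrates the general shape of such a "purification" step: lowercase the text, split it into words, drop stop words, and map a few inflected forms back to their base form. The stop-word list and the lemma table are tiny, hand-written assumptions; a real module would rely on a proper lexicon and part-of-speech information.

```python
import re

# Tiny hand-written lists, for illustration only (assumptions).
STOP_WORDS = {
    "a", "about", "are", "be", "for", "get", "has", "i", "in", "is",
    "just", "no", "or", "the", "their", "to", "which",
}
LEMMAS = {
    "airports": "airport",
    "resolved": "resolve",
    "schools": "school",
    "trying": "try",
}


def purify(raw_text):
    """Lowercase, tokenize, remove stop words and reduce known inflections."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    kept = [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(purify("The people are trying to get to the airports"))
```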

The 10 most frequent words of this set will create the set of possible topics of the conversation. The set is defined by:

there [Frequence: (2) Tag count: (0)]

teach [Frequence: (3) Tag count: (0)]

today [Frequence: (3) Tag count: (13)]

federal [Frequence: (3) Tag count: (0)]

class [Frequence: (3) Tag count: (190)]

school [Frequence: (3) Tag count: (108)]

sense [Frequence: (3) Tag count: (8)]

judge [Frequence: (3) Tag count: (3)]

pretext [Frequence: (3) Tag count: (0)]

design [Frequence: (11) Tag count: (13)]
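Counting the most frequent words over a set of purified documents is straightforward. Here is a minimal Python sketch, assuming each purified document is a string of space-separated words; the function name is hypothetical and ties are broken by first appearance, which may differ from what the prototype does.

```python
from collections import Counter


def candidate_topics(purified_docs, n=10):
    """Return the n most frequent words across all purified documents."""
    counts = Counter()
    for doc in purified_docs:
        counts.update(doc.split())
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up purified documents.
    docs = ["design school design", "design judge school"]
    for word, frequency in candidate_topics(docs, n=2):
        print(word, frequency)
```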

An interesting metric I make explicit in these results is the “tag count”. The tag count is the number of times the word appears in the Brown Corpus. It tells us how “popular” the word is. So if I have to choose between two words with the same meaning, I will choose the one with the greater tag count because it is the one most used in English literature.
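The choice between words with the same meaning can be illustrated with a short Python sketch. The tag counts below are the ones reported in the lists of this post; the function name is hypothetical, and a word missing from the table is assumed to have a tag count of 0.

```python
# Tag counts as reported in this post for three related words.
TAG_COUNTS = {"pattern": 9, "plan": 43, "design": 13}


def preferred_word(synonyms, tag_counts):
    """Among words with the same meaning, keep the one with the highest tag count."""
    return max(synonyms, key=lambda word: tag_counts.get(word, 0))


if __name__ == "__main__":
    print(preferred_word(["pattern", "plan", "design"], TAG_COUNTS))
```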

The next step is to try to find new topics that have semantic relations with the existing candidates, or to strengthen the current ones.

If you check a lexicon, you will see that each word can be defined by one or more sets of synonyms. In the current example I assume that a word is defined by all of its synonym sets (I make this assumption to keep things simple; in the real world, I would have to find which of the synonym sets matches the word’s meaning in that context). My guess is that other words from the same article and from the other articles belonging to the conversation will smooth out the effect of this error on the results.

So, here is the set of possible topics augmented by the synonym sets of each word belonging to it.

social_class, socio-economic_class, course_of_instruction, course_of_study, course, category, family, division, year, grade, form, civilise, civilize, schooling, schoolhouse, feel, signified, shoal, schooltime, cultivate, train, educate, school_day, classify, sort, learn, Blackbeard, Edward_Thatch, Thatch, instruct, at_that_place, on_that_point, in_that_respect, thither, in_that_location, Edward_Teach, Teach, Fed, separate, sort_out, assort, federal_official, Federal_soldier, now, nowadays, Union, Union_soldier, sensory_faculty, sentiency, project, aim, intention, guise, pretense, evaluator, stalking-horse, pretence, intent, purpose, invention, figure, designing, innovation, excogitation, blueprint, conception, justice, contrive, jurist, common_sense, try, gumption, horse_sense, sentience, sensation, mother_wit, adjudicate, good_sense, estimate, approximate, guess, magistrate, label, pronounce, gauge, pattern, Federal, plan, pretext, teach, there, federal, today, judge, class, sense, school, design
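The expansion step can be sketched in Python with a toy lexicon. The grouping of words into synonym sets below is hand-made for illustration (the words themselves appear in the set above); a real module would query a full lexicon. As in the post, no attempt is made to pick the synonym set that matches the context: every set of a word is added.

```python
# Toy lexicon: each word maps to one or more synonym sets (hypothetical grouping).
LEXICON = {
    "design": [["design", "plan"], ["design", "pattern", "figure"]],
    "judge": [["judge", "justice", "jurist", "magistrate"], ["evaluator", "judge"]],
}


def expand_with_synonyms(topics, lexicon):
    """Return every word of every synonym set of each topic, duplicates kept.

    Duplicates are kept on purpose so that frequencies can be counted later.
    A word missing from the lexicon is kept as-is.
    """
    expanded = []
    for word in topics:
        for synset in lexicon.get(word, [[word]]):
            expanded.extend(synset)
    return expanded


if __name__ == "__main__":
    print(expand_with_synonyms(["design", "pretext"], LEXICON))
```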

The more interesting words of this set are:

pattern [Frequence: (2) Tag count: (9)]

Federal [Frequence: (3) Tag count: (0)]

plan [Frequence: (3) Tag count: (43)]

pretext [Frequence: (5) Tag count: (0)]

teach [Frequence: (5) Tag count: (0)]

there [Frequence: (6) Tag count: (0)]

federal [Frequence: (6) Tag count: (0)]

today [Frequence: (7) Tag count: (13)]

judge [Frequence: (10) Tag count: (3)]

class [Frequence: (12) Tag count: (190)]

sense [Frequence: (12) Tag count: (8)]

school [Frequence: (13) Tag count: (108)]

design [Frequence: (25) Tag count: (13)]

NOTE: If you check, you may think that the frequencies are wrong. The reason is that I added the frequencies of the previous set to those of the synonym sets.
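One possible reading of this note, sketched in Python: the counts of the previous step are kept, and each occurrence of a word in the synonym sets adds one to its count. The exact bookkeeping of the prototype is not given in the post, so the function and the numbers in the example are assumptions made for illustration.

```python
from collections import Counter


def accumulate(previous_counts, synonym_words):
    """Add one per occurrence in the synonym sets to the previous frequencies."""
    total = Counter(previous_counts)
    total.update(synonym_words)
    return total


if __name__ == "__main__":
    previous = {"design": 11, "pretext": 3}          # made-up starting counts
    from_synonym_sets = ["design", "plan", "design"]  # made-up expansion
    print(accumulate(previous, from_synonym_sets))
```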

There are three interesting facts: (1) the appearance of the concept “plan”; (2) the promotion of the concept “school”, driven by its semantic links with the synonym sets of the other words belonging to the set; and (3) the demotion of the concept “pretext”.

The current set of possible topics is now defined by the 10 most frequent nouns we extracted and the synonym sets of each of these words.

The final step performed to find the topics of a conversation is to augment the set of possible topics with the words that describe the same concepts as the ones in the set (the sister concepts). The resulting set is defined by:

Texas_Independence_Day, February_22, March_2, Washington’s_Birthday, St_Patrick’s_Day, April_Fools’, March_17, Saint_Patrick’s_Day, February_14, St_Valentine’s_Day, February_2, Groundhog_Day, holiday, Tet, Lincoln’s_Birthday, February_12, Saint_Valentine’s_Day, Valentine’s_Day, Valentine_Day, April_Fools’_day, All_Fools’_day, Father’s_Day, June_14, Flag_Day, June_3, Citizenship_Day, September_17, October_24, United_Nations_Day, American_Indian_Day, Davis’_Birthday, Jefferson_Davis’_Birthday, Patriot’s_Day, April_14, Pan_American_Day, May_Day, First_of_May, Armed_Forces_Day, Mother’s_Day, May_1, January_19, Robert_E_Lee_Day, old_age, middle_age, adulthood, salad_days, geezerhood, deathbed, commencement_day, Arbor_Day, Admission_Day, bloom_of_youth, mid-nineties, golden_years, mid-sixties, sixties, seventies, mid-seventies, nineties, mid-eighties, eighties, degree_day, November_5, market_day, ides, election_day, polling_day, Walpurgis_Night, New_Year’s_Eve, Halloween, Robert_E_Lee’s_Birthday, December_31, payday, red-letter_day, leap_day,

[…]

measure, time, estimate, dull, strike, age, point, gauge, dissolve, denature, label, indicate, intention, order, acquaint, obscure, resolve, get, sensitize, moderate, sensitise, blunt, blur, division, contrive, take, draw, purpose, tame, report, course, try, construct, pattern, run, bring, touch, season, think, life, activate, break, grade, set, shift, feel, loosen, sense, year, night, project, convert, plan, judge, school, turn, figure, separate, train, develop, aim, transform, class, make, form, design

Click here to see the whole set

As you can notice, there is a small exponential explosion. This is a problem, and it is the reason why I should make decisions, at each step, to keep only the best words that could describe the topics of a conversation.

The most interesting words in this new set are:

think [Frequence: (19) Tag count: (0)]
life [Frequence: (19) Tag count: (107)]
activate [Frequence: (20) Tag count: (2)]
break [Frequence: (20) Tag count: (0)]
grade [Frequence: (20) Tag count: (17)]
set [Frequence: (21) Tag count: (24)]
shift [Frequence: (22) Tag count: (1)]
feel [Frequence: (22) Tag count: (5)]
loosen [Frequence: (22) Tag count: (0)]
sense [Frequence: (22) Tag count: (8)]
year [Frequence: (23) Tag count: (5)]
night [Frequence: (24) Tag count: (736)]
project [Frequence: (24) Tag count: (1)]
convert [Frequence: (25) Tag count: (0)]
plan [Frequence: (25) Tag count: (43)]
judge [Frequence: (25) Tag count: (3)]
school [Frequence: (26) Tag count: (108)]
turn [Frequence: (26) Tag count: (4)]
figure [Frequence: (26) Tag count: (0)]
separate [Frequence: (27) Tag count: (3)]
train [Frequence: (29) Tag count: (5)]
develop [Frequence: (30) Tag count: (45)]
aim [Frequence: (31) Tag count: (4)]
transform [Frequence: (32) Tag count: (3)]
class [Frequence: (32) Tag count: (190)]
make [Frequence: (40) Tag count: (34)]
form [Frequence: (40) Tag count: (1)]
design [Frequence: (50) Tag count: (13)]

Some interesting new words appeared; other, less interesting ones appeared too. This is just an example of the impact of adding the sets of words describing the sister concepts of the previous set of possible topics. We could do the same thing by adding the set of more general concepts related to our current set of concepts (hypernymification) or by adding the set of more specific concepts related to our current set of concepts (hyponymification).
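The three relations mentioned here (sister concepts, more general concepts, more specific concepts) can be illustrated with a toy taxonomy in Python. The parent links below are hand-written assumptions, not taken from a real lexicon: a sister concept is simply another word that shares the same more general concept.

```python
# Toy taxonomy: child concept -> more general concept (hypothetical links).
PARENT = {
    "Flag_Day": "holiday",
    "Groundhog_Day": "holiday",
    "May_Day": "holiday",
    "holiday": "day",
}


def hypernym(word, parent=PARENT):
    """Return the more general concept of a word, or None if unknown."""
    return parent.get(word)


def hyponyms(word, parent=PARENT):
    """Return the more specific concepts of a word, in alphabetical order."""
    return sorted(child for child, p in parent.items() if p == word)


def sister_concepts(word, parent=PARENT):
    """Return the other concepts that share the same more general concept."""
    p = parent.get(word)
    if p is None:
        return []
    return [child for child in hyponyms(p, parent) if child != word]


if __name__ == "__main__":
    print(hypernym("Flag_Day"))
    print(hyponyms("holiday"))
    print(sister_concepts("Flag_Day"))
```

The sketch also shows why the sets grow so quickly: every word brings in all of its sisters, and each of those could bring in its own synonyms at the next step.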

This first test I made with a real-world example is quite interesting and even promising. So, what will I do with it? Keep checking talkdigger.com over the next month.

Alexa opens its teragigs of indexes: can Talk Digger take advantage of it?

December 13, 2005 | May 22, 2006 | Frederick Giasson

Alexa (Amazon.com) just started a new web service that gives access to Alexa’s databases to anyone who needs it. It is really great news. I am excited to see that big companies are opening up and making their data publicly available to anyone who needs it.

For some months now, I have been talking about how I see the future of the Web, about my vision of the future of the Internet with the Semantic Web, etc. I talked about how the Web could change if everybody made their gathered, processed and indexed content publicly available.

Yesterday I released a totally new version of Talk Digger. I talked about how I would like to make the computed results available to anyone who needs them. It is a dream I have; it is a reality that Amazon is making. Talk Digger’s and Alexa’s results would not be the same, and neither would their users, but either way, it fits a vision of things that could change the way we use the Internet and the way the Internet grows.

The new version of Talk Digger uses a Google web service: PageRank. It is a really great way to try to gauge the credibility of the people who are talking about a URL; it is a great way to know who the people participating in a conversation are. It is certainly not the best or the only way to do that, but it is a good start. In fact, I am designing a system, a new feature of Talk Digger, that I think could be a good way to see, analyze and interpret these conversations. Either way, it is a great feature that will be part of Talk Digger for a long time (as long as Google gives access to their API through a web service).

Here is the point: Talk Digger goes even further in displaying its results by using the service of another company.

Now, would it be possible to integrate the new Alexa web service to enhance Talk Digger’s results? It would be really great, considering all the data we have access to through the web service. I could even combine Google’s PageRank with Alexa’s Popularity system to compute a single indicator that would use both services (neither is foolproof, but the two could be complementary).
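To make the idea of a single indicator concrete, here is one possible formula sketched in Python. Everything in it is an assumption made for illustration: the PageRank is taken as a score from 0 to 10, the popularity figure is taken as a traffic rank where 1 is the most popular site, and the two are mixed with an arbitrary weight. It is not a formula used by Talk Digger or by either service.

```python
import math


def combined_indicator(pagerank, popularity_rank, max_rank=10_000_000, weight=0.5):
    """Blend a 0-10 PageRank and a traffic rank into one score between 0 and 1.

    The traffic rank is put on a logarithmic scale because the gap between
    rank 10 and rank 1,000 matters more than the gap between rank
    1,000,000 and rank 1,001,000.
    """
    pagerank_score = max(0.0, min(pagerank / 10.0, 1.0))
    clamped_rank = max(1, min(popularity_rank, max_rank))
    rank_score = 1.0 - math.log10(clamped_rank) / math.log10(max_rank)
    return weight * pagerank_score + (1.0 - weight) * rank_score


if __name__ == "__main__":
    # Made-up values for two pages.
    print(round(combined_indicator(7, 1_500), 3))
    print(round(combined_indicator(3, 2_000_000), 3))
```

The `weight` parameter makes it possible to trust one service more than the other, which matters since neither is reliable on its own.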

The problem with Alexa’s service is that I am restricted to one request per second per IP. The thing is that if you start a search for a URL and receive 70 results, Talk Digger requests the PageRank of these 70 URLs in less than a second. So, I cannot really integrate Alexa’s new web service into Talk Digger with this restriction.
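To see what the limit means in practice: at one request per second, looking up 70 results one after the other takes at least 69 seconds, since only the first request can go out immediately. The Python sketch below shows a simple client-side throttle that respects such a limit. The class and function names are hypothetical, and `fetch` stands in for whatever call would query the web service.

```python
import time


class Throttle:
    """Allow at most one call per min_interval seconds."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until the next call is allowed, then record its time."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, never faster than the throttle allows."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Placeholder fetch function: a real one would query the web service.
    urls = ["http://example.com/a", "http://example.com/b"]
    print(fetch_all(urls, lambda url: len(url), Throttle(min_interval=1.0)))
```

A throttle like this keeps the service usable for a handful of results, but it does not solve the latency problem for a search that returns dozens of them.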

Either way, Amazon has done a great thing by creating this new web service. I hope that other companies follow them in that direction.
