Search Engines are vampires that suck blood out of web pages

Jakob Nielsen wrote a great article about the relationship between search engines and web sites:

“I worry that search engines are sucking out too much of the Web’s value, acting as leeches on companies that create the very source materials the search engines index.”

Here is the problem:

“The traditional analysis has been that search engines amply return the favor by directing traffic to these sites. While there’s still some truth to that, the scenario is changing.”

Jakob builds his argument on many facts. I would like to talk about one specific feature of search engines: the cache. I think Google was the first search engine to cache and republish the content of the web pages it crawled (am I right? In any case, it does not change the story). This is a great feature for users: even if a web page goes down, its content remains accessible (for as long as Google, MSN, Yahoo!, etc. live).

The question is: is it legal, or at least moral? The thing is that Google, MSN or Yahoo! users can browse the Web without ever leaving these search engines. I take some of my time to create content; a company spends money to create other content; and these search engines shamelessly grab that content, index it, and republish it without redirecting users to the source websites (most of the time they do, at least I hope).

There are two visions of the situation: (1) everything on the Internet should be free (I think that 40% or 50% of the Web’s content is created by hobbyists who never earn a cent from it); (2) content creators always retain rights, even on the Web.

Which vision do I hold? Probably one somewhere between the two; something of common sense. However, one problem I have noted is that companies like Google will tell you what you can and cannot do with their “content” (it is their computation, but not their content, sorry) through highly restrictive terms of service. When I read such texts, one thought comes to mind: do as I say, not as I do.

As I already noted in a previous post, John Heilemann wrote in the New York Metro:

“Alan Murray wrote a column in the Wall Street Journal that called Google’s business model a new kind of feudalism: The peasants produce the content; Google makes the profits.”

My current work on Talk Digger and my recent readings in the blogosphere make me wonder: how will search engines evolve; how will people react to this new situation; and will a new type of [search] service emerge from this environment?

Finally, it seems that Google’s crawlers are also giving Daniel Lemire some headaches:

“Some of you who tried to access my web site in recent days have noticed that it was getting increasingly sluggish. In an earlier post, I reported that Google accounted for 25% of my page hits, sometimes much more. As it turns out, these two issues are related. Google was eating all my bandwidth.”


New basic features in Talk Digger

I released a couple of new features in Talk Digger today. The first one was requested by John Tropea: a new option when you create an RSS feed. You can now exclude results that have the same domain name as the searched URL. So, if you create an RSS feed for the URL “fgiasson.com/blog/”, then all results with the domain name “fgiasson.com” will be excluded from the feed. Also keep in mind that duplicated results (the same URL appearing twice) are excluded from the RSS feed as well.
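For those curious about how this filtering behaves, here is a minimal sketch in Python (Talk Digger itself does not run this code; the function names and example URLs are only illustrative):

```python
from urllib.parse import urlparse

def domain_of(url):
    """Return the bare domain name of a URL (scheme optional)."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def filter_feed_results(searched_url, results, exclude_same_domain=True):
    """Drop duplicated URLs and, when the option is enabled,
    results sharing the searched URL's domain name."""
    searched_domain = domain_of(searched_url)
    seen, kept = set(), []
    for url in results:
        if url in seen:
            continue  # duplicated result (same URL twice)
        if exclude_same_domain and domain_of(url) == searched_domain:
            continue  # same domain as the searched URL
        seen.add(url)
        kept.append(url)
    return kept

# Example: a feed created for "fgiasson.com/blog/" with the option enabled
print(filter_feed_results("fgiasson.com/blog/", [
    "http://fgiasson.com/blog/some-post",
    "http://example.org/a-post-that-mentions-talk-digger",
    "http://example.org/a-post-that-mentions-talk-digger",
]))
# -> ['http://example.org/a-post-that-mentions-talk-digger']
```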

I also added a new option that lets you enable or disable the PageRank feature. If you do not like the feature and think it just clutters the user interface, simply uncheck the option and PageRank will no longer be displayed with the results.

If you have any other ideas to improve Talk Digger, or if you find a bug in what I changed, please do not hesitate to contact me!


Talk Digger now supports Digg.com

A few days ago, digg.com changed its search feature. We can now enter a URL to find the diggs related to that URL. Digg.com is not a traditional search engine. So if I dig “fgiasson.com” on Talk Digger, for example, the results I receive from digg.com will be the diggs related to that URL, and not the URLs of web pages that link to it (which is how all the other search engines supported by Talk Digger work).

However, I wanted to try digg.com with Talk Digger. The idea behind integrating digg.com into Talk Digger is to make it possible to read what digg.com users say about a URL. The usefulness of digg.com results is not the same as that of the other search engines: this way, we will be able to read comments, good and bad, from users of a constantly growing community.



So, if you click on a result’s link, you will be redirected to digg.com, where you can read what digg.com users say about that URL. To save browsing time, I added the number of comments to the title of each result.

This is a test I am running right now. I have some doubts about its true usefulness, but the only way to find out is to implement it on talkdigger.com and see whether Talk Digger users like it.

So I would appreciate knowing what you think about this new integration: how I could improve it, how I could make it even more useful, etc.

Also, you will see that I have disabled two search engines: Feedster and IceRocket. Feedster’s linkback feature is being redeveloped, so it is unavailable for the moment. IceRocket always ends up banning the IP address of talkdigger.com; I regularly talk with Blake Rhodes to get it removed from their blacklist, but I will wait until talkdigger.com is put on their green list before re-enabling it.


I want my deputy to have a blog

I just received a flyer from the Conservative Party of Canada candidate in my region. I checked it and saw that he had a website with his full name as the domain name. I told myself: I hope he has a blog! Then I checked: no blog; what a disappointment.

People are saying that the Internet can help democracy. The Canadian government holds some public consultations over the Internet. People are even thinking about voting over the Internet (in fact, I have already voted over the Internet at Université Laval).

Now, I would like my deputy (my Member of Parliament) to have a blog. Companies use blogs to market their products. Companies use blogs to stay in contact with their clients. Writers have blogs to get in touch with their readers. I have a blog to find new ideas and to get feedback from Talk Digger users. Why doesn’t my deputy have a blog?

If it is good for companies and their clients, why could it not be good for the government and its citizens?

It would be so interesting to know who my deputy is, what he is working on, and what his ideas and visions are. Then I could comment on his ideas; I could show him my vision of things; I could discuss a specific article with other citizens in my region.

Great, but why doesn’t the Conservative Party of Canada set up a blog network for all its deputies, ministers and representatives? That way I could know what these people have in mind, and even more importantly, I could have a voice and tell them what I am thinking.

Perfect! And what about the Liberal Party of Canada, the New Democratic Party of Canada, or even the Bloc Québécois?

It is great to have offices everywhere to meet people. But what happens if citizens do not have the time to go there and get the information they need? What if they do not have 40 hours to find out who their federal, provincial or municipal candidates are, what their visions are, etc.? Please do not blame people by saying something like: it is your duty to take the time to gather this information, to meet these people, etc. Yes, it is, but they also have to work to feed and educate their children. Please help them a little by making information more easily available. A good way could be through blogs and blog networks.

I want my deputy to have a blog!
