Responding to: The One Crucial Idea of Web 2.0

In this blog post I'm responding to one of Joshua Porter's most recent posts. To fully appreciate or understand my response, you should read his post before continuing with this one.

What is Web 2.0? Personally, I do not agree with the widely accepted definition of Web 2.0 as the new trend of social Web services that rely on a community to define and "digg" the Web. I would call that Web "Web 1.5": a crucial new step toward something much bigger, the Semantic Web, which is my view of Web 2.0.

In reality, the new social Web services like Digg, Flickr, Del.icio.us, etc. are not new technologies. These services use old, well-understood methods and technologies. I think the crucial factor that makes them spread like mushrooms is the drastic decline in the price of their supporting infrastructure: cheap broadband, good open-source (and free) development technologies like MySQL and PHP (no licensing costs), and gigabytes of hard drive space for pennies. This is a form of convergence, not a new Web.

Mr. Porter wrote:

If there is one idea that encapsulates what Web 2.0 is about, one idea that wasn’t a factor before but is a factor now, it’s the idea of leveraging the network to uncover the Wisdom of Crowds. Forget Ajax, APIs, and other technologies for a second. The big challenge is aggregating whatever tidbits of digitally-recorded behavior we can find, making some sense of it algorithmically, and then uncovering the wisdom of crowds through a clear and easy interface to it.

It is all about popularity; it is all about Google PageRank. But that is only one tool among many others.

Google offers good services. Google changed the landscape of the search industry. The problem is that I can still spend an hour searching for something on the Web, and what I find is often basically unacceptable.

To upgrade the Web, we would need a breakthrough that drastically improves its efficiency. Unfortunately, neither Digg nor Technorati has ever helped me cut down my search time. They are cool services, but they don't answer that particular need.

To put the 2.0 tag on the Web, we should see such a breakthrough. That is why I would call the emerging social trend 1.5: a good step forward, but not enough to change the first number of the version.

I have some questions for people who think that the current emerging “Web 2.0” is a major breakthrough for the Web:

1- What happens if the “crowd” does not find the golden piece of information I am searching for because it is buried too deeply in the Web and nobody noticed it before?

2- Did anyone ever see an article from the Canadian government offering tips on completing your income tax form pop up on Digg?

The problem I see with this method is that something has to be flagged by many, many people to rise to the surface: *something* has to be useful to many people who will digg it, link to it, and so on. Personally, I find useful information all day long, but I don't or won't link to that useful information.

I do not want references to resources that meet the needs of *everybody on the Web*; I want references to resources that fill MY needs.

The only time such methods are really useful is when my needs meet those of the majority. That is often the case when we talk about general information. However, it just doesn't work when I start to search for up-to-date, specific information about an obscure subject, a subject that few people care about, or, even more important, a subject about which information has to be inferred in order to be discovered!

What is happening with these new services like Digg, Flickr, or Del.icio.us, which started with Google's PageRank idea, is good and really cool, but I hope this is not an end point for the next 10 years; otherwise we will miss focusing on something much more useful and important.

And the evidence is mounting. Today, Richard MacManus writes about the new features on Rojo, and in explaining what they are, Chris Alden tells Richard that they're emulating PageRank:

“How do we do it? (determine relevance) Generally, just like Google used link metadata to determine relevance of search results, there is a fair amount of metadata we can use to infer relevance, including how many people are reading, tagging, and voting for a story, how popular the feed is – both to you personally, to your contacts, and to all readers, as well as things like link data and content analysis. “

When I read this, I think about the Semantic Web: a way to create metadata on resources not to infer relevance, but to infer Knowledge. Relevance is good, at least in some scenarios, but Knowledge is better because it is good in all scenarios. Remember: Knowledge is power.

The problem is that people think about inferring relevance in terms of popularity (people linking to and talking about something), not in terms of Knowledge.
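To make the contrast concrete, here is a minimal sketch of the kind of popularity-based scoring described in the Rojo quote above. The signal names and weights are my own assumptions for illustration, not Rojo's actual algorithm.

```python
# A toy popularity-based relevance score in the spirit of the quote above.
# Signal names and weights are invented for illustration.

def relevance_score(story: dict) -> float:
    """Combine behavioral signals into a single popularity score."""
    return (
        1.0 * story.get("readers", 0)
        + 2.0 * story.get("tags", 0)
        + 3.0 * story.get("votes", 0)
        + 0.5 * story.get("inbound_links", 0)
    )

stories = [
    {"title": "Popular story", "readers": 900, "tags": 40, "votes": 120, "inbound_links": 300},
    {"title": "Obscure but useful story", "readers": 3, "tags": 1, "votes": 0, "inbound_links": 2},
]

# The obscure story loses, no matter how useful it is to one particular reader.
for story in sorted(stories, key=relevance_score, reverse=True):
    print(round(relevance_score(story), 1), story["title"])
```

Notice that nothing in such a score captures what a story is about; it only measures how much attention the story received.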

I sincerely hope that people will start to talk about Web 2.0 as a Web of Knowledge, a Web of *semantic usage*. As I said, I would refer to the social Web as Web 1.5: a first step, a first non-academic and widespread experience on the way to Web 2.0: the Web of Knowledge, the Web where you do not lose an hour of your precious time searching for something trivial but unfortunately not popular.
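To give an idea of what "metadata to infer Knowledge" could look like, here is a minimal sketch using RDF triples and a SPARQL query, written with the rdflib Python library. The vocabulary and facts are invented for illustration; they are not an existing dataset.

```python
# A minimal sketch of knowledge expressed as RDF triples: the facts are
# stated explicitly and retrieved by meaning, not by popularity.
# The vocabulary and facts below are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.guide2005, RDF.type, EX.TaxGuide))
g.add((EX.guide2005, EX.publishedBy, EX.CanadaRevenueAgency))
g.add((EX.guide2005, EX.topic, Literal("income tax deductions")))

# This query finds the tax guide whether or not anyone ever linked to it or
# "dugg" it; no crowd is needed, only the stated facts.
results = g.query(
    """
    SELECT ?doc WHERE {
        ?doc a ex:TaxGuide ;
             ex:publishedBy ex:CanadaRevenueAgency .
    }
    """,
    initNs={"ex": EX},
)

for row in results:
    print(row.doc)
```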


Vast: a model for the Semantic Web

In the past I talked a lot about the future of the web, the Semantic Web, and what developers and businesses have to do to make it happen.

Many people are currently blogging about a new search engine called Vast. I took a quick look at it and found an impressive service. At the moment, it has no semantic capabilities: it is a normal search engine that crawls the Web for specific data (cars, jobs, and profiles), broadcasts this information through a REST interface, and exposes no semantic relations between the results.

So, why do I say that it is a model for the Semantic Web if it has nothing to do with semantics? Because it has a crucial characteristic that most Semantic Web services will need to make the idea of the Semantic Web work.

I have already talked about it in this blog post: sharing content and computations freely, with anyone who needs them, without any restrictions.

This is exactly what Vast is doing:

“Use the Vast Dataset to Build and Augment Your Own Services – it’s free, open, and available for commercial and non-commercial uses!”

“Vast’s entire dataset is available for you to add to your site, blog, or service. You can rebuild all of Vast.com, if you’d like, offer targeted classified search results to your users, build visualizations or mappings, or process the data to find interesting correlations.”

So, you have an idea and you need their data to develop it. What will you do? Take the data from their web servers, without any restriction.
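As a rough sketch of what that could look like in practice, here is how a developer might pull classified-listing data from such a REST interface in Python. The endpoint URL, query parameters, and response fields below are hypothetical; Vast's real interface defines its own.

```python
# A minimal sketch of consuming a REST dataset like the one Vast describes.
# The endpoint URL, parameters, and response fields are hypothetical.
import requests

resp = requests.get(
    "https://api.example.com/cars",  # hypothetical endpoint
    params={"make": "Honda", "model": "Civic", "max_price": 8000},
    timeout=10,
)
resp.raise_for_status()

for listing in resp.json().get("results", []):
    # Hypothetical fields; a real feed documents its own schema.
    print(listing.get("title"), listing.get("price"), listing.get("url"))
```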

The only question you need to ask yourself is: do I trust their reliability enough to develop my project on top of their free service? The answer is up to you, and it is the sort of question developers will have to ask themselves again and again if a new environment made of such web services is emerging (and I think it is).

I also asked myself what the business model of such a Web service could be. They have part of the answer:

“At some point, we will accept payment for advertising embedded in our feeds. At that time, the advertising revenues will of course be shared with developers and partners using our feeds.”

“When Vast decides to embed sponsored links or advertisements in the dataset, you must display these links or ads with prominence alongside or as part of the data. However, it is our plan that you will receive a share of the revenues that you help generate for doing so.”

It is as simple as that: they embed some sort of sponsored results in their result listings, and their terms of service force you to display them.

However, who cares? I mean, I have no problem with ads as long as they are relevant to what I am searching for. Whether their service suggests a car found somewhere on the Web or a car result sponsored by someone else, as long as I get a car that fills my needs I do not care where it came from, and that is exactly what Vast is doing.

I can only say one thing at this point: congratulations, guys, for making all your data freely available to anybody, and for having built a viable business model (in my humble opinion) on top of it.


Is there a place for a meta-memetracker, and what would be its utility?

Yesterday I came across a seed idea spreading on the FeedBlog, written by Kevin Burton (I found it using Talk Digger, of course; do you see the link to it in the blog post? That is why linking is so important). He pointed out an idea that Dave Winer gave away for free three days ago on his blog.

The idea?

“Implement a search engine that accumulates all the stories pointed to by the top meme-engines over time. That way if I think of something I saw on Tailrank or Memeorandum a year ago, I just go to the universal meme search engine, type in the phrase, and get back the hits.”

Kevin was thinking about something a little bit different: a meta-memetracker that would look like Talk Digger.

I think there is a place (at least an emerging one) for such a service, considering the growing number of memetrackers out there: TailRank, Memeorandum, Findatory, Megite, and probably others that I do not know of (yesterday I found a sort of memetracker on Rojo's main page that is really cool).

What would be the added value to users? The first thing is that you would have only one place to visit to get the top stories (obvious behavior for a meta-memetracker, no?).

However, I think a more interesting phenomenon would happen too. None of these memetrackers uses the same methods or algorithms to figure out what a good story is. Some seem to work with links and predefined lists of good information sources selected by humans, others probably use some sort of advanced natural language processing algorithms, others a mix of these two methods, and others probably use methods that I can't even think of.

All the memetrackers have one thing in common: they aggregate stories they think are good (are they performing user profiling? It could be a next step to increase the effectiveness of these services, Dave).

That said, some stories appear on all memetrackers and others on only one of them. So, if one algorithm doesn't score a specific story well, it is not really a problem, because the strength of the meta-memetracker is that it would prioritize the set of results formed by the intersection of the result sets returned by each memetracker. The meta-memetracker would thus return the best of the best stories, because the error rate of any single tracker would be smoothed out by intersecting the result sets.
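Here is a minimal sketch of that intersection idea: count how many independent memetrackers surfaced each story and rank the stories by that count. The tracker names and story URLs are placeholders.

```python
# A toy meta-memetracker: stories that several trackers agree on outrank
# stories that only a single algorithm picked up. Names and URLs are
# placeholders.
from collections import Counter

tracker_results = {
    "tailrank":    {"http://example.com/a", "http://example.com/b"},
    "memeorandum": {"http://example.com/a", "http://example.com/c"},
    "megite":      {"http://example.com/a", "http://example.com/b"},
}

votes = Counter(url for stories in tracker_results.values() for url in stories)

# Rank stories by how many independent memetrackers surfaced them.
for url, count in votes.most_common():
    print(count, url)
```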

That was my two cents.

(If you would like to read more about the socio-philosophical background of popularity, read the blog post written by Joshua Porter a couple of days ago.)


The Web as a Publishing Platform: How could we optimize the process?

I was putting my head back into my books and documents, rereading things and trying to figure out what will come next after five weeks off.

Then I re-read the transcript of a talk given by Tim Berners-Lee at a W3C meeting in London on 3 December 1997. He was talking about the evolution of the Web, and he started to discuss the concept of the Semantic Web and what it could bring to the Web.

Then I read:

“[…] One crazy aspect of the current Web use setup is that the user who wishes to publish something has to decide whether to use mailing lists, newsgroups, or the Web. The best choice in a particular case depends on the anticipated demand and likely readership pattern. A mistake can be costly. It is not always easy for a person to anticipate the demand for a particular Web page. […]”

That was 9 years ago, and it is still up to date.

The idea Tim had in mind was probably to use Semantic Web technologies to publish texts: that way, any software (agents) could use the published content the way it likes.

The good news is that this is what is happening with the emergence of web-feed technologies. It is a good experience, but there is much more to do. What would be great is to extend that blogging (publishing) / aggregating (reading) trend to everything else: news, shopping catalogs, anything else that is publishable and useful to somebody.

That way, anybody would be able to use one easy-to-use tool to publish anything, to anyone (or any web service), over the Web, without caring about anything other than the content they are publishing.
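As a small sketch of this publish-once, aggregate-anywhere idea, here is how any agent could consume such a feed, whether it carries blog posts, news items, or a shopping catalog. The feed URL is a placeholder, and I use the feedparser Python library as one possible reader.

```python
# A minimal sketch of the aggregating side: any agent can consume a feed
# without caring how or where the content was published.
# The feed URL is a placeholder.
import feedparser  # pip install feedparser

feed = feedparser.parse("http://example.com/catalog.atom")

for entry in feed.entries:
    # The same loop works for blog posts, news items, or product listings,
    # as long as the publisher exposes them as a feed.
    print(entry.title, entry.link)
```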

Utopia? I do not think so. A lot of work, for sure, but not utopia.


Search Engines are vampires that suck blood out of web pages

Jakob Nielsen wrote a great article about the relationship between search engines and web sites:

“I worry that search engines are sucking out too much of the Web’s value, acting as leeches on companies that create the very source materials the search engines index.”

Here is the problem:

“The traditional analysis has been that search engines amply return the favor by directing traffic to these sites. While there’s still some truth to that, the scenario is changing.”

Jakob articulates his idea around many facts. I would like to talk about a specific feature of search engines: the cache. I think the first search engine to cache and broadcast the content of the web pages it crawled was Google (am I right? It does not change anything to the story anyway). This is a great feature for users: even if a web page goes down, its content remains accessible forever (as long as Google, MSN, Yahoo, etc. live).

The question is: is it legal, or at least moral? The thing is that Google, MSN, or Yahoo users can browse the Web without ever leaving these search engines. I take my time to create content; a company spends money to create other content; and these search engines shamelessly take that content, index it, and broadcast it without redirecting users to those websites (most of the time they do redirect, at least I hope).

There are two visions of the situation: (1) everything on the Internet should be free (I think 40% or 50% of the Web's content is created by hobbyists who never earn a cent from it); (2) content creators always retain rights, even on the Web.

Which vision do I hold? Probably one somewhere between the two; something of common sense. However, a problem I have noted is that companies like Google will tell you what you can and cannot do with their "content" (it is their computation, but not their content, sorry) through highly restrictive terms of service. When I read such texts, something like this comes to mind: do as I say, not as I do.

As I already noted in a previous post, John Heilemann wrote in the New York Metro:

“Alan Murray wrote a column in the Wall Street Journal that called Google’s business model a new kind of feudalism: The peasants produce the content; Google makes the profits.”

My current work on Talk Digger and my recent readings in the blogosphere make me wonder: how will search engines evolve? How will people react to this new situation? And will a new type of [search] service emerge from this emerging environment?

Finally, it seems that Google’s crawlers also give some headaches to Daniel Lemire:

“Some of you who tried to access my web site in recent days have noticed that it was getting increasingly sluggish. In an earlier post, I reported that Google accounted for 25% of my page hits, sometimes much more. As it turns out, these two issues are related. Google was eating all my bandwidth.”
