March 2006 – Page 2 – Frederick Giasson

Vast: a model for the Semantic Web

March 16, 2006 | May 21, 2006 | Frederick Giasson

In the past I talked a lot about the future of the web, the Semantic Web, and what developers and businesses have to do to make it happen.

Many people are currently blogging about a new search engine called Vast. I took a quick look at it and found an impressive service. At the moment, it has no semantic capabilities. It’s a normal search engine that crawls the web to search for specific data: cars, jobs, and profiles. It exposes this information through a REST interface, and no semantic relations between the results are available.

So, why do I say that it is a model for the Semantic Web if it has nothing to do with semantics? Because it has a crucial characteristic that most Semantic Web services need in order to make the idea of the Semantic Web work.

I have already talked about it in this blog post: sharing its content and computations freely, to anyone who needs it, without any restrictions.

This is exactly what Vast is doing:

“Use the Vast Dataset to Build and Augment Your Own Services – it’s free, open, and available for commercial and non-commercial uses!”

“Vast’s entire dataset is available for you to add to your site, blog, or service. You can rebuild all of Vast.com, if you’d like, offer targeted classified search results to your users, build visualizations or mappings, or process the data to find interesting correlations.”

So, you have an idea, and you need their data to develop it – what will you do? Take the data from their web servers, without any restriction.

The only question you need to ask yourself is: do I trust their reliability to develop my project using their free service? The answer is up to you, and it’s the sort of question developers must repeatedly ask themselves – if a new environment consisting of such web services is emerging (and I think it is).

I also asked myself what the business model of such a web service could be. They have part of the answer:

“At some point, we will accept payment for advertising embedded in our feeds. At that time, the advertising revenues will of course be shared with developers and partners using our feeds.”

“When Vast decides to embed sponsored links or advertisements in the dataset, you must display these links or ads with prominence alongside or as part of the data. However, it is our plan that you will receive a share of the revenues that you help generate for doing so.”

It is as simple as that. They embed some sort of sponsored results in their result listings, and their terms of service oblige you to display them.

However, who cares? I mean, I have no problem with ads as long as they are relevant to what I am searching for. Whether their service suggests a car found somewhere on the Web or a car result sponsored by someone else, as long as I get a car that fills my needs I do not care where it came from, and that is exactly what Vast delivers.

I can only say one thing at this point: congratulations guys for making all your data freely available to anybody, and for having built a viable business model (in my humble opinion) over it.

Memetracking and Web Feed Reading

March 13, 2006 | May 22, 2006 | Frederick Giasson

I read this post a couple of days ago while I was trying to catch up with everything that happened in the blogosphere while I was traveling. It is a really well-written and insightful post by Robert Scoble about memetrackers vs. web feed readers.

[…]

I miss my RSS reading. Reading RSS makes me smarter, not snarkier. Why? Cause I choose who I’m going to read. Pick smart people to read and you’ll get smarter.

Hint, the smartest people in my RSS are usually the least snarky. Why? Cause they could give a f**k about all the traffic.

[…]

I totally agree with Robert on this one, and it is probably the reason why I do not give much importance to memetrackers and only subscribe to their RSS feeds: I give them the same importance as any other blogger.

However, memetrackers and blog search engines have the same problem: when you try to discover new blogs and new articles that may be of interest to you, you always get the same people and the same blog posts.

Many unpopular bloggers have really good ideas. However, nobody finds them because they are not popular, and they are not popular because they don’t care at all about being popular.

The problem is that all these services generally use some sort of ranking system, the type of system popularized by Google. However, ranking systems are not built to show you the best results; they show you the most popular results on the premise that they are the best, but they often are not. So now, how can I find these bright people? How can I read their awesome ideas?

That’s what I want: I want something that helps me manage the information in such a way that it will aggregate information that may be of interest or use to me, and not necessarily the information that for whatever reason is of interest to the rest of the planet.

Yeah right … I am dreaming in technicolour … and I know that many people have been working on that problem for ages; however, I’m impatient and I can’t wait to see a real breakthrough as it unfolds in front of the general public.

In the meantime, I want to connect with and talk to people who have the same or similar interests as I do, rather than spending hours every week trying to find these people using the services currently available on the Web.

Pinging people through linking. Have you been pinged?

March 12, 2006 | May 21, 2006 | Frederick Giasson

Many of the people I have talked about in my blog posts, and to whom I linked, have left comments on those posts. Now, what I would like to check is to what degree people use services like Talk Digger to find out who is talking about them and what they are saying, and to help them make contact to talk more and/or start a new relationship.

So, what I am going to do is write a list of the names of people I read who are not subscribed to my web feed (or that I think are not). And the intention of this blog post is to ping specific people by using links on my blog.

So, if you are one of these people, it would be nice if you left a short comment on this blog post telling me that you found it via the link I made to your blog, and letting me know what you used to find it (it does not have to be Talk Digger 😉).

So here is the list:

Danah Boyd, Robert Scoble, Darren Rowse, Steve Rubel, Andy Wibbels, Toby, Scott Ginsberg, Paul Graham, Seth Godin, Jeff Cornwall, David H. Beisel, Amber Mac, Anil Dash, Daniel Lemire, Jack Vinson, Lilia Efimova, David Sifry, Matthew Hurst, John Battelle, Kevin Burton, John Tropea, David Weinberger, Michael Arrington

I have no idea what the results of this experiment will be, but I think that they could be interesting and maybe even surprising.

Better English, better blog posts

March 12, 2006 | May 22, 2006 | Frederick Giasson

You will probably notice that the English of my blog posts will improve considerably from now on. This is not magic, and no, I didn’t implant an English-language microchip in my brain. It is all thanks to the grammar corrections of Jon Husband, a good friend of mine. He told me: “Fred, if you want that I continue to read your blog, I will have to correct your posts, otherwise I stop, I can’t continue anymore!”

Okay, that is not exactly what he said, but I would have understood! Nah, Jon kindly told me that he would be willing to correct my blog posts before I publish them, so that I can learn which English errors I make habitually and thus improve my English skills faster. Naturally, I said yes to his proposal!

That said, it’s a win-win game: I will continue to improve my English skills (and there is a lot of room for that) and you will begin to read better-written English blog posts.

Thanks Jon.

Is there a place for a Meta-Memetracker and what would be its utility?

March 10, 2006 | May 22, 2006 | Frederick Giasson

I came across a seed idea yesterday on the FeedBlog, written by Kevin Burton (using Talk Digger of course; you see the link to it in the blog post? That is the reason why linking is so important). He pointed to an idea that Dave Winer gave away for free three days ago on his blog.

The idea?

“Implement a search engine that accumulates all the stories pointed to by the top meme-engines over time. That way if I think of something I saw on Tailrank or Memeorandum a year ago, I just go to the universal meme search engine, type in the phrase, and get back the hits.”
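Here is a rough Python sketch of how such an accumulating archive could work. It is only an illustration under assumptions: `MemeArchive`, its methods, and the example URLs and titles are all hypothetical, the stories would really come from crawling each memetracker, and the search only matches phrases in titles to keep things short.

```python
from datetime import date


class MemeArchive:
    """Hypothetical archive: accumulates stories seen on memetrackers over time."""

    def __init__(self):
        # url -> {"title": str, "first_seen": date, "trackers": set of tracker names}
        self._stories = {}

    def record(self, tracker, url, title, seen_on):
        """Remember that `tracker` pointed to `url` on the day `seen_on`."""
        entry = self._stories.setdefault(
            url, {"title": title, "first_seen": seen_on, "trackers": set()}
        )
        entry["trackers"].add(tracker)
        entry["first_seen"] = min(entry["first_seen"], seen_on)

    def search(self, phrase):
        """Return (url, first_seen, trackers) for titles containing the phrase, oldest first."""
        needle = phrase.lower()
        hits = [
            (url, entry["first_seen"], sorted(entry["trackers"]))
            for url, entry in self._stories.items()
            if needle in entry["title"].lower()
        ]
        return sorted(hits, key=lambda hit: hit[1])


# Made-up example data; the URLs and titles are placeholders.
archive = MemeArchive()
archive.record("Tailrank", "http://example.org/vast", "Vast opens its dataset", date(2006, 3, 16))
archive.record("Memeorandum", "http://example.org/vast", "Vast opens its dataset", date(2006, 3, 17))
archive.record("Memeorandum", "http://example.org/rss", "Why I miss my RSS reading", date(2006, 3, 13))
hits = archive.search("vast")
```

A year later, typing the phrase would still bring back the story, along with the trackers that pointed to it and the day it was first seen.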

Kevin was thinking about something a little bit different: a meta-memetracker that would look like Talk Digger.

I think that there is a place (at least an emerging one) for such a service considering the growing number of memetrackers out there: TailRank, Memeorandum, Findatory, Megite, and probably others that I do not know of. (Yesterday I found a sort of memetracker on Rojo’s main page that is really cool.)

What would be the added value to users? The first thing is that you would have only one place to visit to get the top stories (obvious behavior for a meta-memetracker, no?).

However, I think that a more interesting phenomenon would happen too. None of these memetrackers use the same methods or algorithms to find out what a good story is. Some seem to work with links and a predefined list of good information sources selected by humans, others probably use some sort of advanced natural language processing algorithms, others a mix of these two methods, and others probably use methods that I can’t think of.

All the memetrackers have one thing in common: they aggregate stories that they think are good (are they performing user profiling? It could be one next step to increase the effectiveness of these services, Dave).

That said, some stories appear on all memetrackers and others on only one of them. So, if one algorithm doesn’t score a specific story well, it is not really a problem, because the strength of the meta-memetracker is that it would prioritize the results found in the intersection of the result sets returned by each memetracker. In other words, the meta-memetracker would return the best of the best stories, because the errors of the individual algorithms would be smoothed out by intersecting the result sets.
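The intersection idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the tracker names are used as plain labels, the story URLs are made up, and `rank_by_agreement` is a hypothetical helper, not how any of these services actually works. Each story is scored by the number of memetrackers that list it, so stories in the intersection of all result sets come first.

```python
from collections import Counter


def rank_by_agreement(results_by_tracker):
    """Rank stories by how many memetrackers list them (most agreement first).

    results_by_tracker: dict mapping a tracker name to an iterable of story URLs.
    Returns a list of (story, count) pairs.
    """
    counts = Counter()
    for stories in results_by_tracker.values():
        # Use a set so a tracker listing the same story twice counts only once.
        counts.update(set(stories))
    # Sort by descending count, then by URL so that ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up example data; the URLs are placeholders.
example = {
    "TailRank": ["http://example.org/a", "http://example.org/b"],
    "Memeorandum": ["http://example.org/a", "http://example.org/c"],
    "Megite": ["http://example.org/a", "http://example.org/b"],
}
ranking = rank_by_agreement(example)
```

Counting agreement instead of taking a strict intersection keeps the stories that only one tracker found, just lower in the list, so a single algorithm missing a story does not make it disappear.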

Those were my two cents.

(If you would like to read more about the socio-philosophical background of popularity, read the blog post written by Joshua Porter a couple of days ago.)
