July 2006 – Frederick Giasson

Norvig (Google) and Berners-Lee on the Semantic Web at AAAI 06

July 20, 2006 Frederick Giasson

Many people are talking about this piece of news (Google exec challenges Berners-Lee): some say that Tim is right; others say that Peter is right. Nobody is simply right or wrong; everything depends on your position in the environment the Semantic Web would create.

Everybody knows Tim Berners-Lee, but everybody should also know that Peter Norvig is not a second-class citizen. Together with Stuart Russell, he wrote probably the best and most comprehensive book in the field of Artificial Intelligence (Artificial Intelligence: A Modern Approach), and he is the director of research at Google.

The best blog post I have read on the subject, and one that sums up my point of view really well, is Danny Ayers’ “Incompetents Revolt!”

As reported in the CNet article:

Peter said:

“What I get a lot is: ‘Why are you against the Semantic Web?’ I am not against the Semantic Web. But from Google’s point of view, there are a few things you need to overcome, incompetence being the first,” Norvig said. Norvig clarified that it was not Berners-Lee or his group that he was referring to as incompetent, but the general user. […] “We deal with millions of Web masters who can’t configure a server, can’t write HTML. It’s hard for them to go to the next step.”

Most of what I read about that statement focused on the “incompetence of users toward Semantic Web technologies”. However, I think the most important point here is that Peter took the time to say, in effect: as the director of research at Google, a billion-dollar company, I have some reservations about the Semantic Web.

Google has some reservations about it, but why? Technical considerations? Business vision? Something else? I don’t know, and they probably don’t know either. Everybody fears the unknown; why wouldn’t Google? They do, probably because they cannot yet grasp what is at stake for their company, just like everybody else in the world.

Peter said:

“The second problem is competition. Some commercial providers say, ‘I’m the leader. Why should I standardize?’ The third problem is one of deception. We deal every day with people who try to rank higher in the results and then try to sell someone Viagra when that’s not what they are looking for. With less human oversight with the Semantic Web, we are worried about it being easier to be deceptive,” Norvig said.

Danny wrote:

“Competition and standardization – yes, certainly issues for the Web. But the companies that thrive in this environment tend to be the ones that embrace open standards. The fact is that the rest of the world is likely to be bigger than any leader. Respect the long tail.”

I would add:

Hell yes, he is right! If I put myself in the shoes of a shopkeeper, a restaurant owner, etc., do I want people to have semantic access to my information [in these cases: merchandise prices, delivery procedures, etc.]? Hell yes, I do! However, if I put myself in the shoes of a Google exec, do I want it? I am not so sure… please give me some time so I can rework my business plan accordingly.

Later, as the article reports, Tim answered Peter:

“Berners-Lee agreed with Norvig that deception on the Internet is a problem, but he argued that part of the Semantic Web is about identifying the originator of information, and identifying why the information can be trusted, not just the content of the information itself.”

Yesterday I wrote on the SIOC Google Group that “I don’t think the semweb will be crawled the way Google crawls current websites. I think the first step will be to use semweb technologies to let web services interact with trusted sources of information. From there, networks of trusted sources will emerge, and so on.”

I think people tend to forget the whole “trust” layer of the semweb when they talk about tricking semweb agents or search engines (trust relationships will be either explicit or inferred). Think about memetrackers like techmeme.com: the system started with a list of trusted bloggers and news sites, and expanded that list by adding the sources those sites trust, and so on.
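The memetracker bootstrapping described above can be pictured as a breadth-first expansion from a seed list. This is a minimal illustration only: how techmeme.com actually builds its list is not described here, and the trust graph, site names and max_hops cutoff below are all hypothetical.

```python
from collections import deque


def expand_trusted_sources(seeds, trusts, max_hops=2):
    """Breadth-first expansion of a trusted-source list.

    seeds:    sources trusted from the start (hypothetical seed list).
    trusts:   dict mapping a source to the sources it explicitly trusts
              (links to, cites); a hypothetical, hand-built graph.
    max_hops: how far trust may propagate away from the seeds.
    Returns {source: hop distance from the nearest seed}.
    """
    distance = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        source = queue.popleft()
        if distance[source] == max_hops:
            continue  # do not propagate trust beyond the cutoff
        for neighbour in trusts.get(source, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[source] + 1
                queue.append(neighbour)
    return distance


trusts = {
    "blog-a": ["blog-b", "news-c"],
    "blog-b": ["blog-d"],
    "blog-d": ["spam-site"],  # three hops from the seed: never reached
}
print(expand_trusted_sources(["blog-a"], trusts))
# {'blog-a': 0, 'blog-b': 1, 'news-c': 1, 'blog-d': 2}
```

In this sketch a source only enters the list through an explicit chain of trust from a seed, so a site nobody trusts (or one that is too far away) is never considered at all.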

But naturally, once more, Danny’s post sums up the whole point much better:

But anyhow, is Norvig really suggesting that Google are currently solving deception issues via human oversight? Whatever, proof and trust are key considerations for Semantic Web systems. The foundations are built into the theory, logic provides one route to addressing the issues. Even if that formalism is ignored and statistical approaches taken, the use of common languages makes data more amenable to analysis. Probabilities and uncertainties can be expressed as logical statements on top of the base languages. However you approach it, the answer is mo’ better data, not less.

Finally, it is neither good nor bad; it only depends on your position in the environment such a Web would create. I think we are inevitably heading in that direction; some people will simply need more time than others.

Hack for the encoding of a URL into another URL problem with Apache and mod_rewrite

July 19, 2006 Frederick Giasson

While configuring my new dedicated server to support the new generation of Talk Digger, I ran into a really strange bug arising from the interaction between PHP’s urlencode(), Apache and mod_rewrite.

It took me about a working day to figure out what the bug was and where it came from, to search for information to see whether I was the only one on earth to have it, and to fix it.

I found out that I was not the only one with this bug, but I never found a reliable source of information on how to fix it. Because I use open source software, I think it is my duty to post the fix somewhere on the Web, and that “somewhere” is my blog. Normally I do not post such technical articles, but it is an interesting bug, many people run into it, and there is no central source of information that explains how to fix it from A to Z, so I decided to take a couple of minutes to write this article.

What is the context?

I have to encode a URL into another URL.

For example, I would like to encode this URL:

www.test.com/test.com?test=1&test2=2

into this other URL:

www.foo.com/directory/www.test.com/test.com?test=1&test2=2

To do that, I have to encode the first URL; with PHP’s urlencode(), which also escapes “?”, “=” and “&”, the result is:

www.foo.com/directory/www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2
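For readers who want to reproduce the encoding outside PHP, here is a small Python sketch of the same construction. It assumes that urllib.parse.quote_plus with an empty safe set behaves like PHP’s urlencode() for these characters (both escape “/”, “?”, “=” and “&”, and turn spaces into “+”); the helper name nest_url is hypothetical.

```python
from urllib.parse import quote_plus


def nest_url(outer_prefix, inner_url):
    """Append inner_url to outer_prefix as one encoded segment.

    quote_plus(..., safe="") escapes every reserved character,
    roughly what PHP's urlencode() does.
    """
    return outer_prefix + quote_plus(inner_url, safe="")


print(nest_url("www.foo.com/directory/", "www.test.com/test.com?test=1&test2=2"))
# www.foo.com/directory/www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2
```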

What is the bug?

The problem is that when you try to apply RewriteRule(s) to such URLs using Apache 1.3 and the mod_rewrite module, mod_rewrite is unable to match any of its rules against them.

For example, if I have a rule like:

RewriteRule ^directory/(.*)/?$ directory/redirect.php?url=$1 [L]

mod_rewrite will not match the URL against the rule, even though the pattern should match it. As mentioned above, the problem lies in how URLs are encoded and decoded between Apache and mod_rewrite.
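To confirm that the rule itself is not the culprit, its pattern can be tested on its own against the decoded path. This Python sketch assumes the rule lives in a per-directory (.htaccess) context, so the leading “/” is already stripped, and that Python’s re module treats this simple pattern the same way Apache’s regex engine does; it only emulates the substitution, it does not reproduce mod_rewrite.

```python
import re

# Pattern copied from the RewriteRule above.
RULE = re.compile(r"^directory/(.*)/?$")


def rewrite(path):
    """Return the substituted target, or None if the rule does not match."""
    m = RULE.match(path)
    if m is None:
        return None
    return "directory/redirect.php?url=" + m.group(1)


print(rewrite("directory/www.test.com/test.com"))
# directory/redirect.php?url=www.test.com/test.com
print(rewrite("directory/www.test.com/test.com/"))
# directory/redirect.php?url=www.test.com/test.com/
```

A side observation from the second call: because (.*) is greedy, a trailing slash ends up inside $1 and the optional /? matches nothing.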

The explanation

The problem seems to be that the URL passed to mod_rewrite has already been decoded prematurely. With a single encoding of the URL (urlencode() in PHP), the RewriteRule(s) will not be matched if the “%2F” sequence is in the URL; and if they are matched (no %2F in the URL), the substitution will not necessarily be completed correctly.
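The premature decoding can be illustrated by simulating a single decoding pass over the single-encoded segment. This is a hypothetical model: urllib.parse.unquote stands in for whatever decoding Apache applies before mod_rewrite sees the path, and quote_plus stands in for PHP’s urlencode(); neither is Apache’s or PHP’s actual code.

```python
from urllib.parse import quote_plus, unquote

original = "www.test.com/test.com?test=1&test2=2"
single = quote_plus(original, safe="")  # stand-in for PHP's urlencode()

# One decoding pass, as Apache appears to do before mod_rewrite runs.
# unquote() rather than unquote_plus(): "+" is literal in a URL path.
seen_by_mod_rewrite = unquote(single)

print(single)
# www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2
print(seen_by_mod_rewrite)
# www.test.com/test.com?test=1&test2=2
print("%2F" in seen_by_mod_rewrite)
# False
```

In this model every escape is gone by the time the rule runs, so a substitution like redirect.php?url=$1 would see raw “?” and “&” inside $1 and the script would receive a truncated url parameter; that is one plausible reading of the incomplete substitution mentioned above.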

After identifying the problem, I found the corresponding bug entry: ASF Bugzilla Bug 34602.

It is the best source I found, but it did not contain enough to resolve the problem I had.

The simplest hack, but the ugliest!

The simplest fix is to double-encode the URL you want to include in the other URL (for example, in PHP I would encode my URL with urlencode(urlencode("www.test.com/test.com?test=1&test2=2"));). That way, everything works fine with mod_rewrite and it matches the rule.
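Modelling Apache’s premature decoding as one urllib.parse.unquote pass (an assumption, not Apache’s code), with quote_plus standing in for PHP’s urlencode(), shows why double encoding works: the first pass only peels off the outer layer, so the inner “%2F” survives. It also shows where the extra characters come from.

```python
from urllib.parse import quote_plus, unquote

original = "www.test.com/test.com?test=1&test2=2"
double = quote_plus(quote_plus(original, safe=""), safe="")

print(double)
# www.test.com%252Ftest.com%253Ftest%253D1%2526test2%253D2
print(unquote(double))  # after the premature decoding pass
# www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2
print(len(original), len(double))
# 36 56
```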

The problem with that easy fix is that it adds a lot of ugly characters to your URL. Personally, I find that unacceptable, especially since mod_rewrite is there to create beautiful URLs!

The second hack

The second fix is to re-encode the URL directly in the mod_rewrite module. We re-encode the whole URL except for the “%2F” sequence (because that is a glitch, or bug, not related to mod_rewrite but probably to Apache itself). On the client side, you have to create your own urlencode() function that encodes all characters except “/”. That way everything works as usual, except that the “/” character is not encoded.
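The idea behind this second hack, re-encoding the already-decoded path while leaving “/” alone, can be modelled in a few lines. This is only a Python illustration of the concept, assuming Apache performs one decoding pass (modelled with urllib.parse.unquote) before mod_rewrite; the real change lives in the hacked mod_rewrite.c, whose code is not reproduced here, and the name reencode_except_slash is hypothetical.

```python
from urllib.parse import quote, unquote


def reencode_except_slash(decoded_path):
    """Re-encode a decoded path, keeping "/" literal (safe="/")."""
    return quote(decoded_path, safe="/")


# Client side: "/" kept, everything else encoded.
sent = quote("www.test.com/test.com?test=1&test2=2", safe="/")
decoded = unquote(sent)  # the premature decoding pass
restored = reencode_except_slash(decoded)

print(sent)
# www.test.com/test.com%3Ftest%3D1%26test2%3D2
print(restored == sent)
# True
```

In this model, the rule is matched against a path whose only raw separators are “/”, while “?” and “&” stay escaped inside $1.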

Security related to that hack

I don’t think this fix adds a security hole, such as code injection through the URL or other possible vulnerabilities, but I will have to analyze that point further to make sure.

Future work

In the future it would be great to find where in Apache the “/” (%2F) character is prematurely decoded, or where we could encode it just before it is passed to mod_rewrite.

THE HACK

Okay, here is how to install the hack on your web server.

I only tested it on Apache 1.3.36 and mod_rewrite. I have no idea if the same problem occurs with Apache 2.

Step #1

The first step is to create your own urlencode() function that encodes a URL without encoding the “/” character. A simple PHP function that does the job could be the following (it is not really efficient, but it will do for now):

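function url_encode($url)
{
    // urlencode() escapes every reserved character, "/" included ("%2F");
    // str_replace() then turns each "%2F" back into a literal "/".
    return str_replace("%2F", "/", urlencode($url));
}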
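For comparison, here is a Python port of url_encode(), checked against the more direct approach of telling the encoder that “/” is safe. It assumes quote_plus mirrors PHP’s urlencode() for these characters. Note that the str_replace() trick cannot corrupt a literal “%2F” already present in the input, because the encoder first turns “%” into “%25”. One caveat worth checking on your own setup: urlencode() encodes spaces as “+”, which stays a literal “+” when a URL path is decoded; PHP’s rawurlencode() uses “%20” instead.

```python
from urllib.parse import quote, quote_plus, unquote


def url_encode(url):
    """Python port of the PHP url_encode(): urlencode(), then restore "/"."""
    return quote_plus(url, safe="").replace("%2F", "/")


def url_encode_direct(url):
    """Same result in one step: mark "/" as safe for the encoder."""
    return quote_plus(url, safe="/")


url = "www.test.com/test.com?test=1&test2=2"
print(url_encode(url))
# www.test.com/test.com%3Ftest%3D1%26test2%3D2

# A literal "%2F" in the input is protected: "%" becomes "%25" first.
print(url_encode("a%2Fb/c"))
# a%252Fb/c

# The space caveat: "+" is not turned back into a space by path decoding.
print(unquote(url_encode("my page")), unquote(quote("my page", safe="/")))
# my+page my page
```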

Step #2

The second step is to change the code in mod_rewrite.c to re-encode the URL.

You have to replace the mod_rewrite.c file in Apache’s source tree at [/apache_1.3.36/src/modules/standard/] with this one:

The hacked mod_rewrite.c file

Step #3

Then recompile and reinstall your Apache web server.

Finished

Everything should now work. In your server-side scripts (PHP, for example), encode your URLs with the new url_encode() function; mod_rewrite will then match the rules as expected.

The last word

I hope this little tutorial helps if you have the same problem I had. Please point out any errors, upgrades or code enhancements in the comment section of this post; it will be really appreciated!

People Aggregator, Talk Digger, the SIOC ontology and the vision of a Semantic, Interactional, Web

July 12, 2006 Frederick Giasson

In this blog post, I introduce a new web service called People Aggregator. I point out interesting aspects of their vision for People Aggregator and its relation to Talk Digger; I explain the advantages and disadvantages of developing a Web API versus annotating HTML files with RDF documents; and finally I introduce the SIOC ontology to Marc Canter.

What can you do with PeopleAggregator?

According to this slideshow, there are really interesting things you can do with People Aggregator:

Import your profile from other social networks and keep it in synch across networks
Connect, create and communicate across networks
Establish relationships, join/create groups, send messages, import/export content
Post all kinds of content, from anywhere you like: microcontent, blog posts, media, people showcases, recipes, and more
Interconnection between sites and services, using open standards
Portability of data
Tens of millions of decentralized networks outside the control of large companies

Those who have been reading my blog for some time will recognize the vision I had while developing the next generation of Talk Digger. More than that, it is the vision I have for the future of the Web. It is not only a question of openness; it is a question of interaction between communities. And to achieve that interaction between communities, we need data openness.

When I read about such features, I also think about the SIOC ontology; you can’t miss the connection.

What’s next with People Aggregator?

We need to work with web services developers to flesh out the mesh
We need to identify standards within domain areas
We’re going to build all sorts of mashups to show the potential

The only thing I can say here is: contact me, Mr. Canter; the SIOC community and I are just waiting for people and entrepreneurs like you to work with.

The current state of People Aggregator: it doesn’t do what it is supposed to do

Okay, I was really excited to read all that stuff and started to dream of People Aggregator interacting with Talk Digger and other web services.

So I created an account and… nothing. I was not able to import or export my FOAF or hCard profiles; I was not even able to delete my account.

I checked their developer wiki without success, since the API and its documentation are currently under construction.

Okay, so all these beautiful words and nothing real?
Well… yes and no. Nothing real yet, but it seems there will be in the near future:

Marc Canter wrote this comment on the Read/WriteWeb blog:

FOAF and XFN/hCard support are coming – I PROMISE! But we HAD to hit Gnomedex as a deadline – so we shipped with what we got. Remember nobody is funding this – but me.

Meanwhile there’s one subtle thing I’d disagree with Richard’s excellent article. He implies that you NEED to use PeepAgg to achieve all this.

Well today you do – but we’re hoping that the APIs and open data structures (like FOAF and XFN/hCard) will be adopted by others – so others can provide the same functionality. In fact we NEED that to happen or else we’ll be all by ourselves in our sandbox.

So we’re not saying you HAVE to use PeepAgg or that your data eventually resides inside of PeepAgg. We’re just showing the way, we’re giving the APIs that we develop to the community so they are – in fact – NOT proprietary.

And standards like Microsoft’s contact list can be meshed into our web, just as easily as FOAF or XFN. The underlying principle is of inclusion – and participation by all.

I totally understand his decision to hit the Gnomedex deadline. He had to build the front end first to attract people and potential investors at Gnomedex and other conferences. The only sad part is that his whole system and concept revolve around the API and the open data structures, which are exactly what is missing for now. However, I have no doubt that these features are coming soon.

What I really enjoy is that his vision for People Aggregator is the same as the one I have for Talk Digger.

Open documents or Web API?

With Talk Digger, I chose not to develop a Web API to let other web services access its data, at least for now. Instead, I chose to annotate all my HTML pages with RDF content (SIOC, FOAF, etc.). That way, any crawler or software agent can crawl Talk Digger’s website, read these files, and do what it wants with the data.
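How an agent finds those RDF documents is not spelled out here; one common convention is a link element with rel="meta" and type="application/rdf+xml" in the page head, pointing to the SIOC or FOAF export. The sketch below assumes that convention (it is not a description of Talk Digger’s actual markup) and uses only Python’s standard html.parser; the sample page and paths are hypothetical.

```python
from html.parser import HTMLParser


class RDFLinkFinder(HTMLParser):
    """Collect (title, href) of <link rel="meta" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Valueless attributes come through as None; normalise to "".
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "meta" in rels and a.get("type", "").lower() == "application/rdf+xml":
            self.rdf_links.append((a.get("title", ""), a.get("href", "")))


page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="meta" type="application/rdf+xml" title="SIOC" href="/sioc/conversation.rdf">
<link rel="meta" type="application/rdf+xml" title="FOAF" href="/foaf/user.rdf">
</head><body><p>A conversation page</p></body></html>"""

finder = RDFLinkFinder()
finder.feed(page)
print(finder.rdf_links)
# [('SIOC', '/sioc/conversation.rdf'), ('FOAF', '/foaf/user.rdf')]
```

An agent would then resolve each href against the page URL (for instance with urllib.parse.urljoin) and fetch the RDF; that extra request per page is the crawling cost discussed in the comparison that follows.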

People Aggregator seems to prefer using a Web API to deliver its content.

Which method is best? Neither; both have good and bad points.

The advantage of annotating HTML files with RDF content is that as soon as a crawler or software agent can find, read and interpret an annotated document in an HTML web page, it can do so everywhere: Talk Digger, LiveJournal (with its FOAF), People Aggregator (if they did the same), WordPress blogs using the SIOC ontology plugin, etc. The drawback is speed: an agent has to crawl every web page to get that content.

The advantage of a Web API is that getting the content is much faster, probably more reliable, and requests are more flexible. The drawback is that your software has to understand the API of each service: for example, if I wanted my software to interact with Talk Digger, LiveJournal and People Aggregator, I would have to develop functions for each of their APIs.

Mr. Canter, think about the SIOC ontology

Mr. Canter, do you know what the SIOC ontology is? No? Then read this:

This is exactly what you should implement in People Aggregator. What is the SIOC ontology? Here is the abstract of the talk Uldis will give about it at the BlogTalk Reloaded conference:

Semantically Interlinked Online Communities (SIOC)

http://rdfs.org/sioc/ is a framework for expressing information contained within weblogs and online community sites in a machine readable form. It consists of a SIOC ontology that defines the vocabulary used to express this information and SIOC data exporters that provide SIOC data from these sites.

Now that SIOC data export plugins are available for popular blogging and CMS platforms (e.g., Drupal, WordPress, DotClear) we can use this information to provide users with better and more interesting services. This talk describes the SIOC browser http://rdfs.org/sioc/browser – a tool, currently in development, that allows to browse the information extracted from weblogs. It can be considered the first generation of consumers of SIOC data.

Two features that distinguish SIOC are: (1) that all the entries of a weblog are exported; and (2) that all this information is in a machine readable form. This allows to make queries over the information exported from a blog or set of blogs – such as retrieving last post from a user on a given topic, identifying “hot topics”, and so on.

The browser works in two modes – on-the-fly mode and crawler mode. The former displays the SIOC data received from a weblog (thus providing a uniform interface to all SIOC-enabled weblogs) while the latter stores SIOC data in the RDF data store allowing to make more complicated queries via the use of SPARQL query language.

Since the information is published in SIOC – an open and public standard – the same information source (a weblog or a multi-user blog site) can be interpreted by many different users in a number of different ways. This enables to develop a whole kind of browsers similar to what happened with the emergence of RSS feed aggregators. The browser presented here is one of the first in this group.
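To make the kind of query the abstract mentions concrete (“retrieving last post from a user on a given topic”), here is a toy version over an in-memory list of triples. In practice this would be a SPARQL query against an RDF store, as the abstract says; the plain-Python structure is only a stand-in, the property names are modelled on SIOC (sioc:has_creator, sioc:topic) and Dublin Core (dcterms:created) and should be checked against those specifications, and all resources and dates are made up.

```python
from collections import defaultdict

# (subject, predicate, object) triples; all data here is hypothetical.
TRIPLES = [
    ("post:1", "sioc:has_creator", "user:fred"),
    ("post:1", "sioc:topic", "topic:semweb"),
    ("post:1", "dcterms:created", "2006-07-12"),
    ("post:2", "sioc:has_creator", "user:fred"),
    ("post:2", "sioc:topic", "topic:semweb"),
    ("post:2", "dcterms:created", "2006-07-20"),
    ("post:3", "sioc:has_creator", "user:fred"),
    ("post:3", "sioc:topic", "topic:apache"),
    ("post:3", "dcterms:created", "2006-07-19"),
]


def last_post(triples, user, topic):
    """Most recent post by `user` on `topic`, or None if there is none."""
    # Group the triples by subject: {subject: {predicate: {objects}}}.
    props = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        props[s][p].add(o)
    candidates = [
        (max(pr["dcterms:created"]), s)
        for s, pr in props.items()
        if user in pr["sioc:has_creator"]
        and topic in pr["sioc:topic"]
        and pr["dcterms:created"]  # skip posts without a date
    ]
    # ISO 8601 dates compare correctly as plain strings.
    return max(candidates)[1] if candidates else None


print(last_post(TRIPLES, "user:fred", "topic:semweb"))
# post:2
```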

FOAF import/export has been developed in Talk Digger, and the SIOC ontology is integrated in its next generation as well (the alpha version will be online by the end of the month). I have started discussing with Uldis and Alex how it could be used, extended, etc. This ontology is really promising and a good starting point for the Semantic Web vision that many people share.

What I am trying to do is implement all these (Semantic Web) ideas in a real-world, somewhat large-scale application. I know these technologies have real potential, but I don’t think people will start adopting them before they see that potential. So that is what I am trying to do: show them how these technologies could be used, what their advantages are, and what potential they have.

Mr. Canter, the only thing I can suggest is that you implement the SIOC ontology in People Aggregator and start talking with the (really active) SIOC community to make their vision, which is the same as yours, a reality.

Creating communities around Web conversations: Talk Digger, a Demo.

July 4, 2006 Frederick Giasson

I am pleased to show you the beta version of the help files of the next generation of Talk Digger.

For those who do not know it, Talk Digger is a new way to find, follow and join discussions evolving on the Web. So there are three elements: (1) finding discussions, (2) following discussions and (3) joining discussions.

With the current version of Talk Digger, users are stuck at step one. These improvements will let them move on to steps two and three.

With these new features, Talk Digger will become a social platform that helps people connect with others who follow the same stories (the premise is that people who follow the same discussions also share some personal and professional interests). It will also become a search engine in its own right, not only a meta-search engine.

These help files are presented like a slide show: a screenshot of what is happening on the Talk Digger web site on the left, and a description of the behavior on the right.

I am publishing this first version of the help files to show people what the next generation of Talk Digger will be and what they will be able to do with it. I would also like to get feedback: I would like to have your first impressions.

Of course, these are just screenshots and you can’t really get a feel for its usability; for that, I suggest you subscribe to the private alpha version, which will be online around the end of the month.

Here are the help sections I am publishing for now:

The Talk Digger home page.
A Talk Digger conversation page. This is the core section of the system.
A user profile page.
A user page.
A user tracking page.
The Talk Digger search engine.

I hope you will like what you will see!

Web2.0 concepts are as old as computers; so what makes them different?

July 3, 2006 Frederick Giasson

One person recently asked me this question by email:

“The key part of Web 2.0 is that there is something about these new tools that enable new practices of collaboration,” said John Seely Brown, a consultant and former chief scientist of Xerox, who spoke at the Collaborative Technology Conference in Boston last week. “Web 2.0 is a profoundly participatory medium.”
[…]

My question to experienced bloggers is, what is the something? We had the same functionality 20 years ago, and some 40 yrs ago, now touted as Web 2.0. What makes the difference in your opinions?

I answered him with:

Quickly, without thinking much about it, I would say accessibility: anybody has the power to be heard if they have something to say that is worth listening to.

But it’s more than accessibility: it’s global and “easy” to use. By reading a blog, I can follow what US soldiers are living through in Iraq, find out about the condition of women in Iran, or talk about the World Cup; I could have posted a photo of the London bombings with my cell phone if I had been there, etc. A few clicks, a connection, something to share, and something magical happens.

I have been asking myself about the emerging “Web2.0” trend. I don’t think it is a question of concepts, but rather a question of technological conjuncture: much lower hardware prices, Internet connections for everybody anywhere in the world (I got Internet access at 3,500 m above sea level, in the middle of the Himalayas, in Namche Bazaar in the Everest national park, about 15 km from the mountain itself), the emergence of scalable, high-performance open source software, protocols, systems and architectures (the LAMP architecture, for example), etc.
