While configuring my new dedicated server to support the new generation of Talk Digger, I encountered a really strange bug that emerges from the interaction between urlencode(), Apache and mod_rewrite.
It took me about a working day to figure out what the bug was and where it came from, to search for information to find out whether I was the only one on earth to have it, and to fix it.
I found out that I was not the only one to have that bug, but I never found any reliable source of information on how to fix it. Because I am using open source software, I think it is my duty to post the fix somewhere on the Web, and this “somewhere” is my blog. Normally I do not post such technical articles, but considering that it is an interesting bug, that many people experience it and that there is no central source of information that explains how to fix it from A to Z, I decided to take a couple of minutes to write this article.
What is the context?
I have to embed a URL inside another URL.
For example, I would like to embed this URL:
www.test.com/test.com?test=1&test2=2
in this other URL:
www.foo.com/directory/www.test.com/test.com?test=1&test2=2
To do that, I have to encode the first URL (with urlencode() in PHP); the result would be:
www.foo.com/directory/www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2
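As a minimal sketch of that encoding step in PHP (the URLs are the ones from the example; the variable names are only illustrative), note that urlencode() escapes not only the slash but also the “?”, “=” and “&” characters:

```php
<?php
// Illustrative variable names; the URLs are the ones used in the example.
$inner = "www.test.com/test.com?test=1&test2=2";

// urlencode() escapes "/", "?", "=" and "&", so the inner URL
// can travel as a single path segment of the outer URL.
$encoded = urlencode($inner);

// Embed the encoded URL into the outer URL.
$outer = "www.foo.com/directory/" . $encoded;

echo $outer, "\n";
// prints: www.foo.com/directory/www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2
```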
What is the bug?
The problem is that when you try to apply RewriteRule directives to these URLs using Apache (1.3) and the mod_rewrite module, mod_rewrite is not able to match any of its rules against that URL.
For example, if I have a rule like:
RewriteRule ^directory/(.*)/?$ directory/redirect.php?url=$1 [L]
mod_rewrite will not apply the rule to the URL even though the pattern should match it. The problem, as mentioned above, lies in how encoded URLs are handled between Apache and mod_rewrite.
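One way to convince yourself that the pattern is not at fault is to test it outside Apache. The sketch below uses PHP's preg_match() as a stand-in for mod_rewrite's regex engine. This is an assumption: the two engines are not identical, but the pattern uses only basic syntax. It is applied to the request path as it would look with the encoding intact:

```php
<?php
// The pattern from the RewriteRule, wrapped in "#" delimiters for preg_match().
$pattern = '#^directory/(.*)/?$#';

// Request path with the inner URL still fully encoded.
$path = 'directory/www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2';

if (preg_match($pattern, $path, $m)) {
    // $m[1] plays the role of the $1 backreference in the RewriteRule.
    echo 'matched, $1 = ', $m[1], "\n";
} else {
    echo "no match\n";
}
// prints: matched, $1 = www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2
```

The pattern matches the encoded path here, which suggests that the failure comes from what Apache hands to mod_rewrite rather than from the rule itself.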
The explanation
The problem seems to be that the URL passed to mod_rewrite has been decoded prematurely. With a single encoding (urlencode() in PHP) of a URL, the RewriteRule will not match if the URL contains the “%2F” sequence; and if the rule does match (no “%2F” in the URL), the substitution will not necessarily be completed correctly.
After identifying the problem, I found its bug entry: ASF Bugzilla Bug 34602
It is the best source I found, but it was not complete enough to resolve the problem I had.
The simplest hack, but the ugliest!
The simplest fix is to double-encode the URL you want to include in your other URL (for example, in PHP I would encode my URL with: urlencode(urlencode("www.test.com/test.com?test=1&test2=2")); ). That way, everything will work fine with mod_rewrite and it will match the rule.
The problem with that easy fix is that it adds a lot of ugly characters to your URL. Personally I find that unacceptable, especially when we know that mod_rewrite is there to create beautiful URLs!
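To make the cost of double encoding concrete, here is a small sketch (variable names are illustrative) that prints the example URL after one and after two passes of urlencode(). On the second pass every “%” from the first pass becomes “%25”:

```php
<?php
$inner = "www.test.com/test.com?test=1&test2=2";

$once  = urlencode($inner);  // single encoding
$twice = urlencode($once);   // double encoding: each "%" becomes "%25"

echo $once, "\n";
// prints: www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2
echo $twice, "\n";
// prints: www.test.com%252Ftest.com%253Ftest%253D1%2526test2%253D2

// Decoding twice recovers the original URL.
var_dump(urldecode(urldecode($twice)) === $inner);
```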
The second hack
The second fix is to re-encode the URL directly in the mod_rewrite module. We will re-encode the whole URL with the exception of the “%2F” sequence (because that glitch (bug?) is not related to mod_rewrite but probably to Apache itself). What you have to do is create your own urlencode() function that encodes all characters except “/”. That way everything will work as usual, except that the “/” character will not be encoded.
Security related to that hack
I don’t think this fix adds a security hole such as code injection through the URL, but I’ll have to analyze that point further to make sure.
Future work
In the future it would be great to find where in Apache the “/” (%2F) character is prematurely decoded, or where we could encode it just before it is passed to mod_rewrite.
THE HACK
Okay, here is how to install that hack on your web server.
I only tested it on Apache 1.3.36 and mod_rewrite. I have no idea if the same problem occurs with Apache 2.
Step #1
The first step is to create your own urlencode( ) function that will encode a url without encoding the “/” character. A simple PHP function that would do the job could be (it is really not efficient, but it will do the job for now):
function url_encode($url)
{
    // Encode everything, then restore the slashes that urlencode() escaped.
    return str_replace("%2F", "/", urlencode($url));
}
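As a quick usage sketch (the URLs are the ones from the earlier example; $inner is just an illustrative variable name), this is what the function produces for the URL we want to embed:

```php
<?php
// The url_encode() function from Step #1.
function url_encode($url)
{
    return str_replace("%2F", "/", urlencode($url));
}

$inner = "www.test.com/test.com?test=1&test2=2";

echo "www.foo.com/directory/" . url_encode($inner), "\n";
// prints: www.foo.com/directory/www.test.com/test.com%3Ftest%3D1%26test2%3D2
```

The slash stays readable while “?”, “=” and “&” remain encoded, so they are not mistaken for the query string of the outer URL.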
Step #2
The second step is to change the code in mod_rewrite.c to re-encode the url.
You have to replace the mod_rewrite.c file in Apache’s source tree at [/apache_1.3.36/src/modules/standard/] with this one:
The hacked mod_rewrite.c file
Step #3
Then you have to recompile/re-install your Apache web server.
Finished
Everything should now work fine. In your server-side scripts (PHP for example), you will have to encode your URLs with the new url_encode() function. mod_rewrite will then match the rules as expected.
The last word
I hope that this little tutorial will help you if you have the same problem I had. Please point out any errors, upgrades or code enhancements in the comment section of this post; it will be really appreciated!