While configuring my new dedicated server to support the new generation of Talk Digger, I encountered a really strange bug that emerged from the interaction of urlencode(), Apache and mod_rewrite.

It took me about a working day to figure out what the bug was and where it came from, to search for information on whether I was the only one on earth to have it, to fix it, etc.

I found out that I was not the only one with this bug, but I never found any reliable source of information on how to fix it. Because I use open source software, I think it is my duty to post the fix somewhere on the Web, and this “somewhere” is my blog. I do not normally post such technical articles, but since it is an interesting bug, many people run into it, and there is no central source of information explaining how to fix it from A to Z, I decided to take a couple of minutes to write this article.

 

What is the context?

I have to encode a URL into another URL.

For example, I would like to encode this URL:

www.test.com/test.com?test=1&test2=2

Into this other URL:

www.foo.com/directory/www.test.com/test.com?test=1&test2=2

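To do that, I have to encode the first URL (urlencode() also encodes the “?”, “=” and “&” characters, so the inner query string is not mistaken for the outer URL’s own); the result would be:

www.foo.com/directory/www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2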

 

What is the bug?

The problem is that when you try to apply RewriteRule(s) to these URLs using Apache (1.3) and the mod_rewrite module, mod_rewrite will not be able to match any of its rules against them.

For example, if I have a rule like:

RewriteRule ^directory/(.*)/?$ directory/redirect.php?url=$1 [L]

mod_rewrite will not match the rule against the URL, even though the URL should match it. The problem, as mentioned above, lies in how URLs are encoded and decoded between Apache and mod_rewrite.

 

The explanation

The problem seems to be that the URL passed to mod_rewrite is prematurely decoded. With a single encoding of the URL (urlencode() in PHP), the RewriteRule(s) will not be matched if the URL contains the “%2F” sequence; and if it does not (no %2F in the URL), the rule may match but the substitution will not necessarily be correct.

After identifying the problem, I found the corresponding bug report: ASF Bugzilla Bug 34602

It is the best source I found, but it was not complete enough to resolve my problem.

 

The simplest hack, but the ugliest!

The simplest fix is to double-encode the URL you want to include in your other URL (for example, in PHP I would encode my URL with urlencode(urlencode("www.test.com/test.com?test=1&test2=2"));). That way, everything works fine with mod_rewrite and the rule is matched.

The problem with this easy fix is that it adds a lot of ugly characters to your URLs. Personally, I find that unacceptable, especially since mod_rewrite is there to create beautiful URLs!

 

The second hack

The second fix is to re-encode the URL directly in the mod_rewrite module. We re-encode the whole URL except for the “%2F” sequence, because that glitch (bug?) is not related to mod_rewrite but probably to Apache itself. What you have to do is create your own urlencode() function that encodes every character except “/”. That way everything works as normal, except that the “/” character is not encoded.

 

Security related to that hack

I don’t think this fix adds a security hole, whether through code injection in the URL or in some other way, but I’ll have to analyze that point further to be sure.

 

Future work

In the future it would be great to find where in Apache the “/” (%2F) character is prematurely decoded, or where we could encode it just before it is passed to mod_rewrite.

 

THE HACK

Okay, here is how to install this hack on your web server.

I only tested it on Apache 1.3.36 and mod_rewrite. I have no idea if the same problem occurs with Apache 2.

 

Step #1

The first step is to create your own urlencode() function that encodes a URL without encoding the “/” character. A simple PHP function that does the job could be (it is really not efficient, but it will do for now):

function url_encode($url)
{
     // Encode everything with urlencode(), then turn "%2F" back into "/"
     return str_replace("%2F", "/", urlencode($url));
}

 

Step #2

The second step is to change the code in mod_rewrite.c to re-encode the url.

You have to replace the mod_rewrite.c file in Apache’s source code at [/apache_1.3.36/src/modules/standard/] with this one:

The hacked mod_rewrite.c file

 

Step #3

Then you have to recompile/re-install your Apache web server.

 

Finished

Everything should now work fine. In your server-side scripts (PHP for example), you will have to encode your URLs with the new url_encode() function. Then everything will work fine with mod_rewrite, and it will match the rules as expected.

 

The last word

I hope this little tutorial will help you if you have the same problem I had. Please point out any errors, improvements or code enhancements in the comments section of this post; it will be really appreciated!


11 thoughts on “Hack for the encoding of a URL into another URL problem with Apache and mod_rewrite”

  1. ugh same problem here. finally found your post, the bug report. can’t compile apache in this case 🙁

  2. Hi Mark,

    Okay, and what is your version of Apache and what is the error?

    Fred

  3. Thanks for the bugfix, I had the same problem here.
    (Apache 2.2.3-3.2 (debian))

    Greets Christian

  4. If you have Apache >= 2.0.46, you can use the core directive “AllowEncodedSlashes On” to allow %2F in urls

  5. The “/” issue is separate from the other generic mod_rewrite encoding issue – any URL with “%2F” in it before the query string in Apache 1.3 instantly returns a 404 before it gets to mod_rewrite, as an apparent security feature: Internal API documentation. This became a configuration option, AllowEncodedSlashes, in Apache 2.0.46, as FlorentG points out. Here’s my own thoughts and solutions on the more generic issue.

  6. Had the same issue, and double urlencoding worked great (everything is hidden behind mod_rewrite so they don’t see it anyway). Thanks.

  7. Thanks for your post. I used the “AllowEncodedSlashes On” described by FlorentG on my Apache 2.2/Win and it works!

  8. I think the / “bug” you are referring to can be explained by the Apache AllowEncodedSlashes directive.

  9. I had almost the same problem.

    My URL /myurl/searchText?accès

    was transformed to

    /myurl/searchText?accÍ>Cs

    The problem came from a rewrite rule. I added the NE flag on this rule and it fixed it !

  10. Niksa Jakovljevic

    January 13, 2011 — 5:47 am

    I had similar issue with Apache 2 and Tomcat that sits behind.
    First Apache didn’t allow %2F in URLs (it gave a 404 error), so I added the AllowEncodedSlashes directive.
    Then I found a problem with AllowEncodedSlashes because it doesn’t behave as described in the Apache doc (Apache does perform decoding automatically!), so my Tomcat receives ‘/’ instead of ‘%2F’ …. and after a wasted day I did double encoding and that was the only solution.

    Regards,

    Niksa Jakovljevic
