Hack for the problem of encoding a URL into another URL with Apache and mod_rewrite

 

While configuring my new dedicated server to support the new generation of Talk Digger, I encountered a really strange bug that emerged from the interaction of urlencode(), Apache and mod_rewrite.

It took me about a working day to figure out what the bug was, where it could come from, whether I was the only one on earth to have it, how to fix it, etc.

I found out that I was not the only one to have that bug, but I never found any reliable source of information on how to fix it. Because I am using open source software, I think it is my duty to post the fix somewhere on the Web, and this “somewhere” is my blog. Normally I do not post such technical articles, but considering that it is an interesting bug, that many people experience it and that there is no central source of information that explains how to fix it from A to Z, I decided to take a couple of minutes to write this article.

 

What is the context?

I have to encode a URL into another URL.

For example, I would like to encode that url:

www.test.com/test.com?test=1&test2=2

Into that other url:

www.foo.com/directory/www.test.com/test.com?test=1&test2=2

To do that, I have to encode the first url; the result would be:

www.foo.com/directory/www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2

 

What is the bug?

The problem we have is that when you try to apply RewriteRule(s) to these URLs using Apache (1.3) and the mod_rewrite module, mod_rewrite will not be able to match any of its rules against that url.

For example, if I have a rule like:

RewriteRule ^directory/(.*)/?$ directory/redirect.php?url=$1 [L]

mod_rewrite will not apply the rule to the URL even though the URL should match it. The problem, as stated above, lies in how encoded URLs are handled between Apache and mod_rewrite.

 

The explanation

The problem seems to be that the url passed to mod_rewrite is prematurely decoded. With a single encoding of a URL (urlencode() in PHP), the RewriteRule(s) will not match if the “%2F” sequence is in the URL; if they do match (no “%2F” in the url), the substitution will not necessarily be completed correctly.

After having identified the problem, I found its bug entry: ASF Bugzilla Bug 34602

It is the best source I found, but it was not complete enough to resolve the problem I had.

 

The simplest hack, but the ugliest!

The simplest fix is to double-encode the url you want to include in your other url (for example, in PHP I would encode my url with: urlencode(urlencode("www.test.com/test.com?test=1&test2=2")); ). That way, everything will work fine with mod_rewrite and it will match the rule.

The problem with that easy fix is that it adds a lot of ugly characters to your URL. Personally I find that unacceptable, especially when we know that mod_rewrite is there to create beautiful URLs!
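To make the “ugly characters” concrete, here is a minimal sketch of what double encoding produces for the example url of this post. The variable names are only for illustration; urlencode() is the standard PHP function.

```php
<?php
// Example url from this post; the variable names are hypothetical.
$target = "www.test.com/test.com?test=1&test2=2";

// First pass: "/" becomes %2F, "?" becomes %3F, "=" becomes %3D, "&" becomes %26.
$once = urlencode($target);

// Second pass: every "%" produced by the first pass becomes %25.
$twice = urlencode($once);

echo "www.foo.com/directory/" . $once . "\n";
// www.foo.com/directory/www.test.com%2Ftest.com%3Ftest%3D1%26test2%3D2

echo "www.foo.com/directory/" . $twice . "\n";
// www.foo.com/directory/www.test.com%252Ftest.com%253Ftest%253D1%2526test2%253D2
```

The second URL is the one that survives the premature decoding, at the price of a much longer and less readable address.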

 

The second hack

The second fix is to re-encode the url directly in the mod_rewrite module. We will re-encode the whole url with the exception of the “%2F” sequence (because it is a glitch (bug?) that is not related to mod_rewrite but probably to Apache itself). What you have to do is create your own urlencode() function that encodes all characters except “/”. That way everything will work normally, except that the “/” character will not be encoded.

 

Security related to that hack

I don’t think this fix adds a security hole, if we think about code injection in URLs or other possible holes. I’ll have to analyze that point further to make sure of it.

 

Future work

In the future it would be great to find where in Apache the “/” (%2F) character is prematurely decoded, or where we could encode it just before it is passed to mod_rewrite.

 

THE HACK

Okay, here is how to install that hack on your web server.

I only tested it on Apache 1.3.36 and mod_rewrite. I have no idea if the same problem occurs with Apache 2.

 

Step #1

The first step is to create your own urlencode( ) function that will encode a url without encoding the “/” character. A simple PHP function that would do the job could be (it is really not efficient, but it will do the job for now):

// Encode a url with urlencode(), then restore the "/" characters.
function url_encode($url)
{
     return str_replace("%2F", "/", urlencode($url));
}
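As a quick usage sketch, this is what the function returns for the example url of this post. The function is repeated (behind a guard) so that the snippet runs on its own; the variable names and the www.foo.com/directory/ prefix are only taken from the example above.

```php
<?php
// Same function as in Step #1, repeated so this snippet is self-contained.
if (!function_exists('url_encode')) {
    function url_encode($url)
    {
        // urlencode() everything, then turn the encoded slashes back into "/".
        return str_replace("%2F", "/", urlencode($url));
    }
}

// Example url from this post; the variable names are hypothetical.
$target  = "www.test.com/test.com?test=1&test2=2";
$encoded = url_encode($target);

echo "www.foo.com/directory/" . $encoded . "\n";
// www.foo.com/directory/www.test.com/test.com%3Ftest%3D1%26test2%3D2
```

Only the “/” stays readable; “?”, “=” and “&” are still encoded, so they cannot be confused with the query string of the outer url.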

 

Step #2

The second step is to change the code in mod_rewrite.c to re-encode the url.

You have to replace the mod_rewrite.c file in Apache’s source tree at [/apache_1.3.36/src/modules/standard/] with this one:

The hacked mod_rewrite.c file

 

Step #3

Then you have to recompile/re-install your Apache web server.

 

Finished

Everything should now work fine. In your server-side scripts (PHP for example), you will have to encode your url with the new url_encode() function. Then mod_rewrite will match the rules as expected.

 

The last word

I hope that this little tutorial will help you if you have the same problem I had. Please point out any error, upgrade or code enhancement in the comment section of this post; it will be really appreciated!


I am neither a painter nor a writer: I am a software designer

Will software design and development ever be part of art schools? I doubt it, but why not? After all, designing pieces of software is not only technique… it’s art. Like a writer, a painter or a sculptor, software designers use tools to make their ideas real: to share them with other people.

Right now I am prototyping ideas I have for enhancing Talk Digger. My creation process is much more like that of a writer than that of a software developer, an uber-geek.

Ideas pass through my mind; I muse about their utility; I prototype them; I test them; I use them; I play with them; I talk about them with other people; I get feedback; I integrate that feedback into my ideas; I change my ideas; I change my work; I delete some prototypes; I go with the flow of my ideas and their evolution.

This is my work.

Sure there are rigid procedures in software development. Yeah there are procedures to design, develop, debug and support software development. Yeah you have to write specifications; yeah you have to perform test cases; yeah you have to do all that … and much more.

However, what I am talking about is the first process, the creative process, the one wherein the technical knowledge does not really matter, the one that will make your software usable; the one that will be so intuitive to use that nobody will think about it but will only think about what they have to do; the one that will help users save time when performing their work. No one needs technical knowledge to develop such software. What he needs is creativity: he needs to be an artist.

It is what I am doing right now. It is the reason why I am not writing that much. I am in a sort of creative mood where I prototype my ideas to make Talk Digger bigger, better and more useful. I try to do these things to achieve Talk Digger’s goal: (1) gathering the best information, (2) archiving it in the best way, (3) displaying it in a most meaningful way, (4) thinking about the best ways users could interact together, (5) having fun doing it.

What is Talk Digger’s purpose? Finding, tracking and entering conversations evolving on the Internet.

When I am prototyping some of my ideas, most of the time the “final” prototype, the one that has some real potential, is not what I was thinking about when I started developing it. Most of the time it is something that evolved during my development. It evolved with new ideas I had while writing it, using it, contemplating it. This step is crucial; otherwise I would never have had the ideas that turn a good first idea into a really good final one (that is what I think and what I hope; however, it is up to the users to confirm whether my really good final idea was in fact that good).

It is certain that when I have finished these prototypes and start developing the real system, I will use all the techniques I know to make that application scalable: it is more than essential. However, I think that some developers, and many of their bosses, forget the importance of the creative process in software development: how important the first steps are in creating an application that will reach the tipping point with users.

I hope that computer science degrees will add an “art and creativity” course to their curriculum.


Review of Foundations of Ajax

Ajax interactive interface techniques have been spreading everywhere since the arrival of Google Suggest, Google Maps, GMail and many other high-profile web applications. This is even truer considering that Ajax techniques are intimately bound to the most popular Web 2.0 applications.

Despite the fact that the term is used everywhere and that more and more web developers are using its techniques, Ajax remains mysterious, and its techniques are misused and sometimes obscure. Web sites dedicated to Ajax development are opening, and people are starting to standardize its development techniques and deployment.

If you start checking how to develop Ajax applications, and think that Ajax is a legendary Greek warrior, you will probably get lost in all the paths your searches will lead you down. It is not an easy task, at first, to know what Ajax is and is not. Many web technologies are involved in Ajax application development: JavaScript, XML, server-side programming languages, many different browser technologies, etc. Trying to put some order in all this information is a daunting task.

Mrs. Parker of Apress contacted me last week to ask me if I would like to review one of their new books: Foundations of Ajax. Considering what I said above about the current state of documentation on Ajax development techniques, I naturally said yes. I would have liked to read that book before starting to develop Talk Digger; the pain of developing the Ajax interface would have been greatly diminished. My goal in reviewing this book is to help you find the right tools and techniques to start on the right foot when developing web interfaces using Ajax techniques.

For whom is the book intended?

The book is intended for intermediate and expert web developers. The premise of the authors is that you have already developed and implemented web sites and applications. They will not spend time teaching you what a server-side programming language is or how to create a function in JavaScript; they will teach you the best Ajax techniques and the available tools.

Where did Ajax come from?

The first chapter is dedicated to the history of Ajax: where it came from and what motivated the development of technologies supporting it. The history of Ajax is the history of the evolution of web applications: CGI, Applets, Java, Netscape, the browsers war, servlets, server-side programming languages, Flash, DOM, DHTML, XML, etc.

How did Ajax emerge from this line of evolution? How should I use it? What do I need to know prior to starting to develop Ajax applications? What do I need to take into consideration when starting the design of my next Ajax application? All these questions are clearly answered by the authors.

I want asynchronism

The most important feature of Ajax techniques is the possibility given to a web browser to interact asynchronously with a web server using an object called XMLHttpRequest. What is it all about? Which possibilities does this object give you? How could you use it to enhance the usability of your web site?

Then, what do I do with that information?

Once you have mastered the asynchronous interaction between your application and a web server, you have to master the techniques to dynamically change the content of a web page. You have to take the received information and display it to the user by changing the DOM document of the web page. The authors explain how a DOM document works and how to modify it to make a web page dynamic.

Many basic Ajax techniques used on popular websites are then explained: how to validate code, how to create an auto-refreshing web page, how to display a progress bar, how to create tooltips, how to automatically update a webpage, how to access a web-service, and how to create an auto-completion combo-box.

At that point you have all the knowledge necessary to create Ajax applications. The only other thing you will need is imagination and creativity to make all these things interact together in such a way that people will say ‘Wow!’ when they use your interface.

I learned how to develop Ajax applications – now what are the tools that can help me to develop web applications with Ajax techniques?

Any developer wants a development environment with the best development tools. They want these tools to help them to quickly write better code. This is the exact purpose of the second part of the book: creating a toolbox for Ajax developers. The authors will describe the best available tools, what they are used for, and what is the best way to use them to develop Ajax applications.

You will find tools to write documentation on your JavaScript scripts, to validate the content of your HTML files, to inspect your DOM documents, and to compress and obfuscate your JavaScript code.

A whole chapter is dedicated to JSUnit: a unit-testing program for JavaScript. They explain how to install and use it. You think that you are wasting your time implementing unit tests in your applications? Well, it depends on the length and complexity of the project you are working on. However, more often than not, you will save a lot of time in the long run. It helps you validate that the new changes you make in one script do not affect any other functionality of your web application before putting the modification online. Do not forget, we are talking about a web application: as soon as you save it on the web server, the modifications are taken into account by all users.

Finally, what would a developer’s toolbox be without debugging capabilities? Nothing. JavaScript debuggers for both Firefox and Internet Explorer are described at length.

Conclusion

What I liked while reading this book is that the authors tell you the truth about the current state of Ajax application development. The technologies that support Ajax are old, but their interactions are young. I expected, and still expect, various programming errors and problems on many web browsers while developing Talk Digger or Lektora. Many times you have to use some programming tricks to make things work the same way on all the browsers available on the market. I can’t hide it from you: it is sometimes a real pain. The authors are aware of that situation and tell you so without reserve.

This is a definitive book for web developers who have to jump into the world of Ajax application development. It will show you the most effective programming techniques, tips, and tricks to create interactive web pages. It will explain to you how to use the best free tools available to create your Ajax developer’s toolbox. It is an interesting read, stripped of any fluff, with a large number of illustrations and examples.

Ajax and the Semantic Web

Ajax and the Semantic Web: currently two buzz terms; one describes a new way to create interactive web interfaces; the other describes documents in such a way that computers could “understand” their semantic meaning.

Tim Berners-Lee wrote something interesting: RDF-AJAX: 7 letters that open a window on a new world. We have two layers: one that shows things (Ajax), and the other that describes things by their semantic meaning (Semantic Web document).



You have to see the interactions of these two layers as the man-machine interactions. The Ajax layer will read a Semantic Web Document (RDF, for example) and make it human readable. The document will be computer readable by other software agents.

Big deal, you are thinking? Think about it. Right now, database information is serialized in HTML files to help humans read and understand it. Good; however, what happens if I wish to create a software agent to help me automate some processes? There is the big deal. What I want is to serialize the database information in Semantic Web formats, like RDF, instead of HTML. That way, the information held in these databases will be computer readable and understandable. Then, the problem is that I will no longer be able to read and understand these big chunks of RDF documents.

That is the utility of the Ajax layer: to make documents in RDF, or any other Semantic Web format, human readable. We could use an Ajax library that would understand RDF documents and display their content in a browser. That way a single web page could be processed by both computers and humans. The Web would no longer be composed of HTML documents, but of documents in Semantic Web formats.

Here is another view of the future Web.


Scratch notes on what a Project and Project Management are

What is a Project?

All projects are the result of a certain sort of cogitation. They are normally born from fuzzy ideas, and their realization is uncertain and far away.

Some will say that they are building or developing a project. Others will say that they are leading a project. As you can see, a project is in reality a state. This is the state project managers are in when they try to reach their objectives, with their ad hoc means, within given deadlines.



A project could then be seen as a working team that tries to be in such a state.

What Is Project Management?

Project management is the activity whose goal is to complete a project by reaching the objectives, with certain means, within given deadlines. Managing a project also means that the managers will need to take into account the constraints applied to the project and the unforeseen events that could happen, and probably will, during the realization of that project.

  • Time management. Time management helps the project manager make sure that he will be able to finish the project within the deadlines. Managing time means defining a path to follow to finish the project, and timestamping it. Together, these timestamps form a time envelope. It is that envelope that the managers will have to manage. The main tool used to manage it is the old-fashioned calendar. With it, the manager will know whether he will be able to reach the objectives of the project within the specified deadlines.
  • Resources management. At the beginning, the project will benefit from a given budget. This budget represents the means of the project. With that budget, managers will be able to buy the resources that they will need to reach the objectives of their project within the specified deadlines. Managing the resources could mean buying the material they will need to finish the project. It could also mean building effective working teams, or knowing their employees well enough to know what they are really good at and assigning them to the right tasks. All things that belong to a project are resources and are there to be managed.
  • Production management. Production management is nothing other than the path managers will follow to reach their objectives.



You need to have in mind that objectives, deadlines and means are inter-linked: they are the mind, the heart and the guts of Projects.

These definitions are applicable to all types of projects. However, if you put them into context, you could see things differently.

For example, in a software development project, the objectives will only be totally defined at the end of the project. It could seem strange, but if you think about it, software is not tangible. It is an aggregation of functions that act together to create a certain behavior. Software developers will define the functionalities of their software (potentially with their client), model it, create prototypes, and test available technologies. Then they will eventually develop it, test it, tweak it, debug it, and finally release it. But they cannot know how the final release version will really work before finishing it.
