In the previous post I said that I’m allowing the URL shortener trick on hvm.pw and that I’m trying to lift the limit on URL size so that large images (or files in general) would work.
While trying to do that I found a limitation in mod_proxy. After I raised all the limits I could find in Apache and cherrypy, my URLs were still getting truncated. Through some debugging I found that Apache was passing the clients’ requests through correctly and that cherrypy was handling them properly: the URLs were written to the database (postgresql) and sent back in the HTTP redirect without any truncation.
The only thing left to check was the return trip of the redirect through Apache, mod_proxy, and mod_proxy_http. I was using mod_proxy and mod_proxy_http through mod_rewrite as a method to pass the requests from Apache to cherrypy (see this).
Also, to make the “image hosting” trick work I had to use an HTTP redirect (wiki), which redirects the client at the protocol level, as soon as the server answers the request. An http-equiv redirect (wiki) would have worked for a link accessed directly by the user (i.e. by navigating to it one way or another) but fails if the link is meant to be embedded in an HTML page, since it only takes effect once a browser renders the page that contains it.
The HTTP redirect response looks like this:
    HTTP/1.1 301 Moved Permanently
    Location: http://www.example.org/
    Content-Type: text/html
    Content-Length: 174

    <html>
    <head>
    <title>Moved</title>
    </head>
    <body>
    <h1>Moved</h1>
    <p>This page has moved to <a href="http://www.example.org/">http://www.example.org/</a>.</p>
    </body>
    </html>
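To make the structure of that response concrete, here is a minimal sketch in Python that assembles an equivalent 301 by hand. The helper name build_redirect is hypothetical and this is not how cherrypy builds its redirects internally; it only shows that the whole target URL has to fit on the single “Location:” line.

```python
import html


def build_redirect(location: str) -> bytes:
    """Assemble a raw HTTP/1.1 301 response (hypothetical helper, ASCII URLs only)."""
    safe = html.escape(location, quote=True)  # escape the URL for use inside HTML
    body = (
        "<html><head><title>Moved</title></head><body>"
        f'<h1>Moved</h1><p>This page has moved to <a href="{safe}">{safe}</a>.</p>'
        "</body></html>"
    ).encode("ascii")
    head = (
        "HTTP/1.1 301 Moved Permanently\r\n"
        f"Location: {location}\r\n"          # the entire URL lives on this one line
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"   # length of the body only, in bytes
        "\r\n"                               # blank line separates header and body
    ).encode("ascii")
    return head + body


response = build_redirect("http://www.example.org/")
print(response.decode("ascii"))
```

With a very long target URL, the “Location:” line grows with it, which is exactly where a per-line limit in a proxy would bite.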
The problem here is the “Location:” header line. After digging through the code of the Apache modules I found that the response header is read from the inner web server (in my case, cherrypy) line by line, and each line is put into a buffer of a fixed size. That size is 20k characters. Now, that is a pretty sane limit (hence I’m not going to call this a bug), since no normal response header should have a line of that length. Still, this means I have to find another way of making Apache and cherrypy communicate.
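The effect of such a fixed-size line buffer can be modelled with a few lines of Python. This is a simplified sketch of the behaviour I observed, not a translation of the Apache code: the function name read_header_lines and the tiny bufsize used in the demo are assumptions chosen for illustration, and so is the data: URL in the sample header.

```python
import io


def read_header_lines(stream, bufsize):
    """Read header lines up to the blank line, keeping at most bufsize - 1
    bytes of each one, the way a fixed-size C buffer would (simplified model)."""
    headers = []
    while True:
        line = stream.readline().rstrip(b"\r\n")
        if not line:                          # blank line (or EOF) ends the header
            break
        headers.append(line[: bufsize - 1])   # anything past the buffer is lost
    return headers


# A backend response whose Location line is longer than the buffer.
raw = (
    b"Location: data:image/gif;base64," + b"A" * 100 + b"\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
)
lines = read_header_lines(io.BytesIO(raw), bufsize=64)
for line in lines:
    print(len(line), line[:40])
```

Short header lines pass through untouched, while the long “Location:” line is silently cut at the buffer size, which matches the truncated URLs I was seeing on the client side.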
I have tested the server with large files and it seems to work fine (although somewhat slowly). An example is on this page (Firefox only, it seems). The .gif is ~1.6MB in size, which grows to ~2.2MB when converted to base64. The top image is embedded directly in the .html page and the bottom one is loaded from the URL shortener.
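The size increase comes from base64 itself, which emits 4 output characters for every 3 input bytes, so a file grows by roughly a third. A quick sketch of the arithmetic (base64_size is a hypothetical helper, and the 1,600,000-byte input is just a round stand-in for the ~1.6MB .gif):

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes of input."""
    return 4 * ((n_bytes + 2) // 3)   # every 3 bytes (rounded up) become 4 characters


raw_size = 1_600_000                  # assumed round figure for the ~1.6MB .gif
encoded_size = base64_size(raw_size)
print(encoded_size, encoded_size / raw_size)

# Cross-check the formula against the real encoder on a small input.
assert len(base64.b64encode(b"x" * 1000)) == base64_size(1000)
```

Both sizes in the paragraph above are rounded, so the ratio there only approximately matches the 4/3 factor.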
PS: as suggested by the title, there will be a part 2 on improving hvm.pw. I’m going to add some unique functionality (as far as I know) aimed at hosting files the proper way (not through a base64 encoded hack).