tricks

TLDR: hvm.pw now works with large URLs (or files converted to base64); see this page for a demonstration of a 1.6MB file being hosted as a URL.

In the previous post I said I was allowing the URL shortener trick on hvm.pw and trying to lift the limit on URL size so that large images (or files in general) would work.

While trying to do that I found a limitation in mod_proxy. After I raised all the limits I could find in Apache and cherrypy, my URLs were still getting truncated. Through some debugging I found that Apache was passing the clients’ requests correctly and that cherrypy was handling them properly: the URLs were written to the database (PostgreSQL) and sent back in the HTTP redirect without any truncation.

The only thing left to check was the return trip of the redirect through Apache, mod_proxy, and mod_proxy_http. I was using mod_proxy and mod_proxy_http through mod_rewrite as a method to pass the requests from Apache to cherrypy (see this).

Also, to make the “image hosting” trick work I had to use an HTTP redirect (wiki), which redirects the client in the HTTP response itself, before any page content is parsed. An http-equiv redirect (wiki) would have worked for a link accessed directly by the user (i.e. by navigating to it one way or another) but fails if the link is meant to be embedded in an HTML page, because the browser never parses the response as a page when it is loading it as an image.

The HTTP redirect response looks like this:

HTTP/1.1 301 Moved Permanently
Location: http://www.example.org/
Content-Type: text/html
Content-Length: 174
 
<html>
<head>
<title>Moved</title>
</head>
<body>
<h1>Moved</h1>
<p>This page has moved to <a href="http://www.example.org/">http://www.example.org/</a>.</p>
</body>
</html>

The problem here is the “Location:” header line. After digging in the code of the Apache modules I found that the response header is read from the inner webserver (in my case, cherrypy) line by line, and each line is put into a buffer of a fixed size. That size is 20k characters. Now, that is a pretty sane limit (hence I’m not going to call this a bug), since no normal response header should have a line of that length. Still, this means I have to find another way of making Apache and cherrypy communicate.
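To make the truncation concrete, here is a minimal Python sketch of a reader that keeps only a fixed number of bytes per header line. This is not Apache's code: `read_header_lines` and `BUF_SIZE` are names made up for the illustration, the 20k figure is simply taken from the paragraph above, and the long URL is a placeholder.

```python
import io

def read_header_lines(stream, buf_size):
    """Read header lines until the blank line, keeping at most
    buf_size bytes of each line (hypothetical helper, not Apache's)."""
    lines = []
    while True:
        raw = stream.readline()
        if raw in (b"", b"\r\n", b"\n"):
            break  # end of stream or end of the header block
        line = raw.rstrip(b"\r\n")
        lines.append(line[:buf_size])  # anything past the buffer is lost
    return lines

BUF_SIZE = 20 * 1024  # assumed from the post, not checked against Apache

# Placeholder URL, far longer than the buffer.
long_url = b"http://www.example.org/" + b"A" * 50_000
response = io.BytesIO(
    b"Location: " + long_url + b"\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
)

headers = read_header_lines(response, BUF_SIZE)
print(len(headers[0]), "bytes kept of", len(b"Location: " + long_url))
```

Short lines such as Content-Type pass through untouched, which would explain why only the oversized Location value got cut.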

The solution is to use WSGI (instructions here and here), which is much more robust (and arguably the proper setup, whereas going through mod_rewrite is more of a hack).
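For readers who have not seen WSGI, the sketch below shows a bare redirect application in Python. It is not the hvm.pw code: `make_redirect_app` and `call_app` are hypothetical helpers, and the target URL is a placeholder. The point is only that a WSGI application hands its headers to the server as a Python list, instead of writing an HTTP response that a proxy then has to read back line by line.

```python
from wsgiref.util import setup_testing_defaults

def make_redirect_app(target_url):
    """Build a WSGI app that answers every request with a 301
    to target_url (hypothetical helper for illustration)."""
    def app(environ, start_response):
        body = b"Moved"
        start_response("301 Moved Permanently", [
            ("Location", target_url),
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app

def call_app(app, path="/"):
    """Call a WSGI app directly, the way a server would, and
    return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body

# Placeholder target, much longer than a typical header line.
long_target = "http://www.example.org/" + "A" * 50_000
status, headers, body = call_app(make_redirect_app(long_target))
print(status, len(headers["Location"]))
```

Whether the server behind the WSGI interface imposes its own header limits is a separate question; this only shows the application side.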

I have tested the server with large files and it seems to work fine (although somewhat slowly). An example is in this page (Firefox only, it seems). The .gif is ~1.6MB in size, which grows to ~2.2MB when converted to base64. The top image is directly embedded in the .html page and the bottom one is loaded from the URL shortener.

PS: as suggested by the title, there will be a part 2 on improving hvm.pw. I’m going to add some unique functionality (as far as I know) aimed at hosting files the proper way (not through a base64 encoded hack).

I will start this blog right off (I don’t like introductions) with a trick I found while reading thedailywtf.com.

The problem: you are posting something on the internet and you want to link or embed an image but the site you are posting to doesn’t host the images.

Obvious solution: use an image host (imgur, flickr, whatever floats your boat).

The trick (but not really a solution): encode the image into base64, give it to a URL shortener site like TinyURL and use the result in your post.

How it works:

<html>
<body>
<img src="...RXGEXSIgEsVs7wfwtXKkkseb90q04mRuO0eIkkkmak2Ps//9k=" /> <br />
<img src="http://tinyurl.com/nzpmgsm" /> <br />
</body>
</html>

There is a standard feature (the data: URI scheme) that allows images, or any file for that matter, to be embedded directly in the src attribute, as you can see in the first <img>.
The second <img> has the URL which TinyURL gave me for the shortened data above. The results can be seen in the demo page.
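If you want to produce such a src value yourself, a small Python sketch follows. `to_data_uri` is a name made up for this example, and the sample bytes are just the six-byte GIF signature rather than a real image; with a real file you would read its bytes from disk first.

```python
import base64

def to_data_uri(data, mime_type):
    """Encode raw bytes as a base64 data: URI (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

sample = b"GIF89a"  # placeholder bytes, not a complete image
uri = to_data_uri(sample, "image/gif")
img_tag = f'<img src="{uri}" />'
print(img_tag)
```

The resulting string is what gets pasted into the shortener in place of an ordinary http:// address.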

As far as I know this trick works on all browsers (if they’re up to date), but most shorteners impose a limit on the length of the URL you can shorten (duh). I actually got banned briefly by TinyURL while attempting to test said limit. The upshot is that you can probably get away with a 10k URL, which leaves about 7.5k for the image data (base64 turns every 3 bytes into 4 characters, i.e. 1/3 overhead).
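The size arithmetic can be checked with a couple of lines of Python. Both function names are made up for the illustration, and the 10k limit is only the rough figure quoted above, not a documented TinyURL value; `prefix_len` is an optional allowance for whatever non-base64 text precedes the payload in the URL.

```python
def base64_length(n_bytes):
    """Length in characters of n_bytes after base64 encoding
    (with padding): every 3 input bytes become 4 characters."""
    return 4 * ((n_bytes + 2) // 3)

def max_payload_bytes(url_limit, prefix_len=0):
    """Largest raw payload whose base64 form fits in url_limit
    characters, after reserving prefix_len characters."""
    return ((url_limit - prefix_len) // 4) * 3

print(max_payload_bytes(10_000))  # raw bytes that fit in a 10k URL
print(base64_length(7_500))       # encoded size of 7.5k of data
```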

Disclaimer: I’m not advocating the use of this trick since it puts unnecessary work on URL shorteners which usually provide the service free of charge.
Disclaimer 2: I did not come up with this trick; someone used it while posting a story on thedailywtf.com. I couldn’t find the story because it was unrelated to the trick and I don’t remember any other details.