TLDR: hvm.pw now works with large URLs (or files converted to base64); see this page for a demonstration of a 1.6MB file being hosted as a URL.

In the previous post I said I’m allowing the URL shortener trick on hvm.pw and that I’m trying to lift the limit on the size of the URL such that large images (or files in general) would work.

While trying to do that I found a limitation in mod_proxy. After I raised all the limits I could find in Apache and cherrypy, my URLs were still getting truncated. Through some debugging I found that Apache was passing the clients’ requests correctly and that cherrypy was handling them properly: the URLs were written to the database (postgresql) and sent back in the HTTP redirect without any truncation.

The only thing left to check was the return trip of the redirect through Apache, mod_proxy, and mod_proxy_http. I was using mod_proxy and mod_proxy_http through mod_rewrite as a method to pass the requests from Apache to cherrypy (see this).

Also, to make the “image hosting” trick work I had to use an HTTP redirect (wiki), which redirects the client when the connection is initiated. An http-equiv redirect (wiki) would have worked for a link accessed directly by the user (i.e. by navigating to it one way or another) but fails if the link is meant to be embedded in an HTML page.

The HTTP redirect response looks like this:

HTTP/1.1 301 Moved Permanently
Location: http://www.example.org/
Content-Type: text/html
Content-Length: 174
 
<html>
<head>
<title>Moved</title>
</head>
<body>
<h1>Moved</h1>
<p>This page has moved to <a href="http://www.example.org/">http://www.example.org/</a>.</p>
</body>
</html>

The problem here is the “Location:” header line. After digging in the code of the Apache modules I found that the response header is read from the inner webserver (in my case, cherrypy) line by line and each line is put into a buffer of a fixed size. That size is 20k characters. Now, that is a pretty sane limit (hence I’m not going to call this a bug), since no normal response header should have a line of that length. Still, this means I have to find another way of making Apache and cherrypy communicate.
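To make the truncation concrete, here is a minimal Python sketch of a reader that copies each header line into a fixed-size buffer. It is not Apache's code; the names read_header_line and simulate are made up for illustration, and 20 * 1024 is only an assumption standing in for the "20k" figure mentioned above.

```python
import io

# Assumption: 20 * 1024 stands in for the "20k" buffer size mentioned above.
LINE_BUFFER_SIZE = 20 * 1024


def read_header_line(stream, buffer_size=LINE_BUFFER_SIZE):
    """Read one header line, keeping at most buffer_size bytes of it."""
    line = stream.readline()        # full line as sent by the inner server
    line = line.rstrip(b"\r\n")     # drop the line terminator
    return line[:buffer_size]       # anything past the buffer is lost


def simulate(location_length):
    """Return how many bytes of a Location header line survive the reader."""
    header = b"Location: " + b"A" * location_length + b"\r\n"
    return len(read_header_line(io.BytesIO(header)))
```

A short redirect target passes through untouched, while a 50,000 character target is cut down to the buffer size, which matches the symptom of the long URL arriving clamped at the client.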

The solution is using WSGI (instructions here and here) which is much more robust (and arguably the proper setup as opposed to mod_rewrite being more of a hack).
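As a rough idea of what the WSGI side looks like, here is a minimal sketch of a WSGI callable that answers a short-URL request with a 301 redirect. It is not the code running on hvm.pw: SHORT_TO_LONG is a hypothetical in-memory stand-in for the database, and the long target is a dummy payload rather than a real image. mod_wsgi looks for a callable named application by default.

```python
# Hypothetical in-memory store; the real service keeps its URLs in postgresql.
# The target is a dummy 100,000 character payload, not a real image.
SHORT_TO_LONG = {
    "klondike": "data:image/gif;base64," + "A" * 100000,
}


def application(environ, start_response):
    """Minimal WSGI app: answer a short-URL request with a 301 redirect."""
    key = environ.get("PATH_INFO", "/").lstrip("/")
    target = SHORT_TO_LONG.get(key)

    if target is None:
        body = b"Not found"
        start_response("404 Not Found", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    body = b"Moved"
    start_response("301 Moved Permanently", [
        ("Location", target),   # the whole long URL goes into one header
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because the headers are handed over as Python objects through start_response, there is no intermediate HTTP response for a proxy module to parse line by line.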

I have tested the server with large files and it seems to work fine (although somewhat slowly). An example is in this page (Firefox only, it seems). The .gif is ~1.6MB in size, which grows to ~2.2MB when converted to base64. The top image is directly embedded in the .html page and the bottom one is loaded from the URL shortener.

PS: as suggested by the title, there will be a part 2 on improving hvm.pw. I’m going to add some unique functionality (as far as I know) aimed at hosting files the proper way (not through a base64 encoded hack).

This is not a planned continuation of the previous post. I thought about making this shortening service a few times before but I admit the catalyst was the previous post.

TLDR: this is it: hvm.pw.
TLDR2: yes, the trick in the previous post is allowed but there is a limit on the size of the URL (~8kB). I’m working on raising that limit just for the heck of it.

What it does

My first plan was to create a URL shortening service which yields pronounceable URLs. That way one could tell the URL verbally to someone else. It can also help if you’re trying to copy a URL from the computer to a phone or something with cumbersome input. It’s easier to remember and write something that rolls off the tongue (klondike vs yxrkt).
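One simple way to get pronounceable codes is to write a numeric ID as a sequence of consonant+vowel syllables. The sketch below is only an illustration of that idea, not necessarily how hvm.pw generates its URLs; the letter sets and the encode/decode names are my own assumptions.

```python
CONSONANTS = "bdfgklmnprstvz"            # 14 letters, an arbitrary choice
VOWELS = "aeiou"                         # 5 letters
BASE = len(CONSONANTS) * len(VOWELS)     # 70 possible syllables


def encode(number):
    """Turn a non-negative integer into consonant+vowel syllables."""
    if number < 0:
        raise ValueError("number must be non-negative")
    syllables = []
    while True:
        number, digit = divmod(number, BASE)
        c, v = divmod(digit, len(VOWELS))
        syllables.append(CONSONANTS[c] + VOWELS[v])
        if number == 0:
            break
    # The least significant syllable was produced first, so reverse.
    return "".join(reversed(syllables))


def decode(code):
    """Inverse of encode(): turn a syllable string back into the integer."""
    if len(code) == 0 or len(code) % 2 != 0:
        raise ValueError("code must be a non-empty sequence of syllables")
    number = 0
    for i in range(0, len(code), 2):
        c = CONSONANTS.index(code[i])
        v = VOWELS.index(code[i + 1])
        number = number * BASE + c * len(VOWELS) + v
    return number
```

With 70 syllables, three syllables (six letters) already cover 343,000 IDs, and every code alternates consonants and vowels so it can be read aloud.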

The project started with just that, the shortening of URLs and producing somewhat readable URLs. During development I started to get frustrated with the number of steps I needed to take to test my service and the UI I created. I then started to improve the UI a bit.

The first step was selecting the URL when clicking on it but shortly after I changed it to selecting it on hover. Then one night it hit me that the slowest and most annoying part is selecting the text box and pasting text in there. I realised that I can (and should) streamline the process as much as possible.

Now, the user interaction looks like this:

  • user enters site
  • user pastes (using CTRL+V) the URL
  • the page automatically requests the short URL (no action required from user)
  • the resulting short URL is preselected, waiting for user to copy (using CTRL+C)
  • user copies the short URL and goes on their merry way
  • (the URL still gets selected on hover if the user accidentally clicks somewhere)

This is the least complicated flow one can have in a webpage because browsers don’t allow access to the clipboard (the location where data is stored when one copies something). There are ways to get around it using Flash or Java but those are also limited: they require some user action (a click), which makes the workaround useless.

About the implementation

I have apache running for managing virtual hosts and serving static files. The dynamic content is created in python with cherrypy. The database is postgresql (using psycopg2 in python). Since the interface is very light and simple I opted to write the html code directly in the python methods and not use any templating engine.

There are basic limitations on the number of accesses and URLs created in a short period of time.
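A limit like that can be done with a small sliding-window counter per client. The following is a minimal sketch under my own assumptions (the RateLimiter class, its parameters, and keeping the state in memory are all hypothetical), not the code used by the service.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_events per client within window_seconds."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable for testing
        self.events = defaultdict(deque)      # client id -> event timestamps

    def allow(self, client_id):
        """Record an event for client_id; return False if over the limit."""
        now = self.clock()
        timestamps = self.events[client_id]
        # Forget events that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_events:
            return False
        timestamps.append(now)
        return True
```

For example, RateLimiter(10, 60).allow(ip) would let each IP address create at most 10 URLs per minute; those numbers are placeholders, not the actual limits.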

The “image hosting” trick

Yes, I’m allowing the use of the trick described in the previous post but I ran into issues with raising the limit on the size of the URL. I’ve narrowed the problem down to mod_proxy, an apache module that I use to communicate with my cherrypy server. The solution is using mod_wsgi, a custom python interface for apache. Once I set that up, I will write another post or update this one.

It’s worth mentioning that the URLs are actually properly received and saved by the server (up to 1MB in size). It’s just that when the user opens the short URL, the large URL gets clamped to about 8kB. This is the size of the buffer mod_proxy uses when parsing the response of the inner server.

In conclusion

If you have questions or suggestions on how I can improve it, please leave them in the comments below. If you want to stress test and/or test the security you are also welcome to do so :D.

I’d especially like suggestions on colors; apparently I suck at choosing them.

The code is open source, you can find it here: https://github.com/hvm2hvm/urldable.

I will start this blog right off (I don’t like introductions) with a trick I found while reading thedailywtf.com.

The problem: you are posting something on the internet and you want to link or embed an image but the site you are posting to doesn’t host the images.

Obvious solution: use an image host (imgur, flickr, whatever floats your boat).

The trick (but not really a solution): encode the image into base64, give it to a URL shortener site like TinyURL and use the result in your post.

How it works:

<html>
<body>
<img src="...RXGEXSIgEsVs7wfwtXKkkseb90q04mRuO0eIkkkmak2Ps//9k=" /> <br />
<img src="http://tinyurl.com/nzpmgsm" /> <br />
</body>
</html>

There is a standard feature, the data: URI scheme, which allows embedding images (or any file for that matter) directly into the src attribute, as you can see in the first <img>.
The second <img> has the URL which TinyURL gave me for the shortened data above. The results can be seen in the demo page.
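For reference, building such an embedded value from a file's bytes takes only a few lines of Python. This is a minimal sketch; to_data_uri is a name I made up, and the MIME type has to match whatever file you are encoding.

```python
import base64


def to_data_uri(data, mime_type):
    """Encode raw bytes as a base64 data: URI usable in an <img> src."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded
```

Calling to_data_uri(open("picture.gif", "rb").read(), "image/gif") (picture.gif being a placeholder file name) gives the string that goes into the src attribute, or into the URL shortener.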

As far as I know this trick works on all browsers (if they’re up to date) but most shorteners impose a limit on the length of the URL you can shorten (duh). I actually got banned briefly by TinyURL while attempting to test said limit. The result is you can probably get away with a 10k URL, which means about 7.5k for the image data (base64 adds 1/3 overhead).
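The arithmetic behind that estimate: base64 turns every 3 bytes of input into 4 characters of output, so the usable payload is three quarters of whatever is left of the URL after the data: prefix. A small sketch (max_payload_bytes is a hypothetical helper, and the prefix in the usage note is only an example):

```python
def max_payload_bytes(url_limit, prefix=""):
    """Largest raw payload whose base64 form, plus prefix, fits in url_limit."""
    available = url_limit - len(prefix)
    if available < 4:
        return 0
    # Base64 output comes in groups of 4 characters, each carrying 3 bytes.
    return (available // 4) * 3
```

Ignoring the prefix, max_payload_bytes(10000) gives 7500 bytes; with a prefix such as "data:image/jpeg;base64," it drops slightly, to 7482.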

Disclaimer: I’m not advocating the use of this trick since it puts unnecessary work on URL shorteners which usually provide the service free of charge.
Disclaimer 2: I did not come up with this trick, someone used it while posting a story on thedailywtf.com. I couldn’t find the story because it was unrelated to the trick and I don’t remember other details.