In my previous post I briefly mentioned a rather creative use for URL shorteners: storing the contents of arbitrary files (a.k.a. “parasitic storage”). There has been previous work in this area, namely a tool called TinyDisk that implemented a rudimentary filesystem on top of the TinyURL service. The bad news is that the site hosting the tool appears to be down, so I didn’t get to play with it.
After a deep, thorough investigation (read: 30 seconds of googling) I came to the conclusion that no source code sample was available to do this. Naturally, the solution was to roll my own! It’s a crude proof of concept, but it works. My first idea was to break the file to be uploaded into small chunks that fit within the maximum length allowed for a URL, ask TinyURL to shorten each of them, and keep the shortened URLs so the file can be downloaded later.
I started by testing the tolerance of the TinyURL API and found three interesting things. One, I wouldn’t even need to disguise the data as valid URLs, because TinyURL performs no validation of any kind – all I had to do was make sure the data was encoded in a way that wouldn’t break the HTTP requests. I chose hex encoding, but I’m sure there are more efficient encodings that would do the job nicely. Two, I could send obscenely large URLs. Sending 256 KB of POST data worked like a charm, so I chose that as the chunk size. The next power of two (512 KB) caused the TinyURL server to drop the connection without a reply, but I haven’t determined whether that was really a limitation of their server or of my Squid proxy – in any case I thought it was a good idea to leave it at that. Three, after a while of sending repeated requests the TinyURL service would stop responding for a while; I assume this is a protection against spammers, quite understandable given that the API is anonymous and doesn’t require an API key like many other services do. The fix is simply to wait a bit when this happens before making any more HTTP requests.
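To make this concrete, here’s a minimal upload sketch in Python 2 (the post already leans on urllib2, so I’ll stick with it). The api-create.php endpoint and its url parameter are assumptions on my part about TinyURL’s public API; the chunk size and the back-off-and-retry behaviour follow what I described above.

```python
import time
import urllib
import urllib2

API = 'http://tinyurl.com/api-create.php'  # assumed public "create" endpoint
CHUNK_SIZE = 256 * 1024  # 256 KB of raw data per chunk (hex encoding doubles it on the wire)

def shorten(payload):
    """POST one hex-encoded chunk and return the short URL TinyURL gives back."""
    data = urllib.urlencode({'url': payload})
    while True:
        try:
            return urllib2.urlopen(API, data).read().strip()
        except urllib2.URLError:
            # TinyURL stops responding after too many requests; back off and retry
            time.sleep(60)

def upload(path):
    """Split a file into 256 KB chunks, shorten each, return the TinyURL codes."""
    codes = []
    with open(path, 'rb') as f:
        chunk = f.read(CHUNK_SIZE)
        while chunk:
            short_url = shorten(chunk.encode('hex'))  # no URL validation, so raw hex works
            codes.append(short_url.rsplit('/', 1)[1])  # keep only the code after the slash
            chunk = f.read(CHUNK_SIZE)
    return codes
```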
Now the problem is how to get the data back. A simple GET request won’t work behind proxies, because TinyURL blindly returns our data in a Location: header, and predictably my Squid refused to parse it when I tried. The solution I found was the preview feature: changing http://tinyurl.com/somecode into http://preview.tinyurl.com/somecode returns a web page that contains the target URL, and we can parse the HTML to get the data back. It’s actually easier this way, because urllib2 follows redirections by default and that would have called for some hacks on our part. Downloading the preview page is painless, but it does introduce more overhead (a 256 KB block produces a preview page of approximately 1 MB).
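The download side, under the same caveats: the exact markup of the preview page is an assumption here (I’m just grabbing the first long run of hex digits inside an href attribute), so the regular expression would need adjusting to whatever HTML TinyURL actually serves.

```python
import re
import urllib2

def fetch_chunk(code):
    """Fetch the preview page for one TinyURL code and recover the raw chunk."""
    html = urllib2.urlopen('http://preview.tinyurl.com/' + code).read()
    # The preview page embeds the target "URL" (our hex payload) in its HTML.
    # The markup is an assumption; since the payload is pure hex, matching a
    # long hex run inside an href is a crude but workable way to pull it out.
    match = re.search(r'href="([0-9a-fA-F]{32,})"', html)
    return match.group(1).decode('hex')
```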
That’s pretty much it… when we upload a file we break it into 256 KB blocks and send them to the TinyURL servers, then we store each returned short URL in a text file. Since all shortened URLs begin the same way (http://tinyurl.com/) we only need to store the code after the slash. Then we download each block by requesting the preview page for each URL and extracting the data from the HTML.
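For completeness, a sketch of the glue that writes and reads that index file, reusing the fetch_chunk helper from the previous sketch (the filenames are just placeholders):

```python
def save_index(codes, index_path):
    # one TinyURL code per line, in chunk order
    with open(index_path, 'w') as idx:
        idx.write('\n'.join(codes) + '\n')

def download(index_path, out_path):
    # reassemble the file by fetching each chunk in order
    with open(out_path, 'wb') as out:
        for line in open(index_path):
            out.write(fetch_chunk(line.strip()))  # from the previous sketch
```

With those hypothetical names, uploading would look like save_index(upload('secret.bin'), 'secret.idx') and getting the file back would be download('secret.idx', 'recovered.bin').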