A few days ago I went and registered the domain dashto.cc and created a really quick-n-dirty URL shortening site.
A URL shortening service takes any URL and "shortens" it. The website TinyURL is the most famous. It's being used everywhere around the web, from blog posts to tweets. Since the creation of TinyURL there have been numerous copy-cat sites.
What I really wanted to talk about briefly was how these services work. It's probably not too difficult to figure out. Basically, you have a database table with an ID field and a URL field. When someone requests a short URL containing an ID, you look up the matching URL and issue a redirect.
But notice how these sites use the base 36 number system rather than base 10 (our decimal system). This makes it possible to create very short URLs even when the IDs in the database are huge. Base 36 is convenient because it can be encoded using the plain ASCII characters 0-9 and the (case-insensitive) letters A-Z. Using base 36 we can represent an ID of 1000000 (1 million) as "LFLS", which is both shorter and easier to write out than a long series of digits.
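The lookup described above might be sketched roughly as follows. This is only an illustration, not the code behind any particular service: the `urls` table, its `id` and `url` columns, the in-memory SQLite database, and the function name `resolveUrl` are all assumptions made for the example.

```php
<?php
// Hypothetical schema: a `urls` table with an integer `id` and a text `url`.
// An in-memory SQLite database stands in for the real one.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL)');
$db->exec("INSERT INTO urls (id, url) VALUES (1, 'http://example.com/a/very/long/path')");

// Map an ID to its stored URL; returns null when the ID is unknown.
function resolveUrl(PDO $db, int $id): ?string
{
    $stmt = $db->prepare('SELECT url FROM urls WHERE id = :id');
    $stmt->execute([':id' => $id]);
    $url = $stmt->fetchColumn();

    return $url === false ? null : $url;
}

// In a real request handler the ID would be taken from the requested path.
$url = resolveUrl($db, 1);

// Send the redirect, but skip it when the script is run from the command line.
if ($url !== null && PHP_SAPI !== 'cli') {
    header('Location: ' . $url, true, 301);
    exit;
}
```

Because the lookup is on an integer primary key, the database can answer it with a single index lookup.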
Since it is easy to convert between base 36 and base 10 (using PHP's built-in base_convert function), we can still take advantage of the efficient indexes database systems have to offer on integers.
$base10 = 1000000;
$base36 = base_convert($base10, 10, 36); // "lfls"

$base36 = 'ceft';
$base10 = base_convert($base36, 36, 10); // "578585"
(Even more efficient might be to use base 64 which makes a distinction between upper and lower-case letters, but that is less user-friendly/portable.)
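A case-sensitive scheme cannot rely on base_convert, which only handles bases 2 through 36. Digits plus lower- and upper-case letters give 62 symbols, so a hand-rolled conversion might look like the sketch below. The function names `base62Encode` and `base62Decode`, the constant name, and the ordering of the alphabet are assumptions for illustration; it also assumes IDs fit in a native PHP integer.

```php
<?php
// Digits, then lower-case, then upper-case letters: 62 symbols in total.
// The ordering is an arbitrary choice for this example.
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Encode a non-negative integer ID as a case-sensitive base 62 string.
function base62Encode(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('ID must be non-negative');
    }
    $code = '';
    do {
        // Peel off the least significant base 62 digit and prepend it.
        $code = BASE62_ALPHABET[$id % 62] . $code;
        $id = intdiv($id, 62);
    } while ($id > 0);

    return $code;
}

// Decode a base 62 string back to the integer ID.
function base62Decode(string $code): int
{
    if ($code === '') {
        throw new InvalidArgumentException('Code must not be empty');
    }
    $id = 0;
    foreach (str_split($code) as $char) {
        $value = strpos(BASE62_ALPHABET, $char);
        if ($value === false) {
            throw new InvalidArgumentException("Invalid character: $char");
        }
        $id = $id * 62 + $value;
    }

    return $id;
}

echo base62Encode(1000000), "\n"; // prints "4c92"
```

For 1 million the result is four characters, the same length as in base 36. The gain shows up with larger IDs: four base 62 characters cover IDs up to 14,776,335, while four base 36 characters stop at 1,679,615.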
Have you ever thought of using base 36 to encode your IDs? Do you think it is really any more user-friendly than decimal numbers? One might argue it's less user-friendly because you introduce ambiguous characters like 1/i and 0/O. But certainly for some cases it is something to consider.
Just some food for thought!
December 21st, 2009 at 10:29 pm
Nice post, you made one mistake though: base 62 makes a distinction between upper and lower-case letters; base 64 adds two more chars (not to be confused with base64_*).
Regarding the ambiguous chars, you can strip those chars out and use a smaller base (like base 52).
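As a rough sketch of that idea: build the alphabet by removing the look-alike characters, then encode against whatever symbols remain. Which characters count as ambiguous is a judgment call, so the removed set below is an assumption, as is the function name `encodeWithAlphabet`. Starting from the 36 case-insensitive symbols, this particular choice leaves 31; starting from a case-sensitive set would leave a different base.

```php
<?php
// Case-insensitive base 36 symbols with look-alikes removed.
// The set of removed characters is an assumption of this sketch.
$alphabet = str_replace(
    ['0', 'o', '1', 'i', 'l'],
    '',
    '0123456789abcdefghijklmnopqrstuvwxyz'
);

// Encode a non-negative integer using an arbitrary alphabet of unique symbols.
// The base is simply the number of symbols in the alphabet.
function encodeWithAlphabet(int $id, string $alphabet): string
{
    $base = strlen($alphabet);
    if ($id < 0 || $base < 2) {
        throw new InvalidArgumentException('Need a non-negative ID and at least two symbols');
    }
    $code = '';
    do {
        $code = $alphabet[$id % $base] . $code;
        $id = intdiv($id, $base);
    } while ($id > 0);

    return $code;
}

echo strlen($alphabet), "\n"; // prints 31
echo encodeWithAlphabet(1000000, $alphabet), "\n";
```

The trade-off is that a smaller base produces slightly longer codes for the same ID.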
Also, check out this related SO question: http://stackoverflow.com/questions/1938029/php-how-to-baseconvert-up-to-base-62