Encoding Numbers as Base 36
A few days ago I went and registered the domain dashto.cc and created a really quick-n-dirty URL shortening site.
A URL shortening service takes any URL and "shortens" it. The website TinyURL is the most famous. It's being used everywhere around the web, from blog posts to tweets. Since the creation of TinyURL there have been numerous copy-cat sites.
What I really wanted to talk about briefly was how these services work. It's probably not too difficult to figure out. Basically you have a database table with an ID field and a URL field. When someone requests a short URL containing an ID, you look up the matching row, map the ID to the stored URL, and perform the redirect.
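To make the lookup-and-redirect step concrete, here is a minimal sketch. It assumes an in-memory array standing in for the database table, and the function name resolveRedirect and the example URLs are made up for illustration; a real site would query the table and send the header instead.

```php
<?php
// Hypothetical stand-in for the database table: ID => URL.
$urls = [
    1 => 'https://example.com/a/very/long/path?with=parameters',
    2 => 'https://example.org/another/long/address',
];

// Returns the HTTP status and, when the ID is known, the redirect target.
function resolveRedirect(int $id, array $urls): array
{
    if (!isset($urls[$id])) {
        return ['status' => 404, 'location' => null];
    }
    return ['status' => 301, 'location' => $urls[$id]];
}

$response = resolveRedirect(1, $urls);
// In a real request handler this would become something like:
// header('Location: ' . $response['location'], true, $response['status']);
```

Returning the status and target as data, rather than calling header() directly, just keeps the sketch easy to try from the command line.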
But notice how these sites are using the base 36 number system rather than base 10 (our decimal system). This makes it possible to create very short URLs even when the IDs in the database are huge. Base 36 is most convenient because it can be encoded using plain ASCII characters 0-9 and (case insensitive) letters A-Z. Using base 36 we can represent an ID of 1000000 (1 million) as "LFLS", which is both shorter and easier to write out than a long series of digits.
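If you are curious where "LFLS" comes from, the conversion is just repeated division by 36, with each remainder picked from the 36-character alphabet. The following is a rough sketch of that idea; toBase36 is a hypothetical helper written for this post, not a built-in.

```php
<?php
// Hypothetical helper: encode a non-negative integer as base 36
// by repeated division, collecting the remainders as digits.
function toBase36(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Expected a non-negative integer');
    }
    $digits = '0123456789abcdefghijklmnopqrstuvwxyz';
    $out = '';
    do {
        // Remainders come out least significant first, so prepend each one.
        $out = $digits[$n % 36] . $out;
        $n = intdiv($n, 36);
    } while ($n > 0);
    return $out;
}

echo toBase36(1000000), "\n"; // lfls
```

For 1000000 the remainders are 28, 21, 15 and 21, which map to s, l, f and l; read in reverse that gives "lfls".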
Since it is easy to convert between base 36 and base 10 (using PHP's built-in base_convert function), we can still take advantage of the efficient indexes database systems have to offer on integers.
$base10 = 1000000;
$base36 = base_convert($base10, 10, 36); // 'lfls'

$base36 = 'ceft';
$base10 = base_convert($base36, 36, 10); // '578585'
(Even more efficient might be to use base 64 which makes a distinction between upper and lower-case letters, but that is less user-friendly/portable.)
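As a rough illustration of the case-sensitive idea: base_convert only supports bases from 2 to 36, so anything larger has to be hand-rolled. The sketch below uses base 62 (digits, lowercase, uppercase) rather than base 64, since base 64 needs two extra symbols beyond the alphanumerics. The alphabet order and the function names base62Encode and base62Decode are my own assumptions, not any standard.

```php
<?php
// Assumed alphabet order: digits, then lowercase, then uppercase (62 symbols).
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Hypothetical helper: encode a non-negative integer as case-sensitive base 62.
function base62Encode(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('Expected a non-negative integer');
    }
    $alphabet = BASE62_ALPHABET;
    $out = '';
    do {
        $out = $alphabet[$n % 62] . $out;
        $n = intdiv($n, 62);
    } while ($n > 0);
    return $out;
}

// Hypothetical helper: decode a base 62 string back to an integer.
function base62Decode(string $code): int
{
    if ($code === '') {
        throw new InvalidArgumentException('Expected a non-empty string');
    }
    $n = 0;
    foreach (str_split($code) as $char) {
        $value = strpos(BASE62_ALPHABET, $char);
        if ($value === false) {
            throw new InvalidArgumentException("Invalid base 62 digit: $char");
        }
        $n = $n * 62 + $value;
    }
    return $n;
}

echo base62Encode(1000000), "\n"; // 4c92
```

Note that for an ID of one million the result is still four characters, the same length as base 36; the savings only show up as the IDs grow larger.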
Have you ever thought of using base 36 to encode your IDs? Do you think it is really any more user-friendly than decimal numbers? One might argue it's less user-friendly because you introduce ambiguous characters like 1/I and 0/O. But certainly for some cases it is something to consider.
Just some food for thought!