<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Devlog &#187; General</title>
	<atom:link href="http://devlog.info/cat/general/feed/" rel="self" type="application/rss+xml" />
	<link>http://devlog.info</link>
	<description>One developer's blog.</description>
	<lastBuildDate>Tue, 31 Aug 2010 18:45:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Encoding Numbers as Base 36</title>
		<link>http://devlog.info/2008/05/22/encoding-numbers-as-base-36/</link>
		<comments>http://devlog.info/2008/05/22/encoding-numbers-as-base-36/#comments</comments>
		<pubDate>Thu, 22 May 2008 22:19:59 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[base 10]]></category>
		<category><![CDATA[base 36]]></category>
		<category><![CDATA[decimal]]></category>

		<guid isPermaLink="false">http://devlog.info/?p=34</guid>
		<description><![CDATA[A few days ago I went and registered the domain dashto.cc and created a really quick-n-dirty URL shortening site.
A URL shortening service takes any URL and "shortens" it. The website TinyURL is the most famous. It's being used everywhere around the web, from blog posts to tweets. Since the creation of TinyURL there have been [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago I went and registered the domain <a href="http://dashto.cc/">dashto.cc</a> and created a really quick-n-dirty URL shortening site.<span id="more-34"></span></p>
<p>A URL shortening service takes any URL and "shortens" it. The website <a href="http://tinyurl.com/">TinyURL</a> is the most famous. It's being used everywhere around the web, from blog posts to <a href="http://twitter.com/">tweets</a>. Since the creation of TinyURL there have been numerous copy-cat sites.</p>
<p>What I really wanted to talk about briefly was how these services work. It's probably not too difficult to figure out. Basically you have a database table with an ID field and a URL field. When someone requests a short URL, you take the ID from it, look up the matching URL in the table and redirect the visitor there.</p>
<p>But notice how these sites use the <em>base 36</em> number system rather than base 10 (our decimal system). This makes it possible to create very short URLs even when the IDs in the database are huge. Base 36 is convenient because it is the largest base that can be written with plain ASCII using only the digits 0-9 and the (case-insensitive) letters A-Z. Using base 36 we can represent an ID of 1000000 (1 million) as "LFLS", which is both shorter and easier to write out than a long series of digits.</p>
<p>Since it is easy to convert between base 36 and base 10 (using PHP's built-in <a href="http://php.net/base_convert">base_convert</a> function), we can still store the IDs as plain integers and take advantage of the efficient integer indexes database systems offer.</p>
<pre id="raw-php-2" style="display:none; width: 1px; height: 1px; overflow: hidden;">$base10 = 1000000;
echo &quot;$base10 in base 36: &quot; . base_convert($base10, 10, 36); // lfls

$base36 = 'ceft';
echo &quot;$base36 in base 10: &quot; . base_convert($base36, 36, 10); // 578585</pre>
<div class="igBar">
<div class="wrap"><span id="lphp-2" style="float:right"><a href="#" onclick="javascript:showCodeTxt('php-2'); return false;">Plain Text</a></span><span class="langName">PHP:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="php-2">
<div class="php">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$base10</span> = <span style="color:#CC66CC;color:#800000;">1000000</span>;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><a href="http://www.php.net/echo"><span style="color:#000066;">echo</span></a> <span style="color:#FF0000;">"$base10 in base 36: "</span> . <a href="http://www.php.net/base_convert"><span style="color:#000066;">base_convert</span></a><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#0000FF;">$base10</span>, <span style="color:#CC66CC;color:#800000;">10</span>, <span style="color:#CC66CC;color:#800000;">36</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// lfls</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$base36</span> = <span style="color:#FF0000;">'ceft'</span>;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><a href="http://www.php.net/echo"><span style="color:#000066;">echo</span></a> <span style="color:#FF0000;">"$base36 in base 10: "</span> . <a href="http://www.php.net/base_convert"><span style="color:#000066;">base_convert</span></a><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#0000FF;">$base36</span>, <span style="color:#CC66CC;color:#800000;">36</span>, <span style="color:#CC66CC;color:#800000;">10</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// 578585 </span></div>
</li>
</ol>
</div>
</div>
</div>
</div>
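<p>To make the lookup concrete, here is a minimal PHP sketch (assuming PHP 7.1 or later) of the round trip a shortening service performs. The array $urls stands in for the database table, and the function names shortCode() and resolve() are hypothetical, chosen for illustration. The code is validated before conversion because base_convert() ignores characters that are not valid digits instead of failing.</p>

```php
<?php
// Minimal sketch of a URL shortener's lookup. $urls is a hypothetical
// stand-in for the database table (ID field => URL field), and shortCode()
// and resolve() are illustrative names, not part of any library.
$urls = [
    1000000 => 'http://devlog.info/2008/05/22/encoding-numbers-as-base-36/',
    578585  => 'http://tinyurl.com/',
];

// Turn an integer ID into the base 36 code used in the short URL.
function shortCode(int $id): string
{
    return base_convert((string) $id, 10, 36);
}

// Map a short code back to its stored URL, or null if there is none.
function resolve(string $code, array $urls): ?string
{
    // base_convert() skips characters that are not base 36 digits, so reject
    // anything other than 0-9 and a-z (either case) up front. The D modifier
    // stops $ from also accepting a trailing newline.
    if (preg_match('/^[0-9a-z]+$/iD', $code) !== 1) {
        return null;
    }
    $id = (int) base_convert(strtolower($code), 36, 10);
    return $urls[$id] ?? null;
}

echo shortCode(1000000), "\n";     // lfls
echo resolve('LFLS', $urls), "\n"; // http://devlog.info/2008/05/22/encoding-numbers-as-base-36/
```

<p>In a real script you would then send the visitor on with something like header('Location: ' . $url). Keep in mind that the PHP manual warns that base_convert() may lose precision on very large numbers, because it uses floating-point values internally.</p>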
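<p>(Base 64, which distinguishes between upper- and lowercase letters, would produce even shorter strings, but it is less user-friendly and less portable: case-sensitive codes are harder to read out or type correctly, and the standard base 64 alphabet also needs two punctuation characters, + and /, which have special meanings in URLs.)</p>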
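<p>PHP's base_convert() only supports bases 2 through 36, so going further means writing a small encoder yourself. The sketch below is an illustration, not a standard scheme: the 64-character alphabet (digits, lowercase, uppercase, then '-' and '_', picked to be safe in URLs) and the names ALPHABET, encodeId() and decodeId() are assumptions. It works like any base conversion: divide by the base repeatedly and use each remainder as an index into the alphabet.</p>

```php
<?php
// Sketch of encoding IDs with a custom 64-character alphabet, since
// base_convert() only supports bases 2 to 36. This URL-safe alphabet
// (digits, a-z, A-Z, '-' and '_') is an assumption for illustration.
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_';

// Convert a non-negative ID by repeatedly dividing by the base and
// using each remainder as an index into the alphabet.
function encodeId(int $id, string $alphabet = ALPHABET): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('IDs must be non-negative');
    }
    $base = strlen($alphabet);
    $code = '';
    do {
        $code = $alphabet[$id % $base] . $code;
        $id = intdiv($id, $base);
    } while ($id > 0);
    return $code;
}

// Reverse it: each character's position in the alphabet is its digit value.
function decodeId(string $code, string $alphabet = ALPHABET): int
{
    $base = strlen($alphabet);
    $id = 0;
    foreach (str_split($code) as $char) {
        $digit = strpos($alphabet, $char);
        if ($digit === false) {
            throw new InvalidArgumentException("Invalid character: $char");
        }
        $id = $id * $base + $digit;
    }
    return $id;
}

echo encodeId(1000000000), "\n";               // XCIE0
echo base_convert('1000000000', 10, 36), "\n"; // gjdgxs
```

<p>One billion comes out as five characters instead of six, but "XCIE0" and "xcie0" are now different IDs, which is exactly the portability trade-off mentioned above.</p>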
<p>Have you ever thought of using base 36 to encode your IDs? Do you think it is really any more user-friendly than decimal numbers? One might argue it's <em>less</em> user-friendly because it introduces easily confused characters like 1/I and 0/O. But for some cases it is certainly worth considering.</p>
<p>Just some food for thought!</p>
]]></content:encoded>
			<wfw:commentRss>http://devlog.info/2008/05/22/encoding-numbers-as-base-36/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Basic Regular Expressions</title>
		<link>http://devlog.info/2007/12/27/regular-expressions/</link>
		<comments>http://devlog.info/2007/12/27/regular-expressions/#comments</comments>
		<pubDate>Thu, 27 Dec 2007 18:00:00 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[regular expressions]]></category>

		<guid isPermaLink="false">http://devlog.info/2007/12/27/regular-expressions/</guid>
		<description><![CDATA[Today I want to talk about regular expressions (usually referred to as regex or regexp). No matter what application you are creating, chances are you will need to parse text in some way. It might be for validating user input or for extracting information from a string of data in some arbitrary format. I have [...]]]></description>
			<content:encoded><![CDATA[<p>Today I want to talk about regular expressions (usually referred to as <em>regex</em> or <em>regexp</em>). No matter what application you are creating, chances are you will need to parse text in some way. It might be for validating user input or for extracting information from a string of data in some arbitrary format. I have yet to work on any project where regex was not required.<span id="more-25"></span></p>
<h2>About Regular Expressions</h2>
<p>Regex is a powerful language used to process text. It lets you define a <em>pattern</em> that a regex engine uses to examine a string of data: the engine applies your pattern to the supplied string and reports whether, and where, the text matches. What you do with regular expressions depends on the circumstances:</p>
<ul>
<li>Matching / Counting: Check whether a string matches a pattern. For example, check that the user entered a correctly formatted email address.</li>
<li>Replacement: Replace the parts of a string that match a pattern with something else. For example, converting BBCode into HTML.</li>
<li>Extraction: Extract parts of a string. For example, you might want to extract all of the href attributes in an HTML document.</li>
</ul>
<p>In this post I am only going over the regex language briefly; see the end for some links to further reading. This article is simply a precursor to another post I wanted to write on using regex with PHP.</p>
<h2>The Basics</h2>
<p>A regular expression <em>pattern</em> is made up of several simple parts:</p>
<ul>
<li>Characters: The actual characters you want to match. You can insert literal strings like "chris", or define a list of characters ("only a, e, i, o and u"), or use the wildcard meta-character to match anything. There are also sets of character types you can use like "any digit" or "any whitespace character".</li>
<li>Alternation: Used to define a set of alternatives like "chris or christopher or christoph".</li>
<li>Quantification: Used to specify how many times a character or group of characters should appear. For example, "exactly one of a, e, i, o or u".</li>
<li>Grouping: Group parts of a pattern into larger units, either to limit the scope of an alternation, to apply a quantifier to the whole unit, or to capture the matched text for reuse.</li>
<li>Assertions: A test applied to the text to the left or right (that is, before or after) of the current matching position, without consuming it. This makes it possible to write patterns like "chris not followed by 'topher'".</li>
<li>Anchoring: Anchoring a pattern to the start or end of a string lets you define the context, that is, where you want the pattern to match. For example, "match 'chris' at the beginning of the string".</li>
</ul>
<h3>Characters</h3>
<p>There are several ways you can define characters that you want to match.</p>
<p>The first way, of course, is a literal string:</p>
<pre id="raw-code-21" style="display:none; width: 1px; height: 1px; overflow: hidden;">chris</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-21" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-21'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-21">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">chris </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
This matches any string that contains "chris", such as "chris", "christopher" or "thechris". Matching is case-sensitive by default, so "Chris" would not match.</p>
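<p>If you want to try patterns like this yourself, PHP's preg_match() returns 1 when the pattern occurs anywhere in the subject and 0 when it doesn't. Here is a quick check of the examples above; the slashes are PHP's pattern delimiters rather than part of the regex, and the extra subjects "Chris" and "chri" are added for contrast.</p>

```php
<?php
// preg_match() returns 1 if the pattern is found anywhere in the subject
// and 0 if not. The slashes are PHP's delimiters, not part of the regex.
$results = [];
foreach (['chris', 'christopher', 'thechris', 'Chris', 'chri'] as $subject) {
    $results[$subject] = preg_match('/chris/', $subject) === 1;
    echo $subject, ': ', $results[$subject] ? 'match' : 'no match', "\n";
}

// The i modifier after the closing delimiter makes matching case-insensitive.
echo 'Chris (with /i): ', preg_match('/chris/i', 'Chris') ? 'match' : 'no match', "\n";
```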
<p>The second way is to define a character class. A character class is a list of characters, any one of which (or, if the class is negated, any character <em>except</em> those) can match a single character of the string:</p>
<pre id="raw-code-22" style="display:none; width: 1px; height: 1px; overflow: hidden;">[aeiou]
[^aeiou]
[a-z0-9]
[a-z0123456789]
[a-z0-9\-]
</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-22" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-22'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-22">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>aeiou<span style="color:#006600; font-weight:bold;">&#93;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>^aeiou<span style="color:#006600; font-weight:bold;">&#93;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>a-z0-<span style="color:#800000;color:#800000;">9</span><span style="color:#006600; font-weight:bold;">&#93;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>a-z0123456789<span style="color:#006600; font-weight:bold;">&#93;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>a-z0-<span style="color:#800000;color:#800000;">9</span>\-<span style="color:#006600; font-weight:bold;">&#93;</span> </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
The first will match any vowel. The caret (^) at the start of the second example means "not", so it makes the class match any character <em>but</em> a vowel. In the third example you can see the dash, which creates a range of characters: "a-z" means "any letter from a to z", just like "0-9" means "any digit from 0 to 9". Thus, the third and fourth patterns mean exactly the same thing. If you want to insert a literal dash (i.e. "match a dash character") you must escape it with a backslash, as demonstrated in the last example. (A dash placed first or last in the class is also treated literally in most regex flavors.)</p>
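<p>A quick way to see the difference between these classes is to run each one against a few single characters. This PHP sketch wraps each class in the ^ and $ anchors (outside the brackets, where they mean "start" and "end" rather than "not") so the class has to match the whole sample; the sample characters are arbitrary.</p>

```php
<?php
// Try each character class against single-character samples. The ^ and $
// anchors (outside the brackets) make each class match the whole sample;
// inside the brackets, a leading ^ means "not".
$patterns = ['[aeiou]', '[^aeiou]', '[a-z0-9]', '[a-z0123456789]', '[a-z0-9\-]'];
$samples  = ['e', 'x', '7', '-'];

$table = [];
foreach ($patterns as $pattern) {
    foreach ($samples as $sample) {
        $table[$pattern][$sample] = preg_match('/^' . $pattern . '$/', $sample) === 1;
    }
    // Print the samples each class accepts, e.g. "[aeiou]: e".
    echo $pattern, ': ', implode(' ', array_keys(array_filter($table[$pattern]))), "\n";
}
```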
<p>The third way is to use the wildcard character, or a pre-defined character type:</p>
<pre id="raw-code-23" style="display:none; width: 1px; height: 1px; overflow: hidden;">.
\w
\d</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-23" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-23'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-23">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">.</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">\w</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">\d </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
The first pattern (the dot) matches any single character (except a newline, unless you tell the engine otherwise). The second pattern is a special escape sequence that means "any word character" (letters, digits and the underscore). The final pattern is another escape sequence that means "any digit" (that is, 0-9).</p>
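<p>The same approach works for the dot and the escape sequences. This PHP sketch labels the space and newline samples so they show up readably in the output; it also shows the s modifier, which in PHP makes the dot match newlines too.</p>

```php
<?php
// Compare ., \w and \d on single characters. Labels are used as keys so
// the space and newline are visible in the output.
$samples = ['a' => 'a', '_' => '_', '5' => '5', 'space' => ' ', 'newline' => "\n"];
$results = [];
foreach (['.', '\w', '\d'] as $pattern) {
    foreach ($samples as $label => $sample) {
        // Anchored so the pattern has to match the whole one-character sample.
        $results[$pattern][$label] = preg_match('/^' . $pattern . '$/', $sample) === 1;
    }
    echo $pattern, ': ', implode(' ', array_keys(array_filter($results[$pattern]))), "\n";
}

// The s modifier lets the dot match a newline as well.
echo preg_match('/^.$/s', "\n"), "\n"; // 1
```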
<p>You can combine these to make up a fairly complex pattern:</p>
<pre id="raw-code-24" style="display:none; width: 1px; height: 1px; overflow: hidden;">\wchris\d</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-24" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-24'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-24">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">\wchris\d </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
This would match "tchris9" and "zchris7", but not "chris", "chris98" or "zchris". You might wonder why the latter strings would not match. It is because we have not defined any rules for repetition, so the pattern literally means "a single word character, followed by the string 'chris', followed by a single digit". "chris" and "chris98" have no word character before the "c", and "zchris" has no digit after the "s".</p>
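<p>Here are the strings from the paragraph above run through PHP's preg_match(), plus one extra subject, "my_chris2go", which shows that an unanchored pattern can match in the middle of a longer string (the underscore counts as a word character).</p>

```php
<?php
// Test the combined pattern against the strings from the text, plus one
// longer string to show that an unanchored pattern can match anywhere inside it.
$results = [];
foreach (['tchris9', 'zchris7', 'chris', 'chris98', 'zchris', 'my_chris2go'] as $subject) {
    $results[$subject] = preg_match('/\wchris\d/', $subject) === 1;
    echo $subject, ': ', $results[$subject] ? 'match' : 'no match', "\n";
}

// The third argument receives the matched text; $m[0] is the whole match.
preg_match('/\wchris\d/', 'my_chris2go', $m);
echo $m[0], "\n"; // _chris2
```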
<h3>Alternation</h3>
<p>Alternation is a simpler concept: you use the pipe character (|) to separate alternative expressions:</p>
<pre id="raw-code-25" style="display:none; width: 1px; height: 1px; overflow: hidden;">chris|christopher|christoph
color|colour
</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-25" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-25'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-25">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">chris|christopher|christoph</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">color|colour </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
You will see some more useful examples of alternation soon when we talk about grouping.</p>
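<p>A small PHP example: preg_match_all() finds every match in a string and puts the matched text in the first element of its third argument. The second half illustrates a detail worth knowing: a backtracking engine such as PHP's PCRE tries the alternatives from left to right and takes the first one that works, so chris|christopher|christoph matched against "christopher" only matches "chris".</p>

```php
<?php
// preg_match_all() returns the number of matches and stores every matched
// string in $matches[0].
$text = 'The colour of the sky is a color too.';
$count = preg_match_all('/color|colour/', $text, $matches);
echo $count, ': ', implode(', ', $matches[0]), "\n"; // 2: colour, color

// Alternatives are tried left to right, and the first one that fits wins.
preg_match('/chris|christopher|christoph/', 'christopher', $m);
echo $m[0], "\n"; // chris
```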
<h3>Quantification</h3>
<p>There are three ways you can define how many times a character should appear.</p>
<p>The first way is to provide no quantifier at all, which means "exactly once":</p>
<pre id="raw-code-26" style="display:none; width: 1px; height: 1px; overflow: hidden;">[a-z]
[a-zA-Z]</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-26" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-26'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-26">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>a-z<span style="color:#006600; font-weight:bold;">&#93;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>a-zA-Z<span style="color:#006600; font-weight:bold;">&#93;</span> </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
The first pattern means "one lowercase letter" and the second means "one lower or uppercase letter".</p>
<p>The second way is by using a meta-character. There are three different meta-characters to choose from:</p>
<pre id="raw-code-27" style="display:none; width: 1px; height: 1px; overflow: hidden;">[a-z]*
u?
\d+</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-27" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-27'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-27">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>a-z<span style="color:#006600; font-weight:bold;">&#93;</span>*</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">u?</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">\d+ </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<ul>
<li>The asterisk (*) means "zero or more times". The first pattern means "any lowercase letter, any number of times (including none)".</li>
<li>The question mark (?) means "zero or one". The second pattern means "u is optional".</li>
<li>The plus sign (+) means "one or more". The third pattern means "any digit one or more times".</li>
</ul>
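<p>To see the three quantifiers in action, this sketch uses a small hypothetical helper, fullMatch(), which anchors a pattern with ^ and $ so it has to match the entire subject. The sample subjects are made up for illustration.</p>

```php
<?php
// Hypothetical helper: anchor the pattern so it must match the whole subject.
function fullMatch(string $pattern, string $subject): bool
{
    return preg_match('/^' . $pattern . '$/', $subject) === 1;
}

var_dump(fullMatch('[a-z]*', ''));         // bool(true): zero letters is fine
var_dump(fullMatch('colou?r', 'color'));   // bool(true): the u is optional
var_dump(fullMatch('colou?r', 'colouur')); // bool(false): at most one u
var_dump(fullMatch('\d+', ''));            // bool(false): + needs at least one digit
var_dump(fullMatch('\d+', '2007'));        // bool(true)
```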
<p>The third way is by explicitly defining the minimum and maximum times the character can be repeated:</p>
<pre id="raw-code-28" style="display:none; width: 1px; height: 1px; overflow: hidden;">[a-z]{1,5}
u{1}
\d{0,5}</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-28" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-28'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-28">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#91;</span>a-z<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">1</span>,<span style="color:#800000;color:#800000;">5</span><span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">u<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">1</span><span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">\d<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">0</span>,<span style="color:#800000;color:#800000;">5</span><span style="color:#006600; font-weight:bold;">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
The format is <em>{min,max}</em>. You can leave out the maximum to set only a lower limit ({2,} means "two or more times"), or give a single number ({3}) to mean "exactly that many times". The first pattern means "any lowercase letter 1 to 5 times". The second pattern means "exactly 1 u". The third pattern means "at most 5 digits" (since the minimum is zero, this also matches no digits at all!). Not every regex flavor lets you leave out the minimum; older versions of PCRE, the library behind PHP's preg functions, treat \d{,5} as a digit followed by the literal text "{,5}", so write {0,5} instead.</p>
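<p>The same hypothetical fullMatch() helper can exercise the {min,max} forms. This sketch writes "at most five digits" as {0,5}, since not every regex flavor accepts a quantifier with the minimum left out; the subjects are made up for illustration.</p>

```php
<?php
// Hypothetical helper: anchor the pattern so it must match the whole subject.
function fullMatch(string $pattern, string $subject): bool
{
    return preg_match('/^' . $pattern . '$/', $subject) === 1;
}

$checks = [
    ['[a-z]{1,5}', 'chris'],       // five letters: within the limit
    ['[a-z]{1,5}', 'christopher'], // eleven letters: too many
    ['u{1}', 'u'],                 // exactly one u
    ['\d{2,}', '2007'],            // no maximum: two or more digits
    ['\d{2,}', '7'],               // one digit is below the minimum
    ['\d{0,5}', ''],               // a minimum of zero accepts no digits at all
];
foreach ($checks as [$pattern, $subject]) {
    printf("%-12s %-14s %s\n", $pattern, "'$subject'", fullMatch($pattern, $subject) ? 'match' : 'no match');
}
```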
<h3>Grouping</h3>
<p>Grouping characters is done with parentheses. There are three situations where you might want to group characters together.</p>
<p>The first is to define the scope of an alternation:</p>
<pre id="raw-code-29" style="display:none; width: 1px; height: 1px; overflow: hidden;">\wchris|christopher\d
\w(chris|christopher)\d</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-29" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-29'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-29">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">\wchris|christopher\d</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">\w<span style="color:#006600; font-weight:bold;">&#40;</span>chris|christopher<span style="color:#006600; font-weight:bold;">&#41;</span>\d </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
Compare these two patterns. Because the pipe applies to everything on either side of it, the first means "a word character followed by 'chris'" OR "'christopher' followed by a digit". The only way to make just the "chris"/"christopher" part alternate is to group it. The second pattern means "a word character, followed by 'chris' or 'christopher', followed by a digit".</p>
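<p>Running both patterns over the same subjects shows the difference. In this PHP sketch (the subjects are made up for illustration), "xchris" matches only the ungrouped pattern, and "christopher5" matches only through the ungrouped pattern's second alternative; the grouped pattern needs a word character before and a digit after either name.</p>

```php
<?php
// Run the ungrouped and grouped patterns over the same subjects.
$subjects = ['xchris', 'christopher5', 'xchristopher5', 'xchris5'];
$ungrouped = [];
$grouped = [];
foreach ($subjects as $subject) {
    $ungrouped[$subject] = preg_match('/\wchris|christopher\d/', $subject) === 1;
    $grouped[$subject]   = preg_match('/\w(chris|christopher)\d/', $subject) === 1;
    printf("%-14s ungrouped: %-3s grouped: %s\n", $subject,
        $ungrouped[$subject] ? 'yes' : 'no', $grouped[$subject] ? 'yes' : 'no');
}

// The group also captures which alternative matched.
preg_match('/\w(chris|christopher)\d/', 'xchristopher5', $m);
echo $m[1], "\n"; // christopher
```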
<p>You can also group entire subpatterns for quantification:</p>
<pre id="raw-code-30" style="display:none; width: 1px; height: 1px; overflow: hidden;">([a-z]{1,5}\d+)+</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-30" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-30'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-30">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#91;</span>a-z<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">1</span>,<span style="color:#800000;color:#800000;">5</span><span style="color:#006600; font-weight:bold;">&#125;</span>\d+<span style="color:#006600; font-weight:bold;">&#41;</span>+ </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
This means "one to five lowercase letters followed by at least one digit, with the whole group repeated at least once". It would match "a5", "a5b2", "zs49bf9" and so on.</p>
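<p>Here is a PHP check of those examples, with the pattern anchored so the repeated group must cover the whole string (unanchored, it would also find a match inside a subject like "abcdef1"). It also shows something that often surprises people: a repeated capturing group only keeps the text from its last repetition.</p>

```php
<?php
// Anchor the repeated group so it must account for the entire subject.
$pattern = '/^([a-z]{1,5}\d+)+$/';
$results = [];
foreach (['a5', 'a5b2', 'zs49bf9', 'abcdef1', 'a5b'] as $subject) {
    $results[$subject] = preg_match($pattern, $subject) === 1;
    echo $subject, ': ', $results[$subject] ? 'match' : 'no match', "\n";
}

// A repeated group only keeps what it captured on its last repetition.
preg_match($pattern, 'a5b2', $m);
echo $m[1], "\n"; // b2
```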
<p>The final use is for capturing. When you group an expression, the characters it matches are saved and can be reused later in the pattern with a <em>backreference</em>:</p>
<pre id="raw-code-31" style="display:none; width: 1px; height: 1px; overflow: hidden;">&lt;(b|strong)&gt;\w*&lt;/\1&gt;</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-31" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-31'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-31">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&lt;<span style="color:#006600; font-weight:bold;">&#40;</span>b|strong<span style="color:#006600; font-weight:bold;">&#41;</span>&gt;\w*&lt;/\<span style="color:#800000;color:#800000;">1</span>&gt; </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
The group matches either "b" or "strong". Later in the pattern, "\1" (a backslash followed by the group number) refers back to whatever that group matched. So the pattern matches a "b" or "strong" element whose closing tag matches its opening tag (and whose content consists only of word characters). Here's another example you might use to extract a single- or double-quoted string:</p>
<pre id="raw-code-32" style="display:none; width: 1px; height: 1px; overflow: hidden;">('|&quot;)\w*\1</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-32" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-32'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-32">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">'|&quot;)<span style="color:#000099; font-weight:bold;">\w</span>*<span style="color:#000099; font-weight:bold;">\1</span> </span></div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
This would match things like:</p>
<ol>
<li>"hello"</li>
<li>'world'</li>
</ol>
<p>But <em>not</em>:</p>
<ol>
<li>"hello'</li>
<li>'world"</li>
</ol>
<p>The second set doesn't match because the opening and closing quote characters are not the same.</p>
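<p>In PHP, the pattern is easiest to write in a single-quoted string, where \1 is passed through unchanged; in a double-quoted string "\1" would become an octal escape for a control character. This sketch (assuming PHP 7.4 or later for the short arrow function) checks the four strings above and then shows that the third argument of preg_match() receives the captured group.</p>

```php
<?php
// In a single-quoted PHP string, \' is a literal quote and \1 stays as a
// backslash followed by 1, so this is the pattern ('|")\w*\1 from above.
$pattern = '/(\'|")\w*\1/';
$subjects = ["\"hello\"", "'world'", "\"hello'", "'world\""];
$results = array_map(fn ($s) => preg_match($pattern, $s) === 1, $subjects);
echo json_encode($results), "\n"; // [true,true,false,false]

// Group 1 holds the quote character that was actually used.
preg_match($pattern, "say 'hi' now", $m);
echo $m[0], ' uses ', $m[1], "\n"; // 'hi' uses '
```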
<h3>Assertion (aka Lookahead and Lookbehind)</h3>
<p>Assertions are used to test the preceding or following characters against some expression, without actually consuming the characters. Let me explain that a bit more.</p>
<p>When the regex engine tries to apply a pattern to a string, it "consumes" the string as it goes. It has an internal pointer that moves along the string to keep track of the current position. Matching "(chris|christopher)" against the string "chris98" would put the internal pointer right after the "s", because that's where the pattern stops. An assertion simply checks backward or forward, without moving the internal pointer.</p>
<p>For example, let's say I want to match my name "Chris" only when it's part of "Christopher". That is, I don't want to match "christoph" or "christine" or anything else. Here's how I might do it with a so called <em>lookahead</em>:</p>
<pre id="raw-code-33" style="display:none; width: 1px; height: 1px; overflow: hidden;">(?=Christopher)Chris</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-33" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-33'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-33">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#40;</span>?=Christopher<span style="color:#006600; font-weight:bold;">&#41;</span>Chris </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
This would match the "Chris" part of "Christopher Nadeau", but would not match "Christine Doe".</p>
<p>The regex engine applies this pattern by looking ahead from the starting point to see whether the next characters are "Christopher". The internal pointer is not moved at all, so once the lookahead succeeds, the engine applies the rest of the pattern, "Chris", as normal, starting from the same position. When the whole pattern has been applied to "Christopher Nadeau", the internal pointer is after the "s".</p>
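<p>PHP's PREG_OFFSET_CAPTURE flag makes preg_match() return each match along with its byte offset, which shows that the lookahead consumed nothing: the match is just "Chris", starting where the lookahead was tested. The subject "Meet Christopher" is an extra example, not from the text above.</p>

```php
<?php
$pattern = '/(?=Christopher)Chris/';

// PREG_OFFSET_CAPTURE turns each match into [text, byte offset].
preg_match($pattern, 'Christopher Nadeau', $m, PREG_OFFSET_CAPTURE);
echo $m[0][0], ' at offset ', $m[0][1], "\n"; // Chris at offset 0

// "Christine" fails the lookahead, so there is no match at all.
echo preg_match($pattern, 'Christine Doe'), "\n"; // 0

// The engine tries each starting position in turn until the lookahead succeeds.
preg_match($pattern, 'Meet Christopher', $later, PREG_OFFSET_CAPTURE);
echo $later[0][1], "\n"; // 5
```

<p>Writing the lookahead after the literal, as in Chris(?=topher), finds the same matches and is the more common way to express "Chris followed by topher".</p>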
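<p>A <em>lookbehind</em> works the same way, except that it checks the characters <em>before</em> the current position. Here is a situation where one comes in handy. As programmers, we are used to using escape sequences. For example, to insert a double-quote character within a double-quoted string, we escape it like so: "Hello Chris \"Chroder\" Nadeau".</p>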
<p>Let's say we are writing a custom parser and need to handle the same thing. We want to capture all of the double-quoted strings. This is easy, right?</p>
<pre id="raw-code-34" style="display:none; width: 1px; height: 1px; overflow: hidden;">&quot;(.*)&quot;</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-34" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-34'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-34">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#CC0000;">"(.*)"</span> </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
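That captures any character (remember, the dot meta-character means "anything"), any number of times, between double-quotes. But what if we want to let the user escape a double-quote, so that strings like the one above are read correctly? The star is greedy, so this pattern runs to the <em>last</em> double-quote on the line: with two strings on one line, such as <em>"a" + "b"</em>, it captures everything from the first quote to the last. Making the star lazy with "(.*?)" stops each match at the <em>first</em> closing quote instead, but then the example above only captures "Hello Chris \", which isn't what we want either.</p>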
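<p>To make the two failure modes concrete, here is a short Python sketch (the "re" module stands in for whatever regex library a parser would use, and the sample text is made up for illustration). It compares the greedy pattern with a lazy variant on a line that contains an escaped quote followed by a second string:</p>

```python
import re

text = r'say "Hello Chris \"Chroder\" Nadeau" then "bye"'

# Greedy: .* runs to the LAST quote on the line, gluing both
# strings (and the text between them) into one capture.
greedy = re.findall(r'"(.*)"', text)

# Lazy: .*? stops at the FIRST quote it reaches, even an escaped one.
lazy = re.findall(r'"(.*?)"', text)

print(greedy)
print(lazy)
```

<p>The greedy version returns a single capture running from the first quote to the last. The lazy version returns three wrong pieces, "Hello Chris \", " Nadeau" and "bye", because it treats the escaped quotes as real delimiters.</p>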
<p>We can make the star lazy ("(.*?)") and use a lookbehind to make sure the character before the closing quote is not a backslash:</p>
<pre id="raw-code-35" style="display:none; width: 1px; height: 1px; overflow: hidden;">&quot;(.*?)(?&lt;=[^\\])&quot;</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-35" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-35'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-35">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#CC0000;">"(.*?)(?&lt;=[^<span style="color:#000099; font-weight:bold;">\\</span>])"</span> </div>
</li>
</ol>
</div>
</div>
</div>
</div>
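<p>The lazy "(.*?)" makes the engine try to end the match at each double-quote it reaches, and the lookbehind rejects any quote that is preceded by a backslash. Escaped quotes are therefore skipped, and the match ends at the first unescaped one.</p>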
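<p>A Python sketch of the lookbehind pattern (the "re" module is used for illustration; the names QUOTED and ROBUST and the sample strings are hypothetical). It also shows one case the simple check gets wrong, plus a commonly used alternative that is <em>not</em> from this post: it treats a backslash and the character after it as a single unit, so an escaped backslash is consumed before the closing quote is considered.</p>

```python
import re

# The pattern from the post: a lazy body, and a closing quote only
# counts when the character before it is not a backslash.
QUOTED = re.compile(r'"(.*?)(?<=[^\\])"')

text = r'say "Hello Chris \"Chroder\" Nadeau" then "bye"'
print(QUOTED.findall(text))  # escaped quotes are skipped

# Limitation: in "C:\\" the backslash is itself escaped, so the quote
# really does close the string, but the lookbehind still rejects it.
path = r'"C:\\" and "D:"'
print(QUOTED.findall(path))  # one wrong capture running into "D:"

# Swapped-in alternative (not from the post): the body is a run of
# either non-quote, non-backslash characters or a backslash plus any
# one character, so both \" and \\ are consumed as escape sequences.
ROBUST = re.compile(r'"((?:[^"\\]|\\.)*)"')
print(ROBUST.findall(path))  # C:\\ and D: captured separately
```

<p>The lookbehind version is shorter and handles the example from this post; the alternation version also copes with an escaped backslash right before the closing quote.</p>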
<p>There are four types of assertions, two of which I have already demonstrated:</p>
<ul>
<li>Positive lookahead: (?=<u>expression</u>)<br />
		The assertion succeeds if the expression matches the characters to the right of the current position</li>
<li>Negative lookahead: (?!<u>expression</u>)<br />
		The assertion succeeds if the expression does not match the characters to the right of the current position</li>
<li>Positive lookbehind: (?&lt;=<u>expression</u>)<br />
		The assertion succeeds if the expression matches the characters to the left of the current position</li>
<li>Negative lookbehind: (?&lt;!<u>expression</u>)<br />
		The assertion succeeds if the expression does not match the characters to the left of the current position</li>
</ul>
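<p>All four assertions side by side, as a small Python sketch (the "re" module and the sample price string are illustration choices, not from the original post). Each findall() call returns every match of the pattern in the text:</p>

```python
import re

text = "price: $30, cost: 45 USD"

# (?=...) positive lookahead: digits that are followed by " USD".
ahead = re.findall(r"\d+(?= USD)", text)          # ['45']
# (?!...) negative lookahead: whole numbers NOT followed by " USD".
not_ahead = re.findall(r"\b\d+\b(?! USD)", text)  # ['30']
# (?<=...) positive lookbehind: digits that are preceded by "$".
behind = re.findall(r"(?<=\$)\d+", text)          # ['30']
# (?<!...) negative lookbehind: whole numbers NOT preceded by "$".
not_behind = re.findall(r"(?<!\$)\b\d+", text)    # ['45']

print(ahead, not_ahead, behind, not_behind)
```

<p>The "\b" word boundaries in the two negative examples matter: without them, the engine would happily match just the "4" of "45", since "4" is not directly followed by " USD", and likewise the "0" of "30", since it is not directly preceded by "$".</p>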
<h3>Anchoring</h3>
<p>The last concept, anchoring, is very simple to understand. Say you wanted to match "chris", but only when it was at the very beginning of the string. You do this by <em>anchoring</em> the regular expression to the beginning of the string. What if you wanted to match only at the end of the string? Yup, you need to anchor to the end of the string. Here are three examples:</p>
<pre id="raw-code-36" style="display:none; width: 1px; height: 1px; overflow: hidden;">^(chris|christopher)
(chris|christopher)$
^(chris|christopher)$</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-36" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-36'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-36">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">^<span style="color:#006600; font-weight:bold;">&#40;</span>chris|christopher<span style="color:#006600; font-weight:bold;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#40;</span>chris|christopher<span style="color:#006600; font-weight:bold;">&#41;</span>$</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">^<span style="color:#006600; font-weight:bold;">&#40;</span>chris|christopher<span style="color:#006600; font-weight:bold;">&#41;</span>$ </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>The caret (^), when it is the first thing in the pattern, anchors it to the beginning of the string. The dollar sign ($), when it is the last thing in the pattern, anchors it to the end of the string.</p>
<p>The first pattern means "chris or christopher at the start". The second pattern means "chris or christopher at the end". The third pattern means "chris or christopher at the beginning and end", which just means "the string is exactly chris or exactly christopher".</p>
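<p>A quick way to check these three readings is to run each pattern over a handful of sample strings. This Python sketch (the sample strings are made up) uses re.search(), which scans the whole string, so only the anchors restrict where a match may occur:</p>

```python
import re

patterns = [
    r"^(chris|christopher)",   # at the start
    r"(chris|christopher)$",   # at the end
    r"^(chris|christopher)$",  # the whole string
]
samples = ["chris", "christopher", "chris rocks", "hi chris", "hi chris!"]

# For each pattern, keep the samples it matches somewhere.
results = {p: [s for s in samples if re.search(p, s)] for p in patterns}
for p, hits in results.items():
    print(p, hits)
```

<p>"christopher" passes the third pattern because, when "chris" is followed by more text, the "$" fails and the engine backtracks into the second alternative. Python's re.match() is implicitly anchored at the start of the string, which is why re.search() is used here; PHP's preg_match() also searches the whole string unless the pattern itself is anchored.</p>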
<p>You might be thinking, "Why is anchoring important?" Well, let's say you want to validate a date string in the form ####-##-## (i.e., year-month-day). You might write:</p>
<pre id="raw-code-37" style="display:none; width: 1px; height: 1px; overflow: hidden;">\d{4}-\d{2}-\d{2}</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-37" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-37'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-37">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">\d<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">4</span><span style="color:#006600; font-weight:bold;">&#125;</span>-\d<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#125;</span>-\d<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
This would work: it matches "1988-12-30". Success? No! It also matches "Blah blah 1988-12-30 blah". The pattern only tells the regex engine to find that sequence somewhere in the string, so the engine simply skips over any non-matching text around it. To properly validate the string, you need to anchor the pattern to the beginning and end:</p>
<pre id="raw-code-38" style="display:none; width: 1px; height: 1px; overflow: hidden;">^\d{4}-\d{2}-\d{2}$</pre>
<div class="igBar">
<div class="wrap"><span id="lcode-38" style="float:right"><a href="#" onclick="javascript:showCodeTxt('code-38'); return false;">Plain Text</a></span><span class="langName">CODE:</span>
</div>
</div>
<div class="syntax_hilite">
<div class="wrap">
<div id="code-38">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B; font-weight:bold;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">^\d<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">4</span><span style="color:#006600; font-weight:bold;">&#125;</span>-\d<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#125;</span>-\d<span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#800000;color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#125;</span>$ </div>
</li>
</ol>
</div>
</div>
</div>
</div>
<p>
By anchoring to both the beginning and end, you are essentially saying "the <em>entire</em> string must match this pattern".</p>
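<p>The difference is easy to see in a short Python sketch (the function names loose, strict and exact are hypothetical). It also shows one subtlety worth knowing: in Python, as in PCRE, "$" matches at the very end of the string <em>or</em> just before a newline at the end, so the anchored pattern still accepts "1988-12-30" followed by a newline.</p>

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"

def loose(s):
    # Unanchored: succeeds if a date appears anywhere in s.
    return re.search(DATE, s) is not None

def strict(s):
    # Anchored at both ends with ^ and $.
    return re.search("^" + DATE + "$", s) is not None

def exact(s):
    # fullmatch() demands that the entire string match, so a
    # trailing newline is rejected as well.
    return re.fullmatch(DATE, s) is not None

for s in ["1988-12-30", "Blah blah 1988-12-30 blah", "1988-12-30\n"]:
    print(repr(s), loose(s), strict(s), exact(s))
```

<p>In PHP, the equivalent fix is to anchor the end with "\z" instead of "$", or to add the "D" pattern modifier. Also keep in mind that the pattern only checks the shape of the string: "9999-99-99" passes just as easily as a real date.</p>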
<h2>Further Reading</h2>
<p>I think one of the best sites available on the subject of regular expressions is <a href="http://www.regular-expressions.info/">regular-expressions.info</a>. You might find their <a href="http://www.regular-expressions.info/reference.html">reference page</a> particularly useful.</p>
<p>If you're a book person, you should definitely pick up a copy of <a href="http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/1565922573/">Mastering Regular Expressions</a> from O'Reilly.</p>
]]></content:encoded>
			<wfw:commentRss>http://devlog.info/2007/12/27/regular-expressions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Devlog Opens</title>
		<link>http://devlog.info/2007/05/24/devlog-opens/</link>
		<comments>http://devlog.info/2007/05/24/devlog-opens/#comments</comments>
		<pubDate>Fri, 25 May 2007 00:53:58 +0000</pubDate>
		<dc:creator>Christopher</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[After months of wanting to start a developer's blog, and after months of putting it off, I've finally got around to setting everything up. So here it is.
I want to give a little background information on this little endeavor of mine. I first started playing around with the idea about a year ago. I had [...]]]></description>
			<content:encoded><![CDATA[<p>After months of wanting to start a developer's blog, and after months of putting it off, I've finally got around to setting everything up. So here it is.</p>
<p>I want to give a little background information on this little endeavor of mine. I first started playing around with the idea about a year ago. I had just solved a really annoying problem in a project I was working on. You know the kind: something that was supposed to be simple, but ends up taking entirely too much time. I remember thinking that in hindsight the solution was simple. That's just how things work, isn't it?</p>
<p>Afterwards I thought, "I wonder how many people I could help if I were to write about what I've just learned." This line of reasoning led me to the idea of Devlog. Among the many topics I plan to cover in the coming months and (hopefully) years, I really want to include a few gems that you won't find anywhere else.</p>
<p>There is an overabundance of tutorials online pertaining to web development, but there are some topics I don't see covered much. There are lots of how-tos ranging from rounded corners with CSS to CAPTCHA images with PHP. Out of the thousands and thousands of sites, some pretty key topics of discussion really seem to be lacking. You might find a tutorial on how to connect to a database using PHP, but it never goes into much detail about things like SQL injection or keeping stored information safe. It is these kinds of forgotten but important subjects I want to cover.</p>
<p>The first topic, unfortunately, will not be the aforementioned "annoying problem." After all this time and after countless <em>other</em> "annoying problems," I cannot seem to remember the exact one that ignited the cascade of thoughts that led me here. Good thing there are always more problems to be solved!</p>
<p>So that about wraps up this first introductory post on the brand new Devlog. I cannot guarantee any consistent posting timeline, but I will do my best to provide some fresh content every once in a while. With RSS readers so popular these days, no doubt you can just add the <a href="http://devlog.info/feed/" title="Devlog RSS Feed">Devlog feed</a> to your client and never worry about it until there's something new and interesting to read.</p>
<p>Until next time, happy coding!</p>
]]></content:encoded>
			<wfw:commentRss>http://devlog.info/2007/05/24/devlog-opens/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
