Today I want to talk about best practices for reading user input. And if you aren't convinced already, I'm going to convince you that using PHP's superglobal arrays directly is a bad idea.
Best Practice: Cast to what you want
The first part of this post is all about how important casting incoming data is. As you know, security rule number one is never trust the user. So if you are expecting an Article ID which is an integer, you should make sure that you actually got an integer.
The most common approach is to cast user input as soon as you read it. If an attacker supplies some arbitrary string, PHP will cast it to an integer, which usually results in a harmless 0. Sure, you have precautions in place against SQL injection and XSS, but by casting data to a specific type you can greatly simplify your code because you can start working under assumptions ("I know for sure that $id is a harmless integer").
Here are a few examples:
// IDs (max() gets rid of negatives; IDs are usually unsigned)
$article_id = max(0, (int)$_GET['id']);
// Simple strings
$search_user = preg_replace('/[^a-zA-Z0-9_]/', '', (string)$_GET['search_user']);
// Booleans
$show_drafts = (bool)$_GET['show_drafts'];
Note that this isn't really about form validation. If someone is filling out a form, you still need to make sure what they input is valid. For example, the "Simple strings" snippet above is useless to users when they're registering because you're not telling them that their input is invalid.
These strict casts are useful only when you don't expect users to fiddle with values: things like view.php?id=34, where the URL is generated by your app, or <select> values, where the user shouldn't be changing the options.
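To make the effect of these casts concrete, here is a small sketch of how they behave when fed hostile or unexpected values. The helper names (cast_id, cast_simple_string, cast_bool) and the allowed-character set are assumptions for illustration only, not part of PHP or any library.

```php
<?php
// Hypothetical helpers; the names and the allowed-character set are assumptions.

/** Cast to a non-negative integer ID; junk input collapses to 0. */
function cast_id($value)
{
    // Arrays arrive from URLs like ?id[]=1; treat them as invalid.
    if (!is_scalar($value)) {
        return 0;
    }
    return max(0, (int)$value);
}

/** Keep only letters, digits and underscores. */
function cast_simple_string($value)
{
    if (!is_scalar($value)) {
        return '';
    }
    return preg_replace('/[^a-zA-Z0-9_]/', '', (string)$value);
}

/** Plain boolean cast. Careful: (bool)"false" is true in PHP. */
function cast_bool($value)
{
    return (bool)$value;
}

var_dump(cast_id('34'));         // int(34)
var_dump(cast_id('34 OR 1=1'));  // int(34), the trailing junk is dropped
var_dump(cast_id('abc'));        // int(0)
var_dump(cast_id('-5'));         // int(0)
var_dump(cast_simple_string("bob'; DROP TABLE users")); // string(17) "bobDROPTABLEusers"
var_dump(cast_bool('0'));        // bool(false)
```

Notice that nothing here tells the user their input was wrong; the value is silently forced into shape, which is exactly why this is not a substitute for form validation.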
Getting rid of the superglobals
Well, sort of. We obviously can't completely stop using $_GET/$_POST/$_COOKIE for reading incoming data because there's no replacement. What I'm talking about is ridding your code of these superglobals -- pushing them into a special input class. A code snippet is worth a thousand words; here's what I'm talking about:
// Bad
$article_id = $_GET['id'];

// Better
$article_id = $input->get('id');
Why
The above snippet doesn't really show off anything other than the concept. So let's talk about what this mysterious input class is meant to do, and why it's better than using superglobals.
Control
You don't have much control over superglobals. They are simply arrays, so you can't do anything special before fetching a value. You might be wondering what kind of processing you would even want to do. Think about this for a moment.
How many times have you done something like this:
$article_id = (int)$_GET['id'];

// Or
if (isset($_GET['id'])) {
    $article_id = (int)$_GET['id'];
}
Since you control the input class, you can add a bunch of features to make gathering and casting input really easy.
$article_id = $input->getInt('id');
$search_user = $input->getSimpleString('search_user');
$delete_ids = $input->getArrayOfInts('delete_ids');
$article_content = $input->getString('content');
UTF-8 Handling
One place where this sort of functionality is really useful is when you've switched to using UTF-8. As you know, it is possible for UTF-8 strings to be malformed -- and this is a security risk. So every time you read strings you must make sure they're valid. Without an input class, this would be a lot of work. With an input class, you can just modify your 'getString' method to add UTF-8 checking.
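As a rough idea of what that check could look like, here is a minimal sketch. The function name clean_utf8_string and the policy of falling back to a default value are my assumptions; mb_check_encoding() from the mbstring extension would be another way to do the same test.

```php
<?php
// Hypothetical helper; the name and the "return a default on failure"
// policy are assumptions, not a fixed recipe.

/**
 * Return $value if it is a well-formed UTF-8 string, otherwise $default.
 */
function clean_utf8_string($value, $default = '')
{
    if (!is_string($value)) {
        return $default;
    }
    // With the /u modifier PCRE validates the subject as UTF-8 first;
    // preg_match() returns false (not 0) when the subject is malformed.
    if (preg_match('//u', $value) !== 1) {
        return $default;
    }
    return $value;
}

$ok  = clean_utf8_string("caf\xC3\xA9"); // valid two-byte sequence, kept as-is
$bad = clean_utf8_string("\xC3\x28");    // invalid continuation byte, rejected
```

A 'getString' method could call something like this on every value it returns, so the check lives in one place instead of at every call site.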
Stripping magic quotes
Another common task PHP programmers routinely need to do is handle magic quotes. Most of us simply test to see if magic quotes is enabled, and run the entire superglobal array family through a function that strips the slashes -- basically undoing this devil-feature.
But by doing this, you're affecting the entire app (because, well, you're modifying a superglobal of course!). This might not matter to you, but it gets a bit dangerous if you use third-party libraries.
A friend of mine creates Facebook applications. The Facebook PHP library is clever enough to test for magic quotes and strips slashes out for values it needs. But what if you're a clever PHP programmer, and you stripped out the slashes as part of your "global.php" file? FB doesn't know that, so it just strips them again! This is an example of how modifying global data is a bad idea.
By moving input-reading into your own class, you can do whatever the hell you want to your data and can rest assured knowing you didn't affect anything else.
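Here is a sketch of that idea: the slashes are stripped from a private copy, and the superglobal itself is never written to. The class name SlashStrippingSource and its constructor flag are assumptions. On the PHP versions that had magic quotes, the flag would come from get_magic_quotes_gpc(); that function was removed in PHP 8.0, so the sketch takes a plain boolean instead of calling it.

```php
<?php
// Hypothetical class; the name and the constructor flag are assumptions.
class SlashStrippingSource
{
    private $data;

    /**
     * @param array $raw         Raw input such as $_GET (read only, never written back).
     * @param bool  $magicQuotes Whether the raw values carry magic-quote slashes.
     */
    public function __construct(array $raw, $magicQuotes)
    {
        $this->data = $magicQuotes ? self::strip($raw) : $raw;
    }

    /** Recursively strip slashes from a private copy of the input. */
    private static function strip($value)
    {
        if (is_array($value)) {
            foreach ($value as $key => $item) {
                $value[$key] = self::strip($item);
            }
            return $value;
        }
        return stripslashes($value);
    }

    public function get($name, $default = null)
    {
        return array_key_exists($name, $this->data) ? $this->data[$name] : $default;
    }
}

// Simulated request data, as magic quotes would have delivered it.
$raw    = array('title' => 'O\\\'Reilly', 'tags' => array('it\\\'s'));
$source = new SlashStrippingSource($raw, true);
$title  = $source->get('title'); // slashes removed in our copy only
```

Because $raw is left untouched, a third-party library that does its own stripping still sees the data in the state it expects.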
Abstraction
By creating this new class, you abstract the details of how input is gathered. You might be wondering why this is important -- isn't there just get/post/cookie? No! While the vast majority of your webpages will only use these types of input, there are certainly others. Here are two that jump to my mind.
Command-Line
If you make command-line scripts, getting arguments may be a perfect use for your input class.
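For instance, a command-line source might look something like the sketch below. The class name InputSource_Cli and the --key=value convention are assumptions of mine; PHP's built-in getopt() is another option if you want conventional short and long options.

```php
<?php
// Hypothetical source class; the name and the --key=value convention are assumptions.
class InputSource_Cli
{
    private $values = array();

    /** @param array $argv Argument vector, typically $_SERVER['argv']. */
    public function __construct(array $argv)
    {
        // Skip $argv[0], which is the script name.
        foreach (array_slice($argv, 1) as $arg) {
            if (preg_match('/^--([^=]+)=(.*)$/s', $arg, $m)) {
                $this->values[$m[1]] = $m[2];  // --id=42
            } elseif (preg_match('/^--(.+)$/s', $arg, $m)) {
                $this->values[$m[1]] = '1';    // bare flag such as --verbose
            }
            // Anything else (positional arguments) is ignored in this sketch.
        }
    }

    public function get($name, $default = null)
    {
        return array_key_exists($name, $this->values) ? $this->values[$name] : $default;
    }
}

// Simulated invocation: php script.php --id=42 --verbose stray
$cli = new InputSource_Cli(array('script.php', '--id=42', '--verbose', 'stray'));
$id  = max(0, (int)$cli->get('id'));
```

The values come back as strings, just like $_GET values, so the same casting rules apply.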
Friendly URLs
$_GET only works for values encoded in the query string. But if you are creating an app with friendly URLs, values are sometimes embedded right into the URL path.
For example: /article/42/edit
This URL might mean "edit" the article with an ID of "42". Most frameworks, like Zend Framework's MVC components, make creating URLs like these very easy. In ZF, you define a route with placeholders for the values:
$router->addRoute('article_action', new Zend_Controller_Router_Route(
    'article/:article_id/:action'
));
The framework provides a way to get the value of 'article_id' and 'action' that you could plug right into your input class. It might look something like this:
$article_id = $input->getIntFromUrl('article_id');
$action = $input->getStringFromUrl('action');
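If you aren't using a framework, the same idea can be sketched by hand. The class below matches a request path against a route pattern with :placeholders; the class name InputSource_Url and the matching rules are assumptions, and this is not how Zend Framework's router is implemented.

```php
<?php
// Hypothetical source class; the name and matching rules are assumptions.
class InputSource_Url
{
    private $params = array();

    /**
     * @param string $pattern Route such as 'article/:article_id/:action'.
     * @param string $path    Request path such as '/article/42/edit'.
     */
    public function __construct($pattern, $path)
    {
        $patternParts = explode('/', trim($pattern, '/'));
        $pathParts    = explode('/', trim($path, '/'));
        if (count($patternParts) !== count($pathParts)) {
            return; // No match: leave the parameter list empty.
        }
        $params = array();
        foreach ($patternParts as $i => $part) {
            if ($part !== '' && $part[0] === ':') {
                // Placeholder segment: capture the decoded value.
                $params[substr($part, 1)] = rawurldecode($pathParts[$i]);
            } elseif ($part !== $pathParts[$i]) {
                return; // A literal segment differs: no match.
            }
        }
        $this->params = $params;
    }

    public function get($name, $default = null)
    {
        return array_key_exists($name, $this->params) ? $this->params[$name] : $default;
    }
}

$url        = new InputSource_Url('article/:article_id/:action', '/article/42/edit');
$article_id = max(0, (int)$url->get('article_id'));
$action     = $url->get('action');
```

The point is that the calling code doesn't care whether 'article_id' came from the query string or from the path.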
Building an input class
Building your own input class is an exercise you can complete yourself, but I'll get you started.
First of all, what is our goal?
- Ability to clean data
- Ability to get data from multiple sources
I think a well-designed system calls for a handful of classes:
- Cleaner takes values and cleans/casts them to the correct data type. We're using a separate class because it's not directly tied to input -- we might reuse this functionality elsewhere.
- InputSource_* are classes that read raw data from some source. We have one reader for each source. For example, InputSource_Array for reading information from an array (like the superglobals) or InputSource_Url for reading information from a friendly URL.
- InputReader is the main class that ties the Cleaner and the InputSource_* classes together.
Here's how it might work:
$cleaner = new Cleaner();
$input = new InputReader($cleaner);
$input->addSource('req', new InputSource_Array($_REQUEST));
$input->addSource('get', new InputSource_Array($_GET));
$input->addSource('post', new InputSource_Array($_POST));
$input->addSource('cookie', new InputSource_Array($_COOKIE));
$input->setDefaultSource('req');

$input->getInt('id'); // Get from the 'req' source, the default
$input->getInt('id', 'get'); // Get from the 'get' source

// $input->getInt() calls a corresponding $cleaner method to
// clean an integer.
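To give you a starting point, here is one way the three classes could fit together. Treat it as a sketch under assumptions: the Cleaner method names (cleanInt, cleanString, cleanArrayOfInts), the read() method on sources and the error handling are all mine, and the sample data stands in for the superglobals so the snippet runs on its own.

```php
<?php
// A sketch only: the cleaner method names, read() and the exception
// behaviour are assumptions, not a finished design.

class Cleaner
{
    public function cleanInt($value)
    {
        return is_scalar($value) ? (int)$value : 0;
    }

    public function cleanString($value)
    {
        return is_scalar($value) ? (string)$value : '';
    }

    public function cleanArrayOfInts($value)
    {
        if (!is_array($value)) {
            return array();
        }
        return array_values(array_map(array($this, 'cleanInt'), $value));
    }
}

class InputSource_Array
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    /** Return the raw value, or null when the key is absent. */
    public function read($name)
    {
        return array_key_exists($name, $this->data) ? $this->data[$name] : null;
    }
}

class InputReader
{
    private $cleaner;
    private $sources = array();
    private $default = null;

    public function __construct(Cleaner $cleaner)
    {
        $this->cleaner = $cleaner;
    }

    public function addSource($name, $source)
    {
        $this->sources[$name] = $source;
    }

    public function setDefaultSource($name)
    {
        $this->default = $name;
    }

    /** Fetch the raw value from the named source, or from the default one. */
    private function raw($name, $source)
    {
        $key = ($source === null) ? $this->default : $source;
        if ($key === null || !isset($this->sources[$key])) {
            throw new InvalidArgumentException('Unknown input source');
        }
        return $this->sources[$key]->read($name);
    }

    public function getInt($name, $source = null)
    {
        return $this->cleaner->cleanInt($this->raw($name, $source));
    }

    public function getString($name, $source = null)
    {
        return $this->cleaner->cleanString($this->raw($name, $source));
    }

    public function getArrayOfInts($name, $source = null)
    {
        return $this->cleaner->cleanArrayOfInts($this->raw($name, $source));
    }
}

// Sample data standing in for $_GET and $_POST.
$input = new InputReader(new Cleaner());
$input->addSource('get', new InputSource_Array(array('id' => '42abc', 'delete_ids' => array('3', 'x', '7'))));
$input->addSource('post', new InputSource_Array(array('id' => '7')));
$input->setDefaultSource('get');
```

Keeping the sources behind a single read() method is what lets you add a command-line or URL source later without touching the code that calls getInt().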
February 7th, 2009 at 5:37 pm
Very good read, and I even learned something! I will try to master this and apply it to upcoming code. Thanks for writing this all out.
March 4th, 2009 at 10:27 am
You're right in the article, great blog, keep up the quality!
I have one complaint though. The comments form's inputs should have a color:black when active, as some of us are really using black themes at the OS level :D