PHP makes it relatively easy to build a web-based system, which is much of the reason for its popularity. But its ease of use notwithstanding, PHP has evolved into quite a sophisticated language with many frameworks, nuances, and subtleties that can bite developers, leading to hours of hair-pulling debugging. This article highlights ten of the more common problems that PHP developers need to beware of.
Common Mistake #1: Leaving dangling array references after foreach loops
Not sure how to use foreach loops in PHP? Using references in foreach loops can be useful if you want to operate on each element in the array that you are iterating over. For example:
    $arr = array(1, 2, 3, 4);
    foreach ($arr as &$value) {
        $value = $value * 2;
    }
    // $arr is now array(2, 4, 6, 8)
The problem is that, if you’re not careful, this can also have some undesirable side effects and consequences. Specifically, in the above example, after the code is executed, $value will remain in scope and will hold a reference to the last element in the array. Subsequent operations involving $value could therefore unintentionally end up modifying the last element in the array.
The main thing to remember is that foreach does not create a scope. Thus, $value in the above example is a reference within the top scope of the script. On each iteration, foreach sets the reference to point to the next element of $arr. After the loop completes, therefore, $value still points to the last element of $arr and remains in scope.
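As a minimal sketch of that side effect (the assignment of 100 is purely illustrative), reusing $value after the loop writes straight through to the last array element:

```php
<?php
$arr = [1, 2, 3, 4];
foreach ($arr as &$value) {
    $value = $value * 2;
}
// $arr is now [2, 4, 6, 8], and $value still aliases $arr[3].

$value = 100; // looks like a harmless assignment to a leftover loop variable

echo implode(',', $arr), "\n"; // prints 2,4,6,100
```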
Here’s an example of the kind of evasive and confusing bugs that this can lead to:
    $array = [1, 2, 3];
    echo implode(',', $array), "\n";

    foreach ($array as &$value) {}    // by reference
    echo implode(',', $array), "\n";

    foreach ($array as $value) {}     // by value (i.e., copy)
    echo implode(',', $array), "\n";
The above code will output the following:
    1,2,3
    1,2,3
    1,2,2
No, that’s not a typo. The last value on the last line is indeed a 2, not a 3.
Why?
After going through the first foreach loop, $array remains unchanged but, as explained above, $value is left as a dangling reference to the last element in $array (since that foreach loop accessed $value by reference).
As a result, when we go through the second foreach loop, “weird stuff” appears to happen. Specifically, since $value is now being accessed by value (i.e., by copy), foreach copies each sequential $array element into $value in each step of the loop. As a result, here’s what happens during each step of the second foreach loop:
- Pass 1: Copies $array[0] (i.e., “1”) into $value (which is a reference to $array[2]), so $array[2] now equals 1. So $array now contains [1, 2, 1].
- Pass 2: Copies $array[1] (i.e., “2”) into $value (which is a reference to $array[2]), so $array[2] now equals 2. So $array now contains [1, 2, 2].
- Pass 3: Copies $array[2] (which now equals “2”) into $value (which is a reference to $array[2]), so $array[2] still equals 2. So $array now contains [1, 2, 2].
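The three passes can be observed directly. The following sketch records the array after each step of the second loop; the $states variable is an illustrative addition, not part of the original example:

```php
<?php
$array = [1, 2, 3];
foreach ($array as &$value) {} // $value now references $array[2]

$states = [];
foreach ($array as $value) {   // every copy into $value is written through to $array[2]
    $states[] = implode(',', $array);
}

echo implode("\n", $states), "\n";
// prints:
// 1,2,1
// 1,2,2
// 1,2,2
```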
To still get the benefit of using references in foreach loops without running the risk of these kinds of problems, call unset() on the variable, immediately after the foreach loop, to remove the reference; e.g.:
    $arr = array(1, 2, 3, 4);
    foreach ($arr as &$value) {
        $value = $value * 2;
    }
    unset($value);   // $value no longer references $arr[3]
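When a reference is not strictly needed, it may be simpler to avoid one altogether. The sketch below shows two possible alternatives, writing back through the key and building a new array with array_map(); the variable name $doubledAgain is illustrative only:

```php
<?php
$arr = [1, 2, 3, 4];

// Option 1: write back through the key; $value is a plain copy here.
foreach ($arr as $key => $value) {
    $arr[$key] = $value * 2;
}

// Option 2: leave the input untouched and build a new array.
$doubledAgain = array_map(function ($value) {
    return $value * 2;
}, $arr);

echo implode(',', $arr), "\n";          // prints 2,4,6,8
echo implode(',', $doubledAgain), "\n"; // prints 4,8,12,16
```

Neither variant leaves a reference behind, so there is nothing to unset() afterwards.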
Common Mistake #2: Misunderstanding isset() behavior
Despite its name, isset() not only returns false if an item does not exist, but also returns false for null values.
This behavior is more problematic than it might appear at first and is a common source of problems.
Consider the following:
    $data = fetchRecordFromStorage($storage, $identifier);
    if (!isset($data['keyShouldBeSet'])) {
        // do something here if 'keyShouldBeSet' is not set
    }
The author of this code presumably wanted to check if keyShouldBeSet was set in $data. But, as discussed, isset($data['keyShouldBeSet']) will also return false if $data['keyShouldBeSet'] was set, but was set to null. So the above logic is flawed.
Here’s another example:
    if ($_POST['active']) {
        $postData = extractSomething($_POST);
    }

    // ...

    if (!isset($postData)) {
        echo 'post not active';
    }
The above code assumes that if $_POST['active'] returns true, then $postData will necessarily be set, and therefore isset($postData) will return true. So conversely, the above code assumes that the only way that isset($postData) will return false is if $_POST['active'] returned false as well.
Not.
As explained, isset($postData) will also return false if $postData was set to null. It therefore is possible for isset($postData) to return false even if $_POST['active'] returned true. So again, the above logic is flawed.
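To make this concrete, here is a small sketch. The body of extractSomething() and the 'payload' key are hypothetical stand-ins (the original does not show them), and a plain $post array simulates $_POST so the snippet runs from the command line:

```php
<?php
// Hypothetical stand-in: returns null when there is nothing to extract.
function extractSomething(array $post)
{
    return isset($post['payload']) ? $post['payload'] : null;
}

$post = ['active' => '1']; // simulated $_POST: 'active' is truthy, 'payload' is missing

if ($post['active']) {
    $postData = extractSomething($post); // assigned, but assigned null
}

var_dump(isset($postData)); // prints bool(false) even though the post was active
```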
And by the way, as a side point, if the intent in the above code really was to again check if $_POST['active'] returned true, relying on isset() for this was a poor coding decision in any case. Instead, it would have been better to just recheck $_POST['active']; i.e.:
    if ($_POST['active']) {
        $postData = extractSomething($_POST);
    }

    // ...

    if (!$_POST['active']) {
        echo 'post not active';
    }
For cases, though, where it is important to check if a variable was really set (i.e., to distinguish between a variable that wasn’t set and a variable that was set to null), the array_key_exists() function is a much more robust solution.
For example, we could rewrite the first of the above two examples as follows:
    $data = fetchRecordFromStorage($storage, $identifier);
    if (!array_key_exists('keyShouldBeSet', $data)) {
        // do this if 'keyShouldBeSet' isn't set
    }
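The difference between the two checks is easy to see side by side. In this sketch the $data array is hard-coded rather than fetched from storage, and the result variables are illustrative:

```php
<?php
$data = ['keyShouldBeSet' => null];

$viaIsset     = isset($data['keyShouldBeSet']);            // false: the value is null
$viaKeyExists = array_key_exists('keyShouldBeSet', $data); // true: the key is present
$missingKey   = array_key_exists('someOtherKey', $data);   // false: the key is absent

var_dump($viaIsset, $viaKeyExists, $missingKey);
// prints bool(false), bool(true), bool(false)
```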
Moreover, by combining array_key_exists() with get_defined_vars(), we can reliably check whether a variable within the current scope has been set or not:
    if (array_key_exists('varShouldBeSet', get_defined_vars())) {
        // variable $varShouldBeSet exists in current scope
    }
Common Mistake #3: Confusion about returning by reference vs. by value
Consider this code snippet:
    class Config
    {
        private $values = [];

        public function getValues() {
            return $this->values;
        }
    }

    $config = new Config();

    $config->getValues()['test'] = 'test';
    echo $config->getValues()['test'];
If you run the above code, you’ll get the following:
    PHP Notice: Undefined index: test in /path/to/my/script.php on line 21
What’s wrong?
The issue is that the above code confuses returning arrays by reference with returning arrays by value. Unless you explicitly tell PHP to return an array by reference (i.e., by using &), PHP will by default return the array “by value”. This means that a copy of the array will be returned and therefore the called function and the caller will not be accessing the same instance of the array.
So the above call to getValues() returns a copy of the $values array rather than a reference to it. With that in mind, let’s revisit the two key lines from the above example:
    // getValues() returns a COPY of the $values array, so this adds a 'test' element
    // to a COPY of the $values array, but not to the $values array itself.
    $config->getValues()['test'] = 'test';

    // getValues() again returns ANOTHER COPY of the $values array, and THIS copy doesn't
    // contain a 'test' element (which is why we get the "undefined index" message).
    echo $config->getValues()['test'];
One possible fix would be to save the first copy of the $values array returned by getValues() and then operate on that copy subsequently; e.g.:
    $vals = $config->getValues();
    $vals['test'] = 'test';
    echo $vals['test'];
That code will work fine (i.e., it will output test without generating any “undefined index” message), but depending on what you’re trying to accomplish, this approach may or may not be adequate. In particular, the above code will not modify the original $values array. So if you do want your modifications (such as adding a ‘test’ element) to affect the original array, you would instead need to modify the getValues() function to return a reference to the $values array itself. This is done by adding a & before the function name, thereby indicating that it should return a reference; i.e.:
    class Config
    {
        private $values = [];

        // return a REFERENCE to the actual $values array
        public function &getValues() {
            return $this->values;
        }
    }

    $config = new Config();

    $config->getValues()['test'] = 'test';
    echo $config->getValues()['test'];
The output of this will be test, as expected.
But to make things more confusing, consider instead the following code snippet:
    class Config
    {
        private $values;

        // using ArrayObject rather than array
        public function __construct() {
            $this->values = new ArrayObject();
        }

        public function getValues() {
            return $this->values;
        }
    }

    $config = new Config();

    $config->getValues()['test'] = 'test';
    echo $config->getValues()['test'];
If you guessed that this would result in the same “undefined index” error as our earlier array example, you were wrong. In fact, this code will work just fine. The reason is that, unlike arrays, objects in PHP are effectively passed by reference: what gets copied is a handle to the object, so the caller and the called function end up working with the same instance. (ArrayObject is an SPL object, which fully mimics array usage, but works as an object.)
As these examples demonstrate, it is not always entirely obvious in PHP whether you are dealing with a copy or a reference. It is therefore essential to understand these default behaviors (i.e., variables and arrays are passed by value; objects are passed by reference) and also to carefully check the API documentation for the function you are calling to see if it is returning a value, a copy of an array, a reference to an array, or a reference to an object.
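These defaults can be checked with a few lines. In the sketch below, addToArray() and addToObject() are hypothetical helpers written only for this illustration:

```php
<?php
// The array parameter is a copy, so the caller's array is not affected.
function addToArray(array $values)
{
    $values['test'] = 'test';
}

// The object parameter refers to the caller's instance, so the change is visible outside.
function addToObject(ArrayObject $values)
{
    $values['test'] = 'test';
}

$array  = [];
$object = new ArrayObject();

addToArray($array);
addToObject($object);

var_dump(isset($array['test']));  // prints bool(false)
var_dump(isset($object['test'])); // prints bool(true)
```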
All that said, it is important to note that the practice of returning a reference to an array or an ArrayObject is generally something that should be avoided, as it provides the caller with the ability to modify the instance’s private data. This “flies in the face” of encapsulation. Instead, it’s better to use old style “getters” and “setters”, e.g.:
    class Config
    {
        private $values = [];

        public function setValue($key, $value) {
            $this->values[$key] = $value;
        }

        public function getValue($key) {
            return $this->values[$key];
        }
    }

    $config = new Config();

    $config->setValue('testKey', 'testValue');
    echo $config->getValue('testKey');    // echoes 'testValue'
This approach gives the caller the ability to set or get any value in the array without providing public access to the otherwise-private $values array itself.
Common Mistake #4: Performing queries in a loop
It’s not uncommon to come across something like this if your PHP is not working:
    $models = [];

    foreach ($inputValues as $inputValue) {
        $models[] = $valueRepository->findByValue($inputValue);
    }
While there may be absolutely nothing wrong here, if you follow the logic in the code, you may find that the innocent looking call above to $valueRepository->findByValue() ultimately results in a query of some sort, such as:
    $result = $connection->query("SELECT `x`,`y` FROM `values` WHERE `value`=" . $inputValue);
As a result, each iteration of the above loop would result in a separate query to the database. So if, for example, you supplied an array of 1,000 values to the loop, it would generate 1,000 separate queries to the resource! If such a script is called in multiple threads, it could potentially bring the system to a grinding halt.
It’s therefore crucial to recognize when queries are being made by your code and, whenever possible, gather the values and then run one query to fetch all the results.
One example of a fairly common place to encounter querying being done inefficiently (i.e., in a loop) is when a form is posted with a list of values (IDs, for example). Then, to retrieve the full record data for each of the IDs, the code will loop through the array and do a separate SQL query for each ID. This will often look something like this:
    $data = [];
    foreach ($ids as $id) {
        $result = $connection->query("SELECT `x`, `y` FROM `values` WHERE `id` = " . $id);
        $data[] = $result->fetch_row();
    }
But the same thing can be accomplished much more efficiently in a single SQL query as follows:
    $data = [];
    if (count($ids)) {
        // note the closing parenthesis that terminates the IN (...) list
        $result = $connection->query("SELECT `x`, `y` FROM `values` WHERE `id` IN (" . implode(',', $ids) . ")");
        while ($row = $result->fetch_row()) {
            $data[] = $row;
        }
    }
It’s therefore crucial to recognize when queries are being made, either directly or indirectly, by your code. Whenever possible, gather the values and then run one query to fetch all the results. Yet caution must be exercised there as well, which leads us to our next common PHP mistake…
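Because the IDs in the single-query version typically come from user input, concatenating them into the SQL string is risky. One possible refinement, sketched here under the assumption that a prepared-statement API is available, is to generate one placeholder per ID. The helper name buildIdQuery() is hypothetical; the table and column names are taken from the example above:

```php
<?php
/**
 * Builds a parameterized "IN (...)" query for a list of IDs.
 * Returns [sql, params], or null when the list is empty.
 */
function buildIdQuery(array $ids)
{
    if (count($ids) === 0) {
        return null; // "IN ()" is not valid SQL, so the caller should skip the query
    }

    // one "?" placeholder per ID, e.g. "?,?,?"
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $sql = "SELECT `x`, `y` FROM `values` WHERE `id` IN ($placeholders)";

    return [$sql, array_values($ids)];
}

list($sql, $params) = buildIdQuery([3, 7, 42]);
echo $sql, "\n"; // prints SELECT `x`, `y` FROM `values` WHERE `id` IN (?,?,?)
```

With PDO, for instance, the result could then be passed to prepare($sql) and execute($params), which keeps it to a single round trip without interpolating raw values.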
Common Mistake #5: Memory usage headfakes and inefficiencies
While fetching many records at once is definitely more efficient than running a single query for each row to fetch, such an approach can potentially lead to an “out of memory” condition in libmysqlclient when using PHP’s mysql extension.
To demonstrate, let’s take a look at a test box with limited resources (512MB RAM), MySQL, and php-cli.
We’ll bootstrap a database table like this:
    // connect to mysql
    $connection = new mysqli('localhost', 'username', 'password', 'database');

    // create table of 400 columns
    $query = 'CREATE TABLE `test`(`id` INT NOT NULL PRIMARY KEY AUTO_INCREMENT';
    for ($col = 0; $col < 400; $col++) {
        $query .= ", `col$col` CHAR(10) NOT NULL";
    }
    $query .= ');';
    $connection->query($query);

    // write 2 million rows
    for ($row = 0; $row < 2000000; $row++) {
        $query = "INSERT INTO `test` VALUES ($row";
        for ($col = 0; $col < 400; $col++) {
            $query .= ', ' . mt_rand(1000000000, 9999999999);
        }
        $query .= ')';
        $connection->query($query);
    }
OK, now let’s check resource usage:
    // connect to mysql
    $connection = new mysqli('localhost', 'username', 'password', 'database');
    echo "Before: " . memory_get_peak_usage() . "\n";

    $res = $connection->query('SELECT `x`,`y` FROM `test` LIMIT 1');
    echo "Limit 1: " . memory_get_peak_usage() . "\n";

    $res = $connection->query('SELECT `x`,`y` FROM `test` LIMIT 10000');
    echo "Limit 10000: " . memory_get_peak_usage() . "\n";
Output:
    Before: 224704
    Limit 1: 224704
    Limit 10000: 224704
Cool. Looks like the query is safely managed internally in terms of resources.
Just to be sure, though, let’s boost the limit one more time and set it to 100,000. Uh-oh. When we do that, we get:
    PHP Warning: mysqli::query(): (HY000/2013): Lost connection to MySQL server during query in /root/test.php on line 11
What happened?
The issue here is the way PHP’s mysql module works. It’s really just a proxy for libmysqlclient, which does the dirty work. When a portion of data is selected, it goes directly into memory. Since this memory is not managed by PHP’s manager, memory_get_peak_usage() won’t show any increase in resource utilization as we up the limit in our query. This leads to problems like the one demonstrated above, where we’re tricked into complacency, thinking that our memory management is fine when in reality it is seriously flawed.
You can at least avoid the above headfake (although it won’t itself improve your memory utilization) by instead using the mysqlnd module. mysqlnd is compiled as a native PHP extension and it does use PHP’s memory manager.
Therefore, if we run the above test using mysqlnd rather than mysql, we get a much more realistic picture of our memory utilization:
    Before: 232048
    Limit 1: 324952
    Limit 10000: 32572912
And it’s even worse than that, by the way. According to PHP documentation, mysql uses twice as many resources as mysqlnd to store data, so the original script using mysql really used even more memory than shown here (roughly twice as much).
To avoid such problems, consider limiting the size of your queries and using a loop with a small number of iterations; e.g.:
    $totalNumberToFetch = 10000;
    $portionSize = 100;

    // "<" rather than "<=": 10000 / 100 gives exactly 100 portions (offsets 0 through 9900)
    for ($i = 0; $i < ceil($totalNumberToFetch / $portionSize); $i++) {
        $limitFrom = $portionSize * $i;
        $res = $connection->query(
            "SELECT `x`,`y` FROM `test` LIMIT $limitFrom, $portionSize");
    }
When we consider both this PHP mistake and mistake #4 above, we realize that there is a healthy balance that your code ideally needs to achieve between, on the one hand, having your queries being too granular and repetitive, vs. having each of your individual queries be too large. As is true with most things in life, balance is needed; either extreme is not good and can cause problems with PHP not working properly.
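One way to keep the portioning logic separate from the query itself is to compute the LIMIT windows in a small generator. This is only a sketch: chunkWindows() is a hypothetical helper, and the totals used below are arbitrary illustration values:

```php
<?php
/**
 * Yields [offset, limit] pairs that cover $total rows in portions of $portionSize.
 */
function chunkWindows($total, $portionSize)
{
    for ($offset = 0; $offset < $total; $offset += $portionSize) {
        // the last window may be smaller than $portionSize
        yield [$offset, min($portionSize, $total - $offset)];
    }
}

foreach (chunkWindows(250, 100) as list($offset, $limit)) {
    echo "LIMIT $offset, $limit\n";
}
// prints:
// LIMIT 0, 100
// LIMIT 100, 100
// LIMIT 200, 50
```

Each pair could then be interpolated into the LIMIT clause of the query, so only one portion of the result is held at a time.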
Common Mistake #6: Ignoring Unicode/UTF-8 issues
In some sense, this is really more of an issue in PHP itself than something you would run into while debugging PHP, but it has never been adequately addressed. PHP 6’s core was to be made Unicode-aware, but that was put on hold when development of PHP 6 was suspended back in 2010.
But that by no means absolves the developer from properly handling UTF-8 and avoiding the erroneous assumption that all strings will necessarily be “plain old ASCII”. Code that fails to properly handle non-ASCII strings is notorious for introducing gnarly heisenbugs into your code. Even simple strlen($_POST['name']) calls could cause problems if someone with a last name like “Schrödinger” tried to sign up into your system.
Here’s a small checklist to avoid such problems in your code:
- If you don’t know much about Unicode and UTF-8, you should at least learn the basics. There’s a great primer here.
- Be sure to always use the mb_* functions instead of the old string functions (make sure the “multibyte” extension is included in your PHP build).
- Make sure your database and tables are set to use Unicode (many builds of MySQL still use latin1 by default).
- Remember that json_encode() converts non-ASCII symbols (e.g., “Schrödinger” becomes “Schr\u00f6dinger”) but serialize() does not.
- Make sure your PHP code files are also UTF-8 encoded to avoid collisions when concatenating strings with hardcoded or configured string constants.
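A short sketch illustrates several of these points at once. It assumes PHP 7 or later (for the "\u{...}" escape) and that the mbstring extension is loaded:

```php
<?php
// "Schrödinger", written with an escape so the encoding of this file does not matter
$name = "Schr\u{00F6}dinger";

echo strlen($name), "\n";             // prints 12 (bytes: "ö" takes two bytes in UTF-8)
echo mb_strlen($name, 'UTF-8'), "\n"; // prints 11 (characters)

echo json_encode($name), "\n";        // prints "Schr\u00f6dinger"
echo serialize($name), "\n";          // prints s:12:"Schrödinger";

// JSON_UNESCAPED_UNICODE keeps the character as-is in the JSON output
echo json_encode($name, JSON_UNESCAPED_UNICODE), "\n"; // prints "Schrödinger"
```

Note that serialize() records the byte length (12), not the character count.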
A particularly valuable resource in this regard is the UTF-8 Primer for PHP and MySQL post by Francisco Claria on this blog.
Common Mistake #7: Assuming $_POST will always contain your POST data
Despite its name, the $_POST array won’t always contain your POST data and can be easily found empty. To understand this, let’s take a look at an example. Assume we make a server request with a jQuery.ajax() call as follows:
    // js
    $.ajax({
        url: 'http://my.site/some/path',
        method: 'post',
        data: JSON.stringify({a: 'a', b: 'b'}),
        contentType: 'application/json'
    });
(Incidentally, note the contentType: 'application/json' here. We send data as JSON, which is quite popular for APIs. It’s the default, for example, for posting in the AngularJS $http service.)
On the server side of our example, we simply dump the $_POST array:
    // php
    var_dump($_POST);
Surprisingly, the result will be:
    array(0) { }
Why? What happened to our JSON string {a: 'a', b: 'b'}?
The answer is that PHP only parses a POST payload automatically when it has a content type of application/x-www-form-urlencoded or multipart/form-data. The reasons for this are historical: these two content types were essentially the only ones used years ago when PHP’s $_POST was implemented. So with any other content type (even those that are quite popular today, like application/json), PHP doesn’t automatically load the POST payload.
Since $_POST is a superglobal, if we override it once (preferably early in our script), the modified value (i.e., including the POST payload) will then be referenceable throughout our code. This is important since $_POST is commonly used by PHP frameworks and almost all custom scripts to extract and transform request data.
So, for example, when processing a POST payload with a content type of application/json, we need to manually parse the request contents (i.e., decode the JSON data) and override the $_POST variable, as follows:
    // php
    $_POST = json_decode(file_get_contents('php://input'), true);
Then when we dump the $_POST array, we see that it correctly includes the POST payload; e.g.:
    array(2) { ["a"]=> string(1) "a" ["b"]=> string(1) "b" }
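A slightly more defensive variant, offered only as a sketch, checks the content type and guards against invalid JSON before anything is assigned. parseJsonBody() is a hypothetical helper; in a real request its arguments would typically come from file_get_contents('php://input') and $_SERVER['CONTENT_TYPE'], but literal values are used here so the snippet runs from the command line:

```php
<?php
/**
 * Decodes a JSON request body into an array.
 * Returns [] when the content type is not JSON or the body cannot be decoded.
 */
function parseJsonBody($rawBody, $contentType)
{
    // the header may carry parameters, e.g. "application/json; charset=utf-8"
    if (stripos($contentType, 'application/json') !== 0) {
        return [];
    }

    $decoded = json_decode($rawBody, true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($decoded)) {
        return [];
    }

    return $decoded;
}

$payload = parseJsonBody('{"a":"a","b":"b"}', 'application/json; charset=utf-8');
var_dump($payload === ['a' => 'a', 'b' => 'b']);                // prints bool(true)
var_dump(parseJsonBody('not json', 'application/json') === []); // prints bool(true)
```

The returned array could be assigned to $_POST as in the example above, or kept in its own variable so that it is clear the data did not come from a form post.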
Common Mistake #8: Thinking that PHP supports a character data type
Look at this sample piece of code and try guessing what it will print:
    for ($c = 'a'; $c <= 'z'; $c++) {
        echo $c . "\n";
    }
If you answered ‘a’ through ‘z’, you may be surprised to know that you were wrong.
Yes, it will print ‘a’ through ‘z’, but then it will also print ‘aa’ through ‘yz’. Let’s see why.
In PHP there’s no char datatype; only string is available. With that in mind, incrementing the string z in PHP yields aa:
    php> $c = 'z'; echo ++$c . "\n";
    aa
Yet to further confuse matters, aa is lexicographically less than z:
1 |
php> var_export((boolean)( 'aa' < 'z'
|