Improving Disk I/O in PHP Apps | Rodney Rehm

URL:http://blog.rodneyrehm.de/archives/12-Improving-Disk-IO-in-PHP-Apps.html


For the release of Smarty 3.1.0 I refactored most of Smarty's disk access. As this optimization from September 2011 kept prompting questions on the Smarty forums, I finally felt the need to explain what I did. Although Smarty is the reason I discovered this, this post applies to PHP in general, not Smarty in particular. This post is all about laying the groundwork for you to realize two things:

  • Your operations are not atomic
  • Avoid accessing the hard disk unnecessarily

Warning: Unless you know what you're doing and why you're doing it, this content is to be considered harmful advice!

Multiple Processes and Race Conditions

You probably remember that your computer switches different processes (running applications, …) in and out of the CPU. This is done so that processes appear to execute in parallel on a single CPU core. This is what we call multitasking - and the same principles apply to computers with more than one CPU core.

This means that your program (PHP script) is not executed consecutively. Some of the program is executed, then it's paused so something else can run, then it continues execution, then it's paused again, and so forth.

In some languages we can tell the executing host to treat a bunch of operations as a single operation - we call that atomicity. PHP doesn't know this concept. It's safe to say that your PHP script can be disrupted at any given time at any given operation.

Unless you've had the chance of working on fairly high-traffic sites, you've probably never seen a race condition in action. A race condition is what we call the situation where two processes running in parallel do contradictory things. For example, Process 1 is writing the file /tmp/file.txt, while Process 2 is trying to delete that same file.

Those processes don't have to run in the same context. While Process 1 could be a PHP script, Process 2 could be a shell script triggered by Cron, or some manual rm /tmp/file.txt via SSH.

File Locking

To prevent these race conditions, we're allowed to lock files. When a file is locked by Process 1 and Process 2 tries to acquire the lock, Process 2 blocks until the lock is released by Process 1. PHP provides this functionality with flock().

flock() has a couple of problems, though. For one, it only works with resources, so you need to have the file opened with fopen() prior to obtaining a lock. Also, flock() will fail on certain file systems like FAT or NFS. On top of that, it seems quite ridiculous to open a file, only to obtain a lock, only to delete the file.

So in real life, where a PHP script does not know which file system is used, flock() won't help.
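
For reference, here is a minimal sketch of what the flock() route looks like when you do control the environment. The helper name locked_unlink() is made up for this illustration. It assumes a POSIX-like system with a local file system, and it assumes every other process touching the file takes the same lock, since flock() locks are only advisory:

```php
<?php
// Hypothetical helper (not from Smarty): delete a file while holding an
// exclusive advisory lock on it.
function locked_unlink(string $path): bool
{
    // flock() only works on stream resources, so the file has to be opened first.
    $handle = @fopen($path, 'r+');
    if ($handle === false) {
        return false; // file is missing or not writable
    }
    // Blocks until every other cooperating process has released its lock.
    if (!flock($handle, LOCK_EX)) {
        fclose($handle);
        return false; // e.g. a file system without lock support
    }
    $removed = unlink($path);
    flock($handle, LOCK_UN);
    fclose($handle);
    return $removed;
}
```

Note the overhead: an open, a lock, an unlink, an unlock and a close, just to remove one file.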

Potential Race Condition

At first glance, the following code is considered to be good code, as we check if a file exists prior to unlinking it. That is because unlink() issues an E_WARNING whenever it can't find the file to unlink:

$filepath="/tmp/some.file";
if(file_exists($filepath)){
  unlink($filepath);
}

But we remember that PHP has no atomic operator and a script can be disrupted at any given time:

$filepath="/tmp/some.file";
if(file_exists($filepath)){
  // <- potential race condition
  unlink($filepath);
}

Considering the above code to be Process 1, we could encounter the following condition:

*Process 1*: file_exists("/tmp/some.file")
*Process 2*: unlink("/tmp/some.file")
*Process 1*: unlink("/tmp/some.file") -> E_WARNING, file not found!

Between checking if the file existed and actually removing it, another process had the chance to delete the file. Now the unlink() of our script issues an E_WARNING because the unlink() failed.

Mitigating the Race Condition

Fear not, PHP knows the almighty @ silence-operator. Prefixing an expression with @ makes PHP suppress the warnings and notices raised while that expression is evaluated. The following code will prevent any E_WARNING issued due to a race condition (or any other fault, for that matter):

$filepath="/tmp/some.file";
if(file_exists($filepath)){
  @unlink($filepath);
}

With that little @ we've opened the door to a slight simplification of our code. Since we're performing the file_exists() to make sure unlink() won't issue any warnings, and @unlink() won't issue any warnings, we can simply drop file_exists():

$filepath="/tmp/some.file";
@unlink($filepath);

Et voila, we have successfully mitigated the race condition. And by doing so, we have accidentally reduced the Disk I/O by 50%.
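
If you still want to know whether the file is really gone, one possible variation is to check only after unlink() has failed. This is a sketch, not what Smarty does, and the name remove_quietly() is invented for the example. The extra stat call is only paid on the failure path:

```php
<?php
// Hypothetical helper: remove a file without checking for its existence first.
// Returns true if the file is gone afterwards, no matter who deleted it.
function remove_quietly(string $path): bool
{
    if (@unlink($path)) {
        return true; // we removed it ourselves
    }
    // unlink() failed. Only now spend a stat call to find out whether the
    // file was already gone (fine) or is still there (a real problem).
    clearstatcache(true, $path);
    return !file_exists($path);
}
```

A result of false then means something other than a race is wrong, for example missing permissions or a path that points at a directory.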

Reducing Disk I/O (stats)

Besides the implications for race conditions, ditching file_exists() has the other benefit of reducing stat calls. Whenever you have to touch an HDD, imagine your Ferrari-application hitting the brakes. Compared to the CPU, any hard disk (yes, even an SSD) is a turtle chained to a rock. So the ultimate goal is to avoid touching the file system whenever possible.

Consider the following well coded program to identify if a file exists and when it's been modified last:

$filepath="/tmp/some.file";
$file_exists=file_exists($filepath);
$file_mtime=null;
if($file_exists){
  $file_mtime=filemtime($filepath);
}

Did you know that filemtime() returns false (and issues an E_WARNING) if it can't find the file? So how about reversing things and ditching the file_exists():

$filepath="/tmp/some.file";
// filemtime() returns false if the file is missing; @ silences the E_WARNING
$file_mtime=@filemtime($filepath);
$file_exists=!!$file_mtime;
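
A slightly stricter variant can be sketched as a small helper. The name file_mtime_or_null() is hypothetical. It compares against false explicitly, so a file whose modification time happens to be 0 is not mistaken for a missing file:

```php
<?php
// Hypothetical helper: return a file's modification time, or null if the file
// cannot be stat'ed, without a separate file_exists() call.
function file_mtime_or_null(string $path): ?int
{
    $mtime = @filemtime($path); // false on failure, warning silenced by @
    return $mtime === false ? null : $mtime;
}

$file_mtime  = file_mtime_or_null("/tmp/some.file");
$file_exists = $file_mtime !== null;
```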

Custom Error Handling

As mentioned initially, ditching file_exists() was done to Smarty 3.1.0. We did numerous tests and benchmarks and came to the conclusion that we'd be stupid not to do it. And at that point I figured nobody would ever notice. That might've been true, had it not been for set_error_handler().

set_error_handler() allows you to register your own custom method for handling errors. It's pretty neat to push certain errors to a database or send mails or something like that. It gives you absolute power over each and every notice or warning issued. Even those that would've been masked by error_reporting() or the @ operator.

Apparently some people register custom error handlers to get ALL THE ERRORS. Even the masked ones. Some developers failed to understand hints in the docs, others did it deliberately. Intentions aside, these ill-conceived error handlers break the way we expect PHP to work. All of a sudden errors like error in 'test.php' on line 2: unlink(/tmp/some.file): No such file or directory (2) started popping up.
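
For comparison, here is a sketch of a handler that plays along with @ and error_reporting(). It relies on the behavior described in the PHP manual: while @ is in effect, error_reporting() no longer includes E_WARNING or E_NOTICE, so a bitmask test against the incoming error level is enough. The $logged array is just a stand-in for whatever logging you actually do:

```php
<?php
$logged = []; // stand-in for a database, a mail, a log file, ...

set_error_handler(
    function (int $errno, string $errstr, string $errfile = '', int $errline = 0) use (&$logged): bool {
        // Skip everything that is masked by error_reporting() or by the @ operator.
        if (!(error_reporting() & $errno)) {
            return false; // hand over to PHP's standard handling, which stays silent too
        }
        $logged[] = "error in '$errfile' on line $errline: $errstr ($errno)";
        return true; // handled, do not run PHP's internal handler
    }
);
```

With this handler in place, @unlink() on a missing file leaves $logged untouched, while the same call without @ is recorded.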

In their minds Smarty was misbehaving. After all, its code was raising E_WARNINGs all over the place. They didn't know (and didn't care) about the improvements we've made. They didn't want to "fix" their error handlers, as they did not see them as broken. So in Smarty 3.1.2 I introduced Smarty::muteExpectedErrors() - a custom error handler that would proxy their handlers, filtering out errors Smarty actually expected to happen.
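
The proxy idea can be sketched roughly like this. This is not Smarty's actual implementation. The function name mute_silenced_errors() is made up, and the filter used here is an assumption: it simply drops whatever was silenced with @ and forwards everything else to the handler that was registered before:

```php
<?php
// Hypothetical sketch of a proxying error handler. Errors that were silenced
// with @ are swallowed, everything else goes to the previously registered handler.
function mute_silenced_errors(): void
{
    $previous = null;
    $previous = set_error_handler(
        function (int $errno, string $errstr, string $errfile = '', int $errline = 0) use (&$previous) {
            if (!(error_reporting() & $errno)) {
                return true; // expected and silenced: stop here
            }
            if ($previous !== null) {
                // Forward to the handler that was active before we stepped in.
                return call_user_func($previous, $errno, $errstr, $errfile, $errline);
            }
            return false; // no previous handler: let PHP deal with it
        }
    );
}
```

Calling mute_silenced_errors() after the application has installed its own handler puts the filter in front of it. restore_error_handler() removes the proxy again.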

Warning (added Jan 10th 2013)

This post appeared on Hacker News, triggering a couple of comments. I added a warning to the top of the post. That said, here are a couple of reasons I chose this route:

  • I really don't care if a file couldn't be accessed due to privilege (or any other) reasons. There is a global systems-check to take care of that. This code assumes everything is fine. If it is not, the systems-check will tell us what is going on.
  • This is the least amount of code needed to "just make it work" across any setup.
    • Regardless of the number of physical machines running in parallel.
    • Regardless of the filesystem used (yes, some don't provide locking)
    • Regardless of the frequency and concurrency a single file is touched
    • Regardless of the PECLs some environment may have installed

This is the fire and forget approach. This is something you can do when you are caching, when you simply don't care about integrity and persistence.

Would I do any of the above if I cared about the data and could define the environment? HELL NO! But then, I probably wouldn't be using PHP either…
