I have something embarrassing to admit. I wrote this entire chapter based on theory alone. I’ve implemented all of these techniques at Twitpic, because on paper (IN THEORY) it makes complete sense. These changes will cut out the cruft, reduce useless data shuffling, and optimize the whole process. Right? Right!?
Well, when I went to benchmark each individual change, it turns out… not so much. In fact, each optimization described (tmpfs, client_body_buffer_size, and the HTTP Upload Module) added, at best, a 5% performance improvement over the “default” setup. Oops. And ouch. It’s a lesson in premature optimization, for sure, and in why you should benchmark everything! Regardless, I’ve kept this case study for two reasons:
1. It’s really important to understand EXACTLY how data finds its way to your PHP code. Following the code path is a journey that everyone should make. This knowledge will help you debug in the future.
2. To drive home the premature optimization lesson: even if an optimization looks correct on paper, it might not actually pan out in the real world.
The only thing that seemed to make a large, measurable impact was the difference between ImageMagick and GraphicsMagick, which I talk about in the next section.
I benchmarked the different settings, one by one, using ab (Apache Benchmark), with 1000 uploads and a concurrency of 20; that is, 1000 independent uploads with 20 happening at the same time. I used a 10MB JPEG as the test upload. The server, an EC2 m3.xlarge, was running Debian Squeeze 6.0.6, nginx 1.2.7, and PHP 5.4.13 for the benchmark.
These are the different tests, explained one by one.

tmpfs
I enabled tmpfs and changed client_body_temp_path in nginx and upload_tmp_dir in PHP to use it. There was little to no difference compared to the stock setup: same number of requests/second, same time per request.
Here’s why:
1. Nginx doesn’t fsync the temporary data to disk, and the writes are mostly sequential, so writing the upload data to disk doesn’t have a huge impact on IO usage.
2. When PHP reads the data back from disk, it’s reading the entire file sequentially, most of …
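If you want to confirm where PHP actually spools uploads before and after a change like this, a small script helps. The following is a minimal sketch, assuming a Linux host where /proc/mounts is readable; the helper name fsTypeForPath() is hypothetical. It finds the longest mount point that contains the upload directory and reports that mount's filesystem type, so you can check whether upload_tmp_dir really lives on tmpfs.

```php
<?php

// Hypothetical helper: return the filesystem type of the mount that
// contains $path, given the contents of /proc/mounts.
function fsTypeForPath(string $path, string $procMounts): ?string
{
    $bestMount = null;
    $bestType = null;

    foreach (explode("\n", trim($procMounts)) as $line) {
        // Each line looks like: device mountpoint fstype options dump pass
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 3) {
            continue;
        }
        [, $mountPoint, $fsType] = $fields;

        // The mount contains $path if it is "/" or a directory prefix of it.
        $contains = $mountPoint === '/'
            || $path === $mountPoint
            || strpos($path, rtrim($mountPoint, '/') . '/') === 0;

        // Keep the longest (most specific) matching mount point.
        if ($contains && ($bestMount === null || strlen($mountPoint) > strlen($bestMount))) {
            $bestMount = $mountPoint;
            $bestType = $fsType;
        }
    }

    return $bestType;
}

// Usage: an empty upload_tmp_dir means PHP falls back to the system temp dir.
if (PHP_SAPI === 'cli' && is_readable('/proc/mounts')) {
    $dir = ini_get('upload_tmp_dir') ?: sys_get_temp_dir();
    $type = fsTypeForPath($dir, file_get_contents('/proc/mounts'));
    echo "{$dir} is on " . ($type ?? 'unknown') . "\n";
}
```

Run it with the same php.ini your web SAPI uses, since upload_tmp_dir can differ between the CLI and PHP-FPM configs.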
Case Study: Optimizing image handling in PHP

Increasing client_body_buffer_size
I increased the client_body_buffer_size variable in nginx to 20MB, so nginx would be able to buffer the entire 10MB file upload in memory and never have to hit disk. Again, no performance improvement (for the reasons above), except that it increased the memory usage of my nginx worker processes to 200-300MB each, instead of the typical 25MB. Fail! There seems to be almost no reason to increase client_body_buffer_size from the default 8KB.
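The rule at work here is simple: nginx keeps a request body in memory when it fits in client_body_buffer_size, and writes the whole body or part of it to a temporary file under client_body_temp_path when it doesn't. The sketch below models that decision in PHP so you can see where the 10MB test upload lands under the default and under the 20MB setting. It is an illustration, not nginx's actual code; both function names are hypothetical, and it only handles plain byte counts plus the k/K, m/M, and g/G suffixes nginx accepts for sizes.

```php
<?php

// Hypothetical helper: convert an nginx size string ("8k", "20m", "1g")
// into bytes. Plain numbers are treated as bytes.
function nginxSizeToBytes(string $size): int
{
    $size = trim($size);
    $suffix = strtolower(substr($size, -1));
    $multipliers = ['k' => 1024, 'm' => 1024 ** 2, 'g' => 1024 ** 3];

    if (isset($multipliers[$suffix])) {
        return (int) substr($size, 0, -1) * $multipliers[$suffix];
    }
    return (int) $size;
}

// Models nginx's documented behavior: a body larger than
// client_body_buffer_size is (at least partly) written to a temp file.
function bodySpillsToDisk(int $bodyBytes, string $clientBodyBufferSize): bool
{
    return $bodyBytes > nginxSizeToBytes($clientBodyBufferSize);
}

$upload = 10 * 1024 * 1024; // the 10MB test JPEG

var_dump(bodySpillsToDisk($upload, '8k'));  // bool(true): default, goes to disk
var_dump(bodySpillsToDisk($upload, '20m')); // bool(false): fits in memory
```

The catch is that avoiding the disk write only saves a sequential, un-fsynced write, while the bigger buffer is paid for in worker memory.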
HTTP Upload Module
My last hope: I knew this had to improve performance. Not only does it save the file from being copied, but it also saves PHP from having to parse 10MB of POST data. I saw a small 5% improvement in performance here, mostly from reduced CPU usage by PHP, since there’s less data for PHP to chug through before it can run your code. Not worth it.
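With the upload module in front of PHP, nginx stores the file itself and forwards ordinary form fields describing it, so PHP typically sees no $_FILES entry for it. Which fields arrive depends entirely on your upload_set_form_field directives; the sketch below assumes hypothetical fields named upload.name and upload.path, and a hypothetical helper called uploadedFileFromModule(). PHP converts dots in incoming POST field names to underscores, so those fields show up as $_POST["upload_name"] and $_POST["upload_path"].

```php
<?php

// Hypothetical field names: they depend on upload_set_form_field in your
// nginx config (e.g. "upload.name" and "upload.path"). PHP turns dots in
// POST field names into underscores, hence "upload_name"/"upload_path".
function uploadedFileFromModule(array $post, string $field = 'upload'): ?array
{
    $nameKey = "{$field}_name";
    $pathKey = "{$field}_path";

    if (empty($post[$nameKey]) || empty($post[$pathKey])) {
        return null; // the module didn't hand us a file
    }

    return [
        // Client-supplied file name, so strip any directory components.
        'name' => basename($post[$nameKey]),
        // Path of the file nginx already wrote to disk.
        'path' => $post[$pathKey],
    ];
}

// In a real request you would call uploadedFileFromModule($_POST) and pass
// the returned 'path' straight to your image library.
```

Trusting 'path' assumes nginx never passes a client-supplied field of the same name through to PHP, so keep those names out of upload_pass_form_field.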
Figure: Benchmarks of various changes
Benchmarking file uploads with ab (Apache Benchmark)
I figured I would make a quick note about how you can test this yourself; it took me a long time to figure out a way to easily benchmark and test file uploads from the command line. There are plenty of tutorials on ab, but none really mention how to use it to upload files.
ab can only upload raw multipart/form-data. It won’t do any magic for you, so you actually need to assemble the raw multipart/form-data yourself to upload an image with ab. Something about “manual” and “multipart/form-data” just sounds awful.
I used the PHP script below to generate it for me. You just need to edit the form field name, file path, and file name; it’ll take care of everything else. I used a PHP script because I had a hell of a time getting all of the \r\n line breaks perfect by hand. If the format isn’t perfect, your uploads will be ignored by nginx and PHP.
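<?php

// Build a raw multipart/form-data body for ab's -p option.
// The boundary must match the one passed to ab with -T.
$boundary = "1234567890";
$field_name = "upload";   // form field name the server expects
$file_path = "./";
$file_name = "test.jpg";

// Opening boundary and part headers; every line ends with \r\n.
echo "--{$boundary}\r\n";
echo "Content-Disposition: form-data; name=\"{$field_name}\";";
echo " filename=\"{$file_name}\"" . "\r\n";
echo "Content-Type: image/jpeg\r\n";
echo "\r\n"; // blank line separates the headers from the file data

// The raw file contents.
echo file_get_contents($file_path . $file_name);

// Closing boundary (note the trailing "--").
echo "\r\n";
echo "--{$boundary}--\r\n";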
Put the script above into a file called generate.php and create a post.txt file holding the multipart/form-data with the following command:
> php generate.php > post.txt
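Since a single missing \r\n is enough for nginx and PHP to silently drop the upload, it can be worth sanity-checking post.txt before handing it to ab. The function below is a hypothetical helper, a rough structural check rather than a full multipart parser: it verifies the opening boundary line, the blank CRLF line that ends the part headers, and the closing boundary, using the same 1234567890 boundary as generate.php.

```php
<?php

// Hypothetical helper: rough structural check of a single-part
// multipart/form-data body like the one generate.php produces.
function looksLikeValidMultipart(string $body, string $boundary): bool
{
    $opening = "--{$boundary}\r\n";
    $closing = "\r\n--{$boundary}--\r\n";

    // Must start with the opening boundary and end with the closing one.
    if (strpos($body, $opening) !== 0) {
        return false;
    }
    if (substr($body, -strlen($closing)) !== $closing) {
        return false;
    }

    // Part headers must be terminated by an empty line (CRLF CRLF).
    $headerEnd = strpos($body, "\r\n\r\n");
    if ($headerEnd === false) {
        return false;
    }
    $headers = substr($body, strlen($opening), $headerEnd - strlen($opening));

    return stripos($headers, 'Content-Disposition: form-data;') !== false;
}

// e.g. var_dump(looksLikeValidMultipart(file_get_contents('post.txt'), '1234567890'));
```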
Now, you can use post.txt with ab to start benchmarking your own file uploads.
> ab -n 50 -c 10 -p post.txt -T "multipart/form-data; boundary=1234567890" http://localhost/upload.php
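The upload.php endpoint that ab hits isn’t shown here. For a pure upload benchmark, something as small as the following would do. It is a hypothetical stand-in (describeUpload() is not from the original) that only checks PHP received the "upload" field and reports its size, so the numbers reflect nginx’s and PHP’s request handling rather than any image work.

```php
<?php

// Hypothetical benchmark endpoint: accept the "upload" field and report
// its size. No image processing, so only request handling is measured.
function describeUpload(array $files, string $field = 'upload'): string
{
    if (!isset($files[$field]) || $files[$field]['error'] !== UPLOAD_ERR_OK) {
        return 'no upload';
    }
    return 'received ' . $files[$field]['size'] . ' bytes';
}

// Only touch $_FILES when running under a web SAPI.
if (PHP_SAPI !== 'cli') {
    echo describeUpload($_FILES);
}
```

Keep in mind that with stock settings a 10MB upload gets rejected before it ever reaches this code: nginx’s client_max_body_size defaults to 1m, and PHP’s upload_max_filesize and post_max_size default to 2M and 8M.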
ImageMagick vs GraphicsMagick
There’s a lesser-known image library called GraphicsMagick¹⁴⁰, a leaner and faster fork of the ImageMagick code. That’s a win in itself (given the same source image and settings, GraphicsMagick will often produce a smaller output image in less time), but it also has a regularly updated PHP C extension. Having a native PECL extension is a huge win, because it allows us to remove the two exec() calls from the code and, with them, the two poorly performing fork()s. The installation is pretty painless, too.
> sudo apt-get install php5-dev php-pear build-essential libgraphicsmagick-dev
> sudo pecl install --force gmagick
> echo "extension=gmagick.so" | sudo tee /etc/php5/conf.d/gmagick.ini

You can grab the docs for PECL GMagick¹⁴¹ here, but it uses very similar options to ImageMagick, except with an object-oriented interface instead of command-line arguments.
¹⁴⁰http://www.graphicsmagick.org/
<?php

class Image_Controller extends Controller {

    // Image upload is posted to this method
    public function create() {
        if (isset($_FILES["upload"])) {
            // Read the upload once; PHP stores its temp path in "tmp_name"
            $gm = new Gmagick();
            $gm->readImage($_FILES["upload"]["tmp_name"]);
            $gm->setCompressionQuality(90);

            // Create the scaled 640x800 size
            $gm->scaleImage(640, 800);
            $gm->write("./f/scaled.jpg");

            // Create the 150x150 thumbnail from the already-scaled image
            $gm->cropThumbnailImage(150, 150);
            $gm->write("./f/thumbnail.jpg");

            unlink($_FILES["upload"]["tmp_name"]);
        }
    }
}
Hooray! No more nasty exec() calls, and since we can share the Gmagick object across multiple resizes, we only have to read the raw image data into memory once, as opposed to twice with ImageMagick. This only works if you’re downsizing at every step.
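The “downsizing at every step” caveat presumably comes down to ordering: each operation works on the already-resized image, so the chain has to go from the largest output to the smallest, and no later step can ask for more pixels than the previous one left behind. The pure-PHP sketch below uses two hypothetical helpers, fitWithin() and planResizeChain(), to compute aspect-preserving target sizes that never upscale, which is one way to plan such a chain before handing the numbers to Gmagick.

```php
<?php

// Hypothetical helper: largest size that fits inside $maxW x $maxH while
// keeping the aspect ratio, never enlarging the source.
function fitWithin(int $w, int $h, int $maxW, int $maxH): array
{
    $scale = min($maxW / $w, $maxH / $h, 1.0);
    return [max(1, (int) round($w * $scale)), max(1, (int) round($h * $scale))];
}

// Hypothetical planner: order resize targets largest-first so each step
// only shrinks the output of the step before it.
function planResizeChain(int $w, int $h, array $targets): array
{
    // Sort targets by area, biggest first.
    usort($targets, fn ($a, $b) => $b[0] * $b[1] <=> $a[0] * $a[1]);

    $steps = [];
    foreach ($targets as [$maxW, $maxH]) {
        [$w, $h] = fitWithin($w, $h, $maxW, $maxH);
        $steps[] = [$w, $h];
    }
    return $steps;
}

// A 3000x2000 photo with a 640x800 and a 150x150 target:
// planResizeChain(3000, 2000, [[150, 150], [640, 800]])
// gives [[640, 427], [150, 100]].
```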
Newer versions of GraphicsMagick and ImageMagick are built with OpenMP, a framework for parallelizing image processing across multiple cores. In theory it sounds great, but I ran into a number of issues with random segfaults and crashes. I prefer to either recompile libgraphicsmagick with --disable-openmp or set putenv("MAGICK_THREAD_LIMIT=1"); in PHP to disable OpenMP.
Benchmarking ImageMagick vs GraphicsMagick
In a fairly straightforward benchmark, I found that GraphicsMagick was nearly twice as fast as ImageMagick in terms of requests per second, jumping to more than 3x faster if you’re able to reuse the same GraphicsMagick object when doing multiple resizes. That’s an incredible payoff for little effort, especially if you’re working with an upload-heavy application or have to do the image resizing inside the web request.
Figure: ImageMagick vs GraphicsMagick