Overview of graphics file formats
2.2 FINDING THE SIZE AND TYPE OF AN IMAGE
The first thing you do when you receive an image for further manipulation is to find out as much information about it as necessary. In the case of image files, that is, at the very least, the size and type of the image. Once you have such data, you can make decisions about which steps to take to get the image into the required format.
2.2.1 Image::Size
For most Perl applications, the fastest and easiest way to find the size of an image is to use the Image::Size module. It will handle the most commonly used image formats and it is easy to use. The imgsize() subroutine takes either a file name or an open file handle as its argument, and returns the width, height, and type of image.
use Image::Size qw(:all);
my $img_file = 'file.gif';
my ($width, $height, $id) = imgsize($img_file);
open(IN, $img_file) or die "Cannot open $img_file: $!";
($width, $height, $id) = imgsize(\*IN);
The last argument on the use Image::Size line is a directive to the standard Perl exporting mechanism. Many modules make some of their internal names optionally available for export.3 Image::Size always exports imgsize() into the caller’s name space, and optionally allows the import of the html_imgsize() and attr_imgsize() functions. The tag :all imports all three of them. In other words, if you only plan to use the imgsize() function, a simple use Image::Size; will suffice.
Alternatively, you can read the file yourself and pass a reference to the file contents to the imgsize() subroutine. This can also be handy if you get your image data from a source other than a file, such as a database or a pipe.
binmode(IN);
my $img_buf;
{
local($/) = undef;
$img_buf = <IN>;
}
close(IN);
($width, $height, $id) = imgsize(\$img_buf);
3 For a full explanation, see the documentation of the standard Perl module Exporter.
Image::Size also offers two convenience functions, which can be used to generate HTML tags in a print statement:
use Image::Size qw(html_imgsize);
my $html_width_height = html_imgsize($img_file);
print qq(<IMG SRC="$img_file" $html_width_height>);
or to cooperate with the methods of the CGI module.
use CGI qw(:standard);
use Image::Size qw(attr_imgsize);
my @width_height_attributes = attr_imgsize($img_file);
print img {src => $img_file, @width_height_attributes};
or directly:
print img {src => $img_file, attr_imgsize($img_file)};
2.2.2 Image::Magick
If you have images in a format that Image::Size doesn’t support, then you still have a few options. The simplest is to use a more powerful module, such as Image::Magick.
Image::Magick’s Ping() method gives you the width and height of the images in a file of any of the formats it can read. As a bonus you also get the size (in bytes) and format of the image.4
use Image::Magick;
my $img_file = 'file.gif';
my ($width, $height, $size, $format) = Image::Magick->Ping($img_file) or die "Cannot get info for $img_file";
This works well if you are interested only in the dimensions of the image and don’t plan to do anything else with it. If, however, you also need to read the image for manipulation with Image::Magick, it is probably better to do something such as:
my $img_file = 'file.gif';
my $im = Image::Magick->new();
my $rc = $im->Read($img_file);
die "Cannot read $img_file: $rc" if $rc;
my ($width, $height, $format) = $im->Get('width', 'height', 'magick');
Note that most Image::Magick methods return undef on success.5 This means that you have to check whether the return value is true to detect an error, while most of the time, in Perl, you check whether a return value is false. This can be a bit counterintuitive.
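To make this inverted convention harder to get wrong, the check can be wrapped in a small helper. The following is only a sketch: magick_status() is a hypothetical name, and it assumes that a failing call returns a value that stringifies to something like "Exception 435: ...", with codes of 400 and above denoting errors and lower codes denoting warnings. Verify that threshold against the documentation of your Image::Magick version before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a PerlMagick-style status value.
# Assumption: the status is undef or empty on success, and otherwise
# stringifies to something like "Exception 435: unable to open image".
sub magick_status {
    my $rc = shift;
    return 'ok' unless defined $rc && length "$rc";
    my ($code) = "$rc" =~ /(\d+)/;
    return 'error' unless defined $code;    # no code found: be pessimistic
    # Assumption: codes below 400 are warnings, 400 and above are errors.
    return $code >= 400 ? 'error' : 'warning';
}

# The strings below are made up for illustration only.
print magick_status(undef), "\n";
print magick_status('Exception 325: some warning'), "\n";
print magick_status('Exception 435: unable to open image'), "\n";
```

In a real program you would pass in the return value of Read() or a similar method, and die only when the helper reports an error.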
Image::Magick is a fairly large and heavy module, which takes quite some CPU power to load. The reason for this is that Image::Magick is a very general purpose graphics manipulation module, and anything that is general purpose is bound to be slower than something that has been written with only one specific task in mind. Newer versions have improved this situation by delaying the load phase of many components until they’re needed. If you need to know some information on only one or two images, loading Image::Magick just for this might be too expensive.6 If you plan to procure information on many images, the cost of loading is negligible.
4 In older versions of Image::Magick, the Ping() method returned a single string with comma-separated fields in the same order as in the example. A further explanation of this method appears on page 268.
5 There are some exceptions to this rule; see appendix A, on page 241.
2.2.3 Do it yourself
If you need something fast and lightweight that will work almost everywhere, especially when you know that you will only have to deal with one file format, writing your own subroutines can be the best option. As an illustration, we will do this for PNG and for XCF, the native format for the Gimp. The subroutines will return the same values as the imgsize() subroutine from Image::Size.
PNG
The PNG format[14] specifies a fixed header of 8 bytes, followed by a chunk of type IHDR. This chunk first contains a 4-byte integer, then a 4-byte identifier, and two 4-byte integers for the width and height. In PNG, all integers are stored in network byte order. If we translate this knowledge into Perl code, we get something like the following:
sub png_size {
    my $file = shift or return;
    my $buf;
    local(*IMG);
    open(IMG, $file) or return;
    binmode(IMG);
    read(IMG, $buf, 24);    # Read the first 8 + 4 + 4 + 4 + 4 bytes
    my ($hdr, $l, $ihdr, $w, $h) = unpack("a8 N a4 N N", $buf);
    return unless
        $hdr eq "\x89PNG\x0d\x0a\x1a\x0a" &&
        $ihdr eq 'IHDR';
    return ($w, $h, 'PNG');
}
You will notice the local(*IMG) and the absence of an explicit close(IMG). By localizing the file handle, we first make certain that we don’t trample on any file handles in the rest of the program, and we ensure that the file gets closed on exiting the block, i.e., when the subroutine returns. In more modern versions of Perl (5.6.0 and later) you can also use a lexically scoped variable as a file handle, which has the same effect.
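As an illustration of the lexical alternative, here is a sketch of the same subroutine with a lexically scoped file handle. The name png_size_lexical() is ours, the three-argument form of open() and the check on the number of bytes read are additions, and the temporary file contains only a synthetic 24-byte header rather than a complete PNG image.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

sub png_size_lexical {
    my $file = shift or return;
    open(my $img, '<', $file) or return;    # lexical handle instead of local(*IMG)
    binmode($img);
    my $buf;
    my $got = read($img, $buf, 24);
    return unless defined $got && $got == 24;   # file too short to be a PNG
    my ($hdr, $len, $ihdr, $w, $h) = unpack("a8 N a4 N N", $buf);
    return unless $hdr eq "\x89PNG\x0d\x0a\x1a\x0a" && $ihdr eq 'IHDR';
    return ($w, $h, 'PNG');
}   # $img goes out of scope here, which closes the file

# Write a synthetic PNG header to a temporary file to try the subroutine.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\x89PNG\x0d\x0a\x1a\x0a", pack("N a4 N N", 13, 'IHDR', 640, 480);
close($tmp_fh) or die "Cannot close $tmp_name: $!";

my ($width, $height, $type) = png_size_lexical($tmp_name);
print "$width x $height $type\n";    # prints "640 x 480 PNG"
```

Because $img lives only inside the subroutine, every return path closes the file without any further bookkeeping.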
In this subroutine, unpack() is used to split up the binary information in the 24 header bytes (8 + 4 + 4 + 4 + 4) into the parts in which we are interested. The translation of the unpack template can almost literally be found in the paragraph preceding the code. The second return value of the unpack() operation, which is the length of the data in the IHDR chunk, is captured, but not used, because it is not important to us. Next $hdr and $ihdr are checked to see if they are what they should be, and if they are not, a false value is returned. To round things off, the width, height and image type are returned.
6 Generally, the newer your version of Image::Magick, the less this is a problem. For example, version 5.4.4 is about 20 percent faster than the previous version.
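To see how the template lines up with the bytes, it can help to build a header by hand and take it apart again. The sketch below uses made-up dimensions and an in-memory buffer instead of a file; the value 13 is the length of the IHDR data as defined by the PNG specification.

```perl
use strict;
use warnings;

# Build the first 24 bytes of a PNG file by hand (synthetic data).
my $buf = "\x89PNG\x0d\x0a\x1a\x0a"     # a8 : fixed 8-byte signature
        . pack("N", 13)                 # N  : length of the IHDR data
        . "IHDR"                        # a4 : chunk type identifier
        . pack("N N", 800, 600);        # N N: width and height

# The same template as in png_size(), applied to the buffer.
my ($hdr, $len, $ihdr, $w, $h) = unpack("a8 N a4 N N", $buf);
printf "length=%d type=%s width=%d height=%d\n", $len, $ihdr, $w, $h;
# prints "length=13 type=IHDR width=800 height=600"
```

Each N consumes four bytes in network (big-endian) byte order, and each a field consumes exactly the number of bytes given after it.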
The Gimp’s XCF format
The documentation on XCF, the native format for the Gimp, is distributed with its source code, and is, in fact, the source code. So, to learn how to read this format, we have to get our hands on a source distribution of the GIMP.7 In the file apps/xcf.c we find that there are currently (as of version 1.1.10) two versions of the file format, in comments called versions 0 and 1. The only difference for our purposes is the version number in the header of the file.
The first thing to be noted about the XCF format is that, like in the PNG format, integers are stored in network byte order. We read from the source code that the first 9 bytes contain a fixed header, and that the version number can be found, as a null terminated string, in the next 5 bytes. The width and height of the image are the next two 4-byte integers:
sub xcf_size {
    my $file = shift or return;
    my $buf;
    local(*IMG);
    open(IMG, $file) or return;
    binmode(IMG);
    read(IMG, $buf, 22);    # Read the first 9 + 5 + 4 + 4 bytes
    my ($hdr, $v, $w, $h) = unpack("a9 Z5 N N", $buf);
    return unless ($hdr eq "gimp xcf ");
    SWITCH: {
        $v eq 'file' and $v = 'XCF0', last SWITCH;
        $v eq 'v001' and $v = 'XCF1', last SWITCH;
        # Unknown version. $w and $h may be unreliable
        return;
    }
    return ($w, $h, $v);
}
This code is very similar to png_size(), except that the type of the image is determined from the version number found in the file. An XCF version 0 file has its first 13 bytes set to gimp xcf file, while the version 1 header contains gimp xcf v001. We capture the version as the last 4 bytes of this string, and rewrite it into something slightly more meaningful.
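The header layout can be tried out without a real GIMP file by feeding hand-built headers to a variant of the subroutine that works on a buffer. This is a sketch only: xcf_size_from_buffer() is a hypothetical name, the lookup hash replaces the SWITCH block purely as a matter of taste, and the dimensions are made up.

```perl
use strict;
use warnings;

# Hypothetical variant of xcf_size() that takes the header bytes directly.
sub xcf_size_from_buffer {
    my $buf = shift;
    return unless defined $buf && length($buf) >= 22;
    my ($hdr, $v, $w, $h) = unpack("a9 Z5 N N", $buf);
    return unless $hdr eq "gimp xcf ";
    my %type_for = (file => 'XCF0', v001 => 'XCF1');
    return unless exists $type_for{$v};     # unknown version
    return ($w, $h, $type_for{$v});
}

# Synthetic headers: 9 + 5 bytes of text, then width and height.
my $v0 = "gimp xcf file\0" . pack("N N", 320, 200);
my $v1 = "gimp xcf v001\0" . pack("N N", 1024, 768);
print join(' ', xcf_size_from_buffer($v0)), "\n";    # prints "320 200 XCF0"
print join(' ', xcf_size_from_buffer($v1)), "\n";    # prints "1024 768 XCF1"
```

The Z5 field reads five bytes and drops the terminating null, which is why the comparison is against the four-character strings file and v001.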
7 Full sources for the GIMP are available from http://www.gimp.org/.
2.2.4 More on file size and information
This section should have given you enough of a beginning to read the basic information of any image format. If none of the modules can handle a format, you can still extract that information by reading the file yourself, provided you know the image format specification. Of course, if you don’t know that, you’re on your own, but before despairing, have a look at the Wotsit site [13], to see if it has the specification of the format you’re seeking.
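As one more illustration of the same approach, here is a sketch for GIF, a format whose header is equally easy to read. It assumes the standard GIF layout: a 6-byte signature (GIF87a or GIF89a) followed by the logical screen width and height as 16-bit integers which, unlike in PNG and XCF, are stored in little-endian order. The name gif_size_from_buffer() is ours, and the values it returns describe the logical screen, which normally but not necessarily matches the size of the image itself.

```perl
use strict;
use warnings;

# Hypothetical helper: width, height and type from the first bytes of a GIF.
sub gif_size_from_buffer {
    my $buf = shift;
    return unless defined $buf && length($buf) >= 10;
    # a6: signature, v v: two little-endian 16-bit integers
    my ($sig, $w, $h) = unpack("a6 v v", $buf);
    return unless $sig eq 'GIF87a' || $sig eq 'GIF89a';
    return ($w, $h, 'GIF');
}

# Synthetic header with made-up dimensions.
my $header = 'GIF89a' . pack("v v", 100, 50);
print join(' ', gif_size_from_buffer($header)), "\n";    # prints "100 50 GIF"
```

The only real difference from the PNG and XCF subroutines is the template: v instead of N, because the byte order and the integer size differ.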