Verify Optimized Code Against the Unoptimized Version

The Problem: We've decided that Hack 92 works for us. But how can we tell if we've implemented the class correctly?

The Hack: When replacing one system with another, test twice and compare results.

In Hack 92 we started with pixels represented by float and replaced them with a fixed_point implementation. Both versions should produce the same result.

But as hackers we know that there's a vast difference between “should” and “is”. That's where testing comes in.

A good test to see if the optimization was correct is to run the unoptimized program and save the results. Then run the optimized version and check to see that we get the same results.
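A minimal sketch of such a comparison, assuming both runs saved their output as one number per pixel. The names comparison and compare_results are made up for this illustration and are not part of Hack 92.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of how two saved result sets differ.
struct comparison {
    std::size_t mismatches;  // values that are not exactly equal
    double max_error;        // largest absolute difference found
};

// Compare the saved output of the unoptimized run (reference) with the
// output of the optimized run. Values are compared position by position.
static comparison compare_results(const std::vector<double>& reference,
                                  const std::vector<double>& optimized)
{
    comparison result = {0, 0.0};
    const std::size_t common = std::min(reference.size(), optimized.size());

    for (std::size_t i = 0; i < common; ++i) {
        if (reference[i] != optimized[i]) {
            ++result.mismatches;
            result.max_error = std::max(
                result.max_error, std::fabs(reference[i] - optimized[i]));
        }
    }
    // A length difference means some values have no counterpart at all.
    result.mismatches += std::max(reference.size(), optimized.size()) - common;
    return result;
}
```

A mismatch count of zero means the two versions agree exactly. A nonzero count with a small max_error means the outputs differ but may still be close enough, and that is a judgment call the numbers alone cannot make.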

In real life Hack 92 was used to speed up a sophisticated dithering algorithm for a color inkjet printer. When a binary comparison was done on the output the results were different. So in spite of what our high-powered numerical analysis expert said, two digits were not enough to generate identical results.

However when the test images were printed, the optimized results looked just as good to the Print Committee as the unoptimized versions. So the computer could tell the difference between the two algorithms but the Print Committee could not. In the inkjet business, the Print Committee rules so we kept the newer, faster algorithm.

There's a moral to this story and as soon as I figure out what it is I'll put it in this book.

Case Study: Optimizing bits_to_bytes

Let's take a look at how a hacker would optimize a simple function. The job of this function is to figure out how many bytes it takes to store a given number of bits. Here's the function as it first appeared:

short int bits_to_bytes(short int bits) {
    short int bytes = bits / 8;
    if ((bits % 8) != 0) {
        bytes++;
    }
    return (bytes);
}

The first thing we notice about this function is that the code is pretty bad. As hackers we are frequently faced with horrible code.

I once optimized a program that was taking 20 hours per run down to the point where it was taking about 8 seconds per run. Now I'm a good hacker, but I'm not that good. The original program was very badly written. In defense of the original programmer, it was the first program he had ever written and he did a remarkable job of implementing a sophisticated cryptographic algorithm despite not knowing many basic features of the C language.

The bits_to_bytes function was targeted for a cell phone (ARM processor). Running the code through the compiler and taking a look at the assembly code we find some interesting things going on. For example the implementation of the line:

short int bytes = bits / 8;

The generated code looks like:

movw r1, bits   ; r1 (bytes) = bits
asr r1, 3       ; r1 = r1 >> 3 (aka r1 = r1/8)
lsl r1, 16      ; r1 = r1 << 16 (?)
asr r1, 16      ; r1 = r1 >> 16 (?)

What's going on with the two funny instructions that shift the result left by 16 then right by 16? At first glance this is a rather useless piece of code.

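The processor uses 32 bit arithmetic. On this machine a short int is 16 bits. When the system does the divide by 8 a 32 bit result is generated. So the compiler generates two instructions designed to convert a 32 bit value into a 16 bit one: the shift left by 16 discards the upper half, and the arithmetic shift right by 16 brings the value back with its sign extended.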

This occurs after the divide and after the increment. This gives us a total of 4 useless instructions.
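The shift pair can be reproduced in C++ to see what it does. This is a sketch that assumes a 32 bit int and a compiler that shifts negative values arithmetically (guaranteed only from C++20 on, though common in practice). The function name narrow_by_shifting is made up for this illustration.

```cpp
#include <cstdint>

// Mimic the "lsl 16 / asr 16" pair: keep the low 16 bits of a 32 bit
// value and copy bit 15 into the upper half (sign extension).
static std::int32_t narrow_by_shifting(std::int32_t value)
{
    // Do the left shift on an unsigned copy so bits falling off the top
    // do not cause signed overflow.
    const std::uint32_t shifted = static_cast<std::uint32_t>(value) << 16;

    // Shifting the signed value right drags the sign bit back down.
    return static_cast<std::int32_t>(shifted) >> 16;
}
```

Values that already fit in 16 bits come back unchanged, so for a function like this one the pair does no useful work. Values that do not fit are cut down: 0x12345 becomes 0x2345 and 0xFFFF becomes -1.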

Changing the function to use int instead of short int eliminates these instructions and makes our code faster.

The next step is to see if we can write a better algorithm and eliminate that conditional:

int bits_to_bytes(int bits) {
    return (bits + 7) / 8;
}

This completely eliminates all the conditional logic and saves us a few more instructions. Now the actual body of the function is just a handful of instructions.
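Following the advice of this Hack, the rewrite can be checked against the original before the original is thrown away. The sketch below assumes the original is meant to round up whenever bits is not a multiple of 8 (test bits % 8, then increment bytes). The names bits_to_bytes_original, bits_to_bytes_fast, and count_disagreements are made up for this illustration.

```cpp
// Unoptimized version, using int so both versions accept the same inputs.
static int bits_to_bytes_original(int bits)
{
    int bytes = bits / 8;
    if ((bits % 8) != 0) {
        bytes++;
    }
    return bytes;
}

// Optimized version with no conditional.
static int bits_to_bytes_fast(int bits)
{
    return (bits + 7) / 8;
}

// Count the inputs in [0, limit] for which the two versions disagree.
static int count_disagreements(int limit)
{
    int disagreements = 0;
    for (int bits = 0; bits <= limit; ++bits) {
        if (bits_to_bytes_original(bits) != bits_to_bytes_fast(bits)) {
            ++disagreements;
        }
    }
    return disagreements;
}
```

Calling count_disagreements(32767) covers every non-negative value a 16 bit short int could have held and should return 0. Negative inputs are deliberately left out: the two versions do not agree there (for -1 the original returns 1 and the rewrite returns 0), so the comparison only proves equivalence for valid bit counts.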

How about cutting it down to 0 instructions? It's possible. All we have to do is add the inline keyword:

inline int bits_to_bytes(int bits) {
    return (bits + 7) / 8;
}

Now when the optimizer sees a line like:

store_size = bits_to_bytes(41);

it will optimize it down to:

store_size = 6;

The function is not even called. All the computations are being done at compile time.
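Whether a given call really collapses to a constant depends on the compiler and the optimization level. One way to confirm at least that the arithmetic can be done at compile time is a constexpr version with a static_assert. This is a C++11 feature swapped in for illustration, not the plain inline the text uses, and the name bits_to_bytes_constexpr is made up.

```cpp
// constexpr lets the compiler evaluate the call in a constant expression.
constexpr int bits_to_bytes_constexpr(int bits)
{
    return (bits + 7) / 8;
}

// Checked during compilation: 41 bits need (41 + 7) / 8 = 6 bytes.
static_assert(bits_to_bytes_constexpr(41) == 6,
              "41 bits should need 6 bytes");
```

If the static_assert compiles, the compiler has computed the value 6 itself, with no call at run time.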

But we're not done optimizing. There's a difference between a function declared inline and one declared static inline. When a function is declared inline the compiler will inline all the function calls it sees, then generate a regular non-inline function body in case someone from the outside wants to call this function.

To counter this we declare our function static inline and stick it in a header file. This saves us a couple of dozen bytes in this example (total) in a 3.5MB program.

Now as hackers there's one more thing we need to consider: what happens when things go wrong? After all, we never trust the caller to do things right, and we can be called with a negative number. We need to answer the question “How much storage does -87 bits take up?”

The easiest thing to do is to assume that this will never happen and just ignore it. But if we do this we need to document the fact in the program.

/*
 * bits_to_bytes - Given a number of bits, return the
 * number of bytes needed to store them.
 *
 * WARNING: This function does no error checking so
 * if you give it a very wrong value you get a very wrong
 * result.
 */

Assuming something bad will never happen is not really a good idea. As hackers we know that lots of things that can “never happen” actually do.

For example, we could insert an assert statement:

static inline int bits_to_bytes(int bits) {
    assert(bits >= 0);
    return (bits + 7) / 8;
}

The problem with assert statements is that they cause the program to abort. In real life this code lived in a mobile phone and a failed assert would cause the phone to reset. This was not good because when the phone reset it would play the “welcome sound”. End users were wondering why their phone would restart for “no reason at all”.

The phone maker's “solution” to this problem was simple. They changed the code so that a failed assert restarted the phone silently. The code was still very buggy, but the bugs became less visible (audible?) to the end users.

Also you should remember that assertions can be compiled out (the standard assert macro does nothing when NDEBUG is defined), in which case this code provides no protection at all.

Since this is C++, throwing an exception is one way of handling the error:

static inline int bits_to_bytes(int bits) {
    if (bits < 0) throw(memory_error("bits_to_bytes"));
    return (bits + 7) / 8;
}
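A self-contained sketch of the exception approach. The text does not show how memory_error is defined, so the class below, derived from std::runtime_error, is an assumption, as is the helper safe_bits_to_bytes that shows a caller handling the error.

```cpp
#include <stdexcept>
#include <string>

// Assumed definition; the real memory_error class is not shown in the text.
class memory_error : public std::runtime_error {
public:
    explicit memory_error(const std::string& where)
        : std::runtime_error("memory error in " + where) {}
};

static inline int bits_to_bytes(int bits)
{
    if (bits < 0) {
        throw memory_error("bits_to_bytes");
    }
    return (bits + 7) / 8;
}

// Hypothetical caller: returns -1 instead of letting the exception escape.
static int safe_bits_to_bytes(int bits)
{
    try {
        return bits_to_bytes(bits);
    } catch (const memory_error&) {
        return -1;
    }
}
```

Unlike an assert, the check stays in release builds, and the caller decides whether a bad value is fatal.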

These error checking options are expensive. One inexpensive thing we can do is to make sure that the error can never happen. How do we do that? All we have to do is make the argument (and the return value) unsigned.

static inline unsigned int bits_to_bytes(unsigned int bits) {
    return (bits + 7) / 8;
}

Finally there's one more way of dealing with this error. Just change the function to silently ignore it and return a default value:

static inline int bits_to_bytes(int bits) {
    if (bits < 0) return (0);
    return (bits + 7) / 8;
}

This is usually not a good idea since such code tends to hide errors in other pieces of code. In general you don't want to silently fix things. Maybe output a log message:

static inline int bits_to_bytes(int bits) {
    if (bits < 0) {
        log_error(
            "Illegal parameter in bits_to_bytes(%d)", bits);
        log_error("Standard fixup taken");
        return (0);
    }
    return (bits + 7) / 8;
}

In examining bits_to_bytes we can see that it is actually a very short function. But it does illustrate some of the things good hackers consider when working with code. These include:

● Dealing with lousy code

● Knowing how the compiler generates code and designing your code to make optimal use of this information.

● Making maximum use of the language features such as inline and static inline.

● Being paranoid. Deciding what to do with bad data.

There's a lot to be learned from this little program. As hackers we're always learning. We study, research, experiment, and play all to gain a better understanding of the programming process.

Chapter 9: g++ Hacks

The GNU gcc package is one of the most popular C and C++ compilers out there. Two things led to its popularity. First, it's a high quality compiler. Second, it's free.

The GNU compiler has extended the C and C++ languages in some useful ways (and some useless ones too). If you are willing to sacrifice portability, the non-standard language can be very useful.
