Bryon Hapgood, Kodiak Interactive
Self-modifying code, also known as "RAM-code," is a fascinating technique that allows a program to alter its own code as it executes. It has been used in everything from genetic algorithms to neural networks with amazing results. In games it can be used as a powerful optimization technique. Recently I used this technique on the Game Boy Color title Test Drive Cycles to decompress artwork on the fly at 60 fps, decode 14 palettes of color information (instead of the standard eight), and enable multiple levels of parallax scrolling. In this gem, we will cover how to write self-modifying applications.
The Principles of RAM-Code
RAM-code is a simple idea, but one that can take an inordinate amount of time to get just right. Much of it is written as raw hexadecimal opcodes rather than mnemonics, which makes it difficult to debug.
Let's look at a very simple case. We want to load a pointer from a 16-bit variable stored somewhere in RAM.
get_hl:
    ld hl,ptr_var   ; Load HL register with the address ptr_var
    ld a,(hli)      ; Load A register with low byte and increment HL
    ld h,(hl)       ; Load H register with high byte of ptr_var
    ld l,a          ; Save low byte into L
    ret             ; Return
This example can be improved by writing it as:
get_hl:
    db $21          ; ld hl,...
ptr_var:
    dw $0000        ; ...ptr_var (the pointer itself is the immediate operand)
    ret
These two routines are logically no different from each other, but can you see the difference? The second example stores the variable that holds the address to be loaded into HL as the immediate operand of the ld hl instruction itself! In other words, instead of physically going out to memory and loading an address, we just load HL. It's quicker to load an immediate value than to access main memory, and because there are fewer bytes to fetch and decode, the code runs much faster. We can take that idea much further when it comes to preserving registers. Instead of pushing and popping everything, which can be expensive, we simply write the value directly into an instruction further ahead in the code. For example, instead of writing:
get_hl:
    ld hl,ptr_var
    ld a,(hli)
    ld h,(hl)
    ld l,a
    ld a,(hl)
    push af         ; Save A register
    ; do something with A
    pop af          ; Restore A register
    ret
this code can be optimized down to:
get_hl:
    db $21          ; ld hl,...
ptr_var:
    dw $0000        ; ...ptr_var
    ld a,(hl)
    ld (var1),a     ; Save A directly into the operand below
    ; do something with A
    db $3E          ; ld a,...
var1:
    db $00          ; ...saved register value
    ret
This is not a huge saving, but it illustrates the point.
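One way to convince yourself that the two get_hl versions agree, and that the second does less work, is to run both on a tiny model. The C++ sketch below is hypothetical (ToyGB, poke16, load_plain, and load_ramcode are illustrative names, not a real emulator API) and understands only the five SM83 opcodes involved: $21 (ld hl,d16), $2A (ld a,(hli)), $66 (ld h,(hl)), $6F (ld l,a), and $C9 (ret). Cycle counts follow the published Game Boy timing tables.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of just the SM83 (Game Boy CPU) opcodes used by get_hl.
// Cycle counts are clock cycles as listed in the Game Boy timing tables.
struct ToyGB {
    std::array<std::uint8_t, 0x10000> mem{};  // flat 64 KB address space
    std::uint8_t a = 0, h = 0, l = 0;
    unsigned cycles = 0;

    std::uint16_t hl() const { return static_cast<std::uint16_t>((h << 8) | l); }
    void set_hl(std::uint16_t v) {
        h = static_cast<std::uint8_t>(v >> 8);
        l = static_cast<std::uint8_t>(v & 0xFF);
    }

    // Runs from `pc` until RET and returns HL.
    std::uint16_t call(std::uint16_t pc) {
        for (;;) {
            switch (mem[pc++]) {
            case 0x21:  // ld hl,d16: 16-bit immediate, low byte first
                set_hl(static_cast<std::uint16_t>(mem[pc] | (mem[pc + 1] << 8)));
                pc = static_cast<std::uint16_t>(pc + 2);
                cycles += 12;
                break;
            case 0x2A:  // ld a,(hli): load A from (HL), then increment HL
                a = mem[hl()];
                set_hl(static_cast<std::uint16_t>(hl() + 1));
                cycles += 8;
                break;
            case 0x66:  // ld h,(hl)
                h = mem[hl()];
                cycles += 8;
                break;
            case 0x6F:  // ld l,a
                l = a;
                cycles += 4;
                break;
            case 0xC9:  // ret
                cycles += 16;
                return hl();
            default:
                assert(false && "opcode not modelled");
                return 0;
            }
        }
    }
};

// Writes a little-endian 16-bit value, the way dw lays it out.
void poke16(ToyGB& gb, std::uint16_t addr, std::uint16_t value) {
    gb.mem[addr] = static_cast<std::uint8_t>(value & 0xFF);
    gb.mem[addr + 1] = static_cast<std::uint8_t>(value >> 8);
}

// Plain get_hl: ld hl,ptr_var / ld a,(hli) / ld h,(hl) / ld l,a / ret
void load_plain(ToyGB& gb, std::uint16_t code, std::uint16_t ptr_var) {
    const std::uint8_t prog[] = {0x21, static_cast<std::uint8_t>(ptr_var & 0xFF),
                                 static_cast<std::uint8_t>(ptr_var >> 8),
                                 0x2A, 0x66, 0x6F, 0xC9};
    for (std::size_t i = 0; i < sizeof prog; ++i) gb.mem[code + i] = prog[i];
}

// RAM-code get_hl: db $21 / dw <pointer> / ret. Its ptr_var is code + 1.
void load_ramcode(ToyGB& gb, std::uint16_t code) {
    const std::uint8_t prog[] = {0x21, 0x00, 0x00, 0xC9};
    for (std::size_t i = 0; i < sizeof prog; ++i) gb.mem[code + i] = prog[i];
}
```

Loading the plain version at $C000 with its variable at $C100 costs 48 cycles to return the pointer; the RAM-code version, whose pointer is poked straight into the operand bytes at code + 1, costs 28. On real hardware the patched routine has to run from RAM, since cartridge ROM cannot be written.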
A Fast Bit Blitter
In many games, it is often crucial to convert from one pixel format to another, such as from 16-bit (565) RGB to 24-bit RGB. Whether the conversion happens in an offline tool or within the game itself, this one routine can handle it. We can define a structure (call it BITMAP) that contains information about an image. From this, our blitter can then use RAM-code techniques to construct an execute buffer: a block of memory, allocated with malloc, that is filled with machine-code instructions.
The blitter works by taking a routine that knows how to read 16-bit (565) RGB pixels and convert them to 32-bit RGBA values, and a routine that knows how to write them in another format. We can paste these two functions together, once for images with odd widths, or multiple times in succession to effectively unroll our loop.
The example shown next takes the former approach.
So, let's define our bitmap structure and associated enumerated types.
enum Format{
Now, it's really important that we have the same structure, with identical field order, sizes, and offsets, defined on the assembly side of things.
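Since the full declaration is not reproduced here, the C++ below is a hypothetical layout (the Format values and every bm_ field name are assumptions) showing one way to keep the two sides in sync: put fixed-size fields first and pin each offset with static_assert, mirroring a MASM STRUC written in the same order for the original 32-bit x86 target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical pixel formats; names are illustrative only.
enum Format : std::uint32_t {
    FMT_RGB565 = 0,   // 16-bit 5:6:5
    FMT_BGR_3x8 = 1,  // 24-bit, three 8-bit channels
    FMT_BGR_4x8 = 2   // 32-bit, four 8-bit channels
};

// Hypothetical BITMAP layout. A matching MASM declaration could be:
//   BITMAP    STRUC
//   bm_w      dd ?
//   bm_h      dd ?
//   bm_pitch  dd ?
//   bm_format dd ?
//   bm_bits   dd ?        ; 32-bit pointer
//   BITMAP    ENDS
struct BITMAP {
    std::int32_t bm_w;        // width in pixels
    std::int32_t bm_h;        // height in scan lines
    std::int32_t bm_pitch;    // bytes from one scan line to the next
    std::uint32_t bm_format;  // a Format value
    std::uint8_t* bm_bits;    // pixel data, last so the fields above keep fixed offsets
};

// If a field is reordered or resized, these fail at compile time
// instead of the assembly silently reading the wrong offset.
static_assert(offsetof(BITMAP, bm_w) == 0, "asm expects bm_w at +0");
static_assert(offsetof(BITMAP, bm_h) == 4, "asm expects bm_h at +4");
static_assert(offsetof(BITMAP, bm_pitch) == 8, "asm expects bm_pitch at +8");
static_assert(offsetof(BITMAP, bm_format) == 12, "asm expects bm_format at +12");
static_assert(offsetof(BITMAP, bm_bits) == 16, "asm expects bm_bits at +16");

// Bytes occupied by one pixel in each hypothetical format.
constexpr int bytes_per_pixel(Format f) {
    return f == FMT_RGB565 ? 2 : f == FMT_BGR_3x8 ? 3 : 4;
}
```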
The next step is to define our execute buffer.
execute_buffer db 128 dup(?)
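The pasting step can be sketched in C++ as concatenating size-prefixed code records into a byte buffer, repeating the read/write pair to unroll the loop. CodeRecord, parse_record, and emit_pixel_body are hypothetical names, and the sketch only builds the bytes: on current operating systems heap memory from malloc is normally not executable, so a runnable version would copy them into memory obtained with VirtualAlloc (PAGE_EXECUTE_READWRITE) on Windows or mmap with PROT_EXEC on POSIX systems.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// One pasteable code record: a dd byte count followed by that many code bytes.
struct CodeRecord {
    std::uint32_t size;
    const std::uint8_t* bytes;
};

// Splits a record laid out as "dd size / db bytes...". Reading the dd with
// memcpy assumes a little-endian host, as on x86.
CodeRecord parse_record(const std::uint8_t* p) {
    std::uint32_t size;
    std::memcpy(&size, p, sizeof size);
    return {size, p + sizeof size};
}

// Pastes the read record and the write record into the buffer `unroll` times
// in a row; unroll == 1 gives the plain one-pixel-per-iteration body.
void emit_pixel_body(std::vector<std::uint8_t>& buf, const CodeRecord& rc,
                     const CodeRecord& wc, int unroll) {
    assert(unroll >= 1);
    for (int i = 0; i < unroll; ++i) {
        buf.insert(buf.end(), rc.bytes, rc.bytes + rc.size);
        buf.insert(buf.end(), wc.bytes, wc.bytes + wc.size);
    }
}
```

As stand-ins for real RC and WC routines, the records {2,0,0,0, 66h,0ADh} (lodsw) and {1,0,0,0, 0ABh} (stosd) unrolled four times produce 12 bytes. With a fixed 128-byte execute_buffer, the unroll factor is bounded by the combined size of the two records.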
For this code to work in C++, we must use the mangled (decorated) C++ name for the member function BITMAP::draw, here as generated by Microsoft Visual C++. After that comes some initialization code:
?draw@BITMAP@@QAEXHHAAU1@HHHH@Z:
    push ebp
    lea  ebp,[esp+8]    ; get arguments address
    push ebx
The first thing we must decide is whether we need to do a conversion at all.
Therefore, we test to see if the two pixel formats of the bitmap objects are the same. If so, we can further ask whether they are the same size. If that is the case, we can just do a fast string copy from one to the other. If not, but they're the same width, then we can still do the string copy, copying only as many lines as the shorter image has. If the two have different widths, then we can do string copies line by line.
    ; same w, different h
    ; find smallest h -> ebx
    add ebp,12
    ; calc offsets with intentional reg swap
    push
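The same-format fast path described above can be sketched in C++, with memcpy standing in for the string copy instructions. Image and copy_same_format are hypothetical names, and the caller is assumed to have already checked that both pixel formats match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical minimal image view.
struct Image {
    std::uint8_t* bits;
    int width, height;  // pixels
    int pitch;          // bytes from one scan line to the next
    int bpp;            // bytes per pixel
};

// Copies between two images that share a pixel format.
void copy_same_format(const Image& src, Image& dst) {
    assert(src.bpp == dst.bpp);
    const int rows = std::min(src.height, dst.height);  // "find smallest h"
    const int cols = std::min(src.width, dst.width);
    const std::size_t row_bytes = static_cast<std::size_t>(cols) * src.bpp;

    // Rows packed back to back in both images: one big copy does it all.
    if (src.pitch == dst.pitch && row_bytes == static_cast<std::size_t>(src.pitch)) {
        std::memcpy(dst.bits, src.bits, row_bytes * rows);
        return;
    }
    // Different widths (or padded rows): copy line by line.
    for (int y = 0; y < rows; ++y)
        std::memcpy(dst.bits + static_cast<std::size_t>(y) * dst.pitch,
                    src.bits + static_cast<std::size_t>(y) * src.pitch, row_bytes);
}
```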
If the two bitmaps have completely different pixel formats, we have no choice but to convert every single pixel from one format to the other. The following code shows this in action. We can improve this routine further by unrolling the loop; this would be as simple as repeating the build step four or more times.
dislike:
    lea
    stosd
pq:
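For reference, here is what one read routine and one write routine compute, sketched in C++ with function pointers chaining them where the blitter pastes machine code. read_565, write_bgr24, and convert_line are hypothetical names; the 32-bit intermediate is assumed to hold alpha, red, green, blue from the top byte down, and 5- and 6-bit channels are widened by replicating their top bits so that full intensity maps to 0xFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read step: little-endian 16-bit 565 pixel -> 32-bit 0xAARRGGBB.
std::uint32_t read_565(const std::uint8_t* p) {
    const std::uint32_t v = static_cast<std::uint32_t>(p[0] | (p[1] << 8));
    const std::uint32_t r5 = (v >> 11) & 0x1F, g6 = (v >> 5) & 0x3F, b5 = v & 0x1F;
    const std::uint32_t r = (r5 << 3) | (r5 >> 2);  // 5 -> 8 bits
    const std::uint32_t g = (g6 << 2) | (g6 >> 4);  // 6 -> 8 bits
    const std::uint32_t b = (b5 << 3) | (b5 >> 2);  // 5 -> 8 bits
    return 0xFF000000u | (r << 16) | (g << 8) | b;
}

// Write step: 32-bit pixel -> 24-bit BGR (blue byte first in memory).
void write_bgr24(std::uint8_t* p, std::uint32_t argb) {
    p[0] = static_cast<std::uint8_t>(argb & 0xFF);
    p[1] = static_cast<std::uint8_t>((argb >> 8) & 0xFF);
    p[2] = static_cast<std::uint8_t>((argb >> 16) & 0xFF);
}

using ReadFn = std::uint32_t (*)(const std::uint8_t*);
using WriteFn = void (*)(std::uint8_t*, std::uint32_t);

// Converts `count` pixels of one scan line by running read then write per pixel.
void convert_line(const std::uint8_t* src, int src_bpp, ReadFn rd,
                  std::uint8_t* dst, int dst_bpp, WriteFn wr, int count) {
    for (int x = 0; x < count; ++x)
        wr(dst + static_cast<std::size_t>(x) * dst_bpp,
           rd(src + static_cast<std::size_t>(x) * src_bpp));
}
```

The function-pointer version pays an indirect call per pixel; pasting the routines' bytes into the execute buffer removes exactly that overhead.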
Another important step in this blitter is to correctly calculate the x and y offsets into the source and destination images. This routine does exactly that.
calc_esdi:
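The arithmetic such an offset routine performs can be sketched in C++: a pixel's byte offset is its row times the pitch plus its column times the bytes per pixel, computed once for the source (ESI) and once for the destination (EDI). pixel_offset, BlitStart, and calc_start are hypothetical names, and pitches are assumed positive (top-down images).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte offset of pixel (x, y) in an image whose rows are `pitch` bytes apart.
inline std::size_t pixel_offset(int x, int y, int pitch, int bytes_per_pixel) {
    assert(x >= 0 && y >= 0 && pitch > 0 && bytes_per_pixel > 0);
    return static_cast<std::size_t>(y) * static_cast<std::size_t>(pitch) +
           static_cast<std::size_t>(x) * static_cast<std::size_t>(bytes_per_pixel);
}

// Start pointers for a blit, named after the registers they would load.
struct BlitStart {
    const std::uint8_t* esi;  // first source pixel
    std::uint8_t* edi;        // first destination pixel
};

BlitStart calc_start(const std::uint8_t* src, int sx, int sy, int spitch, int sbpp,
                     std::uint8_t* dst, int dx, int dy, int dpitch, int dbpp) {
    return {src + pixel_offset(sx, sy, spitch, sbpp),
            dst + pixel_offset(dx, dy, dpitch, dbpp)};
}
```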
For this whole RAM-code idea to work, we need some initialization that gets placed at the top of the RAM-code buffer. It simply loads the ECX register with the number of scan lines to copy.
exec_head dd 5                          ; (size)
          db 0B9h,000h,000h,000h,000h   ; mov ecx,0
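The header is itself RAM-code: the zero immediate of mov ecx,0 is a placeholder that gets overwritten with the real scan-line count. A C++ sketch of emitting those five bytes (0B9h is the 32-bit x86 encoding of mov ecx,imm32, with the immediate stored little-endian) and patching them in place follows; emit_exec_head and patch_line_count are hypothetical names, and the bytes are only built here, not run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends "mov ecx,0" and returns the buffer index of its 4-byte immediate.
std::size_t emit_exec_head(std::vector<std::uint8_t>& buf) {
    const std::size_t imm_at = buf.size() + 1;  // operand follows the opcode byte
    const std::uint8_t head[] = {0xB9, 0x00, 0x00, 0x00, 0x00};
    buf.insert(buf.end(), head, head + sizeof head);
    return imm_at;
}

// Overwrites the immediate with the number of scan lines, low byte first.
void patch_line_count(std::vector<std::uint8_t>& buf, std::size_t imm_at,
                      std::uint32_t lines) {
    assert(imm_at + 4 <= buf.size());
    for (int i = 0; i < 4; ++i)
        buf[imm_at + i] = static_cast<std::uint8_t>(lines >> (8 * i));
}
```

Patching 480 lines, for example, turns the header into B9 E0 01 00 00.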
The next few routines are the actual read and write routines (RC and WC). The leading dd value tells us how many bytes make up the code in each subroutine.
RC_BGR_1x8 dd 18
WC_BGR_4x8
RC_BGR_3x8
that tells us which routine to use for every pixel format in