
The idea of a computer program is that it is a sequence of instructions: in this book we are looking at machine instructions that the CPU directly understands. Assembly language is just a symbolic (more meaningful) way of writing the machine instructions.

The CPU executes the instructions sequentially (that is, one after the other, in order of increasing addresses) but can also jump out of sequence.

The topic of this section is those instructions that cause execution to jump to some other place in the program. The main ones are CALL, JMP, Jx, LOOP, and INT.

In this section we will examine CALL, JMP, and Jx. LOOP and INT are examined a little bit later.

Figure 2.2: Stack handling for CALL and RET.

[Figure: the code segment holding the CALL instruction and the subroutine, the CPU registers, and the stack segment.]

Involvement of the stack for CALL and RET: these two must always occur in pairs. In the case of a NEAR CALL, only the CPU's offset (IP) is altered; a FAR CALL will also alter CS. Note that the CALL instruction has the address of the subroutine as its operand. The CALL pushes the return address onto the stack, and loads its operand into IP. When IP has the new value, the subroutine is executed, and the RET instruction causes a return to the caller, by popping the return address off the stack, back into IP.

Figure 2.2 illustrates how the CALL and its companion RET use the stack. The basic idea is that the value in the Instruction Pointer, IP, is always the address of the next instruction to be executed, so when “CALL ROUTINEX” is executing, IP will have in it the address of the instruction that follows the CALL. Since the value in IP has to be changed to the address of the subroutine, the return value has to be saved somewhere: hence the stack is used to save it.

The RET instruction must always be placed at the end of a procedure, as it pops the top off the stack, back into IP.
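The push-and-pop behaviour just described can be sketched as a small Python model. This is only a toy, not an emulation of the CPU: the class name `ToyCPU`, its method names, and the offsets 0x0103 and 0x0200 are all assumptions chosen for illustration.

```python
# Toy model of a NEAR CALL and its RET. Names and offsets are hypothetical.
class ToyCPU:
    def __init__(self):
        self.ip = 0       # Instruction Pointer: offset of the next instruction
        self.stack = []   # a Python list standing in for the stack segment

    def call_near(self, target):
        # Assumes IP already points at the instruction following the CALL.
        self.stack.append(self.ip)   # save the return offset on the stack
        self.ip = target             # load the CALL operand into IP

    def ret_near(self):
        self.ip = self.stack.pop()   # pop the saved offset back into IP


cpu = ToyCPU()
cpu.ip = 0x0103                  # hypothetical offset just after the CALL
cpu.call_near(0x0200)            # hypothetical offset of ROUTINEX
ip_inside_routine = cpu.ip
cpu.ret_near()
ip_after_return = cpu.ip
print(hex(ip_inside_routine), hex(ip_after_return))
```

If the RET were left out, the saved offset would stay on the stack and execution would never get back to the caller, which is why the two instructions must be paired.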

If you have programmed in C or Pascal, you know that you don’t put a RET, or anything special, at the end of a procedure or function. CALL and RET do go into the code, though, because the compiler translates the high-level source code to machine instructions.

FAR and NEAR

This topic does need some careful thought. Any CALL, RET, or JMP instruction can be a FAR or NEAR jump. What this means is that if the jump is NEAR, the jump is only within the current code segment; that is, only the IP is altered, as per Figure 2.2.

A FAR jump or call, however, can be to anywhere in the entire address range, as both CS and IP are altered. In Figure 2.2, the procedure ROUTINEX is shown as being in the same code segment as the CALL instruction, but it could be somewhere entirely different. Obviously, if ROUTINEX is in a different code segment, then both CS and IP in the CPU would have to be changed to the new values.

Note that it also logically follows that the original values of CS:IP, immediately after the CALL, would both have to be saved on the stack, and RET would have to restore both of them at the end of the procedure.
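The same idea for a FAR CALL can be sketched in Python, this time saving two values. It is again a toy model under stated assumptions: the function names and the CS:IP values are hypothetical. The push order shown (CS first, then IP) is the order the 8086 uses.

```python
# Toy model of a FAR CALL and its matching far return.
def call_far(cs, ip, stack, new_cs, new_ip):
    """cs:ip must already point at the instruction after the CALL."""
    stack.append(cs)         # save the caller's segment first ...
    stack.append(ip)         # ... then the caller's offset
    return new_cs, new_ip    # both registers take the destination values


def ret_far(stack):
    ip = stack.pop()         # restore in the reverse order of the pushes
    cs = stack.pop()
    return cs, ip


stack = []
cs, ip = call_far(0x1000, 0x0105, stack, 0x2000, 0x0010)  # hypothetical values
in_routine = (cs, ip)
saved_words = len(stack)     # two words saved, against one for a NEAR CALL
cs, ip = ret_far(stack)
back_in_caller = (cs, ip)
print(in_routine, saved_words, back_in_caller)
```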

Note that with what is called 32-bit programming, the distinction between NEAR and FAR just about disappears.

Code labels

One thing that you will notice in Figure 2.2 is that I used a code label, ROUTINEX, to name the start of the procedure. This is basically what you expect to be able to do in any high-level language, and you can also do this in assembly language. A code label marks, or identifies, that point in the code, hence a CALL was able to be made to that place.

With a professional assembler, such as the Borland TASM, or Microsoft MASM, these labels are a normal part of writing a program, but DEBUG is a different story.

DEBUG CANNOT HAVE LABELS!

With DEBUG any instruction that transfers control to another address must contain the actual offset.

What is DEBUG? It is a program that comes with DOS, and from the DOS prompt you only have to type the name of the program to execute it. DEBUG.EXE is a way of becoming familiar with the instruction set: it allows you to try out the instructions and put together simple programs.

These examples show that DEBUG must have an actual address, not a label:

        MOV ...            ;this instruction is at 113 (say)
        ...
        LOOP 113

PLACE1: MOV ...            ;arbitrary instruction, marked by a label
        ...
        LOOP PLACE1

However, by writing the code in “proper” assembly language, we do not need to know actual addresses. The second example here shows how a proper assembler can have a symbolic address marker, in this case PLACE1.

SHORT, NEAR, and FAR

In Figure 2.2, we looked at a CALL instruction, but there is also a JMP (jump) instruction that transfers execution to the address specified in its operand in the same manner as the CALL instruction, but with a major difference: no return address is saved on the stack. This is because JMP is used when you do not want execution to come back.

It was also explained above that the CALL can be NEAR or FAR, but the JMP can be SHORT, NEAR, or FAR.

The example code below shows a JMP to a label. Usually, an assembler defaults to a NEAR jump, as the destination is usually in the same segment.

        jmp PLACE1
        ...
PLACE1:                    ;a label.
        mov ...            ;an instruction.

At this point, it is instructive to consider how the assembler will assemble this instruction into memory. Obviously, it has to be converted to “machine language”, or binary bits. That is what any compiler or assembler does.

Figure 2.3: Generation of machine code, NEAR jump.

[Figure: the op-code followed by the operand bytes, with addresses increasing downward.]

In Figure 2.3 you can see the basic scenario. The first one (or sometimes two) memory location(s) contain the instruction-code, or operation-code, often referred to as the op-code, that identifies this as a JMP instruction (or whatever), while the following zero or more bytes are the operand.

In the case of the NEAR jump instruction, the operand contains a 16-bit offset, which is the place to jump to. But, and this is most important, the addressing structure of all the Intel x86 processors uses byte addressing, meaning that each address addresses a one-byte (8-bit) memory location.

Therefore, the operand requires two memory locations, as shown in Figure 2.3 as operand-low and operand-high. The Intel x86 convention is that the low-half of the value is stored at the lower address.
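The low-byte-first convention can be checked with a few lines of Python. This is a minimal sketch: the helper name `split_word` and the value 0x0113 are assumptions for illustration, and the op-code byte is left out altogether.

```python
def split_word(value):
    """Return (operand_low, operand_high) for a 16-bit value."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    operand_low = value & 0xFF            # stored at the lower address
    operand_high = (value >> 8) & 0xFF    # stored at the next address up
    return operand_low, operand_high


low, high = split_word(0x0113)            # hypothetical operand value
memory_image = bytes([low, high])         # the two bytes in address order
print(memory_image.hex())                 # prints "1301"
```

Reading the two bytes in address order therefore gives the value "back to front", which is worth remembering when looking at a memory dump.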

It is also useful to note that if the JMP is a FAR jump, that is, to another code segment, the operand of the instruction will have to contain the destination CS:IP, which is two values. Hence it would be 32 bits.

The FAR jump would assemble as the one-byte (or two) op-code, followed by a one-word IP then one-word CS value. Note that the FAR jump can also jump within the current code segment but is slightly inefficient because it is a longer instruction, taking a little longer to execute and using more memory.
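As a rough illustration of that layout, the four operand bytes of a FAR jump can be packed with Python's standard `struct` module. The function name `far_operand` and the destination 2000:0010 are hypothetical, and the op-code byte is again left out.

```python
import struct


def far_operand(cs, ip):
    """Pack the destination as the IP word, then the CS word, low byte first."""
    return struct.pack("<HH", ip, cs)     # "<" = little-endian, "H" = 16-bit word


operand = far_operand(cs=0x2000, ip=0x0010)   # hypothetical destination CS:IP
operand_bits = len(operand) * 8
print(operand_bits, operand.hex())            # prints "32 10000020"
```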

The JMP instruction has one interesting difference from the CALL: it is able to perform a SHORT jump. This is shown in Figure 2.4:

Figure 2.4: SHORT jump machine code.

[Figure: the operation-code followed by a one-byte operand, with addresses increasing downward.]

This reduces the instruction down to the one-byte op-code followed by a one-byte displacement. This displacement allows jumps only within +127 to -128 bytes of the current IP position.

In some circumstances, the assembler will automatically make the jump SHORT, but it can also be forced to, by means of the SHORT directive.
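The decision of whether a jump can be SHORT can be sketched as a range check in Python. This is a simplified sketch, not how any particular assembler is written: the function name and the offsets are hypothetical, and it assumes the SHORT jump occupies two bytes, so that IP already points two bytes past the start of the jump when the displacement is applied.

```python
def short_displacement(jmp_offset, target_offset):
    """Return the one-byte displacement, or None if the target is out of range."""
    ip_after_jump = jmp_offset + 2             # IP has moved past the 2-byte jump
    displacement = target_offset - ip_after_jump
    if -128 <= displacement <= 127:            # must fit in one signed byte
        return displacement
    return None


forward = short_displacement(0x0100, 0x0110)   # a little way ahead
to_itself = short_displacement(0x0100, 0x0100) # back to the jump itself
too_far = short_displacement(0x0100, 0x0200)   # out of SHORT range
print(forward, to_itself, too_far)             # prints "14 -2 None"
```

When the check fails, as in the last case, the longer NEAR form with its two-byte operand has to be used instead.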