Note that instructions vary in size!

Another accessible register is called rflags. It stores flags that reflect the current program state, for example, the outcome of the last arithmetic instruction: whether the result was negative, whether an overflow happened, and so on. Its smaller parts are called eflags (32-bit) and flags (16-bit).
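As a rough illustration of how flags reflect the result of the last arithmetic instruction, the sketch below models an 8-bit addition in Python and derives four flags from it. The helper name add8 and the dictionary layout are assumptions made for this example only. The flag names follow the Intel documentation, and a real processor computes these bits in hardware.

```python
def add8(a, b):
    """Model an 8-bit add and return (result, flags).

    Hypothetical helper for illustration; a and b are reduced modulo 256.
    """
    a &= 0xFF
    b &= 0xFF
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": full > 0xFF,            # carry out of bit 7 (unsigned overflow)
        "ZF": result == 0,            # the result is zero
        "SF": bool(result & 0x80),    # the sign bit of the result is set
        # signed overflow: both operands share a sign that the result lacks
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags
```

For example, add8(0x7F, 1) yields 0x80 with SF and OF set but CF clear, while add8(0xFF, 1) yields 0 with CF and ZF set but OF clear.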

■ Question 1 It is time to do preliminary research based on the documentation [15]. Refer to section 3.4.3 of the first volume to learn about register rflags. What is the meaning of flags …, …? What is the difference between … and …?

In addition to these core registers, there are also registers used by instructions working with floating point numbers or by special parallelized instructions able to perform similar actions on multiple pairs of operands at the same time. These instructions are often used for multimedia purposes (they help speed up multimedia decoding algorithms). The corresponding registers are 128 bits wide and named xmm0 through xmm15.

We will talk about them later.
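To make "similar actions on multiple pairs of operands at the same time" more concrete, here is a small Python model of a packed addition over four 32-bit lanes held in one 128-bit value. The names pack and packed_add32 and the choice of four 32-bit lanes are assumptions for illustration. Real instructions operating on xmm registers support several lane widths and do the work in hardware rather than in a loop.

```python
LANES = 4                          # assumed layout: four lanes per 128-bit value
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1

def pack(lanes):
    """Pack four 32-bit values (lane 0 first) into one 128-bit integer."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & LANE_MASK) << (i * LANE_BITS)
    return value

def packed_add32(x, y):
    """Add two 128-bit values lane by lane; each lane wraps independently."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_x = (x >> shift) & LANE_MASK
        lane_y = (y >> shift) & LANE_MASK
        result |= ((lane_x + lane_y) & LANE_MASK) << shift
    return result
```

Note that a carry out of one lane never spills into its neighbor: adding 1 to a lane holding 0xFFFFFFFF gives 0 in that lane and leaves the others untouched.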

Some registers first appeared as non-standard extensions but became standardized shortly after. These are the so-called model-specific registers. See section 6.3.1 “Model specific registers” for more details.

1.3.3 System Registers

Some registers are designed specifically to be used by the OS. They do not hold values used in computations.

Instead, they store information required by system-wide data structures. Thus their role is to support a framework born from a symbiosis of the OS and the CPU. All applications run inside this framework.

The framework ensures that applications are well isolated from the system itself and from one another; it also manages resources in a way more or less transparent for a programmer.

It is extremely important that these registers are inaccessible to applications themselves (at the very least, applications should not be able to modify them). This is the goal of privileged mode (see section 3.2).

We will list some of these registers here. Their meaning will be explained in detail later.

• cr0, cr4 store flags related to different processor modes and virtual memory.

• cr2, cr3 are used to support virtual memory (see sections 4.2 “Motivation”, 4.7.1 “Virtual address structure”).

Figure 1-4. rax decomposition

• cr8 (aliased as tpr) is used to fine-tune the interrupt mechanism (see section 6.2 “Interrupts”).

• efer is another flag register used to control processor modes and extensions (e.g., long mode and system call handling).

• idtr stores the address of the interrupt descriptor table (see section 6.2 “Interrupts”).

• gdtr and ldtr store the addresses of the descriptor tables (see section 3.2 “Protected mode”).

• cs, ds, ss, es, gs, fs are so-called segment registers. The segmentation mechanism they provide has been considered legacy for many years now, but a part of it is still used to implement privileged mode. See section 3.2 “Protected mode”.

Figure 1-5. rsi and rdi decomposition

Figure 1-6. rsp and rbp decomposition

1.4 Protection Rings

Protection rings are one of the mechanisms designed to limit applications’ capabilities for security and robustness reasons. They were invented for the Multics OS, a direct predecessor of Unix. Each ring corresponds to a certain privilege level. Each instruction type is linked with one or more privilege levels and cannot be executed at the others. The current privilege level is stored somewhere (e.g., inside a special register).

Intel 64 has four privilege levels, of which only two are used in practice: ring-0 (the most privileged) and ring-3 (the least privileged). The middle rings were planned to be used for drivers and OS services, but popular OSs did not adopt this approach.

In long mode, the current protection ring number is stored in the lowest two bits of register cs (and duplicated in those of ss). It can only be changed when handling an interrupt or a system call. So an application cannot execute arbitrary code with elevated privileges: it can only call an interrupt handler or perform a system call. See Chapter 3 “Legacy” for more information.
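Reading the ring number out of cs amounts to masking off everything but the two lowest bits. A minimal Python sketch follows. The function name is hypothetical, and the two sample selector values are assumptions chosen for illustration (they are values commonly reported for user and kernel code on 64-bit Linux, not something this text guarantees).

```python
def privilege_level(selector):
    """Return the ring number held in the lowest two bits of a selector."""
    return selector & 0b11

# Illustrative selector values only (assumed, see the note above).
USER_CS_EXAMPLE = 0x33
KERNEL_CS_EXAMPLE = 0x10

user_ring = privilege_level(USER_CS_EXAMPLE)      # lowest two bits are 11
kernel_ring = privilege_level(KERNEL_CS_EXAMPLE)  # lowest two bits are 00
```

With these sample values, user_ring is 3 and kernel_ring is 0, matching the two rings used in practice.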

1.5 Hardware Stack

In general, a stack is a data structure: a container with two operations. A new element can be placed on top of the stack (push), and the top element can be taken away from the stack (pop).

There is hardware support for this data structure. It does not mean there is also a separate stack memory. It is just a sort of emulation implemented with two machine instructions (push and pop) and a register (rsp). The rsp register holds the address of the topmost element of the stack. The instructions perform as follows:

• push argument

1. Depending on the argument size (2, 4, and 8 bytes are allowed), the rsp value is decreased by 2, 4, or 8.

2. The argument is stored in memory starting at the address taken from the modified rsp.

• pop argument

1. The topmost stack element is copied into the register or memory location given as the argument.

2. rsp is increased by the size of its argument.

An augmented architecture is represented in Figure 1-7.
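The push and pop steps above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the class name StackModel, the 64-byte memory, and the bounds checks are inventions of this example, and values are stored least significant byte first, as Intel 64 does.

```python
class StackModel:
    """Toy model of rsp plus a flat, byte-addressed memory (hypothetical)."""

    def __init__(self, memory_size=64):
        self.memory = bytearray(memory_size)
        self.rsp = memory_size              # the stack starts at the high end

    def push(self, value, size=8):
        if size not in (2, 4, 8):
            raise ValueError("push operand must be 2, 4, or 8 bytes")
        if self.rsp - size < 0:
            raise IndexError("model memory exhausted")
        self.rsp -= size                    # step 1: decrease rsp
        data = (value % (1 << (8 * size))).to_bytes(size, "little")
        self.memory[self.rsp:self.rsp + size] = data   # step 2: store at new rsp

    def pop(self, size=8):
        if self.rsp + size > len(self.memory):
            raise IndexError("read past the end of model memory")
        data = bytes(self.memory[self.rsp:self.rsp + size])  # step 1: copy top
        self.rsp += size                    # step 2: increase rsp
        return int.from_bytes(data, "little")
```

Note that pop in this model only moves rsp. The popped bytes stay in memory until a later push overwrites them.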

The hardware stack is most useful to implement function calls in higher-level languages. When a function A calls another function B, it uses the stack to save the context of computations to return to it after B terminates.

Here are some important facts about the hardware stack, most of which follow from its description:

1. There is no such situation as an empty stack, even if we performed push zero times. A pop algorithm can be executed anyway, probably returning a garbage “topmost” stack element.

2. The stack grows toward address zero.

3. Almost all kinds of push operands are treated as signed integers and thus can be sign-extended. For example, performing push with an argument B9₁₆ will result in one of the following data units being stored on the stack: 0xffb9, 0xffffffb9, or 0xffffffffffffffb9. By default, push uses an 8-byte operand size. Thus an instruction push -1 will store 0xffffffffffffffff on the stack.

4. Most architectures that support a stack use the same principle, with its top defined by some register. What differs, however, is the meaning of the respective address. On some architectures it is the address of the next element, which will be written on the next push. On others it is the address of the last element already pushed onto the stack.
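The sign extension described in item 3 of the list above can be reproduced with a short Python function. The name sign_extend and its parameters are assumptions for this sketch; the processor performs the equivalent widening as part of push.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a from_bits-wide value to to_bits by copying its sign bit upward."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value & (1 << (from_bits - 1)):          # sign bit set
        # fill every bit above the original width with ones
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value
```

Applied to 0xB9 with from_bits=8, this gives 0xffb9, 0xffffffb9, and 0xffffffffffffffb9 for target widths of 16, 32, and 64 bits. A value with a clear sign bit, such as 0x39, is left unchanged.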

Figure 1-7. Intel 64, registers and stack

■ Working with Intel docs: How to read instruction descriptions

Open the second volume of [15]. Find the page corresponding to the push instruction. It begins with a table. For our purposes we will only investigate the columns Opcode, Instruction, 64-Bit Mode, and Description. The Opcode field defines the machine encoding of an instruction (operation code). As you see, there are several options, and each option corresponds to a different Description. It means that sometimes not only the operands vary but also the operation codes themselves.

Instruction describes the instruction mnemonics and allowed operand types. Here r stands for any general purpose register, m stands for a memory location, and imm stands for an immediate value (e.g., an integer constant like 42 or 1337). A number defines the operand size. If only specific registers are allowed, they are named. For example:

• push r/m16: push a general purpose 16-bit register or a 16-bit number taken from memory onto the stack.

• push CS: push a segment register.

The Description column gives a brief explanation of the instruction’s effects. It is often enough to understand and use the instruction.

• Read the further explanation of push. When is the operand not sign extended?

• Explain all effects of the instruction push rsp on memory and registers.
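As a companion to the Opcode column described in this box, the sketch below shows how different operand kinds of push map to different operation codes. The function name encode_push is hypothetical and covers only a 64-bit register or an integer immediate. The byte values used (50 plus the register number, 6A for an 8-bit immediate, 68 for a 32-bit immediate, and the 41 prefix for r8 through r15) reflect our reading of the push table in [15] and should be checked against the manual.

```python
# Register order matches the hardware register numbers 0..15.
REGISTERS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def encode_push(operand):
    """Encode push of a 64-bit register name or an integer immediate."""
    if isinstance(operand, str):
        index = REGISTERS.index(operand)
        prefix = b"\x41" if index >= 8 else b""     # prefix selecting r8..r15
        return prefix + bytes([0x50 + (index & 7)])
    if -0x80 <= operand <= 0x7F:                    # fits in one signed byte
        return b"\x6a" + operand.to_bytes(1, "little", signed=True)
    if -0x80000000 <= operand <= 0x7FFFFFFF:        # fits in four signed bytes
        return b"\x68" + operand.to_bytes(4, "little", signed=True)
    raise ValueError("immediate does not fit in 32 bits")
```

Under these assumptions, encode_push("rbx") is the single byte 53, encode_push("r15") is 41 57, and encode_push(0x1337) is 68 37 13 00 00.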

1.6 Summary

In this chapter we provided a quick overview of the von Neumann architecture. We have started adding features to this model to make it more adequate for describing modern processors. So far we have taken a closer look at registers and the hardware stack. The next step is to start programming in assembly, and that is what the next chapter is dedicated to. We are going to look at some sample programs, pinpoint several new architectural features (such as endianness and addressing modes), and design a simple input/output library for *nix to ease interaction with a user.
