
How Words Are Implemented

In document Low Level Programming (Page 133-139)

■ Question 115 Draw a finite state machine to check whether there is an even or an odd number of words in the input string.

7.2 Forth Machine

7.2.4 How Words Are Implemented

There are three ways to implement words.

• Indirect threaded code

• Direct threaded code

• Subroutine threaded code

We use the classic indirect threaded code approach. It needs two special cells (which we can call Forth registers):

PC points at the next Forth command. We will see soon that the Forth command is an address of an address of the respective word’s assembly implementation code.

In other words, this is a pointer to an executable assembly code with two levels of indirection.

W is used in non-native words. When the word starts its execution, this register points at its first word.

These two registers can be mapped onto real machine registers. Alternatively, their contents can be stored in memory.

Figure 7-10 shows how words are structured when using the indirect threaded code technique. It incorporates two words: a native word dup and a colon word square.

Each word stores the address of its native implementation (assembly code) immediately after the header. For colon words the implementation is always the same: docol. The implementation is called using the jmp instruction.

An execution token is the address of this cell, which points to the implementation. So an execution token is the address of an address of the word implementation. In other words, given the address A of a word entry in the dictionary, you can obtain its execution token by adding the total header size to A.

Listing 7-3 provides us with a sample dictionary. It contains two native words (starting at w_plus and w_dup) and a colon word (w_double).

Listing 7-3. forth_dict_sample.asm

section .data
w_plus:
    dq 0            ; The first word's pointer to the previous word is zero
    db '+',0
    db 0            ; No flags
xt_plus:            ; Execution token for `plus`, equal to
                    ; the address of its implementation
    dq plus_impl
w_dup:
    dq w_plus
    db 'dup', 0
    db 0
xt_dup:
    dq dup_impl
w_double:
    dq w_dup
    db 'double', 0
    db 0
    dq docol        ; The `docol` address -- one level of indirection
    dq xt_dup       ; The words that make up `double` start here.
    dq xt_plus

Figure 7-10. Indirect threaded code
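The relationship between a dictionary entry and its execution token can be checked with a small model. The following Python sketch is not part of the book's NASM sources. It assumes the header layout of Listing 7-3 (an 8-byte link, a NUL-terminated name, one flag byte); the helper names make_entry, header_size, and execution_token, and the addresses 0x1000 and 0x2000, are hypothetical.

```python
import struct

CELL = 8  # bytes per cell on a 64-bit machine


def make_entry(prev_addr, name, flags, impl_addr):
    """Lay out one entry: link cell, NUL-terminated name, flag byte, implementation cell."""
    header = struct.pack("<Q", prev_addr) + name.encode("ascii") + b"\x00" + bytes([flags])
    return header + struct.pack("<Q", impl_addr)


def header_size(memory, entry_addr):
    """Header = link cell + name with its terminating NUL + one flag byte."""
    nul_pos = memory.index(b"\x00", entry_addr + CELL)  # first NUL after the link cell
    return (nul_pos + 2) - entry_addr  # +1 for the NUL itself, +1 for the flag byte


def execution_token(memory, entry_addr):
    """Address of the cell that holds the implementation address."""
    return entry_addr + header_size(memory, entry_addr)


# Hypothetical addresses of the assembly implementations (assumption).
PLUS_IMPL, DUP_IMPL = 0x1000, 0x2000

memory = bytes(16)                       # padding so that no entry lives at address 0
w_plus = len(memory)
memory += make_entry(0, "+", 0, PLUS_IMPL)
w_dup = len(memory)
memory += make_entry(w_plus, "dup", 0, DUP_IMPL)

xt_dup = execution_token(memory, w_dup)
(impl,) = struct.unpack_from("<Q", memory, xt_dup)  # one level of indirection
print(hex(xt_dup), hex(impl))
```

Because names have different lengths, the header size differs from word to word; this is why the token is computed from the entry rather than from a fixed offset.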

The core of the Forth engine is the inner interpreter. It is a simple assembly routine fetching code from memory. It is shown in Listing 7-4.

Listing 7-4. forth_next.asm

The routine does three things:

1. It reads memory starting at PC and advances PC to the next instruction. Remember that PC points to a memory cell that stores the execution token of a word.

2. It sets W to the execution token value. In other words, after next is executed, W stores the address of a pointer to the assembly implementation of the word.

3. Finally, it jumps to the implementation code.
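The three steps can be modeled outside of assembly. The sketch below is a Python simulation under stated assumptions, not the book's forth_next.asm: memory is a list in which an index plays the role of an address, a Python function stands in for machine code, and a hypothetical bye word stops the loop. Since one list index is one cell, PC advances by 1 instead of 8.

```python
class Machine:
    def __init__(self, memory):
        self.mem = memory      # list of cells; an index plays the role of an address
        self.pc = 0            # points at the cell holding the next execution token
        self.w = 0             # execution token of the word being executed
        self.stack = []        # data stack
        self.running = True

    def next(self):
        self.w = self.mem[self.pc]   # steps 1 and 2: fetch the execution token into W
        self.pc += 1                 # step 1: advance PC by one cell
        impl = self.mem[self.w]      # second level of indirection
        impl(self)                   # step 3: "jump" to the implementation

    def run(self, start_pc):
        # Native words return here instead of executing `jmp next`.
        self.pc = start_pc
        while self.running:
            self.next()


def dup_impl(m):
    m.stack.append(m.stack[-1])


def plus_impl(m):
    b = m.stack.pop()
    a = m.stack.pop()
    m.stack.append(a + b)


def bye_impl(m):       # hypothetical word that stops the simulation
    m.running = False


XT_DUP, XT_PLUS, XT_BYE = 0, 1, 2
memory = [
    dup_impl, plus_impl, bye_impl,   # cells 0..2: implementation "addresses"
    XT_DUP, XT_PLUS, XT_BYE,         # cells 3..5: a program made of execution tokens
]

machine = Machine(memory)
machine.stack.append(21)
machine.run(start_pc=3)
print(machine.stack)
```

The program cells hold execution tokens, and each token is the index of a cell that holds the implementation, which gives the two levels of indirection.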

Every native word implementation ends with the instruction jmp next. It ensures that the next instruction will be fetched.

To implement colon words we need to use a return stack in order to save and restore PC before and after a call.

While W is not useful when executing native words, it is quite important for colon words. Let us take a look at docol, the implementation of all colon words, shown in Listing 7-5. It also features exit, another word designed to end all colon words.

Listing 7-5. forth_docol.asm

exit:
    mov pc, [rstack]
    add rstack, 8
    jmp next

docol saves PC on the return stack and sets the new PC to the first execution token stored inside the current word. The return is performed by exit, which restores PC from the stack.

This mechanism is akin to the call/ret pair of instructions.
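To see the save and restore of PC in action, here is a minimal Python simulation of docol and exit. It is a sketch under assumptions rather than the book's code: memory is a list of cells indexed by address, Python functions stand in for assembly implementations, and the bye word and the cell layout are hypothetical.

```python
class Machine:
    def __init__(self, memory):
        self.mem, self.pc, self.w = memory, 0, 0
        self.stack, self.rstack = [], []   # data stack and return stack
        self.running = True


def next_(m):
    m.w = m.mem[m.pc]      # W = execution token of the next word
    m.pc += 1
    m.mem[m.w](m)          # jump through the token to the implementation


def docol(m):
    m.rstack.append(m.pc)  # save the caller's PC
    m.pc = m.w + 1         # first execution token stored after the docol cell


def exit_(m):
    m.pc = m.rstack.pop()  # restore the caller's PC


def dup(m):
    m.stack.append(m.stack[-1])


def plus(m):
    b = m.stack.pop()
    a = m.stack.pop()
    m.stack.append(a + b)


def bye(m):                # hypothetical word that stops the simulation
    m.running = False


XT_DUP, XT_PLUS, XT_EXIT, XT_BYE, XT_DOUBLE = 0, 1, 2, 3, 4
memory = [
    dup, plus, exit_, bye,         # cells 0..3: native words
    docol,                         # cell 4: the colon word `double` starts here
    XT_DUP, XT_PLUS, XT_EXIT,      # cells 5..7: body of `double`
    XT_DOUBLE, XT_DOUBLE, XT_BYE,  # cells 8..10: main program
]

m = Machine(memory)
m.stack.append(5)
m.pc = 8
while m.running:
    next_(m)
print(m.stack, m.rstack)
```

Note that docol relies on W: without it, the implementation would not know which colon word it was entered for, since every colon word shares the same docol code.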

Question 119 Read [32]. What is the difference between our approach (indirect threaded code) and direct threaded code and subroutine threaded code? What advantages and disadvantages can you name?

To better grasp the concept of indirect threaded code and the innards of Forth, we prepared a minimal example, shown in Listing 7-6. It uses the routines developed in the first assignment from section 2.7.

Take your time to launch it (the source code is shipped with the book) and check that it really reads a word from input and outputs it back.

Listing 7-6. itc.asm

; this one cell is the program
main_stub: dq xt_main

; The dictionary starts here
; The first word is shown in full
; Then we omit flags and links between nodes for brevity
; Each word stores an address of its assembly implementation

; Drops the topmost element from the stack
dq 0 ; There is no previous node
db "drop", 0

; Initializes registers
xt_init: dq i_init
i_init:
    mov rstack, rstack_start
    mov pc, main_stub
    jmp next

; Saves PC when the colon word starts
xt_docol: dq i_docol

; Returns from the colon word
xt_exit: dq i_exit
i_exit:
    mov pc, [rstack]
    add rstack, 8
    jmp next

; Takes a buffer pointer from stack
; Reads a word from input and stores it
; starting in the given buffer
xt_word: dq i_word

; Takes a pointer to a string from the stack
; and prints it

; Loads the predefined buffer address
xt_inbuf: dq i_inbuf
i_inbuf:
    push qword input_buf
    jmp next

; This is a colon word, it stores
; execution tokens. Each token
; corresponds to a Forth word to be
; executed

; The inner interpreter. These three lines
; fetch the next instruction and start its
; execution
next:
    mov w, [pc]
    add pc, 8
    jmp [w]

; The program starts execution from the init word
_start: jmp i_init

7.2.5 Compiler

Forth can work in either interpreter or compiler mode. The interpreter just reads commands and executes them.

When executing the colon : word, Forth switches into compiler mode. Additionally, the colon : reads the next word and uses it to create a new entry in the dictionary with docol as its implementation. Then Forth reads words, locates them in the dictionary, and adds them to the word currently being defined.

So, we have to add another variable, here, which stores the address of the current position to write words to in compile mode. Each write advances here by one cell.

To quit compiler mode we need special immediate words. They are executed no matter which mode we are in. Without them we would never be able to exit compiler mode. The immediate words are marked with an immediate flag.
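The interplay of the two modes, the here variable, and immediate words can be sketched as a toy outer interpreter. This Python model rests on several assumptions that are not taken from the book: memory is a list of cells, the dictionary is a Python dict from names to (execution token, immediate flag) pairs, and the ; word is chosen as the immediate word that compiles exit and leaves compiler mode. Numbers are not handled here.

```python
STOP = -1  # sentinel PC value meaning "return to the outer interpreter"


class Forth:
    def __init__(self):
        self.mem = [0] * 64          # cells; an index plays the role of an address
        self.here = 0                # next free cell to write to
        self.words = {}              # name -> (execution token, immediate flag)
        self.stack, self.rstack = [], []
        self.pc, self.w = STOP, 0
        self.compiling = False
        self.tokens = iter(())
        for name, impl, imm in [("dup", self.dup, False), ("+", self.plus, False),
                                ("exit", self.exit, False), (":", self.colon, False),
                                (";", self.semicolon, True)]:
            self.words[name] = (self.here, imm)
            self.comma(impl)

    def comma(self, value):
        """Write one cell at `here` and advance `here` by one cell."""
        self.mem[self.here] = value
        self.here += 1

    # --- native words -------------------------------------------------
    def dup(self):
        self.stack.append(self.stack[-1])

    def plus(self):
        self.stack.append(self.stack.pop() + self.stack.pop())

    def docol(self):
        self.rstack.append(self.pc)
        self.pc = self.w + 1

    def exit(self):
        self.pc = self.rstack.pop()

    def colon(self):
        name = next(self.tokens)               # read the next word from the input
        self.words[name] = (self.here, False)  # new entry; its token is `here`
        self.comma(self.docol)                 # every colon word starts with docol
        self.compiling = True

    def semicolon(self):                       # immediate: runs even in compiler mode
        self.comma(self.words["exit"][0])
        self.compiling = False

    # --- interpreters -------------------------------------------------
    def execute(self, xt):
        """Run one word to completion using the inner interpreter."""
        self.pc, self.w = STOP, xt
        self.mem[xt]()
        while self.pc != STOP:
            self.w = self.mem[self.pc]
            self.pc += 1
            self.mem[self.w]()

    def interpret(self, text):
        self.tokens = iter(text.split())
        for tok in self.tokens:
            xt, immediate = self.words[tok]
            if self.compiling and not immediate:
                self.comma(xt)                 # compiler mode: append the token
            else:
                self.execute(xt)               # interpreter mode or immediate word


forth = Forth()
forth.interpret(": double dup + ;")
forth.stack.append(21)
forth.interpret("double")
print(forth.stack)
```

If ; were not flagged as immediate, it would simply be appended to the definition of double like any other word, and the machine would stay in compiler mode.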

The interpreter puts numbers on the stack. The compiler cannot embed them in words directly, because they would be treated as execution tokens. Trying to launch a command by the execution token 42 will most certainly result in a segmentation fault. The solution is to use a special word lit followed by the number itself. The purpose of lit is to read the integer that PC points at and advance PC by one cell, so that the embedded operand is never fetched as an execution token.

7.2.5.1 Forth Conditionals

We will make two words stand out in our Forth dialect: branch n and 0branch n. They are only allowed in compilation mode!

They are similar to lit n because the offset is stored immediately after their execution token.
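A possible reading of these two words is sketched below in Python. The exact meaning of the offset is an assumption made for this example only: it is counted in cells, relative to the cell that follows the operand. The assumption that 0branch pops the top of the stack and jumps when it is zero follows common Forth practice but is not stated in the text above; the bye word and the choose helper are hypothetical.

```python
def run(mem, pc, stack):
    """Inner interpreter loop over a list of cells; stops at the `bye` word."""
    state = {"mem": mem, "pc": pc, "stack": stack, "running": True}
    while state["running"]:
        w = mem[state["pc"]]
        state["pc"] += 1
        mem[w](state)
    return stack


def lit(s):
    s["stack"].append(s["mem"][s["pc"]])
    s["pc"] += 1


def branch(s):
    # Assumption: the offset is relative to the cell after the operand.
    s["pc"] += 1 + s["mem"][s["pc"]]


def zero_branch(s):
    if s["stack"].pop() == 0:
        branch(s)              # condition is zero: take the jump
    else:
        s["pc"] += 1           # otherwise just skip the operand


def bye(s):                    # hypothetical word that stops the simulation
    s["running"] = False


XT_LIT, XT_BRANCH, XT_0BRANCH, XT_BYE = 0, 1, 2, 3
mem = [
    lit, branch, zero_branch, bye,  # cells 0..3: native words
    XT_0BRANCH, 4,                  # cells 4..5: if top is zero, jump to cell 10
    XT_LIT, 111,                    # cells 6..7: "then" part
    XT_BRANCH, 2,                   # cells 8..9: skip the "else" part, jump to cell 12
    XT_LIT, 222,                    # cells 10..11: "else" part
    XT_BYE,                         # cell 12
]


def choose(flag):
    """Run the program with `flag` on the stack and return the final stack."""
    return run(mem, 4, [flag])


print(choose(1), choose(0))
```

The offsets must be known when the word is written to memory, which is one reason such words only make sense while a definition is being compiled.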
