In this chapter we are going to study two models of computation: finite state machines and stack machines.
A model of computation is akin to the language you use to describe the solution to a problem.
Typically, a problem that is really hard to solve correctly in one model of computation can be close to trivial in another. This is why programmers who know many different models of computation can be more productive: they solve a problem in the model that suits it best and then implement the solution with the tools at their disposal.
When you are trying to learn a new model of computation, do not think about it from the “old” point of view, like trying to think about finite state machines in terms of variables and assignments. Try to start fresh and logically build the new system of notions.
We already know much about Intel 64 and its model of computation, derived from von Neumann’s. This chapter will introduce finite state machines (used to implement regular expressions) and stack machines akin to the Forth machine.
7.1 Finite State Machines
7.1.1 Definition
A deterministic finite state machine (deterministic finite automaton) is an abstract machine that acts on an input string, following a set of rules.
We will use “finite automaton” and “state machine” interchangeably. To define a finite automaton, the following parts should be provided:
1. A set of states.
2. An alphabet: a set of symbols that can appear in the input string.
3. A selected start state.
4. One or more selected end states.
5. Rules of transition between states. Each rule consumes a symbol from the input string.
The action of a rule can be described as: “if the automaton is in state S and an input symbol C occurs, the next current state will be Z.”
If the current state has no rule for the current input symbol, we consider the automaton’s behavior undefined.
Undefined behavior is a concept more familiar to mathematicians than to engineers. For the sake of brevity we describe only the “good” cases. The “bad” cases are of no interest to us, so we do not define the machine’s behavior in them. However, when implementing such machines, we will treat all undefined cases as erroneous and leading to a special error state.
Why bother with automatons? Some tasks are particularly easy to solve with this way of thinking. Such tasks include controlling embedded devices and searching for substrings that match a certain pattern.
For example, suppose we want to check whether a string can be interpreted as an integer. The diagram in Figure 7-1 defines several states and shows the possible transitions between them.
• The alphabet consists of letters, spaces, digits, and punctuation marks.
• The set of states is {A, B, C}.
• The initial state is A.
• The final state is C.
Figure 7-1. Number recognition
Table 7-1. Tracing the finite state machine shown in Figure 7-1 for the input +34
OLD STATE    RULE    NEW STATE
A            +       B
B            3       C
C            4       C
We start execution in state A. Each input symbol causes us to change the current state according to the available transitions.
■
Note: Arrows labeled with symbol ranges like 0...9 actually denote multiple rules. Each of these rules describes a transition for a single input character.
Table 7-1 shows what happens when this machine is executed with the input string +34. This is called a trace of execution.
The machine has arrived at the final state C. However, given the input idkfa, it could not have arrived at any state, because there are no rules that react to such input symbols. This is where the automaton’s behavior is undefined. To make it total, so that it always arrives at either a yes-state or a no-state, we have to add one more final state and add rules to all existing states. These rules should direct execution into the new state whenever no old rule matches the input symbol.
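As a minimal sketch in C, the total version of this automaton could be modeled with one extra state that collects every input for which Figure 7-1 has no rule. The names state, step, and accepts, as well as the letter E for the added state, are assumptions made for this illustration.

```c
#include <ctype.h>

/* States A, B, C follow Figure 7-1; E is the added "no" state
   (the name E is an assumption of this sketch). */
enum state { A, B, C, E };

/* One transition: consume symbol c in state s and return the next state. */
static enum state step(enum state s, char c)
{
    int digit = isdigit((unsigned char)c);

    switch (s) {
    case A:  return (c == '+' || c == '-') ? B : digit ? C : E;
    case B:  return digit ? C : E;
    case C:  return digit ? C : E;
    default: return E;          /* once in E, the automaton stays in E */
    }
}

/* Run the automaton over the whole string; answer "yes" only if it
   finishes in the final state C. */
int accepts(const char *input)
{
    enum state s = A;

    for (; *input != '\0'; input++)
        s = step(s, *input);
    return s == C;
}
```

With this sketch, accepts("+34") follows the trace from Table 7-1 and ends in C, while accepts("idkfa") moves to E on the first symbol and stays there.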
7.1.2 Example: Bits Parity
We are given a string of zeros and ones. We want to find out whether there is an even or an odd number of ones. Figure 7-2 shows the solver in the form of a finite state machine.
Figure 7-2. Is the number of ones even in the input string?
The empty string has zero ones; zero is an even number. Because of this, the state A is both the starting and the final state.
All zeros are ignored no matter the state. However, each one occurring in the input changes the state to the opposite one. If, given an input string, we arrive at state A, then the number of ones is even. If we arrive at state B, then it is odd.
■
Confusion: In finite state machines, there is no memory, no assignments, no if-then-else constructions.
This is thus a completely different abstract machine compared to von Neumann’s. There is really nothing but states and transitions between them. In the von Neumann model, the state is the state of memory and register values.
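When the parity automaton from Figure 7-2 is simulated on a von Neumann machine, its single current state does end up stored in a variable. The following C sketch shows one way this might look. The names parity_state, EVEN_A, ODD_B, and ones_are_even are illustrative, and the return value -1 for symbols outside the alphabet is an assumption, since the automaton itself leaves that case undefined.

```c
/* The two states of the parity automaton: A (even so far), B (odd so far). */
enum parity_state { EVEN_A, ODD_B };

/* Returns 1 if the string of '0' and '1' characters contains an even
   number of ones (the run ends in state A) and 0 if it is odd (state B).
   Returns -1 for any other character; this error case is an assumption,
   not part of the automaton in the text. */
int ones_are_even(const char *bits)
{
    enum parity_state s = EVEN_A;              /* start state A */

    for (; *bits != '\0'; bits++) {
        switch (*bits) {
        case '0':                              /* zeros keep the state */
            break;
        case '1':                              /* ones flip the state */
            s = (s == EVEN_A) ? ODD_B : EVEN_A;
            break;
        default:                               /* outside the alphabet */
            return -1;
        }
    }
    return s == EVEN_A;
}
```

The empty string never leaves state A, so the function reports an even number of ones for it, matching the reasoning above.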
7.1.3 Implementation in Assembly Language
After designing a finite state machine to solve a specific problem, it is trivial to implement this machine in an imperative programming language such as assembly or C.
The following is a straightforward way to implement such machines in assembly:
1. Make the designed automaton total: every state should possess transition rules for any possible input symbol. If this is not the case, add a separate state to denote an error or the answer “no” to the problem being solved.
For simplicity we will call the rule that leads to this state the else-rule.
2. Implement a routine to get an input symbol. Keep in mind that a symbol is not necessarily a character: it can be a network packet, a user action, or another kind of global event.
3. For each state we should:
• Create a label.
• Call the input reading routine.
• Compare the input symbol with the ones described in the transition rules and jump to the corresponding state on a match.
• Handle all other symbols by the else-rule.
To implement the example automaton in assembly, we will first make it total, as shown in Figure 7-3.
Figure 7-3. Check if the string is a number: a total automaton
We will modify this automaton a bit to force the input string to be null-terminated, as shown in Figure 7-4.
Figure 7-4. Check if the string is a number: a total automaton for a null-terminated string
Listing 7-1 shows a sample implementation.
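Listing 7-1. automaton_example_bits.asm
section .text
; getsymbol is a routine to
; read a symbol (e.g. from stdin)
; into al
_A:
    call getsymbol
    cmp al, '+'
    je _B
    cmp al, '-'
    je _B
; The indices of the digit characters in ASCII
; tables fill a range from '0' = 0x30 to '9' = 0x39
; This logic implements the transitions to labels
; _E and _C
    cmp al, '0'
    jb _E
    cmp al, '9'
    ja _E
    jmp _C
; _B: a sign has been read, so a digit must follow
_B:
    call getsymbol
    cmp al, '0'
    jb _E
    cmp al, '9'
    ja _E
    jmp _C
; _C: inside the digits; the null terminator ends the number
_C:
    call getsymbol
    test al, al
    jz _D
    cmp al, '0'
    jb _E
    cmp al, '9'
    ja _E
    jmp _C
_D:
; code to notify about success
_E:
; code to notify about failure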
This automaton always arrives at state D or E; control is then passed to the instructions at either the _D or the _E label.
The code can be isolated inside a function that returns 1 (true) in state _D or 0 (false) in state _E.
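A rough C counterpart of such a function might look as follows. It is only a sketch: the function name is_number and the state_X labels are made up for this example, and reading the next character of the string stands in for the call to getsymbol. Each label marks one state of the total automaton for a null-terminated string.

```c
/* Returns 1 if the null-terminated string is an optionally signed
   sequence of decimal digits, 0 otherwise. */
int is_number(const char *s)
{
    char c;

    /* State A (start): expect a sign or a digit. */
    c = *s++;
    if (c == '+' || c == '-') goto state_B;
    if (c >= '0' && c <= '9') goto state_C;
    goto state_E;                   /* else-rule */

state_B:                            /* after a sign: a digit is required */
    c = *s++;
    if (c >= '0' && c <= '9') goto state_C;
    goto state_E;                   /* else-rule */

state_C:                            /* inside the digits */
    c = *s++;
    if (c == '\0') goto state_D;    /* terminator: the number is complete */
    if (c >= '0' && c <= '9') goto state_C;
    goto state_E;                   /* else-rule */

state_D:                            /* success */
    return 1;

state_E:                            /* failure */
    return 0;
}
```

The explicit goto statements are used here only to keep the correspondence with jumps between labels visible; a loop over a state variable would behave the same way.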
7.1.4 Practical Value
First of all, there is an important limitation: not all programs can be encoded as finite state machines. This model of computation is not Turing complete; it cannot analyze complex, recursively constructed texts such as XML code.
C and assembly language are Turing complete, which means that they are more expressive and can be used to solve a wider range of problems.
For example, if the string length is not limited, we cannot compute its length or count the words in it. Each possible result would have to be a state, and a finite state machine has only a limited number of states, while the word count can be arbitrarily large, as can the strings themselves.
■
Question 114: Draw a finite state machine to count the words in the input string. The input length is no more than eight symbols.
Finite state machines are often used to describe embedded systems, such as coffee machines. The alphabet consists of events (such as button presses); the input is a sequence of user actions.
Network protocols can often be described as finite state machines as well. Every rule can be annotated with an optional output action: “if a symbol X is read, change state to Y and output a symbol Z.” The input consists of received packets and global events such as timeouts; the output is a sequence of sent packets.
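To make the idea of rules with output actions concrete, here is a small C sketch of a toy sender that waits for an acknowledgment after each packet. The protocol itself, and every name in it (conn_state, event, action, rule, react), are invented for this illustration and do not describe any real protocol.

```c
#include <stddef.h>

/* Hypothetical toy protocol, invented for illustration only. */
enum conn_state { IDLE, WAIT_ACK };
enum event      { EV_SEND, EV_ACK, EV_TIMEOUT };   /* input alphabet  */
enum action     { OUT_NONE, OUT_DATA };            /* output alphabet */

/* "In state `from`, on input `on`, go to state `to` and output `out`." */
struct rule {
    enum conn_state from;
    enum event      on;
    enum conn_state to;
    enum action     out;
};

static const struct rule rules[] = {
    { IDLE,     EV_SEND,    WAIT_ACK, OUT_DATA },  /* send a packet      */
    { WAIT_ACK, EV_ACK,     IDLE,     OUT_NONE },  /* delivery confirmed */
    { WAIT_ACK, EV_TIMEOUT, WAIT_ACK, OUT_DATA },  /* retransmit         */
};

/* Apply the rule matching (*s, ev): update *s and return the output.
   Events without a rule are ignored and leave the state unchanged,
   which is one possible choice of else-rule. */
enum action react(enum conn_state *s, enum event ev)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (rules[i].from == *s && rules[i].on == ev) {
            *s = rules[i].to;
            return rules[i].out;
        }
    }
    return OUT_NONE;
}
```

Feeding the events EV_SEND, EV_TIMEOUT, EV_ACK to react one after another produces the outputs OUT_DATA, OUT_DATA, OUT_NONE and brings the machine back to IDLE.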
There are also several verification techniques, such as model checking, that allow one to prove certain properties of finite automatons, for example, “if the automaton has reached the state B, it will never reach the state C.” Such proofs can be of great value when building systems that must be highly reliable.
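A property of the form “once in state B, the automaton never reaches state C” boils down to a reachability question on the transition graph. The C sketch below checks it with a plain depth-first search. It is far simpler than what real model checkers do, and the four-state graph, the numbering of states, and the names edge, dfs, and can_reach are all assumptions made for the example.

```c
#include <stdbool.h>

#define NSTATES 4

/* edge[i][j] is true when some rule leads from state i to state j.
   This small graph is made up for the example. */
static const bool edge[NSTATES][NSTATES] = {
    /* to:     0      1      2      3    */
    /* 0 */ { false, true,  true,  false },
    /* 1 */ { false, true,  false, true  },
    /* 2 */ { false, false, true,  false },
    /* 3 */ { false, true,  false, false },
};

/* Depth-first search over states that have not been visited yet. */
static bool dfs(int from, int to, bool seen[NSTATES])
{
    for (int next = 0; next < NSTATES; next++) {
        if (!edge[from][next] || seen[next])
            continue;
        if (next == to)
            return true;
        seen[next] = true;
        if (dfs(next, to, seen))
            return true;
    }
    return false;
}

/* Can the automaton get from state `from` to state `to`
   in one or more transitions? */
bool can_reach(int from, int to)
{
    bool seen[NSTATES] = { false };

    return dfs(from, to, seen);
}
```

In this made-up graph, can_reach(1, 2) is false, so the statement “if the automaton has reached state 1, it will never reach state 2” holds, whereas state 3 is reachable from state 0.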
■