Finite State Machine - Techniques for gesture recognition

A.2 Techniques for gesture recognition

A.2.2 Finite State Machine

A finite state machine (FSM), or finite state automaton, is a mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time, and that state is called the current state. When a triggering event or condition occurs, the machine can change from one state to another. This change is called a transition. Formally, a finite state machine is defined as a 5-tuple (S, I, f, S0, F) where,

• S is a finite set of states.

• I is a finite set of symbols, called the alphabet.

• f : S × I → S is the transition function.

• S0 is an element of S called the start state, and

• F is a subset of S called the set of accept states.

Appendix A. Tools for Gesture Recognition

Figure A.9: An Example Moore Machine.
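
The 5-tuple definition can be illustrated with a minimal Python sketch. The two-state machine below is hypothetical and does not come from the source: it accepts strings over the alphabet {a, b} that end in "a". The names `accepts`, `q0`, and `q1` are chosen only for illustration.

```python
def accepts(inputs, f, start, accept_states):
    """Run the FSM over `inputs` and report whether it ends in an accept state."""
    state = start
    for symbol in inputs:
        # Apply the transition function f : S x I -> S.
        state = f[(state, symbol)]
    return state in accept_states


# Hypothetical example machine (an assumption, not taken from the text).
S = {"q0", "q1"}                      # finite set of states
I = {"a", "b"}                        # alphabet
f = {                                 # transition function
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
}
S0 = "q0"                             # start state
F = {"q1"}                            # accept states

if __name__ == "__main__":
    print(accepts("ba", f, S0, F))
    print(accepts("ab", f, S0, F))
```

Storing f as a dictionary keyed by (state, symbol) pairs mirrors the signature f : S × I → S directly. A missing key would signal an undefined transition.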

The Moore and Mealy machines are extensions of the FSM that add an output alphabet and a function to generate the output.

A Moore machine is a 6-tuple (S, I, O, f, S0, g) where,

• S is a finite set of states.

• I is a finite set of symbols called the input alphabet.

• O is a finite set of symbols called the output alphabet.

• f : S × I → S is the transition function.

• S0 is an element of S called the start state, and

• g : S → O is the output function mapping the current state to the output.

In the Moore machine, the set of accept states, F, is replaced by an output function that gives the symbols to be generated. An output symbol is generated each time a state is entered. This output symbol does not depend on how the state was entered: the output is strictly a function of the state being entered and not of the input symbol being read. Figure A.9 shows a diagram of a simple Moore machine. Each state label includes the state number followed by the output symbol to be generated when this state is entered. Each of the six elements of this example Moore machine is given below,

• S = {1, 2, 3}.

• I = {X, Y, Z}, the values that the input symbol i can take.

• O = {A, B}, the values that the output symbol o can take.

• f = ([1, X] → 2, [1, Y] → 3, [1, Z] → 1, [2, X] → 3, [2, Y] → 1, [2, Z] → 2, [3, X] → 1, [3, Y] → 3, [3, Z] → 3).

• S0 = 1.

• g = (1 → A, 2 → B, 3 → A).
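
The example Moore machine can be written down almost verbatim in Python. The sketch below uses the transition and output functions listed above. The function name `run_moore` is introduced here for illustration, and the sketch assumes that no output is emitted for the start state before the first input symbol is read.

```python
# Elements of the example Moore machine, as listed in the text.
S = {1, 2, 3}
I = {"X", "Y", "Z"}
O = {"A", "B"}
f = {
    (1, "X"): 2, (1, "Y"): 3, (1, "Z"): 1,
    (2, "X"): 3, (2, "Y"): 1, (2, "Z"): 2,
    (3, "X"): 1, (3, "Y"): 3, (3, "Z"): 3,
}
S0 = 1
g = {1: "A", 2: "B", 3: "A"}


def run_moore(inputs, start=S0):
    """Return the output symbols generated while reading `inputs`."""
    state = start
    outputs = []
    for symbol in inputs:
        state = f[(state, symbol)]   # take the transition
        outputs.append(g[state])     # output depends only on the state entered
    return outputs


if __name__ == "__main__":
    print(run_moore("XYZ"))  # states visited: 1 -> 2 -> 1 -> 1
```

For the input sequence X, Y, Z the machine enters states 2, 1, and 1 in turn, so the outputs are B, A, A.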

Sometimes it is useful to generate a different output depending on the input symbol being read and the state from which the transition occurs. The Mealy machine offers this capability. A Mealy machine is a 6-tuple (S, I, O, f, S0, h) where,

• S is a finite set of states.

• I is a finite set of symbols called the input alphabet.

• O is a finite set of symbols called the output alphabet.

• f : S × I → S is the transition function.

• S0 is an element of S called the start state, and

• h : S × I → O is the output function mapping the current transition to the output.

The Mealy machine is the same as the Moore machine except that the output function g has been replaced with the output function h, which maps the Cartesian product of the set of states S and the set of input symbols I to the set of output symbols O. This means that the symbol being output depends on the transition taken rather than on the state being entered. Figure A.10 shows a Mealy machine designed to produce the same output as the Moore machine shown in Figure A.9.

Figure A.10: An Example Mealy Machine.
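
The transition labels of Figure A.10 are not reproduced in this text, so the sketch below relies on an assumption: the Mealy output function is built as h(s, i) = g(f(s, i)), which emits on each transition the symbol that the Moore machine would emit on entering the target state. The names `run_mealy` and `run_moore` are illustrative.

```python
from itertools import product

# Moore machine of Figure A.9, as listed in the text.
f = {
    (1, "X"): 2, (1, "Y"): 3, (1, "Z"): 1,
    (2, "X"): 3, (2, "Y"): 1, (2, "Z"): 2,
    (3, "X"): 1, (3, "Y"): 3, (3, "Z"): 3,
}
g = {1: "A", 2: "B", 3: "A"}
S0 = 1

# Assumed Mealy output function h : S x I -> O, derived from g and f.
h = {(state, symbol): g[target] for (state, symbol), target in f.items()}


def run_moore(inputs, start=S0):
    state, outputs = start, []
    for symbol in inputs:
        state = f[(state, symbol)]
        outputs.append(g[state])             # output attached to the state entered
    return outputs


def run_mealy(inputs, start=S0):
    state, outputs = start, []
    for symbol in inputs:
        outputs.append(h[(state, symbol)])   # output attached to the transition
        state = f[(state, symbol)]
    return outputs


if __name__ == "__main__":
    # Compare both machines on every input string of length 3.
    for word in product("XYZ", repeat=3):
        assert run_mealy(word) == run_moore(word)
    print(run_mealy("XYZ"))
```

With this construction the two machines produce identical output sequences for any input, which is the property the text attributes to the machine in Figure A.10.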

In the FSM approach, a gesture can be modeled as an ordered sequence of states in a spatio-temporal configuration space [137, 138, 139, 140]. The number of states in the FSM may vary between applications. Generally, a gesture is represented by a prototype trajectory defined either as a set of points (e.g., sampled positions of the head, hand, and eyes) or as a set of motion properties (speed, direction, etc.).

The model is trained offline, using many examples of each gesture as training data, to derive the parameters (criteria or characteristics) of each state in the FSM. Gestures can then be recognized online using the trained FSM. When input data (feature vectors such as trajectories) are supplied to the gesture recognizer, it decides whether to stay in the current state of the FSM or jump to the next state based on the parameters of the input data. If it reaches a final state, we say that a gesture has been recognized.
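
The online recognition step can be sketched as follows. This is a simplified illustration, not the method of the cited works. It assumes that each state is a circular region in a 2D feature space, described by a hypothetical center and radius that stand in for the trained state parameters, and that the recognizer advances when the next state's region contains the current input point. The class names `GestureState` and `GestureFSM` are invented for this example, and offline training is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class GestureState:
    center: Point   # hypothetical trained parameter
    radius: float   # hypothetical trained parameter

    def matches(self, point: Point) -> bool:
        return math.dist(self.center, point) <= self.radius


class GestureFSM:
    def __init__(self, states: List[GestureState]):
        self.states = states
        self.index = -1  # -1 means the gesture has not started yet

    @property
    def recognized(self) -> bool:
        # The gesture is recognized once the final state has been reached.
        return self.index == len(self.states) - 1

    def step(self, point: Point) -> bool:
        """Stay in the current state or jump to the next one."""
        nxt = self.index + 1
        if nxt < len(self.states) and self.states[nxt].matches(point):
            self.index = nxt
        return self.recognized


if __name__ == "__main__":
    # Hypothetical "swipe right" gesture with three states along the x axis.
    swipe = GestureFSM([GestureState((x, 0.0), 0.3) for x in (0.0, 1.0, 2.0)])
    for p in [(0.0, 0.1), (0.5, 0.0), (1.1, 0.0), (1.6, 0.1), (2.0, 0.0)]:
        print(p, swipe.step(p))
```

A practical recognizer would also need a reset or timeout rule for abandoned gestures, which this sketch omits.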

The state-based representation can be extended to accommodate multiple models for the representation of different gestures, or even different phases of the same gesture. Membership in a state is determined by how well the state model represents the current observation. If more than one model (gesture recognizer) reaches its final state at the same time, a winning criterion can be applied to choose the most probable gesture.
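
One way to sketch such a winning criterion is shown below. The text does not specify the criterion, so this example assumes one: among the recognizers that have reached their final state, pick the one with the lowest mean distance between the accepted observations and the state centers. The names `ScoredGestureFSM` and `pick_winner`, the circular state regions, and the gesture names are all hypothetical.

```python
import math


class ScoredGestureFSM:
    def __init__(self, name, centers, radius):
        self.name = name
        self.centers = centers   # hypothetical state centers
        self.radius = radius     # hypothetical membership threshold
        self.index = -1          # -1 means the gesture has not started yet
        self.errors = []         # fit error recorded at each state entered

    def step(self, point):
        nxt = self.index + 1
        if nxt < len(self.centers):
            error = math.dist(self.centers[nxt], point)
            if error <= self.radius:   # the next state represents the observation
                self.index = nxt
                self.errors.append(error)

    @property
    def recognized(self):
        return self.index == len(self.centers) - 1

    @property
    def mean_error(self):
        return sum(self.errors) / len(self.errors)


def pick_winner(recognizers):
    """Assumed criterion: lowest mean fit error among finished recognizers."""
    finished = [r for r in recognizers if r.recognized]
    if not finished:
        return None
    return min(finished, key=lambda r: r.mean_error).name


if __name__ == "__main__":
    recognizers = [
        ScoredGestureFSM("swipe_right", [(0, 0), (1, 0), (2, 0)], 0.5),
        ScoredGestureFSM("swipe_right_up", [(0, 0), (1, 0.2), (2, 0.4)], 0.5),
    ]
    for point in [(0, 0), (1, 0.05), (2, 0.1)]:
        for r in recognizers:
            r.step(point)
    print(pick_winner(recognizers))
```

Both recognizers reach their final state on this trajectory, and the flatter one wins because its states fit the observations more closely. A probabilistic score could replace the mean distance without changing the structure.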

In document Toward Effective Physical Human-Robot Interaction (Page 131-134)