
4.2 Recognition methods

4.2.1 Input preparation

Common to all the gesture recognition methods is the preparation of the input data. This is necessary because computers do not have the same gesture recognition ability as humans. To people, a gesture looks almost the same regardless of where and how it was drawn. A computer relies only on coordinates, so a small gesture in the corner looks completely different from a big gesture drawn in the middle of the screen. Some people draw small, some big, and some use fast motions while others draw very carefully, so some kind of normalization is clearly necessary. The input preparation code is inspired by the C code examples by Oleg Dopertchouk [8].

To make the recognition as accurate as possible, the input needs to be normalized and converted into a common format for storage. This makes it possible to compare stored gestures with new gestures later. Since the chosen framework is PyMT, I have based the normalization on its existing code rather than recoding the whole procedure. The normalization process is based on common knowledge and consists of scaling the coordinates, distributing the points uniformly along the stroke, and centering the gesture around the point (0,0). The whole process can be seen in figure 4.3 and is detailed below.

To make the process as fast as possible, it is important to keep it simple. The raw input from CCV is a stream of touch points, each with an associated ID, sent over the TUIO protocol. The PyMT framework saves these points, and when a sequence of points with a recurring ID is identified, gesture recognition is initiated. Points with the same ID are grouped into a sequence, which from now on will be called a gesture stroke, or simply a stroke. It is this stroke that has to be normalized before it is used in the recognition phase.
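The grouping of touch points into strokes can be sketched as follows. This is a minimal illustration, not PyMT's or TUIO's actual API: the `(touch_id, x, y)` tuple layout and the function name `group_into_strokes` are assumptions made for the example.

```python
from collections import defaultdict


def group_into_strokes(touch_events):
    """Group (touch_id, x, y) tuples into one point list per ID.

    The tuple layout is a hypothetical stand-in for the touch stream;
    points keep their arrival order within each stroke.
    """
    strokes = defaultdict(list)
    for touch_id, x, y in touch_events:
        strokes[touch_id].append((x, y))
    return dict(strokes)


# Two fingers interleaved in one stream: ID 1 draws three points, ID 2 one.
events = [(1, 10.0, 20.0), (2, 300.0, 40.0), (1, 12.0, 24.0), (1, 15.0, 30.0)]
strokes = group_into_strokes(events)
```

Each value in `strokes` is then one gesture stroke in the sense used above, ready to be normalized.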

During training of the gesture recognizer, these normalized strokes are saved to permanent storage with an associated name. These names do not serve as IDs, so it is possible to store several variations of, for example, a 'circle'. The name, along with the score for the best match, is what a user will see when testing gestures on a fully trained system.
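One way to picture such a store is a plain list of named records, so that the same name can appear more than once. The sketch below is an assumption for illustration only: the JSON file format, the record layout and the helper names (`add_template`, `save_database`, `load_database`) are hypothetical and are not taken from PyMT.

```python
import json
import os
import tempfile


def add_template(database, name, stroke):
    """Append a named stroke; names are labels, not unique keys."""
    database.append({"name": name, "points": [list(p) for p in stroke]})


def save_database(database, path):
    """Write all templates to a JSON file (hypothetical storage format)."""
    with open(path, "w") as f:
        json.dump(database, f)


def load_database(path):
    """Read the templates back from the JSON file."""
    with open(path) as f:
        return json.load(f)


database = []
add_template(database, "circle", [(0.0, 0.5), (0.5, 0.0), (0.0, -0.5)])
add_template(database, "circle", [(0.0, 0.4), (0.6, 0.0), (0.0, -0.4)])

# Round-trip through a temporary file to show that both variations survive.
path = os.path.join(tempfile.mkdtemp(), "gestures.json")
save_database(database, path)
restored = load_database(path)
```

Because the records live in a list rather than a dictionary keyed by name, two different 'circle' variations can coexist.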

The first step in the normalization process is scaling the gesture stroke. This is done in a fast and efficient way by finding the outer bounds of the stroke coordinates in the x and y dimensions respectively, and then assigning the scaled coordinates according to formula 4.1. When the scaling is complete, the gesture stroke looks like the second image in figure 4.3. Note how the axes have changed from the first image.

$$(x_s, y_s) = \left(\frac{x - x_{min}}{x_{max}},\ \frac{y - y_{min}}{y_{max}}\right) \tag{4.1}$$
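The scaling step can be sketched in a few lines of Python. The sketch reads formula 4.1 as subtracting the minimum and dividing by the maximum of each dimension; the function name `scale_stroke` and the guard against a zero divisor are assumptions added for the example, not part of the original code.

```python
def scale_stroke(points):
    """Scale (x, y) points per formula 4.1: subtract the min, divide by the max."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Assumption: fall back to 1.0 to avoid dividing by zero when the
    # largest coordinate in a dimension is exactly 0.
    x_div = x_max if x_max != 0 else 1.0
    y_div = y_max if y_max != 0 else 1.0
    return [((x - x_min) / x_div, (y - y_min) / y_div) for x, y in points]


raw = [(100.0, 50.0), (150.0, 100.0), (200.0, 50.0)]
scaled = scale_stroke(raw)
```

For non-negative screen coordinates, every scaled value ends up between 0 and 1.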

Figure 4.3: Illustrations of the four steps needed to prepare input for gesture recognition. The first image represents the raw input data received from CCV. In the second image the gesture stroke has been scaled, and all the points now lie between 0 and 1. In the third image the gesture is normalized to contain a predefined number of points with equal distance between them. The fourth image shows the gesture stroke completely normalized and centered around the origin.

In the second step we have to remove the variations caused by different people's ways of drawing gestures. One way to do this is to distribute the points uniformly along the gesture stroke. To do so, we need to know the total length of the gesture. The length is measured with the Pythagorean theorem: the lengths of the individual line segments between consecutive points in the gesture stroke are added together. For easier comparison with later gestures, we should choose a fixed number of points to represent each gesture stroke. As long as all the gestures in the gesture database are represented with the same number of points, it does not matter what this number is. If you choose a small number of points, the gesture recognition will execute fast, but you will lose some of the accuracy provided by a large number of points. 32 should be a good compromise between speed and accuracy, but this can be changed through a parameter later on if satisfactory results are not achieved.
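The length measurement described above amounts to summing the Euclidean distance between each pair of consecutive points. A minimal sketch, with `stroke_length` as a hypothetical helper name:

```python
import math


def stroke_length(points):
    """Sum the lengths of the line segments between consecutive points."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Two segments: (0,0)->(3,4) has length 5 and (3,4)->(3,10) has length 6.
length = stroke_length([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])
```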

When the total length of the gesture stroke has been calculated, the spacing between points is obtained by dividing the total length by the number of intervals between the points, which is one less than the number of points chosen to represent the gesture stroke. With 32 points there are 31 intervals. To find the new points, the old gesture is traversed, and for each 1/31 of the total length a point is added to the new gesture. Regardless of how many points the original gesture stroke contained, this 32-point stroke becomes the new uniformly distributed gesture. The result can be seen in the third image in figure 4.3. Compared to the second image, the stroke in the third image contains fewer points.

Finally, the gesture stroke has to be centered around (0,0). To do this, the arithmetic mean of the x-coordinates and the arithmetic mean of the y-coordinates are calculated separately. These means give the middle of the gesture. The mean x and y values are then subtracted from the x and y coordinates of each stroke point, which moves the middle of the gesture to (0,0). The result can be seen in the last image in figure 4.3.
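The centering step is short enough to sketch directly. The function name `center_stroke` is hypothetical, and returning an empty list for an empty stroke is an assumption made so that the example never divides by zero.

```python
def center_stroke(points):
    """Shift the points so that their arithmetic mean lands on (0, 0)."""
    if not points:
        return []  # assumption: nothing to center in an empty stroke
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return [(x - mean_x, y - mean_y) for x, y in points]


# A unit square has its mean at (0.5, 0.5), so every corner moves by -0.5.
centered = center_stroke([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
```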

The gesture stroke is now fully normalized and can be used in the recognition process.
