Robust and Automatic Optical Motion Tracking


Alexander Hornung, Leif Kobbelt

Lehrstuhl für Informatik VIII RWTH Aachen

52074 Aachen, Tel.: +49 (0)241 80-21815, Fax: +49 (0)241 80-22899

Abstract: Marker-based optical motion tracking is an established technique to capture and reconstruct the skeleton and motion of a subject. However, several practical problems still arise, mostly due to ambiguities caused by occluded or wrongly identified markers, which often make manual post-processing unavoidable. We present techniques to make marker-based optical motion tracking an automatic and robust process. The aim of our framework is a self-calibrating system which automatically identifies rigid cliques of markers and recovers the skeleton topology and geometry of a tracked subject without any auxiliary information about the tracking setup. The gathered information is used to make the actual motion recording phase robust to marker occlusions by reconstructing missing limbs or joints of the subject with inverse kinematics methods. The resulting techniques provide a simple, general framework for optical motion tracking which minimizes the need for complex manual post-processing.

Keywords: Optical Motion Tracking, Self-calibration, Retargeting, Animation

1 Introduction

Figure 1: A subject’s arm equipped with optical markers.

Capturing a real actor’s motion plays an important role in computer animation as well as in motion analysis for medicine or sport science. Tracking the position and orientation of the subject’s limbs allows the realistic reproduction and transfer of this motion to virtual characters with the same skeleton topology.

Several approaches to track motion exist, such as contour finding [CTMS03] or marker-based methods, where the trajectories of markers attached to the subject’s limbs are tracked magnetically [OBBH00] or optically [HFP+00] (Fig. 1). Although optical tracking is in general the most reliable system in terms of accuracy and robustness to external influences, it suffers from two fundamental problems. First, optical markers are visually indistinguishable. Therefore we need appropriate methods to identify markers based on other criteria in order to associate detected markers with the respective limbs. The second fundamental problem is that of occlusion. To reconstruct the three-dimensional position of a marker, it has to be seen by at least two cameras. This cannot be ensured for a freely moving actor. Hence we need methods to compensate for missing markers, so that we can still reconstruct the position and orientation of the actor’s limbs even if a significant number of markers are occluded.

Existing research and commercial solutions, like the ART tracking system [ART] that we use, often need a considerable amount of time for manual calibration to allow for reliable and robust marker recognition and tracking. In contrast, this project focuses on methods that make optical motion tracking a completely automated pipeline which minimizes the need for manual intervention.

Beginning with a self-calibrating system initialization, which uniquely identifies rigid cliques of markers, we automatically compute the underlying skeleton geometry and topology of a tracked subject. In particular, we do not constrain the degrees of freedom of the underlying model in any way, so that we are able to track arbitrary articulated bodies. Moreover, we reconstruct the complete position and orientation of limbs, in contrast to other methods, which often leave degrees of freedom of the limb orientation undetermined, or which have to constrain the skeleton topology beforehand. During the actual motion recording phase we take advantage of this information to compensate for occluded markers and to reconstruct the position and orientation of limbs and joints accurately and robustly.

2 Related Work

Several partial solutions for increasing robustness and automation in optical motion tracking have been proposed. [RL02] present an automatic method to identify marker cliques. However, they need an explicitly occlusion-free training sequence, which is processed offline to determine marker cliques and model parameters like the skeleton structure. Our method does not impose constraints on the initial training sequence and provides continuous feedback since it is computed online.

Our work on marker tracking and the dynamic identification of rigid marker cliques, which formulates both as instances of a generic correspondence estimation problem, is based on [SLH91]. They present an elegant algorithm to associate the features in two images for applications in computer vision. Transferring their method to the domain of optical motion tracking allows us to solve several tracking-related problems in a unified manner.

[OBBH00] show how to estimate the structure and geometry of an unknown skeleton model. They describe a least squares fit of input motion data of individual limbs to a rotary joint model. Other methods, such as [SPB+98], compute joints by estimating the rotation center of markers and their associated limbs. We use the technique of [OBBH00] since it yields higher accuracy and robustness to noise.

Approaches to make the actual recording of motion data more robust range from predicting future marker positions using a Kalman filter [DU03] or search space reductions based on other prediction quality measures [vLvR03] to resolving occlusions based on the skeletal model of the tracked person as described in [HFP+00]. Our method does not try to identify or reconstruct markers based on predictions of future states but focuses on their robust recognition based on generated marker signatures. This ensures reliable identification even after occlusions lasting several frames, where prediction models may fail due to unconstrained movements of the tracked subject. We improve the actual tracking quality in the case of missing markers by applying inverse kinematics methods to the computed skeleton as presented in [TGB99]. They show how to reconstruct missing inner limbs of a skeleton up to one degree of freedom based on adjacent limbs in real time. We extend their solution to determine the remaining degree of freedom if just one additional marker on the lost limb is known.

In recent work, [ZH03] use a force-based forward dynamics model to map optical motion tracking data to a body model. Their technique does not estimate the skeleton of the tracked subject and also runs offline. However, they explicitly mention the benefits of a real-time system for motion capture.

Commercial systems like [Vic] provide software tools for all phases of the tracking pipeline. However, such systems focus on setups with single markers attached to limbs, resulting in the restrictions mentioned above. [ART], a commonly used system for VR applications, provides only low-level tracking and marker recognition, without methods for automatic calibration, skeleton estimation, or robust tracking of articulated bodies.

3 Self-Calibration


Figure 2: In this figure, eight markers are part of two distinct cliques, while one marker is tracked in isolation. Within each clique, the inter-marker distances are constant. Between two sets $M_{t-1}$ and $M_t$, three classes of markers have to be distinguished during continuous tracking: lost markers (empty dashed boxes), tracked markers, and new markers. We solve this problem by finding corresponding elements in $M_{t-1}$ and $M_t$. Every time a new marker is found, it is assigned to an unused global marker identity $\tilde{m}_i$.

To track an unknown subject’s motion, the subject is equipped with a set of spherical optical markers $\tilde{m}_1, \dots, \tilde{m}_k$ (Fig. 1). As the subject moves, the tracking system reconstructs the 3D position of a marker at time $t$ whenever it is seen by at least two cameras. Due to indistinguishable markers and occlusions, our input data consists of an unstructured set of detected markers $M_t = \{m^t_1, \dots, m^t_{k(t)}\}$, given by their corresponding 3D positions $P_t = \{p^t_1, \dots, p^t_{k(t)}\}$ at frame $F_t$, with $k(t) \le k$. To calibrate and prepare the motion tracking system for the actual motion recording phase, we have to solve the two fundamental problems mentioned above: marker distinction (map $m^t_i$ to $\tilde{m}_j$) and temporary marker occlusion (recognize $m^{t+n}_i$ as $m^t_j$). We present a method to solve both problems as instances of a general correspondence estimation problem, resulting in a simple, coherent framework.

Our methods for correspondence estimation are motivated by the work of [SLH91]. They presented an elegant approach to find a partial mapping between two sets of objects which minimizes the overall squared sum of some inter-object measurements, based on a singular value decomposition of a proximity matrix. [SLH91] used this method to find an assignment between feature points in two images. Using their method in a more general sense, by formulating each tracking subproblem as a matching problem between two partially corresponding sets of objects, yields a simple framework to solve tracking-related tasks. We extended their method to work better with sets of objects where the actual number of corresponding elements can be arbitrary.
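For illustration, the original pairing step of [SLH91] fits in a few lines of Python with NumPy. This is a minimal sketch of the published 1991 algorithm, not of our extended variant; the function name `slh_match` and the Gaussian width `sigma` are illustrative choices. A Gaussian proximity matrix $G$ is decomposed as $G = T D U^T$, $D$ is replaced by the identity, and a pair $(i, j)$ is accepted only if its entry is the largest in both its row and its column, so surplus points in either set stay unmatched.

```python
import numpy as np


def slh_match(P, Q, sigma=1.0):
    """Scott & Longuet-Higgins pairing of point sets P (m x d) and Q (n x d).

    Returns a list of (i, j) index pairs. m and n may differ; points without a
    mutually dominant partner remain unmatched.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Squared distances between every point of P and every point of Q.
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian proximity matrix G_ij = exp(-r_ij^2 / (2 sigma^2)).
    G = np.exp(-d2 / (2.0 * sigma ** 2))
    # SVD G = T D U^T; replacing D by the identity gives the pairing matrix T U^T.
    T, _, Ut = np.linalg.svd(G, full_matrices=False)
    pairing = T @ Ut
    pairs = []
    for i in range(pairing.shape[0]):
        j = int(np.argmax(pairing[i]))
        # Accept (i, j) only if the entry also dominates its column.
        if int(np.argmax(pairing[:, j])) == i:
            pairs.append((i, j))
    return pairs
```

Because only mutually dominant entries are accepted, sets of different sizes are handled without extra bookkeeping; `sigma` should be on the order of the expected displacement between the two sets.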

The first low-level tracking task where we apply our correspondence-based method is the continuous tracking of markers between successive frames $F_{t-1}$ and $F_t$. As mentioned above, our input data consists of unstructured sets of markers $M_{t-1}$ and $M_t$ and their corresponding 3D positions $P_{t-1}$ and $P_t$. Due to occlusion, some of the markers in $M_{t-1}$ will be lost in $M_t$, some will be trackable through both frames with slightly different positions, and some markers will be new in $M_t$ (Fig. 2). Our modified correspondence estimation algorithm identifies these classes of markers by assigning corresponding positions between the two sets $P_{t-1}$ and $P_t$. In particular, our algorithm finds a partial mapping of subsets of markers within the two sets $M_{t-1}$ and $M_t$, allowing for vanishing or newly appearing markers.
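A simpler stand-in makes the three marker classes of Fig. 2 concrete. The sketch below is not our SVD-based matcher: it pairs markers that are mutual nearest neighbours within a hypothetical gate `max_step` (metres per frame) and reports the remaining markers of $M_{t-1}$ as lost and those of $M_t$ as new. The function name is illustrative.

```python
import numpy as np


def classify_markers(prev_pos, curr_pos, max_step=0.05):
    """Split the markers of frames F_{t-1} and F_t into tracked, lost and new.

    prev_pos: positions P_{t-1}, shape (m, 3); curr_pos: positions P_t, shape (n, 3).
    max_step: hypothetical bound on how far a marker may move between two frames.
    Returns (tracked pairs (i, j), lost indices into P_{t-1}, new indices into P_t).
    """
    prev_pos = np.asarray(prev_pos, dtype=float).reshape(-1, 3)
    curr_pos = np.asarray(curr_pos, dtype=float).reshape(-1, 3)
    tracked = []
    if len(prev_pos) and len(curr_pos):
        d = np.linalg.norm(prev_pos[:, None, :] - curr_pos[None, :, :], axis=-1)
        for i in range(len(prev_pos)):
            j = int(np.argmin(d[i]))
            # Mutual nearest neighbours inside the gate are treated as the same marker.
            if int(np.argmin(d[:, j])) == i and d[i, j] <= max_step:
                tracked.append((i, j))
    lost = sorted(set(range(len(prev_pos))) - {i for i, _ in tracked})
    new = sorted(set(range(len(curr_pos))) - {j for _, j in tracked})
    return tracked, lost, new
```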

Figure 3: “Ripping strings”. Initially all markers are connected to each other. By moving the respective cliques around, edges between different cliques are destroyed and only the final rigid cliques remain.

As depicted in Fig. 2, markers can be temporarily occluded from the tracking system and are therefore lost by the continuous tracking approach. Since markers cannot be distinguished, the system does not know whether a new marker was already tracked before and thus should be associated with its former identity. To resolve these ambiguities, one attaches not only one but several markers to every limb of the tracked person. Markers located on the same limb form rigid cliques with characteristic, invariant inter-marker distances, while distances to markers on other limbs change over time. We can think of attaching a string between each pair of markers. As we go from frame to frame, we record the length variation of each string. If a string is stretched too much, it rips (Fig. 3). In the end, only the strings within the rigid cliques remain. These constant distances to the other markers of the same clique form a unique distance pattern or signature $\mathrm{Sig}_i$ for every marker $\tilde{m}_i$. This makes it possible to identify a currently unknown marker $m^t_j$. Suppose this marker was already seen by the system before; then there exists some marker identity $\tilde{m}_i$ with a unique distance pattern $\mathrm{Sig}_i$ corresponding to $m^t_j$. So after computing the set of distances of $m^t_j$ to all other markers found in the same frame $F_t$, we can identify $m^t_j$ by locating the pattern $\mathrm{Sig}_i$ within this set. This is particularly difficult because signatures are often only partially available due to marker occlusions. Furthermore, signatures are often partially identical, since the range of possible marker distances within one rigid clique is restricted by marker sizes, the precision of the tracking system, and of course the wearability of a clique for the tracked subject. This problem of identifying temporarily occluded markers based on signatures is also solved by our correspondence estimation algorithm.
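The string analogy and the signature lookup can be prototyped as follows. This is a minimal sketch under stated assumptions: markers already carry provisional integer ids from frame-to-frame tracking, a string rips when its length deviates from its first observed length by more than a hypothetical tolerance `tol` (metres), cliques are read off as connected components of the intact strings, and an unknown marker is identified by a simple hit count rather than by the correspondence estimation described above. All function names are illustrative.

```python
import numpy as np
from itertools import combinations


def rip_strings(frames, tol=0.005):
    """frames: iterable of dicts {marker_id: xyz}; occluded markers are simply absent.

    Returns (intact strings {(a, b): length}, set of all marker ids seen).
    """
    first, ripped, ids = {}, set(), set()
    for frame in frames:
        ids.update(frame)
        for a, b in combinations(sorted(frame), 2):
            if (a, b) in ripped:
                continue
            length = float(np.linalg.norm(np.subtract(frame[a], frame[b])))
            ref = first.setdefault((a, b), length)   # first observed string length
            if abs(length - ref) > tol:              # stretched too much: the string rips
                ripped.add((a, b))
    intact = {e: L for e, L in first.items() if e not in ripped}
    return intact, ids


def cliques(intact, ids):
    """Group markers connected by intact strings (union-find)."""
    parent = {m: m for m in ids}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in intact:
        parent[find(a)] = find(b)
    groups = {}
    for m in ids:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


def signatures(intact):
    """Sig_i: sorted lengths of the intact strings attached to marker i."""
    sig = {}
    for (a, b), length in intact.items():
        sig.setdefault(a, []).append(length)
        sig.setdefault(b, []).append(length)
    return {m: sorted(v) for m, v in sig.items()}


def identify(pos, others, sigs, tol=0.005, min_hits=2):
    """Return the identity whose signature lengths best reappear among the distances
    from an unknown marker at pos to the other visible markers, or None."""
    dists = [float(np.linalg.norm(np.subtract(pos, q))) for q in others]
    best, best_hits = None, 0
    for marker_id, sig in sigs.items():
        # Count signature entries matched by some observed distance (partial matches allowed).
        hits = sum(any(abs(s - d) <= tol for d in dists) for s in sig)
        if hits > best_hits:
            best, best_hits = marker_id, hits
    return best if best_hits >= min_hits else None
```

A hit count copes with partially available signatures, but ties between partially identical signatures are broken arbitrarily here, which is exactly the ambiguity a correspondence-based formulation is meant to resolve.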

During the run of our algorithm we continuously track markers, dynamically create these signatures to identify single markers and rigid cliques, and resolve ambiguities caused by temporarily occluded markers. As soon as a target number of rigid cliques is found, we automatically compute the position and orientation of each corresponding limb by embedding a local coordinate system into the corresponding clique of markers. At this point, the basic calibration of the tracking system is accomplished, and we can reliably track a subject equipped with several sets of markers. In contrast to systems like [ART], this initialization is done completely automatically in real time, with continuous feedback to the user, which allows maximum efficiency and flexibility. In the following step, the underlying skeleton model is automatically reconstructed for higher-level tracking tasks.
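How the local coordinate system is embedded into a clique is not specified above. One plausible construction, assumed here, places the origin at the clique centroid and derives orthonormal axes from three markers; afterwards, any three or more visible, non-collinear markers of the clique suffice to recover the limb pose $[R \mid t]$ with the Kabsch algorithm, which keeps partially occluded cliques usable. Both function names are hypothetical.

```python
import numpy as np


def local_frame(points):
    """Embed a local frame (R, t) into a clique of >= 3 non-collinear markers.

    Assumed convention: origin at the centroid, z normal to the plane of the first
    three markers, x towards the first marker (projected into that plane).
    Columns of R are the local axes expressed in world coordinates.
    """
    P = np.asarray(points, dtype=float)
    t = P.mean(axis=0)
    z = np.cross(P[1] - P[0], P[2] - P[0])
    z /= np.linalg.norm(z)
    x = P[0] - t
    x -= (x @ z) * z                 # remove the component along z
    x /= np.linalg.norm(x)
    y = np.cross(z, x)               # completes a right-handed frame
    return np.column_stack([x, y, z]), t


def fit_pose(local_pts, world_pts):
    """Kabsch least-squares fit of (R, t) with world ~= R @ local + t."""
    A = np.asarray(local_pts, dtype=float)
    B = np.asarray(world_pts, dtype=float)
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cb - R @ ca
```

At calibration time, the local marker coordinates follow from `(P - t0) @ R0`; during tracking, `fit_pose` is called with whichever subset of the clique is currently visible.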

4 Skeleton Reconstruction

Each of the identified rigid cliques corresponds to a limb of the tracked subject. Several methods have been proposed to automatically reconstruct the underlying skeleton structure. Under the assumption of a skeleton model with rigid bones and rotational joints, [OBBH00] show how to robustly compute precise joint positions by solving a least squares system of motion measurements. Each limb $l_i$ is associated with a time-varying local coordinate system. Thus there is a transform $L^t_i = [R^t_i \mid t^t_i]$ which maps from $l_i$'s current local coordinates to world coordinates. The joint between two limbs $l_i$ and $l_j$ has constant local coordinates $c_i$ with respect to $l_i$ and constant local coordinates $c_j$ with respect to $l_j$. The coordinates $c_i$ and $c_j$ are related by the fact that they map to the same position in world coordinates, i.e., $L^t_i c_i = L^t_j c_j$ for every frame $F_t$. Expanding both sides gives $R^t_i c_i - R^t_j c_j = t^t_j - t^t_i$. For every possible pair of limbs and measurements in $n$ frames, this leads to an overdetermined system that we can solve for the local joint coordinates in the least squares sense:

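$$
\begin{bmatrix} R^0_i & -R^0_j \\ \vdots & \vdots \\ R^{n-1}_i & -R^{n-1}_j \end{bmatrix}
\begin{bmatrix} c_i \\ c_j \end{bmatrix}
=
\begin{bmatrix} t^0_j - t^0_i \\ \vdots \\ t^{n-1}_j - t^{n-1}_i \end{bmatrix}
\qquad (1)
$$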
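Equation (1) stacks one $3 \times 6$ block per frame and can be solved directly with `numpy.linalg.lstsq`. The sketch below, with a hypothetical function name, also returns the RMS residual of the fit, which is one possible cost for deciding which limb pairs actually share a joint.

```python
import numpy as np


def estimate_joint(Ri, ti, Rj, tj):
    """Least-squares solution of Eq. (1) for the local joint coordinates c_i, c_j.

    Ri, Rj: (n, 3, 3) limb rotations R^t_i, R^t_j over n frames.
    ti, tj: (n, 3) limb translations t^t_i, t^t_j.
    Returns (c_i, c_j, rms), where rms is the root-mean-square residual.
    """
    Ri, Rj = np.asarray(Ri, dtype=float), np.asarray(Rj, dtype=float)
    ti, tj = np.asarray(ti, dtype=float), np.asarray(tj, dtype=float)
    n = len(Ri)
    # One 3 x 6 block [R^t_i, -R^t_j] per frame, stacked into a (3n x 6) matrix.
    A = np.concatenate([Ri, -Rj], axis=2).reshape(3 * n, 6)
    # Matching right-hand side t^t_j - t^t_i, stacked into a 3n-vector.
    b = (tj - ti).reshape(3 * n)
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    rms = float(np.sqrt(np.mean((A @ x - b) ** 2)))
    return x[:3], x[3:], rms
```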

The skeleton structure can be computed as a minimum spanning tree connecting the limbs. Since every joint is defined by the two respective positions $L^t_i c_i$ and $L^t_j c_j$, it can be computed during the actual tracking phase (Section 5) by averaging the two positions. Even if one limb is completely lost, all joint positions remain explicitly defined, because each joint can still be obtained from its other, tracked limb. The geometry of the bones is given by the distances between adjacent joints. The computed joint positions allow us to calculate a further type of signature to identify markers, since the distance between a marker and a joint associated with the same limb remains constant. We will exploit these additional signatures in the following section.
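The edge weights of the minimum spanning tree are not stated above. The sketch below assumes the RMS residual of the joint fit of Eq. (1) for each limb pair (a small residual meaning a plausible rotary joint) and builds the tree with Prim's algorithm; a second helper averages the two joint estimates $L^t_i c_i$ and $L^t_j c_j$ and falls back to a single estimate when one limb is lost. Function names and the cost matrix are illustrative.

```python
import numpy as np


def skeleton_tree(cost):
    """Prim's minimum spanning tree over a symmetric (k x k) limb-pair cost matrix.

    cost[i][j]: assumed joint-fit residual between limbs l_i and l_j.
    Returns the tree edges as (limb already in tree, newly attached limb).
    """
    cost = np.asarray(cost, dtype=float)
    k = len(cost)
    in_tree = [False] * k
    in_tree[0] = True
    edges = []
    for _ in range(k - 1):
        best = None
        for i in range(k):
            if not in_tree[i]:
                continue
            for j in range(k):
                if not in_tree[j] and (best is None or cost[i, j] < cost[best]):
                    best = (i, j)
        in_tree[best[1]] = True
        edges.append(best)
    return edges


def joint_position(pose_i, c_i, pose_j, c_j):
    """World joint position from limb poses (R, t); a pose may be None (lost limb)."""
    estimates = []
    if pose_i is not None:
        estimates.append(pose_i[0] @ c_i + pose_i[1])
    if pose_j is not None:
        estimates.append(pose_j[0] @ c_j + pose_j[1])
    return np.mean(estimates, axis=0)
```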

The extracted skeleton allows us to retarget the captured motion data to arbitrary objects with the same skeleton structure. Moreover, the computed model helps us to make the actual motion capturing phase more robust in cases of occluded markers.

5 Robust Motion Capture

During the actual motion capturing phase we are able to use several methods to make the tracking robust. We have a robust method to identify formerly lost markers, redundant orientation and position information for every limb, and a skeleton which reduces the degrees of freedom of every limb by imposing constraints from neighboring limbs. In this section we provide an example of how to exploit the skeleton structure to reconstruct “lost” limbs in cases where complete cliques of markers could not be tracked due to occlusions.

Figure 4: This figure shows a situation where the upper and lower arm are lost during tracking. The lost inner joint can be reconstructed by intersecting three spheres as described below. The orange circle visualizes the intersection of two of them. The third sphere intersects this circle in two points, yielding two candidates for the lost inner joint.

Consider the case where two inner limbs of a so-called human-arm-like chain (HAL-chain), such as the forearm and upper arm of a tracked human body, are lost. In this case the position of the missing inner joint can still be computed up to one degree of freedom. [TGB99] show that it has to lie on a circle defined by the intersection of two spheres (Fig. 4), given by the outer joint positions $j_1$ and $j_2$ (wrist and shoulder) and the two inner limb lengths $l_1$ and $l_2$.
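The circle follows from elementary geometry: with $d = \lVert j_2 - j_1 \rVert$, its plane is perpendicular to $j_2 - j_1$ at distance $a = (d^2 + l_1^2 - l_2^2)/(2d)$ from $j_1$, and its radius is $\sqrt{l_1^2 - a^2}$. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def sphere_sphere_circle(c1, r1, c2, r2):
    """Intersection circle of spheres S(c1, r1) and S(c2, r2).

    Returns (center, unit normal, radius), or None if the spheres do not meet.
    """
    c1, c2 = np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)
    d = np.linalg.norm(c2 - c1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return None
    n = (c2 - c1) / d
    a = (d ** 2 + r1 ** 2 - r2 ** 2) / (2 * d)    # distance from c1 to the circle plane
    radius = np.sqrt(max(r1 ** 2 - a ** 2, 0.0))
    return c1 + a * n, n, radius
```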

In practice it is very unlikely that both cliques of the lost limbs are completely occluded. In most cases there will be at least one additional marker position $p$ available. This marker can be identified by its rigid distance to its corresponding joint. Knowing its constant distance $d$ to the missing inner joint position, it is possible to define a third sphere centered at the marker’s position $p$ with radius $d$. The lost inner joint position $j$ has to lie in the intersection of these three spheres:

$$ j \in S(j_1, l_1) \cap S(j_2, l_2) \cap S(p, d). $$

Usually this yields two possible solutions. Additional markers constrain the remaining ambiguity in a similar way, until the correct joint position can be reconstructed exactly. Otherwise one can choose the most plausible of the remaining solutions, according to continuity assumptions or other heuristics.
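Intersecting the three spheres is a standard trilateration problem. The sketch below uses the textbook closed-form solution in a frame spanned by the three sphere centers, tolerates slightly negative discriminants caused by measurement noise via a hypothetical `tol`, and resolves the two-fold ambiguity with the continuity heuristic mentioned above by keeping the candidate closest to the previous joint position. Names are illustrative, not taken from [TGB99].

```python
import numpy as np


def trilaterate(c1, r1, c2, r2, c3, r3, tol=1e-9):
    """Points lying on all three spheres S(c1, r1), S(c2, r2), S(c3, r3): zero, one or two."""
    c1, c2, c3 = (np.asarray(c, dtype=float) for c in (c1, c2, c3))
    d = np.linalg.norm(c2 - c1)
    if d < 1e-12:                              # coincident centers: no unique answer
        return []
    ex = (c2 - c1) / d
    i = ex @ (c3 - c1)
    ey = c3 - c1 - i * ex
    if np.linalg.norm(ey) < 1e-12:             # collinear centers: no unique answer
        return []
    ey /= np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    j = ey @ (c3 - c1)
    # Coordinates of the solution(s) in the frame (c1; ex, ey, ez).
    x = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    y = (r1 ** 2 - r3 ** 2 + i ** 2 + j ** 2 - 2 * i * x) / (2 * j)
    z2 = r1 ** 2 - x ** 2 - y ** 2
    if z2 < -tol:                              # the spheres do not meet
        return []
    base = c1 + x * ex + y * ey
    z = np.sqrt(max(z2, 0.0))
    return [base + z * ez, base - z * ez] if z > 0 else [base]


def pick_joint(candidates, previous):
    """Continuity heuristic: keep the candidate closest to the previous joint position."""
    prev = np.asarray(previous, dtype=float)
    return min(candidates, key=lambda q: np.linalg.norm(q - prev))
```

For an arm, `c1`, `c2` would be the shoulder and wrist with the upper-arm and forearm lengths as radii, and `c3`, `r3` the identified marker position $p$ and its distance $d$ to the elbow.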


6 Results

In our current setup we use four ARTTrack1 cameras [ART], placed in the four upper corners of a rectangular room. While this setting allows us to track an unconstrained, freely moving person, it also results in very frequent marker occlusions. For example, while tracking the HAL-chain in Figure 4 with 17 markers in 4 cliques, we had an average of 21% of markers lost between two successive frames at a tracking rate of approximately 50 frames per second. In spite of this high percentage of lost markers it is still possible to capture the motion of a subject reliably, since reappearing markers and partially visible cliques are generally identified within a few frames.

Figure 5: The deviation between the elbow position computed with the inverse kinematics method and the actual elbow position. The peaks correspond to falsely computed joint positions due to wrongly identified markers. However, they can be easily detected and compensated.

To compare the quality of the inverse kinematics technique for inner joints with the actual joint position, we measured the deviation (Fig. 5) between both estimates of the elbow position (Fig. 4). The average deviation of the reconstructed joint from the actual joint position is only about 8 mm, with a standard deviation of 7 mm. This is very close to the actual precision of the tracking system for the marker size used. The high peaks result from wrongly identified single markers, in which case the sphere $S(p, d)$ has the wrong radius and the computed circle intersections yield wrong joint positions. However, such errors can be identified easily by assuming a continuously moving subject. The wrong positions can be compensated by enforcing physically plausible movements of the joints.
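The continuity check can be as simple as bounding the speed of a joint. The sketch below flags estimates that would require the joint to move faster than a hypothetical `max_speed` (m/s) relative to the last accepted estimate and holds the last accepted position instead; this is only one simple way of enforcing plausible movement, and both the bound and the function name are assumptions.

```python
import numpy as np


def suppress_jumps(traj, fps=50.0, max_speed=5.0):
    """Replace joint estimates implying an implausible speed by the last accepted value.

    traj: (n, 3) joint positions in metres, one per frame.
    Returns (filtered trajectory, indices of flagged frames).
    """
    traj = np.asarray(traj, dtype=float)
    out = traj.copy()
    max_step = max_speed / fps          # largest plausible motion per frame
    flagged, last_ok = [], 0
    for k in range(1, len(traj)):
        # Allow proportionally more motion after a run of rejected frames.
        if np.linalg.norm(traj[k] - traj[last_ok]) > max_step * (k - last_ok):
            out[k] = out[k - 1]         # hold the last accepted position
            flagged.append(k)
        else:
            last_ok = k
    return out, flagged
```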

The necessary length of the initial phase for learning characteristic signatures depends on how the cliques are presented to the system. In an optimal setting, the self-calibration phase can be accomplished with all marker cliques already attached to the subject. With a high loss rate, like the 21% mentioned above for our system, it can be more efficient to calibrate the marker cliques in a way that ensures a minimal number of occlusions, e.g., by attaching the marker cliques to the subject only after the initial clique-calibration step. For continuously tracked markers without frequent occlusions, the self-calibration is finished as soon as the last clique becomes visible. If all markers are visible from the beginning, the calibration is completed after just a few frames. Since the system runs in real time, one has direct feedback about the tracking quality, both during the initialization and during the final motion capture.

The computation of the skeleton geometry and topology is also a matter of seconds. For the skeleton shown in Figure 6, with markers attached to 12 limbs, we generally consider motion sequences of 20 to 60 seconds. The actual model estimation takes approximately 1 second for 60 seconds of recorded motion at 50 fps. For a good model approximation it is of primary importance that the tracked subject exercises all degrees of freedom of each joint. If the initial phase was already performed with the subject wearing the markers, these measurements can be reused for the skeleton reconstruction.
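As a rough, illustrative count, assuming every frame of a 60-second sequence at 50 fps enters Eq. (1) and every pair of the 12 limbs is tested:

$$
n = 60\,\mathrm{s} \times 50\,\mathrm{s}^{-1} = 3000, \qquad
A \in \mathbb{R}^{3n \times 6} = \mathbb{R}^{9000 \times 6}, \qquad
\binom{12}{2} = 66 \text{ limb pairs}.
$$

Each of these 66 least-squares problems has only six unknowns, which makes a total run time on the order of a second plausible.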

Figure 6 shows a few frames from a sequence captured with our system. Although only 79% of markers could be tracked between two successive frames, the marker identification and inverse kinematics methods allowed us to reliably record the motion of a person equipped with 12 cliques of markers.

7 Conclusions

Figure 6: A few frames from a captured motion sequence using our system.

We presented different methods to make optical motion tracking a robust and automatic process. Automation was achieved by creating a real-time capable self-calibration method, which learns characteristic marker signatures to identify rigid cliques of markers corresponding to limbs of the tracked subject. One key aspect of our method is the mapping of several crucial subproblems to different instances of one generic correspondence matching problem, resulting in a simple and robust algorithm. From the rigid cliques of markers, the system automatically extracts the geometry and topology of the subject’s skeleton. The robustness of the tracking process is improved by the reliable identification of markers even after long and frequent occlusions. Based on the computed skeleton, we showed how to apply inverse kinematics methods to reconstruct limb or joint positions in cases where an explicit clique-based computation is impossible due to an insufficient number of tracked markers.

In the future we will integrate more sophisticated prediction filters for marker movement, which will improve tracking during periods where insufficient information is available to apply the presented inverse kinematics methods, as in the case of lost outer limbs.

References

[ART] ARTtrack1 & DTrack, A.R.T. advanced realtime tracking GmbH, http://www.ar-tracking.de/.


[CTMS03] Joel Carranza, Christian Theobalt, Marcus A. Magnor, and Hans-Peter Seidel. Free-viewpoint video of human actors. In ACM Transactions on Graphics, volume 22, pages 569–577, 2003.

[DU03] Klaus Dorfmüller-Ulhaas. Robust optical user motion tracking using a Kalman filter. In 10th ACM Symposium on Virtual Reality Software and Technology, 2003.

[HFP+00] Lorna Herda, Pascal Fua, Ralf Plänkers, Ronan Boulic, and Daniel Thalmann. Skeleton-based motion capture for robust reconstruction of human motion. In Proc. Computer Animation, 2000.

[OBBH00] James F. O’Brien, Robert E. Bodenheimer, Gabriel J. Brostow, and Jessica K. Hodgins. Automatic joint parameter estimation from magnetic motion capture data. In Graphics Interface, 2000.

[RL02] Maurice Ringer and Joan Lasenby. A procedure for automatically estimating model parameters in optical motion capture. In British Machine Vision Conference, pages 747–756, 2002.

[SLH91] Guy L. Scott and H. Christopher Longuet-Higgins. An algorithm for associating the features of two images. In Proc. R. Soc. London, volume 244, pages 21–26, 1991.

[SPB+98] Marius-Calin Silaghi, Ralf Plänkers, Ronan Boulic, Pascal Fua, and Daniel Thalmann. Local and global skeleton fitting techniques for optical motion capture. Lecture Notes in Computer Science, 1537:26–40, 1998.

[TGB99] Deepak Tolani, Ambarish Goswami, and Norman I. Badler. Real-time inverse kinematics techniques for anthropomorphic limbs. Graphical Models, (62):353–388, 1999.

[Vic] Vicon iQ, Vicon Motion System Ltd, http://www.vicon.com/.

[vLvR03] Robert van Liere and Arjen van Rhijn. Search space reduction in optical tracking. In A. Kunz and J. Deisinger, editors, Ninth Eurographics Workshop on Virtual Environments, number 9, 2003.

[ZH03] Victor B. Zordan and Nicholas C. Van Der Horst. Mapping optical motion capture data to skeletal motion using a physical model. In D. Breen and M. Lin, editors,