2.3 Collision Computation and Force Rendering
2.3.2 Collision Computation
2.3.2.5 Acceleration Strategies
Due to its complexity, collision computation tends to become a bottleneck in the simulation pipeline. Several methods have been proposed to speed it up, and this section collects the most important ones.
Bounding Volume Hierarchies (BVH)
Bounding Volume Hierarchies (BVH) consist of BV nodes arranged in trees, as illustrated in Figure 2.7. Each node encapsulates a part of the object, and its size depends on the level of the tree where it is located: the root node bounds the whole object, whereas nodes close to the leaf level contain smaller object parts or limited sets of primitive features. The nodes are connected by links that establish parent and child relationships, so that each parent node recursively bounds all object parts contained by its children.
[Figure 2.7 panels: Bounding Volume Hierarchies (BVH) with levels I (root), II, and III (leaves), nodes 1 to 7 and A to G; Bounding Volume Test Tree (BVTT) with node pairs A1 (level I), B2, B3, C2, C3 (level II), and D4 to G7 (level III), traversed depth-first and breadth-first]
Figure 2.7: Bounding Volume Hierarchies (BVH) of the Stanford bunny and the Utah teapot; in the shown example, BVHs use spheres as node BVs and are binary trees, i. e., they have two children per node. Root nodes (level I, nodes 1 and A in red) contain all child nodes and primitive mesh parts. Leaf nodes, here in level III, point to primitive mesh parts. A BVH can be created bottom-up if leaf primitive features are clustered recursively, or top-down if the root BV is split recursively. The Bounding Volume Test Tree (BVTT) pairs the nodes of the two BVHs one by one, and it can be traversed depth-first if whole branches are evaluated for collision in a row (e. g., left to right), or breadth-first if each level is checked from root to leaf level.
Thanks to BVHs, conservative proximity and collision checks between BV nodes are performed recursively and refined progressively, which quickly identifies the regions of the underlying objects that might be close to each other or in collision. For instance, in the example of Figure 2.7, if the distance between two sphere nodes is greater than or equal to the current distance approximation, their sets of descendant sphere nodes can be culled; otherwise, the child nodes must be examined further. Once leaf nodes are reached, primitive features can be checked using the techniques explained in Section 2.3.2.2.
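The culling rule described above can be sketched as follows. This is a minimal illustration, not the implementation of any cited work: it assumes binary sphere trees whose leaf primitives are single points, and the names `SphereNode`, `leaf`, `parent`, `sphere_gap`, and `min_distance` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class SphereNode:
    center: Point
    radius: float
    children: List["SphereNode"] = field(default_factory=list)
    point: Optional[Point] = None  # set only on leaves (the primitive)


def leaf(p: Point) -> SphereNode:
    """Leaf node bounding a single point primitive (assumption of this sketch)."""
    return SphereNode(center=p, radius=0.0, point=p)


def parent(children: List[SphereNode]) -> SphereNode:
    """Bottom-up step: a sphere that conservatively encloses all child spheres."""
    n = len(children)
    center = tuple(sum(c.center[i] for c in children) / n for i in range(3))
    radius = max(math.dist(center, c.center) + c.radius for c in children)
    return SphereNode(center=center, radius=radius, children=children)


def sphere_gap(a: SphereNode, b: SphereNode) -> float:
    """Lower bound on the distance between anything bounded by a and by b."""
    return max(0.0, math.dist(a.center, b.center) - a.radius - b.radius)


def min_distance(a: SphereNode, b: SphereNode, best: float = math.inf) -> float:
    """Depth-first distance query; `best` is the current distance approximation."""
    if sphere_gap(a, b) >= best:
        return best  # cull: no descendant pair can improve the approximation
    if a.point is not None and b.point is not None:
        return min(best, math.dist(a.point, b.point))  # primitive check
    # Descend into the non-leaf node (the larger one if both are inner nodes).
    if a.point is None and (b.point is not None or a.radius >= b.radius):
        for child in a.children:
            best = min_distance(child, b, best)
    else:
        for child in b.children:
            best = min_distance(a, child, best)
    return best
```

Because every parent sphere encloses its children, `sphere_gap` never overestimates the true distance, so culling on it is conservative.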
BVHs have several relevant features:
• The shape of the nodes can be any of the BVs introduced in Section 2.3.1, such as spheres [Qui94], [Hub96], swept spheres or capsules [LGLM99], AABBs [CLMP95], [vdB97], OBBs [GLM96], k-DOPs [KHM+98], [Zac98], and CHs [GME+00]. As mentioned, a trade-off between tightness of fit and testing efficiency must be found, with the simplest BVs usually being the preferred ones.
• A defining property of the tree is its degree or branching factor, i. e., the number of child nodes each parent node has; this value influences the height of the hierarchy, i. e., how many levels it has from the root node downwards, and how many nodes of a given resolution each level has. Optimum values for the degree have been discussed [KHM+98], with binary trees, i. e., trees with degree 2, probably being the most common ones.
• The building approach can be bottom-up or top-down, and it can affect the distribution of the nodes. The former starts with the leaves or primitives and builds upwards by grouping nodes that are close to each other, taking advantage of any local information. The latter starts with the root node, which encapsulates the whole object and is then recursively split into smaller nodes; typical splitting heuristics aim at an even or isotropic distribution of volume or primitives along directions aligned with those obtained from principal component analysis (PCA). It is worth mentioning that, whereas each node's BV can fully contain the BVs of all its children together with the leaf primitives, in wrapped hierarchies [WZ09b] each BV contains only the leaf primitives below it, without conservatively enclosing the BVs of its child nodes. Thus, tighter representations are achieved.
• A fundamental characteristic is how the runtime traversal is done. Commonly, the Bounding Volume Test Tree (BVTT) is defined as the tree whose nodes are pairs of nodes from the two tested BVHs. The BVTT has the same number of levels as the BVHs, but the number of nodes in each level is the product of the numbers of nodes of the two BVHs in that level. Checking two BVHs implies traversing the BVTT, and the most common approaches are depth-first and breadth-first. The former starts at the root and explores one branch after the other, each of them as deep as possible. The latter also starts at the root but visits all the nodes in a level before moving to the next level. While the first focuses on specific regions at a time, the second gradually refines the sampling of the object, reaching the leaf nodes only at the end.
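The two traversal orders of the BVTT can be illustrated with the node labels of Figure 2.7. The following is a sketch under simplifying assumptions: both BVHs are given only as child tables (`KIDS_A`, `KIDS_B`), they have the same height, and the overlap test is passed in as a hypothetical predicate `overlaps`; none of these names come from the cited works.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]

# Child tables of the two binary BVHs of Figure 2.7 (leaves have no entry).
KIDS_A: Dict[str, List[str]] = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
KIDS_B: Dict[str, List[str]] = {"1": ["2", "3"], "2": ["4", "5"], "3": ["6", "7"]}


def expand(pair: Pair, kids_a: Dict[str, List[str]],
           kids_b: Dict[str, List[str]]) -> List[Pair]:
    """Children of a BVTT node: every child of one node paired with every child of the other."""
    a, b = pair
    return [(ca, cb) for ca in kids_a.get(a, []) for cb in kids_b.get(b, [])]


def traverse(root: Pair, kids_a: Dict[str, List[str]], kids_b: Dict[str, List[str]],
             overlaps: Callable[[Pair], bool], depth_first: bool) -> List[Pair]:
    """Return the BVTT nodes in the order in which they are tested."""
    order: List[Pair] = []
    pending = deque([root])
    while pending:
        # A stack gives depth-first order, a FIFO queue gives breadth-first order.
        pair = pending.pop() if depth_first else pending.popleft()
        order.append(pair)
        if not overlaps(pair):
            continue  # culled: the descendants of this pair are never tested
        children = expand(pair, kids_a, kids_b)
        # Reverse for the stack so that the leftmost child is tested first.
        pending.extend(reversed(children) if depth_first else children)
    return order


if __name__ == "__main__":
    always = lambda pair: True  # worst case: nothing is culled
    print(traverse(("A", "1"), KIDS_A, KIDS_B, always, depth_first=True)[:6])
    print(traverse(("A", "1"), KIDS_A, KIDS_B, always, depth_first=False)[:5])
```

In the worst case, where no pair is culled, both orders visit the same 1 + 4 + 16 = 21 BVTT nodes and differ only in the sequence.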
The cost of computing collisions with BVHs (i. e., the cost of traversing the BVTT) has been defined to be [GLM96], [KHM+98]:
\[
T_{\mathrm{BVTT}} = \underbrace{N_v C_v}_{\text{BV check}} + \underbrace{N_p C_p}_{\text{primitive check}} + \underbrace{N_u C_u}_{\text{BV update}} \tag{2.7}
\]
with $N$ being the number and $C$ the unit cost of each type of check or update. Clearly, the terms are in conflict; for instance, a tight BV decreases $N_v$ and $N_p$, but it increases $C_v$ and potentially $C_u$. In general, an appropriate number of BV checks ($N_v$) can significantly decrease the number of more costly primitive checks ($N_p$). However, worst-case scenarios in which most of the surface is checked for proximity or collision lead to higher computation times than checking the primitive features alone.
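The trade-off in Equation (2.7) can be made concrete with a small calculation. The counts and unit costs below are invented purely for illustration (they are not measurements from the cited works), and `bvtt_cost` is a hypothetical helper name.

```python
def bvtt_cost(n_v: int, c_v: float, n_p: int, c_p: float,
              n_u: int, c_u: float) -> float:
    """Evaluate Equation (2.7): BV checks + primitive checks + BV updates."""
    return n_v * c_v + n_p * c_p + n_u * c_u


# Hypothetical numbers: a loose but cheap BV (e.g., spheres) needs many checks,
# whereas a tight but expensive BV (e.g., OBBs) needs fewer, costlier ones.
loose = bvtt_cost(n_v=1000, c_v=1, n_p=200, c_p=20, n_u=0, c_u=0)
tight = bvtt_cost(n_v=400, c_v=5, n_p=50, c_p=20, n_u=100, c_u=2)

print(loose)  # 1000*1 + 200*20 + 0*0   = 5000
print(tight)  # 400*5  + 50*20  + 100*2 = 3200
```

With these assumed values the tighter BV wins, but raising its unit costs or update count only moderately would reverse the outcome, which is the conflict between the terms noted above.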
BVHs make Level-of-Detail (LoD) or multi-resolution checks possible, especially when breadth-first traversal is employed. If a simplification or a sample of the underlying primitive features is stored for each node (e. g., a simplified mesh or a set of points), each hierarchy level represents an approximation of the original object at a different resolution. Then, nodes can be refined further if the details of the children are perceptible [OL05], or the query can degrade gracefully [BJ08] in time-critical situations. This strategy requires refining the collision output at every node check.
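A time-critical, level-by-level query of this kind might look as follows. This is only a sketch of the general idea, not the method of [OL05] or [BJ08]: the budget is counted in node checks rather than in time, a level is only started if it can be completed, and `LodNode`, `time_critical_query`, and the predicate `overlaps` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LodNode:
    name: str
    children: List["LodNode"] = field(default_factory=list)


def time_critical_query(root: LodNode, overlaps: Callable[[LodNode], bool],
                        budget: int) -> Tuple[List[LodNode], int, int]:
    """Breadth-first refinement that stops when the check budget is exhausted.

    Returns the colliding nodes of the deepest completed level, the number of
    completed levels, and the number of node checks that were spent.
    """
    level = [root]
    hits: List[LodNode] = []
    completed_levels = 0
    checks = 0
    # Only start a level if it can be finished, so the output always
    # corresponds to one consistent resolution of the object.
    while level and checks + len(level) <= budget:
        hits = [node for node in level if overlaps(node)]
        checks += len(level)
        completed_levels += 1
        level = [child for node in hits for child in node.children]
    return hits, completed_levels, checks
```

With a generous budget the query reaches the leaves; with a small one it returns the coarser answer of an upper level, which is the graceful degradation mentioned above.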
Spatio-Temporal Coherence
At sampling frequencies of 1 kHz, the configuration of the objects changes minimally from frame to frame; in other words, objects that are far away will remain so, and regions in collision will probably still collide at successive time stamps. Even if it is not an acceleration technique in itself, this notion of spatio-temporal coherence is at least a fundamental assumption with strong practical applications:
• Queues of object, node, or primitive pairs to be checked can be prioritized or ordered according to their expected time to collision [LGLM99], [Mir00], [MPT06]. In the absence of this time value, it can be approximated from the current distance to collision by assuming an upper bound on the velocity. If that time is longer than the cycle time itself, the pair can be safely neglected; when the expected times are correctly updated, many superfluous checks are avoided.
• Previously colliding objects, nodes, or primitive features can be the seed or starting point for each computation cycle. In this way, the LC algorithm (see Section 2.3.2.2) is able to achieve close-to-constant computation times [LC91]. Similarly, front tracking [EL01] techniques have been used for BVHs; these basically consist in starting the traversal at the lower-level nodes that were colliding in the previous cycle, instead of at the root node itself.
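The first idea above, ordering pairs by a conservative time to collision and neglecting those that cannot collide within the current cycle, can be sketched as follows. The function names, the pair representation as `(name, distance)` tuples, and the numeric values are assumptions made for this example.

```python
import heapq
import math
from typing import Iterable, List, Tuple


def expected_time_to_collision(distance: float, max_rel_speed: float) -> float:
    """Lower bound on the time to collision, given an upper bound on the speed."""
    if max_rel_speed <= 0.0:
        return math.inf  # the pair cannot approach each other
    return max(0.0, distance) / max_rel_speed


def schedule_pairs(pairs: Iterable[Tuple[str, float]], max_rel_speed: float,
                   cycle_time: float) -> List[str]:
    """Return the pairs to check in this cycle, most urgent first."""
    heap: List[Tuple[float, str]] = []
    for name, distance in pairs:
        t = expected_time_to_collision(distance, max_rel_speed)
        if t > cycle_time:
            continue  # cannot collide before the next cycle: safely neglected
        heapq.heappush(heap, (t, name))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


if __name__ == "__main__":
    # 1 kHz loop (cycle time 1 ms) and an assumed speed bound of 1 m/s.
    pairs = [("a", 0.0005), ("b", 0.01), ("c", 0.0), ("d", 0.001)]
    print(schedule_pairs(pairs, max_rel_speed=1.0, cycle_time=0.001))
    # -> ['c', 'a', 'd']; pair "b" is 10 mm away and is skipped
```

Since the speed is an upper bound, the computed time is a lower bound on the true time to collision, so skipping a pair never misses a contact within the cycle.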
GPU versus CPU
In the past decade, a shift in the focus of processor architectures has been taking place: parallelism (achieved with Graphics Processing Units or GPUs) is increasingly sought, in contrast to the more traditional approach of raising single-core clock rates (related to Central Processing Units or CPUs). That shift should be seen in the light of the development of General Purpose GPU (GPGPU) frameworks, such as CUDA∗ or OpenCL†, which provide abstraction interfaces for leveraging the power of GPUs for computations not necessarily related to graphics.
∗https://developer.nvidia.com/cuda-zone †https://www.khronos.org/opencl/
Although the methods explained so far are in theory agnostic with respect to the architecture used, GPGPU implementations do require specific designs that take into account the particular architectural and memory hierarchy characteristics of GPUs. In general, data structures need to be built in such a way that different parts of them can be (i) concurrently loaded in relatively small memory portions (compared to CPU implementations), and (ii) independently processed by threads that run relatively simple functions, known as kernels. Therefore, the use of hierarchical representations is a challenge.
Several works have started exploiting highly parallel architectures. Given the aforementioned technological constraints, some approaches define kernel functions for each (or a selected subset) of the primitive features of the moved object; in these processes the features are tested against the environment, e. g., points against distance fields [ZLSWF13] or spheres against streamed point clouds [KWZ17]. Other works dealing with BVHs cluster queries that correspond to adjacent configurations in the same process [PM11]. In order to take advantage of the full capacity offered by GPUs, smart workload balancing techniques applied to BVHs have also been presented [LMM10]. In that contribution, tasks or instantiated kernels, which process small work units consisting of node pairs to be tested, are launched concurrently on separate cores. The result of processing each work unit may be a new set of child node pairs to be checked, which are added to the work queue of the task. Every task keeps its own work queue, which is accessible only locally, so that no synchronisation or coordination between tasks is needed. Meanwhile, the load is balanced between cores with a global counter.
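The per-feature kernel pattern (points tested against a distance field) can be illustrated without a GPU. The sketch below is a sequential Python stand-in, not GPU code and not the method of [ZLSWF13]: each call to the hypothetical `point_kernel` plays the role of one thread, the field is a flat array as it would be laid out in device memory, and the obstacle is assumed to be a sphere so that the field can be generated analytically.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def build_sphere_field(n: int, spacing: float, center: Point,
                       radius: float) -> List[float]:
    """Signed distance to a sphere, sampled on an n x n x n grid (flat array)."""
    field = [0.0] * (n * n * n)
    for k in range(n):
        for j in range(n):
            for i in range(n):
                p = (i * spacing, j * spacing, k * spacing)
                field[i + n * (j + n * k)] = math.dist(p, center) - radius
    return field


def point_kernel(tid: int, points: Sequence[Point], field: Sequence[float],
                 n: int, spacing: float) -> float:
    """Work of one 'thread': nearest-voxel lookup for point number `tid`."""
    i, j, k = (min(n - 1, max(0, round(c / spacing))) for c in points[tid])
    return field[i + n * (j + n * k)]


def launch(points: Sequence[Point], field: Sequence[float],
           n: int, spacing: float) -> List[float]:
    """Sequential stand-in for a kernel launch with one thread per point."""
    return [point_kernel(tid, points, field, n, spacing)
            for tid in range(len(points))]
```

Each point reads only its own voxel and writes only its own result, so the iterations are independent of each other, which is the property that makes this kind of test map well onto GPU threads.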
Probabilistic Approaches
Given the computational cost of collision detection, some methods opt to skip checks of features or elements with a low likelihood of overlap. As mentioned in Section 2.3.2.2, the number of axis tests has been efficiently decreased for the SAT method applied to AABBs [vdB97]. More recently, probabilistic overlap tests with BVHs, which output collision confidence values, have been used [PPM17]. For that, upper bounds of multidimensional Gaussian probabilities of collision are derived, since the distribution densities cannot be obtained in closed form and approximating them with Monte Carlo methods is computationally too expensive for real-time applications. This and similar methods could be of interest in several robotic applications.
Moreover, machine learning methods that incrementally compute contact configuration spaces by minimizing sampling have been exploited [PZM13]. This can be done by iteratively refining the resolution of the contact configuration space with support vector machines (SVMs). The support vectors are basically the samples used to determine the hyperplane that separates colliding and non-colliding regions in the high-dimensional sample space; in other words, they are the key samples that are close to the contact boundary and therefore implicitly define it. In the refining process, given a configuration that needs to be tested, the nearest sample neighbors are processed to find the closest one. This closest sample is then projected onto the contact space approximation defined by the support vectors, leading to a contact configuration.
Probabilistic approaches are not restricted to collision computation. Hou and Sourina [HS16] presented a predictive force filter for smooth haptic rendering. The computed contact forces are continuously directed to a buffer rather than to the user. In a parallel thread, linear regression is applied to the last 300 values of the buffer to predict the next force and torque vectors, which are subsequently smoothed using a B-spline function that is also aware of previous prediction values.
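The regression step of such a filter can be sketched as follows. This is not the implementation of [HS16]: it assumes an ordinary least-squares line fitted over uniformly spaced samples, handles a single scalar component (a force or torque vector would use one predictor per component), omits the B-spline smoothing and the parallel thread, and `ForcePredictor` is a hypothetical name.

```python
from collections import deque


class ForcePredictor:
    """Predicts the next sample of one force component by linear regression."""

    def __init__(self, window: int = 300) -> None:
        # Ring buffer: only the most recent `window` values are kept.
        self.buffer = deque(maxlen=window)

    def push(self, value: float) -> None:
        self.buffer.append(value)

    def predict_next(self) -> float:
        n = len(self.buffer)
        if n == 0:
            raise ValueError("no samples in the buffer")
        if n == 1:
            return self.buffer[0]
        # Least-squares line y = a + b*t over the sample indices t = 0..n-1.
        mean_t = (n - 1) / 2.0
        mean_y = sum(self.buffer) / n
        s_tt = sum((t - mean_t) ** 2 for t in range(n))
        s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(self.buffer))
        slope = s_ty / s_tt
        # Extrapolate the fitted line one step ahead, to t = n.
        return mean_y + slope * (n - mean_t)
```

For example, after pushing the values of y = 2t + 1 for t = 0, ..., 9, `predict_next()` extrapolates the line to t = 10 and returns 21 up to rounding error.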