GPU Implementation Details - Efficient and high quality clustering

In this section we describe GPU-specific implementation details for the algorithms proposed in Sections 6.2 and 6.3. As for meshes, we aim to perform the complete clustering on the GPU, reducing data transfer between CPU and GPU to a minimum.

With small changes, the techniques for GPU-based mesh clustering carry over to GPU-based data clustering. Most of the implementation details described in Section 5.2 apply equally to data clustering; readers who are not familiar with them are referred to Section 5.2 first. Here we address only the new elements that are specific to data clustering.

6.4.1 Data Representation and Processing

As for meshes, both per-point and per-cluster processing are applied on the GPU. Correspondingly, 2D 32-bit textures are used to store the associated PointData and ClusterData.

Given a d-dimensional data set with m points, we save the input data set in the image-based pfs format [MKMS07], in order to use the same input format as for meshes. The number of channels in the pfs file corresponds to the number of dimensions d of the data set. No additional information is saved in the pfs data file.

In contrast to meshes, which are 3D, the data usually tend to be high dimensional, i.e. d > 4. To cope with this, the Multiple Render Targets (MRT) mechanism is used to store and process any point or cluster data. Current hardware supports up to 8 RGBA targets. Thus, theoretically up to 32-dimensional data can be stored and processed using ceil(d/4) render targets. Since each cluster needs to store 2d neighbors, which must fit into the 32 channels available per texel, the current implementation can only process data with up to 16 dimensions.

However, this limit applies only to the current implementation. If the neighbors of each cluster are saved in two texels instead of one, i.e. by doubling the size of the NeighborsInfo texture, then up to 32 dimensions can be supported. The same technique can be used to deal with an even higher number of dimensions.
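The channel arithmetic behind these limits can be checked with a short Python sketch. It is not part of the implementation described here: the helper names and the constants CHANNELS_PER_TARGET and MAX_RENDER_TARGETS are hypothetical and only encode the figures quoted above (4 channels per RGBA target, 8 targets). The sketch further assumes that the 2d neighbor indices of a cluster are stored one per channel.

```python
import math

CHANNELS_PER_TARGET = 4  # one RGBA target contributes four values per texel
MAX_RENDER_TARGETS = 8   # hardware limit quoted in the text


def render_targets_needed(d: int) -> int:
    """Render targets required to hold one d-dimensional point: ceil(d / 4)."""
    return math.ceil(d / CHANNELS_PER_TARGET)


def max_dimensions(neighbor_texels: int = 1) -> int:
    """Largest d supported when the 2d neighbor indices use `neighbor_texels` texels.

    Assumption of this sketch: the point data itself stays within one texel,
    so d can never exceed the 32 channels of that texel.
    """
    channels_per_texel = CHANNELS_PER_TARGET * MAX_RENDER_TARGETS  # 32
    data_limit = channels_per_texel                                # d <= 32
    neighbor_limit = (neighbor_texels * channels_per_texel) // 2   # 2d <= capacity
    return min(data_limit, neighbor_limit)


print(render_targets_needed(10))  # 3 render targets for 10-dimensional data
print(max_dimensions(1))          # 16, the limit of the current implementation
print(max_dimensions(2))          # 32, with a doubled NeighborsInfo texture
```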

Regardless of whether a data point is clustered or not, each data entity belongs to a specific cluster with index IDcl. In the beginning all data belong to the “null” cluster, i.e. they have IDcl = −1. This information is saved in a PointInfo texture, where each texel corresponds to a data entity in the original data set. As for meshes, reassigning a data point from a cluster m, i.e. with IDcl = m, to a cluster n means that its index changes to IDcl = n, see Figure 5.8 on page 103.

6.4.2 Initial Clustering Configuration

Both the Local Neighbors k-means and the Multilevel algorithms proposed in Sections 6.2 and 6.3 start with an initial configuration as described in Section 6.2.

Given a step size a, the bounding box spanned by the data set is subdivided into l = a^d voxels. Each voxel corresponds to a starting cluster. Thus the cluster texture size is set to clusterTexSize = ceil(sqrt(l)). From the bounding box limits for each dimension k, the voxel size voxelSize_k for that dimension is computed as (limitMax_k − limitMin_k)/a.
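As a worked example of these quantities, the following Python sketch computes l, clusterTexSize and the per-dimension voxel sizes on the CPU. The function name grid_setup and its argument names are hypothetical and chosen for illustration only; the actual setup code of the framework is not shown in this section.

```python
import math


def grid_setup(limit_min, limit_max, a):
    """Return (l, clusterTexSize, voxelSize) for a grid with `a` steps per dimension."""
    d = len(limit_min)
    num_voxels = a ** d                                # l = a^d starting clusters
    cluster_tex_size = math.isqrt(num_voxels - 1) + 1  # exact ceil(sqrt(l)) for l >= 1
    voxel_size = [(hi - lo) / a for lo, hi in zip(limit_min, limit_max)]
    return num_voxels, cluster_tex_size, voxel_size


l, tex_size, voxel_size = grid_setup([0.0, 0.0, 0.0], [3.0, 6.0, 9.0], a=3)
print(l, tex_size, voxel_size)  # 27 6 [1.0, 2.0, 3.0]
```

The integer square root is used instead of a floating-point sqrt so that the ceiling stays exact for large l.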

However, since the subdivisions are done in a d-dimensional space but we work on 2D textures, we need a mapping from the subdivided d-dimensional space to the 2D texture space. This is achieved using the following formula:

\[ \mathrm{ID}_{cl} = \sum_{k=0}^{d-1} \mathrm{base}^{k} \, m_k \tag{6.2} \]

where base is the basis in which the mapping is done and the m_k are the corresponding numerals for each dimension, with values between 0 and base − 1. As an example, for ClusterData textures we have base = clusterTexSize and the numerals m_k are exactly the texture coordinates.

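If the cluster index IDcl is known, the numerals m_k are obtained as:

\[ m_k = \left\lfloor \frac{\mathrm{ID}_{cl}}{\mathrm{base}^{k}} \right\rfloor \bmod \mathrm{base} \tag{6.3} \]

for k between 0 and d − 1, where the division is an integer division and mod (%) denotes the modulo operation.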

If we want to map from one basis to another, the cluster index IDcl is computed first in one basis using Eq. (6.2) and then mapped to the other one using the numerals computed according to Eq. (6.3).
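Eqs. (6.2) and (6.3) amount to writing and reading a number in a positional system with d digits. The Python sketch below illustrates both directions and the change of basis; the function names are hypothetical, and the example values (a = 3, clusterTexSize = 6) are chosen only for illustration.

```python
def index_from_numerals(numerals, base):
    """Eq. (6.2): ID = sum over k of base^k * m_k."""
    return sum(m * base ** k for k, m in enumerate(numerals))


def numerals_from_index(index, base, d):
    """Eq. (6.3): m_k = (ID // base^k) % base for k = 0 .. d-1."""
    return [(index // base ** k) % base for k in range(d)]


# Voxel numerals (1, 2, 0) in a grid with a = 3 steps per dimension.
cluster_id = index_from_numerals([1, 2, 0], base=3)
print(cluster_id)                                    # 7
print(numerals_from_index(cluster_id, base=3, d=3))  # [1, 2, 0]

# The same index expressed as 2D texture coordinates, base = clusterTexSize = 6.
print(numerals_from_index(cluster_id, base=6, d=2))  # [1, 1]
```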

Having a valid initialization means that each d-dimensional data point has been assigned a correct cluster index IDcl and that each cluster with index IDcl has 2d valid local neighbors.

This is done as follows:

1. Assigning the d-dimensional data points to the correct starting cluster IDcl is done using a fragment shader. For each fragment, which corresponds to a texel in the PointInfo texture, an IDcl is computed according to Eq. (6.2) with base = a and numerals m_k = Point[k]/voxelSize_k, where Point[k] is the k-th coordinate of the point and voxelSize_k is the voxel size for the k-th dimension.

2. Initially, the neighbors are assigned according to the voxel-based spatial segmentation, see Section 6.2 and Figure 6.5 on page 125. For a given cluster with index IDcl the numerals m_k in basis a can be computed according to Eq. (6.3), see the getNumerals(int forID) subroutine in Algorithm B.1 in Appendix B. These numerals are used to identify the indices ID_cl^∗ of the neighboring clusters; Figure 6.7 depicts a 2D example. By increasing and, correspondingly, decreasing the value m_k by one for each dimension (see the getNeighbors(int forDim) subroutine in Algorithm B.1 in Appendix B), the indices of different clusters can be computed according to Eq. (6.2). These are assigned as the indices of the neighboring clusters.

Figure 6.7: A 2D example for identifying the indices of the neighboring clusters. For a given cluster with index IDcl and corresponding numerals (m^cl₀, m^cl₁), the indices of the four neighbors (indicated by arrows) are computed according to Eq. (6.2) using the four corresponding pairs of numerals (m^cl₀ + 1, m^cl₁), (m^cl₀ − 1, m^cl₁), (m^cl₀, m^cl₁ + 1) and (m^cl₀, m^cl₁ − 1).
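The two initialization steps can be illustrated with the following CPU-side Python sketch. The functions get_numerals and get_neighbors are named after the subroutines of Algorithm B.1 but are not the code from Appendix B, which is not reproduced here. Three details are assumptions of this sketch and are not stated in the text: the coordinates are shifted by limitMin_k before the division, points on the upper boundary are clamped to the last voxel, and clusters at the grid border mark a missing neighbor with −1.

```python
def get_numerals(for_id, a, d):
    """Eq. (6.3) with base = a."""
    return [(for_id // a ** k) % a for k in range(d)]


def get_id(numerals, a):
    """Eq. (6.2) with base = a."""
    return sum(m * a ** k for k, m in enumerate(numerals))


def assign_point(point, limit_min, voxel_size, a):
    """Step 1: index of the starting cluster (voxel) that contains `point`."""
    numerals = []
    for p, lo, size in zip(point, limit_min, voxel_size):
        m = int((p - lo) / size)                # assumption: shift by limitMin_k
        numerals.append(min(max(m, 0), a - 1))  # assumption: clamp boundary points
    return get_id(numerals, a)


def get_neighbors(for_id, a, d, missing=-1):
    """Step 2: indices of the 2d axis neighbors; `missing` marks the grid border."""
    numerals = get_numerals(for_id, a, d)
    neighbors = []
    for k in range(d):
        for step in (-1, 1):
            shifted = list(numerals)
            shifted[k] += step                  # decrease / increase m_k by one
            inside = 0 <= shifted[k] < a
            neighbors.append(get_id(shifted, a) if inside else missing)
    return neighbors


print(assign_point([2.5, 0.5], [0.0, 0.0], [1.0, 1.0], a=3))  # 2
print(get_neighbors(4, a=3, d=2))                             # [3, 5, 1, 7]
print(get_neighbors(0, a=3, d=2))                             # [-1, 1, -1, 3]
```

In the 2D case with a = 3, cluster 4 has the numerals (1, 1), so its four neighbors are exactly the clusters reached by the four arrows of Figure 6.7.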

6.4.3 Data Clustering

Local Neighbors k-means:

The approach, see Section 6.2, is implemented similarly to the Boundary-based algorithm described in Section 5.2.3. The PointInfo texture is rasterized and for each fragment, which corresponds to a given point Qj, the closest cluster is identified according to Algorithm 6.1. Fragments of points that remain assigned to the same cluster are simply discarded, so an occlusion query can be used to count the number of written fragments, i.e. the number of reassigned points. If all fragments are discarded the algorithm stops, meaning that no point can be reassigned to a neighboring cluster such that the energy decreases.
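The control flow of this loop can be sketched on the CPU as follows. This is not Algorithm 6.1, which is not reproduced in this section. The sketch assumes that the energy of a point with respect to a cluster is the squared Euclidean distance to the cluster centroid, that every point already carries a valid (non-null) cluster index, and that a missing neighbor is marked with −1. All function names are hypothetical. The counter `written` plays the role of the occlusion query result.

```python
def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def compute_centroids(points, ids, num_clusters):
    """Mean of the points of each cluster; None for empty clusters."""
    d = len(points[0])
    sums = [[0.0] * d for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for p, c in zip(points, ids):
        counts[c] += 1
        for k in range(d):
            sums[c][k] += p[k]
    return [[s / n for s in row] if n else None for row, n in zip(sums, counts)]


def reassign(points, ids, centroids, neighbors):
    """One pass over all points; returns (new_ids, written)."""
    new_ids, written = list(ids), 0
    for j, p in enumerate(points):
        best, best_energy = ids[j], squared_distance(p, centroids[ids[j]])
        for n in neighbors[ids[j]]:            # only the local neighbors are tested
            if n < 0 or centroids[n] is None:
                continue
            energy = squared_distance(p, centroids[n])
            if energy < best_energy:
                best, best_energy = n, energy
        if best != ids[j]:                     # unchanged points = discarded fragments
            new_ids[j] = best
            written += 1
    return new_ids, written


def local_neighbors_kmeans(points, ids, neighbors, num_clusters):
    while True:
        centroids = compute_centroids(points, ids, num_clusters)
        ids, written = reassign(points, ids, centroids, neighbors)
        if written == 0:                       # all fragments discarded: stop
            return ids


points = [[0.0], [1.0], [1.8], [2.0]]
neighbors = [[-1, 1], [0, -1]]                 # 1D grid with two clusters
print(local_neighbors_kmeans(points, [0, 0, 0, 1], neighbors, num_clusters=2))
# [0, 0, 1, 1]
```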

Multilevel clustering:

The workflow of the ML data clustering is the same as for mesh clustering; for additional details refer to Section 5.2.4. Here, the NeighborsInfo texture is used instead of the FaceInfo texture. In the NeighborsInfo texture each cluster C_i has a reference to its neighbors N_k^i. Thus, for each of these neighbors a DE can be computed, resulting in at most 2d DEs.

6.4.4 Brute-force K-Means on the GPU

In order to compare the newly proposed Local Neighbors k-means, Section 6.2, with the classical brute-force k-means algorithm, see Section 2.2.1, we integrated the latter into our framework.

This can be achieved very easily. As described in Section 5.2.3, the PointInfo texture is used to update the cluster centroids, i.e. by applying the GatherClusterData() and ComputeClusterProxy() subroutines.

To perform the optimization step we load the current centroid and the ID of each cluster into a vertex stream. In the geometry shader we generate for each vertex, i.e. cluster, a quad that covers the complete PointCoordinate texture. For each generated fragment the “point to cluster energy” is computed. Using the computed energy as fragment depth and performing the depth test, one obtains for each point in the PointInfo texture the ID of the closest cluster, i.e. a new PointInfo texture. This process, i.e. computing the cluster centroids and reassigning the points to the closest clusters, is repeated until no points are reassigned to other clusters.
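The depth-test trick can be emulated on the CPU to make the data flow explicit. In the Python sketch below, which is illustrative only, the outer loop stands for the quads generated per cluster, the inner loop for the fragments of one quad, and the comparison against the stored depth for a depth test that passes on "less". The energy is assumed to be the squared Euclidean distance to the centroid, and the function name brute_force_assign is hypothetical.

```python
import math


def brute_force_assign(points, centroids):
    """Closest centroid per point, found via an emulated depth test."""
    depth = [math.inf] * len(points)  # cleared depth buffer, one entry per point
    point_info = [-1] * len(points)   # cleared PointInfo, -1 is the "null" cluster
    for cluster_id, centroid in enumerate(centroids):  # one quad per cluster
        for j, p in enumerate(points):                 # one fragment per point
            energy = sum((x - c) ** 2 for x, c in zip(p, centroid))
            if energy < depth[j]:                      # depth test "less" passes
                depth[j] = energy
                point_info[j] = cluster_id
    return point_info


points = [[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]]
print(brute_force_assign(points, [[0.0, 0.0], [10.0, 0.0]]))  # [0, 0, 1, 1]
```

Unlike the Local Neighbors variant, every point is tested against every cluster, which is what makes this the brute-force baseline.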
