Adjustment and Interface Movement Kernels

Three kernels are responsible for the streaming adjustments (source adjustments, obstacle adjustments and interface adjustments) and four kernels are responsible for interface movement (convert full interface cells, convert empty interface cells, update interface and finalise interface). There are two important differences between these kernels and the streaming and collide kernels: these kernels operate on a small percentage of the entire lattice, and these kernels are unavoidably divergent. With the exception of the source adjustment kernel, the cells on which these kernels operate depend on the shapes of both the fluid and the obstacles in the scene. The fluid shape is dynamic and changes throughout the simulation, and the positions of obstacles depend on the layout of the scene. As a result, these cells will seldom align themselves with warps, which leads to inefficient memory access and to threads within the same warp following different code branches. Therefore, we expect these kernels to make inefficient use of GPU resources. In this section we discuss the behaviour of these kernels with regard to their inefficiencies.

6.2.1 Source Adjustments

The source adjustment kernel is responsible for maintaining a fluid interface around all source cells, and for adjusting the distribution functions of the source cells to create the effect of fluid flowing into the scene. This kernel only operates on source cells and their neighbours and does not need to be executed if source cells are not present in the simulation. Usually there will only be a few source cells (< 1% of all cells, if they are present) and these cells will be clustered together. Some coalescing is possible if the source is parallel to the x-axis, but this will not always be the case. More significant is the fact that, for larger scenes, fewer than 1% of all warps are required to perform source adjustments. This means that the cost of resetting distribution functions and maintaining a fluid interface around the source cells will be dwarfed by the cost of reading the cell type of all cells in the scene, a cost that will be constant throughout the simulation. We thus expect consistent performance from the source adjustments throughout the simulation, almost equivalent to performing a single global read for each cell in the simulation lattice.

6.2.2 Obstacle Adjustments

The obstacle adjustments operate on obstacle-fluid boundaries. In a typical large scene, boundary cells around the edges of the scene amount to 3–5% of all lattice cells. While the boundaries on the xy and xz planes align perfectly with warps, the boundaries on the yz planes are particularly inefficient, with only a single thread per warp performing changes to the scene. We can calculate the percentage of warps performing obstacle adjustments on the scene boundaries for a lattice with dimensions (x, y, z) as follows:

$$1 - \frac{\text{number of non-obstacle warps}}{\text{total number of warps}} = 1 - \frac{(w-2)(y-2)(z-2)}{wyz}$$

where $w = x/32$ is the number of warps along the x-dimension of the lattice. For large scenes this results in between 20% and 40% of all warps being tied up performing obstacle adjustments. This percentage is lower for larger scenes with fewer obstacles and higher for smaller scenes and more obstacles.
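As a quick check of the boundary-warp expression above, the following minimal sketch evaluates it for a given lattice size. The function name `boundary_warp_fraction` and the requirement that x be a multiple of the warp size are assumptions made for illustration. It counts only the outer scene boundaries and no obstacles inside the scene.

```python
def boundary_warp_fraction(x, y, z, warp_size=32):
    """Fraction of warps that touch the outer boundary of an x*y*z lattice.

    Assumes one thread per cell, with consecutive threads of a warp
    mapped to consecutive cells along the x-dimension.
    """
    if x % warp_size != 0:
        raise ValueError("this sketch assumes x is a multiple of warp_size")
    w = x // warp_size                      # warps along the x-dimension
    total_warps = w * y * z
    # Warps that contain no boundary cell; clamp so tiny lattices give 0.
    interior_warps = max(w - 2, 0) * max(y - 2, 0) * max(z - 2, 0)
    return 1 - interior_warps / total_warps


if __name__ == "__main__":
    for n in (192, 256, 320):
        print(n, boundary_warp_fraction(n, n, n))
```

For a hypothetical 256 × 256 × 256 lattice this evaluates to roughly 26%, which falls inside the range quoted above. Obstacles inside the scene would push the figure higher.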

The operations required also result in varied performance. For each obstacle cell, an extra 18 global memory reads are required (one for each neighbour) to decide if there are any neighbouring fluid cells that require adjustments. Then for each of those 18 neighbours that is a fluid cell, an extra two global memory operations are required. This means that the performance of this kernel is dependent on the fluid-obstacle surface area at each time step. Since the shape of the fluid is dynamic, the performance of the obstacle adjustments will vary slightly between time steps.
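The memory cost described above can be tallied with a small host-side model. This is a sketch under stated assumptions, not the kernel itself. It assumes the 18 neighbours are the D3Q19 directions (6 axis-aligned and 12 edge-diagonal offsets). The cell-type labels and the function name `extra_global_ops` are hypothetical. For simplicity it charges 18 neighbour reads to every obstacle cell, including those on the lattice edge.

```python
from itertools import product

# 18 neighbour offsets: every (dx, dy, dz) in {-1, 0, 1}^3 with one or two
# non-zero components (assumed here to match the D3Q19 neighbourhood).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3)
           if 1 <= sum(map(abs, d)) <= 2]

FLUID, OBSTACLE, EMPTY = "F", "O", "E"   # hypothetical cell-type labels


def extra_global_ops(cells):
    """Estimate the extra global memory operations for one time step.

    cells: dict mapping (x, y, z) -> cell type. Positions missing from
    the dict are treated as lying outside the lattice.
    """
    ops = 0
    for (x, y, z), kind in cells.items():
        if kind != OBSTACLE:
            continue
        ops += len(OFFSETS)                # one read per neighbour
        for dx, dy, dz in OFFSETS:
            if cells.get((x + dx, y + dy, z + dz)) == FLUID:
                ops += 2                   # two operations per fluid neighbour
    return ops
```

Because the second term grows with the number of obstacle-fluid neighbour pairs, the total changes whenever the fluid moves along an obstacle, which is the source of the variation between time steps.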

6.2.3 Interface Adjustments and Interface Movement

The interface adjustment kernel is responsible for managing the flow of mass throughout the lattice as a result of the streaming distribution functions, and the interface movement kernels are responsible for controlling the movement of the fluid interface as a result of changes in mass at the fluid interface. This means the interface adjustment kernel and interface movement kernels operate only on the fluid interface cells and have similar inefficiencies to each other.

For larger, turbulent simulations (i.e. simulations with larger fluid surface areas) the fluid interface is usually represented by fewer than 3% of all lattice cells and these interface cells intersect with approximately 30% of all running warps. In scenes with little turbulence or scenes in which turbulence subsides, the fluid interface can be represented by fewer than 1.5% of all lattice cells, which intersect with fewer than 20% of all warps. These proportions vary throughout the simulation as the fluid interface changes shape, causing different levels of warp-efficiency as the simulation progresses. The intersection of interface cells with warps is important, because warps that do not intersect with interface cells do not need to perform any calculations or memory operations and complete instantly, whereas those that do intersect with interface cells have a number of inefficient memory accesses and calculations to perform.

The fact that the interface cells are unlikely to align themselves with warps for efficient memory accesses is particularly problematic for the interface adjustment and update interface kernels. On average, each warp of these kernels that intersects with the fluid interface has only three threads performing operations, which means that the memory bandwidth of the remaining 29 threads, on average, is wasted. Compounding the issue, these kernels need to perform numerous global memory accesses to obtain information from neighbouring cells. Therefore, these kernels are expected to perform poorly relative to the other kernels.
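The two quantities used in this discussion, the share of warps that intersect the interface and the average number of working threads in those warps, can be measured from a cell mask with a sketch such as the one below. The function name `warp_stats` and the flat, x-fastest cell ordering are assumptions for illustration. The sketch also assumes one thread per cell.

```python
def warp_stats(active, warp_size=32):
    """Summarise how a set of active cells maps onto warps.

    active: flat sequence of booleans, one per cell in thread order
            (True marks a cell the kernel must process).
    Returns (fraction of warps with at least one active cell,
             mean number of active threads in those warps).
    """
    if not active:
        raise ValueError("active must contain at least one cell")
    # Count the active threads in each consecutive group of warp_size cells.
    counts = [sum(active[i:i + warp_size])
              for i in range(0, len(active), warp_size)]
    hit = [c for c in counts if c > 0]
    fraction = len(hit) / len(counts)
    mean_active = sum(hit) / len(hit) if hit else 0.0
    return fraction, mean_active
```

With the averages quoted above, a mean of three active threads per intersecting warp leaves `warp_size - mean_active`, that is 29 threads, idle in each such warp.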

-∼-

With the limitations of the kernels discussed above in mind, we are now able to look at how the kernels perform together when producing a full fluid simulation.

In document Lattice Boltzmann Liquid Simulations on Graphics Hardware (Page 85-87)