5.2 Dynamic light curve engine
5.2.3 Efficient image area searching method
To address the drawbacks and performance issues of the recursive version of our algorithm, we developed an iterative version which is more efficient and can be better integrated into our GPU ray shooting code. It is designed with the GPU procedure in mind, as it can produce the active cells list that is required by the GPU as an input at no extra cost.
Custom queue data structure
[Figure 5.10 diagram labels: Custom Queue; Push; Pop; Index; Pop by moving index]
Figure 5.10: Custom queue data structure for storing the active cells array. These are ready to be used by the GPU without rearrangement.
We designed a custom queue data structure that serves both as the core working structure of our data processing procedure and as the storage for the results of our algorithm. When the search finishes, the contents of the queue can be copied directly to GPU memory without any extra processing. Figure 5.10 shows the custom queue data structure used in our algorithm. The queue is built on a flexible array that enlarges itself dynamically when needed. Each slot in the array refers to a C struct that contains two variables for storing coordinate information. When an active cell is pushed into the queue, its coordinates are stored in these two variables. When data is popped from the queue, an index is moved to point to the next slot instead of deleting the popped content. Therefore, the coordinates of all pushed active cells remain in the array and are ready to be used by the GPU without any rearrangement that would cause overhead.
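The queue described above could be sketched in C roughly as follows. This is a minimal illustration and not the thesis code: the names Cell, CellQueue, queue_init, queue_push, queue_pop, queue_is_empty and queue_free, the doubling growth policy, and the choice to store the structs inline in the array are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct holding the two coordinate variables of a cell. */
typedef struct {
    int x;
    int y;
} Cell;

/* Hypothetical queue: a growable array plus two indices. */
typedef struct {
    Cell  *data;      /* contiguous array of every cell ever pushed */
    size_t size;      /* push index: number of cells pushed so far  */
    size_t head;      /* pop index: next cell to be processed       */
    size_t capacity;  /* number of allocated slots                  */
} CellQueue;

/* Returns 0 on success, -1 if the allocation fails. */
int queue_init(CellQueue *q, size_t capacity) {
    if (capacity == 0) capacity = 16;
    q->data = malloc(capacity * sizeof *q->data);
    q->size = 0;
    q->head = 0;
    q->capacity = q->data ? capacity : 0;
    return q->data ? 0 : -1;
}

/* Appends a cell, doubling the array when it is full (assumed policy). */
int queue_push(CellQueue *q, int x, int y) {
    if (q->size == q->capacity) {
        size_t new_cap = q->capacity ? q->capacity * 2 : 16;
        Cell *tmp = realloc(q->data, new_cap * sizeof *tmp);
        if (!tmp) return -1;
        q->data = tmp;
        q->capacity = new_cap;
    }
    q->data[q->size].x = x;
    q->data[q->size].y = y;
    q->size++;
    return 0;
}

int queue_is_empty(const CellQueue *q) {
    return q->head == q->size;
}

/* Pop only advances the head index; the popped cell stays in the array. */
Cell queue_pop(CellQueue *q) {
    assert(!queue_is_empty(q));
    return q->data[q->head++];
}

void queue_free(CellQueue *q) {
    free(q->data);
    q->data = NULL;
    q->size = q->head = q->capacity = 0;
}
```

In this sketch, once the queue is empty, data[0] to data[size - 1] hold every cell that was ever pushed, in push order, so the array can be handed to a device copy routine as a single contiguous block.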
Active cells labeling and overlap checking
Another central component of our image area searching algorithm is the mechanism for checking already labeled active cells to avoid over-counting. Since each active cell should be used only once for ray shooting, it is important to avoid sending duplicated active cells to the GPU. This check must be efficient for the algorithm to perform well, because every grid cell needs to be checked before it is labeled as active and pushed into the queue. Moreover, a reliable and robust mechanism is essential for the image area searching algorithm to work properly. We have designed two mechanisms for this purpose.
The first mechanism uses a large piece of memory space for labeling active cells. Memory space for an array of 8-bit char values is allocated before the image searching procedure starts. Each grid cell is represented by one element of the array, which is marked 1 if the cell is active and 0 otherwise. The size of the array depends on the resolution and size of the grid. The resolution depends on the source star size, and the size of the grid is predefined. The array usually takes hundreds of megabytes to several gigabytes of memory space.
This mechanism provides very fast active cell checking and labeling to avoid over-counting. The trade-off is that it exchanges memory space for computing speed, so a large piece of memory space is required. It may seem highly inefficient to allocate such a large piece of memory space. However, on some platforms and distributions, such as BSD Unix, Ubuntu and Mac OS X, a call to calloc() allocates memory space dynamically and only allocates extra space when needed. The actual memory usage therefore grows only as the algorithm discovers more active cells.
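As an illustration of the first labeling mechanism, the following C sketch keeps one 8-bit flag per grid cell in a zero-initialised array obtained from calloc(). The names LabelMap, label_map_init, label_is_active, label_try_activate and label_map_free are hypothetical, as is the row-major layout; the thesis does not specify them.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical label map: one unsigned char per grid cell,
   1 = active, 0 = not active. */
typedef struct {
    unsigned char *flags;
    size_t nx;  /* number of grid columns */
    size_t ny;  /* number of grid rows    */
} LabelMap;

/* Allocates nx * ny zeroed bytes. Passing nx and ny as the two calloc()
   arguments lets calloc() detect overflow of the product. */
int label_map_init(LabelMap *m, size_t nx, size_t ny) {
    if (nx == 0 || ny == 0) return -1;
    m->flags = calloc(nx, ny);
    m->nx = nx;
    m->ny = ny;
    return m->flags ? 0 : -1;
}

/* Returns 1 if cell (x, y) is already labeled active, otherwise 0. */
int label_is_active(const LabelMap *m, size_t x, size_t y) {
    assert(x < m->nx && y < m->ny);
    return m->flags[y * m->nx + x];  /* row-major layout (assumed) */
}

/* Labels the cell active. Returns 1 if it was newly labeled and 0 if it
   was already active, so a caller can skip duplicates in one step. */
int label_try_activate(LabelMap *m, size_t x, size_t y) {
    assert(x < m->nx && y < m->ny);
    unsigned char *flag = &m->flags[y * m->nx + x];
    if (*flag) return 0;
    *flag = 1;
    return 1;
}

void label_map_free(LabelMap *m) {
    free(m->flags);
    m->flags = NULL;
    m->nx = m->ny = 0;
}
```

Both the check and the labeling are a single array access, which is where the speed of this mechanism comes from; the cost is one byte per grid cell regardless of how many cells end up active.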
Image area searching algorithm
Once the images of the nine point sources are resolved, the coordinates of the grid cells that contain the images are pushed into the custom queue data structure. Those grid cells are also labeled as active cells using the labeling mechanism. The queue data structure serves a dual purpose: it tracks which active cells are being processed, and it stores the active cells' information that will be consumed by the GPU ray shooting code.
The following code shows the active cells searching algorithm. Once the procedure is started, the main iterative loop keeps running until the queue is empty. First, an active cell is popped from the queue. The neighbouring cells of this active cell are then checked using the grid point method. If a neighbouring cell is within the image area and is not yet active, it is pushed into the queue and labeled active. The neighbouring cells are tested in order from left to right and top to bottom. Once all neighbours have been tested, the second active cell is popped from the queue and processed in the same way. The whole procedure continues until the queue is empty.
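while (queueIsNotEmpty) {
    xL, yL = popQueue()                        /* only the pop index moves */
    for each neighbouring cell (axL, ayL) of (xL, yL) {
        /* grid point method test, then duplicate check */
        if (cellIsWithinImageArea(thisCell) && !checkIfCellisActive(thisCell)) {
            pushQueue(axL, ayL)
            labelThisCellActive(thisCell)
        }
    }
}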
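The loop described above can also be written as a small, self-contained C function. This is a sketch under stated assumptions rather than the thesis implementation: find_active_cells, InImageFn and toy_image_area are hypothetical names, the function pointer in_image merely stands in for the grid point method, the eight surrounding cells are taken as the neighbours, and the toy image area is a rectangle chosen only so that the example can be run.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y; } Cell;

/* Stand-in for the grid point method: nonzero if cell (x, y) lies
   inside the image area. */
typedef int (*InImageFn)(int x, int y);

/* Toy image area for demonstration only: cells with 2<=x<=5, 3<=y<=4. */
int toy_image_area(int x, int y) {
    return x >= 2 && x <= 5 && y >= 3 && y <= 4;
}

/* Grows the set of active cells outward from the seed cells on an
   nx-by-ny grid. On success stores the array of all active cells in
   *out (caller frees) and returns their count; returns -1 on
   allocation failure. */
long find_active_cells(const Cell *seeds, size_t n_seeds, int nx, int ny,
                       InImageFn in_image, Cell **out) {
    size_t cap = n_seeds > 16 ? n_seeds : 16, size = 0, head = 0;
    Cell *queue = malloc(cap * sizeof *queue);
    unsigned char *active = calloc((size_t)nx, (size_t)ny);
    if (!queue || !active) { free(queue); free(active); return -1; }

    /* Push and label the cells that contain the point images. */
    for (size_t i = 0; i < n_seeds; i++) {
        Cell s = seeds[i];
        assert(s.x >= 0 && s.x < nx && s.y >= 0 && s.y < ny);
        unsigned char *flag = &active[(size_t)s.y * nx + s.x];
        if (*flag) continue;  /* two point images may share one cell */
        *flag = 1;
        queue[size++] = s;
    }

    while (head < size) {             /* queue is not empty */
        Cell c = queue[head++];       /* pop: only the index moves */
        for (int dy = -1; dy <= 1; dy++) {       /* top to bottom */
            for (int dx = -1; dx <= 1; dx++) {   /* left to right */
                int ax = c.x + dx, ay = c.y + dy;
                if ((dx == 0 && dy == 0) ||
                    ax < 0 || ax >= nx || ay < 0 || ay >= ny) continue;
                unsigned char *flag = &active[(size_t)ay * nx + ax];
                if (*flag || !in_image(ax, ay)) continue;
                if (size == cap) {    /* enlarge the array when full */
                    Cell *tmp = realloc(queue, 2 * cap * sizeof *tmp);
                    if (!tmp) { free(queue); free(active); return -1; }
                    queue = tmp;
                    cap *= 2;
                }
                *flag = 1;                        /* label active */
                queue[size++] = (Cell){ax, ay};   /* push */
            }
        }
    }
    free(active);
    *out = queue;
    return (long)size;
}
```

Calling find_active_cells with the single seed (2, 3) on a 10 by 10 grid and toy_image_area as the predicate returns the eight cells of the rectangle, with the seed first. The duplicate check is done before the image area test here because the flag lookup is the cheaper of the two operations; the thesis does not state which order its code uses.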
In Figure 5.11, the lens plane is divided into a grid whose resolution is set according to the source star size. The dark solid line represents the boundary of the real image area. The cross marks are the resolved point images of the point sources within the source star. The cells that contain the images are marked as active (grey in colour) and pushed into the queue. The top figure shows the content of the queue before the main loop starts. The coordinates of the first labeled active cells are pushed into the queue and act as the initial active cells for searching the image area. In loop 1, cell A1 is popped and its surrounding cells are tested using the grid point method. Since cells B1 to B8 are also within the image area boundary, they are labeled as active and pushed into the queue. The whole procedure is repeated until the queue is empty, at which point the whole image area is covered by active cells. Note that only the index is moved when a cell is popped from the queue. Therefore, all the active cells remain stored inside the queue and are ready for copying into the GPU memory for ray shooting.
[Figure 5.11 panels. Legend: point image; image area boundary.
Loop 0: queue = A1, A2.
Loop 1: pop A1; B1, ..., B8 pushed; queue = A1, A2, B1, B2, ..., B8.
Loop 2: pop A2; B9, ... B16 pushed; queue = A1, A2, B1, ..., B8, B9, B10, ... B16.
Loop 3: pop B1; C1, ... C4 pushed; queue = A1, A2, B1, B2, ... B16, C1, ... C4.
...
Loop 18: pop B16; queue = A1, ... B16, C1, C2, ... C26.]
Figure 5.11: The image area searching algorithm at work at different stages. The point images are marked as black crosses and labeled active cells are in grey. The dark solid line is the real image area boundary.
The iterative version of our algorithm works similarly to the recursive version. The difference is that a custom queue is used to manage the active cells instead of the stack of method calls used by the recursive procedure. This not only gives us full control over memory usage and performance tuning, but also provides an active cells list that is ready for the GPU to process with no extra data rearrangement. This is achieved by our custom-designed queue data structure, which acts both as an active cell manager and as an active cells storage structure. The custom queue does not delete its contents when popping data. Instead, it uses an index as a pointer to keep track of the head of the queue. Because the queue is essentially an array of active cells' coordinates, its content is ready to be transferred to the GPU once the image area searching algorithm is completed.
Our image area searching algorithm is designed to be very efficient in searching the image area. It only expands its search into regions where the image area extends outward from the initial active cells. This is especially important when computing light curves of high magnification events, as the image area is a very long, thin arc when the source star is located close to the centre of the Einstein ring. Our image area searching algorithm does not spend extra computation on areas that are not mapped to the source.