4.3 Serialization
4.3.1 Serializing Objects
Prior to copying an object to the GPU, the object must be serialized into the proper format so that it can be processed by the translated CUDA code. Serializing a Python object involves gathering all of its non-contiguous parts and placing them in a contiguous section of memory as plain binary data, as depicted in Figure 4.6.
Figure 4.6: A normal Python object and its serialized counterpart
In order to collect all the data in an object, including the data in its nested objects, the object’s fields can be recursively examined. However, the fields must be accessed in the same order in which they appear in the class definition that is created during the class generation phase. Otherwise, the data will match neither the layout of the class definition nor the precomputed format specifier in the Class Representation. Luckily, because the order of the fields in the class definition is determined by the order of the fields in the class’s corresponding Class Representation, the Class Representation and its nested Class Representations can be used to traverse the object and collect its data in the correct order. The algorithm in Figure 4.7 is used to recursively collect all the data of an object and any nested objects.
The algorithm must be provided with an object to extract data from and a Class Representation representing that object. The algorithm iterates through all the field names and field types in the Class Representation. If the field type is a primitive Class Representation, then the data in the field is of a primitive type and must be collected. Otherwise, if the field type is a complex Class Representation, this indicates that there is an object to be recursively examined in the field. In this case, ExtractData is recursively called and a reference to the list of data items that is currently being built is passed along.
ExtractData(obj, class_repr, data_items)
Inputs: obj, An object
        class_repr, A Class Representation representing the object
        data_items, A list to contain the extracted data items
foreach field, repr ∈ zip(class_repr.field_names, class_repr.field_types) do
    if repr is a PrimitiveClassRepresentation then
        data_items.append(obj.__dict__[field]);
    else
        ExtractData(obj.__dict__[field], repr, data_items);
Figure 4.7: An algorithm to recursively collect all the data in a Python object
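The traversal in Figure 4.7 can be sketched in Python roughly as follows. This is a minimal illustration, not the actual implementation: the two representation classes are hypothetical stand-ins that carry only the field_names and field_types attributes used by the algorithm, and Point and Particle are invented example classes.

```python
class PrimitiveClassRepresentation:
    """Hypothetical stand-in: marks a field that holds a primitive value."""


class ClassRepresentation:
    """Hypothetical stand-in: ordered field names and their representations."""

    def __init__(self, field_names, field_types):
        self.field_names = field_names
        self.field_types = field_types


def extract_data(obj, class_repr, data_items):
    """Append the primitive values of obj to data_items in representation order."""
    for field, field_repr in zip(class_repr.field_names, class_repr.field_types):
        value = obj.__dict__[field]
        if isinstance(field_repr, PrimitiveClassRepresentation):
            data_items.append(value)
        else:
            # Nested object: recurse, sharing the list that is being built.
            extract_data(value, field_repr, data_items)
    return data_items


# Invented example classes with one level of nesting.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Particle:
    def __init__(self, mass, pos):
        self.mass = mass
        self.pos = pos


PRIM = PrimitiveClassRepresentation()
point_repr = ClassRepresentation(["x", "y"], [PRIM, PRIM])
particle_repr = ClassRepresentation(["mass", "pos"], [PRIM, point_repr])

items = extract_data(Particle(1.5, Point(3, 4)), particle_repr, [])
print(items)
```

Because the loop is driven by the representation rather than by the object's own attribute order, reordering field_names changes the order of the collected items, which is the property the serialization step relies on.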
Using this algorithm, the data is guaranteed to be in the order expected by the C++ class definition that will be used with the serialized object. The data can then be converted into its serialized form using Python’s struct module. The struct module provides a function, pack, which packs values into binary data. In order to use struct.pack to pack a list of data items, a format specifier string must be provided. The format specifier string for the object can be found in the Class Representation, as the string was precomputed during the construction of the Class Representation. Calling struct.pack with the format specifier string and the data items produces a byte array object containing the binary representation of the data items. This byte array can be passed to PyCUDA’s to_device function, which allocates appropriate memory on the GPU for the byte array, copies the data into the allocated memory, and returns a DeviceAllocation object. This DeviceAllocation object is effectively a pointer to GPU memory and can be passed into the CUDA kernel as an argument.
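As a small illustration of the packing step, the sketch below packs a list of data items with struct.pack. The format string "fii" and the values are assumptions chosen for the example (one float followed by two ints); in the system described here the format string would come from the Class Representation.

```python
import struct

# Assumed example: a float field followed by two int fields, in
# class-definition order. The format string is a hypothetical stand-in
# for the one precomputed in the Class Representation.
format_spec = "fii"
data_items = [1.5, 3, 4]

# struct.pack takes the values as separate arguments, so the list is unpacked.
serialized = struct.pack(format_spec, *data_items)

print(len(serialized), struct.calcsize(format_spec))
```

On a machine with PyCUDA and a CUDA device, the resulting bytes would be the argument handed to to_device; that call is left out here so that the sketch runs without a GPU.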
Once the serialized data has been copied to the GPU and the kernel has been executed, the serialized objects must be copied back to the host and then deserialized. In order to copy the serialized objects back to the host, a byte array with the same size as the serialized object must first be allocated. The size of the serialized object can be determined by using struct.calcsize, which takes a format specifier string and outputs the number of bytes needed to store a struct with the specified format. To copy the serialized data from the GPU into the empty byte array, PyCUDA’s memcpy_dtoh function is used. The memcpy_dtoh function takes a byte array to store the copied data and a DeviceAllocation that describes the location of the data on the GPU. This memcpy_dtoh function is passed the empty byte array and the existing DeviceAllocation object that was created when copying the serialized data to the GPU.
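The host-side buffer allocation can be sketched as follows. The format string is again an assumed example, and the device-to-host copy is replaced by a plain slice assignment so that the sketch runs without a GPU; in the real flow that line would be the memcpy_dtoh call described above.

```python
import struct

format_spec = "fii"  # hypothetical format string from a Class Representation

# Size in bytes of one serialized object with this format.
nbytes = struct.calcsize(format_spec)

# Empty host buffer of exactly that size, ready to receive the data.
host_buffer = bytearray(nbytes)

# Stand-in for the device-to-host copy: these bytes play the role of the
# serialized object that the kernel left in GPU memory.
device_bytes = struct.pack(format_spec, 2.5, 7, 8)
host_buffer[:] = device_bytes

print(nbytes, len(host_buffer))
```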
After the serialized data is copied back to the host, the data is still contained in a byte array and must be unpacked. The first step of deserialization is using struct.unpack to convert the byte array into a sequence of data items using the format string from the Class Representation. Next, the data items must be placed back into the fields of the object from which they originated, using the algorithm in Figure 4.8.
This algorithm requires four inputs: obj, the object to unpack the data items into; a Class Representation for the class of obj, which guarantees the correct order of iteration over the fields of obj; the list of data items that must be unpacked into obj; and the number of data items already unpacked from that list (initially zero). The algorithm iterates over all the fields in the Class Representation, recursively stepping into the field of an object if a non-primitive Class Representation is found. If a primitive Class Representation is found while iterating over the Class Representation’s fields, the next item from the data item list must be inserted into the field described by the primitive Class Representation.
InsertData(obj, class_repr, data_items, num_unpacked)
Inputs: obj, An object
        class_repr, A Class Representation representing the object
        data_items, A list containing the data items to unpack into obj
        num_unpacked, The number of items unpacked from data_items
Output: The number of data items unpacked so far
foreach field, repr ∈ zip(class_repr.field_names, class_repr.field_types) do
    if repr is a PrimitiveClassRepresentation then
        obj.__dict__[field] ← data_items[num_unpacked];
        num_unpacked ← num_unpacked + 1;
    else
        num_unpacked ← InsertData(obj.__dict__[field], repr, data_items, num_unpacked);
return num_unpacked
Figure 4.8: An algorithm to recursively insert a data list into a Python object
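The deserialization path, struct.unpack followed by the insertion algorithm of Figure 4.8, can be sketched in Python as below. As before, this is an illustration under stated assumptions: the representation classes are hypothetical stand-ins, the format_spec attribute name and the format strings are invented for the example, and the returned bytes are produced locally rather than copied back from a GPU.

```python
import struct


class PrimitiveClassRepresentation:
    """Hypothetical stand-in: marks a field that holds a primitive value."""


class ClassRepresentation:
    """Hypothetical stand-in: ordered fields plus a struct format string."""

    def __init__(self, field_names, field_types, format_spec):
        self.field_names = field_names
        self.field_types = field_types
        self.format_spec = format_spec  # assumed attribute name


def insert_data(obj, class_repr, data_items, num_unpacked=0):
    """Write data_items back into obj; return how many items were consumed."""
    for field, field_repr in zip(class_repr.field_names, class_repr.field_types):
        if isinstance(field_repr, PrimitiveClassRepresentation):
            obj.__dict__[field] = data_items[num_unpacked]
            num_unpacked += 1
        else:
            # Nested object: recurse and carry the running count forward.
            num_unpacked = insert_data(
                obj.__dict__[field], field_repr, data_items, num_unpacked
            )
    return num_unpacked


# Invented example classes with one level of nesting.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Particle:
    def __init__(self, mass, pos):
        self.mass = mass
        self.pos = pos


PRIM = PrimitiveClassRepresentation()
point_repr = ClassRepresentation(["x", "y"], [PRIM, PRIM], "ii")
particle_repr = ClassRepresentation(["mass", "pos"], [PRIM, point_repr], "fii")

particle = Particle(1.5, Point(3, 4))

# Stand-in for the bytes copied back from the GPU after the kernel ran.
returned = struct.pack(particle_repr.format_spec, 2.5, 7, 8)

data_items = struct.unpack(particle_repr.format_spec, returned)
count = insert_data(particle, particle_repr, data_items)
print(count, particle.mass, particle.pos.x, particle.pos.y)
```

Returning the running count from each call is what lets a nested call tell its caller how far into the flat item sequence it advanced, so the caller resumes at the correct position.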