4.2 Code Generation
4.2.2 Generating Classes
The information used to produce all necessary CUDA C++ class definitions is conveniently located in the extracted Class Representations. As described previously, a Class Representation contains a class’s name, a list of fields and their types, as well as a list of Method Representations describing the prototypes that must be included in the class definition. The Method Representations contain a method’s name, return type, and parameter names and types.
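As a rough illustration of the data described above, the following sketch mirrors the two representations as plain C++ structs and renders one method as a prototype line. The struct layouts, the use of type-name strings for field and parameter types, and the helpers `render_prototype` and `demo_prototype` are assumptions made for this example; they are not taken from the translator itself, and the sketch ignores any reference-qualified variants of a prototype.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of a Method Representation: the method's name, its
// return type, and (parameter name, parameter type) pairs.
struct MethodRepresentation {
    std::string name;
    std::string return_type;
    std::vector<std::pair<std::string, std::string>> parameters;
};

// Hypothetical mirror of a Class Representation. Types are kept as plain
// type-name strings here purely for brevity.
struct ClassRepresentation {
    std::string name;
    std::vector<std::pair<std::string, std::string>> fields;  // (name, type)
    std::vector<MethodRepresentation> methods;
};

// Render a single method as a simple by-value prototype line.
inline std::string render_prototype(const MethodRepresentation& m) {
    std::string out = m.return_type + " " + m.name + "(";
    for (std::size_t i = 0; i < m.parameters.size(); ++i) {
        if (i > 0) out += ", ";
        out += m.parameters[i].second + " " + m.parameters[i].first;
    }
    return out + ");";
}

// Build a small example representation and render its only method.
inline std::string demo_prototype() {
    ClassRepresentation shape;
    shape.name = "Shape";
    shape.fields.emplace_back("origin", "Point");

    MethodRepresentation scale;
    scale.name = "scale";
    scale.return_type = "float";
    scale.parameters.emplace_back("factor", "float");
    scale.parameters.emplace_back("origin", "Point");
    shape.methods.push_back(scale);

    assert(shape.methods.size() == 1);
    return render_prototype(shape.methods[0]);
}
```

Calling `demo_prototype()` yields the text of one prototype for a hypothetical `Shape` class; a real generator would emit one such line per Method Representation inside the class body.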
The first part of generating class definitions consists of creating a list of forward declarations, one for each class. This sidesteps the need to define classes in any particular order and allows a wider variety of Python code to be translated. Without these forward declarations, some Python code cannot be translated. For example, suppose that class A has a field of type B and class B has a method that has a parameter of type A. This would cause a circular dependency and would require A to be defined before B and B to be defined before A. With forward declarations, however, the compiler is aware of both A and B before either of them is actually defined. Creating these forward declarations is straightforward and involves iterating through the name fields of all the extracted Class Representations.
Once the forward declarations are present, the next step is to generate the actual class definition for each Class Representation that was extracted. To mimic the fact that all fields and methods are public in Python, all the fields and method prototypes in the generated CUDA C++ class definition are placed under the public modifier.
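The circular dependency in the A and B example can be made concrete with a small sketch of what the generated output might look like. The field names, the method `combine`, the `HD` macro (which expands to `__host__ __device__` only under a CUDA compiler), and the helper `demo_forward_declarations` are all hypothetical and chosen only for illustration. Note that in this sketch B is still defined first, because A embeds a B by value; the forward declaration of A is what lets B's prototype mention A before A is complete.

```cpp
#include <cassert>

// Assumption: a guard macro so the sketch builds as plain C++ and as CUDA C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Forward declarations: both names are known before either class is defined.
class A;
class B;

class B {
public:  // everything is public, mirroring Python
    int weight;
    HD B() {}
    HD int combine(A& other);  // only the name A is needed for this prototype
};

class A {
public:
    B b;       // stored inline, so B must be a complete type at this point
    int base;
    HD A() {}
};

// Defined once both classes are complete, so A's fields can be accessed.
HD int B::combine(A& other) {
    return weight + other.base + other.b.weight;
}

// Host-side usage example.
inline int demo_forward_declarations() {
    A a;
    a.base = 10;
    a.b.weight = 2;
    B b;
    b.weight = 3;
    int result = b.combine(a);  // 3 + 10 + 2
    assert(result == 15);
    return result;
}
```

Without the line `class A;`, the prototype of `B::combine` would not compile, since A would be an unknown name at that point.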
The first component of the class definition is the class’s field declarations. The field declarations are obtained by iterating through the name and Class Representation pairs that describe the fields and their types in the Class Representation for which the class definition is being generated. For non-primitive fields, instead of storing a pointer to another object as would be preferable in normal C++, the entire object is stored in the parent object, so that each object occupies contiguous memory. This speeds up serialization and deserialization, but requires that objects be prevented from directly or indirectly containing objects of the same type, as previously mentioned. Storing objects in contiguous memory also allows simpler allocation of local variables, as performing parallel dynamic allocations from many GPU threads simultaneously is not ideal [5]. In addition, having contiguous objects requires that all objects of a type be homogeneous, with all fields present. This means that objects of a particular type cannot have null fields if other objects of the same type have data present in these fields.
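The effect of embedding non-primitive fields can be sketched as follows. The classes `Point` and `Segment`, the `HD` macro, and the helpers `to_bytes`, `from_bytes`, and `demo_round_trip` are hypothetical names introduced for this example, and the byte-copy routines are only a stand-in for whatever serialization scheme is actually used. The point is that, because a `Segment` holds its two `Point` objects inline, the whole object can be copied with a single contiguous memory copy.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Assumption: a guard macro so the sketch builds as plain C++ and as CUDA C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

class Point {
public:
    float x;
    float y;
    HD Point() {}
};

class Segment {
public:
    Point start;  // the whole object is embedded, not a Point*
    Point end;
    HD Segment() {}
};

// Embedded fields keep the object copyable byte for byte.
static_assert(std::is_trivially_copyable<Segment>::value,
              "Segment is expected to be trivially copyable");
static_assert(sizeof(Segment) == 2 * sizeof(Point),
              "the nested Points are expected to sit back to back");

// Copy the object into a flat buffer with one contiguous copy.
inline void to_bytes(const Segment& s, unsigned char* buffer) {
    std::memcpy(buffer, &s, sizeof(Segment));
}

// Rebuild the object from the flat buffer.
inline Segment from_bytes(const unsigned char* buffer) {
    Segment s;
    std::memcpy(&s, buffer, sizeof(Segment));
    return s;
}

// Host-side usage example: round-trip a Segment through a byte buffer.
inline float demo_round_trip() {
    Segment s;
    s.start.x = 1.0f;
    s.start.y = 2.0f;
    s.end.x = 3.0f;
    s.end.y = 4.0f;

    unsigned char buffer[sizeof(Segment)];
    to_bytes(s, buffer);
    Segment restored = from_bytes(buffer);

    assert(restored.start.x == 1.0f);
    return restored.end.y;
}
```

Had `Segment` stored `Point*` fields instead, a byte copy would duplicate only the pointers, and each nested object would need to be allocated and copied separately.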
The second component of the class definition is either a declaration or an implementation of a default constructor. If a Python class has a constructor that does not take any parameters, then a prototype is simply declared and an implementation is provided later, during the method translation process. Otherwise, the default constructor is given an empty implementation and is automatically used by the CUDA compiler to initialize an empty object so that memory can be copied into the newly created object from another object that is passed by value, either as a function argument or as a function return value. This empty default constructor is also automatically used by the compiler when an object is assigned into an existing object, such as when an object is written to the output list of the map operation.
A copy constructor that takes a reference and initializes an object based on that reference is also necessary. The copy constructor is implicitly defined by the compiler and assigns each field of the referenced object to the corresponding field of the newly created object. This copy constructor is used when the return statement of a function or method A consists of a call to another function or method B that returns a reference. If A returns an object copy while B returns a reference, the object referred to by the reference returned from B must be copied, using the copy constructor, into the return value location in A’s stack frame so that the object can be returned by A as an object copy. In addition to a copy constructor, a move constructor is also necessary. Luckily, the compiler also implicitly defines a move constructor, so no additional work is necessary. The implicitly defined move constructor assigns each field of the object referred to by the rvalue reference to the corresponding field of the newly created object. The move constructor is used when functions attempt to return an rvalue reference as an object copy.
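The constructor cases described above can be sketched in a few lines. The classes `Vec2` and `Body`, the `HD` macro, and the functions `make_vec`, `pass_through`, and `demo_constructors` are hypothetical and exist only for this illustration. Here `position_ref` plays the role of B (it returns a reference) and `position_copy` plays the role of A (it returns an object copy), so the implicit copy constructor runs in its return statement; `pass_through` returns an rvalue reference as an object copy, which uses the implicit move constructor.

```cpp
#include <cassert>

// Assumption: a guard macro so the sketch builds as plain C++ and as CUDA C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

class Vec2 {
public:
    float x;
    float y;
    HD Vec2() {}  // empty default constructor; copy and move stay implicit
};

class Body {
public:
    Vec2 position;
    HD Body() {}
    // Plays the role of B: returns a reference to an existing object.
    HD Vec2& position_ref() { return position; }
    // Plays the role of A: returns a copy, made by the implicit copy constructor.
    HD Vec2 position_copy() { return position_ref(); }
};

// Returns a freshly built object by value.
HD inline Vec2 make_vec(float x, float y) {
    Vec2 v;
    v.x = x;
    v.y = y;
    return v;
}

// Returns an rvalue reference as an object copy via the implicit move constructor.
HD inline Vec2 pass_through(Vec2&& v) {
    return static_cast<Vec2&&>(v);
}

// Host-side usage example.
inline float demo_constructors() {
    Body body;
    body.position = make_vec(1.0f, 2.0f);  // assignment into an existing object

    Vec2 copy = body.position_copy();
    copy.x = 99.0f;                        // changes the copy only
    assert(body.position.x == 1.0f);

    Vec2 moved = pass_through(make_vec(3.0f, 4.0f));
    return moved.y;
}
```

Because `Vec2` contains only primitive fields, the implicit copy and move constructors both reduce to field-by-field assignment, which matches the behavior described in the text.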
The third component of the class definition is the method prototype declarations. These are determined by iterating over each Method Representation in the Class Representation for which the class is being generated. Each Method Representation contains the name, return type, and parameter types of the method it represents. To mimic Python’s pass-by-reference behavior for non-primitive objects, including both rvalues and lvalues, there must be multiple prototypes for each method. For each non-primitive parameter of a method, there must be a prototype that declares the parameter as a normal (lvalue) reference, as well as a prototype that declares the parameter as an rvalue reference, because methods must be prepared to take either rvalues or lvalues as arguments, as described previously in Section 4.2.1. In addition, depending on the scope of the return value of the function, the return value may be returned either as an lvalue reference or as an object copy. This step must be done after function generation, as the abstract syntax trees must be inspected in order to determine whether an argument was returned, so that some of the prototypes can be given lvalue reference return types.
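One possible reading of the prototype rules above is sketched below for a method with a single non-primitive parameter that returns its argument. The classes `Point` and `Shifter`, the method `shift`, the `HD` macro, and the helpers `make_point` and `demo_overloads` are hypothetical. The assumption made here is that the lvalue-reference overload can return an lvalue reference, since the caller's object outlives the call, while the rvalue-reference overload returns an object copy, since the temporary does not.

```cpp
#include <cassert>

// Assumption: a guard macro so the sketch builds as plain C++ and as CUDA C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

class Point {
public:
    int x;
    int y;
    HD Point() {}
};

class Shifter {
public:
    int dx;
    HD Shifter() {}

    // Lvalue overload: the caller's object is modified in place and the same
    // object is handed back by reference.
    HD Point& shift(Point& p) {
        p.x += dx;
        return p;
    }

    // Rvalue overload: the argument is a temporary, so the result is returned
    // as an object copy (built by the implicit move constructor).
    HD Point shift(Point&& p) {
        p.x += dx;
        return static_cast<Point&&>(p);
    }
};

// Returns a freshly built object by value.
HD inline Point make_point(int x, int y) {
    Point p;
    p.x = x;
    p.y = y;
    return p;
}

// Host-side usage example exercising both overloads.
inline int demo_overloads() {
    Shifter s;
    s.dx = 5;

    Point p = make_point(1, 2);
    Point& same = s.shift(p);              // picks the lvalue overload
    assert(&same == &p);                   // no copy was made

    Point q = s.shift(make_point(10, 0));  // picks the rvalue overload
    return p.x * 100 + q.x;                // 6 * 100 + 15
}
```

A method with several non-primitive parameters would need one prototype per combination of lvalue and rvalue parameters under this scheme, which is why the prototypes are generated mechanically rather than written by hand.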