3.2 Components of Scheme shim
3.2.2 A c-lambda function
In the Scheme shim, the c-lambda function is called from the vector-length-calculation helper function. In the parameter list of a c-lambda function, the first six parameters describe the execution configuration, and the seventh gives the size of the dynamic shared memory to be defined in the CUDA-C shim. The remaining parameters are the actual kernel arguments and the lengths of vectors: whenever a kernel argument is a vector, the next parameter is the length of that vector. The c-lambda function performs Scheme-to-C data type conversions, casts Scheme vectors to C pointers, and calls the CUDA-C shim with the converted C parameters. Listing 3.2 (lines 13–24) gives the definition of the c-lambda function vector_addition_scm_driver.
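The parameter ordering described above can be illustrated with a small host-only C sketch. The function name example_cu_driver, the two-vector signature, and the reading of the first six parameters as grid and block dimensions are assumptions made for illustration; none of them is taken from Listing 3.2, and no kernel is launched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical driver that illustrates the parameter order only.
 * Parameters 1-6: execution configuration (assumed here to be the grid
 *                 and block dimensions in x, y, z).
 * Parameter 7:    size in bytes of the dynamic shared memory.
 * Remaining:      kernel arguments; every vector is immediately
 *                 followed by its length. */
int example_cu_driver(int grid_x, int grid_y, int grid_z,
                      int block_x, int block_y, int block_z,
                      int shared_mem_bytes,
                      const uint32_t *src, int src_len,
                      uint32_t *dst, int dst_len)
{
    /* A real shim would allocate device memory and launch a kernel.
     * This stand-in only validates the configuration and copies on
     * the host. */
    assert(grid_x > 0 && grid_y > 0 && grid_z > 0);
    assert(block_x > 0 && block_y > 0 && block_z > 0);
    assert(shared_mem_bytes >= 0);

    int n = src_len < dst_len ? src_len : dst_len;
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
    return n; /* number of elements processed */
}
```

Because each vector travels with its own length, the callee never has to recover a length from a bare C pointer.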
In this example, vector_addition_scm_driver takes ten parameters. On line 14, the parameter list:
(int int int int int int int unsigned-int32 scheme-object int)
shows only the types of the parameters, because the c-lambda special form accepts only parameter types in its parameter list. The last three are the actual kernel arguments. The type unsigned-int32 is the type of the argument u32_constant on line 34, the type scheme-object is the type of the Scheme vector argument u32v_src on line 35, and the final int is the type of the vector-length calculation on line 37:
(u32vector-length u32v_src)
Note that the Gambit compiler uses this type information at compile time to link with the C code.
In the CUDA-C shim, we need to allocate device memory for each vector; therefore, we need to convert each Scheme vector to a pointer of its corresponding C type. Gambit provides C library functions for casting Scheme vectors to C pointers and allows them only inside c-lambda or c-declare constructs. The pointer-casting operation must therefore be done within a c-lambda or within a C function defined in a c-declare construct, and both of these constructs must be defined in a .scm file. For this reason, the pointer-casting operations for the Scheme vectors in this foreign-function interface reside in the Scheme shim's c-lambda function in the main.scm file. Note that this c-lambda function calls the CUDA-C shim.
In this example, the pointer-casting operations and the call to the CUDA-C shim could also be defined in a C function within a c-declare construct. However, because the Gambit compiler does not allow a C function in a c-declare construct to be linked from a different file, that C function cannot have external linkage. We cannot use the simpler c-declare approach in our implementation because we generate this foreign-function interface in a different file. Instead, we put the pointer-casting operations and the call to the CUDA-C shim in a c-lambda function, as in an interface generated by our implementation. We also put a forward declaration of the CUDA-C shim within a c-declare construct.
The pointer-casting operation that converts the Scheme vector of type scheme-object to its corresponding Gambit-allowed C type is shown on line 18 of Listing 3.2:
___U32 *host_u32v_src = ___CAST(___U32*,___BODY_AS(___arg9, ___tSUBTYPED));
This operation uses the macro ___BODY_AS to extract from Scheme the pointer for the parameter ___arg9 (the ninth parameter, of type scheme-object, in the parameter list on line 14 of Listing 3.2). The tag ___tSUBTYPED passed to ___BODY_AS specifies that the object is memory allocated. Next, the ___CAST macro casts the extracted pointer to type ___U32* because ___arg9 is a vector of unsigned 32-bit integers. The pointer is then assigned to the ___U32* variable host_u32v_src. The C type ___U32 is defined in gambit.h along with ___U64. Note that ___U32 is used in this example because the vector, defined on line 2 of Listing 3.1, contains 32-bit unsigned integers.
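The two steps described above, extracting the body pointer from a tagged object reference and then casting it to the element type, can be sketched with a simplified model in plain C. Everything prefixed with MODEL_ or model_ is hypothetical: the tag value, the one-word header, and the macro definitions are stand-ins chosen for illustration and are not Gambit's actual definitions from gambit.h.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins, NOT the definitions found in gambit.h. */
#define MODEL_TAG 1                          /* hypothetical tag in the low bits */
#define MODEL_HEADER_BYTES sizeof(uintptr_t) /* hypothetical one-word header     */

/* Model of a memory-allocated vector: a header word followed by the body. */
typedef struct {
    uintptr_t header;
    uint32_t  body[4];
} model_u32vector;

/* Model of a tagged object reference as seen by the foreign code. */
typedef uintptr_t model_obj;

/* Analogue of the ___BODY_AS step: remove the tag, skip the header. */
#define MODEL_BODY_AS(obj, tag) ((void *)((obj) - (tag) + MODEL_HEADER_BYTES))
/* Analogue of the ___CAST step: an ordinary C cast. */
#define MODEL_CAST(type, expr) ((type)(expr))

/* Build a tagged reference to a model vector. */
model_obj model_make_ref(model_u32vector *v)
{
    return (model_obj)(uintptr_t)v + MODEL_TAG;
}

/* Recover a typed pointer to the first element of the vector body. */
uint32_t *model_u32vector_body(model_obj obj)
{
    assert(obj != 0);
    return MODEL_CAST(uint32_t *, MODEL_BODY_AS(obj, MODEL_TAG));
}
```

In this model the returned pointer aliases the vector's own storage, so no element is copied at this stage; copying to device memory is left to the CUDA-C shim.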
We follow a naming convention for the cast pointer of a Scheme vector:
host_[u, s, f]SIZEv_[IN\OUT]NAME
The part after host_ is the name that the vector has in the vector-length-calculation helper function. The prefix host signifies that it is a host memory pointer in C that needs to be copied to device memory by the CUDA-C shim. Type representation information is also embedded in the name, which helps maintenance programmers recognize the appropriate type in the CUDA-C shim when improving our implementation further. In this example, the vector src is defined in Scheme. In the helper function it becomes u32v_src, and in the c-lambda function it becomes host_u32v_src. Note that renaming the vector consistently across the parts of the Scheme shim makes it easy to identify each appearance of the vector and the purpose it serves there.
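As a small illustration of the naming convention, the following C sketch assembles a cast-pointer name from its parts. The helper make_host_name is hypothetical and is not part of our implementation; it also ignores the optional IN\OUT marker, which does not occur in the running example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "host_<kind><bits>v_<name>" into out.
 * kind is 'u' (unsigned), 's' (signed) or 'f' (floating point);
 * bits is the element size in bits.
 * Returns the value of snprintf, i.e. the length the full name needs. */
int make_host_name(char *out, size_t out_size,
                   char kind, int bits, const char *name)
{
    assert(kind == 'u' || kind == 's' || kind == 'f');
    assert(bits > 0);
    return snprintf(out, out_size, "host_%c%dv_%s", kind, bits, name);
}
```

For the vector src of unsigned 32-bit integers, calling the helper with 'u', 32, and "src" yields host_u32v_src, the name used in the c-lambda function.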
Finally, we call the CUDA-C shim, vector_addition_cu_driver, with ten arguments (lines 20–22):
vector_addition_cu_driver( ___arg1, ___arg2, ___arg3, ___arg4, ___arg5, ___arg6, ___arg7, ___arg8, host_u32v_src, ___arg10);
We pass all the parameters of the c-lambda function to the CUDA-C shim. In the argument list, the first seven arguments (___arg1, ___arg2, ___arg3, ___arg4, ___arg5, ___arg6, and ___arg7) refer to the first seven parameters, of type int, in the parameter list on line 14. The eighth argument (___arg8) refers to the parameter of type unsigned-int32. For the Scheme vector, we do not pass the scheme-object parameter itself, but rather the converted C pointer host_u32v_src. The last argument, ___arg10, is the length of that vector, computed by the helper function on line 37 of Listing 3.2; it refers to the last parameter, of type int, in the parameter list on line 14. Note that the scalar types int and unsigned-int32 shown on line 14 of Listing 3.2 are converted to C automatically by c-lambda.
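To make the shape of this call concrete, the following host-only C sketch defines a stand-in with the same ten-argument layout. The name vector_addition_cu_driver_stub, the void return type, and the behavior (adding the constant to every element in place on the host) are assumptions made for illustration; the actual CUDA-C shim is described separately and runs the computation on the device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-only stand-in for the CUDA-C shim.
 * cfg1..cfg6:        execution configuration (unused in this stand-in)
 * shared_mem_bytes:  size of the dynamic shared memory (unused here)
 * u32_constant:      scalar kernel argument
 * host_u32v_src:     host pointer obtained from the Scheme vector
 * src_len:           length of that vector */
void vector_addition_cu_driver_stub(int cfg1, int cfg2, int cfg3,
                                    int cfg4, int cfg5, int cfg6,
                                    int shared_mem_bytes,
                                    uint32_t u32_constant,
                                    uint32_t *host_u32v_src,
                                    int src_len)
{
    (void)cfg1; (void)cfg2; (void)cfg3;
    (void)cfg4; (void)cfg5; (void)cfg6;
    (void)shared_mem_bytes;

    assert(src_len >= 0);
    assert(host_u32v_src != 0 || src_len == 0);

    /* Assumed behavior, for illustration only. */
    for (int i = 0; i < src_len; i++)
        host_u32v_src[i] += u32_constant;
}
```

The argument order mirrors the call shown above: seven configuration values, the scalar, the cast pointer, and finally the vector length.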