2.14 The SAXPY Example
2.14.3 The example SAXPY application – C++ language
As in the C version, the C++ code needs to include header files, and the directory names for these files differ between Apple and other platforms. This problem is again solved with conditional compilation using preprocessor directives. The includes are in listing 2.100. This listing also contains one very useful definition: __CL_ENABLE_EXCEPTIONS. It makes the OpenCL C++ API throw an exception (of type cl::Error) whenever an OpenCL call reports an error. Thanks to this, even though there is no explicit error handling in the code, an error does not pass silently: an uncaught exception terminates the program, so the user is notified if anything goes wrong.
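The idea behind __CL_ENABLE_EXCEPTIONS can be pictured with a small stand-alone model. The sketch below does not use OpenCL at all: ApiError, checkStatus and runAndReport are hypothetical names invented for this illustration. Only the general pattern is meant to carry over, namely that a nonzero status code is turned into a thrown exception that remembers both the code and the name of the failing call.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the exception type thrown by the wrapper.
class ApiError : public std::runtime_error {
public:
    ApiError(int code, const std::string &where)
        : std::runtime_error(where), code_(code) {}
    int err() const { return code_; }  // numeric status code of the failed call
private:
    int code_;
};

// Hypothetical helper: 0 means success, any other status is thrown.
inline int checkStatus(int status, const char *where) {
    if (status != 0) throw ApiError(status, where);
    return status;
}

// Runs the check and describes the outcome as text.
inline std::string runAndReport(int status, const char *where) {
    try {
        checkStatus(status, where);
        return "ok";
    } catch (const ApiError &e) {
        return std::string(e.what()) + " failed with code " +
               std::to_string(e.err());
    }
}
```

In the real application, the analogous pattern would presumably be a try block around the OpenCL calls with a handler for cl::Error that prints the values returned by its what() and err() methods.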
The global variables used by this example are of different types than those found in the C version. These are wrapper objects for OpenCL elements such as the context, the command queue and so on. These wrappers provide convenient methods for manipulating these objects. The declarations of the global objects necessary to run sample computations on the device can be seen in listing 2.101. The details about the classes in this fragment are in sections 2.11, 2.8, 2.6 and 2.5.
Loading and compilation of an OpenCL program can be seen in listing 2.102. The OpenCL device code is loaded from a file named saxpy.cl, using a convenient C++ idiom that loads the whole file into a string in only two statements. Next, this string is added to an object of type cl::Program::Sources, which is then passed to the constructor of cl::Program. The program is then built for the devices in the context, and the resultant object representing the compiled OpenCL program is returned. More information about program creation can be found in section 2.11.
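The file-loading idiom mentioned above can be tried in isolation, without any OpenCL dependency. The following is a minimal sketch: the function name readTextFile is hypothetical, and returning an empty string for a file that cannot be opened is an assumption made for this example only.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file into a string. Returns an empty string when the
// file cannot be opened (a choice made for this sketch only).
std::string readTextFile(const std::string &fname) {
    std::ifstream in(fname.c_str());
    if (!in) return std::string();
    // The parentheses around the first argument keep the compiler from
    // parsing this statement as a function declaration.
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    return text;
}
```

A call such as readTextFile("saxpy.cl") would then yield the kernel source as a single string. An empty result may mean either an empty file or a missing one, so a real program might prefer to report the open failure explicitly.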
The function for printing a vector to a console can be seen in listing 2.103. It does the same thing as the analogous function in the C version of this program.
cl::Program createProgram(cl::Context context, std::string fname) {
  cl::Program::Sources sources;
  cl::Program program;
  std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
  // Read the whole source file into a string.
  std::ifstream source_file(fname.c_str());
  std::string source_code(std::istreambuf_iterator<char>(source_file),
                          (std::istreambuf_iterator<char>()));
  sources.push_back(std::make_pair(source_code.c_str(), source_code.length()));
  program = cl::Program(context, sources);
  // Build for every device in the context.
  program.build(devices, NULL);
  return program;
}
Listing 2.102: OpenCL program compilation
void printVector(cl_float *a, int size) {
  int i;
  std::cout << "[ ";
  for (i = 0; i < size; i++) std::cout << a[i] << " ";
  std::cout << "]" << std::endl;
}
Listing 2.103: The function that prints a vector to standard output
cl_float *loadVector(int size) {
  int i;
  cl_float *a;
  a = new cl_float[size];
  // Floating-point division, so the values decrease by 0.25 per element.
  for (i = 0; i < size; i++) a[i] = (8 - i) / 4.0f + 1;
  return a;
}
Listing 2.104: The function for vector initialization
The C++ version also initializes the vectors using an arithmetic sequence of values. The initialization function can be seen in listing 2.104. The vectors are again represented by arrays of float values, here allocated using the new[] operator, so they must later be freed with delete[].
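As an alternative to raw arrays, the same initialization can be sketched with std::vector, which frees its storage automatically. This is only an illustration: the name makeVector is hypothetical, plain float is used in place of cl_float (which is a typedef for float in the OpenCL headers), and floating-point division is assumed, so that element i equals 3 - i/4.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counterpart of loadVector that returns a std::vector.
// Assumes floating-point division: element i is (8 - i) / 4 + 1.
std::vector<float> makeVector(std::size_t size) {
    std::vector<float> v(size);
    for (std::size_t i = 0; i < size; ++i)
        v[i] = (8.0f - static_cast<float>(i)) / 4.0f + 1.0f;
    return v;
}
```

Where a raw pointer is expected, for example as the host pointer of a buffer, &v[0] can be passed for a non-empty vector, and no explicit deallocation is needed afterwards.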
Initialization of OpenCL can be seen in listing 2.105. The function gets the list of available platforms, selects the first one, and creates a context on it. The context uses all of the devices available on that platform. This is a very simple method of initialization and does not guarantee efficiency: the command queue is later created on the first device in the context, which may be a CPU, for example. Context creation is described in more detail in section 2.5.
The program is created using the function shown in listing 2.102. Then the kernel object is extracted from the program – this object is used for executing the kernel. The last step is to create a command queue. The command queue is essential for executing any commands on an OpenCL device. Every action (like execution of kernels,
void initialize() {
  std::vector<cl::Platform> platforms;
  cl::Platform::get(&platforms);
  // Use the first available platform.
  cl_context_properties cps[3] =
  { CL_CONTEXT_PLATFORM, (cl_context_properties)(platforms[0])(), 0 };
  context = cl::Context(CL_DEVICE_TYPE_ALL, cps);
  program = createProgram(context, "saxpy.cl");
  saxpy_k = cl::Kernel(program, "saxpy");
  // The queue is created on the first device in the context.
  queue = cl::CommandQueue(context, context.getInfo<CL_CONTEXT_DEVICES>()[0]);
}
Listing 2.105: The function that encapsulates all initialization steps
void release() {
}
Listing 2.106: Release of resources – in C++ it is empty
void saxpy(cl::Buffer c, const cl::Buffer a, const cl::Buffer x,
           const cl::Buffer b, int size) {
  saxpy_k.setArg(0, c);
  saxpy_k.setArg(1, a);
  saxpy_k.setArg(2, x);
  saxpy_k.setArg(3, b);
  queue.enqueueNDRangeKernel(saxpy_k, cl::NullRange, cl::NDRange(size),
                             cl::NullRange);
}
Listing 2.107: Enqueue (run) the kernel
memory transfers and so on) is executed by the device via the command queue. Contexts and devices are described in more detail in sections 2.3 and 2.4.
In C++, OpenCL objects are released automatically: the destructor of a wrapper object releases the underlying OpenCL object when the wrapper is destroyed. For this reason the function in listing 2.106 is empty. It is left here only for comparison with the C version of the SAXPY application.
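The automatic release can be illustrated with a simplified model of a reference-counted wrapper. The sketch below is not the OpenCL C++ API: FakeResource, retainRes, releaseRes, Wrapper and countWhileShared are hypothetical names, and the model only assumes that copying a wrapper increments a reference count and destroying it decrements the count again.

```cpp
#include <cassert>

// Hypothetical resource with an explicit reference count, standing in
// for an OpenCL object. Nothing here calls OpenCL.
struct FakeResource {
    int refs;       // current reference count
    bool released;  // set once the count drops back to zero
};

inline void retainRes(FakeResource *r) { ++r->refs; }
inline void releaseRes(FakeResource *r) {
    assert(r->refs > 0);
    if (--r->refs == 0) r->released = true;
}

// Hypothetical wrapper: construction and copying retain, destruction releases.
class Wrapper {
public:
    explicit Wrapper(FakeResource *r) : res_(r) { retainRes(res_); }
    Wrapper(const Wrapper &other) : res_(other.res_) { retainRes(res_); }
    Wrapper &operator=(const Wrapper &other) {
        if (res_ != other.res_) {
            retainRes(other.res_);
            releaseRes(res_);
            res_ = other.res_;
        }
        return *this;
    }
    ~Wrapper() { releaseRes(res_); }
private:
    FakeResource *res_;
};

// Returns the reference count observed while two wrappers share the resource.
inline int countWhileShared(FakeResource *r) {
    Wrapper first(r);
    Wrapper second(first);  // copy: the count rises to 2
    return r->refs;         // both wrappers are destroyed after this is read
}
```

In this model, after countWhileShared returns, the count is back at zero and the resource is marked as released, without any explicit cleanup call in the calling code.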
Execution of a kernel is shown in listing 2.107. This function sets parameters for the kernel using the cl::Kernel::setArg method. The kernel is enqueued using the method cl::CommandQueue::enqueueNDRangeKernel. Execution of the kernel is described in sections 2.8, 2.9 and 2.11.
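With a global range of cl::NDRange(size), one work-item is launched per vector element. A host-side reference version of the computation can be useful for checking the device results. The sketch below is such a reference under stated assumptions: the formula c[i] = a[i] * x + b[i] is inferred from the argument order (c, a, x, b), because the kernel source saxpy.cl is not shown here, and the name saxpyReference is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference: c[i] = a[i] * x + b[i].
// The formula is an assumption inferred from the kernel argument order.
std::vector<float> saxpyReference(const std::vector<float> &a, float x,
                                  const std::vector<float> &b) {
    assert(a.size() == b.size());  // both input vectors must match in length
    std::vector<float> c(a.size());
    // The loop index i plays the role of the work-item's global id.
    for (std::size_t i = 0; i < a.size(); ++i)
        c[i] = a[i] * x + b[i];
    return c;
}
```

Comparing the output of this function with the data read back from the device, element by element and with a small tolerance, would be one simple way to validate the kernel.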
The main function can be seen in listing 2.108. This function first initializes OpenCL objects using the function initialize shown in listing 2.105. Next, it allocates and initializes vectors using the function loadVector shown in listing 2.104. The buffers – x_dev, a_dev, b_dev and c_dev – are allocated using the constructor cl::Buffer. The class cl::Buffer encapsulates cl_mem. The values stored in the
int main(int argc, char **argv) {
  initialize();
  size_t size = 32;
  cl_float *a, *b, *c, *x;
  c = new cl_float[size]();  // zero-initialized, because c is copied to c_dev
  a = loadVector(size);
  x = loadVector(1);
  b = loadVector(size);
  printVector(a, size); printVector(x, 1); printVector(b, size);

  cl::Buffer a_dev, x_dev, b_dev, c_dev;
  a_dev = cl::Buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                     sizeof(cl_float) * size, a);
  b_dev = cl::Buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                     sizeof(cl_float) * size, b);
  c_dev = cl::Buffer(context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR,
                     sizeof(cl_float) * size, c);
  x_dev = cl::Buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                     sizeof(cl_float), x);
  saxpy(c_dev, a_dev, x_dev, b_dev, size);

  // Blocking read: returns once the results are in c.
  queue.enqueueReadBuffer(c_dev, CL_TRUE, 0, sizeof(cl_float) * size, c);
  printVector(c, size);

  // Arrays allocated with new[] must be freed with delete[].
  delete[] a; delete[] b; delete[] c; delete[] x;
  release();
  return EXIT_SUCCESS;
}
Listing 2.108: The main function
OpenCL buffers are initialized using data from arrays (vectors) a, b, c and x. For details about OpenCL memory, please refer to section 2.10.
The actual computation is executed using the function saxpy from listing 2.107. The results are then read back using the command queue method cl::CommandQueue::enqueueReadBuffer, which copies data from the buffer into host memory. Because the call is made with CL_TRUE as its second argument, it blocks until the transfer is complete. The results are then printed using the function printVector. The last step is to free the memory and release resources.