This section focuses on the creation and management of OpenCL contexts. An OpenCL context provides a consistent way of performing computations and can be seen as an abstraction for computation in heterogeneous environments. Every object needed for computation, such as devices, memory objects and command queues, is available through the context. The OpenCL standard defines a context as:
The environment within which the kernels execute and the domain in which synchronization and memory management are defined. The context includes a set of devices, the memory accessible to those devices, the corresponding memory properties and one or more command-queues used to schedule execution of a kernel(s) or operations on memory objects.[10]
Context
Every device within a context can communicate with the other devices in the same context via memory buffers, and these buffers can be shared. There is no direct method for transferring memory between different contexts. The same kernel code can be compiled for any or all devices within a context. This construction allows the computation to be fitted better to the existing hardware; for example, some algorithms execute faster on the CPU and others on the GPU. Figure 2.7 shows the different OpenCL objects in a context.
    // p is the cl::Platform being examined and index is its position in the
    // platform list; both are assumed to come from the enclosing platform loop.
    std::vector<cl::Device> devices;
    p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
    if (devices.size() > 0) {
      for (size_t j = 0; j < devices.size(); j++) {
        std::cout << "Platform " << index << " , device " << j << std::endl;
        std::cout << "CL_DEVICE_NAME: " <<
          devices[j].getInfo<CL_DEVICE_NAME>() << std::endl;
        std::cout << "CL_DEVICE_VENDOR: " <<
          devices[j].getInfo<CL_DEVICE_VENDOR>() << std::endl;
        std::cout << "CL_DEVICE_MAX_COMPUTE_UNITS: " <<
          devices[j].getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << std::endl;
        std::cout << "CL_DEVICE_MAX_CLOCK_FREQUENCY: " <<
          devices[j].getInfo<CL_DEVICE_MAX_CLOCK_FREQUENCY>() << std::endl;
        std::cout << "CL_DEVICE_LOCAL_MEM_SIZE: " <<
          devices[j].getInfo<CL_DEVICE_LOCAL_MEM_SIZE>() << std::endl;
        std::cout << "CL_DEVICE_GLOBAL_MEM_SIZE: " <<
          devices[j].getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>() << std::endl;
      }
    }
Listing 2.9: Getting the list of available devices for each platform along with some basic description – C++ code.
Figure 2.7: Multiple contexts defined on one host.
Even though buffers can be shared among different kinds of devices, the programmer has to be aware that the actual data may be transferred over the PCI-E bus or the system bus in order to deliver it to the different devices. The OpenCL standard provides only an abstraction for this construct, and different implementations can provide different solutions for it.
Note that Figure 2.7 shows multiple contexts on one host. This construction is possible because the OpenCL standard does not limit the number of contexts created by the host program. It is most useful when multiple platforms are available for different hardware.
2.5.1. Different Types of Devices
OpenCL defines the bit-field type cl_device_type. This type is used for device classification and takes the following values: CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR, CL_DEVICE_TYPE_DEFAULT, CL_DEVICE_TYPE_CUSTOM and CL_DEVICE_TYPE_ALL.
Every kind of device matches CL_DEVICE_TYPE_ALL, which is often used when all devices need to be selected. CL_DEVICE_TYPE_DEFAULT selects the default device in the system and is used mostly for prototyping software. Because the OpenCL implementation decides which device is the default, CL_DEVICE_TYPE_DEFAULT should not be used in production-quality software.
2.5.2. CPU Device Type
The CL_DEVICE_TYPE_CPU corresponds to the central processing unit located in the host. The standard defines it as:
An OpenCL device that is the host processor. The host processor runs the OpenCL implementations and is a single or multi-core CPU.[10]
CL_DEVICE_TYPE_CPU
This is the same device as the one executing the host code. It is able to execute the same kernels as any other OpenCL device, but it also provides a very convenient and efficient way for sharing memory buffers.
This device has direct access to the host memory. OpenCL buffers can use the host memory as storage that can be directly accessed from both the host and the device code.
The CPU device can even be used for ordinary sequential algorithms. In some circumstances, code executed via the OpenCL CPU device can run faster than the same algorithm implemented as host code, because the OpenCL compiler can tune the OpenCL program for the given hardware. An OpenCL implementation may be created by the same company that provides the CPU, so its optimizer can exploit features that are unavailable in other CPUs. If the host C compiler is run with optimization turned off, the compiler embedded in the OpenCL implementation will almost certainly produce faster code.
2.5.3. GPU Device Type
The OpenCL specification defines a GPU as:
An OpenCL device that is a GPU. By this we mean that the device can also be used to accelerate a 3D API such as OpenGL or DirectX.[10]
CL_DEVICE_TYPE_GPU
Devices of type CL_DEVICE_TYPE_GPU are usually graphics cards. They are not as fast as CPUs at executing sequential algorithms but can be very efficient in massively parallel tasks. By definition, a GPU can also accelerate a 3D API such as OpenGL or Direct3D.
This kind of device allows for highly parallel kernel execution. On the hardware available when this book was written, GPUs were capable of executing more than 3000 work-items in parallel. This allows standard linear algebra algorithms to run many times faster than on the CPU. For example, with one work-item per element of the result, matrices up to about 54x54 (2916 elements) can be multiplied in O(n) time, and the element-wise products of a dot product can be computed in O(1) time for vectors shorter than 3000 elements, leaving only the final summation.
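The matrix size quoted above follows from giving each element of the result matrix its own work-item: an n x n product then needs n*n work-items, each computing one dot product of length n. A minimal sketch of that arithmetic follows. The function name max_square_dim is chosen for this illustration, and the figure of 3000 concurrently running work-items is taken from the text.

```c
/* Largest n such that an n x n result matrix gets one work-item per
   element, i.e. the largest n with n * n <= work_items. */
unsigned max_square_dim(unsigned work_items) {
  unsigned n = 0;
  /* Widen before multiplying so the square cannot overflow. */
  while ((unsigned long long)(n + 1) * (n + 1) <= work_items) {
    n++;
  }
  return n;
}
```

With 3000 work-items the function yields 54, since 54 * 54 = 2916 fits and 55 * 55 = 3025 does not.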
Another feature of the GPU device is that it often allows for cooperation between OpenCL and graphics standards such as OpenGL and DirectX. Textures and other data can occupy regions of GPU memory that are shared between OpenCL programs and graphics APIs. This allows not only for generic computations but also for fast image processing with results displayed in real time.
There is also a class of devices of type CL_DEVICE_TYPE_GPU that do not provide video output to a monitor. This is contrary to intuition, but not to the definition. Some Tesla accelerators do not output graphics to a screen but do support OpenGL and DirectX, so by definition they are GPUs.
2.5.4. Accelerator
Accelerators are dedicated OpenCL processors. By definition, an accelerator is:
A dedicated OpenCL accelerator (for example, the IBM CELL Blade). These devices communicate with the host processor using a peripheral interconnect such as PCIe.[10]
CL_DEVICE_TYPE_ACCELERATOR
These are devices that do not support graphics acceleration and are not processors on the host; they are intended for computation only. The distinction from the other two device types is necessary because an accelerator does not use the same address space as the CPU on the host and cannot cooperate with graphics libraries.
2.5.5. Different Device Types – Summary
The programmer should consider the structure of the algorithm before choosing a particular device type. The CPU is best for traditional algorithms of the kind parallelized with OpenMP or threads. Massively parallel computations, such as operations on large matrices or particle calculations, can run fast on a GPU or an accelerator. If the application uses OpenCL for one of the graphics generation stages, then the GPU would be the best choice.
The reader should note that the device type does not provide any information about the performance of a given device. This information can be obtained, for example, by running a benchmark. Sometimes it is good practice to let the end-user decide which available device to use. The device type should be treated as information for the application about the capabilities and architecture of the hardware. The most common situation in which the device type is important is when OpenCL is used for graphic filters: a GPU would be best because it allows for cooperation with graphics APIs.
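Once the same benchmark has been timed on every candidate device, choosing a device reduces to picking the smallest measurement. The sketch below shows only that last step. The helper name pick_fastest_device and the idea of storing one run time per device in an array are assumptions made for this illustration, and no OpenCL calls are involved.

```c
#include <stddef.h>

/* Hypothetical helper: seconds[i] is the run time measured for the same
   benchmark kernel on candidate device i. Returns the index of the
   fastest device, or -1 when there are no measurements. */
int pick_fastest_device(const double *seconds, size_t count) {
  size_t i, best = 0;
  if (seconds == NULL || count == 0) return -1;
  for (i = 1; i < count; i++) {
    if (seconds[i] < seconds[best]) best = i;
  }
  return (int)best;
}
```

An application could use the result as the default choice and still let the end-user override it.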
2.5.6. Context Initialization – by Device Type
The simplest way of creating an OpenCL context is to use the function clCreateContextFromType. An example function that creates a context in this way can be seen in listing 2.10. This method is commonly used when prototyping, or when any device from the selected class is capable of computing the algorithm efficiently.
cl_context clCreateContextFromType ( const cl_context_properties *properties, cl_device_type device_type, void (CL_CALLBACK *pfn_notify) (const char *errinfo, const void *private_info, size_t cb, void *user_data), void *user_data, cl_int *errcode_ret );
Create an OpenCL context from a device type that identifies the specific device(s) to use.
clCreateContextFromType
This function allows for choosing a class of devices and including it in a newly created context. The device type (devtype in the example) can be one or more device types joined with the bitwise OR operator. For example, parameter devtype can be CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR or CL_DEVICE_TYPE_ALL. Different device types were described in section 2.5.1.
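Device types can be joined in this way because cl_device_type is declared as a bit-field in the OpenCL headers, with one bit per device class. The sketch below models only the flag arithmetic and runs without OpenCL. The constants repeat the values from the Khronos cl.h header and are defined only when that header has not been included, and the helper name device_matches is an assumption made for this illustration. A real implementation applies further rules, for instance for CL_DEVICE_TYPE_DEFAULT, so this is not a substitute for querying the platform.

```c
#include <stdint.h>

/* Stand-ins for builds without the OpenCL headers; the values mirror
   those in the Khronos cl.h. Real code should include <CL/cl.h>. */
#ifndef CL_DEVICE_TYPE_ALL
typedef uint64_t cl_device_type;
#define CL_DEVICE_TYPE_DEFAULT     (1 << 0)
#define CL_DEVICE_TYPE_CPU         (1 << 1)
#define CL_DEVICE_TYPE_GPU         (1 << 2)
#define CL_DEVICE_TYPE_ACCELERATOR (1 << 3)
#define CL_DEVICE_TYPE_CUSTOM      (1 << 4)
#define CL_DEVICE_TYPE_ALL         0xFFFFFFFF
#endif

/* Hypothetical helper: returns 1 when a device whose type bits are
   `actual` shares at least one bit with the mask `requested`. */
int device_matches(cl_device_type actual, cl_device_type requested) {
  return (actual & requested) != 0;
}
```

In this model a GPU matches the mask CL_DEVICE_TYPE_GPU | CL_DEVICE_TYPE_ACCELERATOR while a CPU does not, and both match CL_DEVICE_TYPE_ALL.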
This function also requires an array of cl_context_properties for context setup. This array specifies a list of context property names and their corresponding values. Each property name is immediately followed by the corresponding desired value. The list is terminated with 0 [10]. If this parameter is NULL, then the behavior of this function is implementation-specific, and some OpenCL implementations do not allow a NULL value here. In the example, the list contains only the platform identifier. OpenCL 1.2 accepts multiple other properties that allow for better cooperation between OpenCL and other libraries like OpenGL or DirectX. Some of these properties will be described in section 3.4.
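The layout of such a property list can be illustrated by walking it two entries at a time until a property name of 0 is found. The following sketch does this without calling OpenCL. The stand-in typedef and the value of CL_CONTEXT_PLATFORM mirror the Khronos cl.h header and are defined only when that header is absent, and the helper name count_context_properties is an assumption made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for builds without the OpenCL headers; real code should
   include <CL/cl.h> instead. */
#ifndef CL_CONTEXT_PLATFORM
typedef intptr_t cl_context_properties;
#define CL_CONTEXT_PLATFORM 0x1084
#endif

/* Hypothetical helper: counts the name/value pairs in a zero-terminated
   property list. Only the name slots are tested for the terminator, so
   a value that happens to be 0 does not end the list early. */
size_t count_context_properties(const cl_context_properties *props) {
  size_t pairs = 0;
  if (props == NULL) return 0;
  while (props[2 * pairs] != 0) {
    pairs++;
  }
  return pairs;
}
```

A list holding only the platform identifier therefore occupies three array elements: the name, its value and the terminating 0.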
The platform ID is obtained using the function clGetPlatformIDs, which returns the list of available platforms. In the example, it writes the platform list into the array platforms. The array holds i+1 entries, because the platform is selected by its index i in the platform list.
Note that the platform provides an environment for contexts. It is impossible to include devices from different platforms within one context.
The last three parameters for the function clCreateContextFromType are responsible for selecting the callback function and for the error code. The error code is not checked in the example for reasons of simplicity. Error handling is described in section 2.7.
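    #include <stdlib.h>
    #include <CL/cl.h>

    /* Creates a context with all devices of type devtype on platform number i. */
    cl_context createContextFromType(cl_device_type devtype, int i) {
      cl_context context = NULL;
      cl_platform_id *platforms;
      cl_uint platforms_n = 0;
      cl_context_properties cps[3] = { 0, 0, 0 };
      platforms = (cl_platform_id *)malloc(sizeof(cl_platform_id) * (i + 1));
      if (platforms == NULL) return NULL;
      /* Request at most i + 1 IDs; platforms_n receives the number available. */
      clGetPlatformIDs(i + 1, platforms, &platforms_n);
      if (platforms_n > (cl_uint)i) { /* platform number i exists */
        cps[0] = CL_CONTEXT_PLATFORM;
        cps[1] = (cl_context_properties)(platforms[i]);
        /* No callback and no user data; the error code is not requested. */
        context = clCreateContextFromType(cps, devtype, NULL, NULL, NULL);
      }
      free(platforms);
      return context;
    }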
Listing 2.10: OpenCL context initialization using clCreateContextFromType
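    #include <stdlib.h>
    #include <CL/cl.h>

    /* Creates a context holding device number didx of platform number pidx. */
    cl_context createContextFromIndex(int pidx, int didx) {
      cl_context context = NULL;
      cl_platform_id *platforms;
      cl_device_id *devices;
      cl_uint platforms_n = 0, devices_n = 0;
      cl_context_properties cps[3] = { 0, 0, 0 };
      devices = (cl_device_id *)malloc(sizeof(cl_device_id) * (didx + 1));
      platforms = (cl_platform_id *)malloc(sizeof(cl_platform_id) * (pidx + 1));
      if (devices != NULL && platforms != NULL) {
        clGetPlatformIDs(pidx + 1, platforms, &platforms_n);
        if (platforms_n > (cl_uint)pidx) { /* platform number pidx exists */
          clGetDeviceIDs(platforms[pidx], CL_DEVICE_TYPE_ALL, didx + 1, devices, &devices_n);
        }
        if (devices_n > (cl_uint)didx) { /* device number didx exists */
          cps[0] = CL_CONTEXT_PLATFORM;
          cps[1] = (cl_context_properties)(platforms[pidx]);
          /* One device, no callback, no user data, no error code. */
          context = clCreateContext(cps, 1, &devices[didx], NULL, NULL, NULL);
        }
      }
      free(devices);
      free(platforms);
      return context;
    }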
Listing 2.11: OpenCL context initialization using clCreateContext
2.5.7. Context Initialization – Selecting Particular Device
Another way of creating an OpenCL context is by using the function clCreateContext. This function takes the list of devices and the selected platform to produce a configured OpenCL context.
cl_context clCreateContext ( const cl_context_properties *properties, cl_uint num_devices, const cl_device_id *devices, void (CL_CALLBACK *pfn_notify) (const char *errinfo, const void *private_info, size_t cb, void *user_data), void *user_data, cl_int *errcode_ret );
Creates an OpenCL context.
clCreateContext
The example helper function createContextFromIndex, which creates a context with one device, can be seen in listing 2.11. In this example, parameter pidx is the index of the platform on the platform list and parameter didx is the index of the device within the platform. The OpenCL standard does not specify whether the order of device IDs remains the same between queries. In the example, a stable order is assumed for simplicity's sake, and in most OpenCL implementations it works this way. For production software, the best way is to keep the name of the selected device in the configuration file and compare it with the devname obtained for each device from calls such as:
    clGetDeviceInfo(devID[i], CL_DEVICE_NAME, 1024, devname, NULL);
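The comparison itself is an ordinary string search over the device names. A minimal sketch is given below, assuming the names have already been read with clGetDeviceInfo into an array of strings. The helper name find_device_by_name and the array layout are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: names[i] is assumed to hold the CL_DEVICE_NAME
   string already read for device i, and wanted is the name stored in
   the configuration file. Returns the index of the first device whose
   name matches exactly, or -1 when no device matches. */
int find_device_by_name(const char *const *names, size_t count,
                        const char *wanted) {
  size_t i;
  if (names == NULL || wanted == NULL) return -1;
  for (i = 0; i < count; i++) {
    if (names[i] != NULL && strcmp(names[i], wanted) == 0) return (int)i;
  }
  return -1;
}
```

The returned index can then take the place of a fixed device index, and a result of -1 tells the application that the configured device is no longer present.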
In listing 2.11, the list of available platform IDs is obtained using clGetPlatformIDs. It stores this list in the array platforms, which is of size pidx+1.
The array devices is filled with device IDs for the selected platform, using function clGetDeviceIDs. This list is of size didx+1.
So the selected IDs of the platform and the device are platforms[pidx] and devices[didx]. These values are used as parameters to the function clCreateContext. The ID platforms[pidx] is passed indirectly using the array cps of type cl_context_properties. The first parameter of the clCreateContext function is the same as in the function clCreateContextFromType: the context properties. The second and third parameters give the number of devices (one in this example) and a pointer to the selected device ID. The last three parameters can be used for error handling but in the example are set to NULL. Error handling is described in section 2.7.
After the context is created, the selected devices in the context remain the same. There is no way to add a device to a context or to remove any device from it.
2.5.8. Getting Information about Context
The function clGetContextInfo is used for getting information about an OpenCL context. An example that displays the list of devices in a given context can be seen in listing 2.12. The function clGetContextInfo takes an enumeration constant that determines which parameter to retrieve. In the example, there are two of them: CL_CONTEXT_NUM_DEVICES and CL_CONTEXT_DEVICES. The first one retrieves the number of devices included in the context, and the second retrieves the list of device IDs.
cl_int clGetContextInfo ( cl_context context, cl_context_info param_name, size_t param_value_size, void *param_value, size_t *param_value_size_ret);
Query information about a context.
clGetContextInfo
The number of devices in a context is used for setting the size of the array that stores the list of device IDs. The loop displays some basic information about each device using clGetDeviceInfo.
    #include <stdio.h>
    #include <stdlib.h>
    #include <CL/cl.h>

    void listDevicesInContext(cl_context context) {
      cl_uint i;
      cl_device_id *devices;
      cl_uint devices_n = 0;
      /* First query: the number of devices in the context. */
      clGetContextInfo(context, CL_CONTEXT_NUM_DEVICES, sizeof(devices_n), &devices_n,
                       NULL);
      if (devices_n > 0) {
        char ret[1024];
        devices = (cl_device_id *)malloc(sizeof(cl_device_id) * devices_n);
        if (devices == NULL) return;
        /* Second query: the device IDs themselves. */
        clGetContextInfo(context, CL_CONTEXT_DEVICES, devices_n * sizeof(cl_device_id),
                         devices, NULL);
        for (i = 0; i < devices_n; i++) {
          printf("Device %u\n", (unsigned)i);
          clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(ret), ret, NULL);
          printf("CL_DEVICE_NAME: %s\n", ret);
          clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR, sizeof(ret), ret, NULL);
          printf("CL_DEVICE_VENDOR: %s\n", ret);
        }
        free(devices);
      }
    }
Listing 2.12: Listing devices in an OpenCL context