2.10 OpenCL Memory

2.10.3 Global and Constant Memory Allocation – Host Code

OpenCL gives the host program a high level of control over memory allocation and management, which allows for better utilization of hardware resources.

Two types of buffers can be allocated from the host program: the memory buffer and the texture buffer. The first is used in most cases; the second is used mostly for cooperation with OpenGL or Direct3D and will be discussed in chapter 3.4. OpenCL uses the cl_mem type to identify memory buffers. The C++ wrapper uses cl::Buffer for this purpose. Examples of allocating different memory buffers are shown in listings 2.42 and 2.43.

cl_mem clCreateBuffer ( cl_context context, cl_mem_flags flags, size_t size, void *host_ptr, cl_int *errcode_ret);

Creates a buffer object.

clCreateBuffer

The function clCreateBuffer creates a memory buffer. It allocates memory within a context that can be accessed by kernels. The C++ equivalent is the cl::Buffer constructor.

    // (1)
    cl_float example_array[] = { 1, 2, 3, 4 };
    cl_float example_ret_arr[] = { 0, 0, 0, 0 };
    // (2)
    cl_mem new_buffer_dev;
    cl_mem copied_buffer_dev;
    cl_mem in_host_buffer_dev;
    // (3)
    copied_buffer_dev = clCreateBuffer(context,
        CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(cl_float) * 4, example_array,
        &ret);
    // (4)
    new_buffer_dev = clCreateBuffer(context, 0, sizeof(cl_float) * 4, NULL, &ret);
    // (5)
    in_host_buffer_dev = clCreateBuffer(context,
        CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, sizeof(cl_float) * 4, example_array,
        &ret);

Listing 2.42: Memory buffers allocation – the C code.
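Listing 2.42 stores the status of every call in ret but never inspects it. The following is a minimal sketch of how that status could be checked. The helper names buffer_status_name and buffer_created are hypothetical and are not part of OpenCL. The numeric codes are the ones defined in the standard OpenCL header; they are repeated here behind #ifndef only so that the sketch compiles without the OpenCL SDK, and real host code should include the OpenCL header instead.

```c
#include <stdio.h>

/* Assumption: values copied from the Khronos cl.h header so that this
   sketch builds on its own; real code should include the OpenCL header. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                          0
#define CL_MEM_OBJECT_ALLOCATION_FAILURE  (-4)
#define CL_OUT_OF_RESOURCES               (-5)
#define CL_OUT_OF_HOST_MEMORY             (-6)
#define CL_INVALID_VALUE                 (-30)
#define CL_INVALID_CONTEXT               (-34)
#define CL_INVALID_HOST_PTR              (-37)
#define CL_INVALID_BUFFER_SIZE           (-61)
#endif

/* Hypothetical helper: maps a status code that clCreateBuffer can report
   to its symbolic name. */
static const char *buffer_status_name(int status)
{
    switch (status) {
    case CL_SUCCESS:                       return "CL_SUCCESS";
    case CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case CL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
    case CL_OUT_OF_HOST_MEMORY:            return "CL_OUT_OF_HOST_MEMORY";
    case CL_INVALID_VALUE:                 return "CL_INVALID_VALUE";
    case CL_INVALID_CONTEXT:               return "CL_INVALID_CONTEXT";
    case CL_INVALID_HOST_PTR:              return "CL_INVALID_HOST_PTR";
    case CL_INVALID_BUFFER_SIZE:           return "CL_INVALID_BUFFER_SIZE";
    default:                               return "unknown OpenCL status";
    }
}

/* Hypothetical helper: returns 1 if the buffer was created, otherwise
   prints a diagnostic to stderr and returns 0. */
static int buffer_created(const char *label, int status)
{
    if (status == CL_SUCCESS)
        return 1;
    fprintf(stderr, "%s: clCreateBuffer failed with %s (%d)\n",
            label, buffer_status_name(status), status);
    return 0;
}
```

In the listing, a call such as buffer_created("copied_buffer_dev", ret) could follow each clCreateBuffer call, with the host code releasing its resources and stopping when the result is 0.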

    // (1)
    cl_float example_array[] = { 1, 2, 3, 4 };
    cl_float example_ret_arr[] = { 0, 0, 0, 0 };
    // (2)
    cl::Buffer new_buffer_dev;
    cl::Buffer copied_buffer_dev;
    cl::Buffer in_host_buffer_dev;
    // (3)
    copied_buffer_dev = cl::Buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
        sizeof(cl_float) * 4, example_array);
    // (4)
    new_buffer_dev = cl::Buffer(context, 0, sizeof(float) * 4);
    // (5)
    in_host_buffer_dev = cl::Buffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
        sizeof(cl_float) * 4, example_array);

Listing 2.43: Memory buffers allocation – the C++ code.

cl::Buffer::Buffer ( const Context& context, cl_mem_flags flags, ::size_t size, void * host_ptr = NULL, cl_int * err = NULL );

The constructor that creates an OpenCL buffer object.

cl::Buffer::Buffer

Both clCreateBuffer and the cl::Buffer constructor allocate memory accessible by kernels. These memory regions are created within a context, and kernels cannot access memory that has been allocated in a different context. The actual location of a memory region allocated in this way is determined by a bitfield of type cl_mem_flags, which is the second parameter of both functions. If none of the flags is set, the default applies: read-write memory allocated in device memory. The available bitfield values are:

• CL_MEM_READ_WRITE: This is the default. It means that an allocated memory buffer is readable and writable by kernels.

• CL_MEM_WRITE_ONLY: The buffer is write-only from the kernel's point of view. The host code can still both read from it and write to it.

• CL_MEM_READ_ONLY: The memory buffer is read-only for kernels. This type of buffer is recommended for storing constant values that are passed to kernels as constant memory. The flag gives the OpenCL platform a hint about the buffer's purpose.

• CL_MEM_USE_HOST_PTR: The memory buffer uses the given host memory as its storage space. The OpenCL implementation can, however, cache the data in this buffer, so the contents of the host memory can be incoherent with the data visible to kernels. To ensure coherency, the programmer can use, for example, clEnqueueReadBuffer and clEnqueueWriteBuffer. This flag is very useful for algorithms run by the CPU: the buffers are used directly by the kernels, so no memory transfers between host memory and device memory are needed.

• CL_MEM_ALLOC_HOST_PTR: This value instructs OpenCL to allocate the memory buffer in host-accessible memory. It is also very useful for cooperation with the CPU. CL_MEM_ALLOC_HOST_PTR and CL_MEM_USE_HOST_PTR are mutually exclusive.

• CL_MEM_COPY_HOST_PTR: This value is also very commonly used. It means that OpenCL must copy the data pointed to by the fourth parameter (host_ptr) into the newly created memory buffer. It appears often in examples because it helps to keep the code compact.

• CL_MEM_HOST_WRITE_ONLY: This flag specifies that the host will only write to the memory object (using OpenCL APIs that enqueue a write or a map for write). This can be used to optimize write access from the host (e.g., enable write-combined allocations for memory objects for devices that communicate with the host over a system bus such as PCIe).

• CL_MEM_HOST_READ_ONLY: This flag specifies that the host will only read the memory object (using OpenCL APIs that enqueue a read or a map for read). CL_MEM_HOST_WRITE_ONLY and CL_MEM_HOST_READ_ONLY are mutually exclusive.

• CL_MEM_HOST_NO_ACCESS: This flag specifies that the host will not read or write the memory object. Note that this flag is mutually exclusive with CL_MEM_HOST_WRITE_ONLY and CL_MEM_HOST_READ_ONLY.
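The mutual-exclusion rules in the list above can be checked on the host before a buffer is created. The following is a minimal sketch of such a pre-check, not a replacement for the validation done by the OpenCL implementation. The helper names at_most_one and buffer_flags_ok are hypothetical. Besides the rules stated above, the sketch assumes three further rules from the OpenCL specification of clCreateBuffer: the three kernel-access flags exclude each other, CL_MEM_USE_HOST_PTR cannot be combined with CL_MEM_COPY_HOST_PTR, and host_ptr must be non-NULL exactly when CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR is set. The bit values are copied from the standard OpenCL header behind #ifndef so that the sketch compiles without the OpenCL SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit values copied from the Khronos cl.h header so that this
   sketch builds on its own; real code should include the OpenCL header. */
#ifndef CL_MEM_READ_WRITE
#define CL_MEM_READ_WRITE       (1u << 0)
#define CL_MEM_WRITE_ONLY       (1u << 1)
#define CL_MEM_READ_ONLY        (1u << 2)
#define CL_MEM_USE_HOST_PTR     (1u << 3)
#define CL_MEM_ALLOC_HOST_PTR   (1u << 4)
#define CL_MEM_COPY_HOST_PTR    (1u << 5)
#define CL_MEM_HOST_WRITE_ONLY  (1u << 7)
#define CL_MEM_HOST_READ_ONLY   (1u << 8)
#define CL_MEM_HOST_NO_ACCESS   (1u << 9)
#endif

/* Returns 1 when zero or one bit of `bits` is set. */
static int at_most_one(uint64_t bits)
{
    return (bits & (bits - 1)) == 0;
}

/* Hypothetical pre-check: returns 1 if the flag combination and host
   pointer look acceptable for clCreateBuffer, 0 otherwise. */
static int buffer_flags_ok(uint64_t flags, const void *host_ptr)
{
    uint64_t kernel_access = flags &
        (CL_MEM_READ_WRITE | CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY);
    uint64_t host_access = flags &
        (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_NO_ACCESS);
    int needs_ptr =
        (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR)) != 0;

    /* Kernel-side access flags exclude each other. */
    if (!at_most_one(kernel_access))
        return 0;
    /* Host-side access flags exclude each other. */
    if (!at_most_one(host_access))
        return 0;
    /* USE_HOST_PTR excludes both ALLOC_HOST_PTR and COPY_HOST_PTR. */
    if ((flags & CL_MEM_USE_HOST_PTR) &&
        (flags & (CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR)))
        return 0;
    /* A host pointer must be supplied exactly when a flag refers to it. */
    if (needs_ptr != (host_ptr != NULL))
        return 0;
    return 1;
}
```

All three calls from listing 2.42 pass this check, while a combination such as CL_MEM_USE_HOST_PTR | CL_MEM_ALLOC_HOST_PTR is rejected.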

Note that the distinction between constant and global memory applies only to the way a kernel sees them. From the host program, both are just memory buffers that can be accessed. The kernel, on the other hand, has no knowledge of the physical location of a memory buffer; it does not know, for example, whether the buffer was created with the CL_MEM_USE_HOST_PTR flag.

There is another way for the host code to allocate memory for a kernel: when setting kernel parameters. Consider a kernel that takes a local memory pointer as a parameter. The host code sets that parameter by passing the size of the buffer and a NULL pointer. The OpenCL implementation allocates the appropriate amount of local memory before the kernel is executed. This is described in detail in section 2.11.2.
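As a small illustration of the size passed for such a local memory parameter, the sketch below computes the byte count for a local array and guards against overflow. The helper name local_arg_bytes is hypothetical, and the kernel, argument index and element count in the usage comment are assumptions made only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to request for a local array of
   `count` elements, each `elem_size` bytes long. Returns 0 if the
   multiplication would overflow size_t. */
static size_t local_arg_bytes(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return 0;
    return count * elem_size;
}

/* Usage in host code, assuming a valid cl_kernel `kernel` whose argument 1
   is declared as `__local float *scratch`:

       size_t bytes = local_arg_bytes(64, sizeof(cl_float));
       cl_int ret = clSetKernelArg(kernel, 1, bytes, NULL);

   The NULL pointer tells OpenCL that only the size is supplied. */
```

A result of 0 should be treated as an error by the caller rather than passed on, since a local memory argument needs a nonzero size.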