5.3 Real-Time Inverse Tone Mapping using Graphics Hardware and Multi-Core
5.3.2 Point-Based Density Estimation
On the CPU, density estimation is evaluated in two steps. First, the samples around the evaluation point are collected using a spatial query. Second, Equation 5.14 is evaluated using the gathered samples. However, a spatial query is difficult to implement efficiently on a GPU, because hierarchical data structures do not map naturally onto current graphics hardware.
Figure 5.23: An example of density estimation on the GPU using Cone and Gaussian kernels applied to the Bristol Bridge LDR image: a) Density estimation using a Cone kernel. b) The Cone kernel with α = 1 used in a) in the domain [−1, 1] × [−1, 1]. c) Density estimation using a Gaussian kernel. d) The Gaussian kernel with σ = 1 used in c) in the domain [−3, 3] × [−3, 3].
GPUs are very fast at drawing primitives (the fill-rate is 38 Gpixels per second for G80 class boards [145]), especially simple primitives such as points. Furthermore, they have built-in instructions for texture fetching and filtering. Therefore, the evaluation of density estimation can be redesigned to exploit these capabilities: fast primitive rendering and texture support. The idea is to render a textured point for each light source sample, where the size of the point is equal to the radius $r_s$. This covers all pixels under the influence of that sample. The texture used for these points is a discretised version of the smoothing kernel used in Equation 5.14, which performs the filtering, see Figure 5.23. Furthermore, the accumulation of values where two points overlap is achieved by disabling the Z-buffer and enabling alpha blending [6].
Moreover, the implementation on the GPU is straightforward, because there are only two steps: loading the samples into a vertex buffer, and rendering points from the vertex buffer using a short shader; see Listing 3 for the shader source code.
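The splatting idea can be checked against a small CPU reference. The following is a minimal sketch in Python, not the GPU implementation: it assumes an un-normalised cone kernel that falls linearly from 1 at the sample to 0 at the radius, and it omits any normalisation factor from Equation 5.14. The names `cone_kernel` and `splat_samples` are hypothetical.

```python
import math

def cone_kernel(dx, dy, radius):
    """Assumed cone kernel: 1 at the centre, falling linearly to 0 at `radius`."""
    d = math.hypot(dx, dy)
    return max(0.0, 1.0 - d / radius)

def splat_samples(samples, width, height, radius):
    """Accumulate kernel-weighted sample powers into a density image.

    samples: list of (x, y, power) tuples in pixel coordinates.
    Each sample only touches the pixels inside its square footprint, and
    overlapping footprints add up, which plays the role of additive
    blending with the Z-buffer disabled.
    """
    density = [[0.0] * width for _ in range(height)]
    r = int(math.ceil(radius))
    for sx, sy, power in samples:
        cx, cy = int(round(sx)), int(round(sy))
        for py in range(max(0, cy - r), min(height, cy + r + 1)):
            for px in range(max(0, cx - r), min(width, cx + r + 1)):
                density[py][px] += power * cone_kernel(px - sx, py - sy, radius)
    return density

# Two overlapping samples on a 6x5 image with radius 2.
image = splat_samples([(2, 2, 1.0), (3, 2, 2.0)], 6, 5, 2.0)
```

The cost is proportional to the number of samples times the footprint area, with no spatial query, which is the property the point-based formulation relies on.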
The method described above is designed for 2D still images, but the extension to videos is straightforward and requires only a few modifications. The first step is to add samples from backward and forward frames to the vertex buffer used for rendering. Furthermore, each sample
sampler2D LDR_Image;
float inverse_gamma;
float4 poly1[3], poly2[3];

float4 PixelShaderCRF(float2 texCoords)
{
    float4 LDR=tex2D(LDR_Image, texCoords);
    float4 ret=float4(0.0f,0.0f,0.0f,1.0f);

    for(int i=0; i<3; i++)
    {
        //Inverse CRF using a quintic (Horner's scheme):
        //the four coefficients in poly1 first, then the two in poly2
        ret[i]=((poly1[i].x*LDR[i]+poly1[i].y)*LDR[i]+poly1[i].z)*
               LDR[i]+poly1[i].w;
        ret[i]=(ret[i]*LDR[i]+poly2[i].x)*LDR[i]+poly2[i].y;
    }
    return ret;
};

float4 PixelShaderGamma(float2 texCoords)
{
    return pow(tex2D(LDR_Image,texCoords),inverse_gamma);
};
Listing 1: The pixel shaders for the linearisation of the LDR image. PixelShaderCRF linearises an LDR image using a CRF fitted with a quintic polynomial whose coefficients are stored in poly1 and poly2. This method can be used for known cameras, when the CRF can be measured, or for unknown ones using the techniques in Section 3.1.1. PixelShaderGamma linearises an LDR image by applying the inverse gamma function, which can be employed for DVDs and television programs [89].
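The nested multiplications in PixelShaderCRF correspond to Horner's scheme for a degree-5 polynomial with six coefficients per channel. As a sketch, assuming the coefficients are ordered from the highest power to the constant term (poly1.x, poly1.y, poly1.z, poly1.w, poly2.x, poly2.y), the evaluation for one channel can be written in Python as follows. The function name and the coefficient values are illustrative only.

```python
def inverse_crf_quintic(v, coeffs):
    """Evaluate c0*v^5 + c1*v^4 + c2*v^3 + c3*v^2 + c4*v + c5 with Horner's scheme.

    coeffs: six coefficients, highest power first (assumed ordering).
    """
    result = 0.0
    for c in coeffs:
        result = result * v + c
    return result

# Illustrative coefficients only; a real CRF would be measured or estimated.
coeffs = [0.2, -0.1, 0.3, 0.4, 0.15, 0.05]
v = 0.5
horner = inverse_crf_quintic(v, coeffs)
# Direct evaluation of the same polynomial, for comparison.
direct = sum(c * v ** (5 - k) for k, c in enumerate(coeffs))
```

Horner's scheme needs five multiplications and five additions per channel, which is why the shader nests the products rather than computing each power separately.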
Figure 5.24: An example of a flattened 3D texture for storing a Gaussian temporal kernel using $r_t = 5$. From left to right: slices from the current frame to the last forward/backward frame. Note that the kernels for backward and forward frames are symmetric, so only one direction needs to be stored.
1  sampler2D LDR_Image;
2  sampler2D ExpandMap;
3
4  float Lwhite2, Lwhite2beta;
5  float2 SatValues;
6  float3 LUMINANCE={0.2126,0.7152,0.0722};
7
8  float4 PS_Renge_Interpolation(float2 texCoord): COLOR
9  {
10     //Fetch an LDR pixel
11     float4 LDR=tex2D(LDR_Image, texCoord);
12     float Ld=dot(LDR.xyz, LUMINANCE);
13     //Expansion
14     float tmp=Ld-1;
15     float Lw=Lwhite2beta*(tmp+sqrt(tmp*tmp+4*Ld/Lwhite2));
16     //Expand Map
17     float4 emap=tex2D(ExpandMap, texCoord);
18     //Linear Interpolation
19     float Lfinal=Lw*emap.x+Ld*(1-emap.x);
20     //Saturation
21     tmp=Ld/Lfinal;
22     tmp=tmp*(3*tmp-2*tmp*tmp);
23     float sat=SatValues.x*(1-tmp)+SatValues.y*tmp;
24     return pow((LDR/Ld),sat)*Lfinal;
25 };
Listing 2: The pixel shader for range expansion and linear interpolation. The first step is to load the current pixel of the LDR image from memory (line 11) and to calculate its luminance (line 12). Then, range expansion is performed using Equation 5.7 (lines 14-15); some constants are precomputed to speed up the calculation, for example Lwhite2beta = $L_{\text{white}}^2\beta$ and Lwhite2 = $L_{\text{white}}^2$. Afterwards, linear interpolation is performed (line 19) using the expand map previously loaded from memory (line 17). Finally, saturation is performed (lines 21-24) using Equation 5.10 and Equation 5.11, where SatValues.x = $S_{\text{Max}}$ and SatValues.y = $S_{\text{Min}}$.
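For checking shader output on the CPU, the per-pixel arithmetic of Listing 2 can be transliterated directly. The sketch below follows the listing step by step for the three colour channels only; it assumes a pixel with non-zero luminance, and the parameter values in the usage lines are arbitrary placeholders rather than values from the text. The function name `expand_pixel` is hypothetical.

```python
import math

LUMINANCE = (0.2126, 0.7152, 0.0722)

def expand_pixel(rgb, emap, lwhite2, lwhite2beta, sat_max, sat_min):
    """CPU transliteration of the Listing 2 arithmetic for one RGB pixel.

    rgb:  linearised LDR colour with non-zero luminance (assumption).
    emap: expand map value in [0, 1] at this pixel.
    """
    ld = sum(c * w for c, w in zip(rgb, LUMINANCE))          # luminance
    tmp = ld - 1.0
    lw = lwhite2beta * (tmp + math.sqrt(tmp * tmp + 4.0 * ld / lwhite2))
    lfinal = lw * emap + ld * (1.0 - emap)                   # linear interpolation
    t = ld / lfinal
    t = t * (3.0 * t - 2.0 * t * t)                          # 3t^2 - 2t^3
    sat = sat_max * (1.0 - t) + sat_min * t
    return tuple((c / ld) ** sat * lfinal for c in rgb)

# Placeholder parameters, chosen only to exercise the code path.
unexpanded = expand_pixel((0.5, 0.5, 0.5), 0.0, 4.0, 2.0, 1.2, 0.5)
expanded = expand_pixel((0.5, 0.5, 0.5), 1.0, 4.0, 2.0, 1.2, 0.5)
```

With an expand map value of 0 the pixel keeps its original luminance, and with a value of 1 it takes the fully expanded luminance, which matches the role of the interpolation step in the listing.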
1  sampler3D texFilter;
2
3  float PixelShaderPointSplatting(float mc_sample_val, float3 texCoords)
4  {
5      //Fetch the precomputed filter
6      float4 val=tex3D(texFilter,texCoords);
7      //Filter value times sample value
8      return val.x*mc_sample_val;
9  };
Listing 3: The pixel shader for 2D/3D density estimation. The shader simply draws all pixels of each textured point. The first step is to fetch the value of the texture that encodes the kernel (line 6). Then, this value is multiplied by the luminance power of the sample (line 8), which is stored in the vertex buffer. Note that texCoords.z refers to the frame in time; for 2D evaluation texCoords.z = 0.
in the vertex buffer has an attribute t ∈ [−5, 5], which identifies the frame of the sample, where t = 0 is the current frame. The second step is to discretise the 3D smoothing kernels. The idea is to store each time slice of the 3D kernel in a 3D texture, see Figure 5.24 for an example. When textured points are rendered, the correct time slice of the 3D texture is accessed using the extra attribute t added to each sample, see Listing 3 for the shader source code.
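The discretisation of the temporal kernel can be illustrated with a short Python sketch. It assumes a Gaussian kernel that is separable in space and time, sampled spatially on [−3σ, 3σ] in each axis, with one slice per frame offset 0, 1, ..., up to the temporal radius. The names `gaussian_kernel_slices` and `slice_index`, and the separate temporal standard deviation `sigma_t`, are assumptions made for illustration; how the slice index is mapped to the third texture coordinate depends on the texture setup and is not shown.

```python
import math

def gaussian_kernel_slices(size, sigma_s, sigma_t, rt):
    """Return rt + 1 slices (frame offsets 0..rt) of a space-time Gaussian.

    Each slice is a size x size grid (size >= 2) sampled on
    [-3*sigma_s, 3*sigma_s] in both axes. Only non-negative offsets are
    stored, because the kernel is symmetric in time.
    """
    slices = []
    for t in range(rt + 1):
        wt = math.exp(-(t * t) / (2.0 * sigma_t ** 2))       # temporal weight
        grid = []
        for j in range(size):
            y = (2.0 * j / (size - 1) - 1.0) * 3.0 * sigma_s
            row = []
            for i in range(size):
                x = (2.0 * i / (size - 1) - 1.0) * 3.0 * sigma_s
                row.append(wt * math.exp(-(x * x + y * y) / (2.0 * sigma_s ** 2)))
            grid.append(row)
        slices.append(grid)
    return slices

def slice_index(t):
    """Backward (t < 0) and forward (t > 0) frames share the slice |t|."""
    return abs(t)

slices = gaussian_kernel_slices(5, 1.0, 2.0, 5)
```

Storing only the slices for non-negative offsets halves the texture memory, at the cost of taking the absolute value of the frame attribute before the lookup.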
Note that the CPU version needs all frames in order to create the 3D-tree used for the spatial query. The GPU implementation needs only a few backward and forward frames, up to 5, which makes it suitable for real-time streaming of content.
Separate Joint Bilateral Up-Sampling
Joint bilateral up-sampling is a non-linear filter similar to a bilateral filter, see Appendix A, and, unlike a Gaussian filter, it is not separable. Pham and van Vliet [161] proposed an approximation of the filter that produces good results, indistinguishable from those of the full filter, when max($\sigma_R$, $\sigma_S$) is not too large. For example, a maximum variance of up to 8 to 10 can produce acceptable results [161, 30].
Figure 5.25: The scheme of Joint Bilateral Up-Sampling on the GPU. Two separate 1D joint bilateral filters are used to transfer edges from the LDR image to the expand map. The first filter is applied to columns and the second one to rows.
To speed up computations, the GPU version of the expansion algorithm uses a separate joint bilateral up-sampling inspired by the separation technique of Pham and van Vliet, see Figure 5.25 for the scheme of the up-sampling method. The approximation applies two 1D joint bilateral up-samplings: the first one is applied horizontally, and the second one is applied vertically to the result of the first.
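The two-pass structure can be sketched on the CPU as follows. This is a simplified illustration under several assumptions, not the GPU implementation: Gaussian spatial and range weights, an integer up-sampling factor `scale`, a single-channel guide image, and a guide value for each low-resolution sample taken at pixel `q * scale`. The names `jbu_1d` and `separable_jbu` are hypothetical.

```python
import math

def jbu_1d(low, guide, scale, sigma_s, sigma_r, radius=2):
    """Joint bilateral up-sampling of one scanline.

    low:   low-resolution values (length n).
    guide: full-resolution guide values (length n * scale).
    """
    n = len(low)
    out = []
    for p, gp in enumerate(guide):
        pl = p / scale                      # position in low-res coordinates
        centre = int(pl)
        num = den = 0.0
        for q in range(max(0, centre - radius), min(n, centre + radius + 1)):
            gq = guide[min(len(guide) - 1, q * scale)]
            ws = math.exp(-((pl - q) ** 2) / (2.0 * sigma_s ** 2))   # spatial
            wr = math.exp(-((gp - gq) ** 2) / (2.0 * sigma_r ** 2))  # range
            num += ws * wr * low[q]
            den += ws * wr
        out.append(num / den)
    return out

def separable_jbu(low, guide, scale, sigma_s, sigma_r):
    """Two 1D passes: horizontal first, then vertical on the intermediate result."""
    h_lo, h_hi, w_hi = len(low), len(guide), len(guide[0])
    # Pass 1: up-sample each low-res row, guided by the matching full-res row.
    rows = [jbu_1d(low[j], guide[min(h_hi - 1, j * scale)], scale, sigma_s, sigma_r)
            for j in range(h_lo)]
    # Pass 2: up-sample each column of the intermediate image.
    cols = []
    for i in range(w_hi):
        col_low = [rows[j][i] for j in range(h_lo)]
        col_guide = [guide[y][i] for y in range(h_hi)]
        cols.append(jbu_1d(col_low, col_guide, scale, sigma_s, sigma_r))
    # Transpose back to row-major order.
    return [[cols[i][y] for i in range(w_hi)] for y in range(h_hi)]

# A 1D example: the edge in the guide keeps the two low-res values apart.
line = jbu_1d([0.0, 1.0], [0.0, 0.0, 1.0, 1.0], 2, 1.0, 0.1)
```

Each pass visits only 2·radius + 1 low-resolution samples per output pixel, instead of the (2·radius + 1)² samples of the full 2D filter, which is where the speed-up of the separate version comes from.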
Light Sources Samples Clamping
On the GPU, it is difficult to implement clamping as in the CPU version presented in Equation 5.17. On the CPU this operation can be done immediately after the spatial radius query, because the query returns the number of samples in the volume, and this avoids evaluating the density estimation. With the point-based approach, however, the density estimation is performed as the first step, and it is not possible to know in advance the number of samples that will contribute to the estimate at a pixel. Therefore, these samples need to be counted during the density estimation, using a second channel in the kernel (equal to 1), and the counts need to be stored. Furthermore, a second pass is needed to perform the clamping. This process would reduce performance, because it requires an extra channel in both the expand map and the precomputed kernel, as well as a second pass. A more efficient solution in terms of computational time and memory is an approximation that performs clamping directly on the samples before rendering the textured points; this operation is called precomputed clamping. For each sample $x$, the set $P = \{\, y : \lVert x - y \rVert < r_s \,\}$ is computed. Then it is tested whether the number of samples in $P$ is enough for a density estimation in its area, $|P| \geq n_{sMin}$.
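The test on each sample can be sketched in a few lines of Python. This is a brute-force illustration with quadratic cost, assuming 2D sample positions and assuming that a sample counts itself as a member of its own set; the search radius is left as a parameter. The function name `precomputed_clamping` is hypothetical.

```python
import math

def precomputed_clamping(samples, radius, ns_min):
    """Keep a sample x only if the set {y : ||x - y|| < radius} has at
    least ns_min members.

    samples: list of (x, y) positions. Each sample is counted in its own
    neighbourhood (assumption). Brute force, O(n^2), for clarity only.
    """
    kept = []
    for sx, sy in samples:
        count = sum(1 for (ox, oy) in samples
                    if math.hypot(sx - ox, sy - oy) < radius)
        if count >= ns_min:
            kept.append((sx, sy))
    return kept

# Three clustered samples survive; the isolated one is removed.
kept = precomputed_clamping([(0, 0), (1, 0), (0, 1), (10, 10)], 2.0, 3)
```

Because the filtering happens before any point is rendered, the splatting pass and the kernel texture stay unchanged; a spatial grid or tree could replace the inner loop when the number of samples is large.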
Figure 5.26: An example of precomputed clamping: a) The precomputed clamping using $r_s$ removes the sample $y$ from $P$ for area $A$ in red. However, $A$ is influenced by $y$ during splatting. b) The use of $2r_s$ allows the sample $y$ to be included in $P$ for the precomputed clamping.
However, $P$ has to be calculated using $2r_s$ instead of $r_s$, because it is possible to exclude some