
7.2 Possible Future Developments

7.2.3 Speedups

It is no secret that the code used to generate the results of this thesis is not exactly fast, yet we claim to be able to do realtime applications at reasonable cost. The running times reported for video super resolution in Section 4.4.5 are representative of all three upscalings. We see that the initial low resolution flow computations in the multiresolution pyramid take up most of the running time, but we know that this step can be run in realtime for simpler variational flows (again we refer to the paper [11] by Bruhn et al.). To get a precise but computationally cheaper flow calculation, one could run such a simpler (linear) variational optical flow method or a block matching algorithm to get an initial estimate of the flow, and then refine it with the computationally more expensive and accurate method on only a few of the finest scale levels of the pyramid used in our current upscalers. Alternatively, one could run the cheaper method and then do only very few iterations of the expensive method on each level of the pyramid.

Whether or not we bring in an additional flow algorithm, we can also make our current variational flow method faster, and the same goes for the intensity calculations. We have used Gauss-Seidel solvers in our implementations, but switching to the faster successive over-relaxation (SOR) is just a matter of adding a few lines of code. However, it will take quite some time to test and tune the altered code and the new SOR weight parameter. Further speedups could be gained by switching to multigrid solvers, which have been used to successfully speed up variational flow calculations, e.g. in [11], but this requires a larger and not straightforward restructuring of our implementations. General optimizations of the code and the use of faster solvers will benefit the performance of both software and hardware implementations. Our variational methods run the same filter on all pixels, making them highly parallelizable, which is an advantage in hardware implementations but also in software implementations, since the growth in CPU speed is being replaced by a similar growth in the number of cores in CPUs.¹ Still, we do not find it highly likely that we will reach realtime performance in software in the near future without using dedicated and specially developed hardware. We do not consider the high level variational parallelization presented by Kohlberger et al. in [62] as something we want to apply to our upscaling algorithms, since it is not very efficient. We believe that a direct parallelization, splitting the frames into spatial regions with a small overlap between them and optimizing each region on its own processor with a shared memory for boundary data, is the way to reach realtime performance in hardware. The hardware we have our minds set on using is HDTV FPGAs able to run just under 100 mathematical operations per pixel on a 1080 × 1920 HD sequence in realtime.
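To illustrate how small the step from Gauss-Seidel to SOR is, the following Python sketch applies both to a generic linear system. It is not the solver used in this thesis: the function name `sor_solve`, the dense matrix formulation, and the parameter name `omega` for the SOR weight are assumptions made for illustration. Setting `omega = 1` reduces the update to plain Gauss-Seidel.

```python
import numpy as np

def sor_solve(A, b, omega=1.0, iters=100, x0=None):
    """Solve A x = b iteratively (illustrative sketch, hypothetical helper).

    omega = 1.0 gives plain Gauss-Seidel; other values give SOR.
    A is assumed to be a square NumPy array with a nonzero diagonal.
    """
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    for _ in range(iters):
        for i in range(n):
            # Sum over all off-diagonal terms, using already updated entries.
            sigma = A[i] @ x - A[i, i] * x[i]
            x_gs = (b[i] - sigma) / A[i, i]            # Gauss-Seidel update
            x[i] = (1.0 - omega) * x[i] + omega * x_gs  # the extra SOR line
    return x

def laplacian_1d(n):
    """Small tridiagonal test matrix (assumed example, not thesis data)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
```

The only difference between the two solvers is the relaxation line inside the inner loop. The best value of `omega` depends on the system being solved, which is why the weight parameter needs the testing and tuning mentioned above.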

¹ The classic Moore's law predicts a doubling in the number of transistors in CPUs every 18 months, but since cooling of single core processors is becoming a problem, the doubling in processing power every 18 months is now predicted to be obtained by growth in the number of cores in CPUs.

A very obvious way to gain a huge speedup is through integration of the forward and backward flow calculations. So far we have ignored the fact that the forward flow from frame n to frame n+1 and the backward flow from frame n+1 to frame n should be practically the same; they are just the direction specific warps or mappings from one frame to the other. The two might not be exactly the same, which is mainly due to numerical imprecision and (dis)occlusions, but when one of them has been computed at any given level of the multiresolution pyramid, its reverse will, with very few extra iterations, give the other, and we can then save a lot of processing time. Similar improvements are possible with warping from input sequence flow to output sequence flow in variational temporal super resolution.

We have so far processed all pixels equally, but as some types of image (sequence) content are very simple, we could use very simple processing on these types of content. There is a tendency in our work, as well as in many other works in the field of image (sequence) processing and analysis, to focus on the difficult cases in a given data set, but for large portions (mainly the smooth regions) of the data very simple processing would do. The problem lies in identifying the troublesome regions; methods for edge detection, motion detection on interlaced (and progressive) video, general segmentation, etc. are not fail safe and are often rather complex in themselves, and thus saving advanced and expensive processing might come at a high price of either other expensive processing or drops in output quality.
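Returning to the integration of forward and backward flow calculations described above, the relation between the two flows can be sketched as follows. This is only an illustration under simplifying assumptions, not the thesis implementation: the function name `init_backward_flow` is hypothetical, flow vectors are rounded to the nearest pixel, and pixels in frame n+1 that no forward vector reaches (disocclusions) are marked with NaN and would have to be filled in by the subsequent iterations of the flow solver.

```python
import numpy as np

def init_backward_flow(u_fwd, v_fwd):
    """Initial guess for the backward flow (frame n+1 -> n).

    u_fwd, v_fwd: horizontal and vertical forward flow (frame n -> n+1),
    given as 2D float arrays of equal shape. Hypothetical helper.
    """
    h, w = u_fwd.shape
    u_bwd = np.full((h, w), np.nan)
    v_bwd = np.full((h, w), np.nan)

    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest pixel in frame n+1 that each pixel of frame n is mapped to.
    xt = np.rint(xs + u_fwd).astype(int)
    yt = np.rint(ys + v_fwd).astype(int)
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)

    # The backward vector at the target is the negated forward vector.
    # Where several pixels map to the same target (occlusions), the last
    # one written wins; a real implementation would need a better rule.
    u_bwd[yt[inside], xt[inside]] = -u_fwd[inside]
    v_bwd[yt[inside], xt[inside]] = -v_fwd[inside]
    return u_bwd, v_bwd
```

The result is meant only as a starting point at a given pyramid level; the few extra iterations mentioned above would then correct for the rounding and handle the occluded and disoccluded pixels.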