CHAPTER 4 – IMAGE-PROCESSING THEORY AND IMPLEMENTATION
4.2. Methods of Improving Results
As with most computer vision tasks, the results of this system can be affected by the testing environment in both positive and negative ways. Lighting conditions may change, or haze from smoke or fog may obscure the target structure; either can make the cross-correlation coefficient calculations incorrect, which in turn could produce an inaccurate detection of movement of the entire structure. The reliability of the results can be improved through choices made in both the hardware and the software of the system.
4.2.1. Hardware Considerations
There are several considerations to be made when selecting the hardware to be used and placing it in the measurement environment. The smallest unit of measurement in an image is a pixel. A single pixel must be translated into a unit of measurement that applies to the real-world structure in the image. This thesis uses the measurement ratio of millimeters per pixel (mm/px) to represent distances within an image. Reducing this ratio to the smallest value possible will increase the accuracy of the measurements taken. This makes selecting an appropriate camera one of the most important factors considered when setting up the system in the field.
Higher-resolution cameras capture more pixels, which reduces the mm/px ratio. Placing the camera close to the target structure is also important, as this narrows the field of view of the image. The narrower field of view yields a more detailed image of a smaller area, which also reduces the mm/px ratio. Where the camera cannot be placed close to the target structure, an optical zoom attachment may be warranted. Optical zoom is preferred to digital zoom because it preserves the data being collected, whereas digital zoom only interpolates pixels that were already captured. A final method is to use a widescreen image format to collect data along the axis most likely to see translational movement. A widescreen format has an aspect ratio of 16:9, which makes one dimension of the image much larger than the other. If the long dimension of the image is lined up with the axis most likely to see translational movement, more data can be collected along that axis because more pixels lie along that direction of the image [12]. This requires some forethought by the civil engineer setting up the camera.
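The effect of camera resolution and placement on the mm/px ratio can be illustrated with a short calculation. The sketch below is only an illustration: the function name `mm_per_pixel` and the scene widths and pixel counts are hypothetical values chosen for the example, not measurements from this thesis.

```python
def mm_per_pixel(field_of_view_mm: float, pixels_along_axis: int) -> float:
    """Real-world distance covered by one pixel along one image axis."""
    if pixels_along_axis <= 0:
        raise ValueError("pixels_along_axis must be positive")
    return field_of_view_mm / pixels_along_axis


# Hypothetical setup: the image spans 2 m of the structure across 1920 pixels.
baseline = mm_per_pixel(2000.0, 1920)

# Moving the camera closer (or zooming optically) so the image spans only 1 m.
closer = mm_per_pixel(1000.0, 1920)

# Keeping the 2 m view but doubling the pixel count along the axis.
sharper = mm_per_pixel(2000.0, 3840)

print(f"baseline: {baseline:.3f} mm/px")
print(f"closer:   {closer:.3f} mm/px")
print(f"sharper:  {sharper:.3f} mm/px")
```

Under these assumed numbers, halving the field of view and doubling the pixel count along the axis both halve the mm/px ratio.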
4.2.2. Software Considerations
To compensate for environmental changes, many cameras today automatically adjust for varying levels of brightness. These adjustments may introduce slight variations into the cross-correlation results. The simplest way around this issue is to preprocess each image before applying the cross-correlation process. The two most common methods are converting the color space to the hue-saturation-intensity (HSI) format and normalizing the image.
4.2.2.1. Converting the Color Space to HSI
The majority of cameras save images in the RGB format, which suits display on a computer since most display adapters are designed to accept RGB. However, the purpose of computer vision is not to display the image, but to analyze it and report on its features. In the RGB color space, each channel (red, green, and blue) is stored in 8 bits to form a 24-bit color value. If the brightness of the image changes, all three channel values can vary dramatically. Varying brightness can be treated like noise on a signal, and in the RGB color space this noise can have a very large effect on the quality of the image. The solution to this problem is to convert the image to the HSI color space. On the color wheel, the hue angle and the saturation value form a polar coordinate that describes a color, while the intensity value defines the brightness of that color. The hue angle and saturation value change minimally under varying brightness, while the intensity value changes considerably. This confines the noise caused by brightness to one variable instead of three. In some computer vision tasks, the intensity value can be ignored entirely, which reduces the error caused by varying brightness. Normalizing the image can further reduce the amount of noise [12].
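The conversion described above can be sketched for a single pixel as follows. This is a minimal illustration using the commonly published geometric RGB-to-HSI formulas, not necessarily the implementation used in this thesis; the function name `rgb_to_hsi` is hypothetical, and the brightness change is modeled, as an assumption, by scaling all three channels by the same factor.

```python
import math
from typing import Tuple


def rgb_to_hsi(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert one 8-bit RGB pixel to (hue in degrees, saturation 0..1, intensity 0..1)."""
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    intensity = (rn + gn + bn) / 3.0
    if intensity == 0.0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined, report 0
    saturation = 1.0 - min(rn, gn, bn) / intensity
    num = 0.5 * ((rn - gn) + (rn - bn))
    den = math.sqrt((rn - gn) ** 2 + (rn - bn) * (gn - bn))
    if den == 0.0:
        return 0.0, saturation, intensity  # grey: hue is undefined, report 0
    # Clamp to guard against rounding just outside acos's domain.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    hue = theta if bn <= gn else 360.0 - theta
    return hue, saturation, intensity


# The same surface color under full and (assumed) half brightness.
bright = rgb_to_hsi(200, 100, 50)
dim = rgb_to_hsi(100, 50, 25)
print("bright:", bright)
print("dim:   ", dim)
```

In this idealized case all three RGB values change, yet only the intensity component of the HSI triple differs between the two pixels; hue and saturation are unchanged.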
4.2.2.2. Normalizing
Normalization, or contrast stretching, is a process that changes the range of pixel intensity values across an image to correct poor contrast caused by glare or darkness. This process brings the image into a range that is more familiar or “normal” compared to the template image. Normalization is not a very processor-intensive task, but it can greatly improve the cross-correlation results in poor lighting conditions. If the intensity range of an image is from 90 to 200 and the desired range is from 0 to 255, then the normalizing process begins by subtracting 90 from each pixel’s intensity value, making the range 0 to 110. Then, each pixel’s intensity value is multiplied by 255/110 and rounded to the nearest integer. The resulting pixel intensity range is the desired 0 to 255. This process allows two similar images taken under different brightness conditions to be cross-correlated correctly [12].
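The linear stretch described above can be written in a few lines. The sketch below operates on a flat list of intensity values for simplicity; the function name `stretch_contrast` and the three sample values are hypothetical and serve only to reproduce the 90-to-200 example.

```python
import math
from typing import List


def stretch_contrast(pixels: List[int], new_min: int = 0, new_max: int = 255) -> List[int]:
    """Linearly map the intensity range of `pixels` onto [new_min, new_max]."""
    old_min, old_max = min(pixels), max(pixels)
    if old_max == old_min:
        return [new_min for _ in pixels]  # flat image: nothing to stretch
    scale = (new_max - new_min) / (old_max - old_min)
    # Subtract the old minimum, scale, then round half up to the nearest integer.
    return [new_min + math.floor((p - old_min) * scale + 0.5) for p in pixels]


# Intensities spanning 90..200, so the span is 200 - 90 = 110 and the scale is 255/110.
sample = [90, 112, 200]
print(stretch_contrast(sample))  # [0, 51, 255]
```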
4.2.3. Methods of Speeding Up Processing
4.2.3.1. HSI Color Space
Moving to the HSI color space not only improves the cross-correlation results, but can also improve the processing speed. It is possible to normalize the image while still in the RGB color space, but this requires manipulating all three color channels and using nonlinear equations. In the HSI color space, only the intensity component has to be changed, using a simple linear equation. This makes normalizing in the HSI color space several times faster than in the RGB color space.
4.2.3.2. Reducing Resolution
As image sizes increase, so does the time needed to search them for the template image. One way to reduce this time is to reduce the resolution of both the image being searched and the template. If both images are reduced to 50% of their original width and height, each contains 75% fewer pixels, and the cross-correlation processing time falls by about 75% or more, depending on how the correlation is computed. This method, however, sacrifices the accuracy of the cross-correlation result, so it is best used to approximate the location of the template within the image. Using this approximation, the original template and image can then be cross-correlated at full resolution, but only in the area around the approximation. This method reduces the required processing time significantly, but carries a certain amount of risk: lower resolution increases the chance of an incorrect approximation [12].
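The two-stage search described above can be sketched as follows. This is a minimal pure-Python illustration rather than the implementation used in this thesis: the helper names (`ncc`, `best_match`, `halve`, `coarse_to_fine`), the 2x2 averaging used to halve the resolution, the refinement margin of two pixels, and the synthetic test image are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def ncc(image: Image, template: Image, top: int, left: int) -> float:
    """Normalized cross-correlation coefficient of the template at (top, left)."""
    h, w = len(template), len(template[0])
    patch = [image[top + r][left + c] for r in range(h) for c in range(w)]
    tmpl = [template[r][c] for r in range(h) for c in range(w)]
    pm, tm = sum(patch) / len(patch), sum(tmpl) / len(tmpl)
    num = sum((p - pm) * (t - tm) for p, t in zip(patch, tmpl))
    den = math.sqrt(sum((p - pm) ** 2 for p in patch) * sum((t - tm) ** 2 for t in tmpl))
    return num / den if den > 0 else 0.0  # flat patches carry no information


def best_match(image: Image, template: Image, rows, cols) -> Tuple[float, int, int]:
    """Highest-scoring (score, top, left) among the candidate positions."""
    return max((ncc(image, template, r, c), r, c) for r in rows for c in cols)


def halve(img: Image) -> Image:
    """Reduce resolution to 50% per axis by averaging each 2x2 block."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]


def coarse_to_fine(image: Image, template: Image, margin: int = 2) -> Tuple[float, int, int]:
    # Stage 1: approximate the location on the half-resolution images.
    small_img, small_tmpl = halve(image), halve(template)
    _, r0, c0 = best_match(small_img, small_tmpl,
                           range(len(small_img) - len(small_tmpl) + 1),
                           range(len(small_img[0]) - len(small_tmpl[0]) + 1))
    # Stage 2: refine at full resolution, only around the scaled-up estimate.
    max_r, max_c = len(image) - len(template), len(image[0]) - len(template[0])
    rows = range(max(0, 2 * r0 - margin), min(max_r, 2 * r0 + margin) + 1)
    cols = range(max(0, 2 * c0 - margin), min(max_c, 2 * c0 + margin) + 1)
    return best_match(image, template, rows, cols)


# Synthetic 16x16 image: zero background with one 4x4 textured target.
image = [[0.0] * 16 for _ in range(16)]
for i in range(4):
    for j in range(4):
        image[7 + i][9 + j] = 1.0 + 4 * i + j
template = [row[8:14] for row in image[6:12]]  # 6x6 crop whose top-left is (6, 8)

score, top, left = coarse_to_fine(image, template)
print(top, left, round(score, 3))  # 6 8 1.0
```

In this example the full-resolution stage evaluates only 25 candidate positions instead of all 121. The risk noted above still applies: if the half-resolution stage selects the wrong neighborhood, the refinement stage cannot recover from it.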
4.2.3.3. Searching a Predefined Area
When considering methods to improve processing speed, it is important to consider the task at hand. Solutions to computer vision tasks are unique and highly tailored; therefore, optimizations can be tailored to suit the task. Most of the structures in question are unlikely to move more than a few centimeters at a time without anyone noticing. It therefore makes sense to begin searching in the area where the target was last found. The search area can be defined as a window 10% larger than the template image, centered on the last known location of the template. If a strong peak correlation coefficient is found (greater than 0.95), it can be assumed that the structure has not moved, and there is no need to search the rest of the image. For high-resolution images, this can save a large amount of processing time while still providing accurate data.
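A minimal sketch of this strategy is given below. It is an illustration under stated assumptions, not the implementation used in this thesis: the helper names are hypothetical, the 10% enlargement is split evenly as a 5% margin on each side (with a minimum of one pixel), and falling back to a search of the whole image when the peak does not exceed 0.95 is an assumed behavior.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def ncc(image: Image, template: Image, top: int, left: int) -> float:
    """Normalized cross-correlation coefficient of the template at (top, left)."""
    h, w = len(template), len(template[0])
    patch = [image[top + r][left + c] for r in range(h) for c in range(w)]
    tmpl = [template[r][c] for r in range(h) for c in range(w)]
    pm, tm = sum(patch) / len(patch), sum(tmpl) / len(tmpl)
    num = sum((p - pm) * (t - tm) for p, t in zip(patch, tmpl))
    den = math.sqrt(sum((p - pm) ** 2 for p in patch) * sum((t - tm) ** 2 for t in tmpl))
    return num / den if den > 0 else 0.0


def best_match(image: Image, template: Image, rows, cols) -> Tuple[float, int, int]:
    """Highest-scoring (score, top, left) among the candidate positions."""
    return max((ncc(image, template, r, c), r, c) for r in rows for c in cols)


def search_near_last(image: Image, template: Image, last_top: int, last_left: int,
                     grow: float = 0.10, threshold: float = 0.95
                     ) -> Tuple[float, int, int, bool]:
    """Return (score, top, left, searched_whole_image)."""
    h, w = len(template), len(template[0])
    # A window `grow` larger than the template leaves grow/2 of margin per side.
    pad_r = max(1, math.ceil(h * grow / 2))
    pad_c = max(1, math.ceil(w * grow / 2))
    max_r, max_c = len(image) - h, len(image[0]) - w
    rows = range(max(0, last_top - pad_r), min(max_r, last_top + pad_r) + 1)
    cols = range(max(0, last_left - pad_c), min(max_c, last_left + pad_c) + 1)
    score, r, c = best_match(image, template, rows, cols)
    if score > threshold:
        return score, r, c, False  # strong peak: skip the rest of the image
    # Assumed fallback: no strong peak nearby, so search every position.
    score, r, c = best_match(image, template, range(max_r + 1), range(max_c + 1))
    return score, r, c, True


# Synthetic 16x16 image: zero background with one 4x4 textured target.
image = [[0.0] * 16 for _ in range(16)]
for i in range(4):
    for j in range(4):
        image[7 + i][9 + j] = 1.0 + 4 * i + j
template = [row[8:14] for row in image[6:12]]  # target's true position is (6, 8)

near = search_near_last(image, template, last_top=5, last_left=8)
far = search_near_last(image, template, last_top=0, last_left=0)
print(near[1:], far[1:])  # (6, 8, False) (6, 8, True)
```

When the last known location is close to the target, only the small window is evaluated; when it is far away, the sketch falls back to the full search and still locates the target.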