
Delft University of Technology

Bachelor Thesis

UAV Camera System for Object Searching and Tracking

Harald Homulle

4032578

Jörn Zimmerling

4047621

June 2012

Faculty of Electrical Engineering, Mathematics and Computer Science


Preface

The third year of the Bachelor of Electrical Engineering is completed with the bachelor thesis. With the bachelor thesis, students show their academic capability and their ability to design an electronic system in a structured manner. At the TU Delft, the bachelor thesis is embedded in the final bachelor project, in which students design and build a prototype for a client.

From this perspective, Johan Melis and Thijs Durieux are our clients. The two master students from Aerospace Engineering are setting up a competition for Unmanned Aerial Vehicles (UAVs). The contestants of the competition have to build a UAV with the ability to perform the following tasks:

• track a ground vehicle;
• track an aerial vehicle;
• fly a track as fast as possible.

Our team, consisting of six Electrical Engineering students, developed an electronic system which performs the tracking of a ground vehicle. At the same time, a group of ten Aerospace Engineering students designed and built a fixed wing UAV that can fly as fast as possible. Although our electronic system was tested on a quadrocopter UAV, it was designed for the fixed wing UAV developed at the Aerospace faculty.

The goal of this project was to implement, within a time span of seven weeks, a tracking system on a UAV with the ability to send a picture of the tracked vehicle to a ground station. Our team was split into three groups: a transmission and visualization group, a position and control group, and a camera system group.

The authors of this thesis have developed and implemented the imaging part. A camera system was developed which has the ability to detect an object and generate control signals for the UAV in order to follow the object. Furthermore, formulas are deduced for the estimation of the location and velocity of the ground vehicle.

The design choices regarding the camera system are described in this thesis. An overview of the hardware and software of the camera system is given. The choices made in the design trajectory are motivated and the performance of the implemented system is shown.

Harald Homulle & Jörn Zimmerling Delft, June 2012

Acknowledgements

We would like to thank our supervisor Dr. Ir. C.J.M. Verhoeven for the financial support and his enthusiastic way of supervising. We would like to acknowledge our “clients” B.Sc. J. Melis and B.Sc. T. Durieux for their support and for giving us the opportunity to participate in the UAV project. The Microelectronics Department deserves special thanks for the provision of budget and two quadrocopters. We would like to express our thanks to M.Sc. M. van Dongen for his support in producing a PCB for the Toshiba camera, and to Ir. S. Engelen for the technical discussions and the provision of a Leopard Imaging camera module. We are grateful to M. de Vlieger for handling our orders and her help with various administrative tasks. Furthermore, we would like to thank the Raspberry Pi Foundation for their support with a Raspberry Pi. Thanks also to our team members, whose team spirit, technical discussions and support were invaluable.


Summary

An embedded electronic camera system for unmanned aerial vehicles (UAVs) was developed within the scope of the final bachelor project at the TU Delft in 2012. The aim of the project was to develop a prototype for a new UAV competition.

The electronic system of the prototype had to be able to track an object, take a picture of it and send it to a ground station, where the picture and flight data are visualized. The whole project was divided into three subgroups, each designing a specific part of the system. This thesis presents the design choices and performance results regarding the image processing and object detection part of the system.

It was considered that a literature study would usefully supplement the choice of hardware and software. The literature study was extended with an overview of existing hardware for embedded video processing.

To perform embedded image processing, the hardware components were selected first. It was decided that the system consists of a camera, a processing board and a power converter. A webcam was chosen as the camera, because it is robust, fast and small, and it buffers data internally, so no buffering on the processing board was needed. The Beagleboard, an ARM based prototyping board, was picked as the processing board, because it is the fastest of the compared boards. Furthermore, it does not use much power, it is compatible with the hardware used by the other project groups, and enough drivers for peripheral devices are available. A buck converter was selected to convert the battery voltage down to the lower input voltage needed by the Beagleboard, as it had the highest efficiency of the considered converters.

The Beagleboard had to be capable of communicating with multiple sensors, performing real time image processing and running location and velocity estimation algorithms. It was decided to run Linux on the Beagleboard, because it allows high level programming.

For prototyping reasons, C++ was used to implement the algorithms, because it is easy to implement in, fast, and handled well by the Beagleboard. The Open Computer Vision (OpenCV) library was selected to make the readout of the webcam fast and the implementation of the algorithms straightforward. Algorithms were implemented for colour detection, real-world location estimation of the object and velocity estimation.

Different object detection approaches were considered, and colour detection by thresholding each pixel of the captured video frame was selected as the most convenient algorithm, because it is the fastest and least complex of the compared methods. The design brief placed explicit constraints on a high frame rate, which also favoured colour thresholding as the fastest algorithm.

The pixel coordinate of the object in the frame was found by a centre of mass calculation of the selected pixels. A moving average filter was implemented to remove noise from the object detection algorithm.

Trigonometry-based algorithms were deduced to estimate the real-world location from the pixel coordinates. The velocity of the object was found by dividing the displacement of the object between two frames by the time between those frames.
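Expressed as a formula (with \(\vec{x}_n\) denoting the estimated object location in frame \(n\) and \(\Delta t\) the time between the two frames; the symbols are introduced here for illustration only):

\[
\vec{v} \approx \frac{\vec{x}_n - \vec{x}_{n-1}}{\Delta t}.
\]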

The designed prototype was tested in two different configurations: a test setup with a static camera and one with a moving camera mounted on a quadrocopter. In both configurations it was shown that the built system is capable of reliable object detection, location estimation and velocity estimation.


From the results of these experiments, it could be concluded that the system can achieve an accuracy of up to 99 % in the static and 90 % in the moving camera setup for the estimation of both the location and the velocity. Furthermore, around 14 frames per second at a resolution of 320 × 240 could be processed on the Beagleboard while running all algorithms described above.

The prototype built within this bachelor project and documented in this thesis is able to fulfil the tasks defined in the assignment. It therefore meets the specifications regarding image processing and object detection defined in the design brief of the clients. Suggestions for future improvements of the system are testing it on the fixed wing UAV and further enhancing and optimizing the various algorithms.

Although the system is not ready for the UAV competition yet, the first milestone on the way to a UAV with a camera system has been reached within this thesis. The work performed within the scope of this thesis clearly contributes to the electronic system of the final fixed wing UAV that will demonstrate the tasks of the new UAV competition.


Contents

Preface
Acknowledgements
Summary
List of Figures
List of Tables
List of Abbreviations

1 Introduction

2 UAV Camera Systems in Literature
   2.1 Object Detection Methods
   2.2 Stabilization of the Camera
   2.3 Object Detection with a Camera
   2.4 Velocity and Motion Vector Calculation
   2.5 Object Tracking with a Camera

3 Build Setup & System Overview
   3.1 System Requirements
      3.1.1 Design Brief
      3.1.2 Camera Requirements
      3.1.3 Processing Requirements
   3.2 Camera
      3.2.1 Analogue (CMOS) Camera
      3.2.2 Digital (CMOS) Camera
      3.2.3 USB Webcam
   3.3 Video Processing
      3.3.1 CMUcam
      3.3.2 Leopardboard DM365
      3.3.3 Beagleboard XM
      3.3.4 Raspberry Pi
      3.3.5 FPGA Spartan-3
   3.4 Selection of the Hardware Components
      3.4.1 Selection of the Processing Unit
      3.4.2 Selection of the Camera
   3.5 Power Converter
   3.6 System Overview

4 Object Detection & Tracking Algorithms
   4.1 Proposed Algorithms
   4.2 Colour Detection
      4.2.1 RGB or HSV
      4.2.2 Thresholding
   4.3 Stabilization
   4.4 Position Estimation of the Detected Object
   4.5 Velocity Estimation
   4.6 Optical Flow
   4.7 Generation of Steering Signals
      4.7.1 Quadrocopter Approach
      4.7.2 Fixed Wing Approach
   4.8 Implementation and Simulation
      4.8.1 Implementation
      4.8.2 Simulation

5 System Results
   5.1 Experimental Setup
      5.1.1 Static Camera
      5.1.2 Moving Camera
   5.2 Experimental Results
      5.2.1 Statistical Experiment Evaluation
      5.2.2 Static Camera
      5.2.3 Moving Camera
      5.2.4 Performance
      5.2.5 Image Size
      5.2.6 Power and Weight
   5.3 Experimental Discussion

6 Conclusion
   6.1 Evaluation of the Prototype
   6.2 Further Work

Bibliography

A Design Brief
B System
C C++ Code
D Results


List of Figures

3.1 Analogue CMOS camera 20B44P.
3.2 Digital CMOS camera Toshiba TCM8230MD & additionally needed breakout board.
3.3 Leopard digital camera module LI-VM34LP.
3.4 Conrad mini webcam.
3.5 CMUcam 4 (board & Omnivision 9665 camera).
3.6 Leopardboard DM365.
3.7 Beagleboard XM.
3.8 Raspberry Pi.
3.9 Spartan-3 Development Board.
3.10 Buck converter TPS5430 and AR.Drone battery.
3.11 Hardware setup.
3.12 Overview of the full electronic UAV system.

4.1 Comparison of the RGB and HSV colour spaces.
4.2 Example of a set of pixels that are checked by the algorithm.
4.3 3D sketch used to estimate the location of the target.
4.4 Top view of the sketch given in Figure 4.3.
4.5 Steering signal generation; quadrocopter approach.
4.6 Image from a demo video of a red car driving on a road.
4.7 Image from a demo video; colour detection.
4.8 Image from a demo video; optical flow.

5.1 Semi-outdoor testing setup.
5.2 Indoor testing setup.
5.3 Moving camera testing setup.
5.4 Two captured images from the semi-outdoor umbrella tracking experiment.
5.5 Objects on the predefined matrix as seen by the webcam.
5.6 Indoor path of the object.
5.7 Indoor time line.
5.8 Noisy and filtered pixel location in the static camera setup.
5.9 Path of the red cap in the moving camera experiment.

B.1 Functional block diagram of the UAV system.

D.1 Projection points of pixels under different resolutions.


List of Tables

2.1 Overview of object detection / recognition techniques.
3.1 Overview of cameras.
3.2 Overview of processing boards.
3.3 Camera calibration.
3.4 Overview of power converters.
4.1 Overview of available object recognition algorithms.
5.1 Performance overview on two test setups.
5.2 Performance overview for software resizing.
5.3 Performance overview for hardware resizing.
5.4 Size overview for different image storage formats.
5.5 Overview of the measured power consumption and weight of the system.
6.1 The various system requirements that are achieved in the present prototype.
D.1 Measurement results accompanying Figure 5.7.


List of Abbreviations

AR.Drone Parrot AR.Drone quadrocopter (Prototyping)

ARM Advanced RISC Machine

B byte

b bit

BMP Bitmap

CCD Charge-Coupled Device

CMOS Complementary Metal Oxide Semiconductor

COCOA Moving Object Detection Framework

D/A Digital/Analogue conversion

DC Direct Current

FoV Field of View

FPGA Field-Programmable Gate Array

fps Frames per Second

GPIO General Purpose Input/Output port

GPS Global Positioning System

GPU Graphical processing unit

GV Ground Vehicle

HSV Hue, Saturation and Value

Hz Hertz

I2C Inter-Integrated Circuit

I/O Input/Output ports

IR Infra Red

JPEG Joint Photographic Experts Group

LDO Low-Dropout Regulator

LDR Laser Detection Radar

LVFG Lyapunov Vector Field Guidance

MODAT Moving Objects Detection and Tracking Framework

N.A. Not Applicable

OpenCV Open Computer Vision Library

OS Operating System

PCB Printed Circuit Board

PNG Portable Network Graphics

RGB Red, Green and Blue

RMS Root Mean Square

SD Secure Digital (Multimedia Flash Memory Card)

Sonar Sound Navigation and Ranging

TLD Track, Learn and Detect

TVFG Tangent Vector Field Guidance

UART Universal Asynchronous Receiver/Transmitter

UAV Unmanned Aerial Vehicle

USB Universal Serial Bus

V Voltage


VHDL VHSIC Hardware Description Language

VPSS Video Processing Subsystem

W Watt


Chapter 1

Introduction

Unmanned aerial vehicles, also known as drones, already perform a wide range of tasks in the military sector [1]. These tasks include surveillance, territory exploration, reconnaissance and even attacking.

The history of many military inventions, like GPS or the internet, shows that the civil market often adapts or enhances military products for civilian use [2]. The same seems to be happening with UAVs, as the growing number of non-military applications shows. Applications like wildfire protection [3], dike or pipeline surveillance, or even filming have grown in recent years.

The new UAV competition is an initiative of Johan Melis and Thijs Durieux. The competition consists of three tasks: first, the contestants have to show the speed and manoeuvrability of their UAV; second, a ground object has to be found in a field; and finally, two UAVs have to find each other. To show the achievability of these goals, a prototype for the competition is desired.

Two teams will be working on this prototype: ten students of Aerospace Engineering and six students of Electrical Engineering. The aerospace team will design and produce a fixed wing UAV¹; the goal of the electrical team is to develop the electronics of the system.

The electronic system of the developed UAV is roughly divided into three parts. One team deals with the connection of the UAV to the ground station. Furthermore, they handle and visualize the information exchanged between ground station and UAV. The second team works on the control of the UAV, the sensors and the merging of the subsystems. The last team, formed by the authors of this thesis, develops a camera system that recognizes and tracks objects. The electronic system cannot be tested on the fixed wing UAV of the Aerospace team and will therefore be tested on a quadrocopter², the AR.Drone.

It is the aim of this thesis to describe the design and implementation process of the camera system and the design choices made.

An overview of existing object detection and tracking methods is given to investigate which algorithm, camera and processing unit are most suitable for a UAV. Thereafter the system specifications are given, as well as the experimental performance of the camera system. Furthermore, the chosen algorithm used to detect and track objects is explained.

The requirements for the whole system are listed in the design brief in Appendix A. From those requirements, a flow chart of the various tasks is given in Appendix B, where all tasks of the final system are shown. According to Appendix B, the aims of the camera subproject are

• to record a video in real time;
• to adapt the video for further processing on an embedded platform;
• to detect a ground vehicle;
• to estimate the location of the ground vehicle;
• to estimate the velocity of the ground vehicle.

¹ A fixed wing UAV is essentially a small unmanned airplane with wings rather than rotors.

² A quadrocopter UAV is a helicopter-like aircraft with four horizontal rotors, often arranged in a square layout.


The study of UAV systems was usefully supplemented with a literature study of related fields, on which a lot of research has been carried out. In order for the raw camera data to be processable with common object detection and tracking algorithms, video stabilization may be required.

Therefore a wide range of stabilization algorithms, which differ in complexity and accuracy, have been proposed by various research groups [4–7].

Object recognition for UAVs can be either based on colour [8–10] or based on motion [6, 11, 12]. Stabilization can also be done after the object is detected by implementing a moving average filter [13] on the detection data.

The motion of objects can be estimated by optical flow methods [14], so that the motion vector of an object can be extracted from a video stream.

All papers report computational limitations of their algorithms due to the embedded system and the requirement of real time data analysis; therefore special attention will be paid to the speed and complexity of the algorithms used in this thesis.

This thesis is organized as follows. In chapter 2 the current technological level of all related fields is discussed. Design choices are made in chapter 3 regarding a camera, a processing board and a power supply. The overview of the camera system and the total system is also given in that chapter. The software and algorithms are reported in chapter 4; formulas for the calculation of the position and velocity of the object are deduced. The performance of the camera system is given in chapter 5. The conclusions of this thesis and some suggestions for further work are provided in chapter 6.


Chapter 2

UAV Camera Systems in Literature

In this chapter a brief overview of the current technical level with respect to related fields is given. By analysing the literature and research performed in the field of UAV camera systems and image processing, this chapter seeks to point out the state of the art. Using the overview given in this literature study, design, implementation and algorithm choices can be made in the following chapters.

In section 2.1 a discussion of various object detection methods shows that object detection with a camera is desirable for the goals of this project. Subsequently section 2.2 gives an overview of software and hardware methods in order to stabilize camera pictures. Several camera object detection algorithms are a topic of discussion in section 2.3. Section 2.4 discusses techniques to extract the motion vector of an object from a video stream. Finally, section 2.5 shows different options to track an object once its location and motion are known.

2.1 Object Detection Methods

Depending on their application, camera, sonar, or laser systems are used to detect objects. Differences between the named methods can be found in Table 2.1 where these methods are listed and compared. The size and price range from small and cheap solutions up to large and expensive ones.

Table 2.1: Overview of object detection / recognition techniques

Attribute | Camera (Colour) | Camera (IR) | Sonar [15–17] | LDR [18]
Dimensions | Two dimensional | Two dimensional | Three dimensional | Three dimensional
Information | Three colour layers | One heat layer | Shape / distance | Shape / distance
Detection basis | Colour / motion | Heat | Shape | Shape
Size | Very small | Very small | Large | Large
Weight | Very small | Very small | Large | Large
Disadvantages | High data rate; good contrast needed; ambient light needed | Low resolution; high heat contrast needed | Range 20 m; mainly used under water; large and heavy | Costly; mainly distance information; large and heavy
Advantages | High detail | Works at night | Works at night | High resolution

It can be seen that camera systems have the advantage of being very small and cheap and of having a high resolution. However, camera systems produce a much higher data rate than needed for object detection, and a colour contrast is needed for reliable detection.

For Sonar (Sound Navigation and Ranging) and LDR (Laser Detection Radar) systems, the detection is based on the shape of the object. Sonar detection is done with sound and is normally used underwater; LDR is based on the same principle, but uses lasers instead of sound. The reflection of either the sound or the light waves on objects is used to create a 3D shape view of the area of interest. If the exact shape of the object is known, this shape can be detected in the 3D shape view.


The resolution of these systems is related to the wavelength used in both applications. The wavelength of the light waves used for LDR is much smaller than the wavelength of the acoustic waves used for Sonar, so the resolution of LDR is much higher. However, both systems are quite large and more expensive than camera systems.

As described in the design brief (in Appendix A) a camera is required. Therefore optical object recognition based on cameras is studied in more detail in the following sections.

2.2 Stabilization of the Camera

The camera can be stabilized both in hardware and in software. A (more) stable image results in better tracking results and thus a higher accuracy of the algorithm. The hardware approach increases the weight of the system, but results in a stable image. The software approach increases the required processing power, and its results are mostly worse than using hardware to control the motion.

Li and Ding [7] suggest the use of servos to control and minimize the camera movement. Stabilization inside the camera can be performed using gyroscopes to measure camera movement and shift the sensor in the opposite direction to balance the movement [4].

In order to stabilize a video in software, the movement of the camera is computed by comparing multiple frames with each other. The video is stabilized by shifting the frames to correct for the computed movement. The approaches found in literature differ in the algorithm used to estimate the camera movement. [19] suggests estimating the camera movement with a spatio-temporal approach, where blocks are matched in subsequent frames.

Another approach, implemented in the COCOA framework for UAV object tracking proposed by [5, 6], is “Ego Motion Compensation”. [5, 6] combine a feature based approach similar to [19] with a gradient based approach as proposed by [20]. The algorithm adapts itself during the process and learns to discriminate shapes; stabilization is based on shifting the image according to where the shape is detected. A combination of hardware and software video stabilization is described by [4]. Instead of estimating the movement with an algorithm, the movement is measured with gyroscopes. The frame correction is similar to the software approaches and is based on image shifting.

A totally different approach is not to stabilize the camera or the camera sensor, but to stabilize the measurement data. This can be done by implementing a moving average filter as described by [13], weighting the previously measured values with a certain factor to establish a weighted average.
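As an illustration, a minimal C++ sketch of such a filter is given below; the window length and the exponential weighting factor are illustrative choices and not values taken from [13].

```cpp
#include <deque>

// Weighted moving average over the last N measurements.
// Newer samples receive a larger weight via an exponential factor.
class MovingAverageFilter {
public:
    MovingAverageFilter(std::size_t window, double decay)
        : window_(window), decay_(decay) {}

    double update(double measurement) {
        samples_.push_back(measurement);
        if (samples_.size() > window_) samples_.pop_front();

        double weightedSum = 0.0, weightTotal = 0.0, weight = 1.0;
        // Iterate from newest to oldest, decaying the weight at each step.
        for (auto it = samples_.rbegin(); it != samples_.rend(); ++it) {
            weightedSum += weight * (*it);
            weightTotal += weight;
            weight *= decay_;
        }
        return weightedSum / weightTotal;
    }

private:
    std::size_t window_;
    double decay_;               // 0 < decay <= 1
    std::deque<double> samples_;
};
```

Feeding the raw pixel coordinates of the detected object through two such filters (one per axis) smooths the detection output at the cost of a small lag.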

2.3 Object Detection with a Camera

In general, two types of object recognition are described in literature: colour recognition [8–10] and movement recognition [6, 11, 12]. Colour recognition is based on the detection of particular colour blobs of the object. For recognition based on colour [8], the processing is quite easy.

In each single frame, the pixels, with their Red, Green and Blue (RGB) components, are checked against a particular colour condition. After this colour thresholding, the centre of mass of all pixels satisfying the condition is calculated. This algorithm is quite fast and needs just a few calculations per pixel. Similar is detection based on a certain brightness threshold, in which a black and white image is used. A bright object can be tracked by thresholding the luminance of the pixels [9].

Depending on the tracked object, converting the video to other colour spaces, such as Hue, Saturation and Value (HSV) or luminance and two chrominances (YUV), can be beneficial for the reliability of the algorithm. This technique can be implemented on an FPGA for fast parallel processing [10].
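As an illustration, a minimal OpenCV sketch of this kind of colour thresholding and centre of mass calculation is given below; the HSV bounds are placeholders that would have to be tuned to the tracked object.

```cpp
#include <opencv2/opencv.hpp>

// Returns the pixel centre of mass of all pixels within the given HSV range,
// or (-1, -1) when no pixel satisfies the colour condition.
cv::Point2d detectByColour(const cv::Mat& frameBgr,
                           const cv::Scalar& hsvLow, const cv::Scalar& hsvHigh)
{
    cv::Mat hsv, mask;
    cv::cvtColor(frameBgr, hsv, cv::COLOR_BGR2HSV);   // RGB (BGR) -> HSV
    cv::inRange(hsv, hsvLow, hsvHigh, mask);          // per-pixel threshold

    cv::Moments m = cv::moments(mask, true);          // binary image moments
    if (m.m00 < 1.0) return cv::Point2d(-1.0, -1.0);  // nothing detected
    return cv::Point2d(m.m10 / m.m00, m.m01 / m.m00); // centre of mass
}

// Example call for a saturated red object (illustrative bounds):
// detectByColour(frame, cv::Scalar(0, 120, 80), cv::Scalar(10, 255, 255));
```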

To detect moving objects in a video stream, subsequent frames are compared to each other in order to detect movement. This requires more processing power since multiple frames are correlated to each other whereas colour recognition can be performed by thresholding single frames.


For movement recognition, several frames are buffered to extract the motion of objects within those frames. Assuming the targeted object is the moving object in the processed frames results in object recognition. With steady cameras, a reference frame is stored and pixels are checked for new objects appearing in the scene [12]. With moving cameras (like on UAVs) this is even more difficult, since the camera itself is already moving.

The Moving Object Detection and Tracking (MODAT) framework stitches subsequent camera frames into a “map”, on which motion is detected using a Gaussian Mixture Learning technique. This technique learns to discriminate background movements from the real target objects. A fast parallel processing unit is needed to calculate the motion in real time [6].

In both methods the camera and the processing add noise to the image, which leads to the detection of undesired objects (noise). Various noise reduction methods have been suggested in literature. Noise reduction can be achieved by applying some blur to the image, which removes detail and noise. Similarly, a median filter can remove variations between pixels in the image and eliminate noise as well as large outliers [6, 9]. A method of noise compensation used after object detection is called binary erosion, which checks the pixels and their neighbours for connectivity. Assuming the object consists of more than one pixel, only “connected” pixels are recognized as an object [10].
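The corresponding operations are available directly in OpenCV; a minimal sketch is given below (the kernel sizes are illustrative).

```cpp
#include <opencv2/opencv.hpp>

// Typical noise suppression steps around colour thresholding:
// blur the input image, then clean up the binary detection mask.
void suppressNoise(const cv::Mat& frame, const cv::Mat& mask,
                   cv::Mat& frameSmoothed, cv::Mat& maskCleaned)
{
    // Blurring removes fine detail and pixel noise before thresholding.
    cv::Mat blurred;
    cv::GaussianBlur(frame, blurred, cv::Size(5, 5), 0);

    // A median filter removes isolated outlier pixels.
    cv::medianBlur(blurred, frameSmoothed, 5);

    // Binary erosion keeps only "connected" pixels in the detection mask,
    // so isolated single-pixel false detections are discarded.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::erode(mask, maskCleaned, kernel);
}
```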

2.4 Velocity and Motion Vector Calculation

Besides the location of the object, its velocity is needed in order to track the object. The estimation of the object's location is discussed in section 2.3, whereas the estimation of the velocity and motion is covered in this section.

The calculation of a motion vector field from a video stream is called optical flow. The method is based on the assumption that the changes between two subsequent image frames are small and the colour of the object stays the same. The displacement of each pixel between two subsequent frames is then calculated. The resulting vector field contains the direction and magnitude of the movement of each pixel. On the UAV used in this project the camera itself moves, so the motion vector of the object is given by the difference between the background and object motion.

In literature, many different ways of calculating the motion vector field of a video stream are discussed. [14] discusses an optical flow algorithm to estimate the motion of objects. The proposed algorithm focuses on the motion of edges, since the error rate of common optical flow algorithms is quite high at the edges of objects.

The calculation of the motion field of a wavelet transformed video is discussed by Lui et al. [21] and [22]. The wavelet transform of an image consists of the contour lines of an image. By calculating the movement of the contour lines of objects it is easier to assign a single motion vector to an object and to separate different objects.

Another accurate but computationally intense method of motion estimation is proposed by Farnebäck [23]. This algorithm is based on intense tensor computations and is not suited for embedded solutions. However, Farnebäck suggests calculating the displacement of pixels only in the near neighbourhood of the object to save computation time.

Overall, the methods to calculate the motion of objects discussed in literature are computationally intense. The development of motion detection algorithms suitable for embedded platforms is beyond the scope of this thesis, so only simple optical flow methods have been studied within the scope of this thesis.
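For reference, a minimal dense optical flow sketch using OpenCV's Farnebäck implementation is given below. Subtracting the median flow of the whole frame, taken here as a rough background motion estimate, from the flow at the object position is one simple way to isolate the object motion on a moving camera; this compensation scheme is an assumption made for illustration, not the method used in this thesis.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>
#include <algorithm>

// Estimates the motion of the object at pixel `objectPos` between two
// greyscale frames, compensated by the median flow (background motion).
cv::Point2f objectMotion(const cv::Mat& prevGray, const cv::Mat& gray,
                         cv::Point objectPos)
{
    cv::Mat flow;  // per-pixel displacement field (CV_32FC2)
    cv::calcOpticalFlowFarneback(prevGray, gray, flow,
                                 0.5, 3, 15, 3, 5, 1.2, 0);

    // Median flow over all pixels as a crude background motion estimate.
    std::vector<float> dxs, dys;
    for (int y = 0; y < flow.rows; ++y)
        for (int x = 0; x < flow.cols; ++x) {
            const cv::Point2f& f = flow.at<cv::Point2f>(y, x);
            dxs.push_back(f.x);
            dys.push_back(f.y);
        }
    std::nth_element(dxs.begin(), dxs.begin() + dxs.size() / 2, dxs.end());
    std::nth_element(dys.begin(), dys.begin() + dys.size() / 2, dys.end());
    cv::Point2f background(dxs[dxs.size() / 2], dys[dys.size() / 2]);

    // Object motion = flow at the object position minus background motion.
    return flow.at<cv::Point2f>(objectPos) - background;
}
```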

2.5 Object Tracking with a Camera

Object detection over multiple frames can lead to object tracking and is necessary for autonomous UAVs. The generation of steering signals from a captured video in order to track an object is a complex task, especially for large velocity differences between the object and the UAV. The camera system has to calculate the movement of the tracked object in order to fly in its direction.

[7, 9, 24] propose the use of servomotors to change the position of the camera relative to the UAV. This minimizes the chance of losing the tracked object.

[24] describes the two main algorithms for path planning with UAVs: Tangent Vector Field Guidance (TVFG) and Lyapunov Vector Field Guidance (LVFG). According to Chen et al. [24], a combination of both algorithms is desired for UAVs with larger turning limit circles than the tracked object.

Common UAV autopilots, such as the Paparazzi project [25], implement such path planning algorithms. By supplying GPS coordinates and the object speed to the autopilot, it is able to delineate a path to the specified location. Even complex tasks, like flying in a figure of eight above a location, can be performed by Paparazzi.


Chapter 3

Build Setup & System Overview

The literature study in chapter 2 points out that the accuracy and reliability of real time object detection and motion estimation are mainly limited by the computational power of the processing unit. It might therefore not be unexpected that the choice of hardware components, and especially of the core processing unit of the system, is one of the key points of this thesis. The algorithms presented in chapter 4 are directly affected by the hardware setup presented in the following sections.

In this chapter an overview of the hardware setup and the overall system is given. Section 3.1 points out the system requirements as defined in the design brief. Based on these requirements, several cameras and processing units are analysed. The hardware configuration is explained in section 3.2 for the camera and in section 3.3 for the processing unit. Both sections give an overview of existing hardware in order to rate each option with respect to the system requirements. The hardware choices made are motivated in section 3.4. The selection of the power converter is discussed in section 3.5. In section 3.6 an overview of the final system is provided.

3.1 System Requirements

In this section the requirements regarding the camera system are explained. First the explicit requirements as noted in the design brief (Appendix A) are listed. Thereafter, requirements from a technical point of view, based on findings in the literature study, are stated.

3.1.1 Design Brief

In the design brief the following requirements regarding both the camera and the processing unit are listed:

• Speed

– The speed of the system is high enough to track vehicles up to 55 km/h at a maximum UAV speed of 100 km/h;

– The speed of the image processing reaches a minimum of at least 15 frames per second (fps), implying that a picture is taken at least every 2 m (a short numerical check follows after this list).

• Object Tracking

– The object can be detected and followed from at least 50 m.

• Power

– The system can fly one hour on a separate battery;

– Therefore it was decided that the system's power consumption is below 5 W.

• Weight

– The maximum payload of the quadrocopter UAV is 100 g;
– The maximum payload of the fixed wing UAV is 250 g;
– The power source is excluded from this weight.


• Finalizing

– The system is documented in a manner which allows other persons to understand and expand the system;

– The system has enough computational power and free I/O ports in order to extend the system after delivery.
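As a quick numerical check of the frame spacing figure: at the maximum UAV speed of 100 km/h ≈ 27.8 m/s and the minimum rate of 15 fps, the distance flown between two frames is

\[
\frac{27.8\ \text{m/s}}{15\ \text{s}^{-1}} \approx 1.9\ \text{m},
\]

which is just within the stated 2 m.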

Test platform

As a test platform the quadrocopter AR.Drone was used. Although the system is designed for a fixed wing UAV, a quadrocopter is used for several reasons. First of all, the destination platform, a fixed wing UAV, is developed at the Aerospace faculty simultaneously with the camera system, so the fixed wing UAV cannot be used as a test platform. Since a quadrocopter is able to hover, it can be used indoors, in contrast to a fixed wing UAV, which needs to move forward to take off. This allows weather and wind independent indoor testing. Finally, in case the control over a quadrocopter is lost, it simply keeps hovering or falls down, whereas a fixed wing UAV becomes a projectile, making it more likely for people to get hurt or for the UAV to be damaged. For these reasons a quadrocopter is used as the testing platform.

3.1.2 Camera Requirements

Besides the requirements mentioned in the design brief, extra requirements derived from literature, the available timespan and the framework of the project are given:

• Usable resolution;
• Available drivers for the camera;
• Field of View (FoV) < 45°;
• Auto colour / light adjustment;
• Maximum price of € 250 (processing board, camera and power converter);
• Delivery in less than one week.

In literature the difficulty of processing large resolutions in real time on embedded devices is described. Therefore only cameras with a maximum resolution of around 640 × 480 are examined.

In the context of this thesis, real time is defined as a soft real time criterion. A system is a soft real time system if the result of an algorithm loses its value after a certain deadline. In this context the term real time is interpreted as processing the image data completely before capturing a new frame. Thus the location of the target object is updated at the speed of the frame rate. A lower frame rate or delays in the algorithm result in a lower quality of service and accuracy of the system. But the system can still operate with a low frame rate, so the real time criterion is not a hard criterion.
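At the required minimum of 15 fps this soft deadline corresponds to a per-frame processing budget of

\[
t_{\text{frame}} = \frac{1}{15\ \text{s}^{-1}} \approx 67\ \text{ms}.
\]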

The field of view should be in the range of 30–45°. Lower than 30° results in a narrow FoV, so that the UAV would have to fly high before being able to oversee a large area. Higher than 45° results in a wide FoV, which covers very large areas when flying at a normal height. Therefore the camera FoV needs to be in the specified range, as illustrated by the example below.
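For illustration (the altitude is only an example, chosen equal to the 50 m tracking distance from the design brief): a camera looking straight down from 50 m with a 45° horizontal FoV sees a ground strip of width

\[
w = 2h \tan\!\left(\frac{\mathrm{FoV}}{2}\right) = 2 \cdot 50\ \text{m} \cdot \tan(22.5^\circ) \approx 41\ \text{m},
\]

which is the same relation that is inverted in Equation 3.1 to determine the FoV of the chosen webcam.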

Since the system is used indoors as well as outdoors, auto colour and light adjustment is desired to prevent the image from becoming too dark or too bright.

Because of the limited time, the availability of drivers is an important prerequisite. Furthermore, the choice of products is limited to those deliverable in less than one week, in view of the short project time span.

3.1.3 Processing Requirements

The processing takes place on an embedded device. This device has to be capable of handling the video data stream, recognizing and tracking objects, controlling the UAV and processing sensor communication. Since the whole project is divided over three groups, the processing board has to be compatible with the hardware choices made by the other groups.


The wireless transmission and visualization group demanded that the processing board can communicate via UART with an XBee for wireless data transmission to the ground station.

The position and control group of the UAV project requires that the processing unit can communicate via UART with a barometer, via I2C with a GPS sensor, and with the AR.Drone via its serial UART port. Thus the processing board needs to support UART and I2C communication via multiple I/O ports.

As the system will still be in development after this bachelor thesis, the processing board should have free I/O ports which leave room for expansion at the delivery of the system. The processing power should be adequate to do these calculations in real time: a minimum of 15 fps encoding/decoding.

3.2 Camera

As mentioned above, a lightweight camera with an easy interface is desired. The available camera types are

• Mono colour cameras;
• Full colour cameras;
• Infra red (IR) cameras.

Mono colour and IR cameras have the advantage of having only one data layer, instead of three for full colour. This results in a decrease in processing time for mono colour and IR cameras, since less data has to be handled.

Colour has the advantage of containing more information, which makes object extraction based on colour possible. In literature colour cameras and colour recognition techniques are most common.

IR cameras only provide information about the heat of an object. No IR camera is used, since the heat of the tracked object is not defined and it cannot be assumed that an IR contrast between the object and the background is given.

The use of a full colour camera allows more reliable detection than a mono colour camera. Although the use of a full colour camera results in a higher data rate, the system will benefit from more accurate and reliable object detection. Because of this a full colour camera is used.

The availability of colour cameras is abundant. A selection of cameras that satisfy the requirements mentioned above is listed and explained below. An overview of the camera modules is given in Table 3.1. In total five cameras were compared: three digital cameras, one analogue camera and one webcam.

Table 3.1: Overview of cameras.

Attribute | TCM8230MD [26] | LI-VM34LP [27] | Omnivision 9665 [28] | 20B44P [29] | Conrad mini webcam
Delivery speed | One week | One day | Two weeks (with CMUcam) | One week | One day
Open source software | No | No | Yes | No | No
Drivers available | No | Yes | Yes | No | Yes
Resolution | 698 × 502 | 752 × 480 | 1304 × 1036 | 640 × 480 | 640 × 480
Frame rate | 30 fps | 60 fps | 30 fps | 25 fps | 30 fps
Data type | Digital | Digital | Digital | Analogue | Digital
Data output | Parallel | Parallel | Parallel | – | Serial
Output format | RGB or YUV | RGB or YUV | RGB or YUV | CVBS | RGB
Size | 6 × 6 × 4.5 mm | 10 × 10 mm | 4 × 5 mm | 17 × 17 × 12 mm | 20 × 20 mm
Power | 120 mW | 200 mW | 80 mW | 550 mW | 1.5 W
Costs | € 7.50 | € 85 | € 89 | € 89 | € 15
Features | Auto luminance control; sub sampling | Luminance control (via I2C); sub sampling | Automatic white balance; automatic luminance control | Back light compensation; ultra wide vibration spec. | Auto luminance & colour control; sub sampling
Advantages | Cheap; small; not over dimensioned | Drivers; high frame rate | High frame rate; compatible with processing units | – | Easy to implement; robust
Disadvantages | No breakout board; no drivers | Expensive; no auto luminance control | Not available without CMUcam; long delivery | A/D conversion needed; no sub sampling | Computationally intense


3.2.1 Analogue (CMOS) Camera

Almost all cameras are based on a CCD (Charge-Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor. Both devices produce digital images, with CMOS being the cheaper technique [30]. Although both sensors produce digital video, most available CMOS cameras convert the video on chip to a composite video format. This format is an analogue video standard, in which the complete video stream is modulated onto one waveform. It includes the three colours, the clock frequency and offset data.

For (colour) object recognition techniques a digital image is required. An A/D video converter, either as a standalone device or integrated on a microprocessor, is needed to deliver digital pictures from an analogue camera. An analogue CMOS camera is shown in Figure 3.1.

Figure 3.1: Analogue CMOS camera 20B44P [29].

3.2.2 Digital (CMOS) Camera

Cameras that lack the D/A conversion of an analogue CMOS camera directly offer a parallel digital output. Those cameras are also configurable via an I2C bus, through which the output resolution and colour format can be set. The digital outputs of the camera can be directly connected to a microprocessor for further image processing and object recognition.

The advantages over the analogue method are that no extra (standalone) A/D converter is needed and that the data output format is adjustable (via I2C). In Figure 3.2(a) a camera of about € 7.50 is shown [31]. This is a very small digital camera of 6 × 6 mm which weighs less than a gram [26]. The problem with this camera is the absence of a prefabricated breakout board. Such a breakout board [32] is depicted in Figure 3.2(b).

(a) Toshiba TCM8230MD Camera (b) Toshiba TCM8230MD Breakout Board

Figure 3.2: Digital CMOS camera Toshiba TCM8230MD [26] & additionally needed breakout board [32].

There are also more expensive, prefabricated boards with a camera already attached to them. Those boards are ready to be used and no manual work is needed. Such a board, with a camera similar to the above mentioned Toshiba camera, costs € 85 [33] and is made by Leopard Imaging [27]. In Figure 3.3 an example of such an out of the box camera module is given.

The main advantage over the Toshiba camera is that no breakout board is needed. The camera has an interface that can be attached to the camera header of a Leopard Imaging board and a Beagleboard; both processing boards are described in the following section.


Figure 3.3: Leopard digital camera module LI-VM34LP [27].

3.2.3 USB Webcam

A USB webcam is almost the same as a digital camera, but it is more robust because of the case it is in. Furthermore, it has its own small microprocessor that adapts the video data for transmission over USB. Besides the advantage of generally available drivers and support, USB webcams transmit the video in a serial format rather than in parallel like a digital CMOS camera.

Another difference is that the data is normally compressed, whereas the data of the parallel cameras is uncompressed. Normally, if the read out speed of both cameras is equal, a digital CMOS camera captures more video frames per second than a USB camera. Figure 3.4 shows an image of the Conrad mini webcam. This webcam is small and weighs only 25 g, which is less than other webcams available on the consumer market.

Figure 3.4: Conrad mini webcam.
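Because drivers for such webcams are generally available under Linux, reading the camera from OpenCV (as is done later in this project) reduces to a few calls. A minimal sketch, using current OpenCV naming; the device index and resolution are illustrative:

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);                      // first video device (/dev/video0)
    if (!cap.isOpened()) return 1;                // webcam not found

    // Request a small resolution to keep the processing load low.
    cap.set(cv::CAP_PROP_FRAME_WIDTH, 320);
    cap.set(cv::CAP_PROP_FRAME_HEIGHT, 240);

    cv::Mat frame;
    while (cap.read(frame)) {
        // ... object detection on `frame` would run here ...
    }
    return 0;
}
```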

3.3 Video Processing

Since the system is doing real time (video) processing, the processing should take place on the UAV. Therefore an onboard microprocessor is needed to run those applications.

Furthermore, the total system has to communicate with a GPS sensor, a barometer, an XBee and the AR.Drone itself. From the design requirements perspective, the board should have enough I/O ports to be easily expandable. The processing board therefore needs multiple USB ports and GPIOs (General Purpose Input/Output), which are also accessible as UART or I2C ports.

For both setups, either with an analogue or a digital camera, the microprocessor has to process the digital (or digitized) video data. Not all microprocessors are suitable for the purpose of video processing. In Table 3.2 an overview of the available video processing boards described below is given.


Table 3.2: Overview of processing boards.

Attribute | CMUcam 4 [28] | Leopardboard 365 [27] | Beagleboard XM [34] | Raspberry Pi [35] | Spartan-3 [36]
Delivery speed | One week | One day | One day | On request | One day
Price (excl. VAT) | € 90 (incl. camera) | € 100 | € 160 | € 35 | € 120
Open source | Yes | No | Yes | Yes | N.A.
Processor | Parallax P8X32A | ARM9 26EJ-S | ARM Cortex A8 | ARM11 76JZF-S | XC3S200 FPGA, 200K gates
Generation | – | ARMv5 | ARMv7 | ARMv6 | –
Speed | 80 MHz | 300 MHz | 1 GHz | 700 MHz | 50 MHz
RAM | Not specified | 128 MB | 512 MB | 256 MB | 1 MB
Memory slot | µSD | µSD | µSD | SD | N.A.
Power consumption | < 2.5 W | < 2 W | < 2 W | < 4 W | Not specified
Voltage | 5 V | 5 V | 5 V | 5 V | 5 V
Size | Not specified | 76 × 64 mm | 83 × 83 mm | 86 × 54 mm | Not specified
Parallel camera port | Yes | Yes | Yes | Yes* | Yes
Weight | Not specified | 54 g | 74 g | 44 g | Not specified
Advantages | Cheap; low power | Easy interface with LI camera | Fast; GPU; camera interface | Light; cheap | Fast parallel processing; optimal use of hardware
Disadvantages | Slow processor | Not enough I/O ports | Heavy; expensive | Unknown delivery time | Slow prototyping; no drivers for peripherals

* = No camera drivers available.

3.3.1 CMUcam

There are some products that combine both a camera and a microprocessor. They provide a camera and a circuit board with a microprocessor that can be flashed to run a specified code (for example an object recognition code). The CMUcam [28] is an example of such an all in one solution. The software that can be flashed onto the microprocessor is open source and can be adapted to run an object recognition code.

The microprocessor, however, is slow compared to other boards and can only analyse 10 fps at a low resolution (160 × 120 pixels). Also the storage capacity is small, so the possibility to buffer data is limited. It is also not suitable for further needs like location estimation and velocity calculations.

Figure 3.5: CMUcam 4 (Board & Omnivision 9665 camera) [37].

3.3.2 Leopardboard DM365

The company Leopard Imaging develops a system similar to the CMUcam; however, it is not open source. Various Leopard boards are available with a (more advanced) microprocessor on them, namely an ARM processor. Besides that, they also have a special Video Processing Subsystem (VPSS), which is able to handle (a part of) the video processing [27]. The Leopardboard 365 (Figure 3.6) is one example of the available boards.

The board has a parallel video port that is compatible with Leopard Imaging camera modules. The microprocessor is fast enough to process 50 million pixels per second, which would be more than adequate for the purpose of the project.

The microprocessor has the ability to run the location and velocity calculations besides the recognition code. However, the board has only a single USB port and no UART port, which makes it unsuitable for communication with multiple sensors and an XBee.

Figure 3.6: Leopard board DM365 [27].

3.3.3 Beagleboard XM

The Beagleboard XM is a prototyping board widely used by hobbyists. The board is based on the ARM Cortex A8 processor, which runs at 1 GHz. Of the boards discussed in this thesis, this board has the most advanced processing unit. Furthermore, a wide range of GPIOs and four USB ports are available on the board. This makes the board suitable as the main processing board for the whole UAV (camera) system. The Beagleboard XM [34] is shown in Figure 3.7. A general purpose microprocessor is the core of this board, and a Linux OS can be run on it for easy prototyping. The board is also compatible with the Leopard Imaging modules via the camera header. Although drivers for some Leopard camera modules are available, only a small range of Linux kernels supports Leopard Imaging cameras.

Compared to the other processing boards, the Beagleboard has the disadvantage of being quite heavy (74 g). The final product has to weigh less than 250 g, but the AR.Drone, which is used as testing platform for the proof of concept, has a maximum payload of 100 g. This makes the Beagleboard unsuitable for the proof of concept on the AR.Drone, but suitable for the product targeted in the design brief.

Figure 3.7: Beagleboard XM.


3.3.4 Raspberry Pi

Similar to the Beagleboard, the Raspberry Pi [35] is an ARM based processing board. However, it has an older version of the ARM processor. The main advantages over the Beagleboard are its smaller size, lower weight and lower price. The main disadvantages are its slower processor and its long delivery times. Specifications are listed in [35]. The Raspberry Pi is shown in Figure 3.8.

Figure 3.8: Raspberry Pi [38].

3.3.5 FPGA Spartan-3

In contrast to the other processing boards presented in this section, the Spartan-3 is not a microprocessor but an FPGA. An FPGA is taken into account as Price et al. [10] report a significant speed up of vision algorithms on FPGAs. Images can be processed in parallel on an FPGA rather than sequentially as on a microprocessor, and therefore an FPGA can process an image in fewer clock cycles than a microprocessor. However, the clock speed of the Spartan-3 is twenty times lower than that of the Beagleboard. On an FPGA the hardware can be configured dedicated to a specific problem.

The FPGA Spartan-3 Starter Kit [36] is shown in Figure 3.9. The board has three 40-pin expansion connectors and a serial port. Furthermore, a range of LEDs, switches and buttons is provided to simplify the prototyping process.

The documentation does not contain any information on the size or weight of the board. This is a disadvantage, since especially the weight is an important issue for this project. Furthermore, two other subgroups of the project work in high level programming languages, which are hard to map to an FPGA platform. Moreover, the drivers for cameras and sensors are not provided in VHDL, the primary language to configure FPGAs. Therefore, in case an FPGA were chosen, a microprocessor handling the sensor communication would be needed in cooperation with the FPGA.


3.4 Selection of the Hardware Components

In this section the design choices regarding the hardware setup presented in section 3.6 are motivated. In subsection 3.4.1 the selection of the processing board is discussed. In subsection 3.4.2 the camera choice is motivated.

3.4.1 Selection of the Processing Unit

The choice for a processing board was made based on the previous section and Table 3.2.

The CMUcam4 is not suited for the camera system, since its processor is only just fast enough for object recognition. However, the processing board is not only responsible for object recognition, but also has to read out all sensors, send data via a data link and estimate the location of the target. The CMUcam4 cannot handle this amount of processing.

The Leopardboard DM365 does not meet the requirements either. The number of I/O ports and the speed of the processor are much smaller compared to the Beagleboard and the Raspberry Pi. The Leopardboard not only runs at a slower clock speed than the Beagleboard and the Raspberry Pi, but also uses an obsolete processor.

Although the delivery times for a Raspberry Pi on the consumer market were above three months, the education officer of the Raspberry Pi Foundation was persuaded by a letter of the authors that this project deserves a Raspberry Pi. Because of the available I/O ports and its low weight, the Raspberry Pi is suitable for this project. However, the performance of the Raspberry Pi's processor is inferior to that of the Beagleboard. Furthermore, the Raspberry Pi's power consumption is twice as high as that of the Beagleboard, according to its specifications. It is therefore not used in the prototype.

Although it is an elegant and fast solution, the FPGA Spartan-3 is not used in the prototype either. The lack of drivers for peripheral sensors and cameras, as well as the unspecified weight and size of the board, are the most important reasons for not using the FPGA. In view of the available time, the system would only be implementable with an FPGA and a microprocessor in coexistence; however, the system would become gratuitously complex with this approach.

Thus, for the final system and prototype, the Beagleboard XM is the best choice, since it has a parallel camera port and the largest computational power compared to the other boards. Its power consumption is low. The Beagleboard has a wide range of GPIO ports that support I2C as well as UART communication. The computational power and free I/O ports allow the system to be extended after this project, which is a requirement of the clients.

3.4.2 Selection of the Camera

The choice of a camera was made based on the overview given in section 3.2 and Table 3.1. To keep the prototype simple and quickly implementable, it was chosen to use a camera with a digital output, so the 20B44P was not chosen, as its output is an analogue composite format.

In section 3.4 it was decided that the CMUcam4 has too little computational power to reach the system requirements. Thus the Omnivision 9665 sensor was not used, as it is dedicated to the CMUcam and has quite long delivery times.

The final system has to guarantee operation under various light conditions, like bright sunlight, shadow and even very cloudy weather or indoor light conditions. Therefore easy luminance control is a must. The luminance of the LI-VM34LP needs to be controlled externally in software, which is more complicated than auto luminance control. Thus the LI-VM34LP is not used in the system, although it has the highest frame rate of the compared cameras. Furthermore, the camera board is over ten times more expensive than the TCM8230MD and more than five times as expensive as the Conrad mini webcam. This is not a reasonable price for a camera of comparable specifications, and therefore the LI-VM34LP was not chosen.

For the TCM8230MD no drivers and no breakout board are available. Although a breakout board can be designed and fabricated at the faculty within one day, writing drivers can easily cost a week. In view of the available time it was decided to use the Conrad mini webcam, since its technical specifications meet the design requirements. Furthermore, it is easily implementable under the condition that an operating system is used on the processing platform. Although the USB webcam consumes much more power than the other cameras, the processing unit and camera together consume only about 3.5 W according to their specifications. Thus, with the combination of Beagleboard and webcam, the power requirement of at most 5 W is met. Since the camera needs to be mounted outside the UAV, the system will benefit from a robust camera like a webcam.

An important parameter of a camera is its field of view. However, the FoV of the Conrad mini webcam is not given in its documentation, so the field of view angles of the camera had to be determined by measuring the visible height and width of an image. The camera was mounted at a height h of 0.77 m, looking downwards to the ground. In this setting, the visible width w = 0.636 m and length l = 0.468 m were measured with a ruler. From these measurements the angles were calculated using Equation 3.1.

\[
\theta = \frac{360}{\pi} \cdot \arctan\!\left(\frac{w}{2h}\right)
\qquad \text{and} \qquad
\delta = \frac{360}{\pi} \cdot \arctan\!\left(\frac{l}{2h}\right) \tag{3.1}
\]

For the Conrad mini webcam used, the results are given in Table 3.3.

Table 3.3: Camera calibration.

Horizontal angle θ | 44.9°
Vertical angle δ | 33.8°
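Substituting the measured values into Equation 3.1 reproduces the angles listed in Table 3.3:

\[
\theta = \frac{360}{\pi} \arctan\!\left(\frac{0.636}{2 \cdot 0.77}\right) \approx 44.9^\circ,
\qquad
\delta = \frac{360}{\pi} \arctan\!\left(\frac{0.468}{2 \cdot 0.77}\right) \approx 33.8^\circ.
\]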

3.5 Power Converter

On a UAV the whole system is powered from a rechargeable battery. The whole system may have a maximum power consumption of 5 W. All devices are powered via the main processing board, which itself runs on a 5 V DC source. The AR.Drone runs on an 11.1 V battery, but the power supply of the final fixed wing UAV will be chosen by the students at Aerospace Engineering. To make the system as transferable as possible, the power converter should be able to handle a wide range of input voltages. Since the power consumption directly influences the maximum flight time, a power converter with a high efficiency is desired. Due to the limited budget, the price of the converter is an essential parameter as well. In Table 3.4 a selection of power converters is listed with the major design attributes.

Due to the limited payload of the testing platform, the weight is important; however, all considered converter components weigh less than a gram, so this is not an issue.

Table 3.4: Overview of power converters.

Attribute | Buck converter TPS5430 [40] | LDO L7805ABP [41] | Voltage divider (simple resistors)
Delivery speed | One day | One day | One day
Max. power output | 15 W | 5 W | > 15 W
Price | € 6 | € 0.22 | € 0.11
Efficiency | 95% | 44.3% (@ 11.1 V) | < 44.3% (@ 11.1 V)
Input voltage range | 5.5–38 V | 7–35 V | Fixed


The efficiency of the Low-Dropout Regulator (LDO) and Voltage divider is strongly dependent on the used input voltage and output current. According to [42] the efficiency of a LDO can be calculated according to,

\text{Efficiency} = \frac{I_{out}\,V_{out}}{(I_{out} + I_{quiescent})\,V_{in}} \cdot 100\,\%, \tag{3.2}

where $I_{out}$ denotes the output current, $V_{out}$ the output voltage, $V_{in}$ the input voltage and $I_{quiescent}$ the quiescent current needed for correct device operation. For the example of the AR.Drone battery (11.1 V) and the LDO L7805ABP, the efficiency is 44.3% according to Equation 3.2.
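As a worked example of Equation 3.2, the sketch below reproduces the 44.3% figure. The 300 mA load and 5 mA quiescent current are illustrative assumptions chosen for the example, not values quoted from the datasheets.

def ldo_efficiency(v_in, v_out, i_out, i_quiescent):
    # Equation 3.2: efficiency of a linear (LDO) regulator in percent.
    return (i_out * v_out) / ((i_out + i_quiescent) * v_in) * 100.0

# 11.1 V AR.Drone battery, 5 V output; load and quiescent current are assumptions.
print(ldo_efficiency(v_in=11.1, v_out=5.0, i_out=0.300, i_quiescent=0.005))  # ~44.3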

The voltage divider is not used, as it only works at a single, fixed input voltage. In order not to roughly double the power consumption of the system with an inefficient power converter, the buck converter TPS5430 is chosen for the system. The TPS5430 and the battery of the AR.Drone are shown in Figure 3.10.

Figure 3.10: Buck converter TPS5430 and AR.Drone battery.

3.6 System Overview

In this section a system overview of the entire UAV camera system is provided. As concluded in section 3.4 and section 3.5, the camera system consists of the Conrad mini webcam, the Beagleboard XM as central processing unit and the TPS5430 buck converter. The hardware setup is shown in Figure 3.11.

Figure 3.11: Hardware setup with the webcam, the Beagleboard, the buck converter and the AR.Drone battery.


Furthermore the position and control group decided to measure the position of the UAV with a GPS sensor and its height with a barometer. The wireless transmission group chose XBee Pros to transmit data from the UAV to the base station. In order to communicate with all these sensors in an easy way, a USB to UART shield is used. This shield provides easy software access to the sensors.

The function of the system is depicted in Figure 3.12. All tasks performed by the Beagleboard are coloured blue, whereas the tasks of the ground station are shown in red. The three groups are each responsible for a specific task. The position and control group reads the output of the GPS sensor, the barometer and AR.Drone sensors. After processing this data it is passed to the transmission and visualization group in order to be sent to the base station for visualization.

Furthermore, the data is passed to the camera system. The object detection and image processing part is responsible for capturing an image and detecting the target object. In the position and velocity estimation part, the location of the object is calculated from the captured image and the sensor data passed on by the position and control group. The latter produces control signals from the target position and velocity, which are sent to the AR.Drone in order for the UAV to track the target.

The transmission and visualization group is responsible for transmitting all flight data to the base station and visualizing it. The user can request a picture from the webcam. The picture is then captured by the webcam and compressed in the image processing part of the system.
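As an illustration of this picture-request path, the following sketch grabs a single frame and JPEG-compresses it so it is small enough for the wireless downlink; the OpenCV calls and the quality setting are assumptions made for the example, not necessarily what the team implemented.

import cv2

capture = cv2.VideoCapture(0)             # the USB webcam
ok, frame = capture.read()                # grab one frame on request
capture.release()
if ok:
    ok, jpeg_buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 60])
    payload = jpeg_buf.tobytes()          # bytes handed over for transmission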


Figure 3.12: Overview of the full electronic UAV system; the chart shows the interaction between the different project groups.

In order to program at a high level and run multiple applications at the same time on an embedded platform, an operating system is required. Furthermore, an operating system is responsible for memory management and for handling peripheral devices. Although the activity of the operating system slows down the camera system, it makes prototyping easier and faster.

The Beagleboard runs a minimal version of Ubuntu 11.10 with an XFCE graphical environment. This operating system supports a wide range of peripheral devices. The graphical user interface allows software development directly on the Beagleboard, which saves valuable time.


Chapter 4

Object Detection & Tracking Algorithms

In this chapter the algorithms used to track an object and estimate its parameters are studied. In section 4.1 object detection algorithms are discussed in order to find a fast, reliable algorithm that suits the needs of embedded real-time object detection. It is concluded that colour thresholding matches the needs of this project, so section 4.2 explains the colour thresholding algorithm in more detail. The algorithm finds the object in a picture and returns its image coordinates. The position and velocity estimations determine the real position and speed of the object detected by the first algorithm. The derivations of the algorithms for position and velocity estimation are given in section 4.4 and section 4.5, respectively.

The optical flow algorithm explained in section 4.6 makes it possible to calculate the motion of points in subsequent video frames. This can be used to calculate the motion of the object and its surroundings, and thus the real speed of the object. It is a more complicated algorithm than the previous approach to speed estimation.

The generation of steering signals is described in section 4.7. Finally, the implementation and simulation results are given in section 4.8; the system results are covered in chapter 5.

4.1 Proposed Algorithms

A wide range of object detection algorithms has been proposed in literature, as shown in chapter 2. In this section four object detection algorithms are discussed: colour thresholding, background subtraction with a mean/variance model, background subtraction with a Gaussian mixture model, and the TLD algorithm, better known as the predator algorithm.

The algorithms are compared on their reliability and accuracy of object detection. Special attention has been paid to their computational complexity and use of memory, since an embedded platform can only handle algorithms with low complexity in real time. These algorithm properties are listed together with their main advantages and disadvantages in Table 4.1.

With colour thresholding, the colour values of each pixel are checked against a threshold condition. Pixels that pass the condition are detected as object. Thus only a single computational operation is performed per pixel, which makes the algorithm fast compared to the other options and suitable for real-time applications. However, pixels that belong to the background might also pass the thresholding condition and disturb the outcome.

Background subtraction is a more complex algorithm than colour thresholding. For background subtraction, an image of the background as seen by the camera is stored in a reference frame. Once an object enters the frame, it can be detected by comparing the reference frame with the captured frame.


Table 4.1: Overview of available object recognition algorithms.

Attribute                | Colour thresholding | Background subtraction [43] (mean/variance model / Gaussian mixture) | TLD (Track, Learn, Detect) [44]
Accuracy / reliability   | Low                 | Medium / High                                                        | Very high
Computational complexity | Low                 | Medium / Medium                                                      | Very high
Memory intensity         | Low                 | Medium / High                                                        | Very high
Detects multiple objects | No                  | Yes                                                                  | Yes
Advantages               | Fast; easily implementable; can deal with luminance changes | Can handle multiple objects; adapts to the background; filters high-frequency background movement | Learning algorithm; can deal with luminance, object and background changes
Disadvantages            | Inaccurate; only detects one object | Computationally complex and memory intensive; background may not vary fast; inaccurate under fast-changing luminance conditions | Complex algorithm; stores a lot of images for the learning feature

Many versions of background subtraction have been developed; however, the two main approaches have been included in this investigation: the mean/variance model and the Gaussian mixture model.

In the mean/variance model the reference frame contains, for every pixel, the mean and the variance of each colour component averaged over a number of frames. The background is thus modelled by a single Gaussian distribution per pixel. The mean and variance of each pixel are updated after each iteration to make the algorithm adaptive.
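The thesis does not implement this model; purely as an illustration, a minimal NumPy sketch of the per-pixel mean/variance update described above is given below (the learning rate and the threshold in standard deviations are illustrative values, not tuned parameters).

import numpy as np

class MeanVarianceBackground:
    """Per-pixel single-Gaussian background model with a running mean and variance."""

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(np.float32)   # per-pixel, per-channel mean
        self.var = np.full_like(self.mean, 50.0)     # initial variance guess
        self.alpha = alpha                           # learning rate of the running averages
        self.k = k                                   # threshold in standard deviations

    def apply(self, frame):
        frame = frame.astype(np.float32)
        diff = frame - self.mean
        # A pixel is foreground if it deviates by more than k sigma in any colour channel.
        foreground = np.any(diff ** 2 > (self.k ** 2) * self.var, axis=-1)
        # Update mean and variance so the model adapts to slow background changes.
        self.mean += self.alpha * diff
        self.var = (1.0 - self.alpha) * self.var + self.alpha * diff ** 2
        return (foreground * 255).astype(np.uint8)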

If the Gaussian mixture model is used as background model, the background colour distribution of each pixel is modelled by multiple Gaussian curves. Thus more complex backgrounds, like trees moving in the wind, can be modelled with the Gaussian mixture model. The drawback, however, is that more data needs to be stored for this algorithm.
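For reference, current OpenCV releases ship a ready-made Gaussian mixture background subtractor; the usage sketch below assumes the modern API name, which post-dates the OpenCV version available on the Beagleboard at the time, so it illustrates the technique rather than the setup used here.

import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16, detectShadows=False)

capture = cv2.VideoCapture(0)                    # any video source
while True:
    ok, frame = capture.read()
    if not ok:
        break
    foreground_mask = subtractor.apply(frame)    # 255 = foreground, 0 = background
capture.release()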

Although background subtraction is mainly used on static camera systems, [45] described background subtraction for mobile platforms.

The recently developed TLD algorithm, also known as the predator algorithm, is the most advanced object detection algorithm studied in this context. The algorithm can track multiple objects at the same time. Furthermore, it is a learning-based algorithm that stores images of the object in each geometric and luminance configuration to optimize its tracking. The algorithm includes object detection based on shape as well as colour. However, it requires advanced computers to run at a reasonable frame rate and is therefore too complex for embedded platforms.

As the final system needs to operate at a high frame rate in order not to lose the tracked object, it was decided to implement the colour thresholding algorithm. The algorithm is easily implementable, fast and can deal with changing light conditions. It can only track a single object, which limits the system to situations in which no other object of the specified colour appears in the processed frame.

The background subtraction methods require more processing power than colour thresholding, but their use results in more reliable object detection. These algorithms have not been used, as a high frame rate is more important for tracking objects from a fast-moving UAV than a highly accurate and reliable algorithm. Furthermore, background subtraction algorithms become unreliable under fast-changing luminance conditions, as the reference frame then contains the background under different light conditions than the current frame.


4.2 Colour Detection

To detect an object in a captured video frame, colour thresholding on pixels is implemented.

4.2.1 RGB or HSV

Two colour spaces are shown in Figure 4.1. The RGB and HSV colour spaces are compared on their capability of separating colours; as a running example, consider the red pixels that need to be found in subsection 4.2.2.

(a) RGB colour space (b) HSV colour space

Figure 4.1: Comparison of the RGB and HSV colour spaces [46].

In the RGB colour space this is done by setting three thresholding criteria; however, as can be seen in Figure 4.1(a), the transitions between colours are very smooth, which makes it hard to isolate a single colour.

In the HSV colour space, on the other hand, the separation of colours is well defined, as can be seen in Figure 4.1(b). This is because the colour itself is stored only in the hue component. Thus, if red needs to be detected, thresholding on the hue component alone is sufficient to obtain red pixels. For an even better result, additional thresholding on the saturation and value components keeps only pixels that are, for example, not too dark or too bright. Therefore the HSV colour space is used for colour thresholding on the captured video frames.
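As a quick illustration of how the hue component separates colours, the small OpenCV snippet below converts a few pure colours to HSV; note that OpenCV stores hue in the range 0–179 and expects BGR channel order, which is an implementation detail of the library rather than something specified in this thesis.

import cv2
import numpy as np

# Convert a few pure colours to HSV to see how the hue separates them.
colours_bgr = np.uint8([[[0, 0, 255],      # red
                         [0, 255, 0],      # green
                         [255, 0, 0]]])    # blue
print(cv2.cvtColor(colours_bgr, cv2.COLOR_BGR2HSV))
# -> hue 0 for red, 60 for green, 120 for blue; saturation and value stay at 255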

4.2.2 Thresholding

After converting the incoming frame to the HSV colour space, colour thresholding is done, using the following principle.

In the converted frame, each pixel is tested against three threshold criteria concerning its hue, saturation and value components. If a pixel passes the thresholding, its location in the current frame is added to two running coordinate sums and a counter is incremented. After all pixels have been processed, the sums are divided by the counter value, which yields the centre of mass of the object.

This principle is illustrated in Figure 4.2. In Figure 4.2(a) the original set of pixels from a picture is shown. The algorithm finds the red pixels by checking their HSV values and colouring them black if they satisfy the given criteria. The calculated centre of mass is finally coloured white, as shown in Figure 4.2(b). The algorithm is given in pseudocode below: the image is loaded, the HSV components are thresholded, and the pixel values written back are given as RGB colours.


(a) Original (b) Result after the described algorithm

Figure 4.2: Example of a set of pixels that are checked by the algorithm; black is accepted; centre of mass is white.

Npixels_y = picture width in pixels
Npixels_x = picture height in pixels
count = 0
x_centre = 0
y_centre = 0
for j = 0 → Npixels_x do
    for i = 0 → Npixels_y do
        if image(i, j, HSV) < (T_H1, T_S1, T_V1) or image(i, j, HSV) > (T_H2, T_S2, T_V2) then
            image(i, j, RGB) ← (0, 0, 0)
            x_centre = x_centre + i
            y_centre = y_centre + j
            count = count + 1
        end if
    end for
end for
X_centre_obj = x_centre / count
Y_centre_obj = y_centre / count
image(X_centre_obj, Y_centre_obj, RGB) ← (255, 255, 255)
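The pseudocode above maps onto a few OpenCV/NumPy calls. The sketch below assumes red is the target colour and uses illustrative, uncalibrated threshold values rather than the ones of the final system; it is an equivalent formulation, not the code that ran on the Beagleboard.

import cv2
import numpy as np

def find_red_centre(frame_bgr):
    """Return the (x, y) centre of mass of red pixels, or None if no red pixel is found."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around hue 0, so two hue ranges are combined (threshold values are illustrative).
    mask = cv2.inRange(hsv, (0, 100, 80), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 100, 80), (179, 255, 255))
    ys, xs = np.nonzero(mask)              # coordinates of pixels that passed the threshold
    if xs.size == 0:
        return None
    return int(xs.mean()), int(ys.mean())  # centre of mass in image coordinates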

4.3 Stabilization

In view of time it was decided that attaching servomotors to stabilize the camera, as proposed by [7], would make the system more complex than desired for a simple prototype. Further means of hardware stabilization were found to be either too heavy or not useful. Sensor-shift stabilization is only done inside (expensive) cameras and can therefore not be used.

Stabilization in software, on the contrary, would demand too many system resources and slow down the processing. The third approach, a moving average filter [13], is therefore implemented.

A weighted moving average filter is not only an algorithm with a low processing-power demand, but also a practical way to remove noise and obtain a stable system output. The displacement of the pixel location of the tracked object's centre is taken as a weighted average of the previous displacements. An example of a weighted moving average filter of length l is given by

\vec{FT}_{p+l;\,p+l+1} = \sum_{n=1}^{l} a_n \cdot \vec{FT}_{p+n;\,p+n+1} \tag{4.1}

where $a_n$ is the n-th filter coefficient. The filter coefficients themselves have to satisfy the normalization criterion

\sum_{n=1}^{l} a_n = 1.
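A minimal Python sketch of such a filter applied to the detected displacement vectors is given below; the coefficient values are illustrative and merely chosen to satisfy the normalization criterion, and the filter length is an assumption for the example.

import numpy as np
from collections import deque

class WeightedMovingAverage:
    """Weighted moving average over the last l displacement vectors (Equation 4.1)."""

    def __init__(self, coefficients=(0.1, 0.2, 0.3, 0.4)):     # illustrative, must sum to 1
        self.coefficients = np.asarray(coefficients, dtype=float)
        assert abs(self.coefficients.sum() - 1.0) < 1e-9        # normalization criterion
        self.history = deque(maxlen=len(coefficients))

    def update(self, displacement):
        """Add the newest displacement (dx, dy) and return the filtered estimate."""
        self.history.append(np.asarray(displacement, dtype=float))
        if len(self.history) < self.history.maxlen:
            return self.history[-1]                              # not enough samples yet
        # Oldest sample gets a_1, newest sample gets a_l, as in Equation 4.1.
        return np.sum(self.coefficients[:, None] * np.array(self.history), axis=0)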
