
TOBCAT CASE GEOVISAT – DETECTION AND BLURRING OF PERSONS IN PANORAMIC IMAGES

The goal is to process 360-degree panoramic images and detect two object categories:

1. Pedestrians: these must be blurred out so that privacy is no longer an issue. Ideally this happens during the capturing of the data.

2. Number plates of cars, again to protect people's privacy.

An example of such a 360-degree panoramic view:

It must also be possible to go through the blurred images and apply extra blurring manually if a detection was missed.
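As an illustration, below is a minimal OpenCV sketch of how detected regions could be blurred. It assumes detections are already available as bounding boxes; the kernel size is an arbitrary choice and not a value from the project.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

// Blur every detected bounding box in place so persons / number plates become unrecognizable.
void blurDetections(cv::Mat& panorama, const std::vector<cv::Rect>& detections)
{
    for (size_t i = 0; i < detections.size(); ++i)
    {
        // Clip the box to the image borders before taking the ROI.
        cv::Rect box = detections[i] & cv::Rect(0, 0, panorama.cols, panorama.rows);
        if (box.area() == 0) continue;
        cv::Mat roi = panorama(box);
        // Heavy Gaussian blur; the kernel size is an assumption and should be tuned
        // to the panorama resolution.
        cv::GaussianBlur(roi, roi, cv::Size(51, 51), 0);
    }
}
```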

FIRST TEST – USING THE BUILT-IN LATENTSVM DETECTOR IN OPENCV

Since OpenCV contains a built-in LatentSVM object detector with several pretrained object models, this code was tested first. However, processing an image of 4800x2400 pixels is quite time consuming: up to 5 minutes to match the model to the complete image.

Some statistics, keeping in mind that the detection score threshold was manually set to 0.3:

 4800*2400 pixels – all scale ranges – 5 minutes

 2400*1200 pixels – all scale ranges – 1.3 min

We notice two things: the smaller the resolution, the faster the processing. However, if the image is resized, the actual cars contain less pixel information. In the images we can see that a car detected on the right at full resolution is lost at half resolution. This is a downside of the technique.

Going further down to a resolution of 600x300 pixels results in not a single car being detected.
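For reference, below is a minimal sketch of this first test using the built-in cv::LatentSvmDetector of the OpenCV 2.4 C++ API. The image path, model file name and resize factor are placeholders, not the exact settings used in the experiments.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat panorama = cv::imread("panorama.jpg");     // e.g. a 4800x2400 panoramic image

    // Optionally downscale first: faster, but small cars and pedestrians get lost.
    cv::Mat input;
    cv::resize(panorama, input, cv::Size(), 0.5, 0.5); // 2400x1200

    std::vector<std::string> models;
    models.push_back("car.xml");                       // pretrained LatentSVM model (placeholder)
    cv::LatentSvmDetector detector(models);

    std::vector<cv::LatentSvmDetector::ObjectDetection> detections;
    detector.detect(input, detections);                // searches all scale layers

    const float scoreThreshold = 0.3f;                 // manually chosen threshold, as in the text
    for (size_t i = 0; i < detections.size(); ++i)
    {
        if (detections[i].score > scoreThreshold)
            std::cout << "kept detection at (" << detections[i].rect.x << ","
                      << detections[i].rect.y << ") score " << detections[i].score << std::endl;
    }
    return 0;
}
```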


We applied the same detector, now with the pretrained pedestrian model. Processing times rose somewhat, since the model is smaller and thus more image patches are available for classification.

Here we see two remarkable effects. At full resolution we have multiple false positive detections, but we do succeed in detecting all pedestrians present. Lowering the resolution again leads to losing the pedestrians completely.

Results can be seen below.


Our research group has an optimized C implementation of the algorithm. This was used to perform detections on the complete available dataset.

If you are interested in processing your data with these detectors, which are not publicly available, please mail toon.goedeme@kuleuven.be.

SECOND TEST – USING THE EAVISE C-VERSION OF THE LATENTSVM DETECTOR AND PROCESSING THE RETRIEVED RESULTS

A dataset of 450 images was manually annotated in order to obtain a ground truth dataset. Based on that data a scale-location mapping was made: the bounding box center Y value is plotted against the height of the bounding box.


In the above relation the two blue borders define the region in which objects can occur. Going outside those borders makes no sense, since those are sky and car areas. The red line is a linear relation fitted onto the data, and the green borders describe a -3 sigma / +3 sigma range around that linear fit.
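Below is a minimal sketch of how such a linear fit and its sigma band could be computed, assuming the ground-truth annotations are available as OpenCV rectangles. It illustrates the idea; it is not the exact code used in the project.

```cpp
#include <opencv2/core/core.hpp>
#include <vector>

struct ScaleLocationModel { double a, b, sigma; };

// Fit height = a * centerY + b by least squares and derive the +/- 3 sigma band
// from the residuals of the ground-truth boxes.
ScaleLocationModel fitScaleLocation(const std::vector<cv::Rect>& groundTruth)
{
    cv::Mat A((int)groundTruth.size(), 2, CV_64F);
    cv::Mat y((int)groundTruth.size(), 1, CV_64F);
    for (size_t i = 0; i < groundTruth.size(); ++i)
    {
        A.at<double>((int)i, 0) = groundTruth[i].y + groundTruth[i].height / 2.0; // box center Y
        A.at<double>((int)i, 1) = 1.0;
        y.at<double>((int)i, 0) = groundTruth[i].height;                          // box height (scale)
    }
    cv::Mat coeffs;
    cv::solve(A, y, coeffs, cv::DECOMP_SVD);       // least-squares solution for [a b]

    // Standard deviation of the residuals defines the green +/- 3 sigma borders.
    cv::Mat residuals = y - A * coeffs;
    cv::Scalar mean, stddev;
    cv::meanStdDev(residuals, mean, stddev);

    ScaleLocationModel m;
    m.a = coeffs.at<double>(0, 0);
    m.b = coeffs.at<double>(1, 0);
    m.sigma = stddev[0];
    return m;
}
```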

We then processed the 450 images and retrieved the thresholded detections, using a score threshold of 0. The result is quite different and is explained below.

Here we see a mapping of every single detection in those 450 images, including both true positive and false positive detections. However, from our training data we know that it makes no sense to accept detections outside of our boundaries.

The larger spread of detections is mainly due to the fact that detections are performed on several layers of the scale pyramid. Looking below, the green dots are the only points that are still of interest to us, so we can remove the others.


We now know that detections outside the green borders are highly unlikely: the borders should capture about 98 percent of all possible scale occurrences of pedestrians. It is therefore safe to remove all points that are not in that region, because they do not fulfil the following rules:

1. It is a possible location in the image to have a pedestrian

2. The scale at that position is possible, so larger near the camera and smaller away from the camera

So only the red dots are detections that should be kept. This already shows that a large set of false positive detections can be removed, and thus the accuracy of our algorithm will increase.
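A minimal sketch of this filtering step is given below. The blue borders (the valid center-Y range, excluding the sky and car areas) and the fitted model are passed in as parameters, since their exact values depend on the camera setup and the training data.

```cpp
#include <opencv2/core/core.hpp>
#include <cmath>
#include <vector>

struct Detection { cv::Rect rect; float score; };
struct ScaleLocationModel { double a, b, sigma; };

std::vector<Detection> filterDetections(const std::vector<Detection>& input,
                                        const ScaleLocationModel& model,
                                        double minCenterY, double maxCenterY)
{
    std::vector<Detection> kept;
    for (size_t i = 0; i < input.size(); ++i)
    {
        double centerY = input[i].rect.y + input[i].rect.height / 2.0;

        // Rule 1: the location must lie inside the valid (blue) borders.
        if (centerY < minCenterY || centerY > maxCenterY) continue;

        // Rule 2: the scale must be plausible at that location,
        // i.e. within +/- 3 sigma of the fitted linear relation.
        double expectedHeight = model.a * centerY + model.b;
        if (std::fabs(input[i].rect.height - expectedHeight) > 3.0 * model.sigma) continue;

        kept.push_back(input[i]);
    }
    return kept;
}
```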


We applied the same steps on two images:

1. Manual annotations

2. Standard model detection using the optimized C-implementation


We immediately notice that false positive detections occur in the sky region, even if we set the score threshold high enough.

3. Using the retrieved model relation between scale and position to filter detections


POSSIBLE EXTENSIONS TO THIS APPROACH

We notice that several persons are missed because they are riding a bicycle.

Using a bicycle model we do retrieve these objects, and we can then locate the persons above them.

The only downside of this extra approach is finding a decent threshold value, so that false positives can be filtered out without losing crucial detections. Again, a scale-location relation for bicycles could be built up here to obtain a cleaner result and to classify each retrieved detection correctly.

MAKING THE BORDERS STRICTER

We know that, when moving further away from the car, the variation in scale in the training data becomes clearly smaller. Therefore we redefine the borders.
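One possible way to implement such stricter borders is sketched below: the allowed deviation shrinks with the expected box height, so detections far from the car (small expected scale) are bounded more tightly. The relative tolerance is an assumed value; the exact rule used in the project is not reproduced here.

```cpp
#include <algorithm>
#include <cmath>

struct ScaleLocationModel { double a, b, sigma; };

// Returns true when a box height is plausible at the given center-Y position.
bool scaleIsPlausibleStrict(int boxHeight, double centerY, const ScaleLocationModel& model,
                            double relativeTolerance = 0.25 /* assumed value */)
{
    double expectedHeight = model.a * centerY + model.b;
    // Band width proportional to the expected height, capped by the global 3 sigma band.
    double allowed = std::min(3.0 * model.sigma, relativeTolerance * expectedHeight);
    return std::fabs(boxHeight - expectedHeight) <= allowed;
}
```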


By applying these new rules we get even better results and fewer false positive detections on the image database.

A check on the images shows that we detect all pedestrians without a flood of false positives. We even succeed in removing objects that still slipped through the previous boundaries.


One of the most problematic cases remains linear structures with the correct size and location that still receive a large score. Some examples:

Raising the threshold even further is possible and removes these detections, but it also risks losing good detections with a lower score, as seen below.

The first image has a detection score of -0.36, and applying a stricter score threshold would result in losing that detection. The second image has a score of -0.46, but here the question arises whether seeing people from the back is actually still a privacy issue.

REMARK: a proper discussion on what counts as a privacy invasion should be held here.


By adding the score > 0 threshold we keep only the green detections, which are actual pedestrians. An interface was also built to calculate the smallest scale still classified as a pedestrian, to gain insight into the limits of the model used.

Final result – pedestrians detected by using a smart approach!


However, one element still causes problems: a yellow-blue traffic sign construction, as seen here, which appears consistently in several images:

In order to solve this last problem, two approaches were implemented; they are discussed below.

SEGMENTING TRAFFIC POLES FOR A ROUNDABOUT

POST-DETECTION CLASSIFICATION BASED ON A SELF-CONSTRUCTED RELATION

Looking purely at the color distribution of such a pole, we can see the following properties:

- A central yellow color region in the lower 2/3

- The upper 1/3 contains a blue sign

- Color separation can be optimized in the YUV or HSV space

Combine this with eliminating the detection of a similar sign, see figure 2.

We first collected a small set of test data to learn some relations.


Segmentation of yellow objects in different color spaces:

1. Looking at the YUV space

- U space: thresholding on darker regions

- V space: thresholding on lighter regions

2. Looking at the HSV space

- S space: thresholding on lighter regions

- H space: thresholding on darker regions


3. Looking at the YMC color space

Thresholding on the yellow channel finally gave the clearest result for segmenting the poles.

After that we needed to detect the blue plates and segment those as well. This was done based on the normalized blue channel, checking that the top 50% of the detection contains a certain amount of blue pixels.

Combining all of this, we arrived at two rules to classify the poles:

1. Yellow poles – if the percentage of yellow pixels is exceeded, the detection is not a person.

2. Blue plates – if the percentage of blue pixels in the top 50% region is exceeded, the detection is not a person.

On the following set this gave a pretty decent result.
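A minimal sketch of these two rules is given below, assuming the detection ROI is available as a BGR image. The yellow (YMC) channel is taken as 255 minus the blue channel, the blue-plate check uses the normalized blue channel in the top half, and all threshold values are assumptions that need tuning on the collected data.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

// Returns true when the ROI is classified as a traffic pole (and thus not a person).
bool isTrafficPole(const cv::Mat& roiBgr)
{
    std::vector<cv::Mat> bgr;
    cv::split(roiBgr, bgr);

    // Rule 1: percentage of yellow pixels over the whole ROI (YMC yellow = 255 - B).
    cv::Mat yellow, yellowMask;
    cv::subtract(cv::Scalar::all(255), bgr[0], yellow);
    cv::threshold(yellow, yellowMask, 200, 255, cv::THRESH_BINARY);   // assumed threshold
    double yellowFraction = cv::countNonZero(yellowMask) / (double)roiBgr.total();

    // Rule 2: amount of blue pixels in the top 50% of the ROI, based on the
    // normalized blue channel B / (B + G + R).
    cv::Mat top = roiBgr(cv::Rect(0, 0, roiBgr.cols, roiBgr.rows / 2));
    int blueCount = 0;
    for (int r = 0; r < top.rows; ++r)
        for (int c = 0; c < top.cols; ++c)
        {
            cv::Vec3b p = top.at<cv::Vec3b>(r, c);
            double sum = p[0] + p[1] + p[2] + 1e-6;
            if (p[0] / sum > 0.45) ++blueCount;                        // assumed threshold
        }
    double blueFraction = blueCount / (double)top.total();

    // Assumed percentages; in the project these are tuned on the collected test data.
    return yellowFraction > 0.30 || blueFraction > 0.15;
}
```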


Looking at the output, we notice that most classifications are correct, except when people wear fluorescent jackets, since those contain a large yellow component. A fix was made by switching to a learning technique: a naïve Bayes classifier.

CONCLUSION: it is possible to reach a decent result by trial and error, searching for good thresholds. However, computer vision offers more advanced techniques that can learn those relations themselves.

Code can be found at

USB Stick > TOBCAT > Software > Matlab > segment_trafic_signs.m

POST-DETECTION CLASSIFICATION BASED ON A NAÏVE BAYES CLASSIFIER

The code for this approach can be found at:

USB stick > TOBCAT > Software > cpp_windows > naive_bayes_model.cpp
USB stick > TOBCAT > Software > cpp_windows > naive_bayes_detect.cpp

Basically there are several steps that should be followed:

1. Create a positive and negative set that resembles the data you want to separate.

POSITIVE SET CONTAINING TRAFFIC SIGNS


NEGATIVE SET CONTAINING PROBLEM CASES

2. Also create a test set that will be used to see if it all works.

TEST SET WITH CASES THAT SHOULD PASS

3. Select a set of features that can be used to represent the sample provided. For this we could use knowledge from the previous approach.

- YMC color space – good response in the Y and M color channels and a poor response in the C color channel.

- Selection of regions inside the image that are specific to a traffic sign (see the image below): the upper 1/3 of the image can be ignored, and the remainder is restricted horizontally to the band between 1/3 and 2/3 of the width.

- Inside that region the average Y, M and C channel values are calculated.

4. Push those features into a feature vector for each training image and supply a corresponding label set. Use that to train the classifier.

5. Then use the resulting classifier to decide on the test set (a minimal sketch of steps 3 to 5 follows below).

1 = traffic sign / 0 = something else
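Below is a minimal sketch of steps 3 to 5, assuming the OpenCV 2.4 ml module (CvNormalBayesClassifier). The exact region boundaries and the feature layout are interpretations of the description above, not the code on the USB stick.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>
#include <vector>

// Step 3: average Y, M and C inside the sign-specific region
// (ignore the top 1/3, keep the band between 1/3 and 2/3 of the width).
cv::Mat extractFeature(const cv::Mat& sampleBgr)
{
    cv::Rect region(sampleBgr.cols / 3, sampleBgr.rows / 3,
                    sampleBgr.cols / 3, sampleBgr.rows - sampleBgr.rows / 3);
    cv::Scalar meanBgr = cv::mean(sampleBgr(region));        // average B, G, R
    cv::Mat feature(1, 3, CV_32F);
    feature.at<float>(0, 0) = 255.0f - (float)meanBgr[0];    // Y = 255 - B
    feature.at<float>(0, 1) = 255.0f - (float)meanBgr[1];    // M = 255 - G
    feature.at<float>(0, 2) = 255.0f - (float)meanBgr[2];    // C = 255 - R
    return feature;
}

// Steps 4 and 5: train on the labelled samples, then decide on a test sample.
// Label 1 = traffic sign, label 0 = something else.
float trainAndPredict(const std::vector<cv::Mat>& trainImages, const std::vector<int>& labels,
                      const cv::Mat& testImage)
{
    cv::Mat trainData((int)trainImages.size(), 3, CV_32F);
    cv::Mat responses((int)trainImages.size(), 1, CV_32F);
    for (size_t i = 0; i < trainImages.size(); ++i)
    {
        extractFeature(trainImages[i]).copyTo(trainData.row((int)i));
        responses.at<float>((int)i, 0) = (float)labels[i];
    }

    CvNormalBayesClassifier bayes;
    bayes.train(trainData, responses);

    return bayes.predict(extractFeature(testImage));         // 1 = traffic sign, 0 = something else
}
```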
