Aims and contributions of thesis - Semantic labelling of road scenes using supervised and unsupervised machine learning with lidar stereo sensor fusion

This research aims to produce a working prototype of a system which can take raw vehicle sensor data and use it to produce a fully labelled description of the scene ahead. If this information can be reliably created then it would make possible several major advances in driver assistance and safety systems. Figure 1.5 shows an example of the results produced by this prototype. Please refer to Figure 1.6 for clarification on the steps involved and how these elements fit together.

Because the primary goal is to apply this to personal vehicles, the possible solutions must be restricted to those which can be fitted economically and do not need significant configuration from the user. Based on prices today this would exclude the LUX LIDAR from being used in this research; however, representatives from ibeo, the manufacturers of the LUX, seem confident that it will not be long until solid state LIDAR devices can be produced. The solid state models would be able to deliver similar performance to current devices but at only a fraction of the cost. For that reason this research assumes that affordable LIDAR devices will soon be common, so the benefit of using them should not be excluded from current research.

Figure 1.5: Example of classification results which will be presented in Chapter 6. The image on the left has been automatically segmented using the fusion segmentation method presented in this thesis (white lines show segmentation boundaries). Each segment has then been analysed to determine texture, colour and 3D attributes. These are then fed into a trained SVM to produce the segment classification seen on the right. The types of surface which can be predicted are shown along the bottom.

In addition, because it would be ideal for the system to function without requiring significant amounts of human training, an unsupervised approach is investigated to see whether possibilities exist in this direction. With these aims in mind, the following contributions have been produced during the course of this research:

• A novel method of calibrating a LIDAR and stereo camera for the purpose of sensor fusion. The published method allows an unrectified camera system to be aligned with a LIDAR projection without the need for any particular test pattern or precise [...] reverse distortion model for the camera and the necessary transformation matrix to project LIDAR measurements directly onto the rectified or unrectified image stream (see Chapter 3).

• A novel hybrid image segmentation method which combines a Canny edge detector (modified to create closed regions), the Edge Detection and Image Segmentation (EDISON) [32] mean-shift segmentation algorithm and a depth map segmentation. This method combines the main strengths of the mean-shift algorithm (the clustering of areas of similar texture) and of the Canny edge detector (the preservation of weaker and softer edges between regions). These segmentations are fused with the depth map segmentation, which separates physical surfaces and creates regions which do not span multiple types of content (see Section 5.2 on page 105). Figure 1.5 shows an example of an image segmented using this method.

• A demonstration of the use of combined visual and depth-derived features to describe image segments. The novel use of polynomial surface fitting on the point cloud extracted for each segment allows flat man-made surfaces to be differentiated from more natural curved or sweeping objects. Gray-Level Co-occurrence Matrix (GLCM) texture descriptors greatly reduce the dimensionality of the training data compared to similar techniques, giving a solution which is capable of classifying a scene in real time (see Section 5.4 and Section 5.5).

• A comprehensive comparison study exploring the difference in performance between the two most commonly applied supervised classification methods, in the context of road imagery classification. No existing study comparing Support Vector Machines (SVMs) and neural networks on a dataset composed of texture and spatial training features could be found. Manually labelled stereo images were used to train a set of SVM classifiers and a neural network using best available practices. The two systems were trained on a mixture of data from two different driving environments, urban and rural, and the results were then compared (see Chapter 6). Figure 1.5 shows an example of classification results obtained from the SVM.

• A novel classification method capable of producing acceptable results using unsupervised learning, which is an area under-explored in current literature. Segment descriptors are clustered into a large number of sub-classes; the only human interaction needed is then to identify which sub-classes should form the members of a true class (super-class). Hand-tagged images are used to provide performance data on the technique (see Chapter 7).

• In addition to the primary contributions above, this research has made available a collection of matched stereo and LIDAR data sets along with several batches of hand-tagged images for use in future projects and performance comparisons. To help newcomers, the data collection chapter also documents the common practical difficulties encountered when setting up a data collection platform and provides solutions for future researchers looking to continue this work (see Chapter 4). Due to the large size of this data it is currently available on request only.
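To illustrate what the calibration contribution makes possible, the following is a minimal sketch of projecting LIDAR measurements onto an image with a 3x4 transformation matrix. It is not the calibration method of Chapter 3: the matrix values, the function name `project_points` and the sample points are hypothetical, and lens distortion is ignored, so this corresponds only to the rectified case.

```python
def project_points(points, P):
    """Project 3D points (x, y, z) to pixel coordinates (u, v) using a 3x4 matrix P.

    Points that fall behind the image plane (w <= 0) are returned as None.
    """
    pixels = []
    for x, y, z in points:
        h = (x, y, z, 1.0)  # homogeneous coordinates
        u, v, w = (sum(row[i] * h[i] for i in range(4)) for row in P)
        if w <= 0:
            pixels.append(None)
            continue
        pixels.append((u / w, v / w))
    return pixels


# Hypothetical pinhole camera: focal length 500 px, principal point (320, 240).
# The LIDAR and camera frames are ASSUMED to coincide (identity rotation,
# zero translation); a real calibration would supply these terms.
P = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

lidar_points = [(0.0, 0.0, 10.0), (1.0, -0.5, 10.0), (0.0, 0.0, -5.0)]
pixels = project_points(lidar_points, P)
print(pixels)
```

Once each LIDAR return has a pixel position, its depth can be attached to the image segment that contains that pixel, which is what the later segmentation and feature steps rely on.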
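The dimensionality reduction offered by GLCM texture descriptors can be seen in a small example. The sketch below builds a normalised co-occurrence matrix for horizontally adjacent pixels and reduces a 16-pixel patch to three numbers. The choice of offset, the three statistics and the sample patch are assumptions made for illustration; the descriptors actually used are described in Section 5.4 and Section 5.5, and definitions of homogeneity vary between libraries.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for pixel pairs separated by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def texture_descriptors(p):
    """Summarise a normalised GLCM as a short feature vector."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    return {
        # Large when neighbouring grey levels differ strongly.
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in idx),
        # Angular second moment: large for uniform or regular texture.
        "asm": sum(p[i][j] ** 2 for i, j in idx),
        # Homogeneity, using the 1 / (1 + |i - j|) weighting (one common variant).
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in idx),
    }


# Hypothetical 4x4 patch quantised to four grey levels.
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

p = glcm(patch, levels=4)
features = texture_descriptors(p)
print(features)
```

The feature vector has a fixed length regardless of segment size, which is what keeps the training data small compared with feeding raw pixel values to a classifier.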
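The sub-class and super-class idea behind the unsupervised contribution can be sketched as follows. This is only an illustration under stated assumptions: the cluster centroids are taken as already computed (the clustering algorithm itself is covered in Chapter 7 and is not reproduced here), and the two-dimensional descriptors, class names and helper functions are hypothetical.

```python
import math


def nearest_subclass(descriptor, centroids):
    """Index of the sub-class centroid closest to a segment descriptor."""
    return min(
        range(len(centroids)),
        key=lambda k: math.dist(descriptor, centroids[k]),
    )


def label_segments(descriptors, centroids, subclass_to_superclass):
    """Assign each segment the super-class of its nearest sub-class."""
    return [
        subclass_to_superclass[nearest_subclass(d, centroids)]
        for d in descriptors
    ]


# ASSUMED output of an earlier clustering step: four sub-class centroids.
centroids = [(0.1, 0.1), (0.2, 0.15), (0.8, 0.9), (0.9, 0.8)]

# The only human input: which sub-classes belong to which true class.
subclass_to_superclass = {0: "road", 1: "road", 2: "vegetation", 3: "vegetation"}

segments = [(0.12, 0.08), (0.85, 0.88), (0.22, 0.18)]
labels = label_segments(segments, centroids, subclass_to_superclass)
print(labels)
```

The human effort here scales with the number of sub-classes rather than the number of training images, which is the appeal of the approach compared with hand-labelling every segment.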

In document Semantic labelling of road scenes using supervised and unsupervised machine learning with lidar stereo sensor fusion (Page 34-37)