[Figure 2.11: graphical model of LDA, with plates labelled N, D and K and nodes α, θd, zd,n, wd,n, βk and δ.]
wd,n: the nth word from document d.
zd,n: the topic which generated the corresponding word.
θd: topic proportions for document d.
β1:K: the K topics (distributions over words).
α, δ: hyperparameters of the model.
Figure 2.11 – Graphical model of LDA, showing the conditional dependencies of each variable. Plates denote replicated variables, empty circles are unobserved variables, and shaded are observed.
Section 5.1. Each cluster forms a visual word, and the full set of clusters forms a vocabulary. When a feature vector is associated with a cluster, it is summarised by that cluster's word (an index into the vocabulary). Thus, the set of feature vectors describing an object is reduced to a set of words. In this thesis, a feature vector represents the shape of part of an object, so words correspond to repeatably detectable parts.
Techniques such as topic models are then able to infer the statistical relationships between parts (words).
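The quantisation step can be sketched in a few lines of Python with NumPy. This is a minimal sketch under assumptions: the cluster centres are taken as given (the clustering itself is described in Section 5.1 and is not reproduced here), distances are Euclidean, and the function names and toy values are hypothetical. Each feature is replaced by the index of its nearest centre, and an object then becomes a histogram of word counts, which is the form a topic model works with.

```python
import numpy as np

def quantise(features, centroids):
    """Map each feature vector to the index of its nearest cluster centre (its word)."""
    # Pairwise squared Euclidean distances, shape (n_features, n_words).
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bag_of_words(words, vocab_size):
    """Count how often each vocabulary word occurs in one object."""
    return np.bincount(words, minlength=vocab_size)

# Hypothetical vocabulary of 3 shapes in a 2D feature space.
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Four hypothetical local features from one object.
features = np.array([[0.1, -0.1], [0.9, 0.2], [0.05, 0.95], [1.1, -0.1]])

words = quantise(features, centroids)          # [0 1 2 1]
counts = bag_of_words(words, len(centroids))   # [1 2 1]
print(words, counts)
```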
The simplest form of topic model is Latent Dirichlet Allocation (LDA) [10], a generative probabilistic model, shown in graphical form in Figure 2.11. In the context of document modelling, the input data consists of a set of documents, each containing a set of words from a fixed vocabulary. The generative model has a predefined number of topics, K, with the following structure:
(1) Each topic βk is a distribution over the vocabulary of words.
(2) Each word w in a document d is generated from a topic (zd,n is the assignment of word n in document d to a topic).
(3) Each document has a distribution over topics θd.
This is known as a mixed membership model, as each document has a mixture of topics.
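A toy sampler may help make this generative structure concrete. The sketch below follows the three points above under stated assumptions: α is taken as the symmetric Dirichlet prior on each θd and δ as the symmetric Dirichlet prior on each βk (their usual roles in LDA, assumed here from Figure 2.11), and the function name and parameter values are hypothetical.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, K, V, alpha=0.1, delta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # (1) Each topic beta_k is a distribution over the V vocabulary words.
    beta = rng.dirichlet(np.full(V, delta), size=K)        # shape (K, V)
    docs, thetas = [], []
    for _ in range(n_docs):
        # (3) Each document has its own topic proportions theta_d.
        theta = rng.dirichlet(np.full(K, alpha))           # shape (K,)
        # (2) Each word first picks a topic z_{d,n}, then a word from beta_{z_{d,n}}.
        z = rng.choice(K, size=doc_len, p=theta)
        w = np.array([rng.choice(V, p=beta[k]) for k in z])
        docs.append(w)
        thetas.append(theta)
    return docs, np.array(thetas), beta

docs, thetas, beta = generate_corpus(5, 20, K=3, V=10)
```

With α and δ below 1, each sampled θd and βk tends to place most of its mass on a few entries, which is how LDA encourages documents to use few topics and topics to use few words.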
The distributions are encouraged to be sparse through the Dirichlet priors governed by α and δ, so that each document is effectively assigned a few topics, and each topic a few words. Topics loosely capture sets of co-occurring words. In document analysis, this groups semantically related words; for example, a topic pertaining to genetics would contain the words ‘sequence’, ‘molecular’ and ‘genome’ [9], since these words are likely to co-occur in a document about genetics.
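Fitting LDA means inverting the generative process: given only the words, infer θd, zd,n and β1:K. One common inference technique is collapsed Gibbs sampling, which repeatedly resamples each zd,n from its conditional distribution given all other assignments; variational inference is another widely used option. The sketch below uses collapsed Gibbs sampling purely for illustration, with the same assumed roles for α and δ. It is not necessarily the inference method used in this thesis, and the function name, defaults and toy corpus are hypothetical.

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, delta=0.01, iters=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of int word-index arrays."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))   # words in document d assigned to topic k
    n_kw = np.zeros((K, V))           # times word w is assigned to topic k
    n_k = np.zeros(K)                 # total words assigned to topic k
    z = []
    for d, doc in enumerate(docs):    # random initial topic assignments
        z_d = rng.integers(K, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # Remove the current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # p(z = k | rest) ∝ (n_dk + alpha) * (n_kw + delta) / (n_k + V * delta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + delta) / (n_k + V * delta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    # Point estimates of the topic proportions theta_d and the topics beta_k.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    beta = (n_kw + delta) / (n_k[:, None] + V * delta)
    return theta, beta

# Toy corpus: two groups of documents using disjoint halves of a 10-word vocabulary.
rng = np.random.default_rng(1)
docs = [rng.integers(0, 5, size=30) for _ in range(4)] + \
       [rng.integers(5, 10, size=30) for _ in range(4)]
theta, beta = lda_gibbs(docs, K=2, V=10, iters=50)
```

Each row of `theta` is an object's topic vector, and each row of `beta` is a topic; in the toy corpus the words that co-occur within a group are expected to be gathered into the same topic.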
Translating to the problem of 3D object classification, each object is a document, and each local feature (associated with a cluster) is a word, or predefined shape. Topics then capture sets of loosely co-occurring shapes. Rather than requiring every object in a class to have one distribution of shapes (i.e. topic), topic modelling provides an extra layer; the number of topics can exceed the number of classes. As such, the abstraction goes from isolated shapes, to sets of co-occurring shapes, to classes. For instance, vehicles may have ‘wheels, bottom edges and corners along the ground’ as one topic, with another set of shapes further distinguishing trucks, sedans, etc. Indeed, hierarchical forms of topic models exist [74]. Many variants of topic models exist, as the graphical model is easy to modify.
To classify objects from topics, one approach is the supervised topic model [11], in which training labels are incorporated into the graphical model, allowing the class labels of test objects to be inferred. Alternatively, topics can be treated as a form of dimensionality reduction, with classes inferred by applying standard classifiers, such as k-NN or support vector machines (SVMs), to each object's topic vector.
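Treating each object's topic proportions θd as a reduced feature vector, the second approach can be sketched as a plain k-NN classifier. This is a minimal sketch assuming Euclidean distance between topic vectors and majority voting, with ties broken towards the alphabetically first label; the function name, labels and values are hypothetical, and an SVM could take the classifier's place.

```python
import numpy as np

def knn_classify(train_theta, train_labels, test_theta, k=3):
    """Label each test object by majority vote among its k nearest training topic vectors."""
    # Euclidean distance between every test and every training topic vector.
    d = np.linalg.norm(test_theta[:, None, :] - train_theta[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    preds = []
    for row in nearest:
        labels, votes = np.unique(train_labels[row], return_counts=True)
        preds.append(labels[votes.argmax()])   # ties go to the first sorted label
    return np.array(preds)

# Hypothetical two-topic vectors for six labelled training objects.
train_theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
                        [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]])
train_labels = np.array(["car", "car", "car", "tree", "tree", "tree"])
test_theta = np.array([[0.7, 0.3], [0.3, 0.7]])

preds = knn_classify(train_theta, train_labels, test_theta)
print(preds.tolist())   # ['car', 'tree']
```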
In summary, local features can be classified independently and pooled to classify an object. Alternatively, features can be clustered to find the equivalent of words in a common vocabulary, and then used in frameworks such as topic models to learn co-occurring sets of shapes. This provides a more complex but possibly more powerful approach.
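The pooling step in the first approach can be done in more than one way. The sketch below assumes each part has already been classified into per-class probabilities by some hypothetical part classifier, and compares two simple pooling rules: a majority vote over each part's best class, and a sum of log-probabilities that treats the parts as independent evidence. Neither rule is claimed to be the one used in this thesis, and the function names and values are hypothetical.

```python
import numpy as np

def pool_majority(part_probs, classes):
    """Each part votes for its most probable class; the object takes the most common vote."""
    votes = np.bincount(part_probs.argmax(axis=1), minlength=len(classes))
    return classes[int(votes.argmax())]

def pool_log_likelihood(part_probs, classes):
    """Treat parts as independent evidence and sum their log-probabilities per class."""
    log_scores = np.log(np.clip(part_probs, 1e-12, 1.0)).sum(axis=0)
    return classes[int(log_scores.argmax())]

# Hypothetical per-part probabilities, shape (n_parts, n_classes).
part_probs = np.array([[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]])
classes = ["car", "pedestrian"]

print(pool_majority(part_probs, classes))        # car (two of three votes)
print(pool_log_likelihood(part_probs, classes))  # pedestrian (0.096 > 0.084)
```

The two rules can disagree, as here: two parts weakly favour ‘car’ while one strongly favours ‘pedestrian’, so the vote and the log-likelihood sum reach different answers.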
2.5 Conclusion
Dynamic range imaging from a mobile robot in the field is an informative sensing modality. However, the nature of the environment and the sensing technology results in relatively sparse 3D data, significant occlusion and variation in point density, all of which change with the relative pose of the sensor in the scene.
Objects can be segmented from their surroundings using existing techniques based on simple properties such as spatial connectivity, allowing whole objects to be analysed. This provides an efficient, modular mechanism for classification, in contrast to techniques that first compute features everywhere and then perform computationally intensive joint segmentation and classification, for example with conditional random fields (CRFs). The former approach is used in this thesis for speed and simplicity, although the latter is a viable alternative once an effective local representation is developed.
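Segmentation by spatial connectivity can be sketched as grouping points that are linked by chains of neighbours closer than a fixed radius. This is a minimal sketch under assumptions: ground points are presumed to have been removed beforehand (otherwise everything touching the ground would merge into one segment), the radius is a hypothetical tuning parameter, and a dense distance matrix is used for brevity where a practical implementation would use a spatial index such as a k-d tree.

```python
import numpy as np
from collections import deque

def segment_by_connectivity(points, radius):
    """Label points so that points linked by neighbours closer than `radius` share a segment."""
    n = len(points)
    labels = np.full(n, -1)
    # Dense pairwise distances: fine for a toy example, O(n^2) memory.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    next_label = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        # Breadth-first flood fill from an unlabelled seed point.
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero((dist[i] < radius) & (labels == -1)):
                labels[j] = next_label
                queue.append(j)
        next_label += 1
    return labels

# Two hypothetical objects, about 5 m apart, with ground already removed.
points = np.array([[0.0, 0, 0], [0.1, 0, 0], [0.2, 0, 0], [5.0, 0, 0], [5.1, 0, 0]])
labels = segment_by_connectivity(points, radius=0.5)
print(labels)   # [0 0 0 1 1]
```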
Representing an object in terms of a feature reduces it to an easily comparable numerical form. Global, object-wide features require some form of positional and rotational normalisation, or must be invariant to these changes by design. However, the extensive occlusion, density variation and class variability motivate the use of local features.
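One common way to obtain the positional and rotational normalisation that global features need is to centre an object on its centroid and rotate it onto its principal axes (PCA alignment). The sketch below shows this under assumptions; it is a swapped-in illustration rather than a method taken from this thesis, and the function name is hypothetical. Eigenvector signs are arbitrary, so a complete method also needs a sign convention. Occlusion also changes which points are visible, and can therefore shift the estimated axes.

```python
import numpy as np

def normalise_pose(points):
    """Translate points to their centroid and rotate them onto their principal axes."""
    centred = points - points.mean(axis=0)
    # Eigenvectors of the covariance matrix give the principal axes.
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    axes = eigvecs[:, ::-1]                  # put the largest-variance axis first
    return centred @ axes

# Hypothetical object: a point cloud elongated along y, offset from the origin.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3)) * [1.0, 5.0, 0.5] + [10.0, -3.0, 2.0]
aligned = normalise_pose(pts)
```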
Once an object is represented by a set of local features, it can be classified by matching and learning from these parts. In one approach, the parts can be independently classified and the results pooled together. Another method is to cluster the local features, forming repeatable shapes, or words. This allows co-occurring shapes to be found with topic modelling, providing an intermediate level to represent classes.
The remaining chapters of this thesis are organised as follows. Chapter 3 examines basic building blocks for shape analysis that are required for Chapter 4, which examines local features and provides the full formulation of the line image feature. An object dataset is also introduced and classified with k-NN. Chapter 5 then examines the processing required to classify the 3D object dataset with topic models. These stages are arranged into a pipeline of increasing abstraction in Figure 2.12.
[Figure 2.12: pipeline stages: sensed range image / 3D points, segmentation, low-level shape analysis, keypoints, line image feature, k-NN classification, labelled dataset, clustering, and classification with topic models; grouped under Chapter 3 (Building Blocks), Chapter 4 (Capturing Complex Shape) and Chapter 5 (Classification from Parts).]
Figure 2.12 – An outline of this thesis as a pipeline from sensory data to classification, linked to each relevant section.