SBIR is an emerging field that has attracted extensive interest in the CBIR community. Although many SBIR approaches are adopted from CBIR, modifications and optimisations are needed at certain stages (especially the feature extraction stage) to achieve good performance. The biggest challenge unique to SBIR is probably the cross-domain matching problem. The majority of shallow-feature approaches address this challenge by discarding colour and texture from images, unifying sketches and image edgemaps into a single domain. As deep learning has swept through computer vision, direct sketch-image matching in SBIR has become possible with deep CNNs. Other SBIR problems such as sketch ambiguity and imperfection could also be addressed via deep learning. Maintaining interactive retrieval speed is another crucial task in SBIR, which is often achieved via compact feature extraction or indexing.
Feature representation is the core of an index-based SBIR system. A number of approaches (many of them based on deep learning) have been proposed for reliable and high-performance feature extraction. Yet there is little empirical research on how the architecture, training data, training methodology and embedding dimension affect the performance of an SBIR network. The hypotheses discussed in sec. 1.4.1 have not been thoroughly verified. These shortfalls are explored in Chapter 4 of this thesis.
Multi-modality SBIR is an interesting topic but has not gained much attention in the literature. Colour SBIR has been addressed in several works, but the approaches are either not very intuitive (in the way colour and shape queries are composed) or not publicly disclosed. Textual SBIR has received few publications, most of which use the text query in a traditional text-based search to short-list candidate images. Meanwhile, to the best of our knowledge, texture SBIR remains an unexplored area. In this thesis, we contribute a study of colour SBIR in Chapter 3 and a solution for texture/contextual SBIR in Chapter 5.
Unsupervised and Semi-supervised learning are desirable for CBIR, considering that the vast majority of images on the Internet are unlabelled. They are even more attractive for SBIR since sketch datasets are much sparser than image datasets and the cost of generating a labelled sketch set is much higher. However, most SBIR approaches that involve machine learning are supervised. Fine-grained SBIR requires even stricter supervision, in which instance-level sketch-image pairs must be available at the training phase. A study towards semi-supervised SBIR is conducted in Chapter 6 of this thesis.
[Figure: (a) QBIC [48]; (b) Sketch2Photo [24]; (c) MindFinder [181]; (d) Gboard [109]; (e) Detexify [90]; (f) SketchMatch [115]; (g) Emoji Recogniser [171].]
Scalable Sketch-based Image Retrieval using Colour Gradient Features
In this Chapter we address two complementary challenges of SBIR: (i) enhancing the capability of sketch queries to communicate search intent and (ii) efficient indexing for interactive retrieval speed. We tackle the former problem by integrating an additional modality (specifically, colour) into sketches while retaining the convenience of sketched query creation. This enables both the colour and the shape of an object of interest to be searched using a single line-art sketched query. We compare and contrast several pre- and post-fusion approaches for combining the two modalities into the search index. To deal with the latter problem, we propose a novel inverse index for colour and shape representations that delivers scalable search with interactive query times, in the order of a few seconds over tens of millions (order 10^7) of images.
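To make the distinction between pre- and post-fusion concrete, the following is a minimal sketch under one common reading of the terms: pre-fusion combines the shape and colour descriptors before any comparison, whereas post-fusion compares each modality separately and blends the resulting scores. The descriptors, the cosine similarity and the weight `alpha` are illustrative assumptions and are not the fusion schemes evaluated in this Chapter.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    a, b = l2_normalise(a), l2_normalise(b)
    return sum(x * y for x, y in zip(a, b))

def pre_fusion_score(q_shape, q_colour, d_shape, d_colour):
    """Pre-fusion: concatenate shape and colour first, then compare once."""
    return cosine(list(q_shape) + list(q_colour), list(d_shape) + list(d_colour))

def post_fusion_score(q_shape, q_colour, d_shape, d_colour, alpha=0.5):
    """Post-fusion: compare each modality separately, then blend the scores.

    alpha is a hypothetical weight on the shape score (1.0 = shape only).
    """
    shape_score = cosine(q_shape, d_shape)
    colour_score = cosine(q_colour, d_colour)
    return alpha * shape_score + (1.0 - alpha) * colour_score

# Toy descriptors (assumed): a 2-D "shape" vector and an RGB-like colour vector.
query = ([1.0, 0.0], [1.0, 0.0, 0.0])        # red circle
red_disc = ([1.0, 0.0], [0.9, 0.1, 0.0])     # round and red
grey_disc = ([1.0, 0.0], [0.3, 0.3, 0.3])    # round and grey

for name, doc in (("red_disc", red_disc), ("grey_disc", grey_disc)):
    print(name,
          round(pre_fusion_score(*query, *doc), 3),
          round(post_fusion_score(*query, *doc), 3))
```

With `alpha = 1.0` the post-fusion score reduces to shape-only matching, in which the two toy images tie; any weight on colour separates them. A practical difference between the two styles is that post-fusion lets the weight be changed at query time, while pre-fusion fixes the balance when the index is built.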
3.1 Introduction
Sketches are inherently multi-modal; they depict many aspects of appearance such as shape and colour. While shape (structure) is the dominant source of information captured within a sketch, colour can play an important role in distinguishing objects of similar shape. For example, suppose a user wishes to search for images containing the sun and draws a circle as the query. A shape-only SBIR system may return any round object, e.g. moon, ball, wheel, etc. It would be helpful if the user could draw a red circle to search for the sun, as a colour-supported SBIR system would return red, round objects, which are more likely to be the sun. Ultimately, incorporating colour as a new modality could help to reduce the semantic gap between the powerful and complex representational capability of the human mind’s eye and a simple 2-D sketch (and the computer algorithm that encodes it).

Figure 3.1: Sketch examples: (a) blob-based colour sketches; (b) line-art black-and-white sketches; (c) our colour-integrated line-art sketches; and (d) our simple interface for drawing the sketches in (c).
However, most existing approaches focus primarily on shape alone in SBIR, while very few [48, 160, 36] consider colour as a useful feature. These approaches all require user-submitted queries to be blob-based sketches so that colour can be integrated, turning the sketches into colour images with flat texture (Fig. 3.1 (a)). This complicates the production of sketch queries and therefore defeats the original purpose of creating visual queries with ease and little effort. Instead, we propose using coloured strokes in line-art sketches to represent both shape and colour (Fig. 3.1 (c)). With the popularity of smartphones and tablets, this provides a simple yet powerful way of creating sketch queries. Although drawing with this technique is not always possible (for example, when an inner line separates two colour regions but only one colour can be used to draw that line), it is especially useful for users who just want to outline the coarse colour of the objects of interest rather than depict fine-grained detail.
This Chapter presents a novel SBIR framework that uniquely accepts line-art colour drawings as queries. To deliver this framework, two technical contributions are proposed:
• We extend the classical GF-HoG descriptor to enable colour-shape retrieval. We report a comprehensive investigation exploring integration points for colour as a new modality within the GF-HoG descriptor and index representation. We make several recommendations on how this second, novel modality of colour can be integrated for maximum accuracy and search efficiency (speed).
• We show how the slow linear and kd-tree based search strategies currently proposed for GF-HoG can be replaced by an efficient inverse index structure, enabling GF-HoG to scale to over three million images (several orders of magnitude greater than the largest image dataset previously demonstrated for this framework) whilst retaining retrieval speeds of less than one second.
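As a rough illustration of why an inverse index scales better than a linear scan, the sketch below indexes images by quantised visual words and, at query time, visits only the posting lists of the words present in the query. It is a generic bag-of-visual-words index and not the structure proposed in sec. 3.3: the `InvertedIndex` class, the `("shape", id)` and `("colour", id)` word labels and the cosine scoring are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class InvertedIndex:
    """Toy inverted index over bag-of-visual-words histograms (illustrative)."""

    def __init__(self):
        self.postings = defaultdict(list)  # word -> [(image_id, term frequency)]
        self.norms = {}                    # image_id -> L2 norm of its histogram

    def add(self, image_id, words):
        """Index one image given its list of (hypothetical) visual words."""
        counts = Counter(words)
        for word, tf in counts.items():
            self.postings[word].append((image_id, tf))
        self.norms[image_id] = math.sqrt(sum(tf * tf for tf in counts.values()))

    def query(self, words, top_k=10):
        """Rank images by cosine similarity, touching only matching postings."""
        q_counts = Counter(words)
        q_norm = math.sqrt(sum(tf * tf for tf in q_counts.values()))
        dots = defaultdict(float)
        for word, q_tf in q_counts.items():
            for image_id, tf in self.postings.get(word, ()):
                dots[image_id] += q_tf * tf
        scored = [(image_id, dot / (q_norm * self.norms[image_id]))
                  for image_id, dot in dots.items()]
        # Highest score first; ties broken by image id for determinism.
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:top_k]

index = InvertedIndex()
index.add("img_a", [("shape", 1), ("shape", 1), ("colour", 0)])
index.add("img_b", [("shape", 1), ("shape", 1), ("colour", 5)])
index.add("img_c", [("shape", 7), ("colour", 2)])

# Only img_a and img_b share words with the query; img_c is never visited.
print(index.query([("shape", 1), ("colour", 0)]))
```

The query cost depends on the lengths of the posting lists that are actually visited rather than on the size of the collection, which is the property that makes this family of structures attractive for interactive retrieval.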
The layout of the rest of this Chapter is as follows: sec. 3.2 describes how shape and colour features are extracted from sketches and photo images. Sec. 3.3 presents the method of inverse indexing and several fusion techniques for colour and shape features. Finally, experimental results are discussed in sec. 3.4.