All technologies described so far are different building blocks of a CBIR system. Except for the optional relevance feedback, each part is required. Research in the CBIR area often requires a prototype implementation and thorough testing of the outcome. Instead of building a search engine from scratch each time, researchers should consider using an existing framework.
An early solution was the Virage search engine [8, 50]. It already allowed users to extend the basic system with simple or complex primitives. MetaSEEk [9] is a meta search engine that supports the systems VisualSEEk [125], WebSEEk [23], QBIC [85] and Virage.
Another example is the GNU Image Finding Tool (GIFT) framework. It uses adapted techniques from traditional information retrieval. The idea is to merge several sources of information, such as textual and visual features, for the retrieval itself. The system is currently being replaced by a new development called “ComMon SensE: Cross-Modal Search Engine”.
Joshi et al. [63] describe PARAgrab, a recent, scalable web architecture for image retrieval that handles CBIR and keyword queries as well as interactive image tagging. It was developed by the research group of James Z. Wang at The Pennsylvania State University, who also developed SIMPLIcity [138] and contributed other CBIR-related research.
2.1.7.1. Merging/Fusion
There are several ways to merge multiple subqueries into a single one. “Multi-modal fusion” is a recent research area that has emerged from the needs of modern retrieval systems.
Figure 2.4.: Query shape: (a) weighted average [65]; (b) convex, clustering [65]; (c) concave, disjunctive [65]; (d) concave, conjunctive
Figure 2.4 shows possible query shapes within a single feature space for relevance feedback. Shapes (a) to (c) are taken from [65], and figure 2.4(d) visualizes the model used by the author of this thesis. It can be argued that all given query points lie within the desired cluster. The search engine must then decide how the final similarity is calculated. In the simplest case, all query points are averaged in the feature space to obtain a new query point. Generating convex or concave similarity shapes is more complex, but it is advantageous when the cluster does not have circular borders. The first two methods in particular require a simple distance measure; otherwise the calculations may become overly complex.
Rocchio’s formula [105] is based on the vector space model and corresponds to the query type in figure 2.4(a):
$$Q_1 = Q_0 + \beta \sum_{i=1}^{n_1} \frac{R_i}{n_1} - \gamma \sum_{i=1}^{n_2} \frac{S_i}{n_2} \tag{2.1}$$

where $Q_0$ is the original query, $R_i$ is the feature vector of relevant document $i$ and $n_1$ is the number of relevant documents. The formula also considers the $n_2$ non-relevant documents $S_i$ by subtracting them from the query. The influence of the two sums can be adjusted with the weights $\beta$ and $\gamma$ [57]. A second formula for query point movement, proposed by Porkaew et al. [98], is:
$$C[j] = \frac{\sum_{i=1}^{n} w_i E_i[j]}{\sum_{i=1}^{n} w_i} \tag{2.2}$$
where $E_1$ to $E_n$ are the $n$ query objects (feature vectors) and $w_1$ to $w_n$ are the corresponding weights. $C$ denotes the resulting centroid, which is used as the next query, and $j$ indexes a single dimension of the feature vector. The second approach, depicted in figure 2.4(b), assumes that all relevant documents lie close to each other in the feature space. The query points are clustered into at most $N$ clusters. For each cluster, the point closest to the centroid is selected and weighted according to the cluster size. These points form the next multi-point query $M = \langle n, P, W, D \rangle$, where $P$ is the set of points, $W$ the corresponding weights and $D$ the distance function between two points. In this case, which was also proposed by Porkaew et al. [98], the distance between $M$ and any point $x$ is:
$$D(M, x) = \sum_{i=1}^{n} w_i \, D(P_i, x) \tag{2.3}$$
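Equations (2.1) to (2.3) can be illustrated in a few lines of plain Python. The following is a minimal sketch under our own assumptions, not the implementation from [105] or [98]. Feature vectors are lists of floats, and D is the Euclidean distance. The clustering step of fig. 2.4(b) uses plain k-means seeded with the first N query points. Each cluster weight is the cluster size divided by the number of query points. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty sequence of equally long vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def rocchio(q0: Vector, relevant: Sequence[Vector],
            non_relevant: Sequence[Vector], beta: float, gamma: float) -> Vector:
    """Eq. (2.1): Q1 = Q0 + beta * mean(R) - gamma * mean(S)."""
    zero = [0.0] * len(q0)
    r = mean(relevant) if relevant else zero          # no relevant examples: no pull
    s = mean(non_relevant) if non_relevant else zero  # no non-relevant examples: no push
    return [q + beta * rj - gamma * sj for q, rj, sj in zip(q0, r, s)]


def weighted_centroid(points: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Eq. (2.2): C[j] = sum_i w_i * E_i[j] / sum_i w_i."""
    total = sum(weights)
    return [sum(w * p[j] for p, w in zip(points, weights)) / total
            for j in range(len(points[0]))]


def multipoint_query(points: Sequence[Vector], n_clusters: int,
                     iterations: int = 10) -> Tuple[List[Vector], List[float]]:
    """Fig. 2.4(b): cluster the query points and return (P, W).

    Assumption: plain k-means seeded with the first n_clusters points;
    iterations must be at least 1.
    """
    k = min(n_clusters, len(points))
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest center
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # move each center to the mean of its members (keep it if the cluster is empty)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    reps: List[Vector] = []
    weights: List[float] = []
    for center, members in zip(centers, clusters):
        if members:
            # the member closest to the cluster centroid represents the cluster
            reps.append(min(members, key=lambda p: math.dist(p, center)))
            weights.append(len(members) / len(points))  # weight by relative cluster size
    return reps, weights


def multipoint_distance(reps: Sequence[Vector], weights: Sequence[float],
                        x: Vector) -> float:
    """Eq. (2.3): D(M, x) = sum_i w_i * D(P_i, x), with Euclidean D."""
    return sum(w * math.dist(p, x) for p, w in zip(reps, weights))
```

For example, with β = 1 and γ = 0.5, the relevant examples (1, 0) and (3, 0) and the non-relevant example (0, 2) move the query (0, 0) to (2, −1). The multi-point variant keeps one representative per cluster rather than a single centroid, so a query can cover several separate regions of the feature space.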
The third approach (fig. 2.4(c)) is a multi-point query in which each query point is treated as a separate query and the final result is merged from the sub-results. This merging can use weighted sums, fuzzy sets or other methods. One example is the approach by Fagin [37], who interprets results as graded sets. These are essentially lists sorted by similarity that also have the properties of sets. He proposes applying the basic fuzzy rules defined by Zadeh [147]:
• Conjunction (AND): $\mu_{A \wedge B}(x) = \min\{\mu_A(x), \mu_B(x)\}$
• Disjunction (OR): $\mu_{A \vee B}(x) = \max\{\mu_A(x), \mu_B(x)\}$
• Negation (NOT): $\mu_{\neg A}(x) = 1 - \mu_A(x)$
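Zadeh's rules can be applied to Fagin's graded sets object by object. In the sketch below, each sub-result is a mapping from an object identifier to a grade in [0, 1]. It is only an illustration. In Fagin's model every object has a grade in every set, so treating an object that is missing from one sub-result as having grade 0 is our own assumption. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

GradedSet = Dict[str, float]  # object id -> grade (similarity) in [0, 1]


def fuzzy_and(a: GradedSet, b: GradedSet) -> GradedSet:
    """Conjunction: min of the two grades (missing objects count as 0)."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}


def fuzzy_or(a: GradedSet, b: GradedSet) -> GradedSet:
    """Disjunction: max of the two grades (missing objects count as 0)."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}


def fuzzy_not(a: GradedSet) -> GradedSet:
    """Negation: 1 minus the grade."""
    return {x: 1.0 - g for x, g in a.items()}


def ranked(s: GradedSet) -> List[Tuple[str, float]]:
    """Sort by descending grade; ties are broken by id for a stable order."""
    return sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))


# Two sub-results, one per query point (hypothetical grades).
q1 = {"img1": 0.9, "img2": 0.4, "img3": 0.1}
q2 = {"img1": 0.2, "img2": 0.8, "img3": 0.3}
print(ranked(fuzzy_or(q1, q2)))   # close to ANY query point
print(ranked(fuzzy_and(q1, q2)))  # close to ALL query points
```

With the disjunction, an image ranks high as soon as it is similar to one of the query points. The conjunction only rewards images that are similar to all query points.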
Using the disjunction directly produces the concave shape depicted in fig. 2.4(c). In general, it cannot be determined which merging approach is most suitable for a given use case, because this depends heavily on the nature of the implemented features. Systems designed for experts should offer a choice, whereas for occasional users the decision has to be made by the administrator or by the system itself.
This problem applies both to a single feature with multiple query points and to multiple distinct features for a single query image. More complex combinations of these cases are also conceivable. The query language proposed in [95] makes it possible to create and use such queries; however, it does not define how the sub-results are merged. A fourth merging alternative is the weighted sum of sub-results (fig. 2.4(d)), in which the similarities for all query points are added up and normalized [93]. Another, non-linear fusion approach called “super-kernel fusion” has been presented by Wu et al. [144].
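The weighted sum of sub-results can be sketched in a few lines. The exact normalization used in [93] is not given here. This sketch assumes that every sub-result already holds similarities in [0, 1] and divides the weighted sum by the sum of the weights, so the fused score also stays in [0, 1]. As before, an object absent from a sub-result contributes 0, which is an assumption. The function name is hypothetical.

```python
from typing import Dict, List, Sequence

Result = Dict[str, float]  # object id -> similarity in [0, 1]


def weighted_sum_fusion(sub_results: List[Result], weights: Sequence[float]) -> Result:
    """Add up the weighted similarities of all sub-results, then normalize
    by the total weight (assumed normalization)."""
    total = sum(weights)
    fused: Result = {}
    for result, w in zip(sub_results, weights):
        for obj, sim in result.items():
            fused[obj] = fused.get(obj, 0.0) + w * sim  # missing objects add nothing
    return {obj: s / total for obj, s in fused.items()}


print(weighted_sum_fusion([{"a": 1.0, "b": 0.5}, {"a": 0.0, "b": 1.0}], [1.0, 1.0]))
```

Unlike the fuzzy max and min, every query point contributes to each score here. An image that is fairly similar to all query points ("b") can therefore outrank one that matches only a single query point perfectly ("a").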
2.1.7.2. Communication Protocols
It is remarkable that, even after several years, the existing prototypes have not become established in everyday use. One likely reason is the diversity of approaches and the lack of interoperability. Recently emerged communication protocols could be the key to linking many smaller systems together.
A communication protocol for this purpose, called Multimedia Retrieval Markup Language (MRML), is proposed by Müller et al. [81]. It is based on XML, has a formal specification and is already in use. The most recent proposal is version 2.0.
A de facto standard for textual retrieval is currently emerging and gaining popularity: the Open Search Interface [1], developed by A9.com. It combines several XML formats to enable federated search across the web. The standard is based on the assumption that specialized engines are best suited to particular domains. A unified communication interface between clients and search engines is therefore considered the best approach.
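The A9.com specification is published as OpenSearch 1.1. In this scheme, an engine describes itself in a small XML description document. Its `Url` element carries a URL template with parameters such as `{searchTerms}`, and the client fills in that template to send a query. The sketch below covers only this client-side step and uses the Python standard library. The description document and the `images.example.org` engine are hypothetical. The `{name?}` handling follows our reading of the specification: unused optional parameters become empty strings.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical description document for an imaginary image search engine.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example Images</ShortName>
  <Description>Hypothetical image search engine</Description>
  <Url type="application/rss+xml"
       template="http://images.example.org/search?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>"""


def url_template(description_xml: str, mime_type: str) -> str:
    """Return the template of the first Url element with the given result type."""
    root = ET.fromstring(description_xml)
    for url in root.findall(f"{{{OS_NS}}}Url"):
        if url.get("type") == mime_type:
            return url.get("template")
    raise LookupError(f"no Url element of type {mime_type!r}")


def fill_template(template: str, params: dict) -> str:
    """Substitute {name} and {name?} parameters with URL-encoded values.
    Unset optional parameters become empty strings; unset required ones raise."""
    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2) == "?"
        if name in params:
            return quote(str(params[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"required template parameter {name!r} missing")
    return re.sub(r"\{([^}?]+)(\??)\}", substitute, template)


template = url_template(DESCRIPTION, "application/rss+xml")
print(fill_template(template, {"searchTerms": "red car"}))
```

A federating client can repeat these two steps for every engine it knows, send each filled URL and merge the returned result lists. The merging step is the same fusion problem discussed in the previous section.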