Related Work - Probabilistic temporal multimedia datamining

In this section, we describe the past works related to the following three main contribu- tions made in this work: (1) NDVC detection, (2) Novelty re-ranking of web video search results and (3) Conceptual clustering.

5.2.1 NDVC detection

NDVC problem can be dealt with at different levels of feature representation with their associated pros and cons:

• Global level content based feature approaches [30] [153]: Global compact and reliable fea-

tures generally referred to as signatures or fingerprints which summarize the global statistic of low-level features e.g color, motion and ordinal signature and prototype-based signature. The matching between signatures is usually through bin-to-bin distance measures. They are faster compared to the other two approaches mentioned below, but are limited to identifying almost identical videos, and can detect minor editing in the spatial and temporal domain.

• Shot level content based feature approaches [145]: Commonly, a video is first partitioned into

a set of shots based on editing cuts and transitions between frames, and then a representative keyframe is extracted to represent each shot. The matching between low-level features of keyframes can be done with a variety of matching algorithms such as dynamic time warping [4], as well as maximal and optimal bipartite graph matching etc. It can detect some copies

with simple to moderate levels of editing assuming content in keyframes is not changed dramatically. Most approaches indeed focus on keyframe level duplicate detection as they are computationally faster than a region level feature based approach and more accurate than the global level feature based approaches.

• Region level content based feature approaches [8] [77]: Each of the frames or keyframes is

segmented into regions, and a set of color, texture and shape features are computed for each region. Regions can be as simple as square blocks of certain size or it can be salient local regions detected over image scales, which locate local regions that are tolerant to geometric and photometric variations [84]. It can detect near duplicates in a more sophisticated way. The real challenge is involved in the algorithm to match and when it scales to too many local points for efficient comparison between two frames.

• Shot level concept based feature approaches [121]: a video is first partitioned into a set of

shots and then a representative keyframe is extracted to represent each shot. Each shot is de- noted by binary concept signature, absence equalling 0 and presence equalling 1, of semantic concepts. The matching between binary concept signature of keyframes can be done with sliding window algorithms. Video copy detection aims at determining whether a given query video sequence appears in a target video sequence, and if so, at what location. It can detect some of the copies with simple to moderate levels of editing assuming content in keyframes is not changed dramatically.

A majority of existing works in NDVC detection have considered only low-level features. There are important research work on understanding near-duplicate videos from a user centric approach in [19] [29]. Where they found the evidence that users perceive as near-duplicates videos that are not alike but which are visually similar and semantically related [29]. But, these works are mainly focus on understanding the user perception for near duplicate video and thus they do not provide actual solution to the problem of semantic NDVC detection. In [121], authors used binary concept signatures, but errors in binary classifications are frequent and the knowledge that a shot contains certain concept with a certain confidence cannot be exploited for semantic comparison ac- curately [10]. Also, their aim was limited to identifying content level near identical copies of the query video’s subsequences in other videos. All of the existing NDVC algorithms fall under the content based NDVC definition whereas we propose a more informative solution from a user’s per- spective that produces the novelty clusters of semantic NDVCs generated based on their semantic concepts’ confidence value over time.

5.2.2 Novelty re-ranking

Earlier methods of video search re-ranking were proposed to improve the initial list of results returned through a text-based search, based on the assumption that visually similar videos should have close ranking scores [64][83][137][12]. While they have shown significant precision improvements over text-based video searches, they have the problem of many near-duplicate results before enough novel videos are found [66]. Thus, recently a novelty re-ranking mechanism was proposed with the assumption that near duplicate video clips should be removed to increase the novelty in the top results [145][146]. We have shown the different existing methods proposed for near duplicate detection in section 5.2.1. Among which sophisticated local features like local interest points (or keypoints) extracted from keyframes are expected to be effective [145]. How- ever, a keyframe typically has hundreds or more interest points, each of which is represented by high dimensional feature vectors. Comparing a large number of local interest points between two keyframes is computationally very expensive for a long list of videos. Thus, to guarantee effective near-duplicate detection while meeting the speed requirements for large-scale video collections, a hierarchical method is proposed for novelty re-ranking in [145], which utilizes both global signatures and local keypoints for detecting near-duplicate web videos. And then a novel solution integrating content and contextual information is proposed to further improve efficiency [146].

But, none of these novelty re-ranking methods considers semantic level novelty re-ranking as we propose here in this chapter. Also, none of the algorithms consider the dynamic nature of video-sharing web-sites. In propose algorithm the newly added or deleted video either decides to create a new cluster, merge to an existing cluster or it may split an existing cluster. Other approaches compare the query video clips to a particular seed video and categorize it as near duplicate of that seed video [145] or generate query videos after applying predefined transformations to a part of the original video [121]. None of the above mentioned NDVC algorithms discover different conceptual categories, but simply classify as near duplicate or no near duplicate of the seed video.

5.2.3 Conceptual clustering

Clustering analysis is one popular datamining technique used to find patterns and groups in video data. Most existing methods for clustering focus on finding the optimum overall parti- tioning. However, these approaches cannot provide any descriptions of the clusters. Conceptual clustering not only partitions the data, but generates resulting clusters that can be summarized by a conceptual description. For example, COBWEB is an incremental clustering algorithm based on

probabilistic categorization trees. However, pure COBWEB [43] only supports nominal attributes. It cannot be used for abstracting numeric data. To incorporate numeric values, extensions of the COBWEB algorithm are proposed, such as COBWEB/3 [90], ECOBWEB [114], AUTOCLASS [27], Generality-based conceptual clustering (GCC)[136], and Error-based conceptual clustering (ECC)[32]. However, they are not suitable for video data due to spatial and temporal character- istics, and high-dimensional attributes. In the proposed method, we transform multivariate time series of concept confidence values into suitable categorical data to do COBWEB clustering. To our knowledge this is the first time conceptual clustering is done for video data considering semantic features.

In document Probabilistic temporal multimedia datamining (Page 105-108)