Recently, a new video search task named “Known-Item Search” (KIS) has emerged in TRECVID 2010 [TRE]. It aims to find a desired video that a user has seen or known before. In this task, the user inputs a text description of the search target, and the system returns a ranked list of results with the expectation that the correct match is ranked as high as possible. Although research on the KIS task is just beginning, researchers have found that text-based video search is the only effective means to tackle this problem [CWZea10; CYNea10]. MRVS is similar to KIS but with one big difference: MRVS deals mostly with users’ personal media repositories, where metadata and text descriptions are sparse and visual matching of the desired content is often the primary mode of search. Hence the text-based techniques developed for KIS and earlier multimedia question answering approaches [NWZ+11] will not be effective. In particular, there are four challenges in applying text-based video search approaches to MRVS tasks. First, the text words associated with the desired video are often incomplete and vague. Second, a user may remember only fragments of visual content rather than the stories or actual conversations in the desired video, and hence cannot provide an accurate text query. Third, many visual scenes are hard to describe in text. Fourth, users sometimes only want to find a desired segment inside a long video, which the text-based approach cannot support because text annotations are absent at the video-segment level. Hence, in the MRVS task, users need to issue various modes of queries to recall their memory of a desired video.
This method relies heavily on the efficiency, reliability, and correctness of the Semantic Web database and on the communication latency between the user and the search engine. Since the modern high-performance web can execute distributed and parallel computations, the performance of existing web servers will suffice for this type of query processing. The Semantic Web database is a dynamic database holding a huge collection of knowledge. It contains relationships among various members and the various meanings of a particular word, of which only one will fit a particular context in a query. To obtain satisfactory performance, network communication rates must be sufficiently high; an Internet connection with a transfer rate below a minimum threshold will not produce a timely response. According to a report by Akamai, a global content delivery network, the average Internet download speed worldwide is 3.1 Mbps (a 4% rise from the previous quarter). The query processing model discussed in this paper requires at least a 1 Mbps Internet connection to accept a query, process it, and provide appropriate suggestions within a minimal time frame. For predictions within a session, the system realistically assumes that the user takes some time to find the required information in a web page returned by the first query of the session. During that time, the model performs NLP, obtains relevant predictions, compares them with existing patterns, and returns the eventual predictions to the user when he starts to type his next search query. The advantage of this model is that these computations occur in the background, and the probability that the time taken to perform the NLP and return the results exceeds the time between consecutive searches is very small.
Pool-based sampling has been the most successful branch of active learning, owing to its effectiveness. It has been widely used in many real-world applications (e.g., text categorization, video search, image classification and action retrieval). Pool-based active learning was first introduced by Lewis and Gale. The algorithm maintains a pool of unlabelled instances and selects the most informative one at each iteration. The main issue with active learning in this scenario is how to measure informativeness. The most commonly used strategies are uncertainty sampling [10, 27, 28, 11] and query by committee [29, 30, 13]. There are also methods aiming at expected error reduction [31, 32]. Strategies such as uncertainty sampling and query by committee select the instances closest to the decision boundary, which are the most informative. However, they only measure the value of a single instance, and as a result they may suffer from querying similar instances repeatedly. To overcome this limitation, the local structure of the data can be considered while selecting queries. For example, clustering was introduced into active learning in [33, 34] to select the most representative instances. Representativeness is also taken into account in batch-mode active learning, where the authors considered an instance’s similarity to the remaining unlabelled instances. Huang et al. extended this min-max view of active learning to take into account both the cluster structure of the unlabelled instances and the class assignments of the labelled instances. More recently, Zhang et al. used locally linear reconstruction to exploit the intrinsic geometrical structure of the data, so as to select the most representative instances.
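The uncertainty-sampling strategy described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `predict_proba` callable that returns class probabilities for an instance (any classifier exposing such scores would do):

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.

def least_confidence(prob):
    # Uncertainty = 1 minus the probability of the most likely class.
    return 1.0 - max(prob)

def select_query(pool, predict_proba):
    # Pick the unlabelled instance the model is least confident about,
    # i.e., the one closest to the decision boundary.
    return max(pool, key=lambda x: least_confidence(predict_proba(x)))
```

As the paragraph notes, selecting one instance at a time by uncertainty alone can repeatedly query near-duplicates; representativeness-based strategies address that by also considering the data's cluster structure.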
Given a query image and a single feature modality, a conventional visual re-ranking system treats the top-ranked images as pseudo-positive instances, which are inevitably noisy and make it hard to uncover the corresponding property, thus leading to inferior ranking performance. The authors proposed a novel image re-ranking approach by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which intra-graph and inter-graph constraints are simultaneously imposed to encode affinities within a single graph and consistency across different graphs. Additionally, weakly supervised learning driven by image attributes is performed to denoise the pseudo-labelled instances, thereby highlighting the distinctive strength of each individual feature modality.
In this paper, we propose a novel approach for understanding short texts. First, we introduce a mechanism to enrich short texts with concepts and co-occurring phrases extracted from a probabilistic semantic network called Probase. After that, every short text is represented as a 3,000-dimensional semantic feature vector. We then design a deep learning model, stacked from three denoising auto-encoders with distinct and effective learning capabilities, to perform semantic hashing on these semantic feature vectors for short texts. A two-stage semi-supervised training approach is proposed to optimize the model so that it captures the correlations and abstract features in short texts. When training is complete, the output is thresholded into a 128-dimensional binary code that is regarded as the semantic hashing code for the input text. We perform comprehensive experiments on short-text-oriented tasks, including information retrieval and classification. The significant improvements on each task show that our enrichment mechanism can effectively enrich short text representations, and that the proposed auto-encoder-based deep neural network learning model is able to encode complex features of the input into compact binary codes.
2.1 Neural-network Based Semantic Models
Deep neural networks have shown their effectiveness in discovering hierarchical features from raw training data for various tasks (Salakhutdinov and Hinton 2009; Collobert et al. 2011; Hinton and Salakhutdinov 2011; Tur et al. 2012; Socher et al. 2012; Huang et al. 2013; Shen et al. 2014; Hu et al. 2014). Among them, the DSSM (Huang et al. 2013) and the ARC-I (Hu et al. 2014) are the most related to our work. The DSSM uses a deep neural network to map the raw bag-of-words term vectors of search queries and Web documents into semantic vectors. The relevance score of a query-document pair is the cosine similarity of their corresponding semantic vectors. However, bag-of-words representations cannot preserve the contextual structure within the query or the documents. In contrast, the ARC-I captures both word-level and sentence-level contextual structures. It uses pre-trained word embeddings to represent the sentences, and then applies multiple convolutional and max-pooling layers to capture the global features for matching. A multi-layer perceptron is used to calculate the matching degree of the two sentences. One drawback of these models is that they only capture simple representations and structures of queries and documents, and risk losing important information for relevance analysis.
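The DSSM relevance computation reduces to a cosine similarity between the two semantic vectors. A minimal sketch follows; the vectors here are toy stand-ins for the network's output, not the DSSM mapping itself:

```python
import math

def cosine(u, v):
    # Cosine similarity of two dense vectors; 0.0 if either is zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def relevance(query_vec, doc_vec):
    # DSSM-style relevance score: cosine similarity of the semantic
    # vectors produced (in the real model) by the deep network.
    return cosine(query_vec, doc_vec)
```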
The problem of vocabulary mismatch in information retrieval, where semantic overlap may exist while there is no lexical overlap, can be greatly alleviated by the use of query expansion (QE) techniques, whereby a query is reformulated to improve retrieval performance and obtain additional relevant documents by expanding the original query with additional relevant terms and re-weighting the terms in the expanded query (Xu and Croft, 2000; Rivas et al., 2014). This can also be done by learning semantic classes or related candidate concepts in the text and subsequently tagging documents or content with these semantic concept tags, which could then serve as a means for either query-document keyword matching or for query expansion, to facilitate downstream retrieval or question answering tasks (Lin and Pantel, 2002; Xu and Croft, 2000; Lin and Pantel, 2001b; Xu et al., 2014; Bhagat and Ravichandran, 2008; Li et al., 2011; Tuarob et al., 2013; Halpin et al., 2007; Lin and Pantel, 2001a; McAuley and Yang, 2016). This is exactly the approach we adopt in order to achieve query expansion in an automated, fully unsupervised fashion, using a neural language model for local relevance feedback (Xu and Croft, 2000).
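An unsupervised expansion of this kind can be sketched by taking each query term's nearest neighbours in an embedding space. The `embeddings` dictionary below is a hypothetical stand-in for a pre-trained neural language model's word vectors:

```python
import math

def cosine(u, v):
    # Cosine similarity of two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(query_terms, embeddings, k=2):
    # Query expansion sketch: for each query term, append its k nearest
    # neighbours in the (assumed pre-trained) embedding space.
    expanded = list(query_terms)
    for t in query_terms:
        if t not in embeddings:
            continue
        neighbours = sorted(
            (w for w in embeddings if w not in expanded),
            key=lambda w: -cosine(embeddings[t], embeddings[w]))
        expanded.extend(neighbours[:k])
    return expanded
```

A real system would re-weight the added terms rather than treat them as equals of the original query words.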
Semantic hashing is a new information retrieval method that converts texts into compact binary codes using deep neural networks (DNN). It can be viewed as a method that converts texts from high-dimensional vectors into low-dimensional binary vectors while preserving the semantic relationships between texts in the compact binary codes as much as possible. Semantic hashing offers two main advantages. First, with non-linear transformations in each layer of the deep neural network, the model has great expressive power for capturing the abstract and complex correlations between the words in a text, and hence the meaning of the text. Second, it represents a text by a compact binary code, which enables fast retrieval. A deep neural network (DNN) is constructed with three stacked auto-encoders to perform semantic hashing for short texts. Each auto-encoder has specific learning functions, and we implement a two-stage semi-supervised training strategy, consisting of hierarchical pre-training and an overall fine-tuning process, to train the model. This auto-encoder-based DNN model is able to capture the abstract features and complex correlations of the input text, such that the learned compact binary codes can represent the meaning of that text.
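The final thresholding step and the fast-retrieval property can be illustrated without the network itself; `code` below stands in for a real-valued auto-encoder output, and the zero threshold is an illustrative assumption:

```python
def binarize(code, threshold=0.0):
    # Threshold the (assumed) auto-encoder output into a binary hash code.
    return tuple(1 if v > threshold else 0 for v in code)

def hamming(a, b):
    # Fast retrieval: semantically close texts should have codes with a
    # small Hamming distance.
    return sum(x != y for x, y in zip(a, b))
```

Retrieval then reduces to ranking stored codes by Hamming distance to the query's code, which is far cheaper than comparing high-dimensional real vectors.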
Semantic video retrieval seeks to match complex activities in a surveillance video to user-described queries. In surveillance scenarios, where noise and clutter are usually present, visual uncertainties introduced by error-prone low-level detectors, classifiers and trackers make up a significant part of the semantic gap between user-defined queries and the archived video. To bridge this gap, we propose a novel probabilistic activity localization formulation that incorporates learning of object attributes, between-object relationships, and object re-identification without activity-level training data. Our experiments demonstrate that the introduction of similarity learning components effectively compensates for noise and errors in previous stages and yields preferable performance on both aerial and ground surveillance videos. Considering the computational complexity of our similarity learning models, we attempt to develop a way of training complicated models efficiently while retaining good performance. As a proof of concept, we propose training deep neural networks for supervised learning of hash codes. With slight changes in the optimization formulation, we could explore the possibility of incorporating the training framework into Person Re-Identification and related problems.
Unlike keyword search systems, concept-based search systems try to determine what a document means, not just what it says. Concept-based search returns hits on documents that are "about" the subject or theme the query explores, even if the words in the document do not precisely match the words entered in the query. Clustering systems are built using a variety of methods, some of which are very complex and depend on sophisticated linguistic and artificial intelligence theory. The associated software determines meaning by calculating the frequency with which certain important words appear, or by noting when several words or phrases related to a particular concept appear close to each other in a text. Through statistical analysis, the search engine concludes that the piece is "about" a certain subject.
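The frequency-based "aboutness" analysis can be caricatured as a simple co-occurrence score. This sketch assumes the concept is given as a set of related terms, which a real system would derive through linguistic analysis:

```python
def concept_score(doc_tokens, concept_terms):
    # Crude statistical "aboutness": the fraction of document tokens that
    # belong to the concept's related-term set. A real concept-based
    # engine would also weight terms and consider their proximity.
    if not doc_tokens:
        return 0.0
    hits = sum(1 for t in doc_tokens if t in concept_terms)
    return hits / len(doc_tokens)
```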
The literature reports the use of machine learning algorithms to train a classifier and predict the category of an input query. However, the accuracy of such systems can be enhanced only when both discriminative features and a sufficient sample size co-exist, which is a rarity in real-world scenarios. It must be noted that an ideal system should be context-aware and able to respond to queries with high accuracy. Hence, understanding the intent of the user is important for providing relevant responses to user queries. Another significant factor that has to be taken care of is the ever-growing size of the content. An optimal method of indexing the content and scaling the solution is as important as the responsiveness of these systems. However, with recent advances in cloud and distributed computing, the scalability part can be solved.
In operation, the user first selects which of the videos to annotate. The videos of interest at the moment are transcoded at the request of the course team, and we serve them directly from the Notitia server, but the architecture supports using video served from anywhere on the Web. Having selected a video, the user is presented with a timeline of any annotations previously made by herself and others: the user can skip to particular instants or durations noted in the annotations. Each vocabulary is controlled and accurately defined in the Linked Data Cloud. It has a unique URI to distinguish it from other vocabularies, so there are no conflicts between different vocabularies and meanings. Different vocabularies that describe the same thing are linked using the owl:sameAs property as an equivalence definition. Meanwhile, a number of semantic annotations are used to build the relationships between different vocabularies, such as rdfs:subClassOf and rdfs:seeAlso. Once a vocabulary is applied to an annotation, the related vocabularies are associated with the annotation. Therefore, the collaborative and multilingual issues are well addressed. The most basic way to create an annotation is simply to pause the video at the appropriate point, enter a duration if appropriate, and add a Semantic Web/Linked Data URI. This is sent to the server, and the annotation is recorded.
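The vocabulary-linking behaviour described above can be sketched with plain triples. The URIs for owl:sameAs and rdfs:subClassOf are the standard ones, but the `ex:` concept names and the annotation shape are hypothetical:

```python
# Standard property URIs used to link vocabularies.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def make_annotation(video_uri, start, end, concept_uri):
    # A time-anchored annotation pointing at a Linked Data vocabulary URI
    # (hypothetical shape; the real system stores these on the server).
    return {"video": video_uri, "start": start, "end": end,
            "concept": concept_uri}

def related_concepts(concept_uri, triples):
    # Follow owl:sameAs / rdfs:subClassOf links to gather the vocabularies
    # associated with an applied concept.
    return {o for s, p, o in triples
            if s == concept_uri and p in (OWL_SAME_AS, RDFS_SUBCLASS_OF)}
```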
Ease of use, instant sharing, and high image quality have resulted in abundant amounts of video being captured, not only on social media outlets like Facebook and YouTube but also on personal devices including cell phones and computers. Around the world, people upload 300 hours of video per minute to YouTube alone. If a video is not tagged properly according to its content, it may lose its usability. Several solutions are available to manage, organize, and search still images. Applying similar techniques to video works well for short snippets, but breaks down for videos more than a few minutes long. While computer vision techniques have significantly helped in organizing and searching still-image data, these methods do not scale directly to videos and are often computationally inefficient. Videos that are tens of minutes to several hours long remain a major technical challenge. To ensure that important moments are preserved, a proud parent may record long segments of their baby’s first birthday party. While the videos may have captured cherished moments, they
Query suggestion is an assistive mechanism commonly used in search engines to help a user formulate search queries. It extends search engine functionality by helping users describe their information needs more clearly. Building an efficient query suggestion system, however, is very difficult due to the fundamental challenge of predicting users’ search intent, especially given the limited user context information. The proposed system is based both on the actual semantic meanings of users’ queries, drawn from online dictionary sites, and on learning from query logs to predict user information needs. To achieve this, the first method uses the online dictionary resource to find the context of the query, and the second method mines the search logs using a similarity function to perform query clustering and provide recommendations.
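The second method's log mining can be sketched with a click-overlap similarity function. The Jaccard measure and the 0.3 threshold here are illustrative assumptions, not the paper's actual similarity function:

```python
def jaccard(a, b):
    # Jaccard similarity of two click sets.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest(query, log, threshold=0.3):
    # log: query -> set of clicked URLs (a toy stand-in for a search log).
    # Suggest queries whose clicked results overlap with the input
    # query's clicks above the threshold.
    clicks = log.get(query, set())
    return [q for q, c in log.items()
            if q != query and jaccard(clicks, c) >= threshold]
```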
The earlier version of the web-based query interface of BilVideo initially handled only spatio-temporal and trajectory queries. Creating a textual query using a query language can become a very complex task as the number of conditions increases. To make the querying process easier, we designed a GUI for entering semantic queries (Figure 4.3). Users are able to enter semantic queries through this GUI without knowing the semantic query language. They enter their queries visually, according to our hierarchical semantic model, using a tree structure similar to the one used in the Video Annotator tool for showing the results of the annotation process. The GUI was developed in Java as a standalone application, considering that it would be integrated into the web-based query interface of the BilVideo system.
We propose a method for learning semantic categories of words with minimal supervision from web search query logs. Our method is based on the Espresso algorithm (Pantel and Pennacchiotti, 2006) for extracting binary lexical relations, but makes important modifications to handle query log data for the task of acquiring semantic categories. We present experimental results comparing our method with two state-of-the-art minimally supervised lexical knowledge extraction systems using Japanese query log data, and show that our method achieves higher precision than the previously proposed methods. We also show that the proposed method offers an additional advantage for knowledge acquisition in an Asian language for which word segmentation is an issue, as the method utilizes no prior knowledge of word segmentation and is able to harvest new terms with correct word segmentation.
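A single bootstrap iteration of the Espresso-style idea, adapted to query strings, might look like this. It is a toy sketch: the real algorithm also scores pattern and instance reliability, which is omitted here:

```python
def harvest(queries, seeds):
    # One Espresso-style bootstrap iteration over query strings:
    # 1) collect context patterns that co-occur with seed terms,
    # 2) harvest new terms that fill those same patterns.
    patterns = set()
    for q in queries:
        for s in seeds:
            if s in q:
                patterns.add(q.replace(s, "X"))
    new_terms = set()
    for q in queries:
        for p in patterns:
            head, _, tail = p.partition("X")
            if (q.startswith(head) and q.endswith(tail)
                    and len(q) > len(head) + len(tail)):
                term = q[len(head):len(q) - len(tail)]
                if term not in seeds:
                    new_terms.add(term)
    return new_terms
```

Note that the harvested terms come out already segmented, which mirrors the paper's point that no prior word-segmentation knowledge is needed.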
One way to represent such a data structure is by using RDF. RDF (Resource Description Framework) is a framework used to describe data. It can be used to describe data structures much like a domain class model. Using this network of connected concepts, the search engine can understand the search context and retrieve more accurate search results. Using this conceptual network, it is also possible to perform word sense disambiguation. If the user searches for a word with multiple meanings, the search engine chooses the most probable meaning by examining the other words in the query and the available concepts.
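The disambiguation step can be sketched as choosing the sense whose related concepts overlap most with the rest of the query. The sense inventory below is a hypothetical fragment of such a conceptual network:

```python
def disambiguate(word, context_words, senses):
    # senses: sense-id -> set of related concept words (a hypothetical
    # fragment of the RDF concept network for `word`).
    # Choose the sense sharing the most concepts with the rest of
    # the query.
    return max(senses, key=lambda s: len(senses[s] & set(context_words)))
```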
Calì, Andrea and De Virgilio, R. and Di Noia, T. and Menichetti, L. and Mirizzi, R. and Nardini, L.M. and Ostuni, V.C. and Rebecca, F. and Ungania, M. (2014) Semantic search in RealFoodTrade. In: Gottlob, G. and Perez, J. (eds.) AMW 2014 - Alberto Mendelzon Workshop on Foundations of Data Management. CEUR Workshop Proceedings 1189.
In practice, we can observe that some word alterations are irrelevant and undesirable (as in the “Steve Jobs” case), and some other alterations have little impact on retrieval effectiveness (for example, if we expand a word with a rarely used word form). In this study, we address these two problems. Our goal is to select only appropriate word alterations to be used in query expansion. This is done for two purposes: on the one hand, we want to limit query traffic as much as possible when query expansion is performed; on the other hand, we also want to remove irrelevant expansion terms so that fewer irrelevant documents are retrieved, thereby improving retrieval effectiveness.
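One simple way to realize this selection, sketched under the assumption that corpus frequencies are available, is to keep an alteration only if it is not a rarely used form relative to the original word (the 0.1 ratio is an illustrative threshold, not the paper's criterion):

```python
def select_alterations(word, alterations, freq, min_ratio=0.1):
    # Drop rarely used word forms: keep an alteration only if its
    # corpus frequency is at least min_ratio of the original word's
    # frequency. This limits query traffic and filters out expansion
    # terms unlikely to help retrieval.
    base = freq.get(word, 0)
    return [a for a in alterations
            if base and freq.get(a, 0) / base >= min_ratio]
```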
advantage is that it does not consume the user’s bandwidth, and the interaction between S3 and the Amazon virtual machine will be much faster. The disadvantage is that every time a user adds data, a virtual machine is created; when there are many users, this becomes costly. The other approach is to divide big files into segments, limiting the file size of each segment (for example, to 50 MB). Segments of a file share the same file-name prefix, and the suffixes indicate the sequence numbers of the segments. The advantage of this approach is that it does not consume much extra distributed resource. The disadvantage is that it requires more complex file management, so query efficiency may drop. Because our purpose is to study a common way of implementing similar applications, we did not solve this problem caused by this Amazon S3 characteristic in this research.
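The segment-naming scheme described above (shared prefix, sequence-number suffix) can be sketched as follows; the `.partNNNN` suffix format is an assumption, not the paper's actual convention:

```python
def split_segments(name, data, seg_size):
    # Split a file's bytes into fixed-size segments. Segments share the
    # file-name prefix; the suffix carries the sequence number.
    return {f"{name}.part{i:04d}": data[off:off + seg_size]
            for i, off in enumerate(range(0, len(data), seg_size))}

def join_segments(segments):
    # Reassemble the file by sorting the segment names, which orders
    # them by their numeric suffix.
    return b"".join(segments[k] for k in sorted(segments))
```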