One of the most challenging aspects of the MPEG-7 standard in terms of application is to use it efficiently. The selection of the optimum set of DSs and Descriptors for a given application is an open issue. Even if the identification of the basic features that have to be represented is a simple task, the selection of specific descriptors may not be straightforward: for example, DominantColor versus ScalableColorHistogram or MotionTrajectory versus ParametricMotion, and so forth. Moreover, the real power of the standard will be obtained when DSs and Descriptors are jointly used and when the entire description is considered as a whole, for example, taking into account the various relationships between segments in trees or graphs.
The popularity of digital images is increasing rapidly due to advances in digital image acquisition and storage technology, which has led to tremendous growth in large image databases. The growing amount of multimedia data accelerates the need for standards for multimedia content description and for efficient image retrieval techniques. The initial step in standardization was to establish a standard for publishing multimedia data: MPEG-7 describes multimedia data such as images, audio, and video. Content-based image retrieval is an image retrieval technique for efficiently retrieving semantically relevant images from an image database based on automatically derived image features. The performance of a content-based image retrieval system is mainly limited by the gap between low-level features and high-level concepts of images. To narrow this gap, region-based image retrieval techniques have been introduced.
lope, both derived from the MPEG-7 Audio Standard, and an adaptive dynamic thresholding is introduced. However, the most significant proposal we make is that of a fusion scheme, which combines the partial results so as to achieve better scores than the same algorithm obtains without fusion. Every speaker is modelled by a Gaussian probability density function, and whenever more information is available the speaker model is updated. The evaluation criterion used is the BIC-type criterion proposed by Ajmera et al. A multiple-pass algorithm employing a distinct feature at each pass is utilized. Each pass is executed independently of the others; this has the advantage that, if time efficiency is of greater importance than performance, the last passes can be pruned at the expense of some performance deterioration.
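The change-detection criterion can be illustrated with a minimal sketch. This is one common full-covariance delta-BIC formulation for deciding whether two adjacent feature segments come from the same source, not necessarily the exact variant of Ajmera et al.; the penalty weight `lam` is an illustrative parameter.

```python
import numpy as np

def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two feature segments (rows = frames).

    A positive value suggests the segments were produced by
    different sources (e.g. different speakers). One common
    full-covariance formulation, given here as an illustration.
    """
    n1, n2 = len(x), len(y)
    d = x.shape[1]
    z = np.vstack([x, y])

    def logdet_cov(s):
        # log-determinant of the ML (biased) covariance estimate
        return np.linalg.slogdet(np.cov(s, rowvar=False, bias=True))[1]

    # model-complexity penalty: mean + symmetric covariance parameters
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n1 + n2)
    return 0.5 * ((n1 + n2) * logdet_cov(z)
                  - n1 * logdet_cov(x)
                  - n2 * logdet_cov(y)) - penalty
```

In a multiple-pass setting, a threshold on this value decides whether a candidate change point is accepted in the current pass.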
The ideal automation tool to assist in metadata creation for multimedia source material would process all low-level visual content, define shot boundaries, extract video keyframes, analyze audio content, perform speech-to-text conversion, extract keywords, perform optical character recognition on textual video frames, and present a domain expert with a summary of these details for annotation. While a single tool to accomplish all of those tasks is not yet available, great strides have been made in the individual areas of research. The software evaluated in this regard is the IBM MPEG-7
This paper describes a system where audio is first segmented into silence and non-silence segments and then classified into six classes: music, speech over music, pure speech, speech over environmental sound, environmental sound, and silence. Two classifiers, an HMM and an SVM, are used for classification of the audio data; the HMM is the classifier preferred in MPEG-7 audio classification. Different combinations of the extracted features are used to train the classification model, and the performance of the classification methods is analysed. The experimental results show an average classification rate of 95%.
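The initial silence/non-silence segmentation can be sketched with a simple short-time energy threshold; the frame length and threshold value below are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def silence_segments(signal, rate, frame_ms=20, threshold=0.01):
    """Mark each frame as silence (True) or non-silence (False)
    using short-time RMS energy. A fixed-threshold sketch, not the
    full classification pipeline described above."""
    n = int(rate * frame_ms / 1000)          # samples per frame
    nframes = len(signal) // n
    frames = signal[:nframes * n].reshape(nframes, n)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return rms < threshold
```

The resulting boolean mask partitions the stream into segments that the downstream HMM/SVM classifiers would then label.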
content independent of its format and coding. MPEG-7 aims to standardize a core set of quantitative measures of audiovisual features, called Descriptors (D), and structures of descriptors and their relationships, called Description Schemes (DS). MPEG-7 will also standardize a language, the Description Definition Language (DDL), that specifies Description Schemes to ensure flexibility for wide adoption and a long life. The visual descriptors specified in this standard define the syntax and the semantics of each feature (metadata element). These descriptors are classified according to the feature that is described, such as color, shape, or texture. MPEG-7 has five color descriptors: Dominant Color, Scalable Color, Color Structure, Color Layout, and Group of Frames/Group of Pictures Color. There are two descriptors related to texture, the Homogeneous Texture Descriptor and the Non-Homogeneous Texture Descriptor (Edge Histogram). The visual shape descriptors consist of the 3-D Shape Descriptor, Region-Based Shape Descriptor, Contour-Based Shape Descriptor, and 2-D/3-D Shape Descriptor. There are four motion descriptors: Motion Activity, Camera Motion, Motion Trajectory, and Parametric Motion.
Zheng Xiaojian addresses the security issues of classified multimedia content with a multilevel encryption scheme. A time seed is introduced and used to generate a time master key, from which encryption keys are further derived. The scheme takes advantage of the properties of the MPEG-7 standard to organize the multimedia hierarchy into a tree structure whose elements each belong to a security clearance level: a sensitive element or secure content can be encrypted with a key appropriate to its security level, and the entire multimedia object can then be encrypted with another appropriate key, achieving multilevel encryption. The scheme also introduces the secure user, who must be a valid user who has passed identity authentication; the secure server, which is responsible for key management and identity authentication; the secure object, containing the sensitive or private information of the classified multimedia; and a comprehensive user identity authentication that considers environmental factors and the fingerprint of the secure user. Jayshri Nehete et al. note that encrypting an MPEG video stream is quite different from encrypting traditional textual data because inter-frame dependencies exist in MPEG video. The authors present a real-time MPEG video encryption algorithm based on AES that is fast enough to meet real-time requirements, selectively encrypting a fraction of the whole video
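The two-stage derivation described above (time seed, then time master key, then a per-level encryption key) could be sketched as follows. The SHA-256/HMAC construction and the function name are our assumptions for illustration, not details taken from the paper.

```python
import hashlib
import hmac

def derive_level_key(time_seed: bytes, level: int) -> bytes:
    """Hypothetical sketch of time-seed-based multilevel key
    derivation: the seed yields a time master key, which yields
    one encryption key per security clearance level."""
    master = hashlib.sha256(time_seed).digest()   # time master key
    # per-level key bound to the clearance level of the element
    return hmac.new(master, str(level).encode(), hashlib.sha256).digest()
```

Each tree element would then be encrypted with the key matching its clearance level, and the whole object with a separate key.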
Detailed performance evaluations against an alternative system called XISS have shown that our index system requires less storage for the complete test data set. While about 10% may seem a minor difference, we have discussed some ideas that allow a further reduction of the storage size. At the moment, we are working on these compression mechanisms, which are expected to increase the difference significantly. Comparing the number of comparisons required to answer a query has shown that the presented index system significantly improves processing efficiency. With our query set, we not only demonstrated the efficiency of our index system concerning searching and filter mechanisms but also showed different application scenarios for the usage of MPEG-7-based descriptions.
multimedia description, which were originally composed for application in image retrieval, including color, texture, shape, and motion descriptors. MPEG-7 feature descriptors have already been successfully applied to human body posture estimation (Moghaddam and Piccardi, 2010) as well as visual surveillance (Annesley and Orwell, 2006). Within our evaluation, we use the features of the Homogeneous Texture Descriptor (HTD) (Ro et al., 2001), one of the MPEG-7 texture descriptors, for classification. It provides a homogeneous image characterization for similarity retrieval based on local spatial-frequency statistics. The descriptor yields 62 integer values on every sub-window. The first two features are the mean and the standard deviation of the image (cf. Wu et al., 2001). The remaining 60 are the energy and energy deviation of the Gabor-filtered responses of the 30 so-called channels in the subdivision layout of the frequency domain. For these features, the Radon transform followed by a 1-D Fourier transform is applied.
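The layout of such a 62-value vector (mean, standard deviation, then 30 channel energies and 30 energy deviations) can be sketched as follows. A rectangular partition of the FFT magnitude into 6 orientations and 5 radial bands stands in here for the Gabor filter bank and Radon-based channels of the actual descriptor.

```python
import numpy as np

def htd_like_features(img):
    """Simplified 62-value texture vector in the spirit of the HTD:
    image mean and standard deviation, followed by the energy and
    energy deviation of 30 frequency channels (6 orientations x 5
    radial bands). Illustrative, not the standardized descriptor."""
    img = np.asarray(img, dtype=float)
    feats = [img.mean(), img.std()]
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    cy, cx = h / 2, w / 2
    r = np.hypot(yy - cy, xx - cx) / np.hypot(cy, cx)    # radius in 0..~1
    theta = np.mod(np.arctan2(yy - cy, xx - cx), np.pi)  # fold to [0, pi)
    energies, deviations = [], []
    for a in range(6):                                    # angular sectors
        for b in range(5):                                # radial bands
            mask = ((theta >= a * np.pi / 6) & (theta < (a + 1) * np.pi / 6)
                    & (r >= b / 5) & (r < (b + 1) / 5))
            chan = power[mask]
            energies.append(np.log1p(chan.mean() if chan.size else 0.0))
            deviations.append(np.log1p(chan.std() if chan.size else 0.0))
    return np.array(feats + energies + deviations)
```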
Users are able to store a profile of their preferences, and both the metadata about the current programme and the search results will be filtered according to this profile before display. Some preferences will relate to the type of data to be displayed; for example, a user could specify that he is not interested in place information or that he only wants to see the two best-matching results from a search. Other options could be added about the interests of the user, exploiting the Navigation and Access component of MPEG-7. Such parameters are, for example, the number of results per page, whether a thumbnail is shown or not, or the level of detail of the description of the results. A possible development would be to monitor user searches and requests to automatically build a user profile for filtering results. Note that viewers will have the option of editing a set of parameters in their personal profile concerning the display of the list of search results.
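Such profile-driven filtering might be sketched as follows; the profile keys and result field names are purely illustrative and not part of any MPEG-7 schema.

```python
def apply_profile(results, profile):
    """Hide result fields the user opted out of, then truncate the
    list to the requested number of best matches. A hypothetical
    sketch of profile-based result filtering."""
    hidden = set(profile.get("hidden_fields", []))
    limit = profile.get("max_results", len(results))
    return [{k: v for k, v in r.items() if k not in hidden}
            for r in results[:limit]]
```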
Swapnalini Pattanaik et al. explain in detail the idea of how to retrieve images using different MPEG-7 features. The two main steps of a CBIR system, feature extraction and similarity matching, are also covered. The main aim of MPEG-7 is to provide a set of technologies to describe multimedia content. Here, the Color Structure Descriptor for color and the Edge Histogram Descriptor for texture are explained, and both features are fused to enhance the performance of CBIR systems.
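Fusing the two descriptors at the distance level can be sketched as a weighted sum of per-feature distances; the L1 metric and the equal weighting below are illustrative choices, not necessarily those of the cited work.

```python
import numpy as np

def fused_distance(q_color, q_tex, d_color, d_tex, w=0.5):
    """Weighted fusion of a color-feature distance (e.g. Color
    Structure) and a texture-feature distance (e.g. Edge Histogram)
    between a query and a database image."""
    dc = np.abs(np.asarray(q_color, float) - np.asarray(d_color, float)).sum()
    dt = np.abs(np.asarray(q_tex, float) - np.asarray(d_tex, float)).sum()
    return w * dc + (1 - w) * dt
```

Ranking database images by this fused distance combines the evidence of both descriptors in a single similarity-matching step.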
Note that each number in the tables below corresponds to some range of bitrates (see Appendix 7, Figures Explanation, for more details). Unfortunately, these ranges can differ significantly because of differences in the quality of the compared encoders. This situation can lead to somewhat inadequate results when three or more codecs are compared.
content information, MPEG-4 Advanced Video Coding (AVC) and the file format for audio and video, and lastly the MPEG-21 usage environment for the other descriptions necessary to efficiently implement adaptation and finally achieve customization in a well-mannered way. In TSM scenarios, it is essential to customize the desired content more easily and efficiently. The contents have available descriptions of the parts that need to be matched/bridged: the content and the usage environment. The major objective of this paper is to probe the use and development of a TSM system capable of managing and customizing content so that any information can be delivered to different terminals and networks.
The viseme FAP identifies 14 visemes, each corresponding to the facial expression produced when a specific phoneme is articulated. Thus each phoneme is mapped to a viseme (although some phonemes may map to the same viseme). Visemes (as well as expressions) are characterized as high-level FAPs, since their values can be expressed in terms of standard (low-level) FAPs. The visemes for the Greek phonemes are similar to the visemes for the English phonemes. Thus, we have designed a lookup table that maps Greek visemes to the corresponding English ones, or performs a blending of two visemes in cases where this proves necessary. This is supported by MPEG-4, as a Viseme FAP can specify two out of the predefined list of 14 visemes, plus a factor to blend between them.
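The blending itself can be sketched as a linear interpolation between the two visemes expressed as low-level FAP vectors; we assume here the common 0..63 integer coding range for the blend factor.

```python
def blend_visemes(fap1, fap2, blend):
    """Blend two visemes given as low-level FAP value lists.
    `blend` is assumed to be coded in 0..63, where 63 selects the
    first viseme entirely; a linear interpolation is one
    straightforward realization of the MPEG-4 blend factor."""
    w = blend / 63.0
    return [w * a + (1 - w) * b for a, b in zip(fap1, fap2)]
```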
On-line animation uses automatically calculated animation rules for a given model in order to interpret FAPs and calculate new coordinates of the model vertices. Finally, we implement special post-processing of the mouth area: since the MPEG-4 standard requires that head animation use only Facial Animation Parameters, strong coordination of the FAPs is necessary, especially for feature points located on natural head contour lines such as the inner and outer lip contours. We correct mouth-area deformation during real-time animation to ensure mesh quality in the mouth area and to coordinate the lips' motion.
After the MPEG-2 data stream is encoded by the video encoder, the encoded stream formed by the six video layers of MPEG-2 is called a video elementary stream, and this elementary stream passes through the multiplexer to form a transport stream. The six video layers are the video sequence layer, group-of-pictures layer, picture layer, slice layer, macroblock layer, and block layer. Each layer contains some information about the video frame; the most important is the picture layer, which starts with the picture header. The picture header gives information such as the temporal reference, the picture coding type, and the video buffering verifier delay. The pictures in MPEG-2 use three encoding methods:
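Reading the first fields after the picture start code can be sketched as follows; the field widths follow the MPEG-2 picture header layout, and the sketch is a minimal illustration, not a full parser.

```python
def parse_picture_header(data: bytes):
    """Parse the leading fields of an MPEG-2 picture header: after
    the 4-byte start code 00 00 01 00 come the 10-bit temporal
    reference, the 3-bit picture coding type (1=I, 2=P, 3=B), and
    the 16-bit VBV delay."""
    assert data[:4] == b"\x00\x00\x01\x00", "not a picture start code"
    bits = int.from_bytes(data[4:8], "big")   # next 32 bits
    temporal_ref = bits >> 22                 # top 10 bits
    coding_type = (bits >> 19) & 0x7          # next 3 bits
    vbv_delay = (bits >> 3) & 0xFFFF          # next 16 bits
    return temporal_ref, coding_type, vbv_delay
```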
We have presented techniques for complexity-scalable MPEG encoding that gradually reduce the quality as a function of limited resources. The techniques involve modifications to the encoder modules in order to pursue scalable complexity and/or quality. Special attention has been paid to exploiting a scalable DCT and ME, because they represent two computationally expensive cornerstones of MPEG encoding. The new techniques introduced for the scalability of these two functions show considerable savings in computational complexity for video applications having low-quality requirements. In addition, a scalable block classification technique has been presented, which is designed to support the scalable processing of the DCT and ME. In the second step, performance evaluations have been carried out by constructing a complete MPEG encoding system in order to show the design space that is achieved with the scalability techniques. It has been shown that an even higher reduction in the computational complexity of the system can be obtained if available data (e.g., which DCT coefficients are computed during a scalable DCT computation) is exploited to optimize other core functions.
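The idea behind a scalable DCT — computing only a subset of the coefficients to save operations at the cost of quality — can be sketched as follows; the matrix formulation is our illustrative choice.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows = frequencies)."""
    k = np.arange(n)
    c = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0] *= 1 / np.sqrt(2)
    return c * np.sqrt(2 / n)

def scalable_dct(block, keep=8):
    """Compute only the top-left keep x keep DCT coefficients of an
    8x8 block: smaller `keep` means fewer multiply-accumulates,
    trading quality for complexity in the spirit of a scalable DCT."""
    c = dct_matrix(8)
    # only the first `keep` basis rows are needed per dimension
    return c[:keep] @ block @ c[:keep].T
```

With `keep=8` this is the full transform; lower settings keep only the low-frequency coefficients that matter most for low-quality operating points.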
As evident from their names, an I-frame is encoded completely as it is, without any data loss. An I-frame usually precedes each MPEG data stream. P-frames are constructed using the differences between the current frame and the immediately preceding I- or P-frame. B-frames are produced relative to the closest two I/P frames on either side of the current frame. The I-, P-, and B-frames are further compressed when subjected to the DCT, which helps to eliminate the existing intraframe spatial redundancy as much as possible. A significant portion of the interframe encoding is spent in calculating motion vectors (MVs) from the computed differences. Each non-encoded frame is divided into smaller.
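The motion-vector calculation can be illustrated with a minimal exhaustive block-matching search that minimizes the sum of absolute differences (SAD); the block size and search range below are illustrative.

```python
import numpy as np

def best_motion_vector(ref, cur, by, bx, bsize=8, search=4):
    """Find the displacement (dy, dx) in the reference frame that
    best matches the block at (by, bx) of the current frame under
    the SAD criterion. A minimal sketch of MV search."""
    block = cur[by:by + bsize, bx:bx + bsize]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] \
                    or x + bsize > ref.shape[1]:
                continue                       # candidate leaves the frame
            sad = np.abs(ref[y:y + bsize, x:x + bsize] - block).sum()
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best
```

Real encoders replace this exhaustive scan with faster search patterns, which is exactly where scalable ME techniques save complexity.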
The header is composed of 32 bits, made up of a sync word plus a description of the frame. The sync word enables receivers to lock onto the stream, making it possible to broadcast a file. The MPEG version is specified in the bits following the sync word. When the protection bit is 0, a 16-bit CRC follows the header. A 4-bit bitrate index informs the decoder of the bitrate at which the frame is encoded. A 1-bit private bit is available for application-specific triggers. A 2-bit mode field shows which channel mode is used. The copyright bit, if set, means that it is illegal to copy the contents. If the 1-bit original/home bit is set, it shows that the frame is located on its original media. A 2-bit emphasis field informs the decoder that the file must be re-equalized.
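The fields described above can be unpacked with straightforward bit masking; this sketch follows the common MPEG-1/2 audio frame header layout.

```python
def parse_mpeg_audio_header(h: int) -> dict:
    """Unpack a 32-bit MPEG audio frame header (given as an int).
    Field widths follow the usual MPEG-1/2 audio layout: 11-bit
    sync word, then version, layer, and the flags described above."""
    assert (h >> 21) & 0x7FF == 0x7FF, "sync word not found"
    return {
        "version":        (h >> 19) & 0x3,   # MPEG version id
        "layer":          (h >> 17) & 0x3,
        "protection":     (h >> 16) & 0x1,   # 0 => CRC follows header
        "bitrate_index":  (h >> 12) & 0xF,
        "sampling_index": (h >> 10) & 0x3,
        "padding":        (h >> 9)  & 0x1,
        "private":        (h >> 8)  & 0x1,
        "mode":           (h >> 6)  & 0x3,   # stereo/joint/dual/mono
        "mode_extension": (h >> 4)  & 0x3,
        "copyright":      (h >> 3)  & 0x1,
        "original":       (h >> 2)  & 0x1,   # the original/home bit
        "emphasis":        h        & 0x3,
    }
```

For example, the common MP3 header `0xFFFB9064` decodes to MPEG-1 Layer III with bitrate index 9 and sampling index 0.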