Face detection is the first stage of a face recognition system. Much of the research in this area is efficient and effective for still images only and cannot be applied directly to video sequences. In video scenes, human faces can appear in unlimited orientations and positions, so their detection poses a variety of challenges to researchers. In recent years, multi-camera networks have become increasingly common in biometric and surveillance systems, and multi-view face recognition has become an active research area. Compared with a single still image, videos contain more abundant spatio-temporal information, which can be exploited to improve recognition accuracy; videos also provide an automatic and efficient way to extract features. In particular, self-occlusion of facial features as the pose varies raises fundamental challenges for designing robust face recognition algorithms, and a promising approach to handling these pose variations is the use of multi-view data. In this paper, an approach for video-based face recognition in camera networks is proposed. Whereas traditional approaches estimate the pose of the face explicitly, we propose a robust feature for multi-view recognition that is insensitive to pose variations. The proposed feature is developed using the spherical harmonic representation of the face texture-mapped onto a sphere: the texture map for the whole face is constructed by back-projecting the image intensity values from each of the views onto the surface of a spherical model, and a particle filter is used to track the 3D location of the head using multi-view information.
Nowadays, gait analysis systems play a crucial role in diagnosing abnormal gait patterns in medical clinics. Among state-of-the-art systems, motion capture (MOCAP) is without a doubt the most popular thanks to its very high precision: each patient is asked to wear several infrared (IR) reflective markers so that multiple IR cameras can observe the signal. Such systems cost thousands of dollars and require expert knowledge from the operator to function properly. In recent years, along with the development of computer hardware, video-based human motion capture and analysis has achieved huge advances. With the appearance of RGB-D cameras on the market, high-accuracy motion capture can be achieved in real time based on this new technology. However, most of these systems aim to recognize human gestures only (based on the upper body), so their application to gait analysis, which focuses mostly on the lower body, has not been fully studied and tested. In this thesis, we focus on developing a new low-cost video-based gait analysis system that automatically reconstructs a 2.5D human skeleton over time and then analyzes the gait to detect related pathological problems. The principal contribution of this thesis is the combination of multiple vision and optimization techniques to create a system that can outperform state-of-the-art vision-based systems in gait analysis.
Image-based rendering (IBR) is one of the hottest new areas in computer graphics. Instead of using 3D modeling and painting tools to construct graphics models by hand, IBR uses real-world imagery to rapidly create photorealistic shape and appearance models. However, IBR results to date have mostly been restricted to static objects and scenes. Video-based rendering (VBR) brings the same kind of realism to computer animation, using video instead of still images as the source material. Examples of VBR include facial animation from sample videos, repetitive video textures that can be used to animate still scenes and photos, 3D environment walkthroughs built from panoramic video, and 3D video constructed from multiple synchronized cameras. In this paper, I survey a number of such systems developed by our group and by others, and suggest how this kind of approach has the potential to fundamentally transform the production (and consumption) of interactive visual media.
In this chapter, we focus on a video-based pointing method that can serve as an alternative means of information input. Given people's experiences with currently available technology devices, our goal has been to develop a non-contact, comfortable, reliable, and inexpensive communication device that is easily adaptable to different applications. Motion, color, contour, and curvature are all combined to reliably extract the fingertips, and a Kalman filter (Mahmoudi & Parviz, 2006) is adopted to speed up the tracking process. Section three gives an overview of the proposed system. Section four describes the detection of the moving forefinger. In Section five, the tip of the forefinger is extracted and the corresponding positions in each frame are accumulated as the written strokes; directional strokes, semi-circle strokes, and loop strokes are defined for 1-D character representation. Pen-up and pen-down are not easy to differentiate with only one camera, so in Section six, stroke connection rules are derived for pen-up stroke detection. All possible extracted 1-D stroke sequences are matched against the pre-built 1-D character models by the Dynamic Time Warping matching algorithm for character recognition. Section seven presents experiments conducted to verify the effectiveness of the proposed system. Finally, conclusions and discussions are given in the last section.
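The abstract above mentions a Kalman filter used to speed up fingertip tracking. As an illustration only, here is a minimal constant-velocity Kalman filter for smoothing and predicting 2D fingertip positions; all noise parameters and the motion model are assumptions, not the chapter's actual settings.

```python
import numpy as np

# Minimal constant-velocity Kalman filter for 2D fingertip tracking.
# State: [x, y, vx, vy]; the noise parameters below are illustrative.
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)   # constant-velocity motion model
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float)   # we observe position only
Q = 0.01 * np.eye(4)                  # process noise (assumed)
R = 4.0 * np.eye(2)                   # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; z is the measured fingertip (x, y)."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                     # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4) * 100.0
for t in range(20):                   # fingertip moving diagonally
    x, P = kalman_step(x, P, np.array([10.0 + t, 5.0 + 0.5 * t]))
print(np.round(x[:2], 1))             # estimate near the last measurement
```

The predict step lets the tracker restrict its fingertip search to a small window around the predicted position, which is where the speed-up comes from.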
Video-based face detection detects faces in a given input video. A test image is then compared against these detected faces; if a match is found, the corresponding face from the dataset of detected faces is displayed. The PCA algorithm is used for face extraction and recognition: eigenvalues are computed for each image, both for face-detection optimization, where similar redundant faces are eliminated, and for face recognition, where the eigenvalues of the test image are compared with those of the images in the dataset. The eigenface approach to face recognition was motivated by information theory, leading to the idea of basing face recognition on a small set of image features. The eigenface approach provides a practical solution that is well fitted to the problem of face recognition: it is fast, relatively simple, and has been shown to work well in a constrained environment. This method is helpful not only for face detection but also for face recognition.
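The eigenface pipeline the abstract describes can be sketched with plain numpy: center the images, take the top principal components, and recognize by nearest neighbour in the reduced feature space. The synthetic data, the number of components k, and the helper names are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a face dataset: 20 grayscale images of 32x32 pixels.
faces = rng.random((20, 32 * 32))

# Eigenface construction: subtract the mean face, then take the top
# principal components of the centered data (via SVD).
mean_face = faces.mean(axis=0)
centered = faces - mean_face
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 8                                  # size of the small feature set
eigenfaces = Vt[:k]                    # each row is one eigenface

def project(img):
    """Represent an image by k eigenface coefficients."""
    return eigenfaces @ (img - mean_face)

def recognize(test_img, gallery):
    """Nearest neighbour in eigenface space (Euclidean distance)."""
    coeffs = np.array([project(g) for g in gallery])
    d = np.linalg.norm(coeffs - project(test_img), axis=1)
    return int(np.argmin(d))

# A noisy copy of face 3 should match face 3.
noisy = faces[3] + 0.01 * rng.standard_normal(32 * 32)
print(recognize(noisy, faces))
```

Projecting onto a handful of eigenfaces is also what makes the redundant-face elimination cheap: near-duplicate detections end up with nearly identical coefficient vectors.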
Video modelling (VM) is an evidence-based intervention approach for teaching individuals with autism. In this study, the imitation ability of Matthew, a 15-year-old boy with autism, was assessed, and the use of VM to teach him to match coins, respond to questions in a group discussion time, and prepare noodles was then evaluated. Matthew seldom responded to imitative opportunities in the assessment, and although he learnt to perform some of the steps in the noodle preparation task during baseline, no changes in his ability to perform the three target behaviours resulted from either the adult-as-model or sibling-as-model conditions. Thus, his limited imitation skills provide an explanation for why the VM procedure was not an effective intervention approach for Matthew. These findings regarding participant prerequisite skills and model type are discussed in terms of future practice and research.
The objective of this research is multi-fold: to explore an approach for video production in a virtual studio with a degree of freedom; to combine various existing technologies, each targeting a specific feature, into one with almost all of the features needed for a virtual studio; and, above all, to create high-quality video productions using graphics and video. The approach requires breaking multiple videos down to the frame level, so that multiple components/objects can be handled like graphics for further object-level processing in three dimensions (3-D). We also provide a framework for describing different features of the stored media objects. Initially a workbench is developed, where a user is allowed to open any standard graphics format (jpg, gif, bmp, avi or mpg) and can provide annotations. If the object is time-based, then a timeline is also provided. Real images are treated as single frames; however, the user can select part of an image to construct another media object. Finally, all required objects are mixed together to construct the final video. We compare this approach with existing ones in terms of degree of freedom and platform independence, and show that it can be used effectively in domains such as entertainment and education. Keywords: Virtual Studio, Graphics, Video, Image, Camera Motion.
NetMeeting is an audio- and video-conferencing tool that enables communication between two people connected via the Internet. Video communication among a larger number of users is only possible using an MCU (Multipoint Control Unit), which may be very expensive and therefore not viable in many situations. Other available resources are a text chat tool and a whiteboard that allows real-time collaboration via graphic information; these resources enable more than two users to participate in a session. More advanced resources include application sharing during a conference and the remote desktop, which enables a computer to be operated from a remote location [Microsoft 2003].
We are able to alter the frame rate of the video to a specified frame rate and then detect the faces in the video using our own datasets; the facial expressions detected in the video are then compared against those in the stored dataset, and finally the face of the individual in the video is identified. The faces of our group members are successfully recognized by the tool. We are satisfied with the results: the system performs both facial detection and recognition efficiently, as shown in Figs. 7 and 8.
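The frame-rate alteration mentioned above amounts to deciding which source frames to keep for a target rate. A minimal index-resampling sketch (no video I/O; the function name and nearest-earlier-frame policy are assumptions) is:

```python
def resample_frames(n_frames, src_fps, dst_fps):
    """Indices of source frames to keep so playback matches dst_fps.

    Each output frame at time t = i / dst_fps is mapped to the nearest
    earlier source frame, i.e. the frame at index floor(t * src_fps).
    """
    duration = n_frames / src_fps
    n_out = int(duration * dst_fps)
    return [min(int(i * src_fps / dst_fps), n_frames - 1)
            for i in range(n_out)]

# A 30 fps source reduced to 10 fps keeps every third frame.
print(resample_frames(9, 30, 10))  # → [0, 3, 6]
```

The selected frames would then be fed one by one to the face detector, which also reduces the detection workload at lower target rates.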
The goal of ambulatory alertness prediction technologies is to predict operator alertness/performance at different times based on interactions of sleep, circadian rhythm, and related temporal antecedents of fatigue. Note that these technologies differ from our work in that they do not assess fitness online as the work is performed. They predict alertness using devices that monitor sources of fatigue, such as how much sleep an operator has obtained (via a wrist activity monitor, defined below), and combine this information with mathematical models that predict performance and fatigue over future periods of time. As an example of such a system, US Army medical researchers have developed a mathematical model to predict human performance on the basis of prior sleep. They integrated this model into a wrist-activity-monitor-based sleep and performance predictor system called "Sleep Watch". The Sleep Watch system includes a wrist-worn piezoelectric-chip activity monitor and recorder that stores records of the wearer's activity and sleep obtained over several days. While this technology shows potential to predict fatigue in operators, more data and possible fine-tuning of the models are needed before it can be fully accepted.
Given the pose and geometry of the body, the formation of an image of the person depends on several other factors. These include properties of the camera (e.g., the lens, aperture and shutter speed), the rest of the scene geometry (and perhaps other people), surface reflectance properties of clothing and background objects, the illumination of the scene, etc. In practice much of this information is unavailable or tedious to acquire. The exception is the geometric calibration of the camera: standard methods exist (e.g., Forsyth and Ponce) which can estimate camera parameters from images of calibration targets. With fixed cameras this need only be done once. If the camera moves, then certain camera parameters can be included in the state and estimated during tracking. In either case, the camera parameters define a perspective projection, P(X), which maps a 3D point X ∈ ℝ³ to a point on the 2D image plane.
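The projection P(X) described above can be written concretely as intrinsics applied after a rigid world-to-camera transform, followed by the perspective divide. The specific intrinsic values below are illustrative placeholders, not calibration results from the text.

```python
import numpy as np

# Perspective projection P(X): map a 3D point to the 2D image plane
# using assumed (illustrative) camera parameters.
K = np.array([[800.0, 0.0, 320.0],    # intrinsics: focal lengths and
              [0.0, 800.0, 240.0],    # principal point, in pixels
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # extrinsics: rotation ...
t = np.array([0.0, 0.0, 0.0])          # ... and translation

def project(X):
    """P(X): 3D point X in R^3 -> 2D pixel coordinates."""
    x_cam = R @ X + t                  # world -> camera frame
    u = K @ x_cam                      # apply intrinsics
    return u[:2] / u[2]                # perspective divide

print(project(np.array([0.1, -0.05, 2.0])))  # → [360. 220.]
```

When the camera moves, entries of R and t (or a parametrization of them) are the quantities that would be appended to the tracked state.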
Video surveillance has gained a lot of importance today, but analyzing a whole video manually is a time-consuming process. In this paper, a system is proposed in which a user can search within a video based on its content, with minimal human intervention, in a query-response fashion. As an initial step, the system is constrained to human activity recognition only: an HMM is used to detect human activity within the video. The proposed system was trained to detect three different activities: running, walking and hand clapping. The test accuracy and experimental results show that HMMs are well suited to human activity recognition. Future work includes building a system with mechanisms to identify, within a video, obscene content posted online, full-length feature films that have been illegally uploaded, crime occurrences, property loss by mistake, etc.
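The HMM-based recognition described above typically trains one model per activity and labels a test sequence with the model of highest likelihood. A self-contained sketch of that scoring step, using the scaled forward algorithm with toy hand-set parameters (the real system's models would be trained, and its features are not two discrete symbols), is:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of an observation sequence under a discrete HMM.

    pi: initial state probs (N,), A: transitions (N, N),
    B: emissions (N, M); obs: sequence of symbol indices.
    Uses the scaled forward algorithm for numerical stability.
    """
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# Two toy activity models (parameters are illustrative, not trained):
# the "walking" model favors symbol 0, the "running" model symbol 1.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B_walk = np.array([[0.8, 0.2], [0.6, 0.4]])
B_run = np.array([[0.2, 0.8], [0.4, 0.6]])

seq = [1, 1, 0, 1, 1, 1]               # mostly "fast-motion" observations
scores = {"walking": forward_loglik(seq, pi, A, B_walk),
          "running": forward_loglik(seq, pi, A, B_run)}
print(max(scores, key=scores.get))     # → running
```

With three trained models (running, walking, hand clapping), classification is the same argmax over three log-likelihoods.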
Another way of coding stereo videos is to use asymmetric resolution coding techniques, where the video quality of the additional views is reduced by scaling down the resolution spatially or temporally. Asymmetric video coding techniques benefit from the human visual system's tolerance to suppressed high-frequency components and reduced resolution in one of the views. Coding efficiency for different scaling levels and resolutions of the stereo views was studied by Hewage et al. and Gürler et al.; the coding performance of their techniques was found to be close to that of the standard multi-view video coding technique, while delivering higher subjective quality. A subjective study of the coding performance of asymmetric and symmetric stereo video coding, conducted by Saygili and Gürler on the basis of the H.264/MVC codec, showed that asymmetric coding outperforms symmetric coding at high bitrates, with compression efficiency close to that of the H.264/MVC codec. An adaptive spatial resolution down-sampling technique was proposed by Aflaki et al., wherein a frequency-domain analysis is used to estimate the spatial resolution of both views of the stereo video streams; to achieve the best objective compression performance, they defined down-sampling thresholds for a non-linear filter after analysing the high-frequency components of the first frame of the video sequence. The majority of these techniques propose modifications to the codec to encode available videos. Video frame resolution sampling, frame packing, and video sequence altering techniques proposed by a few researchers assemble the video sequence to be encoded without modifications to the video codec. Setbacks in the current coding techniques necessitate a newer approach to encoding stereo videos.
Nowadays, with the rapid development of information technology, the use of digital information is increasing day by day, and it is challenging to protect this information from piracy and other types of attack. Digital watermarking techniques have been developed to protect the copyrights of multimedia such as audio, video and images. The success of a digital watermarking technology depends on its robustness against different attacks that aim to remove or destroy the watermark from its host data. Video watermarking can be performed either in the spatial domain or in the frequency domain. The spatial domain refers to the image plane itself and is based on direct manipulation of the pixels in an image; it is easy to implement but has low information-handling capacity and can easily be erased by lossy image compression. The frequency domain is based on modifying a transform of the image, such as the Fourier transform; common frequency-domain techniques are the DCT (Discrete Cosine Transform) and the DWT (Discrete Wavelet Transform).
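A minimal sketch of the DCT-domain idea mentioned above: embed one watermark bit in an 8×8 block by forcing the sign of a mid-frequency coefficient. The coefficient position, strength, and the sign-based rule are illustrative choices, not a specific published scheme; real watermarking spreads bits over many blocks and adds perceptual weighting.

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix for 8x8 blocks.
C = np.array([[np.sqrt((1 if k == 0 else 2) / N)
               * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
               for n in range(N)] for k in range(N)])

def dct2(block):  return C @ block @ C.T
def idct2(coef):  return C.T @ coef @ C

def embed_bit(block, bit, pos=(3, 4), strength=20.0):
    """Embed one bit by forcing the sign of a mid-frequency DCT
    coefficient (position and strength are assumed values)."""
    coef = dct2(block.astype(float))
    coef[pos] = strength if bit else -strength
    return idct2(coef)

def extract_bit(block, pos=(3, 4)):
    return int(dct2(block.astype(float))[pos] > 0)

rng = np.random.default_rng(1)
original = rng.integers(0, 256, (8, 8)).astype(float)
marked = embed_bit(original, 1)
print(extract_bit(marked))  # → 1
```

Mid-frequency coefficients are the usual target because low frequencies carry visible image energy while high frequencies are the first casualties of lossy compression.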
The proposed algorithm was implemented in C++11 with OpenCV 3.0. We used the VIRAT video dataset to train and test our detector models. These videos are captured by stationary HD cameras, and each video has its own annotation file carrying bounding-box information for the objects in each frame. The VIRAT_S_050000_03 sequence is used as the training video and the VIRAT_S_040103_02 sequence as the test video. These sequences have 1607 and 2392 frames respectively, each of size 1920×1080. All objects extracted from the source and target videos are scaled to a size of 64×64. Features are extracted and computed using HOG; the resulting descriptor vectors are 1764-dimensional. As a detector model, we used a linear SVM with C = 0.01.
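The 1764-dimensional figure follows from the standard HOG defaults for a 64×64 window (8×8 cells, 16×16 blocks, 8-pixel block stride, 9 orientation bins). The sketch below checks that arithmetic and computes simplified per-cell orientation histograms in numpy; it omits the block normalization a real HOG implementation applies, so it is an illustration rather than a drop-in replacement.

```python
import numpy as np

# Why the HOG descriptor is 1764-dimensional for a 64x64 window
# (assuming 8x8 cells, 16x16 blocks, 8-px stride, 9 bins):
blocks_per_side = (64 - 16) // 8 + 1            # = 7
dims = blocks_per_side ** 2 * (2 * 2) * 9       # 7*7 blocks, 4 cells, 9 bins
print(dims)                                      # → 1764

def hog_cells(img, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient
    orientation, weighted by gradient magnitude (no block norm)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation
    h, w = img.shape
    hist = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell):
        for j in range(w // cell):
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = np.minimum((a / (180 / bins)).astype(int), bins - 1)
            np.add.at(hist[i, j], idx.ravel(), m.ravel())
    return hist

patch = np.tile(np.arange(64, dtype=float), (64, 1))  # horizontal ramp
print(hog_cells(patch).shape)
```

Each 64×64 object crop would yield one such descriptor, which is then fed to the linear SVM.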
Wearing a helmet can reduce the risk of severe injury by 72% and the risk of death by 39%, according to the World Health Organization. However, wearing a helmet also has drawbacks, such as the blind spots created by its sides; because of these blind spots, a rider cannot fully view the traffic behind the bike using the mirrors. This paper discusses a computer vision approach to a smart helmet that monitors the traffic behind the motorcycle using video processing, giving the rider necessary and important information about rear traffic and thereby helping the rider to commute safely.
The compression efficiency of the designed codec was compared with that of the standard MV-HEVC codec. To achieve this, views 5-4, 1-3 and 2-4 of the "Poznan_Street", "Kendo" and "Newspaper1" standard multi-view video sequences, respectively, were chosen and coded using the proposed MV-HEVC-based stereo video coding scheme. These video sequences cover both indoor and outdoor scenes with static and dynamic backgrounds at different levels of illumination. The coding performance of the proposed codec was then compared with the anchor MV-HEVC codec, as presented in the JCT3V-G1100 document. Tables 2a-c show the resulting PSNR and consumed bitrate for coding the "Poznan_Street", "Kendo" and "Newspaper1" stereo videos at QPs 25, 30, 35 and 40. The proposed codec outperforms the MV-HEVC codec in terms of the PSNR of the decoded frames (by up to 1 dB) while significantly reducing the bandwidth requirements. From Table 2a, it can be seen that the proposed codec exhibits almost the same coding performance as MV-HEVC in terms of PSNR at a QP of 35 when coding the Poznan_Street sequences; however, it requires about 8% less bandwidth to transmit the videos, which implies an improvement in coding efficiency over MV-HEVC. Overall, the proposed stereo video codec provides an average bitrate saving of 25% relative to the reference standard MV-HEVC codec.
Effective shot clustering techniques that group related shots into scenes are important for content-based browsing and retrieval of video data. However, it is difficult to develop a good shot clustering technique using visual information alone, and the use of audio information has attracted relatively little attention. We present a novel audio-assisted shot clustering technique based on a strict scene definition and a divergence measurement. The crux of our approach is the careful analysis of the feature space and distribution, and the divergence concept used to measure the distance between shots. We use a weighted K-L divergence to decompose the computation of the distance between scenes into a summation of the distances among shots. Our clustering algorithm accepts an audio file and video scene boundaries as input. It checks the distance between each single-shot scene and its neighbor scene; if the distance is less than a pre-defined threshold, the scene and its neighbor are marked as the same scene. This procedure is repeated until all single-shot scenes have been checked.
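The neighbor-merging procedure described above can be sketched as follows. The symmetrized K-L divergence here is an illustrative stand-in for the paper's weighted variant, and the feature histograms and threshold are invented for the example.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two histograms."""
    p = p / p.sum(); q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def shot_distance(p, q):
    """Symmetrized K-L divergence between two shot feature histograms
    (a simple stand-in for the weighted K-L divergence in the text)."""
    return 0.5 * (kl(p, q) + kl(q, p))

def merge_scenes(shots, threshold=0.1):
    """Greedy neighbor clustering: a shot joins its predecessor's scene
    when their distance falls below the threshold."""
    labels = [0]
    for prev, cur in zip(shots, shots[1:]):
        same = shot_distance(prev, cur) < threshold
        labels.append(labels[-1] if same else labels[-1] + 1)
    return labels

# Three similar shots followed by two different ones -> two scenes.
shots = [np.array([8, 1, 1.]), np.array([7, 2, 1.]),
         np.array([8, 1, 1.]), np.array([1, 1, 8.]), np.array([1, 2, 7.])]
print(merge_scenes(shots))  # → [0, 0, 0, 1, 1]
```

Symmetrizing matters because K-L divergence is not symmetric, while a distance used for clustering should not depend on the order of its arguments.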
Table III shows the PSNR values for the retrieved color video at different scalings. Computation error is the cause of the poor quality of the retrieved video, reflected in PSNR values in the range of 9 to 10 dB after retrieving the color video from the cover color video. To enhance the quality of the retrieved video, the image elements are scaled up in each frame. The results in Table III indicate a significant improvement when integer up-scaling by a factor of 3 is applied; however, if the integer multiplier is greater than a threshold, the quality degrades. It can be concluded that the decryption process is feasible and that the quality of the retrieved video is acceptable, as indicated by the PSNR figures.
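For reference, the PSNR figures quoted above follow the standard definition over per-frame mean squared error; a minimal sketch (synthetic data, not the paper's videos) is:

```python
import numpy as np

def psnr(original, recovered, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    mse = np.mean((original.astype(float) - recovered.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (64, 64))
noisy = np.clip(frame + rng.normal(0, 8, frame.shape), 0, 255)
print(round(psnr(frame, noisy), 1))   # roughly 30 dB for sigma = 8
```

On this scale, the 9-10 dB range reported for the unscaled retrieval corresponds to very large per-pixel errors, which is why the up-scaling step makes such a visible difference.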
The technique consists of two phases, extraction and connection, as shown in Figure 1.
• Extraction Phase: Frame data are extracted from the unallocated space of the storage medium for restoration. The start-code signature of a video frame is searched for without considering the file system or the file composition. Candidate frames are extracted based on the start-code signature, the extracted frame data are verified through the decoder, and it is determined whether the data are frames.
• Connection Phase: The codec and file specifications are used to connect the frames verified in the previous phase. Based on the extracted frame sets, the length information of each frame recorded in the files is used to connect frame sets so that they are restored into a connected picture.
Figure 1 shows the overall process of the proposed file restoration technique. In the extraction phase, we extract frame data F1, F2, F4, F5, and F6, which have a frame start-code signature, from the unallocated space (the region of a video file to recover, containing the deleted video files), and verify that each decoded frame is normal frame data. Verified frames form a frame set, which is connected as far as possible in the frame-set connection stage. When the video file is fragmented, we restore it by connecting the fragmented pieces of data; in the case of a partially overwritten file, the parts that were not overwritten are connected to create a connected video. In this manner, the proposed method finds meaningful data in the video file using the codec and converts it into a file structure after connecting the pieces.
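The start-code scan in the extraction phase can be sketched as a byte-buffer search. The example below uses the MPEG-style prefix 0x00 0x00 0x01 with the picture start code 0x00; the exact codes are codec-specific, and the buffer and function name are illustrative.

```python
# Scan a byte buffer (e.g., an image of unallocated space) for MPEG-style
# start-code signatures 0x00 0x00 0x01 <code>; matching offsets mark
# candidate frame boundaries to hand to a decoder for verification.
START = b"\x00\x00\x01"

def find_frame_offsets(data, frame_codes=frozenset([0x00])):
    """Offsets where a picture/frame start code begins.
    The set of frame codes is codec-specific (assumed here)."""
    offsets, i = [], 0
    while (i := data.find(START, i)) != -1:
        if i + 3 < len(data) and data[i + 3] in frame_codes:
            offsets.append(i)
        i += 1
    return offsets

blob = b"junk" + START + b"\x00frameA" + b"xx" + START + b"\x00frameB"
print(find_frame_offsets(blob))  # → [4, 16]
```

Each offset found this way starts a candidate frame that the decoder then validates, exactly the verification step the extraction phase describes; the connection phase later orders and joins the validated pieces.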