PROPOSED OBJECT EXTRACTION
Many attempts have been made to extract data from video and film in a form suitable for use by animators and modelers. Such an approach is attractive, since motions and movements of people and animals may be obtained in this way that would be difficult to obtain using mechanical or magnetic motion capture systems. Visual extraction is also appealing since it is non-intrusive and has the potential to capture, from film, the motion and characteristics of people or animals long dead or extinct.
Almost all attempts to perform visual extraction have been based around bespoke computer vision applications which are difficult for non-experts to use or adapt to their own needs. This paper presents a generic approach to extracting data from video. Whilst our approach allows low-level information to be extracted, we show that higher-level functionality is also available. This functionality can be utilized in a manner that requires little knowledge of the underlying techniques and principles. Our approach is to approximate an image using principal component analysis, and then to train a multi-layer perceptron to predict the feature required by the user. This requires the user to hand-label the features of interest in some of the frames of the image sequence. One of the aims of this work is to keep to a minimum the number of frames that need to be labeled by the user. The trained multi-layer perceptron is then used to predict features for images that have never been labeled by the user.
Other attempts to extract useful information from video sequences include the use of edge-detection and contour or edge tracking, template matching and template tracking. All such systems work well in some circumstances, but fail or require adaptation to meet the requirements of new users. For instance, in the case of template tracking, the user needs to be aware of the kinds of features that can be tracked well in an image and also choose a suitable template size. This is not a trivial task for non-specialists.
6.1 Method
The main steps in extraction using our system are detailed below:
The user selects the sequence (or set) of images from which they wish data to be extracted. This may well comprise several shorter clips taken from different parts of a film.
These images have some pre-processing performed on them (principal components analysis) to reduce each image to a small set of numbers.
The user decides what feature(s) they wish to extract and labels this feature by hand in a fraction of the images chosen at random. The labeling process may involve clicking on a point to be tracked, labeling a distance or ratio of distances, measuring an angle, making a binary decision (yes/no, near/far etc.) or classifying the feature of interest into one of several classes.
Once this ground-truth data is available, a neural network is trained to predict the feature values in images that have not been labeled by the user.
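The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the system described in this paper: the frames are synthetic (a moving blob whose horizontal position plays the role of the hand-labeled feature), the names pca_coefficients, train_mlp and predict are hypothetical, the reduction uses NumPy's SVD for brevity rather than the eigen-decomposition of Section 6.2, and the network size, learning rate and number of labeled frames are arbitrary choices.

```python
import numpy as np

def pca_coefficients(frames, num_modes):
    """Step 2: reduce each frame to num_modes numbers via PCA."""
    flat = frames.reshape(frames.shape[0], -1).astype(np.float64)
    centered = flat - flat.mean(axis=0)          # remove the mean image
    # Rows of vt are the principal modes, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:num_modes].T

def train_mlp(inputs, targets, hidden=8, epochs=2000, rate=0.05, seed=0):
    """Step 4: fit a one-hidden-layer perceptron by batch gradient descent."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (inputs.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, targets.shape[1]))
    b2 = np.zeros(targets.shape[1])
    n = len(inputs)
    for _ in range(epochs):
        h = np.tanh(inputs @ w1 + b1)            # hidden activations
        err = (h @ w2 + b2) - targets            # gradient of squared error / 2
        grad_h = (err @ w2.T) * (1.0 - h ** 2)   # back-propagate through tanh
        w2 -= rate * h.T @ err / n
        b2 -= rate * err.mean(axis=0)
        w1 -= rate * inputs.T @ grad_h / n
        b1 -= rate * grad_h.mean(axis=0)
    return w1, b1, w2, b2

def predict(params, inputs):
    w1, b1, w2, b2 = params
    return np.tanh(inputs @ w1 + b1) @ w2 + b2

# Step 1: a synthetic stand-in for the selected clip (hypothetical data).
rng = np.random.default_rng(1)
positions = rng.uniform(0.0, 1.0, 60)            # the "true" feature per frame
cols = np.arange(12)
rows = np.arange(12)[:, None]
frames = np.stack([
    np.exp(-((cols - (2 + 8 * p)) ** 2 + (rows - 6) ** 2) / 4.0)
    for p in positions
])

# Step 2: each 12x12 frame becomes 4 numbers, rescaled to unit spread.
coeffs = pca_coefficients(frames, num_modes=4)
coeffs /= coeffs.std(axis=0)

# Step 3: the "user" labels a random fraction of the frames.
labeled = rng.choice(60, size=15, replace=False)
unlabeled = np.setdiff1d(np.arange(60), labeled)

# Step 4: train on labeled frames, predict the feature for the rest.
params = train_mlp(coeffs[labeled], positions[labeled].reshape(-1, 1))
predicted = predict(params, coeffs[unlabeled])[:, 0]
print("mean abs error on unlabeled frames:",
      np.abs(predicted - positions[unlabeled]).mean())
```

The same structure would apply to the other label types listed above (angles, ratios, binary or multi-class decisions); only the target encoding and the output layer would change.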
6.2 Feature Extraction
Principal components analysis (also known as eigenvector analysis) has been used extensively in computer vision for image reconstruction, pattern matching and classification.
Given the i-th image in a sequence of images, each of which consists of M pixels, we form the vector $x_i$ by concatenating the pixels of the image in raster scan order and removing the mean image of the sequence. The matrix $X$ is created using the $x_i$'s as column vectors. Traditionally, the principal modes, $q_i$, are extracted by computing
$X X^{T} q_i = \lambda_i q_i$ (1)
where the $\lambda_i$'s are the eigenvalues, a measure of the amount of variance each of the eigenvectors accounts for.
Unfortunately, the matrix $X X^{T}$ is typically too large to manipulate since it is of size M by M. Such computation is wasteful anyway since only N principal modes are meaningful, where N is the number of example images. In all our work $N \ll M$. Therefore we compute:
$X^{T} X u_i = \lambda_i u_i$ (2)
and we can obtain the $q_i$'s that we actually require using:
$q_i = X u_i$ (3)
In practice only the first P modes are used, with P of the order of 30 and $P \ll N$.
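Equations (2) and (3) can be sketched in NumPy as below. This is an illustrative sketch rather than the authors' implementation: the function names principal_modes and project and the parameter num_modes are hypothetical, and rescaling each mode to unit length is an added convenience that is not stated in the text. The approach works because multiplying (2) on the left by $X$ gives $X X^{T} (X u_i) = \lambda_i (X u_i)$, so $X u_i$ satisfies (1) while only an N by N matrix is ever decomposed.

```python
import numpy as np

def principal_modes(images, num_modes):
    """Return (mean, Q, eigvals): mean image, modes as columns of Q, eigenvalues.

    images has shape (N, ...) with M pixels per image; num_modes is P.
    """
    n = images.shape[0]
    if num_modes >= n:
        # After mean removal X has rank at most N - 1.
        raise ValueError("num_modes must be smaller than the number of images")
    flat = images.reshape(n, -1).astype(np.float64)   # raster-scan order, (N, M)
    mean = flat.mean(axis=0)
    X = (flat - mean).T                               # (M, N), columns are x_i
    # Equation (2): eigen-decomposition of the small N x N matrix X^T X.
    eigvals, U = np.linalg.eigh(X.T @ X)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:num_modes]     # keep the P largest
    eigvals, U = eigvals[order], U[:, order]
    # Equation (3): q_i = X u_i, then rescale to unit length (assumed step).
    Q = X @ U
    Q /= np.linalg.norm(Q, axis=0)
    return mean, Q, eigvals

def project(images, mean, Q):
    """Reduce each image to P coefficients, one per principal mode."""
    flat = images.reshape(images.shape[0], -1).astype(np.float64)
    return (flat - mean) @ Q
```

For a sequence stored as an array of shape (N, height, width), calling principal_modes(images, 30) followed by project(images, mean, Q) would give an (N, 30) array of coefficients, which is the "small set of numbers" per image described in Section 6.1.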
The principal mode extracted from a short film clip is shown in Figure 1 and is used later to help an animator to construct a cartoon version of the clip.
It is tempting to think that such modes could be used directly to predict, say, the rotation of the man's shoulders. However, the second mode also encodes information about shoulder movement and it is only by combining information from many modes that rotation can be reliably predicted.