PROPOSED OBJECT EXTRACTION

Many attempts have been made to extract data from video and film in a form suitable for use by animators and modelers. Such an approach is attractive, since motions and movements for people and animals may be obtained in this way that would be difficult using mechanical or magnetic motion capture systems. Visual extraction is also appealing since it is non-intrusive and has the potential to capture, from film, the motion and characteristics of people or animals long dead or extinct.

Almost all attempts to perform visual extraction have been based around bespoke computer vision applications which are difficult for non-experts to use or adapt to their own needs. This paper presents a generic approach to extracting data from video. Whilst our approach allows low-level information to be extracted, we show that higher-level functionality is also available. This functionality can be utilized in a manner that requires little knowledge of the underlying techniques and principles. Our approach is to approximate an image using principal component analysis, and then to train a multi-layer perceptron to predict the feature required by the user. This requires the user to hand-label the features of interest in some of the frames of the image sequence. One of the aims of this work is to keep to a minimum the number of frames that need to be labeled by the user. The trained multi-layer perceptron is then used to predict features for images that have never been labeled by the user.

Other attempts to extract useful information from video sequences include the use of edge-detection and contour or edge tracking, template matching and template tracking. All such systems work well in some circumstances, but fail or require adaptation to meet the requirements of new users. For instance, in the case of template tracking, the user needs to be aware of the kinds of features that can be tracked well in an image and also choose a suitable template size. This is not a trivial task for non-specialists.

6.1 Method

The main steps in extraction using our system are detailed below:

The user selects the sequence (or set) of images from which they wish to extract data. This may well comprise several shorter clips taken from different parts of a film.

These images are pre-processed using principal components analysis to reduce each image to a small set of numbers.

The user decides what feature(s) they wish to extract and labels this feature by hand in a fraction of the images chosen at random. The labeling process may involve clicking on a point to be tracked, labeling a distance or ratio of distances, measuring an angle, making a binary decision (yes/no, near/far etc.) or classifying the feature of interest into one of several classes.

Once this ground-truth data is available, a neural network is trained to predict the feature values in images that have not been labeled by the user.
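The last two steps can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the system described here: the function names (`train_mlp`, `predict`, `label_and_predict`), the single tanh hidden layer, the plain batch gradient descent, and all hyperparameters are assumptions chosen for brevity. The `label_fn` callback is a hypothetical stand-in for the user's hand labeling, and `coeffs` is assumed to hold the small set of numbers that pre-processing produced for each image.

```python
import numpy as np

def train_mlp(inputs, targets, hidden=8, epochs=4000, lr=0.05, seed=0):
    """Fit a one-hidden-layer perceptron by batch gradient descent (illustrative)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (inputs.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, targets.shape[1]))
    b2 = np.zeros(targets.shape[1])
    n = len(inputs)
    for _ in range(epochs):
        h = np.tanh(inputs @ w1 + b1)            # hidden activations
        err = (h @ w2 + b2) - targets            # gradient of 0.5 * squared error
        grad_h = (err @ w2.T) * (1.0 - h ** 2)   # back-propagate through tanh
        w2 -= lr * (h.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        w1 -= lr * (inputs.T @ grad_h) / n
        b1 -= lr * grad_h.mean(axis=0)
    return w1, b1, w2, b2

def predict(params, inputs):
    """Evaluate the trained perceptron on one row of coefficients per image."""
    w1, b1, w2, b2 = params
    return np.tanh(inputs @ w1 + b1) @ w2 + b2

def label_and_predict(coeffs, label_fn, fraction=0.2, seed=0):
    """Label a random fraction of frames, train, then predict every frame.

    label_fn(i) is a hypothetical stand-in for the user labeling frame i.
    """
    rng = np.random.default_rng(seed)
    n = len(coeffs)
    count = max(1, int(round(fraction * n)))
    labeled = rng.choice(n, size=count, replace=False)   # frames chosen at random
    targets = np.array([label_fn(i) for i in labeled], dtype=float).reshape(-1, 1)
    params = train_mlp(coeffs[labeled], targets)
    return labeled, predict(params, coeffs)[:, 0]
```

For a scalar feature such as an angle or a distance, `label_fn` returns one number per labeled frame. A binary or multi-class feature would need a different output layer and loss, which this sketch does not cover.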

6.2 Feature Extraction

Principal components analysis (also known as eigenvector analysis) has been used extensively in computer vision for image reconstruction, pattern matching and classification.

Given the $i$-th image in a sequence of images, each of which consists of $M$ pixels, we form the vector $x_i$ by concatenating the pixels of the image in raster scan order and removing the mean image of the sequence. The matrix $X$ is created using the $x_i$'s as column vectors. Traditionally, the principal modes, $q_i$, are extracted by computing

$$X X^{T} q_i = \lambda_i q_i \qquad (1)$$

where the $\lambda_i$'s are the eigenvalues, a measure of the amount of variance each of the eigenvectors accounts for.

Unfortunately, the matrix $X X^{T}$ is typically too large to manipulate since it is of size $M$ by $M$. Such computation is wasteful anyway since only $N$ principal modes are meaningful, where $N$ is the number of example images. In all our work $N \ll M$.

Therefore we compute:

$$X^{T} X u_i = \lambda_i u_i \qquad (2)$$

and we can obtain the $q_i$'s that we actually require using:

$$q_i = X u_i \qquad (3)$$

In practice only the first $P$ modes are used, with $P$ on the order of 30 and smaller than $N$.
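Equations (2) and (3) can be sketched in NumPy as follows. This is a minimal sketch rather than the implementation used in this work: the function name `principal_modes` and its arguments are hypothetical, and rescaling each $q_i$ to unit length is an added step that equation (3) does not state. It also assumes that `num_modes` selects only modes with non-zero eigenvalues, which holds when it is smaller than $N - 1$ for generic data.

```python
import numpy as np

def principal_modes(images, num_modes):
    """Return (mean, Q, eigenvalues) for a stack of images.

    images has shape (N, height, width); Q has shape (M, num_modes),
    where M = height * width is the number of pixels per image.
    """
    n = images.shape[0]
    # Raster-scan each image into a vector and remove the mean image.
    flat = images.reshape(n, -1).astype(float)     # shape (N, M)
    mean = flat.mean(axis=0)
    X = (flat - mean).T                            # columns are the x_i, shape (M, N)

    # Equation (2): eigen-decompose the small N x N matrix X^T X.
    eigvals, U = np.linalg.eigh(X.T @ X)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:num_modes]  # keep the largest num_modes
    eigvals, U = eigvals[order], U[:, order]

    # Equation (3): q_i = X u_i, then rescale to unit length (added assumption).
    Q = X @ U
    Q /= np.linalg.norm(Q, axis=0)
    return mean, Q, eigvals
```

With the modes in hand, one image is reduced to `num_modes` numbers by projecting its mean-removed pixel vector onto the columns of `Q`, for example `(flat_image - mean) @ Q`. Only the $N$ by $N$ matrix is ever decomposed, so the $M$ by $M$ matrix of equation (1) is never formed.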

The principal mode extracted from a short film clip is shown in Figure 1 and is used later to help an animator to construct a cartoon version of the clip.

It is tempting to think that such modes could be used directly to predict, say, the rotation of the man's shoulders. However, the second mode also encodes information about shoulder movement and it is only by combining information from many modes that rotation can be reliably predicted.
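A toy example may make this point concrete. The data below are entirely synthetic and are not taken from the film clip: `rotation` and `nuisance` are hypothetical signals, the two "modes" are constructed so that each mixes both signals, and a least-squares linear fit stands in for the multi-layer perceptron purely to keep the sketch short.

```python
import numpy as np

def fit_error(coeffs, target):
    """Mean squared error of a least-squares linear fit with an intercept."""
    design = np.column_stack([coeffs, np.ones(len(coeffs))])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.mean((design @ weights - target) ** 2))

rng = np.random.default_rng(0)
rotation = rng.uniform(-1.0, 1.0, 500)   # hypothetical shoulder rotation
nuisance = rng.uniform(-1.0, 1.0, 500)   # hypothetical unrelated motion

# Each synthetic mode coefficient mixes the rotation with the nuisance motion.
modes = np.column_stack([rotation + nuisance, rotation - nuisance])

error_first_mode = fit_error(modes[:, :1], rotation)   # first mode only
error_both_modes = fit_error(modes, rotation)          # both modes combined
```

In this construction the first mode alone cannot separate the rotation from the nuisance motion, so its fit leaves a substantial error, whereas the two modes together determine the rotation exactly as half their sum.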

