Modelling Optic Flow Analysis


1.2.3 Modelling Optic Flow Analysis

While the foregoing algorithms provide an account of local motion detection, they do not serve as general models; they are restricted to one direction of motion, and are confounded by the aperture problem. The aperture problem refers to the inability of a cell restricted in the spatial extent of its analysis to signal more than one direction of motion. A stimulus extending beyond this cell’s receptive field (rf) may move obliquely through the rf, so speed and direction are confounded in the cell’s output. To account for the perception of motion in two dimensions many have proposed that local motion signals are ‘pooled’. Combining the outputs of local movement detectors gives the ability to register the type of complicated wide-field patterns encountered when moving through a scene comprising static as well as dynamic objects. The optic flow that such relative motion generates has been the subject of much theoretical and practical modelling. This interest reflects the important role that may be played by optic flow in visual navigation (Rieger & Toet, 1985; Warren & Hannon, 1990), detecting object boundaries (Hildreth, 1992), coding the movement of independent objects in the field (Zemel & Sejnowski, 1998), and generating opto-kinetic eye-movements and smooth pursuit (Dursteler and Wurtz, 1988; Komatsu and Wurtz, 1988; Erickson and Dow, 1989; Kawano et al., 1994). Models of optic flow processing are most often identified with the extra-striate medial superior temporal area (MST) in primates, where large receptive field sizes and sensitivity to complex, optic flow-like stimuli are found (e.g. Saito et al., 1986; Tanaka et al., 1986; Lagae et al., 1994).
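The aperture problem described above can be illustrated with a minimal sketch. It assumes an idealised local detector that sees only a straight edge and can therefore register only the velocity component perpendicular to that edge. The function name `normal_component` and the example velocities are illustrative assumptions, not part of any model cited here.

```python
import math

def normal_component(velocity, edge_angle_deg):
    """Return the part of `velocity` perpendicular to a straight edge.

    Assumption: a detector with a small receptive field sees only this
    component, because motion along the edge causes no local change.
    """
    theta = math.radians(edge_angle_deg)
    # Unit vector perpendicular to an edge oriented at angle theta.
    nx, ny = -math.sin(theta), math.cos(theta)
    speed_along_normal = velocity[0] * nx + velocity[1] * ny
    return (speed_along_normal * nx, speed_along_normal * ny)

# A vertical edge (90 degrees) moving with two different true velocities.
seen_a = normal_component((1.0, 0.0), 90.0)  # pure rightward motion
seen_b = normal_component((1.0, 2.0), 90.0)  # oblique, faster motion
```

Under these assumptions both stimuli yield approximately the same local signal, which is why the output of a single detector cannot separate speed from direction, and why pooling across detectors with differently oriented inputs is proposed.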

In a recent review, Perrone (2001) suggested a distinction between biologically-inspired ‘template models’ of optic flow processing and the more abstract mathematical ‘vector models’. The template class of models is so named for its motion matching architecture. In order to register a complex pattern of motion a hypothetical detector receives signals from multiple local velocity detectors, specifically organised in their spatial distribution to match a typical or commonly occurring pattern. Different patterns demand different organisational structures, but each is a simple prototype awaiting an appropriate trigger. Such a technique has the advantage of being able to side-step complex tasks (Glunder, 1990). For instance, computing speed could be achieved by estimating distance travelled and time taken, followed by the appropriate division: distance/time. However, a template model responds when its spatial and temporal parameters are met, without intermediate representations of distance and time being computed.
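As a rough illustration of the template idea, the sketch below builds two hypothetical prototype patterns (expansion and rotation) on a small grid and scores an input flow field by summing the match between each local velocity and the template direction at that location. The grid size, the dot-product matching rule, and the names `make_template` and `template_response` are assumptions made for this example; they are not taken from Perrone (2001) or any other cited model.

```python
import math

def make_template(kind, size=5):
    """Unit direction vectors of a prototype flow pattern on a size x size grid."""
    half = size // 2
    template = {}
    for y in range(-half, half + 1):
        for x in range(-half, half + 1):
            r = math.hypot(x, y)
            if r == 0:
                continue  # direction is undefined at the pattern centre
            if kind == "expansion":
                template[(x, y)] = (x / r, y / r)    # pointing away from centre
            elif kind == "rotation":
                template[(x, y)] = (-y / r, x / r)   # tangential, anticlockwise
            else:
                raise ValueError(f"unknown template kind: {kind}")
    return template

def template_response(template, flow):
    """Mean dot product between local flow vectors and template directions."""
    total = 0.0
    for position, (tx, ty) in template.items():
        fx, fy = flow.get(position, (0.0, 0.0))
        total += tx * fx + ty * fy
    return total / len(template)

# Input: a purely expanding flow field centred on the grid.
expanding_flow = {(x, y): (0.5 * x, 0.5 * y)
                  for x in range(-2, 3) for y in range(-2, 3)}

expansion_response = template_response(make_template("expansion"), expanding_flow)
rotation_response = template_response(make_template("rotation"), expanding_flow)
```

In this sketch the expansion template responds strongly to the expanding input while the rotation template stays near zero, and neither computes an explicit representation of distance or time.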

Vector models make use of the observation that a flow field can be regarded as a set of basic flow components (Koenderink & van Doorn, 1975). This mathematical fact has led to many studies investigating how optic flow patterns could be decomposed into elementary constituents for use in the types of tasks that mobile organisms need to perform. However, the attempt to retain biological plausibility has been reflected in the approach to one testing problem. In an organism with the ability to move the eyes and head, as well as the body, retinal signals confound all three of these sources of motion. Vector based approaches generally seek to use just the retinal motion information to unconfound these signals (Rieger & Lawton, 1985; Koenderink & van Doorn, 1981; Heeger & Jepson, 1992), proving that such signals are sufficiently informative. However, psychophysical evidence (e.g. Royden, Crowell & Banks, 1994) suggests that extra-retinal signals may be used in determining heading direction accurately by accounting for self-generated sources of retinal motion such as eye rotation, and template modellers have recently begun to incorporate extra-retinal signals to assist in segregating the various signals (Beintema, 1998).

Physically instantiated models can be trained to make use of simulated optic flow in performing a task (Hatsopoulos & Warren, 1991; Lappe et al., 1996). For example, Lappe et al. (1996) trained a neural network to perform heading identification when presented with a variety of realistic optic flow patterns. Once trained, such models can be examined to see what kind of representations have been encoded in learning to make use of the information available. Many authors claim that synthetic neurone receptive field properties mimic those of real neurones found in electro-physiological investigations of animal models. Examples include the relative frequency of units selective for expansion, contraction, rotation and translation (model: Zemel & Sejnowski, 1998; physiology: Graziano et al., 1994); the changing response of individual neurones, depending on the position of the focus of expansion (FOE) (model: Lappe et al., 1996; physiology: Duffy & Wurtz, 1995); the occurrence of position invariant responses in neurones (insensitivity to the position of the FOE) (model: Zemel & Sejnowski, 1998; physiology: Graziano et al., 1994); and lack of immunity to superpositions of non-preferred and preferred stimuli (model: Perrone & Stone, 1998; physiology: Orban et al., 1992). This last point bears on a concept common to both vector and template models: that of decomposition. It is often presumed that an efficient way to deal with the complexity of multiple motion vectors is to decompose them into orthogonal constituents, analogous to the way that the statistical technique Principal Components Analysis decomposes complex data sets. In template models this decomposition is often referred to as a basis set, or canonical representation, and the templates responsible for achieving it are usually radial, rotation, deformation and translation detectors. These detectors have been linked to the theoretical components identified by Koenderink & van Doorn (1975), namely div, the rate of expansion, and curl, the rate of rotation. A further category, def, the rate of deformation, has received a little physiological support from Orban et al. (1992), and some behavioural support from Meese and Harris (2001a, 2001b). Neurones whose response selectivities are immune to contamination by multiple components (e.g. rotation and radiation combined) would be consistent with decomposition to a basis set. However, physiological studies have recorded many departures from this ‘ideal’ representation. Receptive field selectivities that form a continuum of responses from radial through spiral to rotation patterns have been found (Duffy & Wurtz, 1991), as have rfs responsive to simultaneously presented combinations of canonical components (Graziano et al., 1994). There is also some behavioural evidence against the decomposition hypothesis. Duffy & Wurtz (1993) and Grigo & Lappe (1998) found that combining radial expansion with translation leads to a misperception of the centre of expansion, implying lack of the ability to separate the components effectively. This effect was modelled by Lappe & Duffy (1999), showing that a population of neurones was capable of emulating the behavioural evidence without resorting to basis set selectivities. The emerging view is that MST (and other extra-striate areas) form a complex set of representations of patterns of optic flow, depending on their utility and their frequency of occurrence (Lappe et al., 1996; Irvins et al., 1999).
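The div, curl and def components mentioned above can be made concrete with a small numerical sketch. It assumes the standard first-order definitions for a two-dimensional flow field (u, v): div = ∂u/∂x + ∂v/∂y, curl = ∂v/∂x − ∂u/∂y, and a deformation magnitude formed from the two shear terms ∂u/∂x − ∂v/∂y and ∂u/∂y + ∂v/∂x. The function names, the finite-difference step, and the example spiral field are illustrative assumptions rather than anything specified in the text.

```python
import math

def flow_components(flow, x=0.0, y=0.0, h=1e-3):
    """Estimate div, curl and deformation magnitude of `flow` at (x, y).

    `flow` maps a position (x, y) to a velocity (u, v). The partial
    derivatives are approximated with central finite differences.
    """
    right, left = flow(x + h, y), flow(x - h, y)
    up, down = flow(x, y + h), flow(x, y - h)
    du_dx = (right[0] - left[0]) / (2 * h)
    dv_dx = (right[1] - left[1]) / (2 * h)
    du_dy = (up[0] - down[0]) / (2 * h)
    dv_dy = (up[1] - down[1]) / (2 * h)

    div = du_dx + dv_dy                                  # rate of expansion
    curl = dv_dx - du_dy                                 # rate of rotation
    deformation = math.hypot(du_dx - dv_dy, du_dy + dv_dx)  # rate of shear
    return div, curl, deformation

def spiral_flow(x, y):
    """Hypothetical spiral pattern: expansion and rotation superimposed."""
    return (0.3 * x - 0.5 * y, 0.5 * x + 0.3 * y)

div, curl, deformation = flow_components(spiral_flow)
```

For this hypothetical spiral field the sketch recovers a non-zero div and curl together with a deformation of approximately zero, which is the kind of separation into canonical components that the decomposition hypothesis presumes and that the physiological data cited above call into question.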
