
Abstract

The value of neural network analysis as a tool in transportation planning is tested on two transportation issues, one behavioural and the other physical. The behavioural case involved travel behaviour forecasting, comparing travel demand patterns of men and women in Israel, to determine the connection between them and a variety of key socio-economic and demographic variables. The physical case addressed a problem of traffic control at a major intersection in Ramat Aviv, Israel. Both involve the kind of complex, highly dimensional, large scale data to which the methodology has been applied successfully in the physical sciences and in cognitive modelling, but is relatively new in the social sciences. Neural network methodology is defined, a variety of application categories presented to explain the rationale for the ones chosen, and both experiments detailed. The neural network results were more impressive for the physical application than for the behavioural study because of deficiencies in the data for the latter. Recommendations are made for the kind of information that would enrich the data and permit neural networks to be utilized to full advantage.


Memorial Fellowship and the Technion's Graduate School, and carried out under the supervision of Daniel Shefer and Ilan Salomon. I am grateful to both of them for their guidance, comments and advice. Finally, I thank Derek Diamond for his encouragement and patience.


handling these complex and highly dimensional data. Neural network methodology provides a tool through which to model intuitive judgments without the complication of having to formalize all of the complex causal variables and relationships which other models require.

Neural network analysis has gained momentum and acceptance over the past few years, particularly in such areas as image compression and recognition, cognitive modelling, expert systems, natural language and handwriting recognition, as well as physical aspects of traffic engineering, and is being continually refined. In the geographical sciences (Hewitson and Crane, 1994), neural networks applied to census analysis, predicting the spread of AIDS, describing synoptic controls on mountain snowfall, examining the relationships between atmospheric circulation and tropical rainfall, and the remote sensing of polar cloud and sea ice characteristics have been shown to perform as well as or better than more conventional methods such as multiple-regression analysis, cluster analysis and maximum-likelihood classification.

We have applied neural network methodologies to both behavioural factors and physical traffic planning issues in order to explore their value to both areas of planning analysis, to assess their potential as a transportation planning instrument and as a tool with application to the broad spectrum of spatio-temporal relationships in the world around us.

1.1. DEFINITION OF NEURAL NETWORK METHODOLOGY

Neural networks stem from the desire of researchers to program computers to mimic the brain's abilities, albeit to a small degree. The premise is that, if a computer is to function like a person, it must be developed like a brain, which distributes information across a vast, interconnected web of nerve cells, or neurons.

Neural networks are a different computing model, and solving problems with neural networks is quite similar to the way people naturally solve problems. A neural network learns to solve problems by being given data, examples of the problem, and its solution. This is the way we learn all the time! A neural network is composed of units which correspond to neurons, and weights which correspond to the strength of the connection (synapse) between neurons. It is a computational device that has a built-in learning capability.

The usual definition of a function is as a mapping device that assigns to each input exactly one output. Suppose we are given a set of inputs and corresponding outputs and then asked to provide the appropriate function. In many cases, we do not know the exact function, but we do know the function's general format.

A neural network is such a mapping device, which assigns to each input exactly one output (both input and output may be vectors of values). The function is determined by the network's weights, which are set while training the network. A neural network is trained, roughly, as follows. The network is shown a set of examples, each consisting of inputs and outputs. It "learns" the connections among them by assigning weights to connections. This is done by continuously changing weights to get closer to the desired outputs. In other words, weights are produced so as to define a function that approximates the actual function, which is implicitly given by the noisy set of examples (noise is a certain random disturbance that is added to the true value of the data and distorts it). The underlying assumption is that the examples on which the network learns are representative of the behaviour of the function in all regions of interest. Various aspects of the methodology will be described in detail in Chapters 3 and 4.
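The training loop described above can be illustrated with a minimal sketch. It is not the network used in this study: it assumes a single linear unit with one input, a squared-error criterion, and a made-up target function y = 2x + 1 observed with small random noise. The names `train_linear_unit`, `learning_rate` and `epochs` are hypothetical and chosen only for this illustration.

```python
import random


def train_linear_unit(examples, learning_rate=0.05, epochs=500):
    """Fit y = w*x + b by repeatedly nudging the weights towards the targets."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in examples:
            # Difference between the unit's current output and the desired output.
            error = (w * x + b) - target
            # Gradient-descent step on the squared error for this example.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


random.seed(0)
# Noisy examples of an assumed "true" function y = 2x + 1 on the interval [-1, 1].
examples = [(i / 10, 2 * (i / 10) + 1 + random.gauss(0, 0.05)) for i in range(-10, 11)]
w, b = train_linear_unit(examples)
print(f"learned weights: w={w:.2f}, b={b:.2f}")
```

The learned weights should land close to the assumed values 2 and 1, which is the sense in which the weights come to define a function approximating the one implicit in the noisy examples.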

1.2. IMPERATIVES OF TRANSPORTATION PLANNING

Transportation planning is a hybrid which must balance such physical components as road systems, environment, open space, and the location of industry, commerce and residences, with the behaviour of present and potential users.

Large scale mathematical modelling in transportation planning began approximately 40 years ago, has undergone considerable change since its inception and continues to evolve. The transportation planning process predominant during the 1950s and 1960s closely resembles Bolan's "classical" (rational) planning model (1967). Transportation master plans were developed and used as frameworks within which specific decisions were made in a rational manner. The emphasis was on long-range forecasts of the performance of region-wide systems; the transportation plan served (and serves) as the backbone of all urban master plans. Alternatively, transportation and urban master plans serve joint, interactive functions in the overall planning process.


Transportation planning had previously been considered a technical process, using quantitative tools, still oriented almost exclusively towards long-term, capital-intensive expansions of the transportation system, such as highways (Pas, 1986), to arrive at objective solutions to technical problems. In the late 1960s and 1970s, the process "opened" itself to citizen participation, with the transportation planners involving the community, and their analyses including issues of immediate consequence (needs of the elderly, handicapped, poor, the environment). The methodological tools had to be able to deal with a much wider range of options related to "policy" issues, not just physical facility options, and to produce both short and long range forecasts. Additionally, the requirement of assessing the transportation impacts on specific population groups had to be met.

Thus, forecasts, in the transportation planning context, are produced for two main purposes. On the one hand, there is a need to assess the total level of demand, at a network or link level. On the other hand, there is a need to assess the demand at a more detailed, disaggregated scale, in order to evaluate particular policies or plans. The field of transportation forecasting has evolved in two streams: large scale aggregate models that essentially serve the former need, and disaggregate, behavioural models in response to the latter need.

Aggregate models were developed in which the response variable describes the average or total behaviour of households or people. The basic unit of analysis is zones, not individuals. Zonal data come already aggregated on the basis of one variable only: spatial proximity. The models are not based on behavioural relationships and usually fail to predict correctly the impact of demographic and societal changes on travel behaviour. These models sometimes provide an accurate prediction of what travel patterns will be like (for example, after a highway is built), but cannot tell us why the change occurs. Therefore, aggregate models may be evaluated on their ability to give empirically accurate forecasts, but they do not explain what happens. They do, however, help to understand the complexities of travel flows. Categories such as trip purposes, the temporal distribution of travel flows, the selection of travel mode, trip lengths and spatial flows (Barber, 1986) are traditionally used in models to predict travel behaviour at the aggregate level, where the level of aggregation used depends very much on the purpose of the model being built and the data collected. Accordingly, since the 1970s there has been a reversal of priorities in criteria for assessing models from their forecasting capability to their explanatory power (Hanson and Burnett, 1981).

Disaggregate choice models take individuals or households, rather than zones, as the unit of analysis. Disaggregate travel flows are made up of the daily travel activity patterns of individuals (or households), which are continuous in time and space (Hanson and Schwab, 1986). The final statements of these models are also about groups, rather than individuals, yet disaggregate models permit a great deal of flexibility in methods of aggregation.


Many disaggregate models are conceptually based on utility maximization, as are consumer choice models. Yet, in the literature the compatibility between the two is often uncertain, leaving open questions regarding the theoretical basis of these models. Even if we accept the soundness of rationality as a continuing paradigm of human behaviour, as argued by Daly (1981, p. 61), disaggregate models still have the following disadvantages for applied research:

. They are complicated when they try to specify all the causal variables and relationships that might explain observed behaviour.

. Empirical application requires detailed individual level data (Sheppard, 1986).

. There are inaccuracies in the relationship of the models to the actual choice of travellers; often the model outcome differs from reality. Definitions of independent mode, destination, and other choices ignore the interdependence of the choices themselves and that of the behaviour of different members of the same family (Daly, 1981).

. There remain problems in the estimation of model parameters. The objectives of theory and application of models are not always the same. Models are judged by their predictive success; theorists lay more emphasis on simplicity and generality. Practical models often contain many fitted parameters that have no clear behavioural interpretation (Pipkin, 1986).

Ideally, one would prefer to use a consistent set of models which would respond to both forecasting needs: assessing the total level of demand at a network or link level, and assessing the demand at a more detailed, disaggregated scale in order to evaluate particular policies or plans. However, present capabilities still fall short of offering a unified model. What we are left with is the challenge of developing new ideas which attempt to bridge the relative simplicity of aggregate approaches and the theoretical advantage, but greater complexity, of disaggregate approaches.

1.3. RATIONALE FOR THE APPLICATION OF NEURAL NETWORK METHODOLOGY TO TRANSPORTATION PLANNING, BOTH BEHAVIOURAL AND PHYSICAL

Lately, there has been much interest in new techniques for analyzing data. Neural networks are not a panacea for all research problems. However, the approach offers a new strategy with enormous potential for many tasks in the planning, geographical and spatial sciences. As a social-science tool, neural network analysis is still in its infancy. Here we explore the application of neural networks to both travel behaviour forecasting and to physical traffic control.

How then do neural networks address the shortcomings of disaggregate models as listed above?

. In a neural network, the relationships between variables are discovered automatically and the fitting takes place naturally. The overall network structure is the only place that our intuition comes into play.

. Neural networks do not have the problem of inaccuracies in the relationship of the models to the actual choice of travellers. The network operates on the data directly without the medium of an additional model.

. Like disaggregate models, neural networks suffer from explanatory problems in that there is a difficulty in interpreting the weights at this time. On the other hand, the fitting of parameters in the neural network is mathematically well founded (gradient descent in error space).

Neural networks are highly relevant to problems requiring large scale, highly dimensional, data analysis, such as travel related planning. Given the shortcomings of existing methodologies, the exploration of an additional one to augment the more conventional tools seems important. Neural networks provide a data analysis tool through which we can model our intuition, without the complications of having to formalize all the complex causal variables and relationships which other models require.


Transportation planning issues are often very complex, involving large quantities of highly dimensional data. On a general level, issues can be fitted into two categories: behavioural transportation and physical transportation challenges. Prediction of travel behaviour is a necessary preliminary to physical transportation planning. The needs of the two categories inherently differ.

Behavioural models try to deduce observed phenomena at a disaggregated or individual level. The population attributes chosen are usually those which are easily accessible by means of simple research instruments such as questionnaires, and which serve as acceptable proxies to the population characteristics assumed to be correlates of travel behaviour. Thus, it is often necessary to obtain very large data sets, capable of capturing the differences among the population segments. Even so, the chances of all the relevant information being available in the currently accessible transportation databases are questionable. Indeed, a prime research question which we pose is whether or not the relevant information can be found in the available databases.

This first application, undertaken in the early 1990s, examines the key socio-economic and demographic (SED) variables believed by transportation researchers to have a bearing on specific travel related behaviour, tests these hypotheses, and explores travel pattern differences between men and women in Israel. Analysis of travel behaviour issues is usually required for large population groups which are characterized by a wide array of attributes. However, it is hypothesized that travel behaviour is, to a large extent, determined by constructs, such as life style, roles and life cycle factors that are more complex than simple SED variables found in transportation surveys (Salomon and Ben-Akiva, 1983; Townsend, 1987). We use the hidden layer of the neural networks to model our intuition and to capture these more complicated constructs. Structures offered by neural networks allow us to define coarse proxies for the composite determinants which we wish to test. One important result of this inquiry is that the functions whose existence we assumed probably do not exist, and the implication is that we must look to expanding and improving the data on which we depend to make transportation planning decisions.

The intent of this study was to demonstrate the strengths and weaknesses of neural network modelling on the kind of data usually available to transportation planners. For the behaviour application in the Israeli context, this means within the framework of the 1984 Traveling Habits Survey (Central Bureau of Statistics, 1987). This is the most recent travel survey commissioned and financed by the Ministry of Transport. The survey's content and format are similar to the prevalent American and European surveys currently intended to serve transportation planners. While out-of-date for practical current planning, it serves the purpose as a theoretical example of applying neural network analysis.

Physical transportation issues are more aggregated by nature (for example, we may look at 1000 cars, but know nothing about their drivers). Although the questions remain very large scale and complex, the chances of getting the relevant data from existing databases (simulation packages or direct observation) are greater. This is the case in the second application, completed in 1997, which studies short-term road traffic forecasting. Knowledge of traffic in the near future may enable operators to improve and adapt the service which is provided to road users. The forecasting was undertaken for a major intersection in the Ramat Aviv neighbourhood in Israel, a densely populated upper-income suburb of Tel Aviv and the location of Tel Aviv University. Here the neural network models result in a significant improvement in the ability to predict traffic conditions at the beginning of the "next" time period and to determine the optimal traffic light program suited for the expected future conditions.

The short-term traffic forecasting application uses the network simulation program, NETSIM (FHWA, 1992), for generating data.

So we apply neural network models to two very different types of transportation planning problems, one macro and the other micro, in order to examine the range of relevancy which this type of tool holds for transportation planning in general. In the first behavioural application, we use the neural networks to discover whether or not there is a connection between the SED variables and the travel activities reported, and to add quantifiable evidence to the hypothesis that transportation planning requires gender differentiation. In the second physical transportation application we use the neural networks to predict future road conditions based on the previous ones.
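Predicting future road conditions from previous ones can be framed as a supervised learning problem in which each training example pairs a short history of observations with the observation that followed it. The sketch below is only an illustration of that framing, under the assumption of a single measured quantity and a fixed history length; the function name `make_training_pairs`, the window of three periods and the queue-length values are all hypothetical and are not taken from the study.

```python
def make_training_pairs(series, window=3):
    """Pair each run of `window` consecutive observations with the next one."""
    pairs = []
    for start in range(len(series) - window):
        history = series[start:start + window]   # network inputs
        next_value = series[start + window]      # desired network output
        pairs.append((history, next_value))
    return pairs


# Made-up queue lengths (vehicles) at one link over six consecutive time periods.
queue_lengths = [4, 6, 9, 7, 5, 8]
pairs = make_training_pairs(queue_lengths)
for history, next_value in pairs:
    print(history, "->", next_value)
```

Each pair can then be presented to a network as one input vector and one target value.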

2.2. BEHAVIOURAL TRAVEL ANALYSIS: GENDER DIFFERENCES IN TRAVEL ACTIVITY PATTERNS


departure time to work; and departure time to non-work activities for males and females. We used the neural network model to determine whether there is a connection (a function which describes these relationships) between these SED variables found in the data, and the travel activities (work, leisure, maintenance and total number of trips reported). One focus of the analysis was on the differing patterns for men and women.

2.2.1. The motivation

Israeli society, a mirror of Western culture, has evolved into a workforce that employs women in almost equal numbers with men. While the range of jobs held by women is still narrower than that of men, the rapid change in the Israeli economy from agriculture/manufacturing to tertiary services has provided women with much broader opportunities to work side-by-side with men. Tertiary industries are, by their locational nature, more uniformly located in accordance with the general population distribution, although subject to increasing forces for concentration and agglomeration. This dispersal has encouraged the entrance of women into the broader workforce by facilitating access from home to work. Women continue to have primary responsibility for child rearing and home maintenance, making rapid and easy commuting to the workplace a major factor in job choice.

The development, albeit slow, of Israel's highway and road network supports this trend, as the automotive age makes suburbanization possible. In Israel, the rate of motorization has increased fivefold since 1967 from 50 cars per 1000 population to over 260 per 1000 by the end of 1995, with over 1,459,000 vehicles now registered for a population of 5.6 million (Central Bureau of Statistics, 1996).

Women have entered the workforce from a combination of motivation and need. The women's revolution, while not as advanced in Israel as in some Western countries, has nevertheless encouraged women to take on a wider variety of jobs as they seek social equality and self sufficiency. This has been reinforced by equal access to education, including higher education.

At the same time, economic pressures have driven women to expand their employment roles. The process of Westernization and "Americanization" has placed enormous strains on the household budget, and "keeping up with the Joneses," "the credit card economy," media advertising and international travel, have placed a high premium on consumerism. Meanwhile, the slowing down of the rate of gross national product increase, and Israel's high tax rate have made it highly problematic for the male wage earner alone to support the family in the style to which it aspires. The increasing need of the female worker, in particular, to maximize her time-space budget, puts pressure on agglomerating many of the different land uses.


separated in space and time, yet greater mobility can also be seen as contributing to the increased separation and form of land uses.

One important way to understand the spatial structure of a city, or metropolitan area and how it is changing, is to understand movement or activity patterns. Urban travel patterns and behaviour are key elements within a much broader nexus of built form and human activity patterns, which we call metropolitan areas. Thus, transportation both affects and is affected by the planning of a city, so that a planner's understanding of this relationship is extremely important in order to intervene effectively in terms of shaping policy, implementation and control.

Differences in travel demand and in patterns of travel were analyzed in the context of the changes in society's socio-economic and demographic makeup, and accordingly, the neural network inputs and hidden unit concepts were chosen.

The two basic tenets of the research are that (1) there is a connection between these socio-economic and demographic variables and the travel activities (work, leisure, maintenance and total number of trips reported), and (2) that transportation planning requires gender differentiation.

2.2.2. The data

The Israeli Traveling Habits Survey for 1984 (Central Bureau of Statistics, 1987), combined with data of the 1983 Israeli Population and Housing Census, is still the most recent database for travel planning purposes*. The objective of this 1984 survey was to examine the changes in travelling habits in order to bring up to date the data and the models that are based on a previous survey that relates to 1972/73. The more recent survey looks into the activities and their purposes, while travelling is considered as a means to carry out these activities.

"Activity" is defined as the movement of a person from origin to destination in order to carry out a certain purpose, e.g. work, study, shopping, entertainment, etc. A person's decision as to whether to walk or to travel by other means in order to carry out a certain activity depends on several variables, such as the purpose of the activity, the distance between origin and destination, the existence of a vehicle at the disposal of the household, an orderly public transportation system, etc. Therefore, this survey puts an emphasis on the investigation of the activities and their connections with the characteristics of the individual, the household and the geographical area, as a reflection of the population needs, while the journeys and their characteristics serve as a means to fulfil these activities.

The population of the 1984 Survey on Travelling Habits (Central Bureau of Statistics, 1987) includes everyone aged 8 years and over, and who is a permanent resident in Israel, an Israeli resident staying abroad for a period less than 2 months, or tourists staying in Israel over a year. The number of household members also includes those aged 7 years and less, although their activities were not examined in the survey.


The survey itself was performed in three geographical areas: Jerusalem and environs, the extended Tel Aviv conurbation, and the Haifa conurbation.

The 1983 Population and Housing Census examined a sample of 20% of the overall household population with regard to a variety of socio-economic topics, in addition to the demographic data which were gathered from the entire population. The in-depth 20% census sample served as the major framework for this survey.

The Tel Aviv data was examined, letting the network learn on various parts of the data records, and testing it on the rest. The Tel Aviv conurbation included a sample of 1803 dwellings in Bat-Yam, Beer Yaacov, Bene Beraq, Even Yehuda, Givat Shmuel, Givatayim, Herzeliya, Hod Hasharon, Holon, Kefar Sava, Lod, Nes Ziyyona, Netanya, Or Yehuda, Petah Tiqwa, Qiryat Eqron, Qiryat Ono, Raanana, Ramat Ephal, Ramat Gan, Ramat Hasharon, Ramla, Rehovot, Rishon LeZiyon, Rosh Haayin, Tel Aviv–Jaffa, Yehud (see Fig. 1).
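Letting the network learn on part of the records and testing it on the rest amounts to partitioning the data into a training set and a held-out set. The sketch below shows one such partition. The 80/20 proportion, the fixed random seed and the function name `split_records` are assumptions made for illustration only, since the proportions used in the study are not stated here; the integer identifiers stand in for the 1803 sampled dwellings.

```python
import random


def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for the sampled dwellings.
records = list(range(1803))
train, test = split_records(records)
print(len(train), "records for learning,", len(test), "held out for testing")
```

Repeating the split with different seeds gives different learning and testing parts, which is one way to read "various parts of the data records".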

The survey was performed over a 6 month period. The interviews were started in May 1984 and were terminated in January 1985, with a break during July and August 1984.

The enumerator worked on Mondays through Thursdays from 16.00 to 21.00. The survey examined 2 days' activities of every person included in the sample: the day of the enumerator's visit ("today") and the day preceding the visit ("yesterday"). The data that were actually gathered refer to the activities of the interviewees on Mondays through Thursdays, not including Fridays, Saturdays, eves of holidays and holidays. Given the evidence that there is considerable variation in daily travel activity patterns of the individual, it is hypothesized that there is less variation in the multi-day travel activity behaviour of individuals than in the daily behaviour (Huff and Hanson, 1986; Koppelman and Pas, 1984). Thus, the examination of the patterns across the 2 days helps us analyze the linkages in patterns and brings us closer to looking at a more consistent multi-day pattern than the more random daily pattern.

Data was selected from the survey and fed into the neural networks as inputs or outputs. The following socio-economic and demographic characteristics comprised the inputs:

From the household record:*

. number of persons per household

. number of children aged 15+

. number of children aged 8+

. number of children aged 5–7

. number of children aged 0–4

. number of persons employed per household

. number of vehicles per household

. net income per household

From the person record for both husband and wife:


. age

. sex (redundant but was kept for verification reasons)

. number of years of schooling (0, 1–8, 9–12, 13+ years)

. employed?

. driver's license?

. economic branch:


agriculture, forestry, fishing
industry
electricity and water
construction
commerce, restaurants, hotels
transport, storage, communications
financing and business services
public and community services
personal and other services
unknown

. occupation:

scientific and academic workers
other professionals and technicians
administrators and managers
clerical and related workers
sales workers
service workers
agricultural workers
skilled workers
unskilled workers
not known

. net income

. hour of departure for work

. hour of departure for home from work (from activity record)

. mode of transportation to work:

none, worked at home
on foot, by bike
bus, one line
bus, more than one line
fixed route taxi ("sherut")
carpool
private vehicle as driver
private vehicle as passenger
motorcycle
other


2.3. PHYSICAL TRAVEL ANALYSIS: TRAFFIC CONTROL IN URBAN AREAS

2.3.1. The need

The growth of congestion in urban networks and the consequent constraints imposed on mobility have made it vital to manage and utilize the existing infrastructure more efficiently. The relationships among traffic flow variables play important roles in traffic engineering. Existing models which try to capture these relationships involve various mathematical formulations that describe, for example, the relationships between density, flow and speed. The best mathematical curve is determined by trying several different formulas and applying regression analyses. This involves advance specification of which mathematical formula should be adopted and where it should be shifted to another (in the case of multi-regime models) (Nakatsuji et al., 1995).
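As an illustration of what specifying a formula in advance and fitting it by regression involves, the sketch below fits one well-known single-regime form, the linear speed-density relation of Greenshields, to a handful of observations by ordinary least squares, and then uses the identity flow = density × speed. The choice of this particular formula is an assumption made for the example and is not taken from the models reviewed here; the data values are made up and exactly linear, and the name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations: density in vehicles/km, speed in km/h.
densities = [10, 30, 50, 70, 90]
speeds = [90, 70, 50, 30, 10]

slope, intercept = fit_line(densities, speeds)
free_flow_speed = intercept            # fitted speed at zero density
jam_density = -intercept / slope       # density at which fitted speed reaches zero
flow_at_50 = 50 * (intercept + slope * 50)  # flow = density * speed, in vehicles/h
print(free_flow_speed, jam_density, flow_at_50)
```

The analyst must commit to the linear form before fitting; a different assumed formula, or a multi-regime model, would require a different specification, which is the burden the paragraph above describes.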

The problems involved are complex in terms of both space and time. Spatially, how do we recognize the state of a particular road system (e.g. that certain links are congested, or that link occupancy or queue lengths have risen above certain thresholds), and temporally, how do we accomplish accurate short term forecasting? Ideally, a control system should anticipate the congestion that arises and plan its actions accordingly, which requires that predictions of what conditions are likely to be in a few minutes' time be both feasible and reliable.

Data from different links at different times may all be relevant, and developing individual algorithms for each link is slow and difficult. Likewise, an algorithm developed for a particular link is not likely to be transferable, and developing general methods of analysis is also quite complex (Dougherty et al., 1993). Neural networks differ from other models in that the predictions are data-defined. Rather than a computer struggling with instructions written by a programmer, the neural network deduces the strengths of different relationships by being exposed to a set of examples of the behaviour concerned; the network itself develops the relations.

One of the most prevalent procedures available for controlling and managing traffic is that of monitoring traffic-signal timing programs. A traffic-signal (traffic light) timing program is a set of parameters (e.g., cycle length, green split, offsets) that control the right of way and that assign priorities among the links of the network. These programs determine the level of service of each link in the transportation network.
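A timing program of this kind can be represented as a small record of parameters. The sketch below is a hypothetical representation for illustration only: the class name `TimingProgram`, its field names, the phase names and the numerical values are all assumptions, and the calculation ignores amber and all-red intervals.

```python
from dataclasses import dataclass


@dataclass
class TimingProgram:
    """Hypothetical container for the parameters of one signal timing program."""
    cycle_length_s: float   # duration of one full signal cycle, in seconds
    green_split: dict       # phase name -> fraction of the cycle given green
    offset_s: float         # start of the cycle relative to a reference signal

    def green_time(self, phase):
        """Green duration for a phase, ignoring amber and all-red intervals."""
        return self.cycle_length_s * self.green_split[phase]


# Made-up example values for a two-phase intersection.
program = TimingProgram(
    cycle_length_s=90.0,
    green_split={"north_south": 0.6, "east_west": 0.4},
    offset_s=15.0,
)
print(program.green_time("north_south"), program.green_time("east_west"))
```

Assigning a larger share of the cycle to one phase is how such a program gives priority to some links over others.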

Research studies have tried to find the optimal signal-timing program for a group of intersections during peak hours (Singh and Tamura, 1974; Michalopoulos and Stephanopoulos, 1979; Lieberman, 1990). These programs were prepared off-line on the basis of average volumes at the various links of the network. All of these attempts were based on simplifying assumptions regarding the traffic flow characteristics, due to the complexity of the traffic flow behaviour in the network. Moreover, their performance is liable to be worse than expected because of the fluctuations in actual volumes compared to average ones.


1. Objectives: The problem has several objectives that should be met simultaneously, e.g. minimization of delay, length of queues, number of stops, energy consumption and environmental impacts. Some of these objectives contradict one another.

2. Dependency: In an urban network the output of one intersection is the input of another, and the queue at one intersection can block another intersection. As a result, an intersection cannot be treated individually and must be coordinated with its environment. Consequently, the number of variables that have to be calculated simultaneously (e.g. cycle time, green splits and offsets) becomes very large.

3. Parameter values: The large number of parameters describing the network (e.g. number of vehicle trips, percentage of stops, speed, occupancy, storage and more) and the uncertainty regarding their values make it necessary to consider a wide range of parameter values.

4. Mathematical model: The relationships among the various parameters, variables and objectives are of a complicated nature; attempts to formulate them into one mathematical model end in inadequate results.

It has become apparent that the large variation in daily traffic demand and the routine occurrence of random events make off-line signal timing-program planning insufficient. Thus, efforts have been aimed at developing methods for on-line traffic control. Some methods are responsive ones (Robertson and Hunt, 1982; Khoudour et al., 1991; Mauro, 1991), meaning that they are primarily designed to respond to fluctuations in traffic volumes without external intervention. Other strategies are based on the idea of centralized control. The core of these strategies is the selection of a set of timing programs for the entire network, based on real-time collected data concerning traffic flow conditions (Gal-Tsur et al., 1994). Both methods have limitations: the timing-program selection strategies occasionally cause long queues in the system due to drastic changes in intersection coordination; the main fault of the traffic responsive methods is their tendency to converge to a local optimum, due to computation constraints.

The idea of on-line signal-timing planning on the basis of broad information regarding current traffic flow conditions is very appealing. As the traffic flow process is mathematically difficult to formulate, such a planning algorithm should use past experience as an important source of information. In order to use this information properly, it should be well formulated and processed. The simulated case study analysis described here was designed to indicate whether neural networks provide an appropriate method for this task.

In order to use neural networks for improving traffic control decisions, the events in the transportation networks must be well defined. Three types of attributes are involved:

1. Traffic flow descriptors: These attributes describe the traffic-flow conditions in the network. They include attributes such as ``queue length at link I'' or ``volume


2. Timing program descriptors: The second type of attributes describes the timing program implemented in the network, like ``cycle length'' or ``green duration of phase k''.

3. Performance indices: The level of service obtained in the network is described by attributes such as ``queue length at critical link I'', ``output of critical intersection k'', etc.

2.3.2. The data

Due to the difficulty in collecting extensive field data,* the simulation package NETSIM (FHWA, 1992) was employed for generating data for this initial experimentation phase. Its embedded mechanisms allow for an in-depth investigation of the interactions between traffic congestion, driver behaviour and signal control strategies within a simulated network.

NETSIM is a simulation tool developed by McTrans Center at the University of Florida, aiming to assess the impact of various traffic signal programs on the traffic flow conditions in urban networks. It is a microscopic simulation tool: it simulates individual vehicles in the network and the interactions between them, according to the ``car following'' model. The newest version, employed here, was developed in 1995.

As NETSIM is a stochastic simulation, it accepts the distributions of the traffic flow attributes (such as free flow speed, saturation flows etc.) as input, and generates various flow conditions for the simulation process. A very important advantage of NETSIM is its ability to simulate intersection blockage situations resulting from queue overflows at certain links of the network.

The duration of the time period for which the simulation is executed is defined by the user. This time period can be divided into several intervals. Each time interval is characterized by its own traffic flow conditions and signal timing programs. Thus, NETSIM can assist the transportation planner in answering complicated questions such as what are the traffic flow conditions that justify switching timing programs, and which is the most appropriate timing program for each situation in the network.

Apart from modelling conventional conditions, NETSIM also simulates irregular events in the network, such as illegal parking or accidents, thus reflecting realistic situations that often occur in a traffic network and have a substantial effect on traffic flow conditions.

The simulation package provides very detailed output, including numerous measures of effectiveness for each link and for the traffic network as a whole. It also includes a graphic module representing both inputs and outputs of the transportation model, and animation of the full simulation period. The animation is very helpful in spotting major problems such as intersection blockage.


NETSIM has two main drawbacks. It lacks a full input echo, meaning it does not provide the user with the exact input generated by the random process based on the mean values provided by the user. The second drawback is its unfriendly user interface, both with respect to the input editor and to the analysis of a large number of output files.

The traffic network simulated using NETSIM contains 14 internal links (a link is the stretch between two intersections) and eight additional entry links in Ramat Aviv (see Fig. 2). The traffic simulation program thus generated data for a sequence of time periods for the Ramat Aviv network of 22 links.

The initial inputs (traffic flow and timing program descriptors) generated for the neural network:

. the length of the previous time period (ranging from 45 seconds to 3 minutes)

. traffic volumes for each of the eight entry links at the beginning of the next time period

. the turning proportions (right, straight or left) for each of the 14 internal intersections

. the five timing programs (one ``on'', the others ``off'') for the previous time period

. the five timing programs (one ``on'', the others ``off'') for the next time period

. data per link (a link connects two intersections) for previous time period:

    number of vehicle trips
    delay time
    percentage of stops
    speed in miles per hour
    average occupancy
    storage
    phase failure
    five spillback (places where the line of cars overflows into another link) indicators: start, max, average, variance, total

The outputs (performance indices) generated:

. data per link at the beginning of next time period:

    percentage of stops
    speed in miles per hour
    average occupancy
    storage
    phase failure
    five spillback indicators: start, max, average, variance, total

. data for the whole network:

    speed in miles per hour
    storage

Very simply, the objective is to predict the traffic performance values of the next time period based on previous time period traffic values as well as on the previous and next traffic light programs being used.
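As a rough illustration of how one input pattern could be assembled from the descriptors listed above, the sketch below flattens the previous-period length, the eight entry-link volumes and the two one-of-five timing program indicators into a single vector. The function names, argument names and sample numbers are hypothetical and are not taken from the study; turning proportions and per-link measures are left out for brevity and would be appended in the same way.

```python
def one_hot(index, size=5):
    """Encode the active timing program as one "on" flag and the rest "off"."""
    if not 0 <= index < size:
        raise ValueError("timing program index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_input_vector(period_length_s, entry_volumes, prev_program, next_program):
    """Flatten one time period's descriptors into a single input pattern."""
    if len(entry_volumes) != 8:
        raise ValueError("expected volumes for the eight entry links")
    vector = [float(period_length_s)]
    vector.extend(float(v) for v in entry_volumes)
    vector.extend(one_hot(prev_program))  # program used in the previous period
    vector.extend(one_hot(next_program))  # program planned for the next period
    return vector


# Hypothetical sample: a 90-second period, made-up volumes,
# timing programs given as zero-based indices 1 (previous) and 3 (next).
pattern = build_input_vector(90, [120, 80, 60, 45, 200, 150, 30, 75], 1, 3)
print(len(pattern))  # 1 + 8 + 5 + 5 values
```

In practice the raw values would probably need rescaling before training, since volumes and on/off flags live on very different scales; that step is not shown here.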


dering given the variety of network models from which to choose. How can we ascertain which network model is best suited to the problem being analyzed?

The combination of network architecture (or topology), learning paradigm and learning algorithm defines a neural network model. One way to choose the model best suited to the application at hand is by process of elimination. We ask, ``what is the specific information that we expect to give the network as input, and what do we want the network's output to be?'' While answering these questions, we try to determine the general kind of application with which we are concerned.

3.1. APPLICATION CATEGORIES

The general applications fall into several categories (Caudill and Butler, 1989, 1992):

. Mapping. A mapping problem is one in which an input pattern is associated with a particular output pattern. In the travel behaviour issue to be analyzed, for example, SED and travel characteristics of a household on the input level are associated with types of travel activity on the output level. Backpropagation and counterpropagation networks are the most commonly used for these problems.

. Associative Memory. Associative memory stores information by associating it with other information; recall is performed by providing the association and having the network produce stored information. The dividing line between a mapping and an associative memory problem is fuzzy. Basically, it depends on what the network is required to do with an input pattern on which it has not been trained. If the network is to recall one of the output patterns upon which it was trained, then the problem is more of an associative memory problem. If the network is to generate new output, it is likely to be a mapping problem. Associative memory problems can be solved by a variety of networks, including backpropagation, counterpropagation, Bayesian, Kohonen and crossbar.

. Categorization. In a categorization problem, the inputs are to be clustered into categories. Typically, the network is provided with an input pattern and it responds with the category to which the pattern belongs. Self-organizing networks such as Kohonen, adaptive resonance and Adaline networks are possible choices.


. Temporal Mapping. This is like other mapping questions except that the input data includes a consideration of time. Avalanches, backpropagation and recurrent networks are considered appropriate.

3.2. BASIC LEARNING PARADIGMS

The kind of ``learning'' or ``training'' employed helps in the selection of which network model to use. The three main learning paradigms are supervised, unsupervised, and reinforcement (Aleksander and Morton, 1990; Bigus, 1996). Supervised is the most common and efficient training procedure used to develop neural network classification and prediction applications. Unsupervised learning is often used for clustering applications. Reinforcement learning, though less frequently used than the other paradigms, has applications in optimization over time and adaptive control.

Before describing these methods in a bit more detail, let us relate them to some familiar situations. Supervised learning is like parental guidance: every time you are given a new assignment or problem to solve, you receive immediate and specific feedback about how well you performed. Unsupervised learning is like designing a house: you are given the needs and desires of your clients and you have to come up with the house plans from scratch. Reinforcement learning is perhaps the most like everyday life (and therefore the most difficult; Bigus, 1996). It's like being employed. You are given a sequence of assignments which require decisions and somewhere down the road you are given a performance evaluation so that you can try to figure out which decisions were right and which were wrong along the way.

Supervised training is used when the network is provided with the exact output that it has to learn to produce; it is learning by example. It is used when you have a database that consists of, or can generate examples that contain, both the problem and the answers. The network learns how the input and output are related. The learning algorithm takes the difference between the correct or desired output and the actual prediction the neural network made, and uses that information to adjust the weights of the neural network so that next time the prediction will be closer to the correct answer. The networks must be shown the examples tens, hundreds and sometimes thousands of times (networks are slow learners!) before they can predict the correct answer to some complex problems. Supervised learning is useful for training neural networks to perform classification, function approximation, and time-series forecasting where the network is trained to predict outputs at some point in the future. It is especially useful in problems where data in the form of input/output examples are available, but we do not know the exact transformation for processing the input and producing the output. There are many real-world problems that are highly nonlinear, have complex relationships between multiple variables, and for which the mathematical function is not known or cannot be easily derived (Fig. 3).


answer. Sometimes we ask the network the questions: ``How are these data related? What items are the same or different and in what way?'' We want the neural network to look at the patterns of data and to cluster them so that similar patterns get put into the same cluster. Neural networks that are trained using unsupervised methods are called self-organizing because they receive no direction on what the desired or correct output should be. When presented with input patterns, the output units self-organize by first competing to recognize the pattern, and then cooperating to adjust their connection weights. Unsupervised learning is employed when we want to use the neural network to perform clustering or segmentation of the input data. For behavioural issues, this could be useful in identifying in population groups general socio-demographic or economic trends leading to certain activities. The neural network will cluster together, or segment, the populations so that certain policies can be targeted at homogeneous groups (Fig. 4).

Reinforcement, or graded, learning is used when you provide the network with a measure of how well or poorly it is doing without specifically indicating what the output should be (``too small'' or ``too large'', ``too high'' or ``too low'', for example). We have a series of decisions and only later do we find out if they were right or wrong (like a chess game). This approach allows for very difficult, time-dependent problems to be solved. If the information regarding the specific desired outcome is available, supervised training is faster and more efficient than reinforcement learning. But if the problem involves a time series, or when the specific desired outcome is not available and only secondary signals are, then reinforcement learning is an appropriate paradigm. These networks perform a mathematical optimization function similar to dynamic programming (Sutton, 1988) (Fig. 5).

3.3. NEURAL NETWORK TOPOLOGIES

The architecture, that is, the number and arrangement of the neurons (input, hidden and output units) and their interconnections, has a major impact on the processing capabilities of neural networks. In all networks the input units are assigned initial values from an outside source and arranged in the input layer of the network. Many networks have various numbers of hidden layers that receive inputs only from other processing units (not from the outside world). A layer of units receives the outputs of a previous layer of units and processes them simultaneously. The set of units which holds the final result of the network is called the output layer. There are three main categories: feedforward, limited recurrent, and fully recurrent networks.

Feedforward networks are used in situations when we can bring all of the information to bear on a problem at once and present it to the neural network. In this type of network, the data flow through the network in one direction and the answer is based solely on the current set of inputs. Recurrent networks are used in situations when we have current information to give the network, but the sequence of inputs is important, and we need the neural network to store a record of the prior inputs and factor them in with the current data to produce an answer. Information about past inputs is fed back into, and mixed with, the inputs through recurrent or feedback connections for hidden or output units. The network thus contains the memory of the past inputs. Fully recurrent networks provide two-way connections between all units in the neural network. Data flow from designated input units to all adjacent connected units and circulate back and forth until the activation of the units stabilizes. Finally, the output values can be read from the output layer. Both of our applications use the feedforward topology, which will be described in detail in Chapter 4. For a lucid description of limited and fully recurrent networks see Bigus (1996).


elling techniques don't exist or are unsatisfactory. For descriptive abstracts of some of the more common networks which could be used to address just such problems in areas of social science, planning and transportation, see Appendix A. Detailed descriptions of major neural network models are plentiful in the Computer Science and Statistics literature. A few examples of clear explanations can be found in Aleksander and Morton (1990), NeuralWare (1988) and Bigus (1996).

It is necessary to determine the type of application category our cases fit into, the appropriate learning paradigm and the topology to address the issues associated with each transportation planning question addressed. On these bases, neural network models were chosen for each case. The travel behaviour application is clearly a mapping problem using supervised training. By the process of elimination outlined in section two, our choice of networks is between backpropagation and counterpropagation. Counterpropagation performs mapping by a look-up technique and is less general than backpropagation. Thus, we approach the travel behaviour question by applying backpropagation neural networks. The traffic control application, also a mapping problem using supervised training, is approached using a Bayesian feedforward neural network model. The Bayesian algorithm was chosen over backpropagation in an attempt to automatically adjust for overfitting, given the large number of network parameters and relatively small data set.

4.1. THE FEEDFORWARD BACKPROPAGATION MODEL

The feedforward neural network employing the backpropagation learning algorithm was developed largely by Rumelhart and McClelland (1986) and Rumelhart et al. (1986).

Again, neural networks are composed of units which intuitively correspond to neurons, and weights which correspond to the strengths of the edges (intuitively synapses) connecting neurons. They are made up of three types of units: input, hidden and output units. While learning, the input and output units get their values from the learning patterns (the real data). The hidden units act as go-betweens from the input to the output units.


The components of a neural network are: units; edges connecting units; unit output function; a propagation rule determining the total input to a unit based on the outputs of the units connected to it; and a learning procedure.

Notation:

$o_j$ = Calculated output for unit $j$

$w_{ij}$ = Weight on edge from unit $j$ to unit $i$

$f$ = Activation function (of one variable), in our case the logistic function $f(x) = \frac{1}{1 + e^{-x}}$

$t_{pi}$ = Target output for output unit $i$ for pattern $p$

$\Delta_p w_{ij}$ = Change in weight on edge from $j$ to $i$ on scanning pattern $p$

$\delta_{pj}$ = Error term for unit $j$ on scanning pattern $p$ (part of learning procedure)

$n$ = Learning rate (adjustable parameter for learning; determines whether the network will make major adjustments after each learning trial or only minor adjustments)

$m$ = Momentum (adjustable parameter for learning; used to control possible oscillations in the weights by setting how much of the previous change to include in the present change)

$p$ = Example pattern consisting of input data and output data

Figure 6 illustrates the general structure of a unit. The unit is represented by a circle. Below the circle are entering lines, or edges. These represent connections coming from other units. The lines above the circle represent connections from this unit to other units. The strength of these connections is represented by weights which are attached to the edges. Inside the circle we see two equations, for the activation $a_i$ and for the output of the unit, $o_i$. The activation is the sum, over all units $j$ whose output is connected to this unit (unit $i$), of the output of unit $j$ times the weight of the connection between unit $j$ and unit $i$, which is denoted by $w_{ij}$; that is, $a_i = \sum_j w_{ij} o_j$. The output of unit $i$, $o_i$, is $f(a_i)$, where $f$ is a function; there are various possibilities for $f$ which yield different kinds of networks.


Figure 7 illustrates the inner workings of one unit, unit $i$. In the diagram unit $i$ is connected to three units. The leftmost unit has output 1 and the strength of its connection to unit $i$ is 0.3; the middle unit has output 0 and the strength of its connection to unit $i$ is 0.1. Lastly, the rightmost unit has output 1 and the strength of its connection to unit $i$ is 0.25. Thus, the activation of unit $i$ is calculated to be 0.55 (1). In this diagram we have chosen $f$ to be a simple threshold function and the output of unit $i$ is computed to be 1 (2).
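The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: the text does not state the threshold value used in Fig. 7, so the cut-off of 0.5 below is an assumption, as are the function names.

```python
def activation(outputs, weights):
    """Weighted sum of incoming unit outputs: a_i = sum_j w_ij * o_j."""
    return sum(o * w for o, w in zip(outputs, weights))


def threshold(a, theta=0.5):
    """Simple threshold output function; theta = 0.5 is an assumed cut-off."""
    return 1 if a >= theta else 0


# Outputs of the three connected units and the strengths of their connections.
a_i = activation([1, 0, 1], [0.3, 0.1, 0.25])
o_i = threshold(a_i)
print(round(a_i, 2), o_i)
```

Only the first and third units contribute, since the middle unit's output is 0, which gives 0.3 + 0.25 = 0.55 as stated.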

The general structure of a feedforward neural network is depicted in Fig. 8. Without loss of generality, the network is partitioned into layers, each consisting of several units. The lowest layer is composed of the input units. These receive their activation values from input patterns. The last layer consists of the output units, whose output is the result of the network's computation. The intermediate layers consist of hidden units. The outputs of units in each layer provide the input to the units of the next layer. The computation of such a network occurs bottom up: an input pattern is presented, which determines the outputs of the input units; based on the weights and the output function $f$, outputs are computed in the units of the next layer, and so on until the units in the output layer are computed. Thus, we see that a hidden unit network is simply a computational device for associating output values to input values.
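The bottom-up computation described above can be sketched in plain Python as follows. The nested-list weight layout, the function names and the example weights are illustrative assumptions; bias terms are omitted because the text does not mention them.

```python
import math


def logistic(x):
    """Logistic output function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, layers):
    """Propagate an input pattern through successive layers.

    layers[l][i][j] is the weight on the edge from unit j of the previous
    layer to unit i of layer l (assumed layout, hidden layers first).
    """
    outputs = list(inputs)
    for weights in layers:
        outputs = [logistic(sum(w * o for w, o in zip(row, outputs)))
                   for row in weights]
    return outputs


# Illustrative network: 2 inputs, 2 hidden units, 1 output unit.
hidden_weights = [[0.1, -0.2], [0.3, 0.4]]
output_weights = [[0.2, -0.1]]
result = forward([1.0, 0.0], [hidden_weights, output_weights])
print(result)
```

Because every unit applies the logistic function, each value in `result` lies strictly between 0 and 1.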

So far, we have concentrated on using the network as a computational device, this being the forward propagation mode. (Remember, feedforward is a definition of connection topology and data flow. It does not imply any specific type of activation function or learning paradigm.) By comparing computed outputs to desired ones, the network can ``learn''. This is done by propagating the errors from the output layer back down through the hidden layers towards the input. This is called the backward propagation mode of the network (Fig. 9).

The basic idea of the backpropagation method of learning (Rumelhart and McClelland, 1986, pp. 322-328) consists of three steps. The input pattern is presented to the input layer of the network. These inputs are propagated through the network until they reach the output units. This forward pass produces the actual, or predicted, output pattern. Because backpropagation is a supervised learning algorithm, the desired outputs are given as part of the training set. The actual network outputs are subtracted from the desired outputs and an error signal is produced. This error signal is then the basis for the backpropagation step, whereby the errors are passed back through the neural network by computing the contribution of each hidden unit and deriving the adjustment needed to produce the correct output. The connection weights are then adjusted and the neural network has just ``learned'' from experience.

FIG. 8. Feedforward neural networks

There are various types of inputs and outputs. Different networks differ in the kind of input and output values allowed. In the feedforward networks which we use, there will be no limitation on the input values. However, due to the activation function $f$ which we have chosen, the output will range between 0 and 1.

A backpropagation neural network uses a feedforward topology, supervised learning, and a backpropagation learning algorithm. Backpropagation is a general-purpose learning algorithm. It is powerful but expensive in terms of computational requirements for training. A backpropagation network with a single hidden layer of processing units can model any continuous function to any degree of accuracy (given enough units in the hidden layer) (Bigus, 1996). There are hundreds of variations of backpropagation in the neural network literature, and all claim to be superior to ``basic'' backpropagation in one way or the other. Since backpropagation is based on a relatively simple form of optimization known as gradient descent, mathematicians soon proposed modifications using more powerful techniques such as conjugate gradient and Newton's methods [see Wasserman (1993) for a discussion of some of the many variations of backpropagation]. However, ``basic'' backpropagation is still the most widely used variant. Its major virtues are that it is simple to understand, and it works for a wide range of problems.

Formally, there is a given set of patterns. The error per example pattern $p$ is:

$$E_p = \frac{1}{2} \sum_j (t_{pj} - o_{pj})^2$$

where $j$ ranges over the pattern's output values.

The error for the whole set of patterns is calculated by summing over all patterns in the set:

$$E = \sum_p E_p$$

The goal of the backpropagation procedure is to find a process that changes the weights so as to minimize the error $E$. The weight change is proportional to the negative of the derivative of the error with respect to the weights. To implement gradient descent note that:

$$\Delta_p w_{ji} \propto -\frac{\partial E_p}{\partial w_{ji}}$$


Carrying out the derivation of the partial derivative (the steps of which are omitted; see Rumelhart and McClelland, 1986, pp. 325-330), we obtain:

$$\Delta_p w_{ji} = n \, \delta_{pj} \, o_{pi}.$$

Here, $\delta_{pj}$ is the error signal, which depends on the activation function used, and $n$ is a constant of proportionality. Our activation function is the logistic function. The output of unit $j$ is:

$$o_{pj} = \frac{1}{1 + e^{-\sum_i w_{ji} o_{pi}}}$$

For the logistic activation function

$$\delta_{pj} = (t_{pj} - o_{pj}) \, o_{pj} (1 - o_{pj})$$

when $j$ is an output unit, and

$$\delta_{pj} = o_{pj} (1 - o_{pj}) \sum_k \delta_{pk} w_{kj}$$

when $j$ is a hidden unit. Note that for a hidden unit $j$, $\delta_{pj}$ is defined in terms of the $\delta_{pk}$ of units in the next higher layer, and the weights between unit $j$ and those higher layer units (denoted by $k$). Thus, these equations for $\delta_{pj}$ induce a recursive process of computation.
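To make the recursion concrete, the following sketch applies the two error-signal formulas and the weight-change rule to a network with one hidden layer, updating the weights after each pattern. It is a minimal illustration rather than the implementation used in the study: the function name, the nested-list weight layout, the default learning rate and the example weights are all assumptions, and bias terms and momentum are omitted.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def backprop_step(x, t, w_hidden, w_out, n=0.5):
    """One per-pattern update; returns the pattern error E_p before the update.

    w_hidden[j][i]: weight from input unit i to hidden unit j.
    w_out[k][j]:    weight from hidden unit j to output unit k.
    """
    # Forward pass with logistic units.
    h = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    o = [logistic(sum(w * hj for w, hj in zip(row, h))) for row in w_out]
    error = 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))

    # Error signals: output units first, then hidden units (using old weights).
    d_out = [(tk - ok) * ok * (1.0 - ok) for tk, ok in zip(t, o)]
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j, hj in enumerate(h)]

    # Weight changes: delta_w_ji = n * delta_j * o_i, applied in place.
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += n * d_out[k] * h[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += n * d_hid[j] * x[i]
    return error


# Repeatedly presenting one made-up pattern should drive its error down.
w_h = [[0.1, -0.2], [0.3, 0.4]]
w_o = [[0.2, -0.1]]
first_error = backprop_step([1.0, 0.0], [1.0], w_h, w_o)
for _ in range(200):
    last_error = backprop_step([1.0, 0.0], [1.0], w_h, w_o)
print(first_error, last_error)
```

The hidden-unit error signals are computed before any weight is changed, so that they are based on the same weights that produced the output.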

The backpropagation procedure thus computes the definition of a functional relationship by continuously changing weights so as to get closer to the function. The entire data set is run through the network multiple times with the network attempting to vary the weights on edges until the desired ``level of learning'' has been met. This process is repeated for each pattern in the ensemble of patterns under consideration. Then the stored changes are summed up and added to the weights in the network (Fig. 10). For small learning rates, the weight changes per pattern may be applied immediately rather than summing them up and applying them after each epoch (an epoch is a scanning of the whole ensemble of patterns), without causing too much imprecision. Once the network is sufficiently trained it is ready to be used as a forecasting tool.

How do we know, however, when the network has been sufficiently trained? With modelling or regression problems using backpropagation the root mean squared error (RMS) is a good indicator of accuracy. In the travel behaviour case, the neural network is applied to a designated number of data records (household patterns) and the network converges by slowly reducing the total sum of squares (TSS) of individual errors; we then take the TSS, divide it by the number of patterns multiplied by the number of network outputs, and take the square root, which gives approximately the average error per individual record per output unit.
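The RMS calculation just described can be written out as a short function. The function name and the sample values are illustrative only.

```python
import math


def rms_error(targets, outputs):
    """Root mean squared error from the total sum of squares (TSS).

    targets and outputs are lists of patterns, each a list of output values.
    """
    tss = sum((t - o) ** 2
              for target_row, output_row in zip(targets, outputs)
              for t, o in zip(target_row, output_row))
    n_patterns = len(targets)
    n_outputs = len(targets[0])
    return math.sqrt(tss / (n_patterns * n_outputs))


# Two made-up patterns with two outputs each, every prediction off by 0.5.
print(rms_error([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]]))
```

Here the TSS is 4 x 0.25 = 1, which is divided by 2 patterns x 2 outputs before the square root is taken.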


error does not fall, or oscillates up and down, there is a chance that the network has hit a local minimum and the training should be restarted with new initial random weights.

There are two main adjustable parameters, the learning rate $n$ and the momentum $m$, which are used here to control the training process. The learning rate controls the magnitude of the changes made when adjusting the connection weights. Basically, it is often good to make large corrections (large learning rate) early on, and then lower it as the training progresses, so that the network can generalize to patterns that it has never seen before.

Momentum is a training parameter which complements the learning rate. It filters out extreme changes in the weight values, so that there is less chance that the network will start oscillating. The momentum parameter causes the errors from previous training patterns to be averaged together over time and added to the current error.
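One common way of combining the two parameters, shown here as a sketch, is to add a fraction $m$ of the previous weight change to the current gradient-based change. This particular form is the standard textbook rule and is assumed rather than quoted from the study; the function name and default values are hypothetical.

```python
def momentum_update(weight, gradient_term, previous_change, n=0.1, m=0.9):
    """Return the updated weight and the change that was applied.

    gradient_term stands for delta_j * o_i from the backpropagation step;
    n is the learning rate and m the momentum.
    """
    change = n * gradient_term + m * previous_change
    return weight + change, change


# Gradient terms that flip sign between patterns are damped by the momentum term.
w, change = 1.0, 0.0
for g in [2.0, -2.0, 2.0, -2.0]:
    w, change = momentum_update(w, g, change)
    print(round(w, 4), round(change, 4))
```

With $m = 0$ the rule reduces to the plain weight-change formula, so the momentum term can be switched off without altering the rest of the training loop.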

When training a network, it is important to know when to stop. The intuition that the more the network learns the better it will perform is not true. When a network is overtrained (it overfits the function which relates the inputs to the outputs) it performs on, or memorizes, the learning set patterns very well, but is unable to generalize to the test set, that is, to patterns it has never seen before. Instead it produces large prediction errors because it has not learned the fundamental relationships in the training data. Thus the number of sweeps through the network (epochs) is another parameter, along with the learning rate and momentum, with which to exercise control on the learning process.
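One simple way to control the number of epochs is to monitor the error on held-out patterns and stop once it has failed to improve for a while. The sketch below assumes such a rule; the `patience` parameter, the function name and the two callables are hypothetical and do not describe the stopping criterion actually used in the study.

```python
def train_with_early_stopping(train_one_epoch, held_out_error,
                              max_epochs=1000, patience=10):
    """Train until the held-out error has not improved for `patience` epochs.

    train_one_epoch: callable performing one sweep through the learning set.
    held_out_error:  callable returning the current error on unseen patterns.
    Returns the epoch with the lowest held-out error and that error.
    """
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        err = held_out_error()
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # error has stopped improving: likely overtraining
    return best_epoch, best_error


# Stand-in error curve that falls and then rises, as in overtraining.
fake_errors = iter([0.5, 0.4, 0.3, 0.35, 0.36, 0.4])
stop = train_with_early_stopping(lambda: None, lambda: next(fake_errors),
                                 patience=2)
print(stop)
```

A full implementation would also keep a copy of the weights from the best epoch so that they can be restored after stopping.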

4.2. BAYESIAN NEURAL NETWORKS

Modern neural networks are trained using as an objective function a penalized sum of squared errors. The penalty function, or regularization, penalizes models for which there are many large weights in favour of models with fewer large weights. Models with a few large weights are basically simpler, are expected to avoid overfitting the noise in the data, and are therefore expected to provide better generalizability of the fitted model (Neal, 1996).

The regularization parameters (also called weight decay parameters) determine the size of the penalty: the larger they are, the simpler the model is forced to be. The problem is, what is the right size of these parameters for a given situation? One approach is by trial-and-error using the goodness of fit to a validation data set in order to determine suitable values for these parameters (this was the approach with the backpropagation model in the travel behaviour application).

The Bayesian approach updates these parameters dynamically, by trying to choose the combination of regularization and weight parameters that maximizes the evidence of the model within the Bayesian framework. This evidence is a combination of some prior distribution for the regularization parameters and the information in the data, and tells us how probable the model is for the data available.

It turns out that the evidence correlates highly with the generalization error. When the evidence is high for the learning set, the error is small for a validation set. Good models can be selected among a collection of models by choosing those with high evidence. In this case study we use the test set error to check that this relation between evidence and validation error really holds.

The Bayesian paradigm is based on the Bayesian decay learning algorithm (Hinton, 1987). The primary motivation for applying it to the physical traffic control problem was to have a technique which automatically controls for overfitting and uses a small test set (relative to the size of the learning set), given the vast number of network units (since the traffic intersections have to be analyzed as part of the surrounding environment), the relatively small number of data patterns (sets) and the large number of networks to be run.

The error is $E + \lambda \sum_{ji} w_{ji}^2$, where $\lambda$ is the weight decay penalty and can vary between layers (input to hidden, hidden to hidden, hidden to output; this is also true for the learning rates and momentum values in backpropagation). The paradigm gives a


pick up the most likely penalty for the specific data set. A good reference for Bayesian neural networks is Thodberg (1996).
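The penalized error with a separate weight decay penalty per layer can be sketched as below. This only evaluates the objective for fixed penalties; it does not reproduce the Bayesian updating of the penalties themselves. The function name, the weight layout and the sample numbers are assumptions.

```python
def penalized_error(error, weights_by_layer, decay_by_layer):
    """Return E + sum over layers of lambda_layer * sum of squared weights.

    weights_by_layer: one weight matrix (list of rows) per layer.
    decay_by_layer:   one weight decay penalty per layer.
    """
    penalty = sum(lam * sum(w * w for row in layer for w in row)
                  for layer, lam in zip(weights_by_layer, decay_by_layer))
    return error + penalty


# Made-up example: input-to-hidden weights penalized more than hidden-to-output.
layers = [[[1.0, 2.0]], [[3.0]]]
print(penalized_error(1.0, layers, [0.1, 0.01]))
```

In this example the penalty is 0.1 x (1 + 4) + 0.01 x 9 = 0.59, which is added to the unpenalized error of 1.0.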

According to Thodberg (1996), Bayesian neural networks have the following advantages:

1. weight decay parameters are adjusted to their near optimal values, thereby giving the best generalization;

2. adjustment is done during training so that there is no need for a separate search for these parameters;

3. an estimate of the evidence for each model is given;

4. there is no need for a separate validation set during training. Thus all the data, apart from the test set, can be used for training, and so provide better models (this is also the case for our previous application with feedforward backpropagation networks; however, the Bayesian algorithm enables us to have small training sets, a critical factor given the size of our data set).

There are two Bayesian neural network modelling issues with which we are concerned here. Weight decay parameters can be chosen so that different types of weights are penalized differently. For example, first and second layer weights may be treated differently. Or weights from one type of input may be treated differently from those coming from other inputs. In the current implementation we consider both:

1. separate weight decay parameters for each weight layer; and

2. separate weight decay parameters for different logical groupings of the input parameters.

The second issue is the frequency of updating the weight decay parameters and the evidence. There are two options we consider:

1. update decay parameters frequently and update evidence infrequently;

2. update all parameters together and infrequently.

Both options were undertaken.

4.3. NEURAL NETWORKS UTILIZING THE ENTROPY OBJECTIVE FUNCTION

Some of the output variables that were considered (see Chapter 5) were zero/one indicator variables: either the thing happened or it did not (zero did not, one did).

For such outputs, one is actually trying to predict the probability of the output being a one, based on the inputs. It has been shown that in this situation, the appropriate objective function is the (penalized) entropy function, and not the sum of squared errors (Bishop, 1995).

However, Bayesian neural networks cannot be used here since they require a Gaussian (not binomial) model for the errors. Thus, different decay parameters must be tried in order to adequately cover the range of models that might fit.
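As a rough sketch of the penalized entropy function for zero/one outputs, the following assumes the standard cross-entropy form with a single weight decay penalty. The function name and all numbers are illustrative assumptions, not values from the study. The final loop only shows how the objective changes with the penalty for fixed weights; in practice each candidate decay value would require retraining the network and comparing the resulting models on the test set.

```python
import math


def penalized_entropy(targets, probs, weights, decay, eps=1e-12):
    """Cross-entropy error for 0/1 targets plus a weight decay penalty."""
    entropy = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        entropy -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return entropy + decay * sum(w * w for w in weights)


# Hypothetical targets, predicted probabilities and weights.
targets = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.6, 0.5]
weights = [0.5, -1.5]

# Objective value under several hand-picked decay parameters.
errors = {decay: penalized_entropy(targets, probs, weights, decay)
          for decay in (0.0, 0.01, 0.1)}
```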


regression. The goodness-of-fit of such models is based on an entropy criterion (called the Kullback-Leibler distance). So, just as linear regression was first tried for regular quantitative outputs (Chapter 5), logistic regression was first tried for zero/one binary variables.

Evaluation of neural network or logistic regression models for binary output variables is determined by the number of correct and incorrect decisions. In order to make a decision, a value for the probability threshold needs to be chosen. Then one can determine the proportion of correctly classified ones and the proportion of correctly classified zeroes.
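The threshold-based evaluation can be illustrated with a short sketch. The function name, the probabilities and the observed outcomes below are hypothetical; the convention that a probability equal to the threshold counts as a one is also an assumption.

```python
def classification_rates(probs, labels, threshold=0.5):
    """Return (proportion of ones classified correctly, proportion of zeroes classified correctly)."""
    decisions = [1 if p >= threshold else 0 for p in probs]
    ones = [d for d, y in zip(decisions, labels) if y == 1]
    zeroes = [d for d, y in zip(decisions, labels) if y == 0]
    rate_ones = sum(ones) / len(ones) if ones else float("nan")
    rate_zeroes = zeroes.count(0) / len(zeroes) if zeroes else float("nan")
    return rate_ones, rate_zeroes


# Hypothetical predicted probabilities and observed 0/1 outcomes.
probs = [0.9, 0.7, 0.4, 0.3, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]
rates_at_half = classification_rates(probs, labels, threshold=0.5)
rates_at_low = classification_rates(probs, labels, threshold=0.25)
```

Lowering the threshold classifies more cases as ones, so the proportion of correct ones rises while the proportion of correct zeroes falls; the choice of threshold therefore trades one kind of error against the other.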


The simulation tool used is Explorations in Parallel Distributed Processing (McClelland and Rumelhart, 1988). A sample of the hypotheses and results of the neural network experiments follows. Details of all experiments can be found in Shmueli (1992).

The data were divided into test and learning sets, while experimenting with various combinations of learning and test set sizes throughout the process.

The inputs were presented to the network as unary*, binary or real data. Due to the definition of the logistic function, the outputs were scaled to have values between 0 and 1.
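The three input representations and the output scaling might look as follows. This sketch rests on stated assumptions: "unary" is taken to mean one unit per category with a single unit switched on, "binary" a fixed-width binary code of a category index, and the output scaling a simple min-max rescaling. None of these details is confirmed by the text, and all names and values are hypothetical.

```python
def unary_encode(value, categories):
    """One unit per category; only the unit matching `value` is on (assumed meaning of 'unary')."""
    return [1 if value == c else 0 for c in categories]


def binary_encode(index, n_bits):
    """Fixed-width binary code of a non-negative index, most significant bit first."""
    return [(index >> bit) & 1 for bit in reversed(range(n_bits))]


def scale_to_unit_interval(values):
    """Min-max scale values into [0, 1], the output range of the logistic function."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical examples.
mode_units = unary_encode("bus", ["car", "bus", "walk"])
branch_bits = binary_encode(5, 3)
scaled_trips = scale_to_unit_interval([0, 2, 4, 8])
```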

5.1.1. Neural networks: basic architecture, strategies for basic learning

Numerous networks (many of them variations on the same network for experimental purposes) were constructed. Figure 11 illustrates one basic network.

Intermediate, or hidden, layers are introduced to the network in order to capture more complicated concepts, or connections, among inputs and outputs. These optional layers provide an opportunity to model our intuition as to the interrelationships, and the network can provide the relative weights on the edges which connect the units. The input and output layers correspond to the selected raw and processed data from the Traveling Habits Survey (Central Bureau of Statistics, 1987). The internal (or hidden) units model the more complex relationships of travel behaviour. Suppose a second order concept depends upon certain parameters. Then the hidden unit which models this concept is connected only to those input units corresponding to these parameters. In this example (represented in Fig. 11), the input layer consists of household and individual SED and travel data. The two hidden layers represent the model or concepts, and the output layer contains the observed travel behaviour according to trip types. Here, the outputs are defined as the number of female work, maintenance (shopping and personal), leisure and total trips reported. The type of data representation (real, unary or binary) is noted in the figure.

FIG. 11. Basic network architecture: travel behaviour analysis.

The hidden layers are constructs which may serve as composite variables. Behaviour is not determined by simple SED variables. It is affected by psychological constructs which may represent some combination of simple measurable factors, as they are interpreted in the mind. Hence, one of the strengths of neural networks is their ability to produce intermediate variables representing composite factors that, it is hoped, mimic the processes of the human mind.

The hidden layer concepts were built around functional interrelationships among income, educational level, age, size of household and age distribution of members, employment, economic branch, occupations, license and car ownership, mode of transport to work, and hours at work. All are considered with respect to household and individual activity patterns. The interrelatedness of the various inputs is examined by creating second and third order concepts: long-term preference (male and female); economic well-being (household, male and female); homeload (male and female); and career indicator (male and female). In essence, a compromise is being made. There are no psychometric measures in the data to determine these concepts, so SED variables are employed as proxies.

Each concept is connected to various inputs:

• Long-term preference (male, female): number of persons per household; number of years of schooling; parental status; occupation, economic branch.

• Economic well-being (of household): number of persons per household; net income per household.

• Economic well-being of individual (male, female): number of persons per household; net income per household; age; net income per (male or female).

• Homeload (male, female): number of persons per household; number of children between ages 0–4, 5–7, 8+, 15+; number of persons employed in household; number of vehicles per household; net income per household; employment status; driver's license, long-term preference (first layer hidden unit).

• Career indicator (male, female): number of persons per household; net income per household; number of years of schooling; employment status; occupation; net income per (male or female); number of hours at work; mode of transportation to work, long-term preference (first layer hidden unit).

For example, on the input level, the hidden unit "economic well-being" (of household) is connected to the number of persons per household and the net income per household. On the output level, the connection is to the number of work, total, fun and maintenance trips.
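The restricted wiring of a concept unit can be sketched as below, using the household economic well-being example. This assumes a logistic activation over only the inputs listed for the concept; the identifier names, the bias term, the input values and the weights are all hypothetical and serve only to show that unlisted inputs have no influence on the unit.

```python
import math


def logistic(x):
    """Logistic (sigmoid) activation function."""
    return 1.0 / (1.0 + math.exp(-x))


# Which inputs feed each concept unit (one entry shown, names are illustrative).
CONNECTIONS = {
    "economic_wellbeing_household": [
        "persons_per_household",
        "net_income_per_household",
    ],
}


def concept_activation(concept, inputs, weights, bias=0.0):
    """Activation of a hidden unit wired only to the inputs listed for its concept."""
    total = bias
    for name in CONNECTIONS[concept]:
        total += weights[name] * inputs[name]
    return logistic(total)


# Hypothetical, already scaled inputs and weights.
inputs = {"persons_per_household": 0.4, "net_income_per_household": 0.7, "age": 0.5}
weights = {"persons_per_household": -1.0, "net_income_per_household": 2.0}
activation = concept_activation("economic_wellbeing_household", inputs, weights)
```

Because "age" is not listed for this concept, changing its value leaves the activation unchanged; the network learns weights only on the edges that the modeller has chosen to include.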