Dynamic Networks and Behavior: Separating Selection from Influence

INTERUNIVERSITY CENTER FOR SOCIAL SCIENCE THEORY AND METHODOLOGY

Christian Steglich, Tom A.B. Snijders

ICS / Department of Sociology, University of Groningen, The Netherlands

Michael Pearson

Centre for Mathematics and Statistics, Napier University, Edinburgh

revised version

Groningen, December 15, 2006

ABSTRACT

A recurrent problem in the analysis of behavioral dynamics, given a simultaneously evolving social network, is the difficulty of separating effects of partner selection from effects of social influence. Misattribution of selection effects to social influence, or vice versa, may lead to wrong conclusions about the mechanisms underlying observed dynamic data, and analyses affected by it are of limited predictive power. While a dependable and valid method would benefit several research areas, to the best of our knowledge it has been lacking in the extant literature. In this paper, we present a recently developed family of statistical models that enables researchers to separate the two effects in a statistically adequate manner. To illustrate our method, we investigate the roles of homophile selection and peer influence mechanisms in the joint dynamics of friendship formation and substance use among adolescents. Making use of a three-wave panel measured in the years 1995-97 at a school in Scotland, we are able to assess the strength of selection and influence mechanisms, identify network regions where they operate, and quantify the relative contributions of homophile selection, assimilation, and control mechanisms to observed similarity of substance use among friends.

INTRODUCTION

In social groups, there generally is interdependence between the group members’ individual behavior and attitudes, and the network structure of social ties between them. The study of such interdependence is a recurring theme in theory formation as well as empirical research in the social sciences. Sociologists have long known that structural cohesion among group members is a good indicator for compliance with group norms (DURKHEIM 1893, HOMANS 1974). Research on social identity theory identified within-group similarity and between-group dissimilarity as principles by which populations are subdivided into cohesive smaller social units (TAYLOR & CROCKER 1981, ABRAMS & HOGG 1990). Detailed network studies (e.g., PADGETT & ANSELL 1993) as well as discussion essays (EMIRBAYER & GOODWIN 1994, STOKMAN & DOREIAN 1997) made clear that to obtain a deeper understanding of social action and social structure, it is necessary to study the dynamics of individual outcomes and network structure, and how these mutually impinge upon one another. In methodological terms, this means that network structure as well as relevant actor attributes – indicators of performance and success, attitudes and cognitions, behavioral tendencies – must be studied as joint dependent variables in a longitudinal framework where the network structure and the individual attributes mutually influence one another. We argue that previous studies of such joint dynamics have failed to address fundamental statistical and methodological issues, which may have had undue influence on reported results. As an alternative, we present a new, statistically based method for this type of investigation. In an elaborate example, we illustrate how shortcomings of earlier approaches can be overcome by applying the new method.

The example concerns the joint dynamics of friendship and substance use in adolescent peer networks (HOLLINGSHEAD 1949, NEWCOMB 1962). It is by now well-established that smoking, alcohol and drug use patterns of two adolescents tend to be more similar when these adolescents are friends than when they are not (COHEN 1977, KANDEL 1978, BROOK, WHITEMAN & GORDON

same time similar on salient individual behavior and attitude dimensions – a phenomenon for which FARARO & SUNSHINE (1964) coined the term homogeneity bias. In statistical terminology, this kind of association is known by the name of network autocorrelation, a notion originating from the spatial statistics literature (DOREIAN 1989). Up till now, however, the dynamic processes that give rise to network autocorrelation are not sufficiently understood. Some theorists evoke influence mechanisms and contagion as possible explanations (FRIEDKIN 1998, 2001; OETTING & DONNERMEYER 1998) – a perspective largely in line with classical sociological theory on socialization and coercion. Others invoke selection mechanisms and homophily (LAZARSFELD & MERTON 1954, BYRNE 1971, MCPHERSON & SMITH-LOVIN 1987, MCPHERSON, SMITH-LOVIN & COOK 2001) – while still others emphasize the unresolved tension between these two perspectives (ENNETT & BAUMAN 1994, LEENDERS 1995, PEARSON & MICHELL 2000, HAYNIE 2001, PEARSON & WEST 2003, KIRKE 2004).
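As a rough illustration of what network autocorrelation means in cross-sectional terms, the following minimal sketch computes Moran's I, a coefficient from the spatial statistics literature, with the friendship network playing the role of the weight matrix. The choice of this particular coefficient, the function name and the toy data are assumptions made for illustration only; nothing in the text above prescribes this measure.

```python
def morans_i(adjacency, behavior):
    """Moran's I of a behavior variable, using network ties as weights.

    adjacency: square list of lists, adjacency[i][j] = 1 if i names j as a friend.
    behavior:  list of numeric behavior scores, one per actor.
    (Both names and the data format are hypothetical.)
    """
    n = len(behavior)
    mean = sum(behavior) / n
    z = [b - mean for b in behavior]            # centered behavior scores
    total_weight = sum(sum(row) for row in adjacency)
    squared_dev = sum(v * v for v in z)
    if total_weight == 0 or squared_dev == 0:
        raise ValueError("undefined without ties or without behavioral variation")
    # Sum of cross-products of centered scores over all tied pairs.
    cross = sum(adjacency[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / squared_dev


# Toy network: two mutual friendships, each joining actors with equal behavior.
similar_friends = [[0, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]]
# Same number of ties, but each friendship joins actors with different behavior.
dissimilar_friends = [[0, 0, 1, 0],
                      [0, 0, 0, 1],
                      [1, 0, 0, 0],
                      [0, 1, 0, 0]]
smoking = [1, 1, 0, 0]

print(morans_i(similar_friends, smoking))
print(morans_i(dissimilar_friends, smoking))
```

A positive value indicates that tied actors tend to be similar, a negative value that they tend to differ. Such a coefficient describes the cross-sectional pattern only; by itself it says nothing about whether selection or influence produced it.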

In order to explain network autocorrelation phenomena, one must take a dynamic perspective. Considering the case of network-autocorrelated tobacco use, a smoker may tend to have smoking friends because, once somebody is a smoker, he or she is likely to meet other smokers in smoking areas and thus has more opportunities to form friendship ties with them (selection). At the same time, it may have been the friendship with a smoker that made him or her start smoking in the first place (influence). Which of the two patterns plays the stronger role can be decisive for success or failure of possible intervention programs – moreover, a policy that is successful for one type of substance use (say, smoking) may fail for another (say, drinking) if the generating processes are different in nature. Modeling this as a dynamic process using longitudinal data is necessary to address the problem adequately.

The most common format of such data in sociological studies is the panel design – which introduces some analytical complications, because the processes of influence and selection must reasonably be assumed to operate unobservedly in continuous time between the panel waves. Finally, due to the potential importance of indirect ties (two persons having common friends, etc.), complete network studies (i.e., measurements of the whole network structure in a given group) are clearly preferable to personal (ego-centered) network studies. The interdependence of individual observations in complete networks, though, rules out the application of statistical methods that rely on independent observations. To our knowledge, no previous study succeeded in a statistically and methodologically credible assessment and separation of selection and influence mechanisms.

In this paper, we show how previous approaches failed to adequately respond to these statistical-methodological challenges, and we present a new, flexible method, enabling researchers to statistically separate the effects of selection mechanisms from those of influence mechanisms. The method, introduced by SNIJDERS, STEGLICH & SCHWEINBERGER (2006), is based on a stochastic model which formalizes the simultaneous, joint evolution of social networks and behavioral (or attitudinal) characteristics of the network actors. These models can be fitted to data collected in a panel design, where complete networks as well as changeable attributes are measured. We will call this data type network-behavior panel data, understanding that ‘behavior’ here stands for changeable attributes in a wide sense, including attitudes, performance, etc. Model fitting yields parameter estimates that can be used for making inferences about the mechanisms driving the evolution process. The new method extends earlier methodology for the analysis of ‘pure’ network dynamics (SNIJDERS 2001, 2005) by adding components that allow for the inclusion of co-evolving behavioral variables.

Besides the network autocorrelation phenomena already mentioned, other aspects of dynamic network-behavior interdependence can in principle also be investigated with our models. For instance, GRANOVETTER’s (1982, 1983) theory about weak ties as providers of opportunities for changing individual properties, or BURT’s (1987, 1992) theory about brokerage and structural competition, can be tested for their validity on the actor level in a dynamic context where the network is subject to endogenous change. We hope that the new model presented here will open new paths for testing and elaborating such theories. In the present paper, however, we restrict attention to a general sketch of the modeling framework and one illustrative application, the analysis of selection and influence mechanisms with respect to substance use behavior (tobacco and alcohol consumption), based on network-behavior panel data measured in 1995-97 at a secondary school in Scotland.

Overview

The paper is structured as follows. First, the problem of assessing simultaneously operating selection and influence processes is illustrated by identifying three major methodological obstacles that need to be addressed, and giving a summary review and critique of prior research methods. The example of friendship and substance use in adolescent peer groups will only play a tangential role at this stage. Then, the new, actor-driven model family for network and behavior co-evolution is introduced on a conceptual level. For a detailed treatment of the stochastic formalization, we refer to SNIJDERS ET AL. (2006). We illustrate the new method by applying it to a three-wave data set about the co-evolution of smoking and drinking behavior with friendship networks (PEARSON & WEST 2003). In these analyses, not only the separation of selection and influence effects is addressed, but also more detailed questions about where in the network these effects occur, to what degree they are substance-specific, and to what degree network autocorrelation is co-determined by other dynamic patterns such as trend, gender-based selection, or main tendencies of reciprocity and network closure. In the concluding section, the main results of the article are summarized, and the new method is put into perspective by hinting at further research areas in which we think the method can be fruitfully applied.

NETWORK AUTOCORRELATION AS AN EMPIRICAL PUZZLE

In the literature on the effects of peer groups in adolescence, one can find several very specific yet partly conflicting hypotheses about how friendship networks co-evolve with behavioral dimensions in general, and with socially harmful behaviors like tobacco use and alcohol consumption in particular. The underlying theories posit conceptually distinct, sometimes complementary mechanisms which, nonetheless, lead to similar cross-sectional patterns of network autocorrelation. A short overview is given in the following paragraphs. Taking this literature scan as panoramic background, a set of criteria (‘key issues’) is derived which an explanatory model for network autocorrelation should fulfill. The section ends with an evaluation of previous attempts at disentangling selection and influence, making use of these criteria. By then, the ground should be prepared for introducing our own modeling approach.

A panorama of theories, mechanisms, and evidence

Arguments from socialization theory stress the importance of structural cohesion for creating behavioral homogeneity in a group (OLSON 1971, HOMANS 1974). These ideas were applied to drug use and deviance among adolescents in a series of papers by OETTING and co-workers (OETTING & BEAUVAIS 1987, OETTING & DONNERMEYER 1998). They hypothesize that “the strength of the bond between the youth and the primary socialization sources is a major factor in determining how effectively norms are transmitted” and that “the major source of deviant norms is usually peer clusters” (1998, p.995). In short, the claim is that adolescents are influenced by their peers, and that cohesion facilitates this social influence. In a study on peer delinquency, HAYNIE (2001) analyzed the first wave of the Add Health survey data to investigate network autocorrelation of delinquent behavior. Using an ego-centered measure of network density, she concluded that density is a strong moderator of the delinquency-peer association, network autocorrelation being stronger in the denser regions of the network. However, due to the cross-sectional nature of the data, the study was unable to draw a firm conclusion concerning the underlying mechanisms of influence and selection. HAYNIE provides

selection – pointing more generally to a behavior-specific strength of influence and selection processes that we also aim to address.

For the dynamics of alcohol use, peer influence seems to play a strong role (NAPIER, GOE & BACHTEL 1984, OETTING & BEAUVAIS 1987, ANDREWS, TILDESLEY, HOPS & LI 2002). On the smoking dimension, however, the role of peer influence is more disputed than on the alcohol dimension. By comparing two studies, FISHER & BAUMAN (1988) found that influence played a relatively stronger role in the dynamics of alcohol consumption than in the dynamics of smoking. Even more pronounced were the results by ENNETT & BAUMAN (1994), who – in possible contradiction to the influence paradigm – found that smoking behavior predominantly occurs outside cohesive groups, among adolescents that either are isolated in their peer group or weakly attached to the more cohesive parts of the peer group. This could mean that cohesive peer pressure works in the opposite direction, and that in their particular study, peer clusters effectively enforced non-smoking. ENNETT & BAUMAN, however, prefer to explain their discovery as resulting more from selection effects than from influence: “selection provides a more likely explanation than influence for smoking by these adolescents” (p.661).

While the literature thus promotes the selection mechanism as the better explanatory model for similarity of smoking behavior among friends, it is also widely acknowledged as affecting alcohol consumption – e.g., in their study on alcohol, FISHER & BAUMAN (1988) found evidence for alcohol-based selection, but this evidence was weaker than the smoking-based selection effect found in their other study. Presupposing the importance of alcohol-based selection in friendship formation, a strong tradition of research focuses on the assessment of family and personality determinants for associating with drinking peers (ELLIOTT, HUIZINGA & AGETON 1985, THORNBERRY & KROHN 1997). This illustrates the larger point that to understand substance use related selection processes, it may be necessary to control for a host of other variables. Whether it is substance use itself or rather the correlates of substance use that determine friendship formation is still an open question, and existing studies of such processes may have unduly diagnosed substance use based selection where in fact other types of friendship formation were operating. This may be notably the case in the literature on group formation processes, which stresses the importance of homophily as a determinant of network structure (MCPHERSON ET AL. 2001). Recent contributions to this research area suggest that homophily might neither be the only nor necessarily the strongest determinant of selection. For example, ROBINS & BOLDERO (2003) suggested that perceivable differences between members of a group might be a major reason for a group hierarchy to emerge. Differences in substance use within a cohesive group might, in this way, foster group structure rather than lead to group disintegration (which a pure homophily based theory would predict). Homophily, on the other hand, might work more strongly for actors who are not (yet?) cohesively embedded in the network.

As can be seen from this brief overview, multiple theoretical accounts have been advanced for explaining network autocorrelation, both for tobacco use and alcohol consumption. Selection and influence seem to occur on both dimensions (see also KIRKE 2004), in different ways for the two behaviors, and there is reason to suspect that other processes, not directly related to substance use, also play a role in determining network autocorrelation. In our own study, we will separate selection and influence mechanisms on both behavioral dimensions and show how much either mechanism contributes to the observed amount of network autocorrelation. Moreover, by analyzing smoking and alcohol consumption data on the same network, we will show how both processes are related to each other, and offer suggestions on how to interpret some of the results of earlier research. Our analysis is certainly not the first attempt to simultaneously assess selection and influence, and determine the relative strength of each process. However, it differs from previous approaches by its statistical rigor, and the aim to achieve a methodologically sound separation of selection and influence effects by employing a model that explicitly represents the mutual dependence between network and behavior. In the following sections, we will provide reasons why previous, similar attempts cannot be considered trustworthy.

Key issues and a typology of previous approaches

Only a couple of the studies mentioned in the overview above tested the competing theories against each other. The earliest publications on this topic seem to be the articles by COHEN (1977) and KANDEL (1978), which represent two of the three major previous approaches to the study of network autocorrelation that we propose to distinguish here. These are the contingency table approach (KANDEL 1978, BILLY & UDRY 1985, FISHER & BAUMAN 1988), ad-hoc social network analysis (COHEN 1977, ENNETT & BAUMAN 1994, PEARSON & WEST 2003, KIRKE 2004) and structural equation modeling (KROHN, LIZOTTE, THORNBERRY & MCDOWALL 1996, IANNOTTI, BUSH & WEINFURT 1996, SIMONS-MORTON & CHEN 2005, DE VRIES, CANDEL, ENGELS & MERCKEN 2006). In the following, these approaches will be briefly characterized and evaluated against three key issues that are fundamental for the separation of selection and influence effects. These key issues are (1) incomplete observations, implied by the use of panel data while the underlying evolution processes operate in continuous time; (2) the control for alternative mechanisms of network evolution and behavioral change, needed to avoid misinterpretation in terms of selection and influence; and (3) network dependence of the actors, which precludes the application of statistical techniques that rely on independent observations.

To motivate why these are key issues, a consideration of data format requirements for the task at hand is opportune. It is obvious that longitudinal data are necessary. But which type of data exactly, and how does the choice for a specific type of data relate to the objective of separating selection from influence? If the whole process of network evolution and adjustment of behavior were traced in continuous time, little ambiguity would be left about whether selection or influence occurs at any given moment: network changes give evidence of selection processes, behavior changes indicate influence processes. Unfortunately, panel data, measured at only a few discrete time points, are the longitudinal standard format in sociological studies, and social network research is no exception to this rule. The incompleteness of panel data makes it impossible to unequivocally identify which process is responsible for an observed change, even if only the network or only the behavior changes from one observation to the next – simply because a change on the other dimension may have happened and then been reversed during the same period. The two columns on the left in Figure 1 illustrate such situations. Let us take a look at the middle column and suppose that a pair of pupils is observed at moments t0 and t1. At both moments, they are non-drinkers, but while they are unconnected at t0, there is a unilateral friendship tie between them at t1. At first sight, one might diagnose a pattern of homophile selection. However, the unobserved process that generated these data may have looked fundamentally different, as illustrated in the brackets. A while after the observation at t0, the actor on the right may have started drinking, say, because he did not have any friends. The actor on the left may have noticed that and started a therapeutic friendship with the new drinker. Under these circumstances, the drinker quit drinking again. Only now, the network is observed again at t1. In this scenario, the processes actually happening have nothing to do with homophile selection, and to diagnose the observations as unequivocal evidence for it is plainly wrong. Nonetheless, literally all studies on the topic that we are aware of commit this error. As the example illustrates, alternative mechanisms of network formation as well as behavior change need to be controlled for in order to preclude such misinterpretation. A similar scenario, sketched in the left of Figure 1, illustrates how homophile selection (taking place shortly before observation moment t1) can be misdiagnosed as the occurrence of social influence (the default interpretation of the observed data when the happenings in the brackets are neglected). The longer the time intervals are between observations, the higher the chances that such alternative trajectories happen. In the studies on adolescent behavior mentioned, time intervals of one year are the rule – while scenarios as sketched in Figure 1 can reasonably be assumed to take place within a few months. The use of retrospective questions for assessing the particular relationship’s history (KIRKE 2004) in principle could remedy this predicament. However, retrospective social network information is rare, and it moreover is notorious for its reliability problems (BERNARD, KILLWORTH, KRONENFELD & SAILER 1985) such that this practice cannot be recommended.
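The identification problem described for the middle column of Figure 1 can be made concrete with a small sketch. It encodes two hypothetical continuous-time trajectories for one dyad and shows that a two-wave panel records them identically. The state encoding, the change times and the function name are assumptions made for illustration only.

```python
def observe(trajectory, observation_times):
    """Return the dyad state in force at each observation moment.

    trajectory: list of (change_time, state) pairs sorted by time; a state
    holds from its change_time until the next change.
    state: (tie_left_to_right, left_drinks, right_drinks), all 0/1.
    """
    observed = []
    for t in observation_times:
        current = None
        for change_time, state in trajectory:
            if change_time <= t:
                current = state
        observed.append(current)
    return observed


# Trajectory A: the tie is simply created between two non-drinkers.
direct_selection = [
    (0.0, (0, 0, 0)),   # t0: no tie, both non-drinkers
    (0.6, (1, 0, 0)),   # left actor befriends right actor
]

# Trajectory B: the hidden sequence described in the text.
hidden_sequence = [
    (0.0, (0, 0, 0)),   # t0: no tie, both non-drinkers
    (0.2, (0, 0, 1)),   # right actor starts drinking
    (0.5, (1, 0, 1)),   # left actor starts a friendship with the drinker
    (0.8, (1, 0, 0)),   # right actor quits drinking again
]

panel_waves = [0.0, 1.0]  # hypothetical observation moments t0 and t1
print(observe(direct_selection, panel_waves))
print(observe(hidden_sequence, panel_waves))
```

Both calls print the same pair of observed states, so the panel data alone cannot tell the two generating processes apart.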

> Insert Figure 1 about here. <

It should be noted that the problem of alternative generating mechanisms is not limited to situations where the data are incompletely observed. In the column on the right of Figure 1, the newly created tie could result from homophile selection (and indeed would be unequivocally diagnosed as such by all previous approaches in the literature). However, it also could result from a mechanism known to play a strong role in friendship formation, namely triadic closure. Having a common friend at t0 may be the reason why at t1, a tie is established between the two previously unrelated actors. The message is that even if we can assume that no unobserved changes have taken place, there still is interpretative leeway concerning the mechanisms responsible for a given observed change. Controlling for such mechanisms as far as possible is a criterion that previous research largely has failed to address.
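The interpretative leeway just described can be sketched as a simple check. For a tie that is newly observed at t1, the hypothetical helper below lists which of the two mechanisms named in the text are compatible with the network and behavior at t0. Treating any actor tied to both ego and alter, in either direction, as a "common friend" is an assumption of this sketch, as are all names and the data format.

```python
def candidate_explanations(ties_t0, behavior_t0, ego, alter):
    """List mechanisms compatible with a new tie ego -> alter observed at t1.

    ties_t0:     set of (sender, receiver) pairs present at t0.
    behavior_t0: dict mapping each actor to a behavior category at t0.
    """
    def contacts(actor):
        # Everyone tied to `actor` at t0, ignoring tie direction.
        outgoing = {r for s, r in ties_t0 if s == actor}
        incoming = {s for s, r in ties_t0 if r == actor}
        return outgoing | incoming

    explanations = []
    if behavior_t0[ego] == behavior_t0[alter]:
        explanations.append("homophile selection")
    common_friends = (contacts(ego) & contacts(alter)) - {ego, alter}
    if common_friends:
        explanations.append("triadic closure")
    return explanations


# Hypothetical triad: a and b are both non-drinkers and share the friend c.
ties_t0 = {("a", "c"), ("c", "b")}
behavior_t0 = {"a": "non-drinker", "b": "non-drinker", "c": "drinker"}
print(candidate_explanations(ties_t0, behavior_t0, "a", "b"))
```

When both labels are returned, the observed tie creation cannot be attributed to homophily unless the model also controls for closure.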

Besides the temporal aspect of data collection, the cross-sectional design is also of importance for the prospect of distinguishing selection from influence effects. There are two general types of social network studies, one being the ego-centered network studies, in which for a random sample of individuals, the network neighbors and their properties are assessed. The other type are the complete network studies, in which for a given set of actors (the egos), all relational links in this set are assessed. For the present purpose, the collection of ego-centered network data is inadequate because when collected in a panel study, such data usually refer to different relational partners over time (the alters), while nothing is known about other, potential relational partners that were not selected. Due to this incompleteness, a meaningful assessment of selection processes is impossible. For adequately measuring selection effects, therefore, a meaningful approximation of the set of potential relational partners must be made, whose individual properties must be known irrespective of whether they actually become partners or not. In studies of complete networks, these data are available for all actors in the network. However, this information comes at the price of dependence of observations, which rules out the application of the common statistical procedures, as these rely on independent observations. Depending on the exact nature of the data, analyses that ignore this dependence can be biased towards conservative as well as liberal testing (KENNY & JUDD 1986, BLIESE & HANGES 2004), and if possible should be avoided.

An assessment of previously used analytical methods

There are earlier attempts to separate selection effects from influence effects, which above we categorized in three main groups: modeling frequencies in a contingency table, ad-hoc applications of social network analysis, and structural equation modeling. Here, we will briefly characterize these methods, in this order, and highlight the degree to which they meet the requirements on the three key issues introduced.

One of the earliest studies attempting to assess the relative strength of selection and influence mechanisms in longitudinal network data is KANDEL’s (1978) study of high school friendship networks co-evolving with four behavioral dimensions (marijuana use, educational aspirations, political orientation, and delinquency). Prototypical for the contingency table approach, dyads of mutually-chosen best friends were cross-tabulated according to whether or not the two pupils’ friendship remains stable between first and second measurement, and whether or not their behavior falls in the same (binary) category. Influence and selection were assessed, for each behavior, in two separate analyses: influence was assessed by studying the subsample of respondents who named the same best friend in both waves, while selection was assessed on the subsample of changing friendship ties. For both types of analyses, probabilities of change towards a behaviorally homogeneous friendship were calculated, and based on these probabilities, predictions were generated for the whole sample. The dyadwise joint distribution of model prediction and actual data then was aggregated into a φ-coefficient, for which significance levels were reported under the assumption of dyadic independence. The analyses presented by FISHER & BAUMAN (1988) and BILLY & UDRY (1985) follow similar analytical strategies.
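For readers unfamiliar with the φ-coefficient, the following sketch computes it for a 2×2 cross-tabulation of predicted versus observed dyad outcomes using the standard formula. The cell counts are invented for illustration and are not taken from any of the studies cited; the function name is likewise hypothetical.

```python
import math


def phi_coefficient(table):
    """Phi coefficient of a 2x2 contingency table [[a, b], [c, d]].

    Rows: model prediction (homogeneous / not homogeneous dyad).
    Columns: observed outcome (homogeneous / not homogeneous dyad).
    """
    (a, b), (c, d) = table
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


# Hypothetical counts of dyads: 30 predicted and observed homogeneous, etc.
dyad_table = [[30, 10],
              [10, 30]]
print(phi_coefficient(dyad_table))
```

Any significance test attached to such a coefficient treats the dyads as independent draws, which is the assumption the text flags as problematic for network data.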

KANDEL’s study was a seminal contribution. Together with the parallel work by COHEN (1977), it opened up the discussion on the determinants for network autocorrelation, and it expounded some methodological issues of the task to separate selection and influence processes on empirical grounds, such as the necessity to study longitudinal data, and the explicit admission of problems with the applied statistical methods due to network dependence of observations. The two other ‘key issues’ introduced above, though, remained unaddressed. The issue of incompletely observed data in this panel setup puts a strong question mark behind the results. The generalization of the subsample-based findings to the whole data is dubious, as friendship and substance use can change between observations, potentially affecting the composition of the subsamples. There is no statistical basis on which the assessed effects of influence and selection could be generalized to the whole sample, let alone the population of friendship networks. The issue of alternative generating mechanisms, however, could in principle be addressed within the approach. Due to data limitations in this particular study, this can be done only by adding actor-level properties to the model. Triad-level effects cannot be addressed, because respondents were asked to name only one friend, naturally limiting the analyses to dyads.

As examples for what we call the ad-hoc social network analysis approach, let us consider the studies by ENNETT & BAUMAN (1994), PEARSON & MICHELL (2000) and PEARSON & WEST (2003). They rely on output from the NEGOPY software (RICHARDS 1995), which categorizes respondents into the four sociometric positions ‘group member’, ‘peripheral’, ‘liaison’ and ‘isolate’, and all three studies focus on smoking behavior (augmented with cannabis use in the PEARSON papers). The pre-processing of the network is typical for the studies we summarize in this approach, with all the problems associated with the arbitrariness in the choice of the particular pre-processing algorithm. COHEN (1977), for instance, relies on a definition of sociometric groups proposed by COLEMAN (1961), while KIRKE (2004) relies on the identification of weak components provided by the GRADAP software (SPRENGER & STOKMAN 1989). The different options available at this pre-processing stage are manifold, and their consequences are not well understood.

ENNETT & BAUMAN analyzed their pre-processed data by techniques similar to those of KANDEL, basically an extension of the contingency table approach that allows distinguishing sociometric positions. They are added as ‘independent’ variables to the predictor equations representing selection and influence for the subsamples of dyads used in the contingency table approach. PEARSON and colleagues offer an alternative perspective on how respondents’ sociometric position and (binary) substance use co-evolve over time. By fitting a continuous-time Markov model to their pre-processed data, expected sojourn times in each of the states (sociometric position × substance use) were calculated. Short transition times associated with peripheral positions indicated the possibility of greater behavioral instability among such individuals, while longer transition times of isolate risk-takers as compared with isolate non-risk-takers indicated that substance use among isolates could appear to be more prevalent in a cross-sectional study. For these studies, the reliance on NEGOPY output implies that network positions are used as if they were exogenously determined actor attributes. Further mutual interdependencies in the network structure, or the specific identity of the peers, are not taken into account. Statistical methods are used based on independence assumptions which clearly are erroneous, so that the studies cannot establish a firm statistical conclusion concerning processes of influence and selection. To illustrate, let us consider the prevalent dynamic pattern diagnosed by PEARSON & WEST (2003), the transition from group non-risk-taker to group risk-taker. At first sight, one might read this transition as an indicator for peer influence. This interpretation, however, may be unfounded for at least two reasons. First, the ‘group’ referred to may be different in the two observations (in fact, the data indicate strong friendship dynamics and instability of such ‘groups’), which allows for selection effects to play a role in this transition as well. Second, it is not clear from the NEGOPY output whether the ‘group’ referred to at either time point consists of a majority of risk-takers, or how it is composed otherwise (though the ‘groups’ are, in fact, fairly homogeneous concerning substance use behavior). When not controlling for these peer group characteristics, any firm diagnosis of adaptation to peers is precluded.
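To indicate what an expected sojourn time is in a continuous-time Markov model, the sketch below uses the textbook result that the time spent in a state is exponentially distributed with mean equal to one over the total rate of leaving that state. The three states and all transition rates are invented for illustration; they are not estimates from the studies discussed, and the sketch makes no claim about how those studies fitted their model.

```python
def expected_sojourn_times(rates):
    """Expected sojourn time per state of a continuous-time Markov chain.

    rates[i][j] is the transition rate from state i to state j (i != j);
    diagonal entries are ignored. A state with no exits is absorbing and
    gets an infinite expected sojourn time.
    """
    times = []
    for i, row in enumerate(rates):
        exit_rate = sum(rate for j, rate in enumerate(row) if j != i)
        times.append(float("inf") if exit_rate == 0 else 1.0 / exit_rate)
    return times


# Hypothetical states and yearly transition rates.
states = ["group non-risk-taker", "peripheral risk-taker", "isolate risk-taker"]
rates = [
    [0.0, 0.5, 0.5],   # leaves at total rate 1.0 per year
    [2.0, 0.0, 2.0],   # leaves at total rate 4.0 per year
    [0.0, 0.0, 0.0],   # never leaves in this toy example
]
for state, time in zip(states, expected_sojourn_times(rates)):
    print(state, time)
```

States with long expected sojourn times accumulate more actors at any given moment, which is why a behavior concentrated in such states can look more prevalent in a cross-section than the underlying transition rates would suggest.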

Another example, with a different ad-hoc social network analysis strategy followed, is KIRKE (2004), who assessed friendship network data for the whole adolescent population of a district division in Dublin. She analyzed patterns of substance use observed by means of retrospective self-report data, collected at one time point, about the history of friendship formation and substance use. In the main analysis of observed network autocorrelation, the data were first pre-processed with the GRADAP software (SPRENGER & STOKMAN 1989) in order to retain a small number of meaningful sub-networks (“chains”). These data were further reduced to those dyads in which friendship existed and in which both friends used the substance at the time of measurement. Quite ingeniously, peer influence was diagnosed when friendship preceded this substance use, while selection was diagnosed when substance use preceded friendship formation. Appealing as it is, the quasi-longitudinal setup of the analysis introduces several additional methodological problems. First, its retrospective self-report data format is known to have considerable reliability problems (BERNARD ET AL. 1985). Second, the reduction of the data to presently existing homophile dyads biases the analyses. On the one hand, the impact of former friendship ties, which disintegrated before the study, is not assessed. Because again, structural mechanisms like transitivity were not controlled for, this may have led to spurious diagnoses of selection while actually, a former friend exerted peer influence (as illustrated in the middle column of Figure 1). On the other hand, selection patterns are not fully assessed when not also studying those dyads in which friendship ties could have formed, but never did. A positive feature is that several types of peer influence are distinguished in more detail – e.g., the role of the substance supplier, of friends and of other peers in initiating substance use. As a whole, KIRKE’s study is very appealing in its explorative character, but cannot provide statistical conclusions about the strength of influence and selection effects.

More generally, the studies applying ad-hoc social network techniques can best be understood as exploratory studies, in which a host of relevant network concepts are related to the study of selection and influence patterns. The results are interesting, but remain as idiosyncratic as the choice of the algorithms used for pre-processing the network. The ‘key issues’ of incomplete observations and network dependence of observations remain out of scope, but a few alternative generating mechanisms (such as main effects of sociometric position on behavior) can, under this approach, be included in the analyses.

The use of the third generation of modeling approaches we distinguish here, structural equation models, perhaps gets closest to a statistical separation of selection and influence effects. An early example is KROHN ET AL.’s (1996) study on the role of peer groups on drug use. The method is applied to the analysis of self-reported drug use and perceptions of peer drug use, where peers are not individually identified but summarized as ‘your group of friends’ in the questionnaire. The data were measured in five waves of a stratified sample panel in Rochester, New York, covering a two year period from grade 8 to 10. In a ‘cross-lagged’ model specification, they estimate direct effects of previous-wave ego drug use on current-wave perceived peer drug use, and previous-wave perceived peer drug use on current-wave ego drug use (the latter effect was complemented with an indirect effect via expected peer reactions to hypothetical ego drug use). In this setup, the estimated path coefficient from ego drug use to peer drug use is taken as an indicator for selection effects, while the coefficient for the path from peer drug use to ego drug use is taken as a measure of peer influence. Their study suffers from a series of shortcomings, some of which can be (and recently have been) remedied inside the structural equation approach, while others cannot. The ‘remediable’ part in the first place is the issue of peer identity, mentioned above in the discussion of the NEGOPY-inspired use of summary measures for network structure. If the structural equation approach were coupled with the collection of (ego-centered) network data instead of summary reports on ego’s friends, and if a distinction were made between old friends and new friends (as in FISHER & BAUMAN’s 1988 study), the interpretation of path coefficients in terms of selection and influence could be correct at least conceptually. In fact, recent studies by SIMONS-MORTON & CHEN (2005) and DE VRIES ET AL. (2006) improved on these flaws. The results on selection and influence obtained by this method are more reliable than those obtained by the contingency table approach because now, both effects are assessed in the same analysis, controlling one for the occurrence of the other. The non-remediable issues of concern about applying structural equation models, which remain also in the recent applications, are related to the ‘key issues’ of incomplete observation, alternative generating mechanisms, and the interdependence of observations. Concerning the latter, structural equation models are known to be sensitive to violations of model assumptions, which include independence of observations. So, when applying these techniques to complete network data, it remains unclear to what degree the results can be trusted. Concerning the issue of incomplete observation of the actual trajectories of change, one needs to consider that estimated path coefficients directly link the observed variables to each other, with temporal order being used as additional information useful for causal interpretation of the results – the models thus are not capable of expressing trajectories of temporal development, and this way they cannot tackle the problems implied by incomplete observations in network panel data. The related issue of alternative mechanisms that operate in-between observations and in parallel to influence and selection processes also is difficult to handle in structural equation models. This primarily concerns the insufficient control for structural, network-endogenous effects on friendship formation (like transitive closure). The inclusion of structural properties on the actor level into the discrete-time structural equation modeling framework in principle is possible, but to our knowledge never has been tried. Considering the other problematic aspects of such modeling, it also may not be worthwhile attempting it.

As the different studies illustrate, most analytical strategies follow a two-stage procedure for analyzing their data. In the first stage, the network data are collapsed into individual-level variables (e.g., local density, centrality, indicators of group position) or dyad-level variables (behavioral homogeneity), which in the second stage figure as variables in more conventional analyses (as dependent variables for assessing selection effects, and as independent variables for assessing effects of social influence). The shortcomings of such approaches are related to the ‘key issues’ listed above. The stage of collapsing networks into individual- or dyad-level data is arbitrary and does not do full justice to the structural aspect of evolving networks. The use of such collapsed variables artificially freezes their values at the last preceding observation, which negates their endogenous nature and inhibits the study of potentially important feedback mechanisms, while retaining the problems relating to incompleteness of observations. Due to the also retained problem of non-independence of actors and dyads, such a procedure moreover does not deliver data that would meet the requirements of the statistical procedures applied in the second-stage analyses. These secondary analyses accordingly must be viewed under the additional, strong and often unwarranted assumption of conditional independence, given the results of the data reduction procedure in the first step.

A NEW APPROACH: MODELING THE CO-EVOLUTION OF NETWORKS AND BEHAVIOR

Let us first recapitulate some requirements that would have to be met by a more suitable model for analyzing network-behavior co-evolution. First, the model must be able to express the simultaneously operating effects of the network on the behavior of the actors, and of the behavior on the network. Second, and also implied by the first requirement, the model must account for the unobserved changes that occur in between the observation moments. Third, the interdependence of actors in the network needs to be taken into account. A basic type of such interdependence is the dependence of all ties involving one given actor, which illustrates the inadequacy of analyses based on collections of hypothetically unrelated dyads. In order to keep track of these interdependencies, collapsing the network into a vector of summary characteristics per actor or per dyad is inadequate; rather, the evolution of the complete network-behavior data structure should be modeled as a complex whole.

This is achieved by the method proposed by Snijders et al. (2006), which is an extension of earlier modeling work by Snijders (2001, 2005) for networks without co-evolving behavioral dimensions. The process of network-behavioral co-evolution is modeled here as an emergent group level result of behavioral changes occurring for single actors, and network changes occurring for pairs of actors. The model assumes that changes may occur continuously between the observation moments. Handling the dynamic mutual dependence of the network ties and the individual behavior requires a process model that specifies these dependencies in a plausible way. Specifying this as an actor-based model makes intuitive sense in a lot of applications, as it is in line with extant theories of purposeful actors who act in the context of a social network. E.g., for the study of friendship networks, taking the network actors as the foci of modeling seems natural, as commonly invoked mechanisms of friendship formation (like homophily, reciprocity or transitive closure) are traditionally formulated and understood as forces operating at the actor level, within the context of the network; the same holds for mechanisms of behavioral change (like social influence). Modeling these changes in an actor-based framework implies that actors are assumed to “make” the change, by altering either their outgoing network ties or their behavior. The central model components will be the actors’ behavioral rules determining these changes.


Action rules and occasions to act

Some assumptions need to be made in order to retain a tractable model. While we focus on the analysis of data measured at discrete time points, we assume that in the underlying dynamic process, changes in network ties and behavior happen in continuous time, at stochastically determined moments. This allows us to tackle the ‘key issue’ of unobserved changes. Distinguishing between the network changes of an actor and his behavior changes, we rule out the possibility that changes in network ties and in actor behavior, or changes by two different actors, occur at precisely the same time point. An example for such forbidden simultaneous changes would be binding contracts of the type “when you start smoking, I’ll become your friend.” While such bargaining is not impossible, we will here model it as two subsequent changes, the connection of which cannot be enforced. Given the present application of the model to the evolution of substance use and friendship ties, such an assumption seems reasonable – in other applications, it could be relaxed. The compound change that is observed between two observations thus is interpreted as resulting from many small, unobserved changes that occurred between the observation moments. The assumption that at any given moment, not more than one tie variable or one behavior variable can change, enables us to keep the rules that govern actors’ behavior relatively simple, relieving us from the burden of explicitly modeling the totality of changes between two measurements all at once (an advantage put forward already by COLEMAN, 1964). Here, this assumption provides an elegant and simple way of expressing the feedback processes inherent in the dynamic process, where the currently reached state is also the initial state for further developments, and where the probabilities for specific changes can depend, in perhaps complicated ways, on the entire current network-behavior configuration. There is a cost to this approach, however. Because we cannot know which precise trajectory of small changes happened from one observation to the next, we have to rely on data augmentation procedures and simulation-based inference for estimating our models. Spelling out a probability model for all possible trajectories between the observed states allows such inference, and it becomes possible to infer effect sizes of various mechanisms operating in the process, and test hypotheses about them. So, the ‘key issue’ of alternative generating mechanisms can be addressed adequately. The first observation in a to-be-analyzed panel data set is not modeled but conditioned upon, i.e., the starting values of the network ties and the initial behavior are taken for granted. This implies that the evolution process is modeled without contamination by the contingencies leading to the initial state, and that no assumption of a dynamic equilibrium needs to be invoked. For changes of network as well as behavior, we now proceed to modeling the temporal occurrence of opportunities for the different types of changes, and the rules of change followed by the actors, once they face such an opportunity.

TABLE 1

SCHEMATIC OVERVIEW OF THE MODEL COMPONENTS

                      occurrence                  rule of change
network changes       network rate function       network objective function
behavioral changes    behavioral rate function    behavioral objective function

These model components, summarized in Table 1, will be sketched in a formal probabilistic operationalization in the subsequent paragraphs, using the application to substance use in high school as an illustration. Formally, the model is a continuous time MARKOV process, where the totality of possible combinations of network ties and actor behavior figures as the state space. While the model in principle is equipped for analyzing the co-evolution of multiple dimensions of networks and behavior, let us – for ease of presentation – consider the case of one network variable X and one dependent actor variable Z only (in the empirical section, we will give an example with two behavioral dimensions). In the following, first some notational conventions are introduced, and then the formal model is sketched. For a much more detailed mathematical account of our model, we refer the reader to SNIJDERS ET AL. (2006).


Notation and data requirements

For formally introducing our model, we make use of the following notation. The network is assumed to be based in a group of N actors – e.g., business firms active in the same period in the same industry, or a cohort of pupils at the same school. The network is denoted by x, where $x_{ij}(t)$ stands for the value of the directed relationship between actors i and j at time point t. Examples for such relational variables are share ownership between business firms or friendship between the pupils of a year group. We further assume that x is dichotomous, i.e., $x_{ij} = 1$ stands for presence of a tie and $x_{ij} = 0$ stands for absence. Next, let z denote the behavioral variable, with $z_i(t)$ standing for the score of actor i at time point t. Examples here are the activity in a given market segment of a business firm, or the smoking behavior of pupils. We assume that behavioral dimensions are measured on a discrete, ordinal scale represented by integer values (including dichotomous scales). Finally, let v and w denote actor-level and dyad-level exogenous covariates, respectively (for ease of presentation here assumed to be constant over time), with $v_i^{(k)}$ standing for the score of actor i on actor covariate k, and $w_{ij}^{(k)}$ standing for the dyadic covariate k measured for the pair (ij). Typical actor covariates are gender, age or education of an employee or a pupil, or number of employees of a business firm. Examples for dyadic covariates could be the geographical distance of business firms, an exogenously prescribed hierarchical relation between employees, or a classmate relation between pupils in a year group.

We consider the case of network-behavior panel data, where, instead of being observed over some continuous time interval, the network and behavioral data are collected for a finite set of time points only (say, $t_1 < t_2 < \dots < t_M$). The number of waves M must be at least two. In the following, the data are indicated by lowercase letters (networks $x(t_1), \dots, x(t_M)$, behavior $z(t_1), \dots, z(t_M)$, etc.), while the stochastic model components (of which these data are assumed to be realizations) are indicated by uppercase letters (network model $X(t)$ and behavioral model $Z(t)$). Note that the formal model itself will describe network evolution in continuous time, notwithstanding the fact that it is used for the analysis of observations at discrete time points. The formal model is obtained by spelling out the submodels indicated in Table 1, and by integrating them into the overall model. Although the objective functions are the most important model component, for ease of presentation we first explain the model for occurrence of changes.

Modeling opportunities for change

The assumption was already mentioned that at any single moment, only one tie variable or one behavioral variable may change. More specifically, it is assumed that at stochastically determined moments, one actor gets the opportunity to change one of his/her outgoing tie variables, or to change his/her behavior. Such opportunities for change are called micro steps. It also is allowed that the actor does not make a change but leaves things as they are. The frequency by which actors have the opportunity to make a change is modeled by rate functions, one for each type of change. The main reason for having separate rate functions for the behavioral and the network changes is that practically always, one type of decision will indeed be made more frequently than the other. In information flow networks, one can expect that the actors’ individual properties (here: knowledge states) change much more quickly than their network ties. In group formation processes, where the behavioral dimensions may represent attitudes, the opposite may be true. In the application to substance use and friendship at high school, one would expect quicker changes in the network than in substance use, caused by (a) the addictive nature of substance use and (b) the students’ social orientation phase in adolescence.

Formally, the first observations of network ties $x(t_1)$ and behavior $z(t_1)$ serve as starting values of the evolution process – i.e., they are not modeled themselves, but conditioned upon, and only the subsequent changes of network ties and behavior are modeled. The timing of the micro steps is modeled by the following stochastic process. For each actor i and for network and behavioral changes alike, we model the waiting time until actor i takes a micro step by exponentially distributed variables $T_i^{net}$ and $T_i^{beh}$ with parameters $\lambda_i^{net} > 0$ and $\lambda_i^{beh} > 0$, i.e., the waiting times are distributed such that $\Pr(T > t) = \exp(-\lambda t)$ for all $t > 0$. The parameters of these distributions indicate the rate (or speed) at which the respective change is likely to occur; the expected waiting time is $1/\lambda$. Exponential waiting times are a standard assumption for this type of stochastic processes. Since actual waiting times between changes are not observed, more complicated modeling is unwarranted. It is further assumed that all waiting times are independent, given the current state of network and behavior. Properties of the exponential distribution imply that, starting from any given moment in time (e.g., the time when the preceding micro step occurred), the waiting time until occurrence of the next micro step of either kind by any actor is exponentially distributed with parameter $\lambda_{total} = \sum_i (\lambda_i^{net} + \lambda_i^{beh})$. The probability that this is a network micro step taken by actor i is $\lambda_i^{net} / \lambda_{total}$, and the probability that it is a behavioral micro step taken by actor i is $\lambda_i^{beh} / \lambda_{total}$.
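The timing rule just described can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function name `next_micro_step` and the list-based representation of the rate parameters are assumptions made purely for illustration.

```python
import random


def next_micro_step(rates_net, rates_beh, rng):
    """Draw the waiting time and the identity of the next micro step.

    rates_net[i] and rates_beh[i] are the (positive) rate parameters of
    actor i for network and behavioral changes (hypothetical inputs).
    Returns (waiting_time, actor, kind) with kind "net" or "beh".
    """
    lam_total = sum(rates_net) + sum(rates_beh)
    # The minimum of independent exponential waiting times is itself
    # exponential, with the summed rate as its parameter.
    waiting_time = rng.expovariate(lam_total)
    # Each (actor, kind) pair is selected with probability rate / lam_total.
    options = [(i, "net") for i in range(len(rates_net))]
    options += [(i, "beh") for i in range(len(rates_beh))]
    weights = list(rates_net) + list(rates_beh)
    actor, kind = rng.choices(options, weights=weights, k=1)[0]
    return waiting_time, actor, kind


rng = random.Random(1)
print(next_micro_step([2.0, 2.0, 2.0], [1.0, 1.0, 1.0], rng))
```

With these illustrative rates, the total rate is 9, so the expected waiting time is 1/9 and about two thirds of all micro steps are network micro steps.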

There may be considerable heterogeneity in the activity of actors – some actors may change their network ties, or their behavior, more quickly than others. Such activity differences may be caused by individual properties (e.g., by gender differences) or by existing network structure (e.g., by the number of ties an actor already has). We can directly incorporate such activity differences between actors by allowing actor covariates and the current network positions to exert an influence on the rate functions by letting the parameters $\lambda$ depend on actor attributes and network positions, see SNIJDERS (2001, 2005). In this paper, however, we limit the discussion to model specifications where both types of rate functions are constant across actors and network positions, and depend only on the periods between panel waves.

Modeling mechanisms of change

What happens in a micro step is modeled as the outcome of changes made by the actors. Micro steps can be of two kinds, corresponding to network changes or behavioral changes. For network changes, the micro step consists of the change of one tie variable by a given actor. Say, x is the current network and actor i has the opportunity to make a network change. The next network state x' then must be either equal to x (if i chooses to keep the current situation) or deviate from x in exactly one element in row i (if the choice is to change the tie variable linking actor i to another actor). It is assumed that i chooses that value x' for which $f_i^{net}(x, x', z) + \varepsilon_i^{net}(x, x', z)$ is maximal, where z is the current vector of behavior scores, $f^{net}$ is a deterministic objective function that can be interpreted as a measure of the actor’s satisfaction with the result of the network decision (“what the actor strives for, behaviorally”), and $\varepsilon^{net}$ is a random disturbance term representing unexplained change. By making some convenient standard assumptions about the distribution of the random component (MCFADDEN 1974, PUDNEY 1989), the choice probabilities can be expressed in multinomial logit shape, as proportional to $\exp\left(f_i^{net}(x, x', z)\right)$.
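As a hedged illustration of the multinomial logit choice in a network micro step, the following Python sketch enumerates the options of actor i (no change, or toggling one outgoing tie) and normalizes the exponentiated objective function values. The function names and the callback signature `objective(i, x_new, z)` are assumptions for this sketch; in particular, the callback only sees the candidate new state, which presumes effects that do not depend on the previous state. The example objective uses an outdegree weight of -2.0 and a reciprocity weight of 2.5 as illustrative values.

```python
import math


def network_choice_probabilities(x, i, z, objective):
    """Multinomial logit probabilities for actor i's network micro step.

    x is an adjacency matrix (list of lists with 0/1 entries, zero diagonal).
    objective(i, x_new, z) is a hypothetical callback returning the value of
    the network objective function for a candidate new network x_new.
    Returns a dict: option -> probability, where option None means
    'no change' and option j means 'toggle the tie from i to j'.
    """
    n = len(x)
    scores = {}
    for j in [None] + [j for j in range(n) if j != i]:
        x_new = [row[:] for row in x]      # copy; x itself is not modified
        if j is not None:
            x_new[i][j] = 1 - x_new[i][j]  # change exactly one element of row i
        scores[j] = objective(i, x_new, z)
    top = max(scores.values())             # subtract the maximum for stability
    weights = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def example_objective(i, x_new, z):
    """Illustrative objective: outdegree (-2.0) plus reciprocity (2.5)."""
    outdegree = sum(x_new[i])
    reciprocity = sum(x_new[i][j] * x_new[j][i] for j in range(len(x_new)))
    return -2.0 * outdegree + 2.5 * reciprocity


x = [[0, 0, 0],
     [1, 0, 0],
     [0, 0, 0]]
print(network_choice_probabilities(x, 0, None, example_objective))
```

In this small example, actor 0 receives a tie from actor 1, so reciprocating that tie is the most probable option, keeping the current state comes second, and creating an unreciprocated tie to actor 2 is least probable.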

In a behavioral micro step, it is assumed that a given actor either increments or decrements his score on the behavioral variable by one unit, provided that this change does not step outside the range of this variable; it is also allowed that the score is not changed. The modeling is completely analogous to that of the network micro steps. If z is the current vector of behavior scores for all actors, and i is the actor allowed to change his behavior, let z' denote the vector resulting from an allowed micro step. It is assumed that i chooses that value z' for which $f_i^{beh}(x, z, z') + \varepsilon_i^{beh}(x, z, z')$ is maximal, where now $f^{beh}$ is a (different!) deterministic objective function that again can be interpreted as the actor’s satisfaction with the result of the behavioral decision, and $\varepsilon^{beh}$ again is a random disturbance term representing unexplained change. By making appropriate assumptions about the distribution of the random component, choice probabilities can also be expressed in multinomial logit shape.
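The behavioral micro step can be sketched in the same way. The following Python fragment is an illustration under stated assumptions, not the authors' code: the function names, the callback signature `objective(i, x, z_new)`, and the explicit scale bounds `z_min` and `z_max` are hypothetical. The example objective consists of a single behavioral tendency term with an illustrative weight of 0.5.

```python
import math


def behavior_choice_probabilities(x, z, i, z_min, z_max, objective):
    """Logit probabilities for actor i's options -1, 0 and +1.

    Steps that would leave the range [z_min, z_max] are excluded.
    objective(i, x, z_new) is a hypothetical callback returning the value
    of the behavioral objective function for a candidate vector z_new.
    """
    scores = {}
    for step in (-1, 0, 1):
        new_score = z[i] + step
        if z_min <= new_score <= z_max:
            z_new = list(z)            # copy; z itself is not modified
            z_new[i] = new_score
            scores[step] = objective(i, x, z_new)
    top = max(scores.values())         # subtract the maximum for stability
    weights = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def example_tendency(i, x, z_new):
    """Illustrative objective: behavioral tendency with weight 0.5."""
    return 0.5 * z_new[i]


x = [[0, 1], [1, 0]]
# Actor 0 sits at the minimum of a 1..3 scale, so only 0 and +1 are allowed.
print(behavior_choice_probabilities(x, [1, 2], 0, 1, 3, example_tendency))
```

Because 'no change' is always an admissible option, the set of options is never empty, even for actors at the boundary of the scale.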

The focus of modeling is on the deterministic parts, defined by the objective functions f. A high degree of flexibility is achieved by modeling these as linear combinations of effects that express the dependence of network and behavior on each other as well as on externally given variables. The term exogenous will be used for effects depending on such external variables, while endogenous effects depend on the current values of the dependent variables (networks and behavior). For network changes, the objective function has the general shape $f_i^{net}(x, x', z) = \sum_h \beta_h^{net} \, s_h^{net}(i, x, x', z)$, where statistics $s_h^{net}$ stand for the effects, weighted by parameters $\beta_h^{net}$ whose size is determined by fitting the model to the data. Analogously, the objective function for behavioral changes has the form $f_i^{beh}(x, z, z') = \sum_h \beta_h^{beh} \, s_h^{beh}(i, x, z, z')$. The statistics, or effects, s must be defined on substantive grounds, and are arbitrary from the point of view of mathematical modeling, although in practice it is an advantage that they are not too complicated computationally. The most important network and behavior effects do not depend on the previous states x and z but only on the new states x' and z', and their weights $\beta$ can be interpreted as the degree to which the actors have a tendency to change into a direction where the network-behavioral state has high values for these effects. A selection of possible endogenous network effects $s_h^{net}$ is given in Table 2, while a similar selection of effects $s_h^{beh}$ for behavioral changes is given in Table 3. These components are based on indicators of structural positions in networks that are of fundamental importance in social network analysis (WASSERMAN & FAUST 1994). The second column of these tables contains the formulae of the statistics that express the respective effects.

> Insert Tables 2 and 3 about here. <

In these tables, similarity of the behavioral scores of two actors i and j is defined as $sim_{ij} := 1 - |z_i - z_j| / range_Z$, where the range of behavioral scores $range_Z$ is defined as the maximum minus the minimum of observed values. By this definition, similarity is standardized to the unit interval, sim=0 indicating maximally dissimilar scores and sim=1 indicating identical (i.e., maximally similar) scores. The balance effect for network evolution contains an analogous measure of structural similarity $strsim_{ij} := \sum_h \left( b - |x_{ih} - x_{jh}| \right)$, where b is a parameter used for standardization (DAVIS 1963, MIZRUCHI 1993, LORRAIN & WHITE 1971). Furthermore, three effects of network position are operationalized for the current study by the functions isolate, peripheral and group, which we define as follows. Isolate(i) is an individual positional indicator, expressing that actor i has at most one incoming tie. Group(ijh) is a triadic indicator expressing that the three actors together form a triad in which at least five of the possible six ties are present. Peripheral(i; jhk) is a tetradic indicator, expressing that i is unilaterally attached to such a cohesive triad, but does not get a tie back. In formulae, we have

$$\mathrm{group}(ijh) := \begin{cases} 1 & \text{if } (x_{ij} + x_{ji} + x_{jh} + x_{hj} + x_{hi} + x_{ih}) \geq 5 \\ 0 & \text{otherwise,} \end{cases}$$

$$\mathrm{peripheral}(i; jhk) := x_{ij} (1 - x_{ji}) (1 - x_{hi}) (1 - x_{ki}) \, \mathrm{group}(jhk), \text{ and}$$

$$\mathrm{isolate}(i) := \begin{cases} 1 & \text{if } x_{+i} \leq 1 \\ 0 & \text{otherwise.} \end{cases}$$

$$\mathrm{between}(i; jh) := x_{ji} \, x_{ih} \, (1 - x_{jh})$$

Tables 2 and 3 can naturally only give a glimpse of the complexity and richness of modeling that becomes possible within the proposed framework. For example, when estimating a model with network effects of peripheral position and their interaction with similarity effects (rows 6, 12 and 13 of Table 2), one can study the attraction of cohesive subgroups to outsiders, and differentiate it according to the outsiders’ and the subgroup members’ average behavior – while taking into account that these groups are in constant flux themselves. Or, when estimating a model with behavioral effects of different group positions and their interaction with similarity (rows 2-4 and 9-10 of Table 3), one can differentiate actors’ susceptibility to social influence according to their position in the network. The range of research questions that can be analyzed this way will give rise to many more effects that cannot be covered here.
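The behavioral similarity measure and the positional indicators group, peripheral and isolate defined above translate directly into code. The following Python sketch is illustrative only; it assumes that the network is stored as a 0/1 adjacency matrix with zero diagonal, that `range_z` is positive, and that actors are indexed from 0.

```python
def similarity(z, i, j, range_z):
    """sim_ij = 1 - |z_i - z_j| / range_Z (assumes range_z > 0)."""
    return 1.0 - abs(z[i] - z[j]) / range_z


def group(x, i, j, h):
    """1 if at least five of the six possible ties among i, j, h are present."""
    ties = x[i][j] + x[j][i] + x[j][h] + x[h][j] + x[h][i] + x[i][h]
    return 1 if ties >= 5 else 0


def peripheral(x, i, j, h, k):
    """1 if i sends a tie to j, receives none from j, h, k, and (j, h, k) is a group."""
    return (x[i][j] * (1 - x[j][i]) * (1 - x[h][i]) * (1 - x[k][i])
            * group(x, j, h, k))


def isolate(x, i):
    """1 if actor i has at most one incoming tie."""
    indegree = sum(x[j][i] for j in range(len(x)) if j != i)
    return 1 if indegree <= 1 else 0


# Actors 1, 2, 3 form a fully connected triad; actor 0 nominates actor 1
# without receiving any tie in return.
x = [[0, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 0]]
print(group(x, 1, 2, 3), peripheral(x, 0, 1, 2, 3), isolate(x, 0))
```

In this toy network, actor 0 is both peripheral to the cohesive triad and an isolate in the sense of the definition, since the isolate indicator only counts incoming ties.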

Integration of model components

The total model for network-behavioral co-evolution consists of the first wave observations $x(t_1)$ and $z(t_1)$ as initial state of the stochastic process, the rates of occurrence of network or behavioral micro steps by specific actors as sketched above, and the choice probabilities for each possible micro step. As a whole, the model belongs to the class of continuous time Markov chains (e.g., NORRIS 1997). These components together suffice to specify the so-called intensity matrix, which is the mathematical characterization of the Markov chain process (see formula (13) in SNIJDERS ET AL., 2006).

The model is too complicated to allow for closed-form calculations of probabilities, expectations, etc. Direct ways of parameter estimation such as maximum likelihood are therefore not easily implemented. However, once tentative parameter values are assumed, the evolution model can be implemented as a stochastic simulation algorithm which can be used to generate network and behavioral data according to the postulated dynamic process. Then, parameter estimates can be determined as those values under which simulated and observed data resemble each other most closely. In statistical terminology, this is called the method of moments. The resemblance criteria are crucial for the estimation procedure, which is described in detail in SNIJDERS ET AL. (2006).
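To make the idea of a stochastic simulation algorithm concrete, the following Python sketch simulates one period of network-behavior co-evolution for given parameter values. It is a simplified illustration, not the estimation software: it assumes rate functions that are constant across actors, a network objective function with only outdegree, reciprocity and similarity effects, a behavioral objective function with only tendency and similarity effects, and a similarity measure standardized by the scale range `z_max - z_min` (which must be positive). All function and parameter names are hypothetical.

```python
import math
import random


def sim(z, i, j, range_z):
    return 1.0 - abs(z[i] - z[j]) / range_z


def f_net(i, x, z, beta, range_z):
    """Network objective of actor i: outdegree, reciprocity, similarity."""
    return sum(x[i][j] * (beta["out"] + beta["rec"] * x[j][i]
                          + beta["sim"] * sim(z, i, j, range_z))
               for j in range(len(x)) if j != i)


def f_beh(i, x, z, beta, range_z):
    """Behavioral objective of actor i: tendency, similarity to friends."""
    return beta["ten"] * z[i] + beta["sim"] * sum(
        x[i][j] * sim(z, i, j, range_z) for j in range(len(x)) if j != i)


def logit_choice(options, scores, rng):
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(options, weights=weights, k=1)[0]


def simulate_period(x, z, lam_net, lam_beh, beta_net, beta_beh,
                    z_min, z_max, rng, duration=1.0):
    """Run micro steps from state (x, z) for `duration` time units."""
    x = [row[:] for row in x]          # work on copies of the start state
    z = list(z)
    n = len(x)
    range_z = z_max - z_min
    lam_total = n * (lam_net + lam_beh)
    t = rng.expovariate(lam_total)
    while t < duration:
        i = rng.randrange(n)           # constant rates: actors equally likely
        if rng.random() < lam_net / (lam_net + lam_beh):
            # Network micro step: keep x or toggle one outgoing tie of i.
            options = [None] + [j for j in range(n) if j != i]
            scores = []
            for j in options:
                if j is not None:
                    x[i][j] = 1 - x[i][j]
                scores.append(f_net(i, x, z, beta_net, range_z))
                if j is not None:
                    x[i][j] = 1 - x[i][j]   # undo the trial change
            j = logit_choice(options, scores, rng)
            if j is not None:
                x[i][j] = 1 - x[i][j]
        else:
            # Behavioral micro step: move by -1, 0 or +1 within the range.
            steps = [s for s in (-1, 0, 1) if z_min <= z[i] + s <= z_max]
            scores = []
            for s in steps:
                z[i] += s
                scores.append(f_beh(i, x, z, beta_beh, range_z))
                z[i] -= s                   # undo the trial change
            z[i] += logit_choice(steps, scores, rng)
        t += rng.expovariate(lam_total)
    return x, z


rng = random.Random(7)
x0 = [[0] * 5 for _ in range(5)]
z0 = [1, 2, 3, 2, 1]
x1, z1 = simulate_period(x0, z0, 4.0, 1.0,
                         {"out": -2.0, "rec": 2.5, "sim": 1.0},
                         {"ten": 0.5, "sim": 0.8}, 1, 3, rng)
print(x1, z1)
```

Repeating such simulations for tentative parameter values, and comparing summary statistics of the simulated end states with those of the observed data, is the basic ingredient of the simulation-based estimation described in the text; the comparison and parameter update steps are not shown here.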

Parameter estimation for this type of model has been broadly categorized as “third generation problems” in applied statistics (GOURIÉROUX & MONFORT 1996). The methods rely on strong computational power (used for data simulation) and by now are used widely in advanced econometric and social science data analysis. Depending on the data set, it is possible that for some models, the algorithm does not converge in a satisfactory way. This happens for models that are complicated in the sense that there are too many parameters relative to the variation in the data, or when effects are highly correlated in the data. Non-convergence may be an indicator of model misspecification. In the large majority of cases, however, with data sets ranging between 40 and a few hundred actors, our experience is that convergence results are good.

A note on the interpretation of model parameters

As a consequence of the actor-driven nature of modeling, special attention needs to be paid to the interpretation of the estimated model parameters. The parameters of the rate functions can be related transparently to the speed of the evolution process. The parameters of the objective functions, however, relate in a more indirect way to the observed global dynamics of network and behavior. From a perspective of agency, these functions can be regarded as satisfaction measures of the actors with their local network-behavioral neighborhood. At a slightly less construing level, they should be thought of descriptively, as the behavioral rules apparently followed by the actors. These objective functions, together with the current network-behavior configuration, imply a certain type of global dynamics as emergent property of the individual changes, in which network actors are mutually constraining each other and mutually offering opportunities to each other in a complicated feedback process. In order to understand how the estimated model parameters of the objective functions relate to the global dynamics observed, the Markov property of the process model needs to be invoked. This property implies that, once model parameters are identified, these imply a stationary (equilibrium) distribution of probabilities over the state space of all possible network-behavior configurations. Because in general, the configuration observed in the first wave of the panel will not be in the center of this equilibrium distribution, the model defines a non-stationary process of network-behavioral dynamics, starting at the first observation, and then ‘drifting’ towards those states that have a relatively high probability under the equilibrium distribution – for the mathematical principles, see e.g. NORRIS (1997). The dynamics as well as the stationary distribution of all but the simplest cases of these models are too complex for analytic calculations, but they can be investigated by computer simulation.

For the interpretation of the objective functions’ parameters, let us assume that in a simple model specification, the function $f_i^{net}(x, x', z) = -2.0 \sum_j x'_{ij} + 2.5 \sum_j x'_{ij} x_{ji} + 1.0 \sum_j x'_{ij} \, sim_{ij}$ was estimated as typical network objective function, while the behavioral objective function was estimated as $f_i^{beh}(x, z, z') = 0.5 \, z'_i + 0.8 \sum_j x_{ij} \, sim'_{ij}$, also quite typical. The primes indicate those elements in the formulae the values of which are under the control of actor i and may be changed in a micro step. The network objective function contains three effects: the outdegree effect (with parameter estimate $\beta_{out}^{net} = -2.0$), the reciprocity effect (with parameter estimate $\beta_{rec}^{net} = 2.5$), and the similarity effect (with parameter estimate $\beta_{sim}^{net} = 1.0$). The behavioral objective function contains two more effects: the behavioral tendency (with parameter estimate $\beta_{ten}^{beh} = 0.5$) and the similarity effect (with parameter estimate $\beta_{sim}^{beh} = 0.8$). We now address the question of how these parameter values can be interpreted, starting with the network objective function, and keeping a perspective of agency. To repeat, this perspective obviously is not implied by the model, but it facilitates presentation and interpretation. The parameter attached to the outdegree effect in the network objective function has negative sign, which is quite usual, and which indicates that ties to arbitrarily chosen others are costly and tend to be avoided, unless other tie properties compensate for the costs. These other properties have to be expressed in the other effects included in the objective function. In our example, these are effects of reciprocity and homophily; in more elaborate models, these may also include other network-based, behavior-based or covariate-based sources for attractiveness of having a specific tie. Here, the value which an actor attaches to an arbitrary but reciprocated tie is calculated as the sum of the outdegree parameter value (reflecting the costs of arbitrary ties) plus the reciprocity parameter value (reflecting the benefit of having the tie reciprocated), which amounts to a net value of –2.0 + 2.5 = 0.5 for a reciprocated tie. Thus, ceteris paribus, actors have a propensity to reciprocate. The positive similarity effect in the network objective function, finally, indicates that actors derive additional benefit from ties to similar others. The occurrence of the similarity effect in the network objective function implies that this benefit refers to network changes, i.e., situations in which the creation or dissolution of network ties is considered. So, it is a selection (homophily) effect.
The net value of a reciprocated tie to a similar other actor can now be calculated as the sum of the parameters for the outdegree effect, the reciprocity effect, and the network similarity effect – in the example, –2.0 + 2.5 + 1.0 = 1.5. The units in which all these values are expressed seem arbitrary, but they are implicitly defined by the variance of the random components in the objective functions.
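The tie values discussed above can be reproduced with a few lines of Python. This is only an illustrative sketch using the hypothetical parameter values of the example; the function name `tie_value` is our own, and we assume that the similarity score of a dyad lies between 0 (maximally dissimilar) and 1 (identical).

```python
# Illustrative parameter values taken from the example in the text.
BETA_OUT = -2.0  # outdegree effect (cost of an arbitrary tie)
BETA_REC = 2.5   # reciprocity effect
BETA_SIM = 1.0   # network similarity (selection) effect


def tie_value(reciprocated, similarity):
    """Contribution of one outgoing tie i -> j to actor i's network objective function.

    `similarity` stands for sim_ij and is assumed (for this sketch) to lie in [0, 1].
    """
    return BETA_OUT + BETA_REC * float(reciprocated) + BETA_SIM * similarity


# Three kinds of tie considered in the text.
arbitrary = tie_value(reciprocated=False, similarity=0.0)
reciprocated = tie_value(reciprocated=True, similarity=0.0)
reciprocated_similar = tie_value(reciprocated=True, similarity=1.0)

print(arbitrary, reciprocated, reciprocated_similar)  # -2.0 0.5 1.5
```

A tie with a negative value tends to be avoided or dissolved, while a tie with a positive value tends to be created or maintained, all else being equal.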

The behavioral part of the model contains two parameters. The positive tendency parameter indicates a propensity to perform the behavior in question (e.g., to smoke or to drink), while the positive similarity parameter indicates a propensity of the actors to behave in the same manner as their friends do. The occurrence of this similarity effect in the behavioral objective function implies that it refers to situations in which the modification of one’s own behavior is considered, so it is an influence (contagion) effect. Assume that the behavior is a dichotomous variable, and consider an actor with five friends, three of whom perform the behavior in question ($z_j = 1$) and two of whom do not ($z_j = 0$). Then the net satisfaction to this actor of performing the behavior as well ($z_i = 1$) would be 0.5 (the general satisfaction derived from performing the behavior) plus three times 0.8 (the satisfaction derived from being similar to the three behavior-performing friends), which amounts to a total of 2.9, while the satisfaction for not performing the behavior ($z_i = 0$) would be two times 0.8 (the satisfaction derived from being similar to the two non-performing friends), in sum a value of 1.6. In this hypothetical occasion for a change of behavior, transformation of these numbers into choice probabilities via the exponential link function, $e^{2.9} / (e^{2.9} + e^{1.6}) \approx 0.79$, yields a 79% chance for performing the behavior versus 21% for not performing it.
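The arithmetic of this example can be checked with a short Python sketch. The function names are our own and purely illustrative; we assume a dichotomous behavior for which the similarity between two actors is 1 if they behave alike and 0 otherwise, and we read the exponential link function as a multinomial logit over the two options.

```python
import math

# Illustrative parameter values taken from the example in the text.
BETA_TEN = 0.5  # behavioral tendency effect
BETA_SIM = 0.8  # behavioral similarity (influence) effect


def behavior_objective(own, friends):
    """Behavioral objective function of an actor showing behavior `own` (0 or 1).

    Assumes similarity is 1 for friends with the same behavior and 0 otherwise.
    """
    similar_friends = sum(1 for z in friends if z == own)
    return BETA_TEN * own + BETA_SIM * similar_friends


def choice_probabilities(values):
    """Multinomial logit: probabilities proportional to exp(objective value)."""
    largest = max(values)  # subtracted for numerical stability only
    weights = [math.exp(v - largest) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


friends = [1, 1, 1, 0, 0]                   # three friends perform the behavior, two do not
f_perform = behavior_objective(1, friends)  # 0.5 + 3 * 0.8
f_abstain = behavior_objective(0, friends)  # 2 * 0.8
p_perform, p_abstain = choice_probabilities([f_perform, f_abstain])

print(round(f_perform, 2), round(f_abstain, 2))  # 2.9 1.6
print(round(p_perform, 2), round(p_abstain, 2))  # 0.79 0.21
```

Only the difference between the two objective function values (here 1.3) matters for the resulting probabilities, since a constant added to both values cancels in the ratio.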

Because both selection and influence effects were included in the same model specification (though in different parts), the effects are controlled for each other, i.e., separated. To assess the empirical evidence for either effect, one needs to take a closer look at the standard errors and test the hypothesis that the effect is nil. In the empirical part reported in the following section, we will address these issues in more detail.

After this discussion, it should be clear that the negative parameter attached to the outdegree effect does not mean that the number of network ties would diminish over time. It is true that the more negative the parameter for the outdegree effect, the smaller the average density in the equilibrium distribution of the Markov process. However, whether the number of ties in the whole network increases or decreases over time depends not only on the parameter values, but also on the position of the initial network-behavioral configuration with respect to the equilibrium distribution. If there are very few ties in the beginning, the model implies that the number of ties is going to increase despite all costs involved, while if there are very many ties already, the model implies that the number is going to decrease. As a third possibility, if the starting network is a good representation of the model-implied equilibrium distribution, the model would imply no trend in the number of ties over time.
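The dependence of the trend on the starting configuration can be illustrated by a small simulation. The following Python sketch is a deliberately reduced version of the process, not the estimation software: it assumes a network objective function containing only the outdegree effect (with the example value −2.0), no behavioral variable, actors chosen uniformly at random for each micro step, and a multinomial logit choice among "change nothing" and toggling one outgoing tie. Network size, number of steps and the random seed are arbitrary choices of ours.

```python
import math
import random

BETA_OUT = -2.0  # outdegree parameter from the example; all other effects omitted


def micro_step(x, rng):
    """One network micro step: a random actor toggles at most one outgoing tie."""
    n = len(x)
    i = rng.randrange(n)
    # Options: change nothing (None), or toggle the tie from i to some other actor j.
    options = [None] + [j for j in range(n) if j != i]
    weights = []
    for j in options:
        if j is None:
            delta = 0.0        # objective function unchanged
        elif x[i][j] == 0:
            delta = BETA_OUT   # creating a tie adds the outdegree parameter
        else:
            delta = -BETA_OUT  # dissolving a tie removes it
        # Differences suffice: a common constant cancels in the logit probabilities.
        weights.append(math.exp(delta))
    j = rng.choices(options, weights=weights, k=1)[0]
    if j is not None:
        x[i][j] = 1 - x[i][j]


def density(x):
    """Proportion of possible directed ties that are present."""
    n = len(x)
    ties = sum(x[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def simulate(initial_tie, n=20, steps=5000, seed=1):
    """Run the reduced process from an empty (0) or complete (1) starting network."""
    rng = random.Random(seed)
    x = [[initial_tie if i != j else 0 for j in range(n)] for i in range(n)]
    start = density(x)
    for _ in range(steps):
        micro_step(x, rng)
    return start, density(x)


sparse_start, sparse_end = simulate(initial_tie=0)
dense_start, dense_end = simulate(initial_tie=1)
print("empty start:   ", sparse_start, "->", round(sparse_end, 3))
print("complete start:", dense_start, "->", round(dense_end, 3))
```

Under these assumptions, the same negative outdegree parameter produces an increase in density from the empty starting network and a decrease from the complete one, with both runs moving towards the low-density region favored by the equilibrium distribution.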

While we chose to formulate the example above in terms of agency, it cannot be pointed out too often that in many cases, it makes little sense to interpret certain effects in terms of revealed preferences. For instance, the main effect of the classmate relation (a dyadic covariate) in a friendship network at school, or of geographical location (an actor covariate) in a network of firms, may reflect not so much the attractiveness of specific network partners as the ease of access to them. The same holds for network-endogenous effects like the well-known transitivity effect: when “friends of my friends become my friends”, this may be due to a higher chance of meeting them, not necessarily to a high preference for transitive closure. The objective function expresses the total effect of preferences, incentives, costs, constraints, and opportunities on the short-term changes made by the actors. Like other models of forward-looking rationality, it is an “as if” model (FRIEDMAN 1953): the observed network and behavioral dynamics can be explained as the emergent result of interaction among actors who behave as if their preferences corresponded to the estimated objective function. Before rushing to conclusions about actual preference configurations, it is advisable to check the plausibility of this “as if” assumption for each effect.

THE CO-EVOLUTION OF FRIENDSHIP AND SUBSTANCE USE

In this section, the functionality of the techniques introduced above will be demonstrated. In an exemplary application, we investigate the interplay of friendship dynamics and the dynamics of substance use among adolescents, the substances studied being alcohol and tobacco. On both dimensions, network autocorrelation is a well-documented fact, and on both dimensions, influence as well as selection have been advanced as explanatory mechanisms (NAPIER ET AL. 1984, FISHER &