Component One: Network Dynamics During Transition

Transition

Transitions, such as the transition to college, are associated with a high degree of stress (Fisher and Hood, 1987). Acting as a buffer or direct effect, an individual’s supportive social network has been demonstrated to mitigate stress, facilitating adaptation to the transition (Cohen and Wills, 1985; Hays and Oxley, 1986). It is worthwhile to understand the collective dynamics of supportive social networks during transition. In component one, I explore the networks articulated by freshmen at UNC-Chapel Hill in the social network site Facebook over the course of the 2005 fall semester.

In my analysis, I explore the structural dynamics of transitional social networks. First, I identify factors of association in the transitional networks articulated in the social network site. In this analysis, I explore social, structural, and demographic factors that influence the creation of ties during transition, and I explore how the strength of these factors changes over time. Second, I use econometric modeling to identify social network site profile factors that contribute significantly to the growth of the socio-technical network over the course of the semester. This analysis identifies content creation and sharing behaviors that are associated with the expansion of an individual’s socio-technical support network.

3.2.1 Data

The data employed in this analysis was sourced from the Facebook profiles of UNC freshmen, collected on a weekly interval over the course of the 2005 fall semester (8/30/05-12/27/05). The data collection was approved by Facebook (Appendix A), and the research study was declared exempt by the IRB (Appendix A). Similar data sets have been collected with the knowledge of Facebook and used in research (Hamatake, Lifson, and Navlakha, 2005; Lampe, Ellison, and Steinfield, 2007; Lewis, Kaufman, and Christakis, 2008; Mayer and Puller, 2008). In the computer and information sciences, the harvesting of data from webpages (i.e. “crawling”) is a common phenomenon (e.g. Brin and Page, 1998; Kao et al., 2004; Liu, Maes, and Davenport, 2006; Ting and Wu, 2009).

To protect the privacy of students, I collected data from a “connection-less” account. In doing so, I ensured that only students whose profiles were available to the entire UNC Facebook network were included in the collection. Compared to the work of Lewis, Kaufman, and Christakis (2008), no confidence boundaries were violated in the collection of this data (that is, no subjects were included because of their relationship with the researcher). All data has been post-processed to derivative elements, limiting identifiability in accordance with 45 C.F.R. 164.514(a)(b).

3.2.2 Research question one: Factors of association

In the first component of the analysis, I identify factors associated with the establishment of connections in the networks data set. Using self-reported information collected from the profiles, I employ statistical modeling to identify factors that are associated with the formation of ties between actors. For example, we may expect that an individual is more likely to be connected (via Facebook friend connection) to someone who lives in the same dorm than to a random person on campus. Exponential random graph modeling (hereafter, ERGM; also commonly known as p* modeling) allows an empirical hypothesis test regarding network structure (e.g. Robins et al., 2007; Wasserman and Pattison, 1996). As described by Goodreau (2007, p. 234), the ERGM specifies the probability of connection between actors as:

$$\Pr(Y = y) = \frac{1}{k} \exp\Big\{ \sum_{A} \eta_A g_A(y) \Big\} \qquad (3.1)$$

where $A$ is an index of potential modeling vectors $g(y)$, $\eta_A$ represents the log-odds of a tie, and $\exp\{\sum_{A} \eta_A g_A(y)\}$ is constrained by $k$, the normalizing constant. Using Markov simulation to compare the observed set of connections to an Erdős-Rényi random graph, ERGM produces pseudo-likelihood estimates (similar to maximum likelihood estimates) of the probability of a tie. In the analysis, the articulated network is compared to the simulated random graph. Based on this comparison, the model provides estimates of the probability of a tie, given a common associative factor. These estimates are interpreted in a similar fashion to the results of a logistic regression, and when exponentiated they can be directly interpreted as the odds of a factor influencing connection between two actors in the network.
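To make the roles of the coefficients and the normalizing constant in Equation 3.1 concrete, the following is a minimal Python sketch, not the Statnet procedure used in this analysis. It enumerates every undirected graph on four hypothetical actors, so that k can be computed exactly instead of approximated by simulation. The dorm assignments, the two statistics (edge count and same-dorm edge count), and the coefficient values are all assumptions chosen for illustration.

```python
import itertools
import math

# Hypothetical dorm assignments for four actors (illustration only).
DORM = {0: "A", 1: "A", 2: "B", 3: "B"}

def edge_count(edges):
    """g_1(y): number of ties in the graph."""
    return len(edges)

def same_dorm_count(edges):
    """g_2(y): number of ties joining two actors in the same dorm."""
    return sum(1 for i, j in edges if DORM[i] == DORM[j])

def ergm_distribution(n_nodes, eta, stat_fns):
    """Return {edge_tuple: Pr(Y = y)} over all undirected graphs on n_nodes."""
    dyads = list(itertools.combinations(range(n_nodes), 2))
    weights = {}
    for r in range(len(dyads) + 1):
        for edges in itertools.combinations(dyads, r):
            g = [fn(edges) for fn in stat_fns]
            weights[edges] = math.exp(sum(e * s for e, s in zip(eta, g)))
    k = sum(weights.values())  # normalizing constant from Equation 3.1
    return {edges: w / k for edges, w in weights.items()}

def tie_odds(dist, dyad):
    """Odds that a given dyad is tied, summed over all graphs."""
    p = sum(prob for edges, prob in dist.items() if dyad in edges)
    return p / (1.0 - p)

eta = (-1.0, 1.5)  # assumed coefficients: baseline tie term, same-dorm term
dist = ergm_distribution(4, eta, (edge_count, same_dorm_count))

odds_same = tie_odds(dist, (0, 1))  # both actors in dorm A
odds_diff = tie_odds(dist, (0, 2))  # actors in different dorms
odds_ratio = odds_same / odds_diff
print(round(odds_ratio, 4), round(math.exp(eta[1]), 4))
```

Because both statistics in this sketch depend on each dyad separately, the odds ratio between a same-dorm tie and a cross-dorm tie equals the exponentiated same-dorm coefficient, which is the logistic-regression-style reading described above. Exhaustive enumeration is only feasible for very small graphs; for networks of realistic size the normalizing constant cannot be computed this way.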

While the theory behind ERGM dates back almost thirty years (Hunter et al., 2008; Wasserman and Pattison, 1996), ERGM has recently grown in popularity because of the increasing prevalence of network data sets, the availability of large-scale research computers, and the development of the Statnet software package (Handcock et al., 2008). Statnet is a suite of modules for the R statistical platform that provides advanced network analysis capabilities, including the modeling of exponential random graphs. In addition to using Statnet to model factors of association with ERGM, I am also able to compute general descriptive network models (Wasserman and Faust, 1994).

3.2.3 Research question two: Modeling network growth

In the second component of the analysis, I use statistical analysis to test hypotheses regarding factors associated with the growth of networks in the social network site during transition. The dependent variable employed in this analysis is the size of an individual’s local campus network in the social network site. This research directly builds on previous work by Lampe, Ellison, and Steinfield (2007). Their paper, “A Familiar Face(book): Profile Elements as Signals in an Online Social Network,” explores the relationship between profile activity and the acquisition of friends in Facebook. My analysis employs the framework specified in their paper, and extends the findings to a panel data set.

The data employed in the analysis of network growth takes the form of a dynamic panel, with sixteen observations of profile content at weekly intervals. Because the dependent variable is autoregressive (i.e. the observation at time t is influenced by the observation at time t-1), the data is not amenable to longitudinal ordinary least squares modeling. I use econometric techniques to produce estimates robust to panel-level autoregression and heteroskedasticity. The general equation for the panel model is:

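$$Y_{ij} = \alpha + \beta_1 x_{ij1} + \dots + \beta_n x_{ijn} + \beta_{Y_n} x_{ijn} + \epsilon_{ij} \qquad (3.2)$$

where $\alpha$ is the intercept, $\beta_1 x_{ij1} + \dots + \beta_n x_{ijn}$ represents a vector of covariates and predictors, and $\beta_{Y_n} x_{ijn}$ represents the lagged predictor.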
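The data structure behind a lagged term of this kind can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the estimator used in this analysis: it builds a noise-free hypothetical panel (five people, sixteen weekly observations, one invented covariate), pairs each observation with the previous week's value of the dependent variable, and fits pooled least squares. The function names `add_lag` and `ols` and all numeric values are assumptions.

```python
def add_lag(panel):
    """panel: {person_id: [(y, x), ...]} ordered by week.
    Returns rows (y_t, y_{t-1}, x_t); each person's first week is dropped
    because it has no previous observation."""
    rows = []
    for series in panel.values():
        for prev, cur in zip(series, series[1:]):
            rows.append((cur[0], prev[0], cur[1]))
    return rows

def ols(rows):
    """Pooled least squares for y = a + b_lag * y_lag + b_x * x,
    solved through the normal equations with Gauss-Jordan elimination."""
    X = [(1.0, y_lag, x) for _, y_lag, x in rows]
    y = [r[0] for r in rows]
    p = 3
    # Augmented matrix [X'X | X'y].
    A = [[sum(xi[r] * xi[c] for xi in X) for c in range(p)]
         + [sum(xi[r] * yi for xi, yi in zip(X, y))]
         for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical noise-free panel: y_t = 2.0 + 0.9 * y_{t-1} + 1.5 * x_t.
panel = {}
for person in range(5):
    y = 10.0 + person
    series = [(y, 0.0)]
    for week in range(1, 16):
        x = float((week * (person + 1)) % 7)  # invented weekly covariate
        y = 2.0 + 0.9 * y + 1.5 * x
        series.append((y, x))
    panel[person] = series

rows = add_lag(panel)
coefs = ols(rows)  # [intercept, lag coefficient, covariate coefficient]
print([round(c, 6) for c in coefs])
```

With noise-free data the pooled fit recovers the generating coefficients. With real panel data, where errors are correlated within individuals, a pooled fit of this kind is generally not reliable, which is the motivation for the robust techniques described above.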

On a college campus, behaviors are shaped by the local network, particularly the dorm network. Localized behavioral norms may cluster, demonstrating patterns of clustered variance when compared to the network as a whole. In my empirical analysis of network formation, I observed that the residence hall is a strong associative factor. Therefore, I cluster the data set by dorms. A technique for modeling panel data that takes into account patterns of clustered group-level variance is latent growth curve modeling, with variance fixed at the dorm level and a lagged dependent variable accounting for the autoregressor.

To explore the effects of dorms, I apply a multi-level model, in which individuals are grouped by residence, and the individual trajectories over time are analyzed as a latent growth curve (LGC). LGC is a variant form of a hierarchical linear model (Bryk and Raudenbush, 2002) that is useful in panel modeling. In my analysis, the time variable is nested within the individual, creating a latent slope within individuals. The model I specify is “multi-level” because I have defined three levels. The first level is within residences, the second is individuals within residences, and the third is time within individuals. The general form of the equation is:

$$Y_{ij} = \alpha + \beta_1 + \beta_2 x_{ijw} + \dots + \beta_n x_{ijn} + \zeta_{1j} + \zeta_{2j} x_{ij} + \zeta_{3j} x_{ij} + \epsilon_{ij} \qquad (3.3)$$

where $\alpha$ is the intercept, $\beta_2 x_{ijw} + \dots + \beta_n x_{ijn}$ represents the vector of covariates and predictors, residence is modeled as a random effect ($\zeta_{3j}$), and the individual is modeled as a random effect ($\zeta_{2j}$). Compared to standard longitudinal linear models, multi-level modeling offers a number of attractive properties. Under optimal conditions (i.e. complete data), standard longitudinal models and latent growth curve models perform identically. By accounting for variance attributable to configuration, the precision of estimates is increased, assuming there is a meaningful grouping effect. Based on my analysis of network structure, I am able to provide evidence that grouping within dorms is meaningful, and I model this effect.
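One simple way to gauge whether a grouping effect is meaningful is to ask how much of the total variation in network size lies between dorms rather than within them. The Python sketch below is a descriptive check only, not the multi-level model specified above, and the dorm names and network sizes are hypothetical. It uses the standard decomposition of the total sum of squares into between-group and within-group parts.

```python
from statistics import mean

def between_group_share(groups):
    """groups: {dorm: [network sizes]}.
    Returns SS_between / SS_total, the share of total variation
    that lies between group means."""
    all_values = [v for vals in groups.values() for v in vals]
    grand = mean(all_values)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    ss_between = sum(len(vals) * (mean(vals) - grand) ** 2
                     for vals in groups.values())
    return ss_between / ss_total

# Hypothetical network sizes for students in three dorms.
sizes_by_dorm = {
    "dorm_a": [40, 44, 42],
    "dorm_b": [60, 62, 64],
    "dorm_c": [50, 49, 51],
}

share = between_group_share(sizes_by_dorm)
print(round(share, 4))
```

A share near zero would suggest that dorm membership explains little of the variation, in which case a grouped model adds little over a standard longitudinal model; a share near one, as in this invented example, would suggest a strong grouping effect.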
