6.2.1 Coding the Websites

During the academic year 2011–2012, graduate student research assistants coded the front page and subpages of 510 state legislator websites for content on policy, service, allocation, and descriptive representation. This produced several measures of legislators’ emphasis on the various dimensions of representation. The research assistants also saved a “snapshot” of each site for replication of my analysis and/or future use by other researchers. In this analysis I focus on two specific types of dependent variable: (1) whether the front page contains at least one content item related to each of the four dimensions and (2) the number of content items on each of the dimensions present on the entire site.4

With this measurement strategy, I define a stronger emphasis on a particular dimension of representation as an increase in the probability of observing a front page item and/or an increase in the total number of items on that dimension. For example, I expect that a legislator who places relatively more emphasis on allocation has a front page item on allocation and/or a relatively large total number of items devoted to allocation. I use both types of dependent variable—front page items and total number of items—to account for the fact that some legislators tend to place most of their content on the front page of the site while others distribute it across several pages.
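As a minimal sketch of how the two types of dependent variable could be built from coded content items, the following assumes each item is recorded as a (dimension, on-front-page) pair. The function name, the record layout, and the example site are hypothetical and are not taken from the coding form itself.

```python
DIMENSIONS = ("policy", "service", "allocation", "descriptive")


def build_dependent_variables(items):
    """Return (front_page, totals) for one website.

    items: iterable of (dimension, on_front_page) pairs (hypothetical layout).
    front_page[d] is 1 if at least one item on dimension d is on the front page.
    totals[d] is the number of items on dimension d across the entire site,
    front page included.
    """
    front_page = {d: 0 for d in DIMENSIONS}
    totals = {d: 0 for d in DIMENSIONS}
    for dimension, on_front_page in items:
        totals[dimension] += 1
        if on_front_page:
            front_page[dimension] = 1
    return front_page, totals


# Hypothetical coded items for a single legislator website.
site_items = [
    ("policy", True),
    ("policy", False),
    ("service", False),
    ("allocation", True),
]
front, total = build_dependent_variables(site_items)
print(front)
print(total)
```

In this illustration the binary indicator feeds the front page models and the count feeds the total-items models, so a site that spreads its content across subpages still registers emphasis through the count.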

Coding Rules

Research assistants followed a set of coding rules that were presented in an online entry form. Each coder was given a link to the form and a set of legislator website URLs (divided randomly), then completed the form one time for each website. Intercoder reliability scores indicate consistent measurement across coders. See the appendix for a full report of these intercoder reliability scores, exact wording of the coding rules, and example websites.

The coding rules defined policy content items as any mention of a policy issue, including the presence of an issue in a list of issues, a definition of an issue, text describing the legislator’s views, recent bill introductions, and/or votes, or a district poll on one or more issues. Service items were coded as any mention of a unique type of assistance the legislator provides through the website or offers to provide after further contact. These typically included useful links about government resources, contact information about government agencies or other public officials, district maps, voter information, information about scholarships or other educational programs, and offers to assist with specific requests. Allocation items were coded as any mention of a unique funding grant or project secured by the legislator for the district. Finally, descriptive items were coded as any picture or text identifying the legislator with a gender or racial group. This included both explicit cues about groups, such as a Republican women’s caucus or NAACP chapter, and implicit references, such as a picture of the legislator with several black ministers in the district.5

4 Coders visited the front page and one level of subpages (e.g., pages that were “one click” away from the

6.2.2 Estimation Strategies

I employ logistic regression to model whether the front page contains an item on each dimension and quasi-Poisson regression to model a count of the site’s items on each dimension.6 The lone exception to this is the model of the number of descriptive items. In that case the count ranges only from 0 to 4 items, and so I estimate a logit model of whether the site has one or more descriptive items.7 I use multilevel models (MLM) in all cases with a state-level random intercept to account for unobserved state-level heterogeneity.8
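To illustrate why the quasi-Poisson dispersion parameter matters, the sketch below computes the common Pearson-based dispersion estimate for an intercept-only Poisson fit and shows how it rescales the standard error. The counts are hypothetical, the intercept-only model is a simplifying assumption, and this is not the multilevel estimator used in the analysis.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom."""
    chi_sq = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi_sq / (len(y) - n_params)


# Hypothetical item counts for eight websites.
counts = [0, 1, 1, 2, 3, 5, 8, 12]

# For an intercept-only Poisson model, the fitted mean is the sample mean.
mean_count = sum(counts) / len(counts)
fitted = [mean_count] * len(counts)

phi = pearson_dispersion(counts, fitted, n_params=1)

# Standard error of the log-mean intercept under a standard Poisson model
# (dispersion fixed at one) and under quasi-Poisson (scaled by sqrt(phi)).
poisson_se = math.sqrt(1.0 / sum(counts))
quasi_se = poisson_se * math.sqrt(phi)

print(f"dispersion = {phi:.3f}")
print(f"Poisson SE = {poisson_se:.3f}, quasi-Poisson SE = {quasi_se:.3f}")
```

When the estimated dispersion exceeds one, as it does for these made-up counts, fixing it at one would understate the uncertainty in the coefficient estimates.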

My main independent variables of interest are the institutional, district, and individual factors described above. However, I also control for a number of other factors that could influence website content, including legislator-specific traits and features of each website. Specifically, I control for legislator gender, race, party, and the natural log of the money the legislator raised in his or her last campaign. I use this variable as a proxy for the legislator’s level of engagement in maintaining his or her website. I expect that legislators who raise more money are, on average, more engaged with using their websites because they have more resources for communicating with the district.9 Additionally, I expect that the dimensions of representation are likely to correlate with one another. Thus, in each model I control for attention to the other three dimensions.10 Finally, I control for the natural log of the number of pages on the site. This measure of the site size reflects the fact that there are more chances for a given representation item to appear on a website as the total amount of content increases.

5 Although the coding rules are sufficiently general to allow for non-minority gender or racial groups (e.g., men’s groups or white organizations), empirically only minority groups appeared on the websites, even in cases where the legislator was a white male.

6 The count dependent variables in this analysis all show strong evidence of overdispersion (p < 0.05), which makes estimation of a dispersion parameter in the quasi-Poisson model (rather than assuming it equals one in a standard Poisson) important for obtaining valid estimates of uncertainty. Quasi-Poisson regression is very similar to negative binomial regression, with the only difference being the function used to model the variance (see Ver Hoef and Boveng 2007).

7 See the appendix for summary statistics on all of the dependent variables.

8 Logit and quasi-Poisson estimates come from the R packages lme4 and HGLMMM, respectively (R De-

Out-of-Sample Predictive Model Fit

Overfitting is an important problem to avoid when evaluating statistical models. Accordingly, I assess the performance of my models through out-of-sample prediction via leave-one-out cross-validation (CV). This method involves iteratively removing each observation from the data set, estimating the model on the remaining observations, then computing a predicted value for the omitted observation using the estimates from the reduced-data model. This renders the data used to fit the model (i.e., “training data”) independent of the testing data, which avoids overfitting the model to the sample (see Smyth 2000).
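The leave-one-out procedure can be sketched as follows. For brevity the example substitutes a one-predictor least-squares line for the multilevel logit and quasi-Poisson models; the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def loo_predictions(xs, ys):
    """Predict each observation from a model fit without that observation."""
    preds = []
    for i in range(len(xs)):
        # Drop observation i to form the training data.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        # Predict the held-out observation from the reduced-data estimates.
        preds.append(intercept + slope * xs[i])
    return preds


# Hypothetical predictor and outcome values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
cv_preds = loo_predictions(xs, ys)
for actual, predicted in zip(ys, cv_preds):
    print(f"actual = {actual:.1f}, out-of-sample prediction = {predicted:.2f}")
```

Each prediction comes from estimates that never saw the observation being predicted, which is the property that separates training data from testing data.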

More specifically, I employ CV to compute predicted probabilities for each observation from the logit models and predicted counts from the quasi-Poisson models. I then compare those model predictions to the actual values of each dependent variable. A perfect model’s predictions would exactly reproduce the dependent variable. This level of precision is not realistic, so I look for predictions that closely approximate the dependent variable as evidence of good fit. Note that using an out-of-sample predictive fit measure provides a difficult test for each model. Generating predictions for observations using the model that included those observations would produce a positive (i.e., optimistic) bias in model fit, because some of the random noise in the sample at hand would affect the parameter estimates used to make those predictions (Akaike 1974; Smyth 2000).

9 Results are unchanged by the inclusion or exclusion of this variable. Results are also unchanged when a control for whether the website is a campaign or legislative site is included.

10 For example, in the logit model of front page policy items, I control for whether the front page has a service item, allocation item, and descriptive item. Another strategy would be to estimate these models simultaneously and allow the errors to correlate. In my case, this would require a four-equation estimator, which is not available in standard statistical software (to my knowledge). As a check on this issue, I computed the residual correlations for my models and found them to be very near zero (see the appendix). Furthermore, results from bivariate probit models are substantively similar to what I report here.
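The optimistic bias of in-sample fit can be illustrated with the simplest possible model, one that predicts every observation with a sample mean. This is a hedged toy example with hypothetical counts, not a reproduction of the models in this chapter.

```python
def mse(actual, predicted):
    """Mean squared prediction error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def in_sample_mean_predictions(ys):
    """Predict each observation with the mean of all observations."""
    y_bar = sum(ys) / len(ys)
    return [y_bar] * len(ys)


def loo_mean_predictions(ys):
    """Predict each observation with the mean of the other observations."""
    total = sum(ys)
    n = len(ys)
    return [(total - y) / (n - 1) for y in ys]


# Hypothetical item counts.
ys = [0, 1, 1, 2, 3, 5, 8, 12]
in_mse = mse(ys, in_sample_mean_predictions(ys))
out_mse = mse(ys, loo_mean_predictions(ys))
print(f"in-sample MSE = {in_mse:.2f}, leave-one-out MSE = {out_mse:.2f}")
```

For this mean-only model each leave-one-out residual equals the in-sample residual multiplied by n/(n - 1), so the out-of-sample error is always at least as large as the in-sample error. The in-sample figure looks better only because each observation helped determine its own prediction.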

6.3 Results

The research design described above produces two regression models for each of the four dimensions of representation (front page and number of items models). I first present the estimation results graphically, then assess several quantities of interest generated from the models.
