CHAPTER 5: METHODOLOGY

5.3 Sample Creation

Data from the Kinder, Lydenberg, and Domini (KLD) index was used to construct

the initial sample of firms for this thesis. The KLD database is acknowledged as the most

commonly used (Deckop et al., 2006; Waldman et al., 2006b), most complete (Hillman &

Keim, 2001) and the best source for information about firm level social performance

(Sharfman, 1996; Waddock, 2003). KLD analysts evaluate corporations on more than 280

data points to arrive at a ratings system designed to provide a snapshot of the company’s

environmental, social and governance-related performance every year, providing ratings

for every firm along seven different categories including: community, corporate

Within these categories, KLD tracks a variety of CSR items that it considers either

areas of strength or concern and assigns these items a binary measure of either “1” or “0”

to demarcate either the presence or absence of the area of strength or concern. For

example, for the Community Relations category, KLD assigns a “1” or “0” to firm level

actions that demonstrate strengths in this area including charitable giving, innovative

giving, non-US charitable giving, support for housing, support for education, indigenous

peoples’ relations, volunteer programs and other. Areas of concern under the community

relations category include investment controversies, negative economic impact, problems

with indigenous peoples' relations, tax disputes and other. Technically, a firm can

therefore earn up to seven “strengths” in community relations as well as five “concerns”.5
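The binary coding described above can be illustrated with a minimal Python sketch that counts strengths and concerns for a single firm-year. The item labels echo a few of the community relations items named in the text, but the dictionary layout, the function name `count_flags` and the example values are hypothetical and do not reflect KLD's actual file format.

```python
def count_flags(flags):
    """Sum binary indicators after checking that each one is 0 or 1."""
    for item, value in flags.items():
        if value not in (0, 1):
            raise ValueError(f"{item} must be 0 or 1, got {value!r}")
    return sum(flags.values())


# Hypothetical firm-year record; the values are invented for illustration.
strengths = {
    "charitable_giving": 1,
    "support_for_education": 1,
    "volunteer_programs": 0,
}
concerns = {
    "tax_disputes": 1,
    "negative_economic_impact": 0,
}

n_strengths = count_flags(strengths)
n_concerns = count_flags(concerns)
print(n_strengths, n_concerns)
```

Keeping strengths and concerns as separate counts, rather than netting them immediately, preserves the distinction the ratings draw between the two.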

The KLD data used in this dissertation cover the period from 1991 to 2009;

however, the number of firms rated each year has varied. Prior to 2001, KLD focused on

firms listed in the S&P 500 or the Domini 400 Social Index. However, since 2001, KLD

has added CSR ratings for all firms belonging to the Russell 1000 Index and since 2003,

all companies on the Russell 2000 Index such that the most recent KLD data include

social performance information for the 3,000 largest US firms by market capitalization.

Given the longitudinal nature of the research question in this study, the sample

construction proceeded in various steps. First, to construct the initial population, the

corporate social performance information for all firms measured by KLD was

consolidated for the entire 19-year period from 1991 to 2009. To ensure enough within-firm

5 KLD also provides scores on six ‘exclusionary’ screens, which comprise concerns related to industry-based involvement in “controversial business issues” such as alcohol, gambling, firearms, military, nuclear power and tobacco. Although some researchers have used these screens as evidence of social issue participation (Hillman & Keim, 2001), they are not representative of the CSR choices facing firms in most industries and are thus often excluded from aggregated measures of CSP (e.g., Agle et al., 1999).

time variability (to model the growth trajectories), these data were then filtered such that only

firms assigned social ratings for fifteen or more years were kept in the sample. This

preliminary screening for longitudinal data yielded a sample of 365 firms and 6,647

firm-year observations. In the second step, the financial information for these 365 firms was

obtained from COMPUSTAT and merged with the corporate social responsibility data

from KLD. This dataset was then manually inspected to ensure data compatibility in

terms of company name, ticker and other key identifiers that may have changed over the

study period. Of the 30 firms initially unmatched, data on 15 were ultimately

found in COMPUSTAT, so the sample was reduced by only 15 firms, from 365 to 350

companies. As a last step, the CSR and financial data were merged with CEO

identification information obtained through COMPUSTAT’s Execucomp database. If the

CEO information was not available through Execucomp, the missing data were obtained

through other sources as detailed in the measures section below. At this stage, only one

additional firm had to be eliminated because of incompatible data, leaving a final data set of

349 firms.
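The screening and merging steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the firm identifiers, field names (`csp`, `total_assets`) and toy records are hypothetical, the helper functions are not taken from the thesis, and the manual inspection of names and tickers is not represented.

```python
from collections import defaultdict

MIN_YEARS = 15  # screening threshold stated in the text


def firms_with_enough_years(rated_firm_years, min_years=MIN_YEARS):
    """Return ids of firms rated in at least `min_years` distinct years."""
    years_by_firm = defaultdict(set)
    for firm_id, year in rated_firm_years:
        years_by_firm[firm_id].add(year)
    return {firm for firm, years in years_by_firm.items() if len(years) >= min_years}


def merge_on_firm_year(social, financial):
    """Inner-join two {(firm_id, year): record} dicts on their shared keys."""
    shared = social.keys() & financial.keys()
    return {key: {**social[key], **financial[key]} for key in shared}


# Hypothetical toy data: firm "A" is rated for 19 years, firm "B" for only 10.
social = {("A", y): {"csp": 1} for y in range(1991, 2010)}
social.update({("B", y): {"csp": 0} for y in range(2000, 2010)})

kept = firms_with_enough_years(social.keys())
screened = {key: rec for key, rec in social.items() if key[0] in kept}

# Hypothetical financial records covering firm "A" for 1991-2007 only.
financial = {("A", y): {"total_assets": 100.0} for y in range(1991, 2008)}

panel = merge_on_firm_year(screened, financial)
print(sorted(kept), len(panel))
```

Because the join keeps only firm-years present in both sources, a firm that passes the 15-year screen can end up with fewer observations after merging, which is one way an unbalanced panel arises.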

The final sample is thus an unbalanced panel, where the number of firms

measured in each year varies from a low of 303 in 2009 to a high of 347 in 1995,

resulting in 6,334 firm-year observations. Although the design was intended to capture

only firms with 15 or more years of data, observations per firm ultimately range from 12

to 19 years, with an average of 18 years of data per company.

On average, each firm had 2.9 CEOs over the 19 years, so that

the final data set includes information for 1,008 CEOs. The clustered longitudinal design

The 349 firms were then assigned an industry classification based on the 4-digit

SIC code as defined by COMPUSTAT. As done in previous research in the CSR area

(Surroca, Tribó & Waddock, 2010; Waddock & Graves, 1997), the industry

classifications were then reduced to 12 primary sectors using their 2-digit SIC codes. Although

alternate methodologies exist for industry classification (e.g., 5 sectors, 1-digit NAICS

code), the final industry classification used herein was selected in order to best replicate

the most cited study in this area (Waddock & Graves, 1997). Furthermore, this

methodology continues to be used in recent studies (Surroca et al., 2010). The final

breakdown of the number of firms in each industry classification is detailed in Table 5.1

below.

[Figure: clustered longitudinal design. Industries are the clusters (Level 3), firms are the units of analysis (Level 2), and years are the time points (Level 1).]

Table 5.1: Industry Classification

Industry                      SIC        # of Firms       # of Firm Years
                                         #      %         #       %
Mining/Construction           100-1999   16     4.6%      288     4.5%
Food/Textiles/Apparel         2000-2399  24     6.9%      444     7.0%
Forest/Paper/Publishing       2400-2799  32     9.2%      580     9.2%
Chemicals/Pharma              2800-2899  38     10.9%     696     11.0%
Refining/Rubber/Plastic       2900-3199  7      2.0%      126     2.0%
Steel/Heavy Manufacturing     3200-3599  41     11.7%     759     12.0%
Computers/Auto/Aero           3600-3999  62     17.8%     1128    17.8%
Transportation                4000-4799  11     3.2%      202     3.2%
Telephone/Utilities           4800-4999  26     7.4%      474     7.5%
Wholesale/Retail              5000-5999  38     10.9%     675     10.7%
Financial                     6000-6799  33     9.5%      580     9.2%
Hotel/Entertainment/Services  6800-9799  21     6.0%      382     6.0%
Totals                                   349    100.0%    6334    100.0%
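The sector assignment in Table 5.1 amounts to a range lookup on the SIC code. The Python sketch below uses the ranges and sector labels from the table; the function name `sector_for_sic` and the choice to return `None` for codes outside the tabulated ranges are illustrative assumptions rather than the procedure used in the thesis.

```python
# (low, high, sector) ranges taken from Table 5.1.
SECTORS = [
    (100, 1999, "Mining/Construction"),
    (2000, 2399, "Food/Textiles/Apparel"),
    (2400, 2799, "Forest/Paper/Publishing"),
    (2800, 2899, "Chemicals/Pharma"),
    (2900, 3199, "Refining/Rubber/Plastic"),
    (3200, 3599, "Steel/Heavy Manufacturing"),
    (3600, 3999, "Computers/Auto/Aero"),
    (4000, 4799, "Transportation"),
    (4800, 4999, "Telephone/Utilities"),
    (5000, 5999, "Wholesale/Retail"),
    (6000, 6799, "Financial"),
    (6800, 9799, "Hotel/Entertainment/Services"),
]


def sector_for_sic(sic):
    """Return the Table 5.1 sector for a 4-digit SIC code, or None if out of range."""
    for low, high, name in SECTORS:
        if low <= sic <= high:
            return name
    return None


# Example lookups using arbitrary SIC codes chosen for illustration.
for code in (1311, 2834, 6021):
    print(code, sector_for_sic(code))
```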

Various archival sources were used to gather data related to the firm and CEO

independent variables. These, as well as the measures for the dependent variables, are

detailed in the following section.