CHAPTER 5: METHODOLOGY
5.3 Sample Creation
Data from the Kinder, Lydenberg, and Domini (KLD) index were used to construct
the initial sample of firms for this thesis. The KLD database is acknowledged as the most
commonly used (Deckop et al., 2006; Waldman et al., 2006b), most complete (Hillman &
Keim, 2001) and the best source for information about firm level social performance
(Sharfman, 1996; Waddock, 2003). KLD analysts evaluate corporations on more than 280
data points to produce a ratings system designed to provide an annual snapshot of each
company’s environmental, social and governance performance. Ratings are provided
for every firm along seven different categories including: community, corporate
Within these categories, KLD tracks a variety of CSR items that it considers either
areas of strength or concern and assigns these items a binary measure of either “1” or “0”
to demarcate either the presence or absence of the area of strength or concern. For
example, for the Community Relations category, KLD assigns a “1” or “0” to firm level
actions that demonstrate strengths in this area including charitable giving, innovative
giving, non-US charitable giving, support for housing, support for education, indigenous
peoples’ relations, volunteer programs and other. Areas of concern under the community
relations category include investment controversies, negative economic impact, problems
with indigenous peoples' relations, tax disputes and other. Technically, a firm can
therefore earn up to seven “strengths” in community relations as well as five “concerns”.5
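To make the binary coding concrete, the following minimal sketch counts the strengths and concerns present for one firm-year in the Community Relations category. The item names are taken from the description above; the firm-year record and the helper name count_present are hypothetical and are not part of the KLD data or of the procedure used in this thesis.

```python
def count_present(flags):
    """Count the items coded 1 (present) in a dict of binary KLD-style flags."""
    for item, value in flags.items():
        if value not in (0, 1):
            raise ValueError(f"{item!r} must be coded 0 or 1, got {value!r}")
    return sum(flags.values())


# Hypothetical firm-year record (illustrative values, not real KLD ratings).
strengths = {
    "charitable giving": 1,
    "support for education": 1,
    "volunteer programs": 0,
}
concerns = {
    "tax disputes": 1,
    "negative economic impact": 0,
}

n_strengths = count_present(strengths)
n_concerns = count_present(concerns)
print(n_strengths, n_concerns)  # prints: 2 1
```

The sketch only tallies the binary items; how strengths and concerns are combined into a measure of social performance is a separate question addressed in the measures section.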
The KLD data used in this dissertation cover the period from 1991-2009;
however, the number of firms rated each year has varied. Prior to 2001, KLD focused on
firms listed in the S&P 500 or the Domini 400 Social Index. However, since 2001, KLD
has added CSR ratings for all firms belonging to the Russell 1000 Index and since 2003,
all companies on the Russell 2000 Index such that the most recent KLD data include
social performance information for the 3,000 largest US firms by market capitalization.
Given the longitudinal nature of the research question in this study, the sample
construction proceeded in various steps. First, to construct the initial population, the
corporate social performance information for all firms measured by KLD was
consolidated for the entire 19 year period from 1991-2009. To ensure enough within-firm
time variability (to model the growth trajectories), these data were then sorted such that only
firms assigned social ratings for fifteen or more years were kept in the sample. This
preliminary screening for longitudinal data yielded a sample of 365 firms and 6,647 firm
year observations. In the second step, the financial information for these 365 firms was
obtained from COMPUSTAT and merged with the corporate social responsibility data
from KLD. This dataset was then manually inspected to ensure data compatibility in
terms of company name, ticker and other key identifiers that may have changed over the
study period. Of the initial 30 unmatched firms, data on 15 companies were ultimately
found in COMPUSTAT, thus reducing the sample by only 15 firms, from 365 to 350
companies. As a last step, the CSR and financial data were merged with CEO
identification information obtained through COMPUSTAT’s Execucomp database. If the
CEO information was not available through Execucomp, missing data were obtained
through other sources as detailed in the measures section below. At this stage, only one
additional firm needed to be eliminated because of incompatible data, leaving a final data set of
349 firms.
5 KLD also provides scores on six ‘exclusionary’ screens, which comprise concerns related to industry-based involvement in “controversial business issues” such as alcohol, gambling, firearms, military, nuclear power and tobacco. Although some researchers have used these screens as evidence of social issue participation (Hillman & Keim, 2001), they are not categories that are representative of the CSR choices facing firms in most industries and are thus often excluded from aggregated measures of CSP (e.g., Agle et al., 1999).
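The screening and matching steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names, the firm identifiers and the toy records are hypothetical, and matching is reduced to a simple identifier lookup, whereas the thesis relied on manual inspection of names, tickers and other identifiers. Only the fifteen-year threshold and the attrition counts are taken from the text.

```python
from collections import defaultdict

MIN_YEARS = 15  # screening threshold stated in the text


def firms_with_enough_years(ratings, min_years=MIN_YEARS):
    """Return ids of firms rated in at least `min_years` distinct years.

    `ratings` is an iterable of (firm_id, year) pairs.
    """
    years_by_firm = defaultdict(set)
    for firm_id, year in ratings:
        years_by_firm[firm_id].add(year)
    return {f for f, years in years_by_firm.items() if len(years) >= min_years}


def split_matched(firm_ids, financial_ids):
    """Split firm ids into those found and not found in the financial data."""
    matched = firm_ids & financial_ids
    return matched, firm_ids - matched


# Hypothetical toy data: firm "A" rated 1991-2009, firm "B" rated 2001-2009.
ratings = [("A", y) for y in range(1991, 2010)]
ratings += [("B", y) for y in range(2001, 2010)]

kept = firms_with_enough_years(ratings)  # only "A" has 15 or more years
matched, unmatched = split_matched(kept, financial_ids={"A", "C"})

# Attrition reported in the text: 365 screened firms, 15 never matched
# to financial data, and 1 dropped at the CEO-data stage.
final_firms = 365 - 15 - 1
print(sorted(kept), sorted(matched), sorted(unmatched), final_firms)
```

Counting distinct years per firm, rather than rows, guards against duplicate firm-year records inflating the count used for the screening threshold.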
The final sample is thus an unbalanced panel, where the number of firms
measured in each year varies from a low of 303 in 2009 to a high of 347 in 1995,
resulting in 6,334 firm year observations. Although the design was intended to capture
only firms with 15 or more years of data, in the end, observations per firm range from 12
years to 19 years with the average number of years of data per company at a robust 18
years. Within each firm, on average, the number of CEOs over the 19 years is 2.9 so that
the final data set includes information for 1,008 CEOs. The clustered longitudinal design
The 349 firms were then assigned an industry classification based on the 4 digit
SIC code as defined by COMPUSTAT. As done in previous research in the CSR area
(Surroca, Tribó & Waddock, 2010; Waddock & Graves, 1997), the industry
classifications were then reduced to 12 primary sectors using their 2 digit SIC codes. Although
alternate methodologies exist for industry classification (e.g. 5 sectors, 1-digit NAICS
code), the final industry classification used herein was selected in order to best replicate
the most cited study in this area (Waddock & Graves, 1997). Furthermore, this
methodology continues to be used in recent studies (Surroca et al., 2010). The final
breakdown of the number of firms in each industry classification is detailed in Table 5.1
below.
[Figure: clustered longitudinal design. Time points (Level 1: Year 1, Year 2, … Year n) are nested within firms (Level 2, units of analysis: Firm 1, Firm 2, … Firm n), which are nested within industries (Level 3, clusters; Industry 1 shown).]
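The three-level nesting of the clustered longitudinal design (time points within firms within industries) can be illustrated with a small data structure. The sketch below is illustrative only: the industries, firms, years and function names are hypothetical, and a panel is treated as balanced only if every firm is observed in every year that appears in the data.

```python
# Hypothetical three-level panel: industry -> firm -> {year: observation}.
panel = {
    "Industry 1": {
        "Firm 1": {1991: {}, 1992: {}, 1993: {}},
        "Firm 2": {1992: {}, 1993: {}},
    },
    "Industry 2": {
        "Firm 3": {1991: {}, 1993: {}},
    },
}


def observations_per_firm(panel):
    """Number of time points (Level 1) observed for each firm (Level 2)."""
    return {
        firm: len(years)
        for firms in panel.values()
        for firm, years in firms.items()
    }


def firms_per_year(panel):
    """Number of firms observed in each year across all industries."""
    counts = {}
    for firms in panel.values():
        for years in firms.values():
            for year in years:
                counts[year] = counts.get(year, 0) + 1
    return counts


def is_balanced(panel):
    """True only if every firm is observed in every year in the panel."""
    all_years = set(firms_per_year(panel))
    return all(
        set(years) == all_years
        for firms in panel.values()
        for years in firms.values()
    )


per_firm = observations_per_firm(panel)
print(per_firm, firms_per_year(panel), is_balanced(panel))
```

In this toy panel Firm 2 and Firm 3 are each missing one year, so the panel is unbalanced, which mirrors on a small scale the varying number of observations per firm in the final sample.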
Table 5.1: Industry Classification
Industry                       SIC         # of Firms       # of Firm Years
                                           #      %         #       %
Mining/Construction            100-1999    16     4.6%      288     4.5%
Food/Textiles/Apparel          2000-2399   24     6.9%      444     7.0%
Forest/Paper/Publishing        2400-2799   32     9.2%      580     9.2%
Chemicals/Pharma               2800-2899   38     10.9%     696     11.0%
Refining/Rubber/Plastic        2900-3199   7      2.0%      126     2.0%
Steel/Heavy Manufacturing      3200-3599   41     11.7%     759     12.0%
Computers/Auto/Aero            3600-3999   62     17.8%     1128    17.8%
Transportation                 4000-4799   11     3.2%      202     3.2%
Telephone/Utilities            4800-4999   26     7.4%      474     7.5%
Wholesale/Retail               5000-5999   38     10.9%     675     10.7%
Financial                      6000-6799   33     9.5%      580     9.2%
Hotel/Entertainment/Services   6800-9799   21     6.0%      382     6.0%
Totals                                     349    100.0%    6334    100.0%
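The mapping from SIC codes to the 12 sectors in Table 5.1 can be expressed as a simple range lookup. The ranges and sector names below are copied from the table; the function name sector_for_sic and the example codes are assumptions introduced for illustration, and codes outside the listed ranges are returned as unclassified rather than guessed.

```python
# (low, high, sector) ranges as listed in Table 5.1.
SECTORS = [
    (100, 1999, "Mining/Construction"),
    (2000, 2399, "Food/Textiles/Apparel"),
    (2400, 2799, "Forest/Paper/Publishing"),
    (2800, 2899, "Chemicals/Pharma"),
    (2900, 3199, "Refining/Rubber/Plastic"),
    (3200, 3599, "Steel/Heavy Manufacturing"),
    (3600, 3999, "Computers/Auto/Aero"),
    (4000, 4799, "Transportation"),
    (4800, 4999, "Telephone/Utilities"),
    (5000, 5999, "Wholesale/Retail"),
    (6000, 6799, "Financial"),
    (6800, 9799, "Hotel/Entertainment/Services"),
]


def sector_for_sic(sic):
    """Return the Table 5.1 sector for a 4 digit SIC code, or None."""
    for low, high, name in SECTORS:
        if low <= sic <= high:
            return name
    return None  # code falls outside every range in the table


print(sector_for_sic(2834))  # prints: Chemicals/Pharma
print(sector_for_sic(6021))  # prints: Financial
```

Because the ranges are contiguous and non-overlapping, each code maps to at most one sector, so the order of the list does not affect the result.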
Various archival sources were used to gather data related to the firm and CEO
independent variables. These, as well as the measures for the dependent variables, are
detailed in the following section.