

Chapter 3 Study 1: Corporate Strategy and the Analyst Coverage

3.4. Data Collection and Sample Selection

The data for analyst attributes are obtained from the Institutional Brokers’ Estimate System (hereafter ‘I/B/E/S’), while the data for financial reporting variables and stock prices are obtained from Compustat and CRSP respectively. Regression models for all hypotheses are first estimated on a firm-year sample in which the dependent variable represents the number of analysts following a firm in a given year (the ‘aggregate coverage sample’). The data collection and derivation of this sample are described in Section 3.4.1. I also report tests of the probability of an individual analyst covering a firm, conditional on the firm’s strategy and on the analyst following at least one firm in the firm’s industry in the same year (the ‘individual coverage sample’), for which observations represent analyst-firm-year combinations. The data collection and derivation of this sample are provided in Section 3.4.2.

3.4.1. The Aggregate Coverage Sample

The derivation of my aggregate coverage sample is described in Table 3.1. The data are drawn from U.S. firms in the Compustat Annual Fundamental file between 1980 and 2015. Consistent with Bentley et al. (2013), the data collection period begins in 1980 because most of the data required to calculate the strategy measures are consistently recorded from the 1980s. However, the sample used for hypothesis testing starts in 1987, because the strategy measures use a five-year moving window and cash flow statement data are available only from the late 1980s. I follow the approach of Bentley et al. (2013) to construct the sample. The initial sample comprises 379,098 firm-years with non-negative sales and assets.

To calculate the strategy measure, I use historical SIC codes to identify the industry in which each firm operates. However, about 36% of firm-years are missing historical SIC codes. Instead of excluding these firm-years, as Bentley et al. (2013) and Bentley-Goode et al. (2017) do, I replace a missing historical SIC code with the firm’s current SIC code to preserve observations. Because almost half of the missing historical SIC codes (about 45%) occur between 1980 and 1990, this modification causes only a relatively small difference between my final sample and those reported in Bentley et al. (2013) and Bentley-Goode et al. (2017). I then exclude the utilities and financial industries (historical SIC 4900-4999 and 6000-6999) to avoid the impact of regulation on firms’ strategic choices. Consistent with Bentley et al. (2013, p. 20), to calculate the five-year rolling averages I require, for each of the strategic components used to compute the strategy scores, at least three years of non-missing observations within the five-year window and six consecutive financial years in the Compustat database.

However, after following this sample selection process, the sample attrition is about 10% greater than in Bentley et al. (2013). The difference is due to the treatment of missing R&D values.16 Bentley et al. (2013) and Bentley-Goode et al. (2017) replace missing R&D expenditures with zero, as is common in prior literature. However, Koh and Reeb (2015) find that about 10.5% of the firms with missing R&D possess patent records, and that this percentage is 14 times greater than that for firms that actually record zero R&D expenditure in Compustat. In light of this evidence, I do not replace missing R&D values with zero, which results in the additional attrition. After excluding firm-years that lack five-year rolling averages for all six components of the strategy score, my sample for the strategy measure is 70,402 firm-years from 1987 to 2015. Even though the earliest strategy score available is for 1984, 1987 is the earliest year for which data for the cash flow volatility control variable are available, and the first year for which there are five years of prior cash flow data is 1992. Thus, after the calculation of the five-year rolling average, the final sample of observations that have a strategy score is 60,177 firm-years from 1992 to 2015. This is because I need the first five years, from 1987 to 1991, to calculate the rolling average for the STRATEGY score. I also use this sample to construct the industry strategic orientation, because the sample selection process for industry strategic orientation is identical to that for the firm’s strategy score.
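The screening rules described above can be illustrated with a short Python sketch. It is not the code used to build the sample: the function names are hypothetical, the five-year window is assumed to end in the year being scored, and firm-years with no SIC code at all are assumed to be retained at this step.

```python
from statistics import mean
from typing import Dict, Optional


def resolve_sic(historical_sic: Optional[int], current_sic: Optional[int]) -> Optional[int]:
    """Prefer the historical SIC code; fall back to the current code when it is missing."""
    return historical_sic if historical_sic is not None else current_sic


def is_excluded_industry(sic: Optional[int]) -> bool:
    """True for utilities (SIC 4900-4999) and financials (SIC 6000-6999)."""
    if sic is None:
        return False  # assumption: a firm-year without any SIC code is not dropped here
    return 4900 <= sic <= 4999 or 6000 <= sic <= 6999


def rolling_mean(values: Dict[int, Optional[float]], year: int,
                 window: int = 5, min_obs: int = 3) -> Optional[float]:
    """Mean over the `window` years ending in `year`.

    Returns None when fewer than `min_obs` non-missing values fall in the window.
    """
    observed = [values[y] for y in range(year - window + 1, year + 1)
                if values.get(y) is not None]
    return mean(observed) if len(observed) >= min_obs else None
```

For example, a component observed in only two of the five years ending in `year` yields `None`, so that firm-year would receive no strategy score.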

Table 3.1 Sample Selection for the Aggregate Coverage Sample

Panel A: Sample Selection from COMPUSTAT for Aggregate Coverage

Descriptions                                                                          Observations
Compustat data for years 1980 to 2015 (non-negative sales and assets)                      378,779
Less: Utilities and financial industries (historical SIC 4900-4999 and 6000-6999)        (106,826)
Less: Required 5-year rolling average data for STRATEGY measure                          (196,587)
Less: Missing values for all 6 STRATEGY component variables per company-year               (5,063)
Less: Years before control variables first have data (from 1992)                          (10,126)
Total observations for STRATEGY composite score dataset (1992-2015)                         60,177

16 I have corresponded with Kathleen Bentley-Goode (nee Bentley) to confirm the treatment of R&D in


Panel B: Sample Selection from IBES

Descriptions                                                                          Observations
Firms with forecasts in the 90-day window prior to earnings announcement (1991-2015)        93,295

Panel C: Matching of Data from IBES to COMPUSTAT

Descriptions                                                                          Observations
Number of observations from Compustat (per Panel A)                                         60,177
Less: Firms with no analyst following during the 90-day window prior to
      earnings announcement                                                               (25,187)
Total observations for STRATEGY composite score with analyst coverage                       34,990
Less: Missing values from control variables                                                (7,758)
Final sample for the aggregate analyst coverage (firm-years)                                27,232
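The attrition steps in Table 3.1 can be checked arithmetically. The sketch below takes its figures directly from Panels A and C; the helper name `apply_attrition` is hypothetical and serves only to subtract each exclusion in turn.

```python
from typing import Sequence


def apply_attrition(start: int, exclusions: Sequence[int]) -> int:
    """Subtract each exclusion count from the starting number of observations."""
    remaining = start
    for dropped in exclusions:
        remaining -= dropped
    return remaining


# Panel A: Compustat firm-years down to the STRATEGY composite score dataset
strategy_sample = apply_attrition(378_779, [106_826, 196_587, 5_063, 10_126])

# Panel C: drop firm-years without analyst following, then those missing controls
covered_sample = apply_attrition(strategy_sample, [25_187])
final_sample = apply_attrition(covered_sample, [7_758])

print(strategy_sample, covered_sample, final_sample)  # 60177 34990 27232
```

Each subtotal reproduces the corresponding row of the table.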

The sample of observations of analyst coverage is obtained from the I/B/E/S Forecast Detail History file. I calculate the number of analysts following a firm within the window commencing 90 days prior to the annual earnings announcement and terminating at the announcement date. There are 93,295 firm-years with analyst coverage during that window between 1992 and 2015.17 Matching the analyst coverage and strategy samples further reduces the sample available for testing, because only relatively large and mature firms receive analyst coverage and have enough data to compute the strategy score. Finally, after requiring non-missing control variables, the final aggregate coverage sample comprises 27,232 firm-year observations representing 4,088 unique firms.18
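The coverage count for a single firm-year can be sketched as follows. This is an illustration under stated assumptions, not the procedure applied to the I/B/E/S file: the function name and the `(analyst_id, forecast_date)` input layout are hypothetical, and both endpoints of the 90-day window are assumed to be inclusive.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def count_analysts(forecasts: Iterable[Tuple[str, date]],
                   announcement: date,
                   window_days: int = 90) -> int:
    """Count distinct analysts with a forecast in the window before the announcement."""
    start = announcement - timedelta(days=window_days)
    # A set ensures an analyst who revises a forecast is counted only once.
    analysts = {analyst for analyst, forecast_date in forecasts
                if start <= forecast_date <= announcement}
    return len(analysts)


example = [
    ("A", date(2010, 1, 10)),
    ("A", date(2010, 2, 1)),   # revision by the same analyst
    ("B", date(2009, 10, 1)),  # falls before the window opens
    ("C", date(2010, 2, 15)),
]
n_following = count_analysts(example, announcement=date(2010, 2, 20))
```

In this example analysts A and C fall inside the window, so `n_following` is 2.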

17 To maintain consistency with Bentley-Goode et al. (2017), I exclude from my sample firms that are not followed by any analysts. My results are not substantively affected if I include these observations in the sample as zero-coverage firms.

18 There are other causes of minor differences in my sampling approach relative to Bentley-Goode et al. (2017). Unlike Bentley-Goode et al. (2017), I do not need to restrict my sample to firms covered by at least three analysts in order to calculate forecast dispersion, nor do I need to match the sample with data from First Call. I also include additional controls for analyst incentives (e.g. VOLUME) to capture their potential impact on analysts’ coverage decisions.


3.4.2. The Individual Coverage Sample

I present the sample selection for the individual coverage sample in Table 3.2. When constructing the individual coverage sample, the derivation of the strategy score sample is identical to that used in the aggregate coverage sample. The total sample of firm-years with composite strategy scores is 60,177 (Panel A, Table 3.2).

Table 3.2 Sample Selection for the Individual Coverage Sample

Panel A: Sample Selection from COMPUSTAT for Individual Coverage

Descriptions                                                                          Observations
Total observations for STRATEGY composite score dataset (1992-2015)                         60,177

Panel B: Sample Selection from IBES

Descriptions                                                                          Observations
Number of individual analysts following a firm per industry-year                         1,114,567
Generate sample representing whether an analyst covers a firm per
      industry-year (no. of analyst-firm-years)                                         51,817,229

Panel C: Matching of Data from COMPUSTAT to IBES

Descriptions                                                                          Observations
Generate sample representing whether an analyst covers a firm per
      industry-year (per Panel B)                                                       51,817,229
Less: Missing STRATEGY composite score dataset                                        (28,061,763)
Total observations for STRATEGY composite score with at least one analyst
      choosing to cover the firm                                                        23,755,466
Less: Missing values from control variables                                            (6,936,844)

Panel B of Table 3.2 describes the procedures used in constructing the sample that represents individual analysts’ choices to cover a firm. All data are obtained from the I/B/E/S Forecast Detail History file. I first restrict this sample to analyst-firm-years in which the analyst follows at least one firm in the industry-year of which the firm-year is a member (1,114,567 analyst-firm-years). For each firm, I then expand and fill the panel in null form, such that analysts who follow at least one other firm in the same industry-year, but do not follow this particular firm, are represented by a null observation for the dependent variable. This results in an intermediate sample of 51,817,229 analyst-firm-year observations. After merging with the strategy score and control variable data, the final sample available for hypothesis testing using the ‘individual coverage sample’ is 16,818,622 observations.
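The expansion of the panel in null form can be illustrated with a minimal Python sketch. It assumes a hypothetical input of `(analyst, firm, industry, year)` coverage records and codes the dependent variable as 1 when the analyst covers the firm and 0 otherwise; the function name and the tuple layout are illustrative only and do not describe the code used in this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Coverage = Tuple[str, str, str, int]   # (analyst, firm, industry, year)
PanelRow = Tuple[str, str, int, int]   # (analyst, firm, year, covers)


def expand_panel(coverage: Iterable[Coverage]) -> List[PanelRow]:
    """Pair every analyst active in an industry-year with every firm in that industry-year."""
    covered: Set[Tuple[str, str, int]] = set()
    analysts: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    firms: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for analyst, firm, industry, year in coverage:
        covered.add((analyst, firm, year))
        analysts[(industry, year)].add(analyst)
        firms[(industry, year)].add(firm)

    rows: List[PanelRow] = []
    for (industry, year), firm_set in sorted(firms.items()):
        for firm in sorted(firm_set):
            for analyst in sorted(analysts[(industry, year)]):
                # 0 marks a null observation: the analyst follows the industry but not this firm
                covers = int((analyst, firm, year) in covered)
                rows.append((analyst, firm, year, covers))
    return rows


rows = expand_panel([
    ("a1", "F1", "3571", 2000),
    ("a2", "F2", "3571", 2000),
])
```

With two analysts each covering a different firm in the same industry-year, the expanded panel has four analyst-firm-year rows, two of which are null observations. This multiplication of rows is why the intermediate sample is so much larger than the number of observed coverage records.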