Chapter 5. U.K. data and variables
55 See Appendix 5.3, Table A5.3.4 for examples of the relationship between X4 and the published profit figure.
5.3. Data Collation and Sample Selection
All data used in this study were extracted from Datastream, except consensus earnings forecasts, which come from I/B/E/S. First, all available U.K. industrial companies were collected from Datastream; the sample includes dead firms in order to avoid survivorship bias. The company list initially consists of a total of 2,641 firms over the sample period from
63 For industry-year specific persistence parameters, I use FTSE Level 3 classification. FTSE Level 3 industry groups are Resources (RSR), Basic industries (BIN), General industries (GIN), Cyclical consumer goods (CGD), Non-cyclical consumer goods (NCG), Cyclical services (CSV), Non-cyclical services (NSV), Utilities (UTL), and Information technology (IMT).
1969 to 1998. The 30-year period is chosen to obtain as many firm-year observations as possible. A further reason for starting in 1969 is that missing values (especially in earnings data) are more common in Datastream prior to the late 1960s. Before arriving at the total number of firm-year observations, I deleted observations with missing earnings, book value, or stock price, which are the core variables in this study.
Furthermore, I found some clear errors in the Datastream data, although most of them are trivial and random. Such errors are likely to have little effect on my results because this study uses a large number of firm-year observations. Nevertheless, I corrected as many errors as I could. First, I corrected large errors that Datastream made while adjusting the published numbers to arrive at DS 210, even though the corrections are likely to have little effect on the final results. Some critical errors were found in DS 981 (adjustments to operating profit), which has a large effect on DS 210. I collected 39 firm-years for which the absolute value of DS 981 exceeds 100% of the absolute value of DS 625 (earned for ordinary), and 32 firm-years with a large negative DS 981 below minus £10 million. I then investigated 11 firm-years from each set. Among these 22 firm-years, 2 (Lasmo, 31/12/94 and Cadbury Schweppes, 31/12/94) have completely wrong numbers in DS 981.64 Fortunately, I can conclude that these errors do not occur systematically.
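The two DS 981 screens described above can be sketched in a few lines. This is only an illustration, not the procedure actually used in the study: the record layout, the key names (`firm`, `year`, `ds981`, `ds625`), and the sample figures are hypothetical, and the floor of -10,000 assumes that Datastream items are stored in £ thousands, which this section does not state.

```python
def flag_suspect_adjustments(rows, ratio=1.0, floor=-10_000):
    """Return two lists of (firm, year) keys flagged by the DS 981 screens.

    rows  : iterable of dicts with hypothetical keys
            'firm', 'year', 'ds981', 'ds625'
    ratio : flag when |DS 981| > ratio * |DS 625| (1.0 means 100%)
    floor : flag when DS 981 < floor (assumed to be in GBP thousands)
    """
    ratio_flags, floor_flags = [], []
    for r in rows:
        key = (r["firm"], r["year"])
        if abs(r["ds981"]) > ratio * abs(r["ds625"]):
            ratio_flags.append(key)
        if r["ds981"] < floor:
            floor_flags.append(key)
    return ratio_flags, floor_flags


# Invented firm-years, for illustration only.
sample = [
    {"firm": "A", "year": 1994, "ds981": -222_000, "ds625": 150_000},
    {"firm": "B", "year": 1994, "ds981": -500, "ds625": 40_000},
    {"firm": "C", "year": 1990, "ds981": -12_000, "ds625": 90_000},
    {"firm": "D", "year": 1985, "ds981": 3_000, "ds625": -2_000},
]
ratio_flags, floor_flags = flag_suspect_adjustments(sample)
```

A firm-year can appear in both lists, as firm A does here. The flagged cases would still have to be checked by hand against the published financial statements.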
I also corrected cases in which values were reported for DS 1083, DS
64 DS 981 of Lasmo and Cadbury were -222,000 and -114,800, respectively. However, according to the financial statements, they should be about -17,000 (Total exceptional profit for the year, note 3) and 23,000 (Exceptional item - Spain restructuring costs, note 2), respectively.
1094 and DS 1097 in the pre-FRS 3 regime, when these items should not be available. In particular, many dead companies had large numbers in DS 1094. Also, 7 firm-year observations had non-zero numbers in DS 193 in the post-FRS 3 regime, although DS 193 has been effectively abolished (i.e., zero) since the introduction of FRS 3 into U.K. GAAP. Thus, I set these numbers to zero.
Finally, I found that some zero values of DS 210 and DS 182 do not actually represent zero earnings. They represent missing values, so these cases were deleted as well.65 I also found other data entry errors. Total assets (DS 392) should not be zero and current liabilities (DS 389) should not be negative. There were also some missing values in DS 376 (current assets), DS 375 (cash and equivalent), DS 389, DS 381 (current taxation), DS 136 (depreciation) and DS 187 (dividends). I therefore checked these data entry errors against the numbers in the financial statements and corrected them. Appendix 5.4 summarises the errors that Datastream made.
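The consistency rules in the preceding paragraphs can be written as simple checks on a firm-year record. The sketch below is illustrative only: the key names are hypothetical, the `post_frs3` flag is supplied by the caller because this section does not give the regime cut-off date, and it assumes that the zeroing applies both to the pre-FRS 3 items and to DS 193 after FRS 3. Zero values of DS 210 and DS 182 that stand for missing data are not handled, since identifying them requires the financial statements.

```python
# Items that, per the text, should not carry values before FRS 3.
PRE_FRS3_ABSENT = ("ds1083", "ds1094", "ds1097")


def find_issues(row):
    """List the data problems found in one firm-year record (a dict)."""
    issues = []
    if row["post_frs3"]:
        if row.get("ds193") not in (None, 0):
            issues.append("ds193 non-zero after FRS 3")
    else:
        for item in PRE_FRS3_ABSENT:
            if row.get(item) not in (None, 0):
                issues.append(item + " reported before FRS 3")
    if row.get("ds392") == 0:
        issues.append("ds392 (total assets) is zero")
    ds389 = row.get("ds389")
    if ds389 is not None and ds389 < 0:
        issues.append("ds389 (current liabilities) is negative")
    return issues


def zero_out(row):
    """Return a copy with the regime-inconsistent items set to zero."""
    fixed = dict(row)
    targets = ("ds193",) if row["post_frs3"] else PRE_FRS3_ABSENT
    for item in targets:
        if fixed.get(item) not in (None, 0):
            fixed[item] = 0
    return fixed


# Invented record, for illustration only.
record = {"firm": "X", "post_frs3": False, "ds1094": 5200, "ds392": 0, "ds389": -30}
problems = find_issues(record)
cleaned = zero_out(record)
```

Only the regime-inconsistent items are changed automatically here. The zero total assets and negative current liabilities are reported but left as they are, because the text corrects those against the financial statements rather than by rule.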
After collating and correcting the data, a total of 30,277 firm-year observations were obtained. Of these, 449 cases (1.5%) had negative book values, and I deleted them because some versions of my data analysis require book value as a scaling variable. The data available from Datastream for 1969 to 1998 after eliminating negative book values therefore comprise 29,828 firm-years. This is one of the primary data sets used for the estimation of RI persistence parameters.66 Because RI is defined in terms of
65 These cases were only found during 1969 to 1971.
66 Note that because RI is defined in terms of lagged book value and an AR(1) RI regression equation is used for the estimation of RI parameters, the total number of observations available for RI persistence parameters is 25,187. That is, the first and second observations of each firm cannot be used for RI parameter estimation.
lagged book value, the total number of observations for the calculation of RI is 27,435, available for the period 1970-1998. This data set is then merged with I/B/E/S analysts' earnings forecast data. Because I/B/E/S provides analysts' earnings forecasts for U.K. firms only for 1990 onwards, the total number of observations for the calculation of OI is significantly reduced, to 8,346. This is another primary data set, used for the estimation of OI persistence parameters.67 Note that analysts' earnings forecasts made in respect of year t+1 are matched with RI realizations for year t for the purpose of estimating OI at year t. Finally, 6,835 observations from 1991 to 1998 are used for the estimation of firms' intrinsic values, because the 1989 OI parameter cannot be estimated and the 1990 OI parameter is estimated with a small number of observations. Table 5.5 shows details of the U.K. sample construction.
Appendix 5.8 also shows the distribution of firm-year observations. As shown in Panel A, the total number of firm-year observations varies according to the variables included in the analysis; that is, it depends on the number of lags in the dependent and/or independent variables. Thus, if pooled AR(1) and AR(4) analyses based upon the residual income variable are conducted, the total numbers of usable firm-year observations are 25,187 and 19,753, respectively. Note also that many firms have a small number of observations. In Panel A (B), 878 (546) firms have 5 (3) or fewer data points, which is about 37% of the total firms in each set. The average number of observations per firm is 12.5 (5.7) for the period 1969-1998 (1989-1998).
67 Note also that because an AR(1) OI regression equation is used for the estimation of OI parameters, the total number of observations available for OI persistence parameters is 6,875. That is, the first observation of each firm cannot be used for OI parameter estimation.
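The way the usable sample shrinks with the number of lags can be illustrated with a small counting routine. This is a minimal sketch under an assumption of my own: a firm-year counts as usable only if data exist for each of the preceding `lags` consecutive calendar years. The function name, the panel layout, and the toy panel are hypothetical and do not reproduce the counts reported above.

```python
def usable_observations(years_by_firm, lags):
    """Count firm-years whose previous `lags` calendar years are all present.

    years_by_firm : dict mapping a firm identifier to the years with data
    lags          : number of consecutive prior years required
                    (0 counts every firm-year)
    """
    total = 0
    for years in years_by_firm.values():
        present = set(years)
        total += sum(
            1
            for y in present
            if all((y - k) in present for k in range(1, lags + 1))
        )
    return total


# Toy panel, for illustration only: firm B has a gap in 1991.
panel = {
    "A": [1990, 1991, 1992, 1993],
    "B": [1990, 1992, 1993],
    "C": [1995],
}
counts = [usable_observations(panel, k) for k in (0, 1, 2)]
```

In this setting, estimating an AR(1) in RI from the raw data corresponds to `lags=2`: one year is lost because RI uses lagged book value, and a second because the regression needs lagged RI. Firms with few observations, or with gaps, contribute little or nothing once lags are imposed.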