
Chapter Five: Data collection and Sample analysis

5.1. Data collection process

This section starts with an explanation of the criteria used to identify US corporate bonds that are included in the final sample. It also provides a discussion of the databases used to extract the required data for the estimation of explanatory variables.

Corporate bond data are extracted from the Thomson Financial Datastream database. We apply the following criteria to identify the corporate bonds to be included in the sample:

- Bonds are issued by corporations, not by the Treasury.

- Bonds are actively trading. Bonds that were actively traded but either defaulted or matured before the end of December 2007 are also included in the sample to avoid any sample bias.

- Bonds are denominated in US dollars and are traded on the New York Stock Exchange. This criterion helps limit the search to US bonds, although it does not guarantee that the sample excludes all Eurobonds.

- Bonds do not have any embedded option or any other feature such as a sinking fund, convertibility or warrants.

- Credit spread data are provided for at least 24 consecutive months.

- Bonds have more than one year to maturity. Amihud and Mendelson (1991) find that, because transaction costs are high for Treasury bonds close to maturity, investors tend to lock them away in portfolios, making them less tradable. Since corporate bonds are known to be less liquid than Treasury securities, corporate credit spreads are not expected to vary significantly, especially in the final year before maturity.
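The criteria above that depend only on bond-level data can be sketched as a simple filter. This is an illustrative sketch, not the procedure used in the study: the `Bond` record, its field names and the `passes_screen` helper are hypothetical, and the one-year rule is read here as dropping observations made within twelve months of maturity before counting consecutive months, which is one possible interpretation. The trading-activity and exchange criteria are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Bond:
    # Hypothetical record layout; field names are illustrative, not Datastream mnemonics.
    code: str
    issuer_type: str            # "corporate" or "treasury"
    currency: str               # e.g. "USD"
    has_embedded_option: bool   # sinking fund, convertibility, etc.
    maturity: tuple             # (year, month) of redemption
    spread_months: list = field(default_factory=list)  # sorted (year, month) with a spread


def month_index(ym):
    """Map (year, month) to an integer so that consecutive months differ by 1."""
    return ym[0] * 12 + ym[1]


def longest_consecutive_run(months):
    """Length of the longest run of consecutive calendar months in a sorted list."""
    best = run = 0
    prev = None
    for ym in months:
        idx = month_index(ym)
        run = run + 1 if prev is not None and idx == prev + 1 else 1
        best = max(best, run)
        prev = idx
    return best


def passes_screen(bond, min_run=24, min_months_left=12):
    """Assumed reading: ignore observations within a year of maturity,
    then require at least `min_run` consecutive monthly spreads."""
    usable = [m for m in bond.spread_months
              if month_index(bond.maturity) - month_index(m) > min_months_left]
    return (bond.issuer_type == "corporate"
            and bond.currency == "USD"
            and not bond.has_embedded_option
            and longest_consecutive_run(usable) >= min_run)
```

Representing dates as (year, month) pairs keeps the consecutive-month check exact for month-end data and avoids day-count conventions.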

The initial search returns 2030 corporate bonds. The data initially collected for these bonds are month-end credit spreads for the period January 2001 to December 2007, the last ten S&P rating changes, and each bond’s issue and redemption dates, coupon rate, issue size, industry code and issuing firm’s code. Based on the information about the borrower, the sample is then filtered to exclude bonds issued by various trusts and mutual funds. For each bond code, the issuer code is found and then used to collect the issuer’s financial data. The issuer sample includes 353 companies. The sample is further screened to remove bonds issued by companies whose equity data are not available from Datastream. Additionally, bonds whose ratings are not consistent with their spread history are removed, on the assumption that the rating history data contain input errors. Bonds with negative spreads over the period considered are also excluded, on the assumption that these are genuine errors in data calculation or input. Finally, based on a preliminary analysis of the distribution of observed monthly spreads, we exclude bonds that have extremely high spreads.

The final sample of credit spreads is a panel data set with a total of 30771 observations on 421 corporate bonds. Panel data are generally classified as either balanced or unbalanced. A balanced panel contains an observation for every individual unit in every time period under consideration, whereas an unbalanced panel lacks observations for some units in some periods. The final sample includes 274 “alive” bonds and 147 bonds which either defaulted or matured at some point during the sampling period. Hence, this data set can be categorised as an unbalanced panel for analysis purposes. Since most of the models employed in this study are based on balanced panel data, the final sample is further limited to 261 bonds.
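The distinction between balanced and unbalanced panels can be illustrated with a minimal sketch. The function name `split_panel` and the input layout of (bond code, period, spread) tuples are assumptions made for illustration; the text does not state how the 261-bond subsample was selected, so this shows only the generic definition of a balanced subset.

```python
from collections import defaultdict


def split_panel(observations, periods):
    """Split bond codes into those observed in every period and the rest.

    observations: iterable of (bond_code, period, spread) tuples (hypothetical layout)
    periods:      every period in the sample window
    Returns (balanced_codes, unbalanced_codes), each sorted.
    """
    seen = defaultdict(set)
    for code, period, _spread in observations:
        seen[code].add(period)

    required = set(periods)
    balanced = sorted(c for c, p in seen.items() if required <= p)
    unbalanced = sorted(c for c, p in seen.items() if not required <= p)
    return balanced, unbalanced
```

A panel is balanced exactly when the second list is empty; restricting the data to the first list yields a balanced subsample.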

Daily equity prices of the rated firms included in the final sample are sourced from Thomson Financial Datastream using the firms’ equity codes. These codes are also used to check whether a high proportion of bonds in the final sample are issued by a few large firms. Although we find firms that issued more than five bonds, such cases are limited to a few firms whose bonds, as reported in other studies, fall in the upper end of the investment-grade classifications. The Datastream database also provides a general industry code for each firm. These codes help classify bonds into six main groups according to the firm’s industry: industrial, utility, transport, bank, insurance and other financial bonds.
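A concentration check of the kind described above could look like the following sketch. The mapping from bond code to issuer code and the helper name `issuer_concentration` are hypothetical; the threshold of five bonds is taken from the text.

```python
from collections import Counter


def issuer_concentration(bond_to_issuer, threshold=5):
    """Return issuers with more than `threshold` bonds and their share of all bonds.

    bond_to_issuer: dict mapping bond code -> issuer (equity) code (hypothetical layout)
    """
    counts = Counter(bond_to_issuer.values())
    heavy = {issuer: n for issuer, n in counts.items() if n > threshold}
    share = sum(heavy.values()) / len(bond_to_issuer) if bond_to_issuer else 0.0
    return heavy, share
```

A small share indicates that the sample is not dominated by a handful of frequent issuers.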

Our discussion in previous chapters (Chapters Three and Four) suggests that explanatory variables for credit spreads can be grouped into three main categories: interest-rate-sensitive or term-structure variables (risk-free rate and the slope of the yield curve), liquidity-related variables (issue size, maturity, coupon) and equity-market-related variables (firms’ equity returns, S&P 500, Fama and French factors, CBOE volatility index). Information for each of these variables is collected from various available databases. While information on the liquidity variables and the S&P 500 index is gathered from Thomson Financial Datastream, data for the CBOE volatility index are sourced from the CBOE’s website. Monthly yields of Treasury bonds are collected from the US Federal Reserve’s webpage for the period January 2001 to December 2007. Time series for the Fama and French (1993) systematic risk factors (SMB and HML) are collected directly from Prof. Kenneth French’s website.

In document Determinants of U.S. corporate credit spreads. (Page 70-73)