The previous section introduces the leverage effect as a stylized fact of equity market returns. So far, the support that we provided is based on the inspection of financial data plots only. Using pandas, we can also base such analysis on a more formal, statistical
ground. The simplest approach is to use (linear) ordinary least-squares regression (OLS). In what follows, the analysis uses two different data sets available on the Web:
EURO STOXX 50
Historical daily closing values of the EURO STOXX 50 index, composed of European blue-chip stocks
VSTOXX
Historical daily closing data for the VSTOXX volatility index, calculated on the basis of volatilities implied by options on the EURO STOXX 50 index
It is noteworthy that we now (indirectly) use implied volatilities, which relate to
expectations with regard to the future volatility development, while the previous DAX analysis used historical volatility measures. For details, see the “VSTOXX Advanced Services” tutorial pages provided by Eurex.
We begin with a few imports: In [62]: import pandas as pd
from urllib import urlretrieve
For the analysis, we retrieve files from the Web and save them in a folder called data. If
there is no such folder already, you might want to create one first via mkdir data. We
proceed by retrieving the most current available information with regard to both indices: In [63]: es_url = ‘http://www.stoxx.com/download/historical_values/hbrbcpe.txt’
vs_url = ‘http://www.stoxx.com/download/historical_values/h_vstoxx.txt’
urlretrieve(es_url, ‘./data/es.txt’) urlretrieve(vs_url, ‘./data/vs.txt’) !ls -o ./data/*.txt
# Windows: use dir
Out[63]: -rw––- 1 yhilpisch 0 Sep 28 11:14 ./data/es50.txt -rw––- 1 yhilpisch 641180 Sep 28 11:14 ./data/es.txt -rw––- 1 yhilpisch 330564 Sep 28 11:14 ./data/vs.txt
Reading the EURO STOXX 50 data directly with pandas is not the best route in this case.
A little data cleaning beforehand will give a better data structure for the import. Two issues have to be addressed, relating to the header and the structure:
There are a couple of additional header lines that we do not need for the import. From December 27, 2001 onwards, the data set “suddenly” has an additional semicolon at the end of each data row.
The following code reads the whole data set and removes all blanks:[26] In [64]: lines = open(‘./data/es.txt’, ‘r’).readlines()
lines = [line.replace(‘ ‘, ”) for line in lines]
With regard to the header, we can inspect it easily by printing the first couple of lines of the downloaded data set:
In [65]: lines[:6] Out[65]: [‘PriceIndices-EUROCurrency\n’, ‘Date;Blue-Chip;Blue-Chip;Broad;Broad;ExUK;ExEuroZone;Blue-Chip;Broad\ n’, ‘;Europe;Euro-Zone;Europe;Euro-Zone;;;Nordic;Nordic\n’, ‘;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF\n’, ‘31.12.1986;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n’, ‘01.01.1987;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n’]
The above-mentioned format change can be seen between lines 3,883 and 3,990 of the file. From December 27, there suddenly appears an additional semicolon at the end of each data row:
In [66]: for line in lines[3883:3890]: print line[41:], Out[66]: 317.10;267.23;5268.36;363.19 322.55;272.18;5360.52;370.94 322.69;272.95;5360.52;370.94 327.57;277.68;5479.59;378.69; 329.94;278.87;5585.35;386.99; 326.77;272.38;5522.25;380.09; 332.62;277.08;5722.57;396.12;
To make the data set easier to import, we do the following: 1. Generate a new text file.
2. Delete unneeded header lines.
3. Write an appropriate new header line to the new file.
4. Add a helper column, DEL (to catch the trailing semicolons).
5. Write all data rows to the new file.
With these adjustments, the data set can be imported and the helper column deleted after the import. But first, the cleaning code:
In [67]: new_file = open(‘./data/es50.txt’, ‘w’) # opens a new file
new_file.writelines(‘date’ + lines[3][:-1] + ‘;DEL’ + lines[3][-1])
# writes the corrected third line of the original file
# as first line of new file
new_file.writelines(lines[4:])
# writes the remaining lines of the orignial file
new_file.close()
Let us see how the new header looks:
In [68]: new_lines = open(‘./data/es50.txt’, ‘r’).readlines() new_lines[:5] Out[68]: [‘date;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF;DEL\n’, ‘31.12.1986;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n’, ‘01.01.1987;775.00;900.82;82.76;98.58;98.06;69.06;645.26;65.56\n’, ‘02.01.1987;770.89;891.78;82.57;97.80;97.43;69.37;647.62;65.81\n’, ‘05.01.1987;771.89;898.33;82.82;98.60;98.19;69.16;649.94;65.82\n’]
It looks appropriate for the import with the read_csv function of pandas, so we continue:
In [69]: es = pd.read_csv(‘./data/es50.txt’, index_col=0,
parse_dates=True, sep=‘;’, dayfirst=True)
In [70]: np.round(es.tail())
Out[70]: SX5P SX5E SXXP SXXE SXXF SXXA DK5F DKXF DEL date
2014-09-22 3096 3257 347 326 403 357 9703 565 NaN 2014-09-23 3058 3206 342 321 398 353 9602 558 NaN 2014-09-24 3086 3244 344 323 401 355 9629 560 NaN 2014-09-25 3059 3202 341 320 397 353 9538 556 NaN
The helper column has fulfilled its purpose and can now be deleted: In [71]: del es[‘DEL’]
es.info()
Out[71]: <class ‘pandas.core.frame.DataFrame’>
DatetimeIndex: 7153 entries, 1986-12-31 00:00:00 to 2014-09-26 00:00:00 Data columns (total 8 columns):
SX5P 7153 non-null float64 SX5E 7153 non-null float64 SXXP 7153 non-null float64 SXXE 7153 non-null float64 SXXF 7153 non-null float64 SXXA 7153 non-null float64 DK5F 7153 non-null float64 DKXF 7153 non-null float64 dtypes: float64(8)
Equipped with the knowledge about the structure of the EURO STOXX 50 data set, we can also use the advanced capabilities of the read_csv function to make the import more
compact and efficient:
In [72]: cols = [‘SX5P’, ‘SX5E’, ‘SXXP’, ‘SXXE’, ‘SXXF’, ‘SXXA’, ‘DK5F’, ‘DKXF’]
es = pd.read_csv(es_url, index_col=0, parse_dates=True, sep=‘;’, dayfirst=True, header=None, skiprows=4, names=cols)
In [73]: es.tail()
Out[73]: SX5P SX5E SXXP SXXE SXXF SXXA DK5F DKXF 2014-09-22 3096.02 3257.48 346.69 325.68 403.16 357.08 9703.33 564.81 2014-09-23 3057.89 3205.93 341.89 320.72 397.96 352.56 9602.32 558.35 2014-09-24 3086.12 3244.01 344.35 323.42 400.58 354.72 9628.84 559.83 2014-09-25 3059.01 3202.31 341.44 319.77 396.90 352.58 9537.95 555.51 2014-09-26 3063.71 3219.58 342.30 321.39 398.33 352.71 9558.51 556.57
Fortunately, the VSTOXX data set is already in a form such that it can be imported a bit more easily into a DataFrame object:
In [74]: vs = pd.read_csv(‘./data/vs.txt’, index_col=0, header=2, parse_dates=True, sep=‘,’, dayfirst=True) vs.info()
Out[74]: <class ‘pandas.core.frame.DataFrame’>
DatetimeIndex: 4010 entries, 1999-01-04 00:00:00 to 2014-09-26 00:00:00 Data columns (total 9 columns):
V2TX 4010 non-null float64 V6I1 3591 non-null float64 V6I2 4010 non-null float64 V6I3 3960 non-null float64 V6I4 4010 non-null float64 V6I5 4010 non-null float64 V6I6 3995 non-null float64 V6I7 4010 non-null float64 V6I8 3999 non-null float64 dtypes: float64(9)
Table 6-6 contains the parameters of this important import function. There are a multitude of parameters, the majority of which default to None; object, of course, is nondefault and
has to be specified in any case.
Table 6-6. Parameters of read_csv function
object String File path, URL, or other source
sep String, default “,” Delimiter to use
lineterminator String (one character) String for line breaks
quotechar String Character for quotes
quoting Integer Controls recognition of quotes
escapechar String String for escaping
dtpye dtype/dict dict of dtype(s) for column(s)
compression “gzip”/“bz2” For decompression of data
dialect String/ csv.Dialect CSV dialect, default Excel
header Integer Number of header rows
skiprows Integer Number of rows to skip
index_col Integer Number of index columns (sequence for multi-index)
names Array-like Column names if no header rows
prefix String String to add to column numbers if no header names
na_values List/ dict Additional strings to recognize as NA, NaN
true_values List Values to consider as True
false_values List Values to consider as False
parse_dates
Boolean/list, default FalseWhether to parse dates in index columns or multiple columns
keep_date_col Boolean, default False Keeps original date columns
dayfirst Boolean, default False For European date convention DD/MM
thousands String Thousands operator
comment String Rest of line as comment (not to be parsed)
decimal String String to indicate decimal, e.g., “.” or “,”
nrows Integer Number of rows of file to read
iterator Boolean, default False Return TextFileReader object
chunksize Integer Return TextFileReader object for iteration
skipfooter Integer Number of lines to skip at bottom
converters Dictionary Function to convert/translate column data
verbose Boolean, default False Report number of NA values in nonnumeric columns
delimiter String Alternative to sep, can contain regular expressions
encoding String Encoding to use, e.g., “ UTF-8”
squeeze Boolean, default False Return one-column data sets as Series
na_filter Boolean, default False Detect missing value markers automatically
usecols Array-like Selection of columns to use
tupleize_cols Boolean, default False Leave a list of tuples on columns as is
To implement the regression analysis, we only need one column from each data set. We therefore generate a new DataFrame object within which we combine the two columns of
interest, namely those for the major indexes. Since VSTOXX data is only available from the beginning of January 1999, we only take data from that date on:
In [75]: import datetime as dt
data = pd.DataFrame({‘EUROSTOXX’ :
es[‘SX5E’][es.index > dt.datetime(1999, 1, 1)]}) data = data.join(pd.DataFrame({‘VSTOXX’ :
vs[‘V2TX’][vs.index > dt.datetime(1999, 1, 1)]}))
We also fill missing values with the last available values from the time series. We call the
fillna method, providing ffill (for forward fill) as the method parameter. Another
option would be bfill (for backward fill), which would however lead to a “foresight”
issue:
In [76]: data = data.fillna(method=‘ffill’) data.info()
Out[76]: <class ‘pandas.core.frame.DataFrame’>
DatetimeIndex: 4034 entries, 1999-01-04 00:00:00 to 2014-09-26 00:00:00 Data columns (total 2 columns):
EUROSTOXX 4034 non-null float64 VSTOXX 4034 non-null float64 dtypes: float64(2)
In [77]: data.tail()
Out[77]: EUROSTOXX VSTOXX 2014-09-22 3257.48 15.8303 2014-09-23 3205.93 17.7684 2014-09-24 3244.01 15.9504 2014-09-25 3202.31 17.5658 2014-09-26 3219.58 17.6012
Again, a graphical representation of the new data set might provide some insights. Indeed, as Figure 6-7 shows, there seems to be a negative correlation between the two indexes:
In [78]: data.plot(subplots=True, grid=True, style=‘b’, figsize=(8, 6))
However, to put this on more formal ground, we want to work again with the log returns of the two financial time series. Figure 6-8 shows these graphically:
In [79]: rets = np.log(data / data.shift(1)) rets.head()
Out[79]: EUROSTOXX VSTOXX 1999-01-04 NaN NaN 1999-01-05 0.017228 0.489248 1999-01-06 0.022138 -0.165317 1999-01-07 -0.015723 0.256337 1999-01-08 -0.003120 0.021570
In [80]: rets.plot(subplots=True, grid=True, style=‘b’, figsize=(8, 6))
Figure 6-8. Log returns of EURO STOXX 50 and VSTOXX
We have everything together to implement the regression analysis. In what follows, the EURO STOXX 50 returns are taken as the independent variable while the VSTOXX returns are taken as the dependent variable:
In [81]: xdat = rets[‘EUROSTOXX’] ydat = rets[‘VSTOXX’]
model = pd.ols(y=ydat, x=xdat) model
Out[81]: ––––––––-Summary of Regression Analysis–––––- –––
Formula: Y ~ <x> + <intercept> Number of Observations: 4033 Number of Degrees of Freedom: 2 R-squared: 0.5322
Adj R-squared: 0.5321 Rmse: 0.0389
F-stat (1, 4031): 4586.3942, p-value: 0.0000 Degrees of Freedom: model 1, resid 4031
–––––––—Summary of Estimated Coefficients––––– –––
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
–––––––––––––––––––––––— –––
x -2.7529 0.0406 -67.72 0.0000 -2.8326 -2.6732
intercept -0.0001 0.0006 -0.12 0.9043 -0.0013 0.0011
–––––––––––End of Summary–––––––– –––
Obviously, there is indeed a highly negative correlation. We can access the results as follows:
In [82]: model.beta
Out[82]: x -2.752894 intercept -0.000074 dtype: float64
This input, in combination with the raw log return data, is used to generate the plot in Figure 6-9, which provides strong support for the leverage effect:
In [83]: plt.plot(xdat, ydat, ‘r.’)
ax = plt.axis() # grab axis values
x = np.linspace(ax[0], ax[1] + 0.01)
plt.plot(x, model.beta[1] + model.beta[0] * x, ‘b’, lw=2) plt.grid(True)
plt.axis(‘tight’)
plt.xlabel(‘EURO STOXX 50 returns’) plt.ylabel(‘VSTOXX returns’)
Figure 6-9. Scatter plot of log returns and regression line
As a final cross-check, we can calculate the correlation between the two financial time series directly:
In [84]: rets.corr()
Out[84]: EUROSTOXX VSTOXX EUROSTOXX 1.000000 -0.729538 VSTOXX -0.729538 1.000000
Although the correlation is strongly negative on the whole data set, it varies considerably over time, as shown in Figure 6-10. The figure uses correlation on a yearly basis, i.e., for 252 trading days:
In [85]: pd.rolling_corr(rets[‘EUROSTOXX’], rets[‘VSTOXX’], window=252).plot(grid=True, style=‘b’)