Data - Application to High-Frequency Stock Price Duration Data

3.5 Application to High-Frequency Stock Price Duration Data

3.5.1 Data

We choose 9 highly liquid stocks and a stock index ETF for our empirical analysis, namely AIG, CVX, GM, INTC, JPM, PFE, T, VZ, WMT and SPY. The data are obtained from the Trade and Quote (TAQ) dataset.⁹ The sample period ranges from 1 Jan 2016 to 31 Dec 2016. The trade dataset consists of prices and trading volumes, timestamped to the millisecond. To support the argument of regime shifts in intraday volatility, we apply our MS-ACI model to the point process of absolute price change events. The point process is constructed as follows. We start with an observed transaction price series $P(t^*_{j,d})$, in which $t^*_{j,d}$ refers to the arrival time of the $j$-th transaction on day $d$, the subscript $j = 1{:}J_d$ is the transaction counter for day $d$ with a total of $J_d$ transactions on that day, and $d = 1{:}D$ is the index for trading days for a total of $D$ trading days. An arbitrary price change threshold $\delta$ (typically a multiple of the tick size) is chosen in order to construct the 'price event', which occurs when the cumulative absolute price change since the previous event is equal to or larger than $\delta$. Specifically, the event time $t_{i,d}$ for event $i$ on day $d$ is obtained by the following algorithm:

1. Starting from $j = 1$ for every $d$, set $t_{0,d} = t^*_{1,d}$. Set the value of $\delta$.
2. Let $t_{i,d} = \inf_{t^*_{j,d} > t_{i-1,d}} \{ |P(t^*_{j,d}) - P(t_{i-1,d})| \ge \delta \}$.
3. Stop if $t_{I_d,d} \le t^*_{J_d,d}$ and $t_{I_d+1,d} > t^*_{J_d,d}$.

The process $\{t^{\delta}_{i,d}\}_{i=1:I_d,\, d=1:D}$ records the arrival times of each price event, and is known as the $\delta$-related absolute price change process, in which $I_d$ is the number of price events on day $d$. An instantaneous volatility measure can be constructed based on the conditional intensity representation of this point process, as proposed by Engle and Russell (1998) and Gerhard and Hautsch (2002):

$$\sigma^2_{\delta}(t_{i,d}) = \lambda^{\delta}(t_{i,d} \mid \mathcal{F}_{t_{i,d}}) \left[ \frac{\delta}{P(t_{i,d})} \right]^2, \qquad (3.44)$$

in which $\lambda^{\delta}(t_{i,d} \mid \mathcal{F}_{t_{i,d}})$ is the conditional intensity of $\{t^{\delta}_{i,d}\}$ as defined in (3.1). Notice that the volatility process is proportional to the conditional intensity process. The choice of $\delta$ is crucial in constructing this volatility estimator, because it determines the (random) sampling frequency of the raw dataset. Generally, a small $\delta$ samples the raw dataset more frequently, which provides more precise volatility estimates, but can lead to a very noisy price event process due to market microstructure noise. In contrast, a large $\delta$ samples the raw dataset more sparsely, and the resulting price event process is less affected by market microstructure noise, at the expense of the precision of the volatility estimates. We construct the point process monthly and choose $\delta$ to be the smallest multiple of \$0.005 that on average samples the raw data every 5 minutes.¹⁰ Tse and Yang (2012) show that under this sampling frequency, intensity-based volatility estimates perform very well against the RV estimates. Moreover, this choice of $\delta$ results in similar sample sizes across different securities, enabling an easier cross-sectional comparison. Other choices of $\delta$ are available. For example, Nolte, Taylor, and Zhao (2018) suggest setting $\delta$ to three times the average bid-ask spread of the previous sampling period. However, this requires information on the intraday quoted prices, and the sample sizes will differ significantly across securities. We present the $\delta$ used for all 120 stock-month datasets in Figure C.3 in Appendix C.7.

⁹ The dataset is cleaned according to Holden and Jacobsen (2014) and Barndorff-Nielsen, Hansen, Lunde, and Shephard (2009). The stocks are chosen for illustrative purposes. We have generated results for up to 30 highly liquid stocks, and they are available upon request.
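The price event construction and the threshold selection described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `price_event_times` and `choose_delta`, the floating-point tolerance, and the cap `max_mult` are hypothetical, and "on average samples the raw data every 5 minutes" is interpreted here as a mean price duration of at least 300 seconds.

```python
import numpy as np

def price_event_times(times, prices, delta):
    """Return the arrival times of price events for one trading day.

    times  : increasing transaction times t*_j (seconds since the open)
    prices : transaction prices P(t*_j)
    delta  : absolute price change threshold
    The first transaction is the starting point t_0 and is not an event.
    """
    times = np.asarray(times, dtype=float)
    prices = np.asarray(prices, dtype=float)
    events = []
    ref_price = prices[0]                 # P(t_0) with t_0 = t*_1
    for t, p in zip(times[1:], prices[1:]):
        # Assumption: a small tolerance guards against rounding of tick prices.
        if abs(p - ref_price) >= delta - 1e-9:
            events.append(t)
            ref_price = p                 # measure the next change from this event
    return np.array(events)


def choose_delta(days, tick=0.005, target=300.0, max_mult=200):
    """Smallest multiple of `tick` whose mean price duration is >= `target`.

    days : list of (times, prices) pairs, one pair per trading day
    Returns None if no multiple up to `max_mult` reaches the target.
    """
    for k in range(1, max_mult + 1):
        delta = k * tick
        durations = []
        for times, prices in days:
            ev = price_event_times(times, prices, delta)
            # Durations between consecutive events, measured from t_0.
            durations.extend(np.diff(np.concatenate(([times[0]], ev))))
        if durations and np.mean(durations) >= target:
            return delta
    return None
```

In this sketch the search over $\delta$ is run on whatever collection of days is passed in, so passing one month of trading days at a time would mirror the monthly construction described above.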

For demonstration purposes it is more convenient to analyse the statistical properties of the duration representation of the absolute price change process, defined as $x^{\delta}_{i,d} = t^{\delta}_{i,d} - t^{\delta}_{i-1,d}$, which will be referred to as price durations. To investigate the regime-switching volatility-volume relationship based on the price duration data, we compute a volume measure as the log cumulative trading volume within each price duration, denoted by $\ln \mathrm{Vol}^{\delta}_{i,d}$. It is well documented that empirical price durations and price duration based covariates exhibit diurnal patterns (Andersen and Bollerslev, 1997b; Engle and Russell, 1998). To ensure that our results are not driven by these time-deterministic effects, we filter out the seasonality pattern from the raw price durations and the duration based volume measure, and obtain their deseasonalized versions $\ln \dot{x}^{\delta}_{i,d}$ and $\ln \dot{\mathrm{Vol}}^{\delta}_{i,d}$. The detailed deseasonalization procedure is presented in Appendix C.4.

Yearly descriptive statistics for the deseasonalized price durations and volume are presented in Table 3.3 below. Table 3.3 shows, first, that the mean duration is roughly 300 seconds, which corresponds to our choice of $\delta$. The distribution of durations is skewed to the right and over-dispersed, as the mean is much larger than the median and the standard deviation is larger than the mean for almost every stock. The minimum duration is not exactly zero but smaller than 0.01. The minimum of the volume measure, however, can be exactly zero because it is measured in logs. The log volume distribution is approximately symmetric, with the mean close to the median. For comparison we also present the yearly descriptive statistics for raw price durations and volumes in Table C.1 in Appendix C.7. Comparing Tables 3.3 and C.1, it is clear that the mean duration and volume do not change much. The main differences are changes in the maxima and smaller standard deviations for the deseasonalized variables, which is expected as the intraday variations have been removed from the variables.

¹⁰ The length of the sample window is chosen to contain sufficient data and trading days to allow for a reliable seasonality estimate, while remaining short enough that the estimation time of the model is manageable and potential intertemporal parameter instability can be avoided.
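As an illustration of how the raw (not yet deseasonalized) price durations and the log cumulative volume within each duration could be computed for one trading day, consider the following sketch. The function name `durations_and_log_volume` is hypothetical, and the convention that each duration covers the half-open interval $(t_{i-1,d}, t_{i,d}]$, so that the trade triggering an event is counted in the duration it closes, is an assumption rather than something stated in the text. The deseasonalization step of Appendix C.4 is not reproduced here.

```python
import numpy as np

def durations_and_log_volume(trade_times, trade_volumes, event_times, t0):
    """Price durations and log cumulative volume per duration for one day.

    trade_times   : increasing transaction times (seconds)
    trade_volumes : traded volume of each transaction
    event_times   : increasing price event times t_1, ..., t_I
    t0            : starting time t_0 (time of the day's first transaction)
    """
    trade_times = np.asarray(trade_times, dtype=float)
    trade_volumes = np.asarray(trade_volumes, dtype=float)
    edges = np.concatenate(([t0], np.asarray(event_times, dtype=float)))
    durations = np.diff(edges)                        # x_i = t_i - t_{i-1}
    # cum[k] is the total volume of the first k transactions.
    cum = np.concatenate(([0.0], np.cumsum(trade_volumes)))
    # Number of transactions with time <= each edge.
    idx = np.searchsorted(trade_times, edges, side="right")
    volume = np.diff(cum[idx])                        # volume traded in (t_{i-1}, t_i]
    return durations, np.log(volume)
```

Because every event time coincides with a transaction, each interval contains at least one trade under this convention, so the logarithm is well defined.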

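Table 3.3 Yearly Descriptive Statistics for $\dot{x}^{\delta}_{i,d}$ and $\ln \dot{\mathrm{Vol}}^{\delta}_{i,d}$

| Ticker | Obs. | $\dot{x}$ Mean | $\dot{x}$ σ | $\dot{x}$ Min | $\dot{x}$ Median | $\dot{x}$ Max | $\ln \dot{\mathrm{Vol}}$ Mean | $\ln \dot{\mathrm{Vol}}$ σ | $\ln \dot{\mathrm{Vol}}$ Min | $\ln \dot{\mathrm{Vol}}$ Median | $\ln \dot{\mathrm{Vol}}$ Max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AIG | 17925 | 321.11 | 346.17 | 0.00 | 212.80 | 5383.79 | 10.52 | 1.21 | 0.00 | 10.56 | 17.87 |
| CVX | 18014 | 316.99 | 334.58 | 0.00 | 211.34 | 3692.72 | 10.77 | 1.20 | 0.00 | 10.81 | 18.64 |
| GM | 17240 | 327.96 | 375.66 | 0.00 | 203.13 | 5801.45 | 11.24 | 1.12 | 0.00 | 11.27 | 16.80 |
| INTC | 16541 | 339.97 | 384.96 | 0.00 | 214.40 | 6015.84 | 11.79 | 1.24 | 2.30 | 11.83 | 19.52 |
| JPM | 17270 | 331.75 | 367.36 | 0.00 | 212.47 | 6267.21 | 11.61 | 1.09 | 0.00 | 11.64 | 17.47 |
| PFE | 17416 | 340.70 | 455.78 | 0.00 | 203.95 | 12422.01 | 12.03 | 1.28 | 1.33 | 12.07 | 19.10 |
| SPY | 18530 | 313.15 | 397.74 | 0.00 | 185.29 | 9944.04 | 13.52 | 1.01 | 3.89 | 13.56 | 17.17 |
| T | 17008 | 335.65 | 386.79 | 0.00 | 211.85 | 6575.75 | 11.83 | 1.18 | 0.00 | 11.86 | 18.40 |
| VZ | 16853 | 336.58 | 365.23 | 0.00 | 217.63 | 4564.45 | 11.27 | 1.21 | 0.00 | 11.30 | 18.54 |
| WMT | 17082 | 336.05 | 380.97 | 0.00 | 209.53 | 4774.98 | 10.83 | 1.19 | 0.00 | 10.87 | 18.43 |

Note: The table presents the descriptive statistics for the deseasonalized price durations $\dot{x}^{\delta}_{i,d}$ and the deseasonalized volume $\ln \dot{\mathrm{Vol}}^{\delta}_{i,d}$ of the 10 securities for the year 2016. Obs. denotes the total number of observations. σ is the standard deviation.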

To briefly describe the intraday volume-volatility relationship, we regress $\ln \dot{\mathrm{Vol}}^{\delta}_{i,d}$ on $\ln \dot{x}^{\delta}_{i,d}$ using a linear regression model:

$$\ln \dot{\mathrm{Vol}}^{\delta}_{i,d} = b_0 + b_1 \ln \dot{x}^{\delta}_{i,d} + \varepsilon_{i,d}. \qquad (3.45)$$

Intuitively, the regression describes the elasticity of volume with respect to duration, as $b_1$ is the percentage increase in volume per 1 percent increase in price duration. If there is a regime-switching relationship between volume and volatility, we should be able to observe different estimates of the parameters and of the $R^2$ for different subsamples of the dataset. As a preliminary descriptive analysis, we split the data into two subsamples based on calendar time: the first subsample includes all price durations within the first hour of a trading day ($t_{i,d} \le 3600$s), and the second subsample consists of the rest of the observations. We present three examples in Figure 3.2.
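The split-sample regression can be sketched as below. This is a minimal illustration using plain least squares in NumPy; the function names `ols_r2` and `split_regression` are hypothetical, and event times are assumed to be measured in seconds since the market open so that the first-hour cutoff is 3600.

```python
import numpy as np

def ols_r2(x, y):
    """Fit y = b0 + b1 * x by least squares and return (b0, b1, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack((np.ones_like(x), x))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1], r2


def split_regression(event_times, log_dur, log_vol, cutoff=3600.0):
    """Run the regression separately before and after `cutoff`.

    event_times : time of each price event, in seconds since the open
    log_dur     : log price durations (regressor)
    log_vol     : log volumes (dependent variable)
    Returns the two fits and the R^2 difference (rest minus first hour).
    """
    event_times = np.asarray(event_times, dtype=float)
    log_dur = np.asarray(log_dur, dtype=float)
    log_vol = np.asarray(log_vol, dtype=float)
    first = event_times <= cutoff
    fit_first = ols_r2(log_dur[first], log_vol[first])
    fit_rest = ols_r2(log_dur[~first], log_vol[~first])
    return fit_first, fit_rest, fit_rest[2] - fit_first[2]
```

Applying `split_regression` to each stock-month dataset in turn yields one $R^2$ difference per dataset, which is the kind of quantity a cross-sectional comparison of the two subsamples would use.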

From Figure 3.2, it is evident that for individual stocks, the volume-duration relationship for observations during the first hour of the trading day is very different from that for the rest of the day. For observations in the first hour, the estimated $R^2$ is much smaller. We can see from the right panels of the figure that for the rest of the day, $\ln \dot{\mathrm{Vol}}^{\delta}_{i,d}$ and $\ln \dot{x}^{\delta}_{i,d}$ are highly linearly dependent and cluster symmetrically along the regression line. During the first hour, the regression line deviates from the cluster, which is driven by observations that have disproportionately large volume in short price durations. However, we cannot observe this effect for SPY: as the figure shows, the regression outputs are very close for the two subsamples.

Figure 3.2 Scatter plots with regression line for AIG 2016-03, INTC 2016-08, SPY 2016-05

[Figure: Panel 1: AIG 2016-03; Panel 2: INTC 2016-08; Panel 3: SPY 2016-05. In each panel, the left plot shows the first hour of the trading day and the right plot the rest of the trading day, with log price duration on the horizontal axis and log volume on the vertical axis.]

Note: Three 1-by-2 graphs from top to bottom: AIG 2016-03, INTC 2016-08, SPY 2016-05. X-axis: $\ln \dot{x}^{\delta}_{i,d}$. Y-axis: $\ln \dot{\mathrm{Vol}}^{\delta}_{i,d}$.

To give a complete picture of our dataset, we denote by $R^2_1$ the $R^2$ from the regression on the first-hour subsample, and by $R^2_2$ the $R^2$ from the other subsample. We plot $\hat{R}^2_2 - \hat{R}^2_1$ for all $10 \times 12$ stock-month datasets in Figure 3.3. From Figure 3.3, it is evident that the average $R^2$ difference is between 0.2 and 0.4 for all individual securities except SPY, whose average $R^2$ difference is much closer to zero.

Figure 3.3 $R^2$ difference for the volume-duration regressions in the first hour of the day and the rest of the day

Note: The figure plots $\hat{R}^2_2 - \hat{R}^2_1$ obtained from regression (3.45) for the $10 \times 12$ stock-month datasets, with $\hat{R}^2_1$ the estimated $R^2$ from the observations in the first hour of the trading day and $\hat{R}^2_2$ that from the rest of the observations. Each black dot represents the $R^2$ difference for one stock-month dataset. The vertical black dashed lines split observations from each stock, and between two vertical red lines, the $R^2$ differences are ordered chronologically. The horizontal red dots represent the yearly average $R^2$ difference for each stock.

To conclude our data section, a simple static OLS regression reveals a regime-switching volume-volatility relationship for all the individual stocks considered in our analysis. However, the simple OLS regression has two major drawbacks: (1) it does not consider the dynamics of intraday volatility, and (2) dividing the data at the first hour is somewhat arbitrary, in the sense that informed traders can trade at other times during the day and uninformed traders can also trade within the first hour. This motivates our MS-ACI model, which is tailored to capture the dynamics of intraday volatility through a fully parametrized specification. Moreover, its regime identification is based solely on the correlation between volume and volatility, without any arbitrarily chosen threshold.

In document Point process based high frequency volatility estimation:theory and applications (Page 137-142)