ABSTRACT
ZHENG, HAIQING. Essays on Modeling of Volatility, Duration and Volume in High-frequency Data. (Under the direction of Denis Pelletier.)
This dissertation includes three essays. The first essay proposes a bivariate Autoregressive
Conditional Duration (ACD) model for the durations between trades of two assets. Market
microstructure theory suggests that the transactions of two related stocks can be influenced by the same
financial market information, and therefore their durations between trades may be correlated.
We derive the likelihood of two trade arrival processes with two of Gumbel's bivariate
exponential distributions, the Gaussian copula (of the elliptical family) and the Frank copula (of the Archimedean
family) modeling the dependence between the two point processes. An empirical analysis of 14
pairs of stocks demonstrates that correlation between the trade durations of
different stocks is widespread.
The second essay develops a new continuous-time framework to jointly model stock return
and duration between trades for irregularly spaced high-frequency financial data with noise. We
assume that the returns follow a stochastic volatility model, the durations follow an exponential
distribution conditional on a latent intensity and the two latent variables, log-variance and
log-intensity, follow a bivariate Ornstein-Uhlenbeck process. In the derived discretized model,
durations between trades determine the parameter dynamics. The returns and volatilities
influence the durations and vice versa. We also model market microstructure noise, and can therefore
estimate the volatility of the unobserved efficient price change. We apply this model to
tick-by-tick stock trade data and find that volatility and intensity are strongly persistent and
contemporaneously positively correlated. A Monte Carlo study shows that more accurate
measurements of volatility can be obtained by conditioning on the observed durations between trades
in addition to conditioning on the returns.
The third essay extends the continuous-time framework in the second essay to study trading
volume, which is crucial to the analysis of information dynamics in asset markets. We apply the model to NYSE
TAQ intraday data for the BAC and WMT stocks and find that the same latent process, or
information flow, drives volatility and trading volume together. We also investigate the
volatility estimation when trading volume is incorporated, and we find slight improvements in the
volatility estimates.
©Copyright 2012 by Haiqing Zheng
Essays on Modeling of Volatility, Duration and Volume in High-frequency Data
by Haiqing Zheng
A dissertation submitted to the Graduate Faculty of North Carolina State University
in partial fulfillment of the requirements for the Degree of
Doctor of Philosophy
Economics
Raleigh, North Carolina
2012
APPROVED BY:
Peter Bloomfield
Atsushi Inoue
Jeffrey Scroggs
Walter Thurman
Denis Pelletier
DEDICATION
BIOGRAPHY
The author was born the second of three children to a modest family and grew up with
caring parents and grandparents in Nanchang, Jiangxi, China. She received her Bachelor's degree from
Huazhong University of Science and Technology. Later, she received her Master's degree in
Economics from Kansas State University. Following graduation, she immediately enrolled at North
Carolina State University to pursue a doctorate in economics with a graduate minor in
mathematics. Her research interests include financial econometrics, risk management and quantitative finance.
ACKNOWLEDGEMENTS
It is with immense gratitude that I acknowledge the support and help of my dissertation
committee. I have had the privilege of working with such a great committee of knowledgeable and
experienced professors. Especially, I would like to gratefully and sincerely thank my advisor,
Dr. Denis Pelletier. This dissertation would not have been possible without his guidance,
understanding, patience and caring. He has always inspired me with creative ideas and been there to
listen and give advice. His knowledge and intelligence helped me overcome many difficult
situations to finish this dissertation. I am thankful to Dr. Atsushi Inoue and Dr. Wally Thurman for
their unending encouragement and their valuable and insightful suggestions. I thank Dr. Peter
Bloomfield, Dr. Jeffrey Scroggs and Dr. Mehmet Caner for reading my dissertation, commenting
on my views and helping me understand and enrich my ideas.
I thank the professors at North Carolina State University who have taught me the skills
and techniques that equipped and prepared me to write this dissertation. I would like to thank
my cheerful and friendly fellow students, with whom my graduate study life has been enriched.
Last but not least, I extend my gratitude to my grandparents, parents and family.
TABLE OF CONTENTS
List of Tables . . . vii
List of Figures . . . ix
Chapter 1 Bivariate ACD Model: a Copula Approach . . . 1
1.1 Introduction . . . 1
1.2 Literature Review . . . 2
1.3 Bivariate ACD Model . . . 11
1.3.1 Model . . . 11
1.3.2 Gumbel’s bivariate exponential distributions . . . 12
1.3.3 Copulas . . . 13
1.3.4 Maximum-Likelihood Estimation . . . 18
1.4 Result . . . 24
1.4.1 Data . . . 24
1.4.2 Estimation results . . . 25
1.4.3 Goodness-of-fit test . . . 28
1.4.4 Diurnal effect . . . 30
1.5 Conclusion . . . 31
Chapter 2 Joint modeling of high-frequency price and duration data . . . 46
2.1 Introduction . . . 46
2.2 Model for price and duration . . . 49
2.2.1 Specification in continuous time . . . 49
2.2.2 Discretized version of the model . . . 52
2.2.3 Market micro-structure noise . . . 56
2.2.4 State space representation . . . 57
2.2.5 Using the sign of the return . . . 58
2.3 Estimation . . . 60
2.4 Application to tick-by-tick data . . . 61
2.4.1 Summary statistics . . . 61
2.4.2 Estimation results for individual trades . . . 63
2.4.3 Estimation results for combined trades . . . 65
2.5 Monte Carlo Simulation . . . 67
2.6 Conclusion . . . 68
Chapter 3 Joint modeling of high-frequency price and volume data . . . 82
3.1 Introduction . . . 82
3.2 Model for price and volume . . . 84
3.2.1 Specification in continuous time . . . 84
3.2.2 Discretized version of the model . . . 86
3.2.3 Market micro-structure noise . . . 89
3.2.5 Using the sign of the return . . . 91
3.3 Estimation . . . 92
3.4 Application of the Model . . . 93
3.4.1 Summary Statistics . . . 93
3.4.2 Estimation results for combined trades . . . 94
3.5 Monte Carlo Simulation . . . 98
3.6 Conclusion . . . 99
References . . . .114
Appendix . . . .119
Appendix A . . . 120
A.1 Equations for Computation of log-likelihood . . . 120
A.1.1 Model I of Gumbel’s . . . 120
A.1.2 Model II of Gumbel’s . . . 123
A.1.3 Model with Frank Copula . . . 126
A.1.4 Model with Gaussian Copula . . . 129
LIST OF TABLES
Table 1.1 Descriptive statistics of trade durations (in seconds) . . . 32
Table 1.2 Estimation results for the Gumbel’s I. . . 33
Table 1.3 Estimation results for the Gumbel’s II. . . 34
Table 1.4 Estimation results for the Frank Copula . . . 35
Table 1.5 Estimation results for the Gaussian Copula . . . 36
Table 1.6 Rivers and Vuong test results. . . 37
Table 1.7 Estimation results for the Gumbel’s I with deseasonalized data . . . 37
Table 1.8 Estimation results for the Gumbel’s II with deseasonalized data . . . 38
Table 1.9 Estimation results for the Frank Copula with deseasonalized data . . . 38
Table 1.10 Estimation results for the Gaussian Copula with deseasonalized data . . . 38
Table 2.1 Summary Statistics for the duration and return series for AMD for individual transactions (1) and for sequences of five transactions (5). The data is over the period July 7, 2005 to July 20, 2005. . . 69
Table 2.2 Estimation results for AMD stock data. . . 70
Table 2.3 Daily volatility measurements for AMD Stock, individual trade. First column is the integrated volatilities estimated from the model with endogenous durations. The second column is a basic realized volatility measurement using 5 minutes returns. The last column is the ratio of integrated volatility to realized volatility. . . 70
Table 2.4 Daily volatility measurements for AMD Stock. First two columns are the integrated volatilities estimated from the full model (endogenous durations) and partial model (exogenous durations). The third column is a basic realized volatility measurement using 5 minutes returns. The last column is a realized volatility measurement computed with the method introduced by Hansen and Lunde [2004]. . . 71
Table 2.5 MSE and MAE of the filtered variances for the models where the durations are either endogenous or exogenous. . . 71
Table 2.6 The difference in the simulation results between models with endogenous and exogenous duration. The difference of instantaneous volatility between models with endogenous or exogenous volumes is evaluated by the ratio between mean absolute difference of the instantaneous volatilities and the average instantaneous volatility computed by the model with endogenous durations . . . 72
Table 3.1 Summary Statistics for the return, duration and volume series for BAC for individual transactions (1) and for sequences of forty transactions (40). The data is over the period May 1, 2005 to June 30, 2005. . . 100
Table 3.2 Summary Statistics for the return, duration and volume series for WMT for individual transactions (1) and for sequences of forty transactions (40). The data is over the period May 1, 2005 to June 30, 2005. . . 101
Table 3.4 Estimation results for WMT stock data. . . 102
Table 3.5 Daily volatility measurements for BAC and WMT Stock. First two columns are the integrated volatilities estimated from the model with endogenous volume and model with exogenous volume. The third column is a basic realized volatility measurement using 5 minutes returns. The last column is a realized volatility measurement computed with the method introduced by Hansen and Lunde [2004]. . . 103
Table 3.6 MAE and MSE of the filtered variances for the model with endogenous volume and the model with exogenous volume.
LIST OF FIGURES
Figure 1.1 Case 1: The horizontal lines represent time. t1 is the duration between two consecutive trades of asset 1; t2 is the duration between two consecutive trades of asset 2. . . 20
Figure 1.2 Case 2: t1 is the duration between two consecutive trades of asset 1; t2 is the duration between two consecutive trades of asset 2; tm is the elapsed time between the previous trade of asset 2 and the previous trade of asset 1. . . 20
Figure 1.3 Case 3: t1 is the duration between two consecutive trades of asset 1; t2 is the elapsed time between the previous trade of asset 2 and the current trade of asset 1. . . 21
Figure 1.4 Case 4: t1 is the duration between two consecutive trades of asset 1; t2 is the elapsed time between the previous trade of asset 2 and the current trade of asset 1; tm is the elapsed time between the previous trade of asset 2 and the previous trade of asset 1. . . 21
Figure 1.5 Case 5: t1 is the duration between two consecutive trades of asset 1; t2 is the duration between two consecutive trades of asset 2; tt is the elapsed time between the previous trade of asset 1 and the previous trade of asset 2. . . 22
Figure 1.6 Case 6: t1 is the duration between two consecutive trades of asset 1; t2 is the elapsed time between the previous trade of asset 2 and the current trade of asset 1; tt is the elapsed time between the previous trade of asset 1 and the previous trade of asset 2. . . 22
Figure 1.7 Case 7: t1 is the elapsed time between the previous trade of asset 1 and the current trade of asset 2; t2 is the duration between two consecutive trades of asset 2; tt is the elapsed time between the previous trade of asset 1 and the previous trade of asset 2. . . 23
Figure 1.8 Case 8: t1 is the elapsed time between the previous trade of asset 1 and the current trade of asset 2; t2 is the duration between two consecutive trades of asset 2. . . 23
Figure 1.9 Case 9: t1 is the elapsed time between the previous trade of asset 1 and the current trade of asset 2; t2 is the duration between two consecutive trades of asset 2; tm is the elapsed time between the previous trade of asset 2 and the previous trade of asset 1. . . 24
Figure 1.10 Diurnal effect of trade durations . . . 39
Figure 1.11 Histograms of trade durations . . . 40
Figure 2.1 Sample path from a bivariate OU process with a transition matrix A with complex eigenvalues. The transition matrix is A = [0.005 −0.6; 0.6 0.007] and the eigenvalues are eig(A) = 0.006 ± 0.6i. . . 73
Figure 2.2 Sample path from a bivariate OU process with a transition matrix A with real eigenvalues. The transition matrix is A = [0.001 0.0015; 0.002 0.01] and the eigenvalues are eig(A) = [0.000678174, 0.010321826].
Figure 2.3 Signature plot for AMD (daily realized volatility against sampling frequency in seconds) . . . 75
Figure 2.4 Impulse-response for the model estimated with individual trades . . . 76
Figure 2.5 Plot of the daily trends for duration and volatility. . . 77
Figure 2.6 Filtered estimates of the latent log-trade intensity and log-variance. . . 78
Figure 2.7 Impulse-response for the model estimated with observations corresponding to five consecutive trades . . . 79
Figure 2.8 Filtered Estimates, AMD (5 trades, 10 days' data in July 2005). Filtered estimates are computed based on the estimators. (a) is the plot of estimated log intensity for AMD stock during this 10 day period; (b) is the plot of estimated log volatility for AMD stock during this 10 day period. . . 80
Figure 2.9 Filtered log-variance for the models with endogenous and exogenous durations. Panel (a) presents the series for the ten days of data used to estimate the model while panel (b) presents the results for the first 1500 observations. . . 81
Figure 3.1 Histogram of Volume Plot of BAC and WMT, sequence of forty consecutive trades . . . 105
Figure 3.2 Signature Plot of BAC and WMT: Plots of daily realized volatility and sampling frequency in seconds for BAC and WMT . . . 106
Figure 3.3 Impulse-response for the model estimated with observations corresponding to forty consecutive trades, BAC . . . 107
Figure 3.4 Impulse-response for the model estimated with observations corresponding to forty consecutive trades, WMT . . . 108
Figure 3.5 Plot of the daily trends for volume and volatility. . . 109
Figure 3.6 Filtered Estimates, BAC (40 trades, 2 months' data of May 2005 and June 2005). Filtered estimates are computed based on the estimators. (a) is the plot of estimated log intensity for BAC stock during this two month period; (b) is the plot of estimated log volatility for BAC stock during this two month period. . . 110
Figure 3.7 Filtered Estimates, WMT (40 trades, 2 months' data of May 2005 and June 2005). Filtered estimates are computed based on the estimators. (a) is the plot of estimated log intensity for WMT stock during this two month period; (b) is the plot of estimated log volatility for WMT stock during this two month period. . . 111
Figure 3.8 Filtered log-variance for the model with endogenous volume and model with exogenous volume, BAC stock. Panel (a) presents the series for the two months of data used to estimate the model while panel (b) presents the results for the first 500 observations. . . 112
Figure 3.9 Filtered log-variance for the model with endogenous volume and model with exogenous volume, WMT stock.
Chapter 1
Bivariate ACD Model: a Copula
Approach
1.1 Introduction
Over the past fifteen years, the availability of high-frequency financial data has had an
important impact on research in financial econometrics and market microstructure theory. In
this chapter, we review the literature on realized volatility and financial point process analysis,
and then present and discuss our bivariate ACD model.
This chapter is organized as follows. Section 1.2 provides a literature review. In Section 1.3
we present bivariate ACD models with two of Gumbel's bivariate exponential distributions, the
Gaussian copula and the Frank copula. In Section 1.4, we estimate the model and report the
results. Section 1.5 concludes.

1.2 Literature Review
The literature on the use of high-frequency data goes back to Merton [1980], who stresses that
the sum of squared logarithmic changes of the asset price index over fixed intervals produces
a consistent estimate of the conditional variance of the asset return if the data are available
at a sufficiently high sampling frequency. However, intraday data were not available
at that point in time. More recently, the availability of tick-by-tick data for most exchanges has made
it possible to investigate this methodology for estimating and predicting the volatility
of asset returns.
Andersen and Bollerslev [1998] developed a method to compute an ex-post daily volatility
measure based on high frequency intraday data. The daily realized volatility is measured by
aggregating all intraday squared returns over fixed intervals (five minutes for example). This
approach is later elaborated in Andersen et al. [2001] with the theory of quadratic variation,
which we explain below.
An arbitrage-free price process of asset k belongs to the class of semimartingales and is
assumed to follow the stochastic differential equation

dp_k(t) = u_k(t)dt + σ_k(t)dW_k(t)    (1.1)

where p_k(t) is the log of the price; W_k(t) is a standard Wiener process; the drift u_k(t) is assumed finite; and the diffusion σ_k(t) is positive and assumed square integrable, i.e. E[∫_0^t σ_k²(s)ds] < ∞.
Let time t ∈ [0, T] and let the integer h ≥ 1 denote the number of trading days over which the
volatility measures are computed. Prices are sampled m times per day, so T = m·h. When
h = 1, we consider daily volatility.
The h-period return of asset k at time t, r_{k,h}(t), is defined as:

r_{k,h}(t) ≡ p_k(t) − p_k(t−h) = ∫_{t−h}^t u_k(s)ds + ∫_{t−h}^t σ_k(s)dW_k(s)    (1.2)

while the h-period quadratic variation of asset k, Qvar_{k,h}(t), is defined as:
Qvar_{k,h}(t) ≡ [p_k, p_k]_t − [p_k, p_k]_{t−h} = ∫_{t−h}^t σ_k²(s)ds    (1.3)

where [p_k, p_k]_t = lim_{‖P‖→0} Σ_{i=1}^n (p_{t_i} − p_{t_{i−1}})², the partition P is over the interval (0, t), n is
the number of partitions, and n → ∞ when ‖P‖ → 0. The right-hand side of equation (1.3) is known as the integrated variance. This equality is obtained through the following derivation:
[p_k, p_k]_t = lim_{‖P‖→0} Σ_{i=1}^n (p_{k,t_i} − p_{k,t_{i−1}})²
            = ∫_0^t dp_k²(s)
            = ∫_0^t (u_k(s)ds + σ_k(s)dW(s))²
            = ∫_0^t σ_k²(s)ds

because terms quadratic in ds and the cross term (dW)(ds) are zero, while (dW)² = ds. The h-period
quadratic covariation between assets k and j, Qcov_{kj,h}(t), is defined as:
Qcov_{kj,h}(t) ≡ [p_k, p_j]_t − [p_k, p_j]_{t−h} = ∫_{t−h}^t σ_k(s)σ_j(s)ds    (1.4)

where [p_k, p_j]_t = lim_{‖P‖→0} Σ_{i=1}^n (p_{k,t_i} − p_{k,t_{i−1}})(p_{j,t_i} − p_{j,t_{i−1}}), in which the partition P is over the interval (0, t). Analogously,
[p_k, p_j]_t = lim_{‖P‖→0} Σ_{i=1}^n (p_{k,t_i} − p_{k,t_{i−1}})(p_{j,t_i} − p_{j,t_{i−1}})
            = ∫_0^t dp_k(s)dp_j(s)
            = ∫_0^t (u_k(s)ds + σ_k(s)dW(s))(u_j(s)ds + σ_j(s)dW(s))
            = ∫_0^t σ_k(s)σ_j(s)ds

If the price instead follows a jump-diffusion process:
dp_k(t) = u_k(t)dt + σ_k(t)dW(t) + k_k(t)dN_k(t)    (1.5)

where N_k(t) is a Poisson process uncorrelated with W(t) and k_k(t) is the jump size, then the
h-period return of asset k at time t, r_{k,h}(t), is defined as:

r_{k,h}(t) ≡ p_k(t) − p_k(t−h) = ∫_{t−h}^t u_k(s)ds + ∫_{t−h}^t σ_k(s)dW(s) + Σ_{t−h≤s≤t} k_k(s)ΔN_k(s)    (1.6)
The h-period quadratic variation of asset k, Qvar_{k,h}(t), is defined as:

Qvar_{k,h}(t) ≡ [p_k, p_k]_t − [p_k, p_k]_{t−h} = ∫_{t−h}^t σ_k²(s)ds + Σ_{t−h≤s≤t} k_k(s)² ΔN_k(s)²    (1.7)
The h-period quadratic covariation between assets k and j, Qcov_{kj,h}(t), is defined as:

Qcov_{kj,h}(t) ≡ [p_k, p_j]_t − [p_k, p_j]_{t−h} = ∫_{t−h}^t σ_k(s)σ_j(s)ds + Σ_{t−h≤s≤t} k_k(s)k_j(s) ΔN_k(s)ΔN_j(s).    (1.8)
If prices are sampled m times per day, the h-period realized volatility of asset k, Rvar_{k,h}(t;m), is defined as:

Rvar_{k,h}(t;m) ≡ Σ_{i=1}^{mh} r²_{k,(m)}(t−h+i/m)    (1.9)

and the h-period realized covariance between assets k and j, Rcov_{kj,h}(t;m), is defined as:

Rcov_{kj,h}(t;m) ≡ Σ_{i=1}^{mh} r_{k,(m)}(t−h+i/m) × r_{j,(m)}(t−h+i/m)    (1.10)
For sufficiently large m, the realized volatility and covariance provide good approximations
to the quadratic variation and covariation. For all t = h, 2h, ..., T, as m → ∞, we have:

Rvar_{k,h}(t;m) →p Qvar_{k,h}(t),
Rcov_{kj,h}(t;m) →p Qcov_{kj,h}(t),

i.e. with a sufficiently high sampling frequency, realized volatility and covariance converge in
probability to quadratic variation and covariation.
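This convergence can be illustrated with a small simulation. The sketch below (all parameter values, the sinusoidal volatility path and the one-minute sampling grid are illustrative assumptions, not choices made in this dissertation) generates a log-price path dp = σ(t)dW on a fine grid, computes the realized volatility from m sampled returns as in (1.9), and compares its average across replications to the integrated variance:

```python
import math
import random

random.seed(42)

SIGMA = lambda t: 0.5 + 0.3 * math.sin(2 * math.pi * t)  # assumed volatility path

def simulate_rv(m, fine=50):
    """Simulate one day of log prices dp = sigma(t) dW on a fine grid,
    then compute realized volatility from m equally spaced intraday returns."""
    n = m * fine
    dt = 1.0 / n
    p, prices = 0.0, [0.0]
    for i in range(n):
        p += SIGMA(i * dt) * math.sqrt(dt) * random.gauss(0.0, 1.0)
        prices.append(p)
    sampled = prices[::fine]                      # keep every fine-th price
    return sum((sampled[j] - sampled[j - 1]) ** 2 for j in range(1, len(sampled)))

# integrated variance of the assumed sigma(t), by a fine Riemann sum
iv = sum(SIGMA(k / 10000.0) ** 2 for k in range(10000)) / 10000.0

rvs = [simulate_rv(m=390) for _ in range(100)]    # roughly one-minute sampling
mean_rv = sum(rvs) / len(rvs)
print(mean_rv, iv)                                # the two should be close
```

Averaging across replications removes the sampling noise of a single day, so the realized volatility lines up with the integrated variance of the assumed σ(t).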
Note that h-period quadratic variation and covariation are closely related to, but distinct
from, the conditional return variance and covariance which standard volatility models focus on.
Specifically,

var(p_k(t) | F_{t−h}) = E(Qvar_{k,h}(t) | F_{t−h}),
cov(p_k(t), p_j(t) | F_{t−h}) = E(Qcov_{kj,h}(t) | F_{t−h}).

The right-hand sides of the last two equations are expectations with respect to F_{t−h}, which is
the σ-algebra generated by all the variables observed up to time t−h. The conditional variance
and covariance differ from the quadratic variation and covariation by a zero-mean error.
So realized volatility and covariance are unbiased estimators of the conditional return variance
and covariance. For instance, the daily return volatility can be estimated by summing all the
squared intraday returns over fixed intervals.
Barndorff-Nielsen and Shephard [2002] derive the asymptotic distribution of the realized
volatility estimator under the assumption of a stochastic volatility model with a basic
Brownian motion for log prices of stocks:

(√m / √h) · (Rvar_h(t;m) − Qvar_h(t)) / √(2 IQ_h(t)) →d N(0, 1) as m → ∞    (1.11)

where the integrated quarticity IQ_h(t) is defined as:

IQ_h(t) ≡ ∫_{t−h}^t σ⁴(s)ds    (1.12)
Furthermore, they show that (m/3) Σ_{i=1}^{mh} |r⁴(t−h+i/m)| is a consistent estimator of the integrated
quarticity:

(m/3) Σ_{i=1}^{mh} |r⁴(t−h+i/m)| →p ∫_{t−h}^t σ⁴(s)ds    (1.13)

and

(Rvar_h(t;m) − Qvar_h(t)) / (√h · √((2/3) Σ_{i=1}^{mh} |r⁴(t−h+i/m)|)) →d N(0, 1) as m → ∞    (1.14)
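A quick numerical check of (1.13)-(1.14) is possible in the simplest constant-volatility case (the value of σ, the number of returns m and h = 1 are illustrative assumptions): the quarticity estimator should approach σ⁴ and the studentized statistic should look like a standard normal draw.

```python
import math
import random

random.seed(0)

sigma = 0.8          # assumed constant daily volatility, h = 1 day
m = 5000             # number of intraday returns
rets = [random.gauss(0.0, sigma / math.sqrt(m)) for _ in range(m)]

rv = sum(r * r for r in rets)                    # realized variance
rq = (m / 3.0) * sum(r ** 4 for r in rets)       # quarticity estimator from (1.13)

# feasible studentized statistic from (1.14) with h = 1; it should behave
# like a single draw from N(0, 1) when m is large
z = (rv - sigma ** 2) / math.sqrt((2.0 / 3.0) * sum(r ** 4 for r in rets))
print(rv, rq, z)
```

With constant volatility the integrated quarticity is simply σ⁴, which makes the consistency in (1.13) easy to eyeball.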
The above consistent estimation of realized volatility is obtained under the assumption of
a frictionless market. A model of price formation with market microstructure noise is more
realistic: p = p* + η, where p is the observed logarithmic price, p* is the logarithmic equilibrium
(or true) price, and η denotes a microstructure contamination which could include the bid-ask
spread, price discretization, data recording mistakes, etc.
The search for an asymptotically consistent and efficient estimator of quadratic variation based
on realized volatility in the presence of microstructure noise has been very active over the
past few years. These efforts include the sparse sampling of Andersen et al. [2001], who suggest a
sampling frequency lower than the highest frequency available, balancing the trade-off
between efficient sampling and bias-inducing noise to achieve an optimal sampling scheme.
Zhang et al. [2005] propose the two-scales realized volatility (TSRV) estimator: they average RVs
computed from different sub-samples and correct for the remaining bias. Zhou [1996] was the
first to use kernel methods to deal with microstructure effects; his bias correction approach is
based on the autocorrelation of the returns in the case of i.i.d. noise. Hansen and Lunde [2006]
extend it to non-i.i.d. noise, and Barndorff-Nielsen et al. [2008] develop a kernel-based method.
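The subsample-and-average idea of Zhang et al. [2005] can be sketched as follows. The code below is an illustrative simulation, not the estimator used later in this dissertation: the noise size, grid size and number of subgrids K are assumed values, and the bias correction uses the standard two-scales form, avg-RV minus (n̄/n) times the full-grid RV.

```python
import math
import random

random.seed(1)

n, K = 23400, 300             # one price per second; K sparse subgrids
sigma, noise_sd = 0.6, 0.005  # assumed efficient volatility and noise size
dt = 1.0 / n

eff = [0.0]
for _ in range(n):
    eff.append(eff[-1] + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0))
obs = [p + random.gauss(0.0, noise_sd) for p in eff]  # price contaminated by noise

def rv(series, start, step):
    """Realized variance on the subgrid series[start::step]."""
    pts = series[start::step]
    return sum((pts[j] - pts[j - 1]) ** 2 for j in range(1, len(pts)))

rv_all = rv(obs, 0, 1)                              # heavily biased by the noise
rv_avg = sum(rv(obs, k, K) for k in range(K)) / K   # average of sparse-grid RVs
n_bar = (n - K + 1) / K                             # average subgrid sample size
tsrv = rv_avg - (n_bar / n) * rv_all                # bias-corrected two-scales estimate
print(rv_all, tsrv)   # rv_all sits far above sigma^2 = 0.36; tsrv is close to it
```

The full-grid RV is dominated by the accumulated noise term, while the averaged sparse RVs keep the signal and the correction removes most of the remaining noise bias.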
These estimators are still constructed from sums of squared returns over 5-, 10-, 15- or
30-minute intervals. Aggregating the data over a fixed interval, however, causes misspecification
and bias since it is hard to decide what the optimal aggregation level is. What is more, a key
feature of high-frequency data is irregular spacing in time. Tick-by-tick data provides information
consisting of the time at which a market event occurs and the associated characteristics. This
information shows the state of the market and is necessary for market microstructure analysis,
modeling volatility and quantifying liquidity. An alternative to artificially
created equi-distant intervals is to combine the analysis of durations between transactions with
the modeling of the volatility of stock prices. See for example Ghysels and Jasiak [1998], Engle
[2000] and Renault and Werker [2011].
In this chapter, we propose to first focus on trade durations. The study of durations falls
into the category of point process analysis. A point process is a stochastic process {t_i}_{i∈S}, S = [0,∞), that
characterizes the random occurrence of certain events along the time axis, depending on observed
marks and on the past of the process. The sequence of random vectors {x_i}_{i∈S} associated with {t_i}_{i∈S}
are the marks; for example, volume or price can be marks. A sequence {t_i, x_i}_{i∈S}
is called a marked point process. The process N(t) = Σ_{i≥1} 1{t_i ≤ t} associated with {t_i}_{i∈S} is called
a counting process. N(t) is a right-continuous step function with left limits. Selecting
particular points by some rule is called thinning of the point process; for instance, keeping the points
at which the change of the stock price is larger than a certain value.
A key concept for point processes is the conditional intensity, defined as:

λ(t | N(t), t_1, ..., t_{N(t)}) = lim_{Δt→0} P(N(t+Δt) > N(t) | N(t), t_1, ..., t_{N(t)}) / Δt    (1.15)
The expected number of events in the interval (t_1, t_2] given F_{t_1} is the conditional expectation
of the integrated intensity function, i.e.

E(N(t_2) − N(t_1) | F_{t_1}) = E(∫_{t_1}^{t_2} λ(s)ds | F_{t_1})    (1.16)
where the filtration F_{t_1} is the σ-algebra generated by all the information available at time
t_1: N(t_1), t_1, ..., t_{N(t_1)} and the marks observed up to time t_1. A quantity of interest in the study of
point processes is the integrated intensity function, defined as

Λ(t_1, t_2) ≡ ∫_{t_1}^{t_2} λ(s)ds.    (1.17)
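The identity (1.16) can be checked by simulation. In the sketch below the intensity function, the dominating rate and the interval are assumed illustrative choices: a non-homogeneous Poisson process with deterministic λ(t) is simulated by thinning a homogeneous process, and the average event count is compared to the integrated intensity.

```python
import math
import random

random.seed(7)

lam = lambda t: 2.0 + math.cos(t)   # assumed deterministic intensity function
LAM_MAX = 3.0                       # dominating rate for the thinning step

def mean_count(t1, t2, reps=4000):
    """Simulate a non-homogeneous Poisson process on (t1, t2] by thinning a
    homogeneous process of rate LAM_MAX, and return the average event count."""
    total = 0
    for _ in range(reps):
        t = t1
        while True:
            t += random.expovariate(LAM_MAX)        # candidate arrival
            if t > t2:
                break
            if random.random() < lam(t) / LAM_MAX:  # accept w.p. lam(t)/LAM_MAX
                total += 1
    return total / reps

t1, t2 = 0.0, 5.0
integrated = 2.0 * (t2 - t1) + math.sin(t2) - math.sin(t1)  # ∫ lam(s) ds
avg = mean_count(t1, t2)
print(avg, integrated)   # average count matches the integrated intensity
```

Since λ here is deterministic, the conditional expectation in (1.16) reduces to the plain integral, which makes the comparison direct.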
The simplest point process is the homogeneous Poisson process, whose key feature is
a constant intensity function. We can obtain non-homogeneous Poisson processes by allowing
the intensity to depend on some observed variables; the intensity could depend on variables
observed at the beginning of the spell or on the current time t. If we relax the assumption that
the durations between subsequent points are exponentially distributed, as in the Poisson process,
we have the renewal process. The ordinary renewal process has mutually independent intervals
between successive events, but instead of their density being of exponential form, it can follow other
distributions g (e.g. the uniform and gamma distributions). The conditional intensity of
a renewal process depends on the backward recurrence time, defined as the time elapsed
since the immediately preceding point at which the event happened. The backward recurrence
time is left-continuous, increases linearly over time, and jumps back to zero at the next arrival
time. Therefore the conditional intensity changes within a duration, while the homogeneous Poisson
process has a constant intensity within a duration.
To add dynamic structure to the sequence of duration distributions, one possibility is to
suppose that each point in the past influences the conditional intensity in a way that decays in
time and that contributions from distinct points add together. This gives the class of linear
self-exciting processes and autoregressive point processes. The linear self-exciting process defined
for t has:

λ(t | N(t), t_1, ..., t_{N(t)}) = ϖ + Σ_{i=1}^{N(t)} π(t − t_i)    (1.18)

where each past arrival time t_i contributes π(t − t_i) to the intensity at time t, and π is the function
governing this decay.
The autoregressive process defined for t has:

λ(t | N(t), t_1, ..., t_{N(t)}) = ϖ + Σ_{i=1}^{N(t)} π(t_{N(t)+1−i} − t_{N(t)−i})    (1.19)
Both the autoregressive process and the linear self-exciting process depend on the history
of the process, but they differ in the following way. The intensity of the linear self-exciting
process depends on the elapsed times t − t_i from all previous events i, and each contribution
π(t − t_i) is independent of the points and number of events between t_i and t. The
intensity of the autoregressive process depends on all past durations between neighboring
points. See Cox and Isham [1980].
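The contrast between (1.18) and (1.19) can be made concrete with a small sketch. The exponential decay π(s) = α·e^{−βs} and all parameter values are assumed illustrative choices, not specifications from this chapter:

```python
import math

def self_exciting_intensity(t, past, omega=0.5, alpha=0.8, beta=1.2):
    """Linear self-exciting intensity (1.18): baseline omega plus a decaying
    contribution pi(t - t_i) from each past arrival t_i < t.  The exponential
    decay pi(s) = alpha * exp(-beta * s) is an assumed illustrative choice."""
    return omega + sum(alpha * math.exp(-beta * (t - ti)) for ti in past if ti < t)

def autoregressive_intensity(t, past, omega=0.5, alpha=0.8, beta=1.2):
    """Autoregressive intensity in the spirit of (1.19): omega plus pi applied
    to the past durations between neighboring points, not to elapsed times."""
    durations = [past[j] - past[j - 1] for j in range(1, len(past))]
    return omega + sum(alpha * math.exp(-beta * d) for d in durations)

arrivals = [0.3, 1.1, 1.4, 2.0]
print(self_exciting_intensity(2.5, arrivals))   # recent events raise the rate
print(autoregressive_intensity(2.5, arrivals))  # depends only on past durations
```

Moving t forward lowers the self-exciting intensity as the contributions decay, while the autoregressive intensity stays fixed until a new arrival changes the set of durations.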
Wold [1948] and later Cox [1955] studied the autoregressive process. The model was later
extended by Gaver and Lewis [1980], Lawrence and Lewis [1980], and Jacobs and Lewis [1977]
as exponential autoregressive moving average EARMA(p,q) model. These models assume that
the durations are conditionally exponentially distributed with a mean that follows an ARMA
process.
There are two intensity-based models with covariates relevant to our research that we need
to mention at this point: the proportional intensity model and the accelerated failure time model. In
the proportional intensity (PI) model, the conditional intensity is given by:

λ(t; x_{N(t)}) = λ_0(t) exp(−x′_{N(t)}γ)    (1.20)

where λ_0(t) is a baseline intensity function and x_{N(t)} is a vector of covariates. In the accelerated
failure time (AFT) model, the conditional intensity is given by:

λ(t; x_{N(t)}) = λ_0(t exp(−x′_{N(t)}γ)) exp(−x′_{N(t)}γ).    (1.21)

The covariates x_{N(t)} can accelerate or decelerate the time to failure.
Engle and Russell [1998] propose a model for irregularly spaced financial data. They introduce the autoregressive conditional duration (ACD) model.
Let y_i = t_i − t_{i−1} be the interval between two arrival times. We denote the conditional expected
duration by ϕ_i:

E(y_i | y_{i−1}, ..., y_1) = ϕ_i(y_{i−1}, ..., y_1; θ) ≡ ϕ_i    (1.22)

and assume that y_i follows

y_i = ϕ_i ε_i    (1.23)

where {ε_i} are i.i.d. with density p(ε, φ) and E[ε_i] = 1. The parameters θ and φ are assumed to
be variation free, meaning that if θ ∈ Θ and φ ∈ Φ, then (θ, φ) ∈ Θ ⊗ Φ.
Different types of ACD models can be obtained either by the choice of the functional form for
the conditional mean ϕ_i or by the choice of the distribution for ε_i. The m-memory conditional
duration has:

ϕ_i = ϖ + Σ_{j=1}^m α_j y_{i−j}    (1.24)

A more general specification is:

ϕ_i = ϖ + Σ_{j=1}^m α_j y_{i−j} + Σ_{j=1}^q β_j ϕ_{i−j}    (1.25)

which is called an ACD(m,q).
The ACD model can be expressed through its conditional intensity:

λ(t | N(t), t_1, ..., t_{N(t)}) = λ_0((t − t_{N(t)}) / ϕ_{N(t)+1}) · (1 / ϕ_{N(t)+1})    (1.26)

with baseline intensity λ_0 = p_0(t)/s_0(t), where p_0(t) is the density function and s_0(t) is the survival
function of ε. A survival function is defined as:

S(t) ≡ P(T > t) = ∫_t^∞ p_0(s)ds    (1.27)

So we can see that the ACD model belongs to the class of accelerated failure time models.
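A minimal sketch of the ACD(1,1) mechanics, with assumed parameter values (ϖ = 0.1, α = 0.1, β = 0.8) chosen only for illustration: durations are simulated from (1.23)-(1.25) with exponential ε, and the exponential log-likelihood is evaluated to show that it favors the data-generating parameters.

```python
import math
import random

random.seed(3)

omega, alpha, beta = 0.1, 0.1, 0.8   # assumed ACD(1,1) parameters
n = 20000

phi = omega / (1.0 - alpha - beta)   # start at the unconditional mean duration
y = []
for _ in range(n):
    d = phi * random.expovariate(1.0)        # y_i = phi_i * eps_i, eps ~ Exp(1)
    y.append(d)
    phi = omega + alpha * d + beta * phi     # ACD(1,1) recursion for phi_{i+1}

mean_dur = sum(y) / n
implied = omega / (1.0 - alpha - beta)       # unconditional mean duration
print(mean_dur, implied)

def acd_loglik(data, omega, alpha, beta):
    """Exponential ACD log-likelihood: sum over i of -log(phi_i) - y_i / phi_i."""
    phi, ll = omega / (1.0 - alpha - beta), 0.0
    for d in data:
        ll += -math.log(phi) - d / phi
        phi = omega + alpha * d + beta * phi
    return ll

ll_true = acd_loglik(y, 0.1, 0.1, 0.8)       # data-generating parameters
ll_off = acd_loglik(y, 0.3, 0.05, 0.5)       # a mis-specified alternative
print(ll_true > ll_off)
```

Maximizing this log-likelihood over (ϖ, α, β) with a numerical optimizer is the usual estimation route; the comparison above only illustrates that the likelihood surface points toward the true parameters.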
1.3 Bivariate ACD Model

We propose a bivariate ACD model for the durations between trades of two assets. The
transactions of two related stocks would be influenced by the same financial market information, so
their durations between trades would be correlated instead of being independent of each other.
With this model, the bivariate conditional covariance between trade durations, as well as stock
price and liquidity measures, can be derived.
1.3.1 Model

Let y_{1,i} = t_{1,i} − t_{1,i−1} be the interval between two trade arrival times for asset 1 and λ_{1,i} the
conditional expectation of the ith duration for asset 1. Similarly, denote by y_{2,i} = t_{2,i} − t_{2,i−1}
the interval between two trade arrival times for asset 2 and λ_{2,i} the conditional expectation of
the ith duration for asset 2:

E(y_{1,i} | y_{1,i−1}, ..., y_{1,1}) = λ_{1,i}(y_{1,i−1}, ..., y_{1,1}, θ) ≡ λ_{1,i}    (1.28)
E(y_{2,i} | y_{2,i−1}, ..., y_{2,1}) = λ_{2,i}(y_{2,i−1}, ..., y_{2,1}, θ) ≡ λ_{2,i}    (1.29)

The standardized durations are defined as:

ε_{1,i} = y_{1,i} / λ_{1,i}    (1.30)
ε_{2,i} = y_{2,i} / λ_{2,i}    (1.31)

We assume that λ_{1,i} and λ_{2,i} are determined by the following equations:

λ_{1,i} = I_1 + α_1 y_{1,i−1} + β_1 λ_{1,i−1}    (1.32)
λ_{2,i} = I_2 + α_2 y_{2,i−1} + β_2 λ_{2,i−1}    (1.33)

in which the I's, α's and β's are coefficients, y_{1,i−1} is the (i−1)th duration, and λ_{1,i−1} is the
conditional expectation of the (i−1)th duration. The restrictions of this model are: the α's, I's and β's are
positive to ensure that λ is positive, and α + β less than one is the stationarity condition.
1.3.2 Gumbel's bivariate exponential distributions

We assume p(ε, γ) is a bivariate exponential distribution. "Bivariate exponential" usually refers to
bivariate distributions whose marginal distributions are both exponential. There are different
kinds of bivariate exponential distributions. Gumbel [1960], in a pioneering paper devoted to the
bivariate exponential distribution, introduces a number of bivariate exponential distribution
models. We use two of these models in our study, mainly because
their conditional survival functions have closed-form expressions.

For the first model (Gumbel's I), the joint density function for y_i = {y_{1,i}, y_{2,i}} is:

f_y^{(1)}(y) = (1/(λ_1 λ_2)) exp[−(y_1/λ_1 + y_2/λ_2 + θ (y_1/λ_1)(y_2/λ_2))] [(1 + θ y_1/λ_1)(1 + θ y_2/λ_2) − θ]    (1.34)

with y_1, y_2 > 0 and 0 ≤ θ ≤ 1.
For the second model (Gumbel's II), the joint density function is equal to

f_y^{(2)}(y) = (1/(λ_1 λ_2)) exp[−(y_1/λ_1 + y_2/λ_2)] {1 + α(2e^{−y_1/λ_1} − 1)(2e^{−y_2/λ_2} − 1)}    (1.35)
For both models, the marginal density functions are given by:

f_{y_i}(y_i) = (1/λ_i) e^{−y_i/λ_i},  i = 1, 2.    (1.36)
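The claim that both marginals of Gumbel's I are exponential can be verified numerically: integrating (1.34) over y_2 should recover (1.36). The sketch below does this by quadrature (unit scale parameters, the truncation point and the step count are illustrative assumptions):

```python
import math

def gumbel1_density(y1, y2, theta, lam1=1.0, lam2=1.0):
    """Gumbel's type I bivariate exponential density from (1.34)."""
    u, v = y1 / lam1, y2 / lam2
    return (math.exp(-(u + v + theta * u * v))
            * ((1.0 + theta * u) * (1.0 + theta * v) - theta)) / (lam1 * lam2)

def marginal_by_integration(y1, theta, upper=40.0, steps=40000):
    """Integrate the joint density over y2 (trapezoid rule, tail truncated);
    the result should recover the exponential marginal in (1.36)."""
    h = upper / steps
    total = 0.5 * (gumbel1_density(y1, 0.0, theta) + gumbel1_density(y1, upper, theta))
    for k in range(1, steps):
        total += gumbel1_density(y1, k * h, theta)
    return total * h

for theta in (0.2, 0.7, 1.0):
    print(theta, marginal_by_integration(1.3, theta), math.exp(-1.3))
```

For every θ in [0, 1] the integrated value agrees with e^{−y_1}, confirming that θ changes only the dependence, not the marginals.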
To help us decide between these two bivariate exponential distributions, it is interesting to
study the correlation values allowed by each distribution. In Gumbel's I, the parameter that
measures the correlation between the durations of asset 1 and asset 2 is θ. From Gumbel [1960],
we know that the correlation is equal to:

corr(ε_1, ε_2) = (1/θ) e^{1/θ} Ei(1/θ) − 1,  where Ei(z) = ∫_1^∞ (1/t) e^{−tz} dt and 0 ≤ θ ≤ 1    (1.37)

The correlation is zero for θ = 0 and decreases to a minimum of −0.40365 as θ increases to
1. So the correlation of two durations lies in [−0.40365, 0] and cannot be positive, while the
correlation between the durations of two assets may well be positive.
In Gumbel's II, the parameter that measures correlation is α, with corr(X_1, X_2) = α/4. Since |α| ≤ 1, the correlation is −0.25 for α = −1 and increases to a maximum of 0.25 as α increases to 1, so the correlation between two durations lies between −0.25 and 0.25. Since both Gumbel's I and Gumbel's II are limited in the correlations they can represent, in the following section we introduce copulas as a different way to model the joint distribution in order to cover a broader range of correlation.
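The two correlation ranges just described can be reproduced numerically. In the sketch below, Ei(z) from (1.37) is evaluated with a simple midpoint rule; the truncation point and grid size are illustrative quadrature settings, not part of the model.

```python
import math

def exp_int(z, upper=60.0, n=100000):
    """Ei(z) = integral from 1 to infinity of (1/t) e^{-tz} dt,
    approximated by a midpoint rule on [1, upper]."""
    h = (upper - 1.0) / n
    total = 0.0
    for k in range(n):
        t = 1.0 + (k + 0.5) * h
        total += math.exp(-t * z) / t
    return total * h

def gumbel1_corr(theta):
    """Correlation implied by Gumbel's I, equation (1.37)."""
    z = 1.0 / theta
    return -1.0 + z * math.exp(z) * exp_int(z)

def gumbel2_corr(alpha):
    """Correlation implied by Gumbel's II: alpha / 4."""
    return alpha / 4.0
```

At θ = 1, gumbel1_corr returns the minimum of about −0.40365 quoted above; gumbel2_corr spans exactly [−0.25, 0.25] as α runs over [−1, 1].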
1.3.3 Copulas
Copulas provide a method of modelling the dependence between two or more random variables. They are a useful analytic tool in a broad range of financial research areas such as actuarial science, derivative pricing, portfolio management and risk modeling. Although most of the financial and statistical applications arose in the last fifteen years, the notion of a copula was introduced, and its theoretical exploration started, in 1959 by Abe Sklar.
In this study, we consider only bivariate copulas. Some important definitions and properties of bivariate copulas are discussed below; most of the definitions and theorems have analogous multivariate versions.
A two-dimensional copula is a function C mapping the Cartesian product of two closed intervals, [0,1] × [0,1], to the interval [0,1] with the following properties:
1. For every u, v in [0,1],
C(u,0) = 0 =C(0, v) (1.38)
and
C(u,1) =u and C(1, v) =v; (1.39)
2. For every u1, u2, v1, v2 in [0,1] such that u1 ≤u2 and v1≤v2
C(u2, v2)−C(u2, v1)−C(u1, v2) +C(u1, v1)≥0. (1.40)
Property 1 says the copula function is grounded. Property 2 means that it is a 2-increasing function. The left-hand side of equation (1.40) measures the mass, or area, of the rectangle [u_1, u_2] × [v_1, v_2]; a 2-increasing function assigns non-negative mass to every such rectangle in its domain.
The fundamental theorem of Sklar [1959], which underpins all copula analysis, says: let H be a joint distribution function with margins F and G; then there exists a copula C such that for all x, y in R̄,

H(x, y) = C(F(x), G(y)).    (1.41)

R̄ denotes the extended real line [−∞, ∞]. If F and G are continuous, then C is unique; otherwise, C is uniquely determined on RanF × RanG. Conversely, if C is a copula and F and G are distribution functions, then the function H defined by equation (1.41) is a joint distribution function with margins F and G.
For the multi-dimensional copula, let H : R^n → (0,1) be a joint distribution function with margins H_1, H_2, ..., H_n. Then there exists a copula C : (0,1)^n → (0,1) such that for all x ∈ R^n, u ∈ (0,1)^n,

H(x) = C(H_1(x_1), ..., H_n(x_n)) = C(u)    (1.42)

Conversely, if C : (0,1)^n → (0,1) is a copula, then there exists a joint distribution function H with margins H_1, H_2, ..., H_n such that for all x ∈ R^n, u ∈ (0,1)^n,

C(u) = H(H_1^{−1}(u_1), ..., H_n^{−1}(u_n))    (1.43)

Furthermore, if H_1, H_2, ..., H_n are continuous, then the copula C is unique.
Next we discuss the Frechet-Hoeffding bounds. Every copula satisfies the inequality

W(u, v) = max(u + v − 1, 0) ≤ C(u, v) ≤ min(u, v) = M(u, v)    (1.44)

for every point (u, v) ∈ [0,1] × [0,1]. Equivalently, if X and Y are random variables with joint distribution H and marginal distributions F and G, then for all x, y in R̄,

max(F(x) + G(y) − 1, 0) ≤ H(x, y) ≤ min(F(x), G(y))    (1.45)

The copulas M and W are called the Frechet-Hoeffding upper and lower bounds for the joint distribution function H with margins F and G.
Similarly to distribution functions, copulas have a notion of density. The density c(u, v) associated with a copula C(u, v) is:

c(u, v) = ∂²C(u, v) / ∂u∂v    (1.46)

For continuous random variables x and y, the copula density relates the joint density h to the product of the marginal densities g and f:

c(G(x), F(y)) = h(x, y) / (g(x) f(y))    (1.47)

So we have the following canonical representation:

h(x, y) = c(G(x), F(y)) g(x) f(y)    (1.48)

The canonical representation is very useful in deriving the joint density for given marginals.
The two most commonly used measures of association for two random variables are Spearman's rho (ρ) and Kendall's tau (τ). If X and Y are random variables with marginal distribution functions F and G, respectively, then Spearman's ρ is the ordinary correlation coefficient of the transformed random variables F(X) and G(Y). If X and Y are continuous with copula C, then the population version of Spearman's ρ for X and Y is given by

ρ = 12 ∫_0^1 ∫_0^1 C(u, v) du dv − 3    (1.49)

Kendall's τ is the difference between the probability of concordance P[(X_1 − X_2)(Y_1 − Y_2) > 0] and the probability of discordance P[(X_1 − X_2)(Y_1 − Y_2) < 0] for two independent pairs (X_1, Y_1) and (X_2, Y_2). Let (X_1, Y_1) and (X_2, Y_2) be continuous with joint distribution functions H_1 and H_2, respectively, with common margins F (of X_1 and X_2) and G (of Y_1 and Y_2). Let C_1 and C_2 denote the copulas of (X_1, Y_1) and (X_2, Y_2), respectively, so that H_1(x, y) = C_1(F(x), G(y)) and H_2(x, y) = C_2(F(x), G(y)). The population version of τ is:

τ = 4 ∫_0^1 ∫_0^1 C_2(u_1, u_2) dC_1(u_1, u_2) − 1    (1.50)
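Equation (1.49) is easy to check numerically for copulas whose ρ is known in closed form: the independence copula C(u, v) = uv has ρ = 0, and the upper bound M(u, v) = min(u, v) has ρ = 1. A midpoint-rule sketch (the grid size is an illustrative choice):

```python
def spearman_rho(C, n=300):
    """rho = 12 * double integral of C(u, v) over the unit square - 3,
    approximated by the midpoint rule, equation (1.49)."""
    h = 1.0 / n
    total = sum(C((i + 0.5) * h, (j + 0.5) * h)
                for i in range(n) for j in range(n)) * h * h
    return 12.0 * total - 3.0

rho_indep = spearman_rho(lambda u, v: u * v)      # independence copula
rho_upper = spearman_rho(lambda u, v: min(u, v))  # Frechet-Hoeffding upper bound
```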
In this study, we use the Gaussian and Frank copulas. The Gaussian copula is one of the elliptical copulas, a class of symmetric, elliptically contoured copulas that allow one to specify different levels of correlation between the marginals.
The Gaussian copula function was proposed by Lee [1983] for modelling selectivity in the context of continuous distributions. The Gaussian copula has a wide range of applications due to its flexibility: it allows for equal degrees of positive and negative dependence, and it attains both the Frechet-Hoeffding lower and upper bounds as the dependence parameter approaches −1 and 1. The Gaussian copula is

C(u_1, u_2; θ) = ∫_{−∞}^{Φ^{−1}(u_1)} ∫_{−∞}^{Φ^{−1}(u_2)} (1/(2π(1 − θ²)^{1/2})) exp(−(s² − 2θst + t²)/(2(1 − θ²))) ds dt    (1.51)
The density of the Gaussian copula is

c(u_1, u_2; θ) = (1/√(1 − θ²)) exp(−(z² + w² − 2θzw)/(2(1 − θ²))) × exp((z² + w²)/2)    (1.52)

where z = Φ^{−1}(u_1), w = Φ^{−1}(u_2), Φ is the cdf of the standard normal distribution and the correlation parameter θ is restricted to the interval (−1, 1).
The Gaussian copula is constructed from a bivariate normal distribution. θ is Pearson's correlation coefficient of Φ^{−1}(F(X)) and Φ^{−1}(G(Y)). When the marginal distributions of X and Y are standard normal, X = Φ^{−1}(F(X)) and Y = Φ^{−1}(G(Y)), so θ is Pearson's correlation of X and Y. In this study, however, the marginal distributions of X and Y are assumed exponential, so X = Φ^{−1}(F(X)) and Y = Φ^{−1}(G(Y)) no longer hold and θ is no longer Pearson's correlation of X and Y (see Kelly and Krzysztofowicz [1997]). The joint distribution derived from a Gaussian copula with normal or non-normal marginal distributions is called meta-Gaussian. See Abdous et al. [2003] and Fang et al. [2002] for detailed discussions.
For the Gaussian copula, Kendall's tau is:

τ = (2/π) arcsin(θ)    (1.53)
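The density (1.52) and the τ mapping (1.53) can be sketched directly; Φ^{−1} is taken from the Python standard library. At θ = 0 the density reduces to 1 everywhere (independence), which serves as a quick consistency check.

```python
import math
from statistics import NormalDist

def gaussian_copula_density(u1, u2, theta):
    """Gaussian copula density, equation (1.52)."""
    z = NormalDist().inv_cdf(u1)   # z = Phi^{-1}(u1)
    w = NormalDist().inv_cdf(u2)   # w = Phi^{-1}(u2)
    q = -(z * z + w * w - 2 * theta * z * w) / (2 * (1 - theta ** 2))
    return math.exp(q + (z * z + w * w) / 2) / math.sqrt(1 - theta ** 2)

def gaussian_kendall_tau(theta):
    """Kendall's tau for the Gaussian copula, equation (1.53)."""
    return 2.0 / math.pi * math.asin(theta)
```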
The Frank copula is discussed at length in Genest [1987]. It is a member of the single-parameter Archimedean family of copulas. A copula belongs to the bivariate Archimedean family if it takes the form

C(u, v) = ϕ^{−1}(ϕ(u) + ϕ(v)),    (1.54)

where ϕ is a generator function. Generator functions are a class of functions ϕ : [0,1] → [0,∞] with continuous derivatives on (0,1) satisfying ϕ(1) = 0, ϕ′(t) < 0 and ϕ″(t) > 0. Different choices of generator yield several important families of Archimedean copulas, such as the Frank copula mentioned above, the Clayton copula and the Gumbel copula, to name a few.
The Frank copula is popular for several reasons: the dependence in both tails is symmetric; it permits both positive and negative dependence, like the Gaussian copula and the Student-t copula, another member of the elliptical family; and both the lower and upper Frechet-Hoeffding bounds are attained as the dependence parameter approaches −∞ and ∞. The Frank copula is:

C(u_1, u_2; θ) = −(1/θ) log(1 + (e^{−θu_1} − 1)(e^{−θu_2} − 1)/(e^{−θ} − 1))    (1.55)
The density of the Frank copula is:

c(u_1, u_2; θ) = −θ(e^{−θ} − 1) e^{−θ(u_1+u_2)} / ((e^{−θu_1} − 1)(e^{−θu_2} − 1) + (e^{−θ} − 1))²    (1.56)

The dependence parameter θ may take any nonzero real value in (−∞, ∞).
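A direct sketch of (1.55) and (1.56); the test values are illustrative. Note that C(u, 1) = u and C(u, 0) = 0 recover the margin and groundedness properties stated earlier, which gives a quick numerical check.

```python
import math

def frank_copula(u1, u2, theta):
    """Frank copula, equation (1.55); theta is any nonzero real."""
    num = (math.exp(-theta * u1) - 1) * (math.exp(-theta * u2) - 1)
    return -1.0 / theta * math.log(1 + num / (math.exp(-theta) - 1))

def frank_density(u1, u2, theta):
    """Frank copula density, equation (1.56)."""
    a = math.exp(-theta) - 1
    num = -theta * a * math.exp(-theta * (u1 + u2))
    den = ((math.exp(-theta * u1) - 1) * (math.exp(-theta * u2) - 1) + a) ** 2
    return num / den
```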
The application of copula theory in this paper is to construct joint distributions with given marginals. This follows directly from the canonical representation in equation (1.48): f and g are exponential densities and c is the density of the Frank or Gaussian copula, respectively.
1.3.4 Maximum-Likelihood Estimation
The log-likelihood is

log L(y_{1,n}, y_{2,n}, ..., y_{1,1}, y_{2,1} | θ, γ) = Σ_{i=1}^n log p_i(y_{1,i}, y_{2,i} | y_{1,i−1}, y_{2,i−1}, ..., y_{1,0}, y_{2,0}; θ, γ),

where n is the number of sub-intervals on [0, T]. The intervals are constructed from the trade arrival times of both asset 1 and asset 2: every point in time with at least one transaction generates an interval. Some intervals therefore correspond to incomplete spells for one asset when it is not transacted but the other asset is. When a spell is complete, the probability that the trade occurred is given by the density f(y_i); when it is censored, the probability is the survival function S(y_i).
Depending on whether y_1 and y_2 are censored or not at t and t−1, there are nine different cases for computing the joint probability of the observations on each interval. They are listed below, with figures illustrating the cases followed by the equations for the joint probabilities. In the figures, '0' indicates that there is a trade at the time point, while '1' indicates that there is not.
• Case 1:
Figure 1.1: Case 1: The horizontal lines represent time. t_1 is the duration between two consecutive trades of asset 1; t_2 is the duration between two consecutive trades of asset 2.

p = f_y(y_1 = t_1, y_2 = t_2) = f_{y_1}(y_1 = t_1) f_{y_2|y_1}(y_2 = t_2 | y_1 = t_1).    (1.57)
• Case 2:
Figure 1.2: Case 2: t_1 is the duration between two consecutive trades of asset 1; t_2 is the duration between two consecutive trades of asset 2; t_m is the elapsed time between the previous trade of asset 2 and the previous trade of asset 1.

p = f_y(y_1 = t_1, y_2 = t_2 | y_2 ≥ t_m) = f_{y_1}(y_1 = t_1) f_{y_2|y_1}(y_2 = t_2 | y_1 = t_1) / f_{y_2}(y_2 ≥ t_m)    (1.58)
• Case 3:
Figure 1.3: Case 3: t_1 is the duration between two consecutive trades of asset 1; t_2 is the elapsed time between the previous trade of asset 2 and the current trade of asset 1.

p = f_y(y_1 = t_1, y_2 ≥ t_2) = f_{y_1}(y_1 = t_1) f_{y_2|y_1}(y_2 ≥ t_2 | y_1 = t_1)    (1.59)
• Case 4:
Figure 1.4: Case 4: t_1 is the duration between two consecutive trades of asset 1; t_2 is the elapsed time between the previous trade of asset 2 and the current trade of asset 1; t_m is the elapsed time between the previous trade of asset 2 and the previous trade of asset 1.

p = f_y(y_1 = t_1, y_2 ≥ t_2 | y_2 ≥ t_m) = f_{y_1}(y_1 = t_1) f_{y_2|y_1}(y_2 ≥ t_2 | y_1 = t_1) / f_{y_2}(y_2 ≥ t_m)    (1.60)
• Case 5:
Figure 1.5: Case 5: t_1 is the duration between two consecutive trades of asset 1; t_2 is the duration between two consecutive trades of asset 2; t_t is the elapsed time between the previous trade of asset 1 and the previous trade of asset 2.

p = f_y(y_1 = t_1, y_2 = t_2 | y_1 ≥ t_t) = f_{y_1|y_2}(y_1 = t_1 | y_2 = t_2) f_{y_2}(y_2 = t_2) / f_{y_1}(y_1 ≥ t_t)    (1.61)
• Case 6:
Figure 1.6: Case 6: t_1 is the duration between two consecutive trades of asset 1; t_2 is the elapsed time between the previous trade of asset 2 and the current trade of asset 1; t_t is the elapsed time between the previous trade of asset 1 and the previous trade of asset 2.

p = f_y(y_1 = t_1, y_2 ≥ t_2 | y_1 ≥ t_t) = f_{y_1}(y_1 = t_1) f_{y_2|y_1}(y_2 ≥ t_2 | y_1 = t_1) / f_{y_1}(y_1 ≥ t_t)    (1.62)
• Case 7:
Figure 1.7: Case 7: t_1 is the elapsed time between the previous trade of asset 1 and the current trade of asset 2; t_2 is the duration between two consecutive trades of asset 2; t_t is the elapsed time between the previous trade of asset 1 and the previous trade of asset 2.

p = f_y(y_1 ≥ t_1, y_2 = t_2 | y_1 ≥ t_t) = f_{y_1|y_2}(y_1 ≥ t_1 | y_2 = t_2) f_{y_2}(y_2 = t_2) / f_{y_1}(y_1 ≥ t_t)    (1.63)
• Case 8:
Figure 1.8: Case 8: t_1 is the elapsed time between the previous trade of asset 1 and the current trade of asset 2; t_2 is the duration between two consecutive trades of asset 2.

p = f_y(y_1 ≥ t_1, y_2 = t_2) = f_{y_1|y_2}(y_1 ≥ t_1 | y_2 = t_2) f_{y_2}(y_2 = t_2)    (1.64)
• Case 9:
Figure 1.9: Case 9: t_1 is the elapsed time between the previous trade of asset 1 and the current trade of asset 2; t_2 is the duration between two consecutive trades of asset 2; t_m is the elapsed time between the previous trade of asset 2 and the previous trade of asset 1.

p = f_y(y_1 ≥ t_1, y_2 = t_2 | y_2 ≥ t_m) = f_{y_1|y_2}(y_1 ≥ t_1 | y_2 = t_2) f_{y_2}(y_2 = t_2) / f_{y_2}(y_2 ≥ t_m)    (1.65)
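The nine cases all combine density terms f, survival terms S, and left-truncation corrections of the form f(·)/S(t_m). The sketch below assembles one interval's contribution under the simplifying assumption that the two duration processes are independent, so the conditional densities reduce to marginals with exponential margins. The function name and this independence simplification are illustrative; the actual estimator uses the bivariate models of Section 1.3.

```python
import math

def exp_pdf(t, lam):
    """Exponential density f(t) = (1/lam) exp(-t/lam)."""
    return math.exp(-t / lam) / lam

def exp_surv(t, lam):
    """Exponential survival S(t) = P(y >= t) = exp(-t/lam)."""
    return math.exp(-t / lam)

def interval_prob(t1, t2, lam1, lam2, censored1, censored2, tm=0.0, tt=0.0):
    """Joint contribution of one interval.  censored=True means the spell is
    incomplete (survival term); tm / tt apply a left-truncation correction,
    e.g. f(y2 = t2 | y2 >= tm) = f(t2) / S(tm) as in cases 2, 4-7 and 9."""
    p1 = exp_surv(t1, lam1) if censored1 else exp_pdf(t1, lam1)
    p2 = exp_surv(t2, lam2) if censored2 else exp_pdf(t2, lam2)
    if tm > 0:
        p2 /= exp_surv(tm, lam2)  # asset 2 spell already running at t-1
    if tt > 0:
        p1 /= exp_surv(tt, lam1)  # asset 1 spell already running at t-1
    return p1 * p2
```

Case 1 corresponds to `interval_prob(t1, t2, lam1, lam2, False, False)`, case 3 to `censored2=True`, case 4 additionally to `tm > 0`, and so on.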
The derivation of the functions for the log-likelihood computation under the four models, i.e., Gumbel's I, Gumbel's II, the Frank copula and the Gaussian copula, is given in Appendix A.1.
1.4 Results
1.4.1 Data
The high-frequency data used in this study are from the New York Stock Exchange Trade and Quote (NYSE TAQ) database, available through the NCSU library electronic databases. Although the title is NYSE TAQ, this database actually contains transaction information for all securities listed on four exchanges: the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX), the Nasdaq National Market System (NMS) and the Nasdaq SmallCap Market. Only the NYSE is responsible for the release of the TAQ to the public.
The trade database contains the time of each transaction, the transaction price and the trading volume. The quote database consists of time-stamped
bid and ask quotes, the volume, and additional information on the validity of the quotes. The NYSE uses a hybrid mechanism combining a market maker with an order book, so the quotes in the quote database can be quotes posted by the market maker, limit orders from market participants, or limit orders submitted by traders on the trading floor. The time stamp of a quote is not the exact time at which a trade occurred, so to match the two databases, particular rules have to be adopted to identify the quote for each transaction. Since at the current stage we are investigating trade durations, only data from the trade database are considered here.
Our data are from the NYSE TAQ consolidated trade database provided by Wharton Research Data Services. Here the term "consolidated" refers to the act of consolidating trade records from the four exchanges; the term is also used for the process of identifying and combining sub-transactions. Trading records on 09/02/2005, a randomly chosen date, are used for the stocks of twenty-four companies. TAQ reports trades which occurred during the Consolidated Tape hours of operation (8:00am to 6:30pm ET from August 2000, changed to 4:00am to 6:30pm ET after March 4, 2004). In our study, we use the regular trading hours from 9:30am to 4:00pm ET.
In the data we see occasions where multiple trades are recorded at the same time and with the same transaction price. These are trades from multiple buyers or sellers, or split transactions that occur when the volume of an order on one side of the market is larger than the available opposing orders, so that one order matches against several orders on the opposite side. A common strategy, adopted in our study, is to treat trades with zero durations between them as one single transaction.
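The consolidation rule can be sketched as follows; the record layout (timestamp, price, volume) is hypothetical and stands in for the TAQ trade fields.

```python
# Minimal sketch: consecutive records sharing the same timestamp are
# merged into a single transaction (volumes summed), so no zero
# durations remain.  Field layout (timestamp, price, volume) is hypothetical.

def consolidate(trades):
    """trades: time-sorted list of (timestamp_seconds, price, volume)."""
    merged = []
    for ts, price, vol in trades:
        if merged and merged[-1][0] == ts:
            last_ts, last_price, last_vol = merged[-1]
            merged[-1] = (last_ts, last_price, last_vol + vol)
        else:
            merged.append((ts, price, vol))
    return merged

def durations(trades):
    """Durations between consecutive (consolidated) trades."""
    times = [ts for ts, _, _ in trades]
    return [b - a for a, b in zip(times, times[1:])]

clean = consolidate([(1, 10.0, 100), (1, 10.0, 50), (3, 11.0, 20)])
```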
1.4.2 Estimation results
The descriptive statistics of trade durations for each company are listed in Table 1.1.
The stocks of GE, Microsoft, Exxon and Walmart are among the most actively traded, with mean durations under three seconds, while Leap Wireless is the least actively traded, with a mean duration greater than one minute for the date picked. A majority of the companies (21 out of 24) had a mean duration of less than 10 seconds. Based on the standard deviations and coefficients of variation shown in Table 1.1, the durations of the stocks of Coca Cola, Fisher Scientific and Leap Wireless are more dispersed than those of the other stocks studied here. Skewness measures the asymmetry of a random variable's distribution: a distribution with negative skewness is said to be left-skewed, and one with positive skewness right-skewed. All the data used here (as shown in Table 1.1) have positive skewness, which can easily be seen in the histograms in Figure 1.11 at the end of the article. In Figure 1.11, the horizontal axes represent durations in seconds and the vertical axes stand for frequencies. The minimum durations of Cisco and Microsoft are zero because trades occurred for these two stocks at the opening time of the market.
The parameters are estimated by maximizing the log-likelihood; the estimation is performed in Matlab. The results for the 14 pairs of assets are shown in Table 1.2 for the model with the Gumbel's I distribution and in Table 1.3 for the Gumbel's II distribution. In Table 1.2, the parameter estimates from the Gumbel's bivariate exponential distribution are shown in the sequence I_1, α_1, β_1, I_2, α_2, β_2, θ, with the standard deviations in parentheses below the corresponding parameters. The standard deviations of these estimates suggest that the estimates are significant.
Comparing the results in Tables 1.2 and 1.3, we see the same pattern for the estimates of α, β and I: the α's are close to zero, the β's are close to one, and α + β is close to but less than one. α measures the reaction of the expected duration to the last duration; β measures the persistence in the conditional expectation of trade durations. Since β is close to one, the persistence is high, i.e., a long duration is more likely to be followed by a long duration and a short duration by a short one. The effect of a one-period duration shock takes a long time to die out.
In Table 1.2, the θ's for all pairs but one (WMT&KR) are equal to 0.0001, which corresponds to a correlation of essentially zero, the upper bound of the correlation range admissible under Gumbel's I. A value of 0.0001 suggests that the true correlation between the durations is equal to, if not larger than, zero. In Gumbel's II, most of the estimated α's are positive, showing a positive correlation between the durations of the two assets, except for three pairs (PEP & CCE, F & GM and S & LEAP), which have negative correlations according to the values of α. For PEP & CCE, the α value of −0.2174 means a correlation of −0.054; for F & GM, the α value of −0.0358 suggests a correlation of −0.00895; and for S & LEAP, the α value of −0.2102 indicates a correlation of −0.052. The correlations are negative but not large in magnitude.
Here are some scenarios for positive and negative correlation between two durations. For example, Pepsi Cola and Coca Cola are both in the soft drink industry. When there is good news for Pepsi, market participants may be more interested in Pepsi and pay less attention to Coca Cola; we see more trades and shorter durations for Pepsi, and fewer trades and longer durations for Coca Cola, so the correlation between the two is negative. When there is good news for one stock and bad news for the other, or good news for the whole soft drink industry, market participants buy the stock with good news and dump the one with bad news, or purchase both stocks; the durations for both stocks are then shorter as they are traded more frequently, so the correlation between the two stocks is positive. As we can see, these two models are restrictive because they can only measure a limited range of correlations, and the estimates of the correlation parameters hit the upper bounds for both models.
The results for the 14 pairs of assets are reported in Table 1.4 for the Frank copula model and in Table 1.5 for the Gaussian copula model. The standard deviations are listed below their corresponding parameters. All the parameter estimates reported in the tables are significant. As with the estimates from the Gumbel's I and II bivariate exponential distribution models, the α's are very small and close to zero, the β's are close to one, and the sums of α and β are close to but less than 1. The values of the β's suggest that the Frank copula and Gaussian copula models are very persistent. As for the dependence parameter θ, comparing the Spearman's ρ and Kendall's τ computed for both copulas, we find that all the Spearman's ρ of the Gaussian copula model are larger than those of the Frank copula, and all the Kendall's τ of the Gaussian copula model are greater than those of the Frank copula except for three pairs of stocks: IBM & MSFT, CSCO & TXN and GE & BA.
1.4.3 Goodness-of-fit test
We are interested in evaluating these four different bivariate density models. Pearson's chi-squared test, the Anderson-Darling test and approaches based on the distance between the estimated parametric copula and the nonparametric empirical copula, among others, have been used in the literature as model selection criteria for copula models. Our case includes two Gumbel's bivariate models and two models constructed using copulas. Rivers and Vuong [2002] propose a model selection test for two competing nonlinear dynamic non-nested models. The test applies in very general situations, allowing incompletely specified models, a broad class of estimation methods, and model selection criteria other than the estimation objective. It applies to dynamic time series data without an i.i.d. assumption on the observations.
The Rivers & Vuong test statistic is:

T_n = (√n / σ̂_n) (Q_{1n}(ω, γ̂_n^1) − Q_{2n}(ω, γ̂_n^2))    (1.66)

where Q_{in}(ω, γ̂_n^i) is the mean negative log-likelihood −(1/n) log f_n^i(X_1, ..., X_n; γ^i), i = 1, 2,

σ̂_n² = R̂_n′ V̂_n R̂_n,    (1.67)

R̂_n = [−1, 0, 1, 0]′,    (1.68)

V̂_n = (w_{n0}/n) Σ_{t=1}^n Û_{nt} Û_{nt}′ + (1/n) Σ_{τ=1}^{m_n} w_{nτ} Σ_{t=τ+1}^n (Û_{nt} Û_{n,t−τ}′ + Û_{n,t−τ} Û_{nt}′),    (1.69)

Û_{nt} = (L_n^1(ω, γ̂_n^1) − (1/n) Σ_{t=1}^n log f^1(ω_t, γ̂_n^1),  ∂L_n^1(ω, γ̂_n^1)/∂γ_n^1,  L_n^2(ω, γ̂_n^2) − (1/n) Σ_{t=1}^n log f^2(ω_t, γ̂_n^2),  ∂L_n^2(ω, γ̂_n^2)/∂γ_n^2)′    (1.70)

where L_n^1 and L_n^2 are the log-likelihoods for models 1 and 2, respectively, and f^i(ω_t, γ̂_n^i) is the conditional probability of the observations for the i-th model.
The hypotheses are defined as:

H_0: lim_{n→∞} √n (Q̄_n^1(γ̄_n^1) − Q̄_n^2(γ̄_n^2)) = 0
H_1: lim_{n→∞} √n (Q̄_n^1(γ̄_n^1) − Q̄_n^2(γ̄_n^2)) = −∞
H_2: lim_{n→∞} √n (Q̄_n^1(γ̄_n^1) − Q̄_n^2(γ̄_n^2)) = +∞

Let α denote the desired size of the test and z_{α/2} the value of the inverse standard normal distribution function evaluated at 1 − α/2. If T_n < −z_{α/2}, we reject H_0 in favor of H_1; if T_n > z_{α/2}, we reject H_0 in favor of H_2; otherwise, we accept H_0.
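The decision rule can be sketched as a small helper; NormalDist from the Python standard library supplies the standard normal quantile z_{α/2}.

```python
from statistics import NormalDist

def rivers_vuong_decision(Tn, alpha=0.05):
    """Apply the decision rule above: reject H0 for H1 if Tn < -z_{alpha/2},
    for H2 if Tn > z_{alpha/2}, otherwise keep H0.  A positive Tn favours
    model 2 (lower mean negative log-likelihood)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if Tn < -z:
        return "H1: model 1 is better"
    if Tn > z:
        return "H2: model 2 is better"
    return "H0: models are equivalent"
```

For example, the statistic 5.4640 reported below for AIG & ALL falls well above z_{0.025} ≈ 1.96, so H_0 is rejected in favor of H_2.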
The test results are reported in Table 1.6. In this table, t_FG is the test statistic for the competing Frank copula and Gaussian copula models; F stands for the Frank copula model, G for the Gaussian copula model, 1 for the Gumbel's I model and 2 for the Gumbel's II model. A positive test statistic means the second model is better. For instance, t_FG for AIG & ALL is 5.4640, which suggests the Gaussian copula model is better than the Frank copula model. From the values of the test statistics we see that, in most of the tests, the null hypothesis that the two models are equal is rejected at the 5% significance level. In most cases, the Gaussian copula model is better than the Frank copula and Gumbel's I models, and the Gumbel's II model is better than the Gumbel's I and Frank copula models. The comparisons between the Gaussian copula and Gumbel's II models, and between the Frank copula and Gumbel's I models, are mixed. As noted, the Gumbel's II model is restrictive because it can only measure linear correlation in [−0.25, 0.25], yet it still turns out to be a good model; the reason may be that the actual correlation of trade durations falls within this range.
1.4.4 Diurnal effect
Some studies find that durations between trades, like other time series, display a cyclic pattern of changes at certain times of the day. This pattern is called the time-of-day, or diurnal, effect. Trade durations feature a strong diurnal effect, with durations smaller at the opening and closing of the market than around noon. Around noon, transactions become less frequent as traders go out for lunch. Some macroeconomic news or company news is released while markets are closed, so market participants rush to buy or sell at the start of the next trading day; near the close of the market, participants wish to close their positions rather than wait.
Seasonality plots for the trade durations of stocks IBM, MSFT, PEP, GE, BA, CSCO and CCE are included in Figure 1.10. A cubic spline function is used to smooth the data on each interval. Each node is computed as the average of all trade durations over the next 2,340 seconds, and the last node is the average of the last 100 durations; since trading time in our data runs from 9:30am to 4:00pm, a total of 23,400 seconds, we have 11 nodes. The average of the trade durations over each interval can be assigned to either the left or the right endpoint of the interval; here we choose the left endpoint. From these plots we see a clear inverted-U-shaped pattern: durations at the start and end of the day are much smaller, and durations increase during the middle of the day.
Tables 1.7, 1.8, 1.9 and 1.10 report the estimated parameters and their standard deviations for the Gumbel's I, Gumbel's II, Frank copula and Gaussian copula models using deseasonalized data for five pairs of stocks. The deseasonalized data are obtained by dividing the raw data by the piecewise spline function. The patterns of α and β are quite similar to those from the seasonal data, and the sums of each α and β pair are close to but less than 1. All the estimated parameters are significant. However, the β's are smaller than those estimated from the seasonal data, which means the model persistence is lower once seasonality is removed.
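A simplified sketch of the deseasonalization step, using piecewise-constant bin means in place of the fitted cubic spline; the bin width follows the 2,340-second node spacing described above, and the sample inputs are hypothetical.

```python
# Simplified diurnal adjustment: each raw duration is divided by the
# average duration of its 2,340-second interval of the 9:30-16:00
# session (23,400 seconds in total, i.e. the spacing between the 11
# nodes).  A spline fit through the node averages is omitted here.

BIN = 2340  # seconds per interval between nodes

def diurnal_factors(times, durs):
    """Mean duration per time-of-day bin; times are seconds after 9:30."""
    sums, counts = {}, {}
    for t, d in zip(times, durs):
        b = int(t // BIN)
        sums[b] = sums.get(b, 0.0) + d
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}

def deseasonalize(times, durs):
    """Raw duration divided by its bin's mean duration."""
    f = diurnal_factors(times, durs)
    return [d / f[int(t // BIN)] for t, d in zip(times, durs)]
```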
1.5 Conclusion
We propose a bivariate ACD model for the durations between trades of two assets. With this model, the conditional covariance between the trade durations of two assets can be derived. Our results demonstrate that there is correlation between the durations of trades of two assets. The Gaussian copula and Gumbel's II models agree with each other on the correlation estimate most of the time. Among the four bivariate exponential distributions used, the Rivers and Vuong test results indicate that the Gaussian copula and Gumbel's II models are better than the Frank copula and Gumbel's I models; however, the comparison between the Gaussian copula and Gumbel's II models is mixed. Removal of seasonality from the duration data lowers the estimated persistence of the models.
Table 1.1: Descriptive statistics of trade durations (in seconds)
Companies obs Mean Std CV Skewness Min Max
AIG 6452 3.63 4.29 1.18 3.56 1 66
ALL 5079 4.61 5.57 1.21 3.11 1 79
BA 7326 3.19 3.56 1.12 3.34 1 49
BAC 6652 3.52 3.96 1.13 5.4 1 108
C 7652 3.06 3.61 1.18 12.9 1 161
CCE 1887 12.39 19.19 1.55 4.49 1 300
COP 7687 3.04 3.71 1.22 13.05 1 164
CSCO 8911 2.63 2.61 0.99 3.01 0 30
F 3717 6.29 7.36 1.17 3.4 1 106
GE 9089 2.57 2.33 0.91 2.73 1 23
GM 4786 4.89 6.55 1.34 3.91 1 83
GS 3174 7.37 9.73 1.32 3.31 1 129
GSK 1446 16.18 19.79 1.22 2.5 1 152
IBM 5531 4.23 4.88 1.15 3.58 1 71
KR 3734 6.27 10.13 1.62 14.33 1 373
LEAP 264 88.54 140.88 1.59 2.46 1 934
MSFT 10181 2.3 2.28 0.99 3.3 0 30
PEP 3499 6.69 8.56 1.28 2.97 1 87
PFE 7319 3.2 3.09 0.97 2.57 1 30
S 4293 5.45 6.42 1.18 3.52 1 101
TGT 4806 4.87 6.35 1.30 5.26 1 151
TXN 5607 3.95 4.88 1.24 3.51 1 54
WMT 8212 2.85 2.85 1.00 4.63 1 77
XOM 9063 2.58 2.97 1.15 17.31 1 156
Table 1.2: Estimation results for the Gumbel’s I.
Companies I1 α1 β1 I2 α2 β2 θ
IBM &MSFT 0.0979 0.0468 0.9301 0.0853 0.0736 0.8897 0.0001
(0.0134 ) (0.0036 ) (0.0025 ) (0.0089 ) (0.0044 ) (0.0038 ) (0.0068 )
PEP &CCE 0.0123 0.0181 0.9800 0.0364 0.0189 0.9780 0.0001
(0.0075 ) (0.0014 ) (0.0004 ) (0.0190 ) (0.0018 ) (0.0006 ) (0.0083 )
F &GM 0.2602 0.0721 0.8870 0.0142 0.0267 0.9705 0.0001
(0.0361 ) (0.0064 ) (0.0064 ) (0.0050 ) (0.0014 ) (0.0007 ) (0.0085 )
AIG &ALL 0.0289 0.0387 0.9535 0.2092 0.0500 0.9046 0.0001
(0.0061 ) (0.0022 ) (0.0012 ) (0.0226 ) (0.0048 ) (0.0042 ) (0.0071 )
S&LEAP 0.0399 0.0237 0.9689 7.0293 0.2436 0.6982 0.0001
(0.0103 ) (0.0021 ) (0.0008 ) (2.1282 ) (0.0635 ) (0.0457 ) (0.0078 )
WMT&TGT 0.0178 0.0249 0.9688 0.0670 0.0345 0.9516 0.0001
(0.0040 ) (0.0016 ) (0.0006 ) (0.0112 ) (0.0026 ) (0.0015 ) (0.0057 )
CSCO&TXN 0.0221 0.0412 0.9506 0.1571 0.0573 0.9031 0.0001
(0.0045 ) (0.0022 ) (0.0011 ) (0.0087 ) (0.0030 ) (0.0040 ) (0.0072 )
COP&XOM 0.0531 0.0362 0.9461 0.0843 0.0674 0.9000 0.0001
(0.0071 ) (0.0026 ) (0.0014 ) (0.0097 ) (0.0042 ) (0.0035 ) (0.0056 )
GSK &PFE 0.4220 0.0293 0.9444 0.0789 0.0435 0.9319 0.0001
(0.0935 ) (0.0059 ) (0.0034 ) (0.0100 ) (0.0034 ) (0.0021 ) (0.0083 )
BAC &C 0.0813 0.0425 0.9342 0.0068 0.0137 0.9840 0.0001
(0.0103 ) (0.0032 ) (0.0021 ) (0.0025 ) (0.0009 ) (0.0002 ) (0.0053 )
BAC &GS 0.0812 0.0424 0.9343 0.1758 0.0466 0.9297 0.0001
(0.0103 ) (0.0033 ) (0.0021 ) (0.0285 ) (0.0044 ) (0.0034 ) (0.0076 )
GS &C 0.0068 0.0137 0.9840 0.1757 0.0465 0.9297 0.0001
(0.0063 ) (0.0010 ) (0.0003 ) (0.0230 ) (0.0089 ) (0.0012 ) (0.0041 )
WMT&KR 0.0174 0.0246 0.9699 0.0100 0.0142 0.9840 0.0008
(0.0041 ) (0.0016 ) (0.0005 ) (0.0050 ) (0.0009 ) (0.0003 ) (0.0024 )
GE &BA 0.0172 0.0251 0.9682 0.0054 0.0225 0.9758 0.0001
Table 1.3: Estimation results for the Gumbel’s II.
Companies I1 α1 β1 I2 α2 β2 α
IBM&MSFT 0.1081 0.0424 0.9127 0.0583 0.0455 0.9098 1.0000
(0.0111) (0.0027) (0.0030) (0.0050) (0.0021) (0.0023) (0.0708)
PEP&CCE 0.0102 0.0187 0.9807 0.0374 0.0198 0.9779 -0.2174
(0.0077) (0.0014) (0.0004) (0.0202) (0.0019) (0.0007) (0.0448)
F&GM 0.2630 0.0731 0.8864 0.0141 0.0267 0.9706 -0.0358
(0.0367) (0.0064) (0.0064) (0.0050) (0.0014) (0.0007) (0.0468)
AIG&ALL 0.0242 0.0342 0.9566 0.1760 0.0442 0.9126 0.2815
(0.0052) (0.0018) (0.0010) (0.0189) (0.0041) (0.0036) (0.0490)
S&LEAP 0.0493 0.0270 0.9661 7.3330 0.2397 0.7028 -0.2102
(0.0122) (0.0025) (0.0009) (2.2028) (0.0634) (0.0461) (0.0697)
WMT&TGT 0.0160 0.0185 0.9693 0.0601 0.0268 0.9496 1.0000
(0.0027) (0.0011) (0.0005) (0.0078) (0.0020) (0.0014) (0.0917)
CSCO&TXN 0.0350 0.0306 0.9449 0.1560 0.0440 0.8938 1.0000
(0.0034) (0.0005) (0.0009) (0.0077) (0.0005) (0.0038) (0.1444)
COP&XOM 0.0591 0.0284 0.9379 0.0620 0.0446 0.9120 1.0000
(0.0054) (0.0018) (0.0015) (0.0057) (0.0023) (0.0024) (0.0518)
GSK&PFE 0.3790 0.0199 0.9415 0.0733 0.0347 0.9247 0.9860
(0.0622) (0.0038) (0.0031) (0.0071) (0.0023) (0.0021) (0.0381)
BAC&C 0.0342 0.0221 0.9595 0.0204 0.0183 0.9631 1.0000
(0.0047) (0.0013) (0.0008) (0.0026) (0.0009) (0.0004) (0.0816)
BAC&GS 0.0367 0.0257 0.9588 0.1323 0.0380 0.9375 0.5263
(0.0056) (0.0018) (0.0009) (0.0214) (0.0033) (0.0026) (0.0531)
GS&C 0.0122 0.0130 0.9787 0.0990 0.0320 0.9440 0.8879
(0.0049) (0.0008) (0.0005) (0.0142) (0.0048) (0.0009) (0.0169)
WMT&KR 0.0162 0.0173 0.9705 0.0101 0.0127 0.9813 0.9616
(0.0026) (0.0010) (0.0005) (0.0038) (0.0008) (0.0003) (0.0661)
GE&BA 0.0133 0.0176 0.9710 0.0067 0.0183 0.9739 1.0000