This appendix reproduces algorithm for VPIN computation based on the Tick-Rule trade classification (TB in our classification) as implemented in ELO (2011c) and used in ELO (2011a, 2011b).
1. ALGORITHM FOR COMPUTING THE-VPIN METRIC
In this appendix, we describe the procedure to calculate Volume-Synchronized Probability of Informed Trading, a measure we called the VPIN informed trading metric. Similar results can be reached with more efficient algorithms, such as re-using data from previous iterations, performing fewer steps or in a different order, etc. But we believe the algorithm described below is illustrative of the general idea. One feature of this algorithm to note is that we classify all trades within each one minute time bar as either buys or sells using a tick test. We do not have data which directly identifies trades as buyer- initiated or seller-initiated so some classification procedure is necessary. One could classify each trade separately or one could classify trades in groups of an alternative size (based on either time or volume). Different schemes will lead to different levels of VPIN. We have used a variety of schemes and, as one would expect, cutting the data more finely leads to reduced levels of VPIN– measured trade becomes more balanced when groups of trades in say a one-minute time bar may be classified differently. However, our focus is on how rare a particular VPIN is relative to the distribution of VPINs derived from any classification scheme, and this is unaffected by the classification schemes we have examined (including trade-by-trade as well as groups based on one-tenth of a bucket). We focus on one minute time bars as this data is less noisy, more widely available and easier to work with.
1.1. INPUTS
1. Time series of transactions of a particular instrument (Ti, Pi, Vi) :
a. Ti: Time of the trade.
b. Pi: Price at which securities were exchanged.
c. Vi: Volume exchanged.
2. V : Volume size (determined by the user of the formula). 3. n: Sample of volume buckets used in the estimation.
Pi, Vi, V , n are all integer values. Tiis any time translation, in integer or double format, sequentially
increasing as chronological time passes.
1.2. PREPARE EQUAL VOLUME BUCKETS 1. Sort transactions by time ascending: Ti+1 ≥ Ti, ∀i.
2. Expand the number of observations by repeating each observation Pi as many times as Vi. This
generates a total of I =P
iVi observations Pi.
3. Re-index Pi observations, i = 1, . . . , I.
4. Initiate counter: τ = 0. 5. Add one unit to τ .
6. If I < τ V , jump to step 10 (there are insufficient observations).
a. A transaction i is a buy if either: i. Pi> Pi−1, or
ii. Pi= Pi−1and the transaction i − 1 was also a buy.
b. Otherwise, the transaction is a sell. 8. Assign to variable VB
τ the number of observations classified as buy in step 7, and the variable VτS
the number of observations classified as sell. Note that V = VB τ + VτS.
9. Loop to step 6.
10. Set L = τ − 1 (last bucket is always incomplete or empty, thus it will not be used). 1.3. APPLY VPINs FORMULA
If L ≥ n, there is enough information to compute V P INL= PL τ =L−n+1|VτS−VτS| PL τ =L−n+1(VτS+VτS) = PL τ =L−n+1|VτS−VτS| nV .
B
TC-VPIN Algorithm
This appendix reproduces algorithm for VPIN computation based on the Bulk-Volume Clas- sification (BVC) (TC in our classification) as implemented in ELO (2012c, on-line Appendix) and ELO (2012b).
1. ALGORITHM FOR COMPUTING THE-VPIN METRIC
In this appendix, we describe the procedure to calculate Volume-Synchronized Probability of Informed Trading, a measure we called the VPIN flow toxicity metric. Similar results can be reached with more efficient algorithms, such as re-using data from previous iterations, performing fewer steps or in a different order, etc. But we believe the algorithm described below is illustrative of the general idea. One feature of this algorithm is that we apply a probabilistic approach to classify the volume exchanged within each 1-minute time bars. We cannot expect users to have data which unequivocally identifies trades as buyer-initiated or seller-initiated so some classification procedure is necessary. One could classify each trade separately or one could classify trades in aggregates of an alternative size (based on either time, number of trades or volume bars). Different schemes will lead to different levels of VPIN. We have used a variety of schemes and conclude that data aggregation leads to better flow toxicity estimates than working on raw transaction data.28 Data granularity has an impact on the magnitude
of VPIN levels, however our focus is on how rare a particular VPIN is relative to the distribution of VPINs derived from any classification scheme, and this is unaffected by the classification schemes we have examined (including trade-by-trade as well as groups based on one-tenth of a bucket). We focus on 1-minute time bars as this data is less noisy, more widely available and easier to work with. 1.1. INPUTS
1. Time series of transactions of a particular instrument (Ti, Pi, Vi) :
a. Ti: Time of the trade.
b. Pi: Price at which securities were exchanged.
c. Vi: Volume exchanged.
2. V : Volume size (determined by the user of the formula). 3. n: Sample of volume buckets used in the estimation. 28
Pi, Vi, V , n are all integer values. Ti is any time translation, in integer or double format, sequentially
increasing as chronological time passes.
1.2. PREPARE EQUAL VOLUME BUCKETS 1. Sort transactions by time ascending: Ti+1 ≥ Ti, ∀i.
2. Compute ∆Pi, ∀i.
3. Expand the number of observations by repeating each observation ∆Pias many times as Vi. This
generates a total of I =P
iVi observations ∆Pi.
4. Re-index ∆Pi observations, i = 1, . . . , I.
5. Initiate counter: τ = 0. 6. Add one unit to τ .
7. If I < τ V , jump to step 11 (there are insufficient observations). 8. ∀i ∈ [(τ − 1)V + 1, τ V ] , split volume between buy or sell initiated:
a. VB τ = Pτ V i=(τ −1)V +1Z ∆P i σ∆P b. VS τ = Pτ V i=(τ −1)V +1 h 1 − Z∆Pi σ∆P i = V − VB τ 9. Assign to variable VB
τ the number of observations classified as buy in step 8, and the variable VτS
the number of observations classified as sell. Note that V = VτB+ VτS. 10. Loop to step 6.
11. Set L = τ − 1 (last bucket is always incomplete or empty, thus it will not be used). 1.3. APPLY VPINs FORMULA
If L ≥ n, there is enough information to compute V P INL= PL τ =L−n+1|V S τ−VτS| PL τ =L−n+1(VτS+VτS) = PL τ =L−n+1|V S τ −VτS| nV .