5. METHODOLOGY
5.1. Analysis of methods
When examining the relationship between short selling and stock returns, the most commonly used models are forms of multiple regression. Daske et al. (2005) examine the relationship between short selling and news events to determine whether short sellers can anticipate these events and, consequently, predict returns; they use a multiple regression model. Aitken et al. (1998) analyze the relationship between short selling and market reactions to assess whether short selling can predict stock returns, also using a multiple regression model. Desai et al. (2002) use a multiple regression model to examine the relationship between short interest levels and stock returns, and Christophe et al. (2004) use multiple regression to explore the relationship between short selling and earnings announcements.
The prevalence of regression models in previous studies is not surprising. Regression is a very common tool for analyzing relationships among variables: it models the relationship between a dependent variable and one or more independent variables. A regression can estimate both the direction of a relationship and its magnitude, that is, how much the dependent variable changes when an independent variable changes. There are different forms of regression analysis, for example logistic regression, linear regression and non-linear regression. Logistic regression is used when the dependent variable is non-metric. This could be a dichotomous variable with only two possible outcomes, such as whether or not an event occurs. Logistic regression is estimated by maximum likelihood and models the probability that an observation falls into one of the two possible outcomes. Linear regression is used when the dependent variable is metric. When the model contains one independent variable it is a simple linear regression, and when it contains multiple independent variables it is a multiple linear regression. Linear regression is one of the oldest and most commonly used regression models. The most common way to estimate a linear regression is the Ordinary Least Squares (OLS) method, which chooses the coefficients that minimize the sum of squared differences between the observed and predicted values of the dependent variable. A non-linear regression is used when the relationship cannot be captured by a model that is linear in its parameters and must instead be fitted with a non-linear model. Non-linear models are typically estimated with iterative numerical methods, and the fitting criterion may have several local optima, which can make it difficult to find the best solution.

Looking at previous studies, it is evident that OLS is one of the most frequently used methods of analysis. OLS has the advantage that it is fairly simple to use. However, several assumptions have to be met for it to produce reliable results. Firstly, the observations must be independent of each other. In particular, the error terms must not be correlated across observations: the error term of one observation should not help predict the error term of the next, a problem that often occurs with time series data. Secondly, there needs to be a linear relationship between the dependent variable and each of the independent variables. Thirdly, the error terms need to be homoscedastic, meaning that they should have equal variance along the regression line. Fourthly, the data must not show multicollinearity, meaning that the independent variables must not be strongly correlated with each other. Finally, the error terms must be approximately normally distributed.
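To make the OLS estimator and its assumptions more concrete, the following is a minimal Python/NumPy sketch on simulated data. The variable names (`short_interest`, `size`) and the simulated coefficients are hypothetical and chosen only for illustration. The diagnostics shown (Durbin-Watson for autocorrelation, the Breusch-Pagan LM statistic for heteroskedasticity, variance inflation factors for multicollinearity and the Jarque-Bera statistic for normality) are common choices, not necessarily the tests applied in this study; linearity is not tested here.

```python
import numpy as np

def ols(y, X):
    """Fit y = b0 + X b by OLS; return the coefficients and residuals."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    # lstsq chooses b to minimize the sum of squared residuals ||y - X1 b||^2
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef, y - X1 @ coef

def r_squared(y, resid):
    # Residuals have mean zero because the model includes an intercept
    return 1 - resid.var() / y.var()

def durbin_watson(resid):
    """Independence: values near 2 suggest no first-order autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def breusch_pagan(resid, X):
    """Homoscedasticity: LM = n * R^2 of squared residuals on X, ~ chi2(k) under H0."""
    u2 = resid ** 2
    _, r = ols(u2, X)
    return len(u2) * r_squared(u2, r)

def vif(X):
    """Multicollinearity: VIF_j = 1 / (1 - R_j^2); values well above 1 flag collinearity."""
    out = []
    for j in range(X.shape[1]):
        _, r = ols(X[:, j], np.delete(X, j, axis=1))
        out.append(1 / (1 - r_squared(X[:, j], r)))
    return np.array(out)

def jarque_bera(resid):
    """Normality: JB = n/6 * (S^2 + (K - 3)^2 / 4), ~ chi2(2) under H0."""
    e = resid - resid.mean()
    m2 = np.mean(e ** 2)
    s = np.mean(e ** 3) / m2 ** 1.5
    k = np.mean(e ** 4) / m2 ** 2
    return len(e) / 6 * (s ** 2 + (k - 3) ** 2 / 4)

# Hypothetical simulated data: a return regressed on two made-up regressors
rng = np.random.default_rng(0)
n = 500
short_interest = rng.normal(size=n)
size = rng.normal(size=n)
X = np.column_stack([short_interest, size])
y = 0.5 - 0.3 * short_interest + 0.2 * size + rng.normal(scale=0.5, size=n)

coef, resid = ols(y, X)
print("coefficients:", coef.round(3))
print("Durbin-Watson:", round(durbin_watson(resid), 2))
print("Breusch-Pagan LM:", round(breusch_pagan(resid, X), 2))  # 5% critical value, chi2(2): 5.99
print("VIF:", vif(X).round(2))
print("Jarque-Bera:", round(jarque_bera(resid), 2))  # 5% critical value, chi2(2): 5.99
```

Because the simulated errors are independent, homoscedastic and normal and the regressors are uncorrelated, the estimated coefficients should lie close to 0.5, -0.3 and 0.2, the Durbin-Watson statistic near 2 and the VIFs near 1. With real short selling data, any of these checks may fail and call for a different specification.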
When examining the relationship between short selling and stock volatility, the most common research methods are various forms of regression, such as time series and panel regressions. Multiple regression models are used by Diether et al. (2009), Christophe et al. (2010), and Saffi and Sigurdsson (2011).
However, some conditional volatility models are also popular. Conditional volatility models take into account that volatility is not constant over time. They assume that volatility is conditional on additional information, such as the size of past shocks, which in theory allows them to measure volatility more accurately. Two popular conditional volatility models are the Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models. Henry and McKenzie (2006) use an ARCH model to examine the relationship between trading volume and volatility in the Chinese stock market. An ARCH model is a statistical model for time series data that focuses on the variance of the error terms and assumes that the error terms are heteroskedastic. Specifically, the variance of the error term in each period is assumed to depend on the squared error terms of previous periods; this dependence on the model's own past is the autoregressive (AR) part of the name. The variance is called conditional because it depends on the information available from those previous periods. A GARCH model is used by Baklaci et al. (2016) to examine the causality between short selling and volatility. A GARCH model is very similar to an ARCH model. However, instead of an autoregressive (AR) structure for the variance of the error terms, the GARCH model uses an autoregressive moving average (ARMA) structure: the conditional variance depends not only on past squared error terms but also on its own past values, the lagged conditional variances. In other words, an ARCH model explains the current error variance only through past shocks, whereas a GARCH model explains it through both past shocks and past variances, which typically allows persistent volatility to be captured with fewer parameters.
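The difference between the ARCH and GARCH variance equations can be illustrated with a short Python/NumPy sketch of a GARCH(1,1) model, the simplest member of the GARCH family. Here the conditional variance follows sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1], and setting beta = 0 reduces it to an ARCH(1) model. The parameter values are hypothetical, the standardized errors are assumed to be Gaussian, and estimation (usually by maximum likelihood) is not shown.

```python
import numpy as np

def garch_variance(eps, omega, alpha, beta):
    """GARCH(1,1) conditional variance recursion for a given series of errors.
    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]
    With beta = 0 this is an ARCH(1) model: variance depends only on past shocks."""
    sigma2 = np.empty(len(eps))
    sigma2[0] = omega / (1 - alpha - beta)  # start at the unconditional variance
    for t in range(1, len(eps)):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def simulate_garch(n, omega, alpha, beta, seed=0):
    """Simulate error terms eps[t] = sigma[t] * z[t] with z[t] ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1 - alpha - beta)
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        # Past shock (alpha term) plus past variance (beta term)
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

omega, alpha, beta = 0.05, 0.10, 0.85  # hypothetical values with alpha + beta < 1
eps, sigma2 = simulate_garch(20_000, omega, alpha, beta)
print("sample variance:", eps.var().round(3), "theory:", omega / (1 - alpha - beta))

# Volatility clustering: large squared errors tend to follow large squared errors
sq = eps ** 2
lag1_corr = np.corrcoef(sq[:-1], sq[1:])[0, 1]
print("lag-1 autocorrelation of squared errors:", round(lag1_corr, 3))
```

With these hypothetical parameters the sample variance of the simulated errors should be close to omega / (1 - alpha - beta) = 1, and the squared errors should show positive autocorrelation, which is the volatility clustering that conditional volatility models are designed to capture.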