
4.5 Empirical framework

4.6.1 Autoregressive model

For the first test, the sentiment indicators are based on all market reports that explicitly discuss the U.K. commercial real estate market. Of the collected articles, only those belonging to the Capital Markets, CRE, Investment or Office categories for England, Scotland, Wales and Northern Ireland are considered. This reduces the number of reports significantly.

A total of four textual sentiment indicators (AFINN, BING, NRC and TM) have been constructed. In each case, the indicator is the overall average of the document-level sentiment scores; that is, each indicator is based on the mean value of the positive and negative words per document.
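The construction described above can be sketched as follows. This is a minimal illustration and not the procedure used in the study: the toy lexicon, the tokenisation rule, the function names and the grouping of reports into one period are all assumptions made for the example, and the real AFINN, BING, NRC and TM word lists are far larger and scored differently.

```python
import re
from statistics import mean

# Hypothetical toy lexicon (assumption): +1 for positive words, -1 for negative words.
TOY_LEXICON = {
    "growth": 1, "strong": 1, "recovery": 1,
    "decline": -1, "weak": -1, "risk": -1,
}

def document_sentiment(text, lexicon=TOY_LEXICON):
    """Mean polarity of the lexicon words found in one document (0.0 if none)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return mean(scores) if scores else 0.0

def period_indicator(documents, lexicon=TOY_LEXICON):
    """Average of the per-document scores over all reports of one period."""
    return mean(document_sentiment(d, lexicon) for d in documents)

# Two made-up report snippets assigned to the same (hypothetical) period.
reports_q1 = [
    "Strong rental growth supports the recovery.",
    "Weak demand and rising risk point to a decline.",
]
q1_value = period_indicator(reports_q1)
```

Averaging within each document first, and only then across documents, keeps a single long report from dominating the indicator for its period.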

For the first analysis, I use the IPD Total Return Index for all properties as the dependent variable. Table 4:6 reports the results for the four textual sentiment indicators in the AR(1) model. The base model, which relies only on the lagged dependent variable, reaches an R-squared value of 0.586. Its only independent variable is highly significant at the 1% level, while the constant is insignificant. The base model uses a total of 43 observations. Standard diagnostic tests indicated heteroscedasticity in the base model; the reported standard errors are therefore heteroscedasticity-robust.
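A base model of this kind can be sketched in a few lines. The example assumes the specification y_t = c + phi * y_(t-1) + e_t, where y is the first difference of the log index (as the table labels indicate), and it uses White (HC0) standard errors as one common heteroscedasticity-robust choice. The source does not state which robust estimator was used, and the index values below are invented for illustration.

```python
import math

def log_diff(index):
    """First differences of the log of an index series."""
    return [math.log(b) - math.log(a) for a, b in zip(index, index[1:])]

def fit_ar1(y):
    """OLS fit of y_t = c + phi * y_{t-1} + e_t with a White (HC0) slope error."""
    x, z = y[:-1], y[1:]  # lagged regressor and response
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    phi = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    c = z_bar - phi * x_bar
    resid = [zi - c - phi * xi for xi, zi in zip(x, z)]
    rss = sum(e ** 2 for e in resid)
    tss = sum((zi - z_bar) ** 2 for zi in z)
    # HC0 variance of the slope: sum((x_i - x_bar)^2 * e_i^2) / sxx^2
    se_phi = math.sqrt(sum(((xi - x_bar) * e) ** 2 for xi, e in zip(x, resid))) / sxx
    return {"const": c, "phi": phi, "se_phi": se_phi, "r2": 1 - rss / tss, "n": n}

# Invented index levels (assumption), used only to show the call sequence.
index = [100.0, 102.0, 105.1, 104.0, 101.5, 103.2, 106.8, 109.0]
returns = log_diff(index)
result = fit_ar1(returns)
```

Robust errors leave the coefficient estimates unchanged and only adjust their standard errors, which is why the base-model coefficient can be compared directly across specifications.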

Looking at the four textual sentiment indicators, only the TM indicator produces a significant coefficient, at the 1% level. Unexpectedly, the sign is negative, meaning that an increase in sentiment has a negative influence on the total return. Unlike the base model, all four textual sentiment models show highly significant independent variables. For the TM model, the R-squared value is 0.796, a substantial improvement upon the base model. Although the remaining models fail to produce significant sentiment coefficients, they also show notably higher R-squared values, ranging between 0.689 (BING) and 0.712 (NRC). All textual sentiment indicators enter the autoregressive model with three lags; this lag length was selected by minimising the AIC.
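Lag selection by AIC can be illustrated with a small sketch. It assumes that the sentiment indicator enters as a single term lagged by k periods, y_t = c + phi * y_(t-1) + beta * s_(t-k) + e_t, and that all candidate lags are compared on the same sample. The source does not spell out either point, so both are assumptions, as are the function names and the simulated data.

```python
import math
import random

def ols_rss(rows, targets):
    """Residual sum of squares from OLS, solving the normal equations directly."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]  # X'X
    b = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]        # X'y
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            f = a[row][col] / a[col][col]
            for j in range(col, k):
                a[row][j] -= f * a[col][j]
            b[row] -= f * b[col]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (b[i] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return sum((t - sum(bi * xi for bi, xi in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def aic_for_lag(y, s, lag, max_lag):
    """AIC of y_t = c + phi*y_{t-1} + beta*s_{t-lag} + e_t on a common sample."""
    start = max(1, max_lag)  # same observations for every candidate lag
    rows = [[1.0, y[t - 1], s[t - lag]] for t in range(start, len(y))]
    targets = [y[t] for t in range(start, len(y))]
    n = len(targets)
    return n * math.log(ols_rss(rows, targets) / n) + 2 * 3  # three parameters

def select_lag(y, s, max_lag=4):
    """Candidate lag (0..max_lag) with the smallest AIC."""
    return min(range(max_lag + 1), key=lambda k: aic_for_lag(y, s, k, max_lag))

# Simulated data (assumption): the sentiment series affects y with a two-period delay.
random.seed(0)
s = [random.gauss(0, 1) for _ in range(60)]
y = [0.0, 0.0]
for t in range(2, 60):
    y.append(0.5 * y[t - 1] - 0.8 * s[t - 2] + 0.05 * random.gauss(0, 1))
best_lag = select_lag(y, s)
```

Because every candidate here has the same number of parameters, the AIC ranking coincides with the ranking by residual sum of squares; the penalty term only matters when specifications with different numbers of lagged terms are compared.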

For comparison, I have also added the previously constructed sentiment indicators. Only the macroeconomic sentiment measure produces a significant sentiment coefficient, at the 10% level. Again, the coefficient has a negative sign, which is unexpected at this stage. The constant remains insignificant in all three models. These indicators enter the model with different lags. In terms of R-squared, both the macroeconomic (0.637) and the Google Trends measure (0.598) show only a marginal improvement on the base model.

The second analysis tests the indicators constructed from all office market reports. As described before, the number of reports drops to 619.

Table 4:7 reports the results of the autoregressive model. The dependent variable is now the IPD total return index for office properties. The overall results improve on the previous analysis. The coefficient of the independent variable in the base model is highly significant at the 1% level. The constant, however, remains insignificant. The R-squared value is now 0.636.

Looking at the textual sentiment indicators, the results for the four coefficients have improved. The coefficients of the AFINN and the BING models are highly significant at the 1% level. The TM model is significant at the 5% level. Only the latter model has all components significant. Comparing the R-squared values, the TM model once more produces the highest value at 0.833. Both the AFINN and the BING models have an R-squared value of 0.721. As in the results presented above, all significant coefficients have a negative sign, which is somewhat surprising.

Again, the previously constructed sentiment indicators have been added. Unlike the textual sentiment indicators, they show no improvement upon the first analysis. Only the macroeconomic indicator is significant, at the 5% level. Its model reaches an R-squared value of 0.675, which is only a marginal improvement when compared with the textual sentiment indicators.

A last point worth mentioning is that all indicators enter the model with at least one lag. This seems reasonable, since the market reports describe the past. Moreover, most of them are not published immediately but more than a quarter after the market development they describe.

Table 4:6 - Result for the AR (1) model: overall commercial document corpus

VARIABLES | Labels | Base Model | Macroeconomic Sentiment | Office Sentiment | Google Trends | AFINN | BING | NRC | TM
dlipdtrall = L, | IPD total return all properties (first differences of the log) | 0.761*** | 0.625*** | 0.743*** | 0.716*** | 0.607*** | 0.614*** | 0.610*** | 0.542***

Robust standard errors in brackets *** p<0.01, ** p<0.05, * p<0.1

Note 4.11: The table shows the result of the overall commercial real estate corpus for the U.K. The dependent variable is the IPD total return index for all properties. The textual sentiment indicators use 897 market reports including the following categories: capital markets, CRE, investment and office.

Table 4:7 - Result for the AR (1) model: all office related market reports

VARIABLES | Labels | Base Model | Macroeconomic Sentiment | Office Sentiment | Google Trends | AFINN | BING | NRC | TM
dlipdtroff = L, | IPD total return all offices (first differences of the log) | 0.795*** | 0.728*** | 0.746*** | 0.756*** | 0.765*** | 0.764*** | 0.766*** | 0.622***

Robust standard errors in brackets *** p<0.01, ** p<0.05, * p<0.1

Note 4.12: The table shows the result for the office corpus for the U.K. The dependent variable is the IPD total return index for all offices. The textual sentiment indicators use 619 market reports.

The last autoregressive model uses the IPD total return index for all offices in the City of London. The results once more improve slightly upon the first two models: the R-squared value of the base model rises to 0.64, although its constant remains insignificant. The independent variable remains highly significant.

Looking at the textual sentiment indicators, the AFINN, the BING and the TM models again have significant sentiment coefficients. This time, however, no model produces a significant constant. The AFINN and the BING models, with their highly significant sentiment coefficients, outperform the TM and the remaining models. The AFINN model reaches an R-squared value of 0.744, followed by the BING model (0.742). The contribution of the TM model is somewhat smaller this time, with an R-squared value of only 0.713. Despite the inadequate model specification, the NRC model also outperforms the base model. This time the AFINN and the BING models show the expected sign, while the sentiment coefficients of the remaining models are still negative.

Comparing the indirect sentiment measures with the textual sentiment measures, two of the three indirect models are significant this time. The macroeconomic sentiment model has a highly significant coefficient at the 1% level and reaches an R-squared value of 0.714. The second significant model (5% level) is the Google Trends model, with an R-squared value of 0.662.

Whereas previously all sentiment-induced models included the indicator with at least one lag, this time both the AFINN and the BING models attain the smallest AIC value with no lag.

Table 4:8 - Result for the AR (1) model: all office related market reports for London

VARIABLES | Labels | Base Model | Macroeconomic Sentiment | Office Sentiment | Google Trends | AFINN | BING | NRC | TM
dltret_office_city = L, | IPD total return all offices in the City of London (first differences of the log) | 0.799*** | 0.710*** | 0.756*** | 0.787*** | 0.558*** | 0.655*** | 0.764*** | 0.748***

Robust standard errors in brackets *** p<0.01, ** p<0.05, * p<0.1

Note 4.13: The table shows the result for the office corpus for London. The dependent variable is the IPD total return index for all offices in the City of London. The textual sentiment indicators are based on 150 market reports.

To conclude, the analysis of the three sub-corpora has shown that focusing on a more precise topic within the documents helps to improve the statistical results. All sentiment-induced models outperform the base model. While the TM model achieves the best results for the first two corpora, the AFINN and BING models improve further in the last. The NRC model, on the other hand, does not produce any significant coefficient. The comparison of the different sentiment indicators further shows that the indicators based on indirect sentiment measures fail to outperform the textual sentiment indicators. This result was not entirely expected but provides an interesting observation.

In document Essays on sentiment: an analysis of the commercial real estate market (Page 171-178)