A Framework for Fitting and Analyzing Stochastic Processes

Software Support

CHAPTER 9. SOFTWARE SUPPORT


Figure 9.9.: Queue length distributions for the model from Figure 9.7. Panel (a): queue length distribution for the server; panel (b): queue length distribution for one interface of a router. Both panels plot probability (logarithmic axis) against queue length for Trace, Dist, ARTA, MAP and CHEP.

distribution of the server’s network interface. Again, correlated events visibly increase the probability of larger queue lengths, and the stochastic processes approximate the tail of the queue length distribution better than uncorrelated arrivals do.

For the first router, the load is distributed among several interfaces connected to the different clients; thus, the effect of the correlated packets is less pronounced than for the server. Nevertheless, Figure 9.9(b) shows that the maximal queue length also increases for the interfaces of the router.

The two example models demonstrate how stochastic processes can easily be integrated into simulation models using the ArrivalProcess module. The second example from Figure 9.7 shows that the module can be incorporated into existing network models with only minor modifications and can thus work together with existing network modeling frameworks.

Moreover, the results from the example models clearly demonstrate the importance of incorporating autocorrelation into input models. In particular, comparing the ARTA processes with exponential marginal distribution to independent samples from the same exponential distribution shows that adding a few lags of autocorrelation can significantly improve the quality of simulation models.
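The effect of correlated arrivals on queueing behavior can be reproduced in a few lines. The following Python sketch is not the ArrivalProcess module or the model from Figure 9.7; it assumes an ARTA process with an AR(1) base process and Exp(1) marginal distribution, compares it with independent Exp(1) interarrival times, and estimates the mean waiting time of a single FIFO server with exponential service times via Lindley's recursion. The names `arta_exponential` and `mean_wait`, the AR coefficient 0.9 and the load 0.8 are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()  # standard normal distribution; .cdf() is Phi


def arta_exponential(n, alpha, rate, rng):
    """ARTA sketch: AR(1) base process with N(0, 1) marginals, transformed to an
    exponential marginal via Y_t = F^{-1}(Phi(Z_t))."""
    sigma_eps = math.sqrt(1.0 - alpha * alpha)  # innovation s.d. keeping Var(Z_t) = 1
    z = rng.gauss(0.0, 1.0)                     # start in the stationary distribution
    samples = []
    for _ in range(n):
        z = alpha * z + rng.gauss(0.0, sigma_eps)
        # Exponential quantile F^{-1}(u) = -ln(1 - u) / rate, using 1 - Phi(z) = Phi(-z).
        samples.append(-math.log(_STD_NORMAL.cdf(-z)) / rate)
    return samples


def mean_wait(interarrivals, services):
    """Mean waiting time of a FIFO single-server queue via Lindley's recursion
    W_{n+1} = max(0, W_n + S_n - A_{n+1})."""
    w, total = 0.0, 0.0
    for a, s in zip(interarrivals, services):
        total += w
        w = max(0.0, w + s - a)
    return total / len(interarrivals)


rng = random.Random(42)
n = 100_000
services = [rng.expovariate(1.25) for _ in range(n)]     # mean service time 0.8, load 0.8
correlated = arta_exponential(n, alpha=0.9, rate=1.0, rng=rng)
independent = [rng.expovariate(1.0) for _ in range(n)]  # same Exp(1) marginal, no correlation
print(f"mean interarrival: {fmean(correlated):.3f} (ARTA) vs {fmean(independent):.3f} (iid)")
print(f"mean waiting time: {mean_wait(correlated, services):.2f} (ARTA) vs "
      f"{mean_wait(independent, services):.2f} (iid)")
```

Both arrival streams have the same marginal distribution, so any difference in waiting times is caused by the autocorrelation alone.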

9.4. A Framework for Fitting and Analyzing Stochastic Processes

In the previous sections, several additions to the existing toolkit ProFiDo [12, 14] have been presented that allow for an easier integration of stochastic processes into simulation models. Figure 9.10 shows the complete framework including the newly added software. ProFiDo takes a trace as input and can fit one or several stochastic processes or distributions to the trace. Supported models are PH distributions, MAPs, ARMA processes, ARTA processes and the newly added CHEPs and CAPPs. ProFiDo includes tools for visualizing statistical properties such as density or distribution functions, lag-k autocorrelation coefficients or (joint) moments. Moreover, ProFiDo supports the two-sample Kolmogorov-Smirnov test and Pearson's chi-squared test to examine whether two samples (e.g. a trace and another trace generated by a distribution fitted to the first trace) originate from the same distribution. Once processes with the desired properties have been fitted, they can be used in simulation models by importing them with the OMNeT++ module for arrival processes presented above. Additionally, MAPs can be exported for use with the tool NSOLVE [39, 122], which contains numerical solution techniques for large Markov chains. Thus, ProFiDo provides a complete framework for fitting stochastic processes, for assessing the quality of the fitted models and for easily integrating them into simulation models.

Figure 9.10.: ProFiDo framework for fitting and analyzing stochastic processes. Diagram components: Trace; ProFiDo with PH, MAP, ARMA, ARTA, CHEP and CAPP fitting, result visualization and statistical tests; analysis by simulation (OMNeT++) and numerical solution techniques (NSOLVE).
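As an illustration of the first of these tests, the following Python sketch computes the two-sample Kolmogorov-Smirnov statistic, i.e. the largest distance between the two empirical distribution functions, and applies the usual large-sample rejection rule. It is not ProFiDo's implementation: the function names, the asymptotic critical value and the toy samples are assumptions of this sketch, and Pearson's chi-squared test is not shown.

```python
import math
import random
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_t |F_x(t) - F_y(t)|,
    where F_x and F_y are the empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # The difference of two step functions is maximal at one of the pooled sample points.
    return max(abs(bisect_right(xs, t) / n - bisect_right(ys, t) / m) for t in xs + ys)


def ks_reject(d, n, m, alpha=0.05):
    """Large-sample rule: reject "same distribution" if D > c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2), about 1.36 for alpha = 0.05."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return d > c * math.sqrt((n + m) / (n * m))


rng = random.Random(1)
trace = [rng.expovariate(1.0) for _ in range(2000)]
refit = [rng.expovariate(1.0) for _ in range(2000)]    # stands in for a well-fitted model
misfit = [rng.uniform(0.0, 2.0) for _ in range(2000)]  # same mean, wrong shape
for name, sample in (("exponential", refit), ("uniform", misfit)):
    d = ks_two_sample(trace, sample)
    verdict = "reject" if ks_reject(d, len(trace), len(sample)) else "do not reject"
    print(f"{name}: D = {d:.3f}, {verdict} H0 at the 5% level")
```

The uniform sample has the same mean as the trace, which illustrates that the test compares whole distributions rather than single moments.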

Chapter 10

Conclusions

In this work, several existing approaches to modeling correlated traffic data have been assessed for their suitability as simulation input models. To overcome some of the identified weaknesses of these approaches, the ARTA approach, which uses an autoregressive base process and relies on the inversion of the cumulative distribution function, has been extended in several ways. To capture a larger number of empirical autocorrelation coefficients, ARTA processes were generalized to use an ARMA base process instead of an AR process. For arbitrary acyclic PH distributions and the subclass of Hyper-Erlang distributions, whose inverse cumulative distribution function cannot be computed efficiently in general, a novel approach has been developed that combines an ARMA base process with PH distributions by using the base process to choose an elementary series of the APH distribution. To increase the range of autocorrelation that these novel processes, denoted CHEPs and CAPPs, can capture, several transformations of the APH distribution have been proposed. The theoretical work resulted in a fitting tool for CHEPs and CAPPs and in a module for the simulation framework OMNeT++ that makes it easy to include stochastic processes in simulation models. Both tools have been integrated into the toolkit ProFiDo, which provides a complete framework for fitting, analyzing and simulating stochastic processes. In an extensive empirical study, the suitability of CHEPs and CAPPs for capturing the behavior of synthetically generated and real traces has been assessed, and the two novel processes have been compared with other existing approaches. CHEPs and CAPPs were shown to adequately capture both the distribution and the autocorrelation of a trace, and this ability was confirmed by results from several queueing models.
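The idea of letting the base process choose an elementary series can be sketched for a Hyper-Erlang marginal distribution. The following Python code is a minimal illustration under stated assumptions, not the fitting tool described above: it assumes an ARMA(1,1) base process scaled to unit variance, Erlang branches sorted by increasing mean, and branch selection by comparing Φ(Z_t) with the cumulative branch probabilities. The transformations of the APH distribution are omitted, and the names `chep_sample` and `lag_autocorr` as well as all parameter values are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf  # standard normal CDF


def chep_sample(n, branches, phi, theta, rng, burn_in=1_000):
    """Correlated Hyper-Erlang samples (illustrative sketch).

    branches: (probability, number of phases k, rate lam) per Erlang branch,
              assumed sorted by increasing mean k / lam.
    phi, theta: coefficients of the ARMA(1,1) base process.
    """
    # Innovation s.d. that gives the ARMA(1,1) base process unit variance.
    sigma = math.sqrt((1 - phi**2) / (1 + 2 * phi * theta + theta**2))
    cum, acc = [], 0.0
    for p, _, _ in branches:
        acc += p
        cum.append(acc)  # cumulative branch probabilities
    z, eps_prev, out = 0.0, 0.0, []
    for t in range(burn_in + n):
        eps = rng.gauss(0.0, sigma)
        z = phi * z + eps + theta * eps_prev
        eps_prev = eps
        if t < burn_in:
            continue  # let the base process approach stationarity
        # Phi(Z_t) is uniform on (0, 1); it selects the branch whose interval contains it.
        i = min(bisect.bisect_right(cum, PHI(z)), len(branches) - 1)
        _, k, lam = branches[i]
        out.append(sum(rng.expovariate(lam) for _ in range(k)))  # Erlang(k, lam) draw
    return out


def lag_autocorr(x, k):
    """Empirical lag-k autocorrelation coefficient of a trace."""
    m = fmean(x)
    var = fmean((v - m) ** 2 for v in x)
    return fmean((x[t] - m) * (x[t + k] - m) for t in range(len(x) - k)) / var


rng = random.Random(7)
branches = [(0.6, 1, 2.0), (0.4, 3, 1.0)]  # branch means 0.5 and 3.0, overall mean 1.5
x = chep_sample(100_000, branches, phi=0.8, theta=0.3, rng=rng)
print(f"mean {fmean(x):.3f}, lag-1 autocorrelation {lag_autocorr(x, 1):.3f}")
```

Because each branch is still chosen with its own probability, the marginal distribution stays the Hyper-Erlang distribution, while consecutive samples tend to come from the same branch.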

The work presented in this thesis can be extended in several directions.

The first two ideas aim at improving or modifying the proposed fitting algorithms for CHEPs and CAPPs.

Recall from Chapter 7 that the order of the ARMA base process was not determined automatically; instead, the best base process was selected from a given set of base processes with different orders. A heuristic that determines the best base process order automatically would avoid fitting base processes of different orders that are discarded later anyway.
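One conceivable heuristic, sketched here only under clearly stated assumptions, is to run the Levinson-Durbin recursion once on the autocorrelations: it yields the innovation variance of every AR(p) model up to a maximum order in a single pass, so a penalized criterion such as AIC can pick the order without fitting each candidate separately. This sketch covers only a pure AR base process fitted directly to given autocorrelations and ignores the distortion introduced by the marginal transformation; the function names and the choice of AIC are assumptions, not the procedure from Chapter 7.

```python
import math


def levinson_durbin(r, p_max):
    """Yule-Walker AR fits of orders 0..p_max from autocorrelations r[0..p_max] (r[0] = 1).

    Returns one (coefficients, normalized innovation variance) pair per order.
    """
    phi, err = [], r[0]
    fits = [(phi, err)]
    for k in range(1, p_max + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err  # partial autocorrelation at lag k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
        fits.append((phi, err))
    return fits


def select_ar_order(r, n, p_max):
    """Heuristic: the order minimizing AIC(p) = n * ln(sigma_p^2) + 2p for a trace of length n."""
    fits = levinson_durbin(r, p_max)
    aic = [n * math.log(err) + 2 * p for p, (_, err) in enumerate(fits)]
    return min(range(len(aic)), key=aic.__getitem__)


# Theoretical autocorrelations of an AR(1) process with coefficient 0.7: r_k = 0.7**k
print(select_ar_order([0.7**k for k in range(6)], n=1000, p_max=5))  # → 1
```

For an exact AR(1) autocorrelation structure, all partial autocorrelations beyond lag 1 vanish, so the innovation variance stops decreasing and the penalty term selects order 1.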

For fitting the base process, the empirical autocorrelations of the trace have been used. Since the equations for computing arbitrary joint moments of CHEPs and CAPPs have been given in this work as well, it would also be possible to fit a CHEP or CAPP to joint moments instead of autocorrelation coefficients. However, the results obtained in Chapter 8 suggest that this would yield processes that capture the joint moments almost exactly but fall short of capturing other characteristics not used for fitting.
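The relation between the two fitting targets can be made concrete: for a stationary trace, the lag-k autocorrelation coefficient is determined by the lag-k joint moment E[X_t X_{t+k}] together with the first two moments, whereas higher joint moments E[X_t^i X_{t+k}^j] carry additional information. The Python sketch below estimates both from a trace; the function names and the toy trace are illustrative assumptions.

```python
from statistics import fmean


def joint_moment(x, k, i=1, j=1):
    """Empirical lag-k joint moment E[X_t^i * X_{t+k}^j] of a trace x."""
    return fmean(x[t] ** i * x[t + k] ** j for t in range(len(x) - k))


def autocorr_from_moments(x, k):
    """Lag-k autocorrelation expressed through moments:
    rho_k = (E[X_t X_{t+k}] - E[X]^2) / (E[X^2] - E[X]^2)."""
    m1 = fmean(x)
    m2 = fmean(v * v for v in x)
    return (joint_moment(x, k) - m1 * m1) / (m2 - m1 * m1)


trace = [1.0, 2.0] * 4                  # toy alternating trace
print(autocorr_from_moments(trace, 1))  # → -1.0
print(autocorr_from_moments(trace, 2))  # → 1.0
```

Fitting to joint moments would target quantities such as `joint_moment(x, k, i, j)` for several (i, j) pairs instead of only the case i = j = 1 hidden inside the autocorrelation.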

Additionally, the proposed approaches can be generalized in several ways. In [24], the ARTA approach was generalized to model multivariate random processes with a vector autoregressive base process, an approach called VARTA. The base process is then defined as

$$Z_t = \alpha_1 Z_{t-1} + \alpha_2 Z_{t-2} + \dots + \alpha_p Z_{t-p} + \varepsilon_t,$$

where, in the k-variate case, each $Z_t$ is a random vector with k components, i.e. $Z_t = (Z_{1,t}, Z_{2,t}, \dots, Z_{k,t})$, the $\alpha_i$ are $k \times k$ coefficient matrices and $\varepsilon_t$ is a vector of Gaussian innovations. The i-th variate of the VARTA time series with marginal distribution $F_{Y_i}$ is generated by $Y_{i,t} = F_{Y_i}^{-1}(\Phi(Z_{i,t}))$. The same approach can, of course, be applied to CHEPs and CAPPs, resulting in correlated multivariate Phase-type processes.
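A bivariate special case of this construction can be sketched in Python. The sketch assumes a VAR(1) base process with diagonal coefficient matrix a·I and innovation covariance (1 − a²)·[[1, c], [c, 1]], which keeps each component of the stationary base process standard normal with contemporaneous cross-correlation c. These parameter choices, the Exp(1) and Uniform(0, 2) marginals and the function names are illustrative assumptions; VARTA in general uses full coefficient matrices fitted to target correlations.

```python
import math
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf  # standard normal CDF


def varta2(n, a, c, inv_cdfs, rng):
    """Bivariate sketch with VAR(1) base process Z_t = a * Z_{t-1} + eps_t.

    Cov(eps_t) = (1 - a^2) * [[1, c], [c, 1]] keeps both components of the
    stationary base process N(0, 1) with cross-correlation c.
    """
    s = math.sqrt(1.0 - a * a)
    r = math.sqrt(1.0 - c * c)
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1, z2 = g1, c * g1 + r * g2  # stationary starting point
    out = []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, c], [c, 1]] applied to independent N(0, 1) draws.
        z1 = a * z1 + s * g1
        z2 = a * z2 + s * (c * g1 + r * g2)
        # Y_{i,t} = F_{Y_i}^{-1}(Phi(Z_{i,t})) for each variate i
        out.append((inv_cdfs[0](PHI(z1)), inv_cdfs[1](PHI(z2))))
    return out


def exp_quantile(u):
    """Quantile function of Exp(1); guards against u rounding to 1."""
    return -math.log(max(1.0 - u, 1e-300))


def uniform_quantile(u):
    """Quantile function of Uniform(0, 2)."""
    return 2.0 * u


rng = random.Random(3)
ys = varta2(50_000, a=0.5, c=0.7, inv_cdfs=(exp_quantile, uniform_quantile), rng=rng)
y1, y2 = [p[0] for p in ys], [p[1] for p in ys]
m1, m2 = fmean(y1), fmean(y2)
cross = fmean((u - m1) * (v - m2) for u, v in zip(y1, y2))
print(f"means {m1:.3f} and {m2:.3f} (targets 1 and 1), cross-covariance {cross:.3f}")
```

Replacing the quantile functions by a branch selection as for CHEPs would give the kind of correlated multivariate Phase-type process mentioned above.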

In some cases, the autocorrelation alone is known to be insufficient to describe the dependencies, so additional measures have to be considered. In [21], the VARTA approach is extended to account for these cases, and copula-based multivariate input models are proposed. Similar extensions should be possible for CAPPs as well.

Recall that computing the autocorrelation of a CHEP or CAPP only requires the mean values of the elementary series. Hence, the ARMA base process could be used to generate correlated random numbers from a mixture of arbitrary distributions instead of Erlang branches or hypo-exponentially distributed elementary series, as long as the mean values of all distributions are known. Of course, fitting such a mixture of (potentially different) distributions to a trace is much more difficult than fitting a single distribution and splitting it into series, as was done in this work for Hyper-Erlang distributions and acyclic Phase-type distributions.
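Why only the means matter can be written out explicitly under a simple model of the construction. Assume, for this sketch, that at each time step the base process selects the index $B_t$ of an elementary series, that series $i$ is selected with probability $p_i$ and has mean $\mu_i$, and that, given the selected indices, the values drawn at different time steps are independent. Conditioning on the indices then gives, for a lag $h \ge 1$:

```latex
\mathrm{E}[X_t X_{t+h}]
  = \sum_{i}\sum_{j} \Pr(B_t = i,\, B_{t+h} = j)\, \mu_i \mu_j ,
\qquad
\rho_X(h)
  = \frac{\mathrm{E}[X_t X_{t+h}] - \bigl(\sum_{i} p_i \mu_i\bigr)^2}{\mathrm{Var}(X)} .
```

$\mathrm{Var}(X)$ is a property of the marginal distribution alone, and $\Pr(B_t = i, B_{t+h} = j)$ depends only on the base process and the selection probabilities, so the elementary series enter the dependence structure solely through their means $\mu_i$. The symbols $B_t$, $p_i$ and $\mu_i$ are introduced here for illustration and need not match the notation of the earlier chapters.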

Appendix A

Notations

In document Fitting simulation input models for correlated traffic data (Page 160-164)