A Learning Framework - Yin_unc_0153D

[Figure 6.1: Machine Learning Framework For Bandwidth Estimation. The diagram shows three phases. 1. Data Collecting Phase: inter-packet gaps at sender and receiver for p-streams (x) and B_g for p-streams (y) form the feature vectors. 2. Training Phase: an ML algorithm produces a learned model. 3. Testing Phase: inter-packet gaps for new p-streams (x) yield an estimated y_est, which is compared with the expected y_exp, derived from B_g for new p-streams, to evaluate the model.]

Below, we describe the key components of this machine-learning-based framework for bandwidth estimation. As plotted in Fig. 6.1, a typical machine learning process consists of three phases. In the data collecting phase, we conduct experiments and produce multiple data instances (X, Y), which become the input of the next training phase. The input set is referred to as a “training set.” This training set is fed into a machine learning algorithm that automatically investigates the relationship of X to Y and generates a learned model. Finally, in the testing phase, we evaluate the quality of the learned model using another set of data, different from the training set; this is referred to as a “testing set.” We feed its X vector to the resultant model and compare the model output with the Y vector in the testing set.

6.1 Input Feature Vector

The input feature vector for a p-stream is constructed from the set of send gaps and receive gaps, {g^s_i} and {g^r_i}. Fourier transforms are commonly used in machine learning applications when the input may contain information at multiple frequencies [83, 80], and this condition certainly holds for g^r, which is distorted by different sources of noise on a network path. Therefore, we use as a feature vector the Fourier-transformed sequence of send and receive gaps for a p-stream of length N: x = FFT(g^s
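
The following is a minimal sketch of how such a feature vector could be assembled with NumPy. The exact layout of x is not recoverable from the text, so the concatenation of the two transformed sequences and the use of magnitudes are assumptions made for illustration; the function name `feature_vector` and the sample gap values are hypothetical.

```python
import numpy as np


def feature_vector(send_gaps, recv_gaps):
    """Build a feature vector from the FFT of send and receive gaps.

    Assumption: the two transformed sequences are concatenated and
    represented by their magnitudes. The source does not specify this.
    """
    gs = np.asarray(send_gaps, dtype=float)
    gr = np.asarray(recv_gaps, dtype=float)
    if gs.ndim != 1 or gs.shape != gr.shape:
        raise ValueError("gaps must be 1-D sequences of equal length")
    # rfft keeps the N//2 + 1 non-redundant bins of a real-valued sequence.
    return np.concatenate([np.abs(np.fft.rfft(gs)), np.abs(np.fft.rfft(gr))])


# Hypothetical p-stream with 8 gaps (values in microseconds).
gs = [1.2] * 8                                   # evenly spaced at the sender
gr = [1.2, 1.5, 1.1, 1.9, 1.3, 1.2, 1.6, 1.4]    # distorted at the receiver
x = feature_vector(gs, gr)
```

With constant send gaps, only the zero-frequency bin of the send half is non-zero, while the receive half spreads energy over the other bins; this is the kind of frequency-domain signal the feature vector is meant to expose.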

6.1 Output

The output y of the machine learning framework is the result of bandwidth estimation for each p-stream. For single-rate p-streams, bandwidth estimation can be formulated as a classification problem, which maps the output to a set of discrete values. In our case, y = 1 if the probing rate exceeds avail-bw; otherwise, y = 0. For multi-rate p-streams, bandwidth estimation can be formulated as a regression problem, which maps the output to non-discrete values. In our case, y = B_g.
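
The two output formulations can be sketched as follows. The function names and the sample rates are hypothetical and chosen only for illustration.

```python
def single_rate_label(probing_rate, avail_bw):
    """Classification target: 1 if the probing rate exceeds avail-bw, else 0."""
    return 1 if probing_rate > avail_bw else 0


def multi_rate_label(avail_bw):
    """Regression target: the avail-bw value itself (B_g in the text)."""
    return float(avail_bw)


# Hypothetical probing rates in Gbps against an avail-bw of 4.0 Gbps.
labels = [single_rate_label(rate, 4.0) for rate in (3.0, 4.0, 5.5)]
target = multi_rate_label(4.0)
```

Note that a probing rate equal to avail-bw is labelled 0 here, since the text assigns y = 1 only when the rate exceeds avail-bw.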

6.1 Machine-learning Algorithms

We consider several common machine-learning algorithms: ElasticNet [55], RandomForest [84], AdaBoost [57], and GradientBoost [58]. Below we briefly describe each of these algorithms.

• ElasticNet: ElasticNet is in the family of linear learning methods, which assume a linear relationship between the input feature vector x and the output y. In other words, y can be expressed as a weighted sum of the elements of x. The training task of the ElasticNet algorithm is to find the coefficients that best characterize the relationship between x and y. Other algorithms in the linear family include Lasso [56] and Ridge [85]. We chose to apply ElasticNet because it has been claimed to overcome the shortcomings of the other two and to generally outperform them [55].

• RandomForest: RandomForest, AdaBoost, and GradientBoost all follow the principle of ensembling simple, relatively inaccurate models (usually decision-tree based [86]) into a more robust model. RandomForest differs from the other two in how it ensembles the weak models: it randomly draws multiple subsets of the training set, and a weak model is trained on each subset. For each input x, the ensembled RandomForest model then produces y as some weighted mean of the results from all weak models.¹

• AdaBoost: As stated before, AdaBoost is also based on multiple weak models. It constructs the combined model by refining weak models in an iterative and incremental fashion. It begins with a simple decision-tree model, and in each subsequent iteration another model is trained with the goal of compensating for the shortcomings of previous models. To achieve this goal, each new model emphasizes the training data on which previous models fell short, instead of using a random subset of data, as RandomForest does. Consequently, after each iteration, the combined model becomes more robust. Finally, the ensembled model outputs the weighted average of the results of all ensembled weak models.

• GradientBoost: GradientBoost follows the same method of ensembling multiple weak models as AdaBoost, but it differs in how it measures the accuracy of a model and how it computes weight coefficients when optimizing the robustness of the combined model at each iteration. AdaBoost relies on an exponential loss function, whereas GradientBoost relies on gradient descent. Please refer to [87, 57, 58] for more detail.

¹ The computation of these weights, which is conducted by the internal algorithm of each learning algorithm, depends on the
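
As a rough illustration, the four algorithms above correspond to the following scikit-learn estimators. This sketch uses the regression variants, which would suit the multi-rate formulation; the hyperparameter values and the synthetic data are assumptions for illustration only and are not taken from the experiments.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import ElasticNet

# Hyperparameters are placeholders, not the tuned values from the text.
models = {
    "ElasticNet": ElasticNet(alpha=0.1),
    "RandomForest": RandomForestRegressor(n_estimators=20, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=20, random_state=0),
    "GradientBoost": GradientBoostingRegressor(n_estimators=20, random_state=0),
}

# Synthetic stand-in for (feature vector, avail-bw) pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
weights = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 3.0])
y = X @ weights + rng.normal(scale=0.1, size=200)

# Train each model on the first 150 instances, predict the remaining 50.
predictions = {}
for name, model in models.items():
    model.fit(X[:150], y[:150])
    predictions[name] = model.predict(X[150:])
```

For the single-rate classification formulation, the tree-based estimators have classifier counterparts (`RandomForestClassifier`, `AdaBoostClassifier`, `GradientBoostingClassifier`) in the same module.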

6.1 Data Collection

The success of any machine-learning framework depends heavily on good data collection; the framework must collect data that is both accurate and representative. The knowledge of AB_gt allows us to compute an expected value y_exp of the output of the machine-learning framework for both single-rate and multi-rate p-streams.

P-streams are generated with the “Dummy-frame” mechanism described in Chapter 4 and share the 10 Gbps bottleneck link with cross-traffic of different burstiness levels. Some of these p-streams are used as the “training set” to train each of the learning techniques. The p-streams excluded from the training set serve as the “testing set” on which the quality of the learned models is evaluated. We re-use the p-stream observations from Section 5.3, which were obtained to evaluate BASS, as the source of p-streams for both our training and testing data sets. In each experiment below, we use more than 10 000 p-streams, of which 5 000 are used for training and the rest for testing.
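
A minimal sketch of such a split is shown below. The text does not state how the 5 000 training p-streams are selected, so the random shuffle, the seed, and the helper name `split_pstreams` are assumptions.

```python
import random


def split_pstreams(pstreams, n_train=5000, seed=0):
    """Split p-streams into a training set of n_train items and a testing set.

    Assumption: training p-streams are picked uniformly at random.
    """
    if len(pstreams) <= n_train:
        raise ValueError("need more p-streams than the training set size")
    indices = list(range(len(pstreams)))
    random.Random(seed).shuffle(indices)
    train = [pstreams[i] for i in indices[:n_train]]
    test = [pstreams[i] for i in indices[n_train:]]
    return train, test


# Placeholder identifiers standing in for 10 500 recorded p-streams.
streams = list(range(10500))
train_set, test_set = split_pstreams(streams)
```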

6.1 Training

We implement the training phase with the Python scikit-learn [88] library, which offers abundant interfaces for different machine learning algorithms as well as parameter tuning. We use its automatic parameter tuning feature for all machine learning methods and use 5-fold cross-validation to validate our results.
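
One way to combine parameter tuning with 5-fold cross-validation in scikit-learn is `GridSearchCV`, sketched below for ElasticNet. Whether this is the tuning interface used in the experiments is an assumption, and the parameter grid and synthetic data are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training set.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = (X_train @ np.array([2.0, 0.0, -1.0, 0.5])
           + rng.normal(scale=0.05, size=100))

# Illustrative grid; the values actually searched are not given in the text.
param_grid = {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.2, 0.5, 0.8]}

# cv=5 evaluates every parameter combination with 5-fold cross-validation.
search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# By default the best combination is refit on the whole training set.
best_model = search.best_estimator_
```

The same pattern applies to the ensemble methods by swapping the estimator and the grid, for example over `n_estimators` and `max_depth`.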

6.1 Metrics

Each test that is run on a p-stream yields an estimate of the output y. For a single-rate p-stream, the accuracy of the model is quantified by the decision error rate, which is the percentage of p-streams for which y ≠ y_exp. For multi-rate p-streams, we quantify the relative estimation error as e = (y − B_g) / B_g.
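
Both metrics are simple to compute; the sketch below follows the definitions above, with hypothetical function names and sample values.

```python
def decision_error_rate(y_pred, y_exp):
    """Percentage of p-streams whose predicted label differs from y_exp."""
    if not y_pred or len(y_pred) != len(y_exp):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for p, e in zip(y_pred, y_exp) if p != e)
    return 100.0 * wrong / len(y_pred)


def relative_error(y, b_g):
    """Signed relative estimation error e = (y - B_g) / B_g."""
    return (y - b_g) / b_g


# Hypothetical single-rate decisions: one of four is wrong.
rate = decision_error_rate([1, 0, 1, 1], [1, 0, 0, 1])

# Hypothetical multi-rate estimate of 4.4 Gbps against B_g = 4.0 Gbps.
err = relative_error(4.4, 4.0)
```

The relative error is signed, so a positive value indicates overestimation and a negative value indicates underestimation.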
