Traffic Driven Analysis of Cellular Data Networks

(1)


Samir R. Das

Computer Science Department Stony Brook University

Joint work with Utpal Paul, Luis Ortiz (Stony Brook U), Milind Buddhikot,

(2)

Mobile Data Usage

Relatively little research on the nature of mobile data traffic.

[Figure: Forecast of Global Mobile Data Traffic, from 0.6 EB / month to 10.8 EB / month. Source: Cisco VNI Mobile]

1 Exabyte = 1 million Terabytes

Higher than the traffic volume in the entire Global Internet

(3)

[Diagram: Traffic Management; Modeling and Forecasting; Traffic Analysis]

(4)

Measurement Infrastructure

[Diagram labels: Flow Monitoring Tool; SQL Database; Packet Flows; Internet; Flow Records; Mobility and Session Manager; Radio Access Network]

(5)

Sample Results from Traffic Analysis

• Data collected from a nationwide 2G/3G network circa 2007

– About 10K BSes, 1M subscribers.

• Significant traffic imbalance per subscriber and per BS

– 1% of subscribers create more than 60% of load.

– 10% of BSes experience more than 50% of load.

• Mobility is generally low

– More than 50% of subscribers stick to just one BS daily.

– Median radius of gyration is ~1 mile.
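
The two kinds of statistic quoted on this slide (load share of the heaviest users, radius of gyration) can be sketched as follows. This is only an illustration: the slide does not define radius of gyration, so the common definition (root-mean-square distance of visited locations from their centroid) is assumed, and the function names and numbers are made up.

```python
import math

def top_share(loads, fraction):
    """Share of total load generated by the heaviest `fraction` of entities."""
    ranked = sorted(loads, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

def radius_of_gyration(points):
    """RMS distance of visited (x, y) points from their centroid (assumed definition)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(sq / len(points))

# Made-up numbers purely for illustration (not from the measured network).
subscriber_bytes = [900, 30, 20, 15, 10, 10, 5, 5, 3, 2]
visits_miles = [(0.0, 0.0), (0.0, 0.0), (2.0, 0.0), (0.0, 0.0)]

print(top_share(subscriber_bytes, 0.10))   # share of load from the top 10%
print(radius_of_gyration(visits_miles))    # in miles
```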

(6)

Sample Results from Traffic Analysis

• Mobility is predictable

– Subscribers are almost always found in their top 2-3 most visited locations.

– They return to the same location at the same time of the day with high probability.

• More mobile subscribers tend to generate more traffic.

• Radio resource usage efficiency is very poor

(7)

Functional Influence Among BSes

• Model BS load as time series. Explore causal relationships between pairs of time series.

• Granger Causality

– Determines whether one time series is useful in forecasting another when using an autoregressive model.

– Has been used in economics and neuroscience.

• Statistically significant causality exists among neighboring BSes (roughly among half of the neighbors).

• Causality graph and causal path

– Make a graph out of causality. Long paths exist in this graph (median = 15
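
A minimal sketch of a Granger-style test between two load series, assuming a plain least-squares autoregressive fit. It compares an AR model of the target series alone against one that also includes lags of the other series, and returns the F statistic. The function name, lag order, and synthetic data are illustrative assumptions, not the setup used in the study; a significance decision would additionally compare the statistic against an F distribution.

```python
import numpy as np

def granger_f_stat(target, source, lags=2):
    """F statistic: AR(lags) on `target` alone vs. AR plus lagged `source`."""
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    n = len(target)
    y = target[lags:]
    # Column k holds the series delayed by k + 1 steps.
    own = np.column_stack([target[lags - k - 1:n - k - 1] for k in range(lags)])
    other = np.column_stack([source[lags - k - 1:n - k - 1] for k in range(lags)])
    ones = np.ones((len(y), 1))
    restricted = np.hstack([ones, own])
    full = np.hstack([ones, own, other])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = len(y) - full.shape[1]
    return ((rss_r - rss_f) / lags) / (rss_f / dof)

# Synthetic example: series b follows series a with a one-step delay.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = np.zeros(500)
for t in range(1, 500):
    b[t] = 0.8 * a[t - 1] + 0.1 * rng.normal()

f_ab = granger_f_stat(b, a)  # does a help forecast b? (large)
f_ba = granger_f_stat(a, b)  # does b help forecast a? (small)
print(f_ab, f_ba)
```

Running the test on every ordered pair of neighboring BSes and keeping the significant pairs as directed edges would give a causality graph of the kind the slide describes.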

(8)

Modeling Study

• Model BS traffic loads exploiting any interactions/dependencies

– Exploit tools from machine learning.

– Many possible directions – purely static/spatial, dynamic/temporal.

• Goals:

– Intellectual – broad understanding of any underlying structure would help future network architectures.

– Utilitarian – models can help estimation/forecasting.

(9)

Spatial Modeling Approach: Probabilistic Graphical Modeling

• Assume the loads on n base stations are multivariate Gaussian: $X \sim \mathcal{N}(\mu, \Sigma)$, with mean vector $\mu$ and covariance matrix $\Sigma$.

• Learn the parameters, specifically the “inverse covariance matrix” $\Sigma^{-1}$, given a set of training data (p observations).

• $\Sigma^{-1}$ is easier to estimate than $\Sigma$ and exposes interesting properties.

(10)

Inverse Covariance Matrix: Properties

• If entry (i, j) of the inverse covariance matrix is zero, $(\Sigma^{-1})_{ij} = 0$, then load variables $X_i$ and $X_j$ are conditionally independent, given the rest of the variables.

• Most problems produce a ‘sparse’ model.

• Related to probabilistic graphical models (e.g., Gaussian Markov Random Field).

[Figure: Undirected Graphical Model on nodes 1 to 5. $(\Sigma^{-1})_{ij} \neq 0$ -> edge; $(\Sigma^{-1})_{ij} = 0$ -> no edge.]

Graph properties translate to probabilistic (in)dependencies
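
To make the link between zeros in the inverse covariance matrix and graph edges concrete, here is a small sketch on a hypothetical 5-BS chain (1-2-3-4-5). The matrix values are invented for illustration. The inverse covariance matrix is sparse, while the covariance matrix obtained by inverting it is fully dense, which is why the graph is read off the former.

```python
import numpy as np

# Hypothetical inverse covariance for a chain 1-2-3-4-5:
# only adjacent BSes interact directly.
theta = np.array([
    [ 2.0, -0.8,  0.0,  0.0,  0.0],
    [-0.8,  2.0, -0.8,  0.0,  0.0],
    [ 0.0, -0.8,  2.0, -0.8,  0.0],
    [ 0.0,  0.0, -0.8,  2.0, -0.8],
    [ 0.0,  0.0,  0.0, -0.8,  2.0],
])
sigma = np.linalg.inv(theta)  # covariance: every pair is (marginally) correlated

def edges(precision, tol=1e-9):
    """Undirected edges (1-based node labels) where the precision entry is nonzero."""
    n = precision.shape[0]
    return [(i + 1, j + 1)
            for i in range(n) for j in range(i + 1, n)
            if abs(precision[i, j]) > tol]

print(edges(theta))                              # edges of the graphical model
print(np.count_nonzero(np.abs(sigma) > 1e-9))    # nonzero covariance entries
```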

(11)

Inference Problem

• Estimate load for BS i given the load of a subset of BSes S as the conditional mean: $E[X_i \mid X_S]$

• Broad questions:

– How large should S be? Effort vs. accuracy tradeoff.

– How to choose S?

[Figure: graph on nodes 1 to 5. Measure only a subset and estimate the rest.]
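
For a joint Gaussian the conditional mean has the standard closed form $E[X_R \mid X_S = x_S] = \mu_R + \Sigma_{RS} \Sigma_{SS}^{-1} (x_S - \mu_S)$, where R is the set of unmeasured BSes. The sketch below applies it to a made-up 3-BS example; the function name and all numbers are illustrative assumptions.

```python
import numpy as np

def conditional_mean(mu, sigma, measured, values):
    """Estimate unmeasured loads from measured ones under a joint Gaussian."""
    n = len(mu)
    rest = [i for i in range(n) if i not in measured]
    s_rs = sigma[np.ix_(rest, measured)]
    s_ss = sigma[np.ix_(measured, measured)]
    delta = np.asarray(values, dtype=float) - mu[measured]
    est = mu[rest] + s_rs @ np.linalg.solve(s_ss, delta)
    return dict(zip(rest, est))

# Made-up mean loads and covariance for three BSes (indices 0, 1, 2).
mu = np.array([10.0, 20.0, 30.0])
sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 4.0, 2.0],
                  [0.0, 2.0, 4.0]])

# Only BS 1 is measured, and it reads 24 (4 above its mean).
est = conditional_mean(mu, sigma, measured=[1], values=[24.0])
print(est)
```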

(12)

First Solve the Learning Problem

• Learn the inverse covariance matrix from training data.

• How? Exploit relationship with linear regression modeling.

– Express load of BS i as a linear function of all other BS loads and then regress: $Y_i = \sum_{j \neq i} X_j \beta_{ji}$

– Regression coefficients can be shown to be directly related to inv. cov. matrix elements.

(13)

Sparse Models

• Sparse model -> many regression coeffs are zero.

• Reduces danger of over-fitting (lowering variance). Also, computationally efficient.

• Introduce a regularization term in regression. We used “Lasso”.

[Equation labels: empirical error; regularization term (modeling penalty)]
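
A minimal Lasso sketch, assuming the usual objective $\frac{1}{2p} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$ (empirical error plus an L1 penalty) and solving it by coordinate descent. The solver, the symbol $\lambda$, and the synthetic data are illustrative assumptions; the slides do not say which solver was used.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2p)||y - X b||^2 + lam * ||b||_1."""
    p, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / p
    for _ in range(n_iter):
        for j in range(d):
            # Residual with feature j's current contribution added back.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / p
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Synthetic data: the target depends on only 2 of 6 "other BS" loads.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)

beta = lasso_cd(X, y, lam=0.2)
print(beta)  # coefficients of irrelevant columns are driven to zero
```

The penalty shrinks the two relevant coefficients slightly toward zero; that bias is the price paid for sparsity.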

(14)

Regularization

• Cross-validate using additional training samples (not used for model creation).

• Use various values of the regularization parameter to create different models.

• Choose the one with max likelihood.
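
The selection loop can be sketched as below: fit one model per candidate regularization value on the training samples, score each on held-out samples with the Gaussian log-likelihood, and keep the best. Note the loud assumption: to stay short, the model-fitting step here is a crude stand-in (soft-thresholding the off-diagonal entries of the inverse sample covariance), not the Lasso-based learning described in the slides. All names and data are hypothetical.

```python
import numpy as np

def gaussian_loglik(theta, X):
    """Log-likelihood of zero-mean rows of X under inverse covariance theta."""
    m, n = X.shape
    if np.linalg.eigvalsh(theta).min() <= 0:
        return -np.inf  # not a valid (positive definite) model
    S = X.T @ X / m
    _, logdet = np.linalg.slogdet(theta)
    return 0.5 * m * (logdet - np.trace(S @ theta) - n * np.log(2 * np.pi))

def thresholded_precision(X, lam):
    """Stand-in sparse estimator: soft-threshold off-diagonals of inv(cov)."""
    theta = np.linalg.inv(X.T @ X / len(X))
    sparse = np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)
    np.fill_diagonal(sparse, np.diag(theta))  # leave the diagonal untouched
    return sparse

# Synthetic zero-mean loads drawn from a hypothetical 5-BS chain model.
true_theta = 2.0 * np.eye(5) - 0.8 * (np.eye(5, k=1) + np.eye(5, k=-1))
rng = np.random.default_rng(0)
data = rng.multivariate_normal(np.zeros(5), np.linalg.inv(true_theta), size=600)
train, held_out = data[:300], data[300:]

lams = [0.0, 0.05, 0.2, 1.0]
scores = {lam: gaussian_loglik(thresholded_precision(train, lam), held_out)
          for lam in lams}
best = max(scores, key=scores.get)
print(best, scores[best])
```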

(15)

Data Processing

• Hourly load of 400 BSes covering a 75 x 84 mile area. Includes a busy downtown and surrounding suburbs.

• No temporal dimension in model. Create different models for different parts of the day (every 4 hours).

• Account for diurnal variation of load. Use residuals
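
One plausible way to remove diurnal variation is sketched below. The slide text is cut off and does not say what the residuals are taken against, so this assumes the per-hour mean over days is subtracted from each BS's load; the function name and the numbers are hypothetical.

```python
import numpy as np

def diurnal_residuals(load):
    """load: array of shape (days, hours) for one BS.

    Returns the residuals after subtracting the mean load for each hour
    of the day, together with that mean daily profile.
    """
    profile = load.mean(axis=0)
    return load - profile, profile

# Made-up load for one BS: 3 days x 4 hourly slots.
load = np.array([[10.0, 40.0, 80.0, 30.0],
                 [14.0, 44.0, 70.0, 26.0],
                 [12.0, 36.0, 90.0, 34.0]])

residuals, profile = diurnal_residuals(load)
print(profile)    # typical daily shape
print(residuals)  # what remains for the spatial model to explain
```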

(16)

Average Edge Length in the Model Graph

• Apparent spatial/regional significance.

(17)

Choosing the Measured Set S

• Greedy strategy – each iteration picks the BS that minimizes the error estimate.
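
A sketch of such a greedy loop, under the assumption that the error estimate is the total conditional variance of the unmeasured BSes given the measured set (the slides do not specify the error estimate). The covariance matrix comes from a hypothetical 5-BS chain model and all names are illustrative.

```python
import numpy as np

def residual_variance(sigma, measured):
    """Total conditional variance of unmeasured BSes given the measured set."""
    n = sigma.shape[0]
    rest = [i for i in range(n) if i not in measured]
    if not measured:
        return float(np.trace(sigma))
    if not rest:
        return 0.0
    s_rr = sigma[np.ix_(rest, rest)]
    s_rs = sigma[np.ix_(rest, measured)]
    s_ss = sigma[np.ix_(measured, measured)]
    cond = s_rr - s_rs @ np.linalg.solve(s_ss, s_rs.T)
    return float(np.trace(cond))

def greedy_select(sigma, k):
    """Add, one at a time, the BS whose measurement lowers the error most."""
    measured = []
    for _ in range(k):
        candidates = [i for i in range(sigma.shape[0]) if i not in measured]
        best = min(candidates,
                   key=lambda i: residual_variance(sigma, measured + [i]))
        measured.append(best)
    return measured

# Covariance of a hypothetical 5-BS chain model.
theta = 2.0 * np.eye(5) - 0.8 * (np.eye(5, k=1) + np.eye(5, k=-1))
sigma = np.linalg.inv(theta)

sel = greedy_select(sigma, 2)
print(sel, residual_variance(sigma, sel))
```

Greedy selection is not guaranteed to find the best set of a given size, but it avoids enumerating all subsets.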

(18)

Impact of Estimation Accuracy on Applications

• We understand the measurement complexity (size of S) vs. error tradeoff.

• But how much accuracy do we need? Need to turn to applications.

• Studied two applications

– Energy Management

(19)

Opportunistic Traffic Scheduling

• Similar to Smart Electric Grid – move non-urgent traffic from peak to off-peak periods.

– What is non‐urgent? p2p, large downloads, sync, push, etc.

– Who decides? User agent on mobile. May have multiple levels of priority or have deadlines to aid scheduling.

– Carriers can incentivize such scheduling.

• Similar to QoS scheduling – but at a higher layer and at a longer time scale.

• Two components in System Architecture

– Server (Scheduler) in core network.

(20)

Time Line

2PM 2:30PM 3PM 3:30PM

Creates low-priority flow Deadline=2hr

Server (scheduler) in the

(21)

Solving the Scheduling Problem

• Several approaches possible based on how flows are prioritized.

• But for any approach, server needs to be able to estimate current/future loads at all BSes.

– Also, needs to model/estimate subscriber mobility

(separate problem).

• Poor estimation leads to poor scheduler
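
As one illustration of how load estimates feed a scheduler, the sketch below places each deferrable flow in the least-loaded epoch before its deadline at a single BS. This is a hypothetical policy (earliest deadline first, most headroom first), not the scheduler evaluated in the talk; the data layout, names, and numbers are assumptions, and subscriber mobility is ignored.

```python
def schedule_low_priority(flows, forecast_load, capacity):
    """Assign each deferrable flow to the allowed epoch with the most headroom.

    flows: list of (flow_id, release_epoch, deadline_epoch, demand)
    forecast_load: estimated high-priority load per epoch at one BS
    capacity: per-epoch capacity of the BS (same units as load)
    """
    load = list(forecast_load)
    plan = {}
    # Handle the most urgent flows first (earliest deadline first).
    for flow_id, release, deadline, demand in sorted(flows, key=lambda f: f[2]):
        window = range(release, min(deadline, len(load) - 1) + 1)
        epoch = min(window, key=lambda t: load[t])
        if load[epoch] + demand <= capacity:
            plan[flow_id] = epoch
            load[epoch] += demand
        else:
            plan[flow_id] = None  # cannot be admitted before its deadline
    return plan, load

# Made-up hourly forecast for one BS and three deferrable flows.
forecast = [80, 95, 60, 40]
flows = [("a", 0, 2, 30), ("b", 1, 3, 50), ("c", 0, 1, 15)]

plan, load = schedule_low_priority(flows, forecast, capacity=100)
print(plan)
print(load)
```

If the forecast overstates the load in an epoch, flows are pushed away from capacity that was in fact free; if it understates the load, flows are packed into an epoch that is already congested.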

(22)

Evaluation Approach

• Trace‐driven simulator based on a capacity model of BSes.

• Opportunistic scheduling is meant to admit more traffic but with the same network capacity.

• We use the same traffic trace always, but reduce network capacity to demonstrate impact.

• Impact?

– Do low priority flows still finish within a reasonable time?

(23)

Results

• Low priority flows = random subset of long-lived flows (over 25 mins), about 8% of all flows. Randomly chosen deadlines 1 - 4 hours.

• Rest high priority.

• Scheduling epoch hourly.

• Only a subset of 400 BSes are measured, rest estimated.

(24)

Conclusions

• Discovering structures in mobile traffic is a rich area of study.

• Applications in network and resource

(25)

[Diagram: Traffic Management; Modeling and Forecasting; Traffic Analysis]

Questions?