• No results found

Detection and Mitigation of Cyber-Attacks Using Game Theory and Learning

N/A
N/A
Protected

Academic year: 2021

Share "Detection and Mitigation of Cyber-Attacks Using Game Theory and Learning"

Copied!
36
0
0

Loading.... (view fulltext now)

Full text

(1)

Detection and Mitigation of

Cyber-Attacks Using Game Theory

and Learning

João P.

Hespanha

Kyriakos G.

Vamvoudakis

(2)

Mission Cyber-Assets

C

O

As

Mission Model Cyber-Assets

Model Sensor Alerts C orr el at ion Engi ne

Analysis to get up-to-date view of cyber-assets

Analysis to determine dependencies between assets and missions

Impact Analysis

Create semantically-rich view of cyber-mission status Simulation/Live Security Exercises

Predict Future Actions

Analyze and Characterize Attackers

D at a D at a Data D at a D at a Observations: Netflow, Probing, Time analysis

(3)

Outline...

Large matrix games (summary of results)

Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning

(4)

Outline...

Prandini-Milano Prandini Poly. Milano

Bopardikar-UTRC Large matrix games (summary of results)

Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning

(5)

chat software web proxy intrusion detection

system

sequence of intruder actions that compromise database server, not

detected by IDS

Problem statistics of iCTF 2010

over 7800 distinct mission states (defender observations) over 2500 distinct observations available to the attacker defender can choose among about 102527 distinct policies

attacker can choose among 10756 – 102616 distinct policies, depending on attacker's level of expertise

attack graph from [J. Wing 2007]

intruder’s target!

Even “trivially small” network security games can lead to games

with very large decision trees

(6)

chat software web proxy intrusion detection

system

sequence of intruder actions that compromise database server, not

detected by IDS

Problem statistics of iCTF 2010

over 7800 distinct mission states (defender observations) over 2500 distinct observations available to the attacker defender can choose among about 102527 distinct policies

attacker can choose among 10756 – 102616 distinct policies, depending on attacker's level of expertise

attack graph from [J. Wing 2007]

intruder’s target!

Even “trivially small” network security games can lead to games

with very large decision trees

Developed sample-based approach to solving zero-sum games

Approach provides probabilistic guarantees on the performance of the policies (in terms of security levels)

Results applicable to very general classes of games that can include stochastic actions, partial information, etc.

(7)

services S3, S7 services S4, S5 services S0, S2 service S6 service S1 services S6, S2 service S9 service S8 services S0, S1 services S3, S9 service S2 service S0 service S1 services S3, S8 services S2, S3

Litya receives avg. 314 units for completion of all 4 missions

Level of attacker sophistication

# units received by Litya
 for 1 round of missions

[Option I, no bribes]

# units received by Litya
 for 1 round of missions
[Option I, with bribes]

# units received by Litya
 for 1 round of missions [Option II, with bribes] no service vulnerable

(baseline) 314 314 314

S2

(vulnerable to 38 teams) 240 240 138

S2, S6, S9

(vulnerable to at least 6 team) 79 79 43

S0, S2, S4, S6, S7, S8, S9

(vulnerable to at least 1 team) 11 -738 -1327

all services vulnerable 11 -848 -1917

In crea sin g le vel of at ta ck er sop hi sti cat io n We were able to

Provide Cyber-security office estimates of mission success Take into account the effect of attacks & counter measures Response can be a function of attacker sophistication

Play what-if scenarios (vulnerabilities, information, etc.)

(8)

Outline...

Sinopoli-CMU Y. Mo-Caltech

Large matrix games (summary of results)

Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning

(9)

Detection in Adversarial Environments

How to interpret & access the reliability of “sensors” that have been manipulated? “Sensors” relevant to cyber missions?

•Measurement sensors (e.g., SCADA systems)

•Computational sensors (e.g., weather forecasting simulation engines) •Data retrieval sensors (e.g., database queries)

•Cyber-security sensors (e.g., IDSs)

Domains

• Deterministic sensors:

with n sensors, one can get correct answer as long as m < n/2 sensors have been manipulated

• Stochastic sensors without manipulation:

solution given by hypothesis testing/estimation

• Stochastic sensors with potential manipulation: open problem?

(10)

Problem formulation

X – binary random variable to be estimated

for simplicity

(papers treats general case) Y1, Y2, …, Yn – “noisy” measurements of X produced by n sensors

per-sensor error probability (not necessarily very small)

Z1, Z2, …, Zn – measurements actually reported by the n sensors

at most m sensors attacked

pattack – probability that we are under attack (very hard to know!)

(11)

Result for “small” # of sensors (n<2/p

err

)

Theorem:

The optimal estimator is

go with the majority of the (potentially manipulated) sensor readings

go with the majority, EXCEPT if there is consensus

The optimal estimator is largely independent of pattack (hard to know)

X – binary random variable to be estimated

Y1, Y2, …, Yn – “noisy” measurements of X produced by n sensors Z1, Z2, …, Zn – measurements actually reported by the n sensors

at most m sensors attacked

(12)

Result for “small” # of sensors (n<2/p

err

)

Theorem:

The optimal estimator is

go with the majority of the (potentially manipulated) sensor readings

go with the majority, EXCEPT if there is consensus

The optimal estimator is largely independent of pattack (hard to know)

X – binary random variable to be estimated

Y1, Y2, …, Yn – “noisy” measurements of X produced by n sensors Z1, Z2, …, Zn – measurements actually reported by the n sensors

at most m sensors attacked

pattack – probability that we are under attack (very hard to know!)

This year’s work…

Can we extend this to the estimation of time-varying variables: e.g., the state of a mission!

(13)

Estimation in Adversarial Environments

How to interpret & access the reliability of “sensors” that have been manipulated? “Sensors” relevant to cyber missions?

•Measurement sensors (e.g., SCADA systems)

•Computational sensors (e.g., weather forecasting simulation engines) •Data retrieval sensors (e.g., database queries)

•Cyber-security sensors (e.g., IDSs)

X – constant binary random variable to be estimated Previously…

X(t) – time-varying state variable to be estimated, based on 1.sensor measurements that may have been manipulated 2.system dynamics

E.g., the state of a cyber mission Now…

(14)

Problem formulation

Dynamics can also be formulated as a

discrete-event system using the

Ramadge-Wonham supervisory control framework dynamical evolution of

systems’s state

N measurements

produced by sensor control signals

Under what conditions can one reconstruct the state from (potentially corrupted) sensor measurements? at most M sensors can be manipulated by the attackers

N measurements

(15)

Problem formulation

Theorem:

Exact state reconstruction is possible if and only if

system is observable through every subset of N - 2M measurements state could be reconstructed through only N - 2M

measurements in the absence of attacks 

potential attack at M sensors, effectively “disables” 2M sensors dynamical evolution of

systems’s state

N measurements

produced by sensor control signals

at most M sensors can be manipulated by the attackers

N measurements

reported by sensor

(16)

Estimation algorithms

Algorithm outline:

1. Build an estimate removing by ignoring a set S of M sensors

2. Build additional estimates by removing, in addition, all combinations of M additional sensors

3. If all attacked sensors were in set S, then the estimates in steps 1. and 2. will be consistent (modulo noise)

Gramian-based estimator:

• batch, finite-time estimation • inversion of the observability

matrix at each time step

Observer-based estimator:

• asymptotic estimation

• recursive low-computation algorithm • provably robust with respect to noise on

all sensors

(including non attacked ones)

(all estimates can be constructed without combinatorial complexity, by using finite dimensionality)

(17)

Outline...

Large matrix games (summary of results)

Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning

(18)

Resilient Cyber-Mission Architectures

In complex cyber missions,

• human operators define policies and rules

• computing elements automate processes of distributed resource allocation, scheduling, inventory management, etc.

• self-configuration: automatic configuration of components • self-healing: automatic discovery and correction of faults

• self-optimization: automatic allocation of resources for optimal operation

What is the impact of attacks on this type of automated/optimization process? Can we devise algorithms with built-in attack prediction/awareness capabilities?

(19)

Focus: Distributed Consensus/Agreement

Goal: minimize errors between values of agents and their neighbors

Attacker: maximize errors using stealth attacks (small vi)

peers of agent i

(self-included)

Classical problem in distributed computing:

•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)

•Decision done iteratively & distributed using peer-to-peer communication

value at processor i, iteration k

adjustment on xi by processor i, at iteration k

correct update on adjustment

update on adjustment by attacker

(20)

Focus: Distributed Consensus/Agreement

value at processor i, iteration k

adjustment on xi by processor i, at iteration k

correct update on adjustment

update on adjustment by attacker

Nash equilibrium formulation:

error

min. by us max. by attacker

our updates

(small means smooth)

min. by us max. by attacker

attacker updates (small means stealth)

max. by us min. by attacker

Classical problem in distributed computing:

•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)

•Decision done iteratively & distributed using peer-to-peer communication 2nd order adjustment rule

(21)

Optimal Solution

Optimal Control and Attacker Policies number of peers

Bellman Equation

Under appropriate regularity assumptions (smoothness)

ui* is optimal (minimal) for us

vi* is optimal (maximal) for attacker

Moreover,

• Consensus will be reached asymptotically

• All variables will remain bounded through the transient (in fact, Lyapunov stability) Theoretical results derived for a continuous-time approximation of the algorithms, more suitable for the asymptotic analysis

(22)

Optimal Solution

Optimal Control and Attacker Policies number of peers

Bellman Equation

Under appropriate regularity assumptions (smoothness)

ui* is optimal (minimal) for us

vi* is optimal (maximal) for attacker

Moreover,

• Consensus will be reached asymptotically

• All variables will remain bounded through the transient (in fact, Lyapunov stability) Theoretical results derived for a continuous-time approximation of the algorithms, more suitable for the asymptotic analysis

But…

Bellman equation difficult to solve (curse of dimensionality)

Last year:

•Machine learning based approach to solve this distributed consensus problem

•Restricted to second-order updates (double integrator)

•Global knowledge of the communication graph was required

•Global knowledge of the update rules used by each agent required This year’s work overcomes these 3 limitations…

(23)

Focus: Distributed Consensus/Agreement

Goal: minimize errors between values of agents and their neighbors

Attacker: maximize errors using stealth attacks (small vi)

peers of agent i

(self-included)

Classical problem in distributed computing:

•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)

•Decision done iteratively & distributed using peer-to-peer communication

value at processor i, iteration k+1

correct update

malicious update by attacker

(24)

Focus: Distributed Consensus/Agreement

Nash equilibrium formulation:

error

min. by us max. by attacker

our updates

(small means smooth)

min. by us max. by attacker

attacker updates (small means stealth)

max. by us min. by attacker

Classical problem in distributed computing:

•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)

•Decision done iteratively & distributed using peer-to-peer communication

value at processor i, iteration k+1

correct update

malicious update by attacker

(25)

Focus: Distributed Consensus/Agreement

Nash equilibrium formulation:

error

min. by us max. by attacker

our updates

(small means smooth)

min. by us max. by attacker

attacker updates (small means stealth)

max. by us min. by attacker

Classical problem in distributed computing:

•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)

•Decision done iteratively & distributed using peer-to-peer communication

value at processor i, iteration k+1

correct update

malicious update by attacker

General update rule: Challenges:

• Each agent does not necessarily know the update algorithm used by the other agents (Ai,Bi,Di)

• Each agent does not necessarily know the global graph (just its set of neighbors

N

i )

(26)

Q-learning

We shall use Q-Learning (Watkins, 1989) (a popular method from machine learning)

actions evaluations Generator Tester states or situations of each agent Q-learning:

Model-free machine learning method to learn an action-utility or Q-function, giving the expected utility of taking a given action in a given state.

The learned Q-function directly approximates, the optimal action-value function, independent of the policy being followed.

(27)

Q-function

are the unique symmetric positive definite matrices that solve the game

Q-function:

Each Q-function is quadratic

Instead of the Q-tables (complexity problems) used by Watkins 1989, we shall instead

use appropriate neural networks. But encoding the states and actions (Q-function) properly will be challenging.

Optimal Q-function:

(28)

Actor/Critic Learning Approach

Critic = Model–free (distributed) algorithm to evaluate the current algorithm & estimate attacker actions

Actor = Model–free (distributed) algorithm to enact optimal decisions (based on critic’s findings)

Actor & Critic based on Approximate Dynamic Programming

Neural Network approximation of Q-function (action dependent) & optimal control laws

Explicit representation of policy and value function

Minimal computation to select actions

Can learn an explicit policy Can put constraints on policies Appealing as psychological and neural models

(29)

Learning Structure

Critic Neural Networks to approximate the costs

unknown weights to be learned

basis sets, with local state and control information

Actor Neural networks to approximate the control and adversarial inputs

Bellman equations in integral form

Sampling interval

(30)

Unknown second order dynamics and unknown

graph structure, cost weights and leader information

Learning under attacks key result

Theorem :

Assuming

• The signals are persistently exciting. • The graph is strongly connected.

Then

• The equilibrium points of the closed-loop signals are

asymptotically stable

• The policies converge to a Nash equilibrium

Proposed algorithm. Synchronization of all the

agents is achieved even under attacks and unknown network and agent information.

(31)

Outline...

Large matrix games (summary of results)

Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning

(32)

Challenges to real-time cyber-mission protection:

• cyber assets shared among missions

• cyber asset requirements change over time

• missions can use different configurations of resources • complex network of cyber-asset dependencies

Cyber Missions Complexity

services S0, S2 service S6 service S9 service S8 services S0, S1 services S0, S1 service S2

• Attack on service S0 can result in multiple mission failure

• But, damage only realized if missions follows particular paths

services S5, S6, S7

Cyber Awareness Questions:

• When & where is an attacker most likely to strike?

• When & where is an attacker most damaging to mission completion? • How will the answer depend on

attacker resources? attacker skills? attacker knowledge?

(real-time what-if analysis)

services S5, S6

Mission 1

(33)

Mapping Service Attacks to Mission Damage

• Mission requires multiple services

• Mission reliance on services varies with time Damage equation:

(for service s at time t) attack

resources Potential

damage

Uncertainty equation:

(for service s at time t)

attack resources probability of

realizing damage

equation parameters vary with time as mission progresses

(learned from data in iCTF exercises)

total damage to mission

total attack

resources at time t

maximize constrained by:

Optimal attacks:

In this period: Developed optimization engine that can address 1000’s of variables/constraints in a few milliseconds.

(34)

Multi-Resolution Visualization

Multi-resolution attack analysis

1. High-level attack predictions based on online optimization

2. Potential damage & uncertainty associated with attacks to different services

3. Parameters that determine damage and uncertainty

AlloSphere Integration

• High-level predictions permit fast action • Low-level parameters permits investigating

(35)

Summary of Accomplishments (Y5)

A new notion of observability for systems under attacks.

• A necessary and sufficient condition for a dynamical system to be M-attack observable.

• Two estimation algorithms

 Gramian-based estimator (finite-time convergence)

 Observer-based estimator (asymptotic convergence)

Shielding complex networks from adversarial attacks using Q-learning

Developed Model-Free Dynamic Programming-based solution to obtain resilient algorithms with attack estimates

• Used online Q-learning approach to overcome complexity issues in complex networks

• There is no need for the physical models, nor the network interactions

• Applications to critical infrastructures, e.g. power systems Online optimization for real-time attack prediction

• Develop numerical algorithms for fast (real-time) attack prediction

• Integration with visualization tools developed (more on Tobias Hollerer’s presentation) Focus for future work

Develop new self-configuring event-triggered algorithms for complex networks for decision and control given the presence of jamming network attacks.

Transition to practice including Mixed Human, Manned and Unmanned Teams and Large Networks of Off-Grid Power systems.

(36)

Publications

K. Vamvoudakis, J. Hespanha, B. Sinopoli, Y. Mo. Detection in Adversarial Environments. IEEE Trans. on Automatic Control,

Special Issue on the Control of Cyber-Physical Systems, February 2014. To appear.

K. G. Vamvoudakis, M. F. Miranda, J. P. Hespanha. Asymptotically-Stable Optimal Adaptive Control Algorithm with

Saturating Actuators and Relaxed Persistence of Excitation. IEEE Transactions on Neural Networks and Learning Systems, July 2014. conditionally accepted.

Kenji Hirata, J. P. Hespanha, Kenko Uchida. Real-time Pricing Leading to Optimal Operation under Distributed Decision Makings. In Proc. of the 2014 Amer. Contr. Conf., June 2014.

Daniel Silvestre, Paulo Rosa, J. Hespanha, C. Silvestre. Finite-time Average Consensus in a Byzantine Environment Using Set-Valued Observers. In Proc. of the 2014 Amer. Contr. Conf., June 2014.

K. G. Vamvoudakis, J. P. Hespanha. Online Optimal Switching of Single Phase DC/AC Inverters Using Partial Information. In

Proc. American Control Conference, pp. 2624-2630, Portland, OR, 2014.

Published last year or in press

K. Vamvoudakis , J. Hespanha, Game-Theory based Consensus Learning of Double-Integrator Agents in the Presence of Attackers. Sep 2014. Submitted to journal publication.

K. G. Vamvoudakis, P. J. Antsaklis, W. E. Dixon, J. P. Hespanha, F. L. Lewis, H. Modares, B. Kiumarsi. Autonomy and

Machine Intelligence in Complex Systems: A Tutorial. submitted to American Control Conference, Chicago, IL, 2015. (invited paper)

M. Chong, M. Wakaiki, J. Hespanha. Observability of Linear Systems under attacks. September 2014. Submitted to ACC 2015.

Submitted

Working papers

K. G. Vamvoudakis, J. P. Hespanha. Cooperative Q-learning for Rejection of Persistent Adversarial Inputs in Complex Networks. in preparation for Journal of Machine Learning Research, 2014.

References

Related documents

In the classifi cation mode, the input consists of a set of primitive parts, the connections between them, and the multi-level hierarchy provided by the learning phase.. The

The argument for a direct link between human resource practices and employee performance in R&amp;D team settings hinges on the idea that high-involvement

We will show that the simple polarized incoherent phase observed in Ref.(25) can be captured by perturbation theory and show that its excited states can be reproduced by

4 AUPRs (blue circles) and running times (orange triangles) of GENIE3, when varying the values of the parameters K (number of randomly chosen candidate regulators at each split node

To prevent these types of attacks, core developers of Bitcoin have been relatively successful in the guidance they provide through the Bitcoin Foundations to users through

Feedback loop diagram also describes how p roject risks are interacted to each ot her , which is one of t he most distinguishing feat ures of System Dynamics app roach.. For example

made up the other 600 students and the undergraduate programs included the traditional BSN, RN to BSN, accelerated, and LPN to BSN. At the time of this study, there were 468 students