Detection and Mitigation of
Cyber-Attacks Using Game Theory
and Learning
João P.
Hespanha
Kyriakos G.
Vamvoudakis
Mission Cyber-Assets
C
O
As
Mission Model Cyber-Assets
Model Sensor Alerts C orr el at ion Engi ne
Analysis to get up-to-date view of cyber-assets
Analysis to determine dependencies between assets and missions
Impact Analysis
Create semantically-rich view of cyber-mission status Simulation/Live Security Exercises
Predict Future Actions
Analyze and Characterize Attackers
D at a D at a Data D at a D at a Observations: Netflow, Probing, Time analysis
Outline...
Large matrix games (summary of results)
Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning
Outline...
Prandini-Milano Prandini Poly. Milano
Bopardikar-UTRC Large matrix games (summary of results)
Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning
chat software web proxy intrusion detection
system
sequence of intruder actions that compromise database server, not
detected by IDS
Problem statistics of iCTF 2010
over 7800 distinct mission states (defender observations) over 2500 distinct observations available to the attacker defender can choose among about 102527 distinct policies
attacker can choose among 10756 – 102616 distinct policies, depending on attacker's level of expertise
attack graph from [J. Wing 2007]
intruder’s target!
Even “trivially small” network security games can lead to games
with very large decision trees
chat software web proxy intrusion detection
system
sequence of intruder actions that compromise database server, not
detected by IDS
Problem statistics of iCTF 2010
over 7800 distinct mission states (defender observations) over 2500 distinct observations available to the attacker defender can choose among about 102527 distinct policies
attacker can choose among 10756 – 102616 distinct policies, depending on attacker's level of expertise
attack graph from [J. Wing 2007]
intruder’s target!
Even “trivially small” network security games can lead to games
with very large decision trees
Developed sample-based approach to solving zero-sum games
Approach provides probabilistic guarantees on the performance of the policies (in terms of security levels)
Results applicable to very general classes of games that can include stochastic actions, partial information, etc.
services S3, S7 services S4, S5 services S0, S2 service S6 service S1 services S6, S2 service S9 service S8 services S0, S1 services S3, S9 service S2 service S0 service S1 services S3, S8 services S2, S3
Litya receives avg. 314 units for completion of all 4 missions
Level of attacker sophistication
# units received by Litya for 1 round of missions
[Option I, no bribes]
# units received by Litya for 1 round of missions [Option I, with bribes]
# units received by Litya for 1 round of missions [Option II, with bribes] no service vulnerable
(baseline) 314 314 314
S2
(vulnerable to 38 teams) 240 240 138
S2, S6, S9
(vulnerable to at least 6 team) 79 79 43
S0, S2, S4, S6, S7, S8, S9
(vulnerable to at least 1 team) 11 -738 -1327
all services vulnerable 11 -848 -1917
In crea sin g le vel of at ta ck er sop hi sti cat io n We were able to
Provide Cyber-security office estimates of mission success Take into account the effect of attacks & counter measures Response can be a function of attacker sophistication
Play what-if scenarios (vulnerabilities, information, etc.)
Outline...
Sinopoli-CMU Y. Mo-Caltech
Large matrix games (summary of results)
Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning
Detection in Adversarial Environments
How to interpret & access the reliability of “sensors” that have been manipulated? “Sensors” relevant to cyber missions?
•Measurement sensors (e.g., SCADA systems)
•Computational sensors (e.g., weather forecasting simulation engines) •Data retrieval sensors (e.g., database queries)
•Cyber-security sensors (e.g., IDSs)
Domains
• Deterministic sensors:
with n sensors, one can get correct answer as long as m < n/2 sensors have been manipulated
• Stochastic sensors without manipulation:
solution given by hypothesis testing/estimation
• Stochastic sensors with potential manipulation: open problem?
Problem formulation
X – binary random variable to be estimated
for simplicity
(papers treats general case) Y1, Y2, …, Yn – “noisy” measurements of X produced by n sensors
per-sensor error probability (not necessarily very small)
Z1, Z2, …, Zn – measurements actually reported by the n sensors
at most m sensors attacked
pattack – probability that we are under attack (very hard to know!)
Result for “small” # of sensors (n<2/p
err)
Theorem:
The optimal estimator is
go with the majority of the (potentially manipulated) sensor readings
go with the majority, EXCEPT if there is consensus
The optimal estimator is largely independent of pattack (hard to know)
X – binary random variable to be estimated
Y1, Y2, …, Yn – “noisy” measurements of X produced by n sensors Z1, Z2, …, Zn – measurements actually reported by the n sensors
at most m sensors attacked
Result for “small” # of sensors (n<2/p
err)
Theorem:
The optimal estimator is
go with the majority of the (potentially manipulated) sensor readings
go with the majority, EXCEPT if there is consensus
The optimal estimator is largely independent of pattack (hard to know)
X – binary random variable to be estimated
Y1, Y2, …, Yn – “noisy” measurements of X produced by n sensors Z1, Z2, …, Zn – measurements actually reported by the n sensors
at most m sensors attacked
pattack – probability that we are under attack (very hard to know!)
This year’s work…
Can we extend this to the estimation of time-varying variables: e.g., the state of a mission!
Estimation in Adversarial Environments
How to interpret & access the reliability of “sensors” that have been manipulated? “Sensors” relevant to cyber missions?
•Measurement sensors (e.g., SCADA systems)
•Computational sensors (e.g., weather forecasting simulation engines) •Data retrieval sensors (e.g., database queries)
•Cyber-security sensors (e.g., IDSs)
X – constant binary random variable to be estimated Previously…
X(t) – time-varying state variable to be estimated, based on 1.sensor measurements that may have been manipulated 2.system dynamics
E.g., the state of a cyber mission Now…
Problem formulation
Dynamics can also be formulated as a
discrete-event system using the
Ramadge-Wonham supervisory control framework dynamical evolution of
systems’s state
N measurements
produced by sensor control signals
Under what conditions can one reconstruct the state from (potentially corrupted) sensor measurements? at most M sensors can be manipulated by the attackers
N measurements
Problem formulation
Theorem:
Exact state reconstruction is possible if and only if
system is observable through every subset of N - 2M measurements state could be reconstructed through only N - 2M
measurements in the absence of attacks
potential attack at M sensors, effectively “disables” 2M sensors dynamical evolution of
systems’s state
N measurements
produced by sensor control signals
at most M sensors can be manipulated by the attackers
N measurements
reported by sensor
Estimation algorithms
Algorithm outline:
1. Build an estimate removing by ignoring a set S of M sensors
2. Build additional estimates by removing, in addition, all combinations of M additional sensors
3. If all attacked sensors were in set S, then the estimates in steps 1. and 2. will be consistent (modulo noise)
Gramian-based estimator:
• batch, finite-time estimation • inversion of the observability
matrix at each time step
Observer-based estimator:
• asymptotic estimation
• recursive low-computation algorithm • provably robust with respect to noise on
all sensors
(including non attacked ones)
(all estimates can be constructed without combinatorial complexity, by using finite dimensionality)
Outline...
Large matrix games (summary of results)
Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning
Resilient Cyber-Mission Architectures
In complex cyber missions,
• human operators define policies and rules
• computing elements automate processes of distributed resource allocation, scheduling, inventory management, etc.
• self-configuration: automatic configuration of components • self-healing: automatic discovery and correction of faults
• self-optimization: automatic allocation of resources for optimal operation
What is the impact of attacks on this type of automated/optimization process? Can we devise algorithms with built-in attack prediction/awareness capabilities?
Focus: Distributed Consensus/Agreement
Goal: minimize errors between values of agents and their neighbors
Attacker: maximize errors using stealth attacks (small vi)
peers of agent i
(self-included)
Classical problem in distributed computing:
•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)
•Decision done iteratively & distributed using peer-to-peer communication
value at processor i, iteration k
adjustment on xi by processor i, at iteration k
correct update on adjustment
update on adjustment by attacker
Focus: Distributed Consensus/Agreement
value at processor i, iteration k
adjustment on xi by processor i, at iteration k
correct update on adjustment
update on adjustment by attacker
Nash equilibrium formulation:
error
min. by us max. by attacker
our updates
(small means smooth)
min. by us max. by attacker
attacker updates (small means stealth)
max. by us min. by attacker
Classical problem in distributed computing:
•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)
•Decision done iteratively & distributed using peer-to-peer communication 2nd order adjustment rule
Optimal Solution
Optimal Control and Attacker Policies number of peers
Bellman Equation
Under appropriate regularity assumptions (smoothness)
ui* is optimal (minimal) for us
vi* is optimal (maximal) for attacker
Moreover,
• Consensus will be reached asymptotically
• All variables will remain bounded through the transient (in fact, Lyapunov stability) Theoretical results derived for a continuous-time approximation of the algorithms, more suitable for the asymptotic analysis
Optimal Solution
Optimal Control and Attacker Policies number of peers
Bellman Equation
Under appropriate regularity assumptions (smoothness)
ui* is optimal (minimal) for us
vi* is optimal (maximal) for attacker
Moreover,
• Consensus will be reached asymptotically
• All variables will remain bounded through the transient (in fact, Lyapunov stability) Theoretical results derived for a continuous-time approximation of the algorithms, more suitable for the asymptotic analysis
But…
Bellman equation difficult to solve (curse of dimensionality)
Last year:
•Machine learning based approach to solve this distributed consensus problem
•Restricted to second-order updates (double integrator)
•Global knowledge of the communication graph was required
•Global knowledge of the update rules used by each agent required This year’s work overcomes these 3 limitations…
Focus: Distributed Consensus/Agreement
Goal: minimize errors between values of agents and their neighbors
Attacker: maximize errors using stealth attacks (small vi)
peers of agent i
(self-included)
Classical problem in distributed computing:
•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)
•Decision done iteratively & distributed using peer-to-peer communication
value at processor i, iteration k+1
correct update
malicious update by attacker
Focus: Distributed Consensus/Agreement
Nash equilibrium formulation:
error
min. by us max. by attacker
our updates
(small means smooth)
min. by us max. by attacker
attacker updates (small means stealth)
max. by us min. by attacker
Classical problem in distributed computing:
•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)
•Decision done iteratively & distributed using peer-to-peer communication
value at processor i, iteration k+1
correct update
malicious update by attacker
Focus: Distributed Consensus/Agreement
Nash equilibrium formulation:
error
min. by us max. by attacker
our updates
(small means smooth)
min. by us max. by attacker
attacker updates (small means stealth)
max. by us min. by attacker
Classical problem in distributed computing:
•A group of computing elements must agree on a common scalar value x (e.g., priority, resources allocated, inventory decision, database value)
•Decision done iteratively & distributed using peer-to-peer communication
value at processor i, iteration k+1
correct update
malicious update by attacker
General update rule: Challenges:
• Each agent does not necessarily know the update algorithm used by the other agents (Ai,Bi,Di)
• Each agent does not necessarily know the global graph (just its set of neighbors
N
i )Q-learning
We shall use Q-Learning (Watkins, 1989) (a popular method from machine learning)
actions evaluations Generator Tester states or situations of each agent Q-learning:
Model-free machine learning method to learn an action-utility or Q-function, giving the expected utility of taking a given action in a given state.
The learned Q-function directly approximates, the optimal action-value function, independent of the policy being followed.
Q-function
are the unique symmetric positive definite matrices that solve the game
Q-function:
Each Q-function is quadratic
Instead of the Q-tables (complexity problems) used by Watkins 1989, we shall instead
use appropriate neural networks. But encoding the states and actions (Q-function) properly will be challenging.
Optimal Q-function:
Actor/Critic Learning Approach
Critic = Model–free (distributed) algorithm to evaluate the current algorithm & estimate attacker actions
Actor = Model–free (distributed) algorithm to enact optimal decisions (based on critic’s findings)
Actor & Critic based on Approximate Dynamic Programming
Neural Network approximation of Q-function (action dependent) & optimal control laws
Explicit representation of policy and value function
Minimal computation to select actions
Can learn an explicit policy Can put constraints on policies Appealing as psychological and neural models
Learning Structure
Critic Neural Networks to approximate the costs
unknown weights to be learned
basis sets, with local state and control information
Actor Neural networks to approximate the control and adversarial inputs
Bellman equations in integral form
Sampling interval
Unknown second order dynamics and unknown
graph structure, cost weights and leader information
Learning under attacks key result
Theorem :
Assuming
• The signals are persistently exciting. • The graph is strongly connected.
Then
• The equilibrium points of the closed-loop signals are
asymptotically stable
• The policies converge to a Nash equilibrium
Proposed algorithm. Synchronization of all the
agents is achieved even under attacks and unknown network and agent information.
Outline...
Large matrix games (summary of results)
Observability of dynamical systems under attacks to sensors Multi-agent learning under cyber-attack using Q-learning
Challenges to real-time cyber-mission protection:
• cyber assets shared among missions
• cyber asset requirements change over time
• missions can use different configurations of resources • complex network of cyber-asset dependencies
Cyber Missions Complexity
services S0, S2 service S6 service S9 service S8 services S0, S1 services S0, S1 service S2
• Attack on service S0 can result in multiple mission failure
• But, damage only realized if missions follows particular paths
services S5, S6, S7
Cyber Awareness Questions:
• When & where is an attacker most likely to strike?
• When & where is an attacker most damaging to mission completion? • How will the answer depend on
attacker resources? attacker skills? attacker knowledge?
(real-time what-if analysis)
services S5, S6
Mission 1
Mapping Service Attacks to Mission Damage
• Mission requires multiple services• Mission reliance on services varies with time Damage equation:
(for service s at time t) attack
resources Potential
damage
Uncertainty equation:
(for service s at time t)
attack resources probability of
realizing damage
equation parameters vary with time as mission progresses
(learned from data in iCTF exercises)
total damage to mission
total attack
resources at time t
maximize constrained by:
Optimal attacks:
In this period: Developed optimization engine that can address 1000’s of variables/constraints in a few milliseconds.
Multi-Resolution Visualization
Multi-resolution attack analysis
1. High-level attack predictions based on online optimization
2. Potential damage & uncertainty associated with attacks to different services
3. Parameters that determine damage and uncertainty
AlloSphere Integration
• High-level predictions permit fast action • Low-level parameters permits investigating
Summary of Accomplishments (Y5)
A new notion of observability for systems under attacks.
• A necessary and sufficient condition for a dynamical system to be M-attack observable.
• Two estimation algorithms
Gramian-based estimator (finite-time convergence)
Observer-based estimator (asymptotic convergence)
Shielding complex networks from adversarial attacks using Q-learning
• Developed Model-Free Dynamic Programming-based solution to obtain resilient algorithms with attack estimates
• Used online Q-learning approach to overcome complexity issues in complex networks
• There is no need for the physical models, nor the network interactions
• Applications to critical infrastructures, e.g. power systems Online optimization for real-time attack prediction
• Develop numerical algorithms for fast (real-time) attack prediction
• Integration with visualization tools developed (more on Tobias Hollerer’s presentation) Focus for future work
Develop new self-configuring event-triggered algorithms for complex networks for decision and control given the presence of jamming network attacks.
Transition to practice including Mixed Human, Manned and Unmanned Teams and Large Networks of Off-Grid Power systems.
Publications
K. Vamvoudakis, J. Hespanha, B. Sinopoli, Y. Mo. Detection in Adversarial Environments. IEEE Trans. on Automatic Control,
Special Issue on the Control of Cyber-Physical Systems, February 2014. To appear.
K. G. Vamvoudakis, M. F. Miranda, J. P. Hespanha. Asymptotically-Stable Optimal Adaptive Control Algorithm with
Saturating Actuators and Relaxed Persistence of Excitation. IEEE Transactions on Neural Networks and Learning Systems, July 2014. conditionally accepted.
Kenji Hirata, J. P. Hespanha, Kenko Uchida. Real-time Pricing Leading to Optimal Operation under Distributed Decision Makings. In Proc. of the 2014 Amer. Contr. Conf., June 2014.
Daniel Silvestre, Paulo Rosa, J. Hespanha, C. Silvestre. Finite-time Average Consensus in a Byzantine Environment Using Set-Valued Observers. In Proc. of the 2014 Amer. Contr. Conf., June 2014.
K. G. Vamvoudakis, J. P. Hespanha. Online Optimal Switching of Single Phase DC/AC Inverters Using Partial Information. In
Proc. American Control Conference, pp. 2624-2630, Portland, OR, 2014.
Published last year or in press
K. Vamvoudakis , J. Hespanha, Game-Theory based Consensus Learning of Double-Integrator Agents in the Presence of Attackers. Sep 2014. Submitted to journal publication.
K. G. Vamvoudakis, P. J. Antsaklis, W. E. Dixon, J. P. Hespanha, F. L. Lewis, H. Modares, B. Kiumarsi. Autonomy and
Machine Intelligence in Complex Systems: A Tutorial. submitted to American Control Conference, Chicago, IL, 2015. (invited paper)
M. Chong, M. Wakaiki, J. Hespanha. Observability of Linear Systems under attacks. September 2014. Submitted to ACC 2015.
Submitted
Working papers
K. G. Vamvoudakis, J. P. Hespanha. Cooperative Q-learning for Rejection of Persistent Adversarial Inputs in Complex Networks. in preparation for Journal of Machine Learning Research, 2014.