DS-Bench/D-Cloud:
Dependability Measurement and Evaluation Tool
Yutaka Ishikawa†, Mitsuhisa Sato‡, Toshihiro Hanawa‡, Hajime Fujita†, Takayuki Banzai‡, Hitoshi Koizumi‡, and Shin’ichi Miura‡
†University of Tokyo and ‡University of Tsukuba
DS-Bench/D-Cloud
• A tool to execute dependability benchmarks
– measuring availability, reliability, performance, and power consumption under an anomaly situation
• Benchmark targets
– Whole system
– Subsystems
• Network driver
• Server processes
• The results of benchmarks are used as the evidence of dependability required by D-Case in the design and test phases
[Figure: DEOS process (change accommodating cycle, failure reacting cycle, D-Case growth cycle) and DEOS framework tools: D-Visor, D-Box, D-Logger, D-Analyzer, D-System Monitor, D-Case Editor, D-Effector, Type/Model Checker, DS-Bench/D-Cloud, D-Case Walker]
[Figure: D-Case bottom structure, in which each Sub-Goal is supported by Evidence]
D-Case with DS-Bench/D-Cloud
• Change: content sizes increase. The stakeholder and the developer decide to increase the network bandwidth, and one more network link is added.
• Goal: Web server is dependable
– Hardware components are argued
– High availability and performance are argued (HA server is used)
• Sub-goals
– The network link offers at least 800 Mbps
– The network link offers at least 1.6 Gbps under the safe condition
– The server must handle 1 million requests per second
– When the system goes down, it must recover within 1 millisecond
• Evidence
– DS-Bench/D-Cloud reports 920 Mbps
– DS-Bench/D-Cloud reports 1.8 Gbps in the normal state and 850 Mbps when one network link fails
– DS-Bench/D-Cloud reports 1.5 million requests/sec using the SpecWEB benchmark
– DS-Bench/D-Cloud reports 0.5 millisecond recovery time when faults happen
[Figure: client, web server, and stand-by server connected by heartbeat, shown in the normal state and the faulty state]
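The relation between a sub-goal and its evidence amounts to a threshold comparison. The following is a minimal sketch of that check, not part of DS-Bench/D-Cloud itself. The numbers are taken from the slides; the names `CLAIMS` and `holds`, and the pairing of each goal with a measurement, are assumptions made for illustration.

```python
# Thresholds and measured values copied from the D-Case slides.
# Pairing each goal with its evidence is an assumption based on matching units.
CLAIMS = [
    # (goal, kind, threshold, measured)
    ("link bandwidth, normal state (Mbps)", "min", 1600, 1800),
    ("link bandwidth, one link failed (Mbps)", "min", 800, 850),
    ("server throughput (requests/s)", "min", 1_000_000, 1_500_000),
    ("recovery time (ms)", "max", 1.0, 0.5),
]


def holds(kind, threshold, measured):
    """Return True when the measurement satisfies the goal."""
    if kind == "min":
        return measured >= threshold
    return measured <= threshold


RESULTS = {goal: holds(kind, thr, val) for goal, kind, thr, val in CLAIMS}

for goal, ok in RESULTS.items():
    print(f"{goal}: {'supported' if ok else 'NOT supported'}")
```

A "min" goal is a lower bound such as bandwidth, and a "max" goal is an upper bound such as recovery time.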
An Overview of DS-Bench/D-Cloud
• Anomaly situations
– hardware faults, software bugs, human errors
• Two Measurement Environments
– Target Hardware Specific Environment
– Cloud Environment
• Virtual Machines
• Key Features
– Benchmark Database
– Customization of Anomaly Scenario
– Benchmark Cloud
An Overview of DS-Bench/D-Cloud
• Key Features
– Benchmark Database
– Customization of Anomaly Scenario
– Benchmark Cloud
• Benchmark Database
– Performance Benchmarks (Workload), Measurement Tools, Anomaly Generators, Anomaly Scripts
– Results of dependability benchmarks
– The system maintainer can confirm whether the behavior observed at the operation phase has been reported during some previous anomaly situation.
An Overview of DS-Bench/D-Cloud
• Key Features
– Benchmark Database
– Customization of Anomaly Scenario
– Benchmark Cloud
• Customization of Anomaly Scenario
– An anomaly is constructed with anomaly generators and an anomaly script written in XML.
– An anomaly script gives instructions on how to use anomaly generators.
– A new anomaly script for a new application field is described using anomaly generators.
– Anomaly scripts may also be used and modified by other users for other target environments.
An Overview of DS-Bench/D-Cloud
• Key Features
– Benchmark Database
– Customization of Anomaly Scenario
– Benchmark Cloud
• Benchmark Cloud
– Considerable computing resources are required in order to test many dependability benchmarks.
– Some benchmarks may run on any computer that provides a virtual machine environment in which a hardware simulator can operate.
– A benchmark cloud is a benchmark execution environment which uses general purpose computer resources, in which virtual machines run, and which can model application-specific hardware.
– Dependability benchmarks run in parallel on the benchmark cloud, and thus the benchmark cloud contributes to reducing the execution time of dependability benchmarks.
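The parallel execution described for the benchmark cloud can be pictured as dispatching independent benchmark jobs to a pool of workers. This is only a sketch of the idea using Python's standard thread pool. The function `run_benchmark` is a hypothetical placeholder and does not reflect how D-Cloud actually schedules virtual machines.

```python
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(name):
    # Hypothetical placeholder: a real runner would start the benchmark
    # on a virtual machine and wait for it to complete.
    return name, "finished"


BENCHMARKS = ["iperf", "Bonnie++", "Hackbench", "LMBench"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order even though the calls may overlap.
    OUTCOMES = dict(pool.map(run_benchmark, BENCHMARKS))

print(OUTCOMES)
```

With one worker per job, the total time approaches that of the slowest benchmark rather than the sum of all of them.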
Limitations and Intentions
• The DS-Bench/D-Cloud is only effective for anticipated anomaly conditions.
– It does not reveal weaknesses of the system in unexpected anomaly conditions.
– It does not find the cause of such conditions.
• When an unknown cause of a new anomaly situation is faced,
1. Finding the cause
2. Developing a program to generate the anomaly, i.e. anomaly load.
3. Registering the program in the DS-Bench/D-Cloud database
• The range of testing for dependability in the target system is increased.
• Dependability in other systems is also increased using this new benchmark.
[Figure: DEOS process and DEOS framework tools, as on the earlier DS-Bench/D-Cloud slide]
The Rest of the Talk
• Details of DS-Bench
• Details of D-Cloud
Details of DS-Bench: Basic Components
Basic Workflow of DS-Bench
1. Edit benchmark configurations
2. Execute the benchmark
3. Examine the benchmark result
4. Compare the result with previously recorded results
[Figure: basic components of DS-Bench]
• Tester: uses the web interface
• DS-Bench Controller & Frontend: exchanges XML (benchmark scenario & anomaly script, benchmark results)
• Target Hardware Specific Environment
– Target machines
– Network switch (network port is controlled via SNMP)
– PDU (power is controlled via HTTP, SNMP)
• Benchmark Database: anomaly generator tools, measurement tools, performance benchmarks, anomaly scripts, benchmark results
DS-Bench: Usage Example
• Providing evidence for D-Case
– Anomaly Load: LinkRefuse (shutting off a network link)
– Benchmark Program: iperf (network bandwidth measurement)
– Target Machines: RI2N
• D-Case Diagram
– Gn (Sub Goal): The network link offers at least 1.6Gbps under normal situation and 800Mbps under one link failure
– En (Evidence): DS-Bench result: iperf achieved around 1.8Gbps under normal situation and around 850Mbps under one link failure
• Benchmark Result = Evidence
2010/11/4 Version
Screenshot of DS-Bench
[Screenshot: time-line of the benchmark]
• Upper row – Benchmark programs
• Lower row – Anomaly loads
• A border shows the current time
Retrieving Data from Benchmark Outputs
• Usually the benchmark output comes with pre-formatted text tables
– In DS-Bench, they are automatically converted to machine-readable values, by user-provided text cutting rule

Benchmark output (iperf):
------------------------------------------------------------
Client connecting to 10.0.39.14, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 10.0.39.13 port 50590 connected with 10.0.39.14 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   205 MBytes  1718 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  1.0- 2.0 sec   218 MBytes  1828 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  2.0- 3.0 sec   220 MBytes  1847 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  3.0- 4.0 sec   222 MBytes  1866 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  4.0- 5.0 sec   220 MBytes  1844 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  5.0- 6.0 sec   222 MBytes  1859 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  6.0- 7.0 sec  88.5 MBytes   742 Mbits/sec

Text Matching Rule Per Each Benchmark Program:
<caption>iperf</caption>
<table>
  <header>
    \[\s*ID\]\s*Interval\s*Transfer\s*Bandwidth
  </header>
  <rheader>
    ID,Interval,Transfer,Bandwidth,
  </rheader>
  <data>
    (\d+).\s*(.* sec)\s*(.* [MGK]Bytes)\s*(.* .*)
  </data>
  <begin>
    \[\s*ID\]\s*Interval\s*Transfer\s*Bandwidth
  </begin>
  <key>0111</key>
  <value>0111</value>
</table>
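The effect of such a text cutting rule can be reproduced with ordinary regular expressions. The sketch below is not DS-Bench code. It applies the rule's `<begin>` and `<data>` patterns, written with backslash escapes, to a few rows of the iperf output. The function name `cut_table` is hypothetical, and the `<key>` and `<value>` masks are ignored because the slides do not explain their meaning.

```python
import re

# Three interval rows taken from the iperf output on the slide.
IPERF_OUTPUT = """\
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   205 MBytes  1718 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  1.0- 2.0 sec   218 MBytes  1828 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  3]  6.0- 7.0 sec  88.5 MBytes   742 Mbits/sec
"""

# Patterns copied from the <begin> and <data> elements of the rule.
BEGIN = re.compile(r"\[\s*ID\]\s*Interval\s*Transfer\s*Bandwidth")
DATA = re.compile(r"(\d+).\s*(.* sec)\s*(.* [MGK]Bytes)\s*(.* .*)")
COLUMNS = ["ID", "Interval", "Transfer", "Bandwidth"]  # from <rheader>


def cut_table(text):
    """Return one dict per data row found after the first header line."""
    rows = []
    started = False
    for line in text.splitlines():
        if BEGIN.search(line):
            started = True  # table body starts after the header
            continue
        if not started:
            continue
        match = DATA.search(line)
        if match:
            values = [group.strip() for group in match.groups()]
            rows.append(dict(zip(COLUMNS, values)))
    return rows


ROWS = cut_table(IPERF_OUTPUT)
for row in ROWS:
    print(row)
```

Each row becomes a dictionary keyed by the column names, so the bandwidth drop in the last interval is easy to spot.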
DS-Bench: Supported Benchmarks and Anomaly Loads
• While benchmark programs and anomaly loads can be added to the framework by users, the following programs are supported by default.

Name          Description
Bonnie++      Disk I/O benchmark
cpustress     A program just consuming a lot of CPU time (*)
Hackbench     System benchmark by generating a lot of processes simultaneously
IMB           Intel MPI Benchmark
iperf         Network bandwidth benchmark
LinkRefuse    Shutting off ports on a network switch (*)
LMBench       System performance benchmark
memstress     A program just allocating a lot of memory (*)
MPD           MPI daemon
NetCMD        Adjusting network output bandwidth, injecting packet loss and packet reordering (*)
NPB           NAS Parallel Benchmark
SupplyRefuse  Shutting down power distribution to a target machine (*)
terminator    A program to unconditionally kill the specified process (*)

(*) … Mainly used as anomaly loads
DETAILS OF D-CLOUD
Details of D-Cloud: Operation Flow of D-Cloud
[Figure: operation flow among the tester, the D-Cloud controller, the Eucalyptus controller, and the VM nodes]
1. Describe scenario
2. Transfer scenario
3. Set up guest OS
4. Distribute VM images
5. Start up guest OS
6. Transfer files for configuration, input data, program, etc.
7. Build up environment on guest OS
8. Test, inject faults, create snapshot along scenario
9. Transfer of log, snapshot, output data
10. Complete guest OS
11. Obtain log, snapshot, etc.
Testing Process Using D-Cloud
• The user describes the configuration file in XML.
• Contents of configuration file
– Definitions
• Machine definition: Hardware specification
• System definition: Software, libraries, …
• Injection definition: Fault type, duration of fault
– Scenario for test
• Assignment of machine instance
• Designation of execution host
• Command for execution, fault injection
• Timing specification

Description of configuration file
(Arrows labeled "Reference" in the slide link the definitions.)
<jobDescription>
  <machineDefinition>
    Machine description
  </machineDefinition>
  <systemDefinition>
    System description
  </systemDefinition>
  <injectionDefinition>
    Fault injection description
  </injectionDefinition>
  <testDescription>
    Scenario for test
  </testDescription>
</jobDescription>
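The four-part layout of the configuration file lends itself to a simple structural check before a job is submitted. The following sketch uses Python's standard XML parser. The function `missing_sections` is hypothetical and is not a D-Cloud interface. Only the element names come from the slides.

```python
import xml.etree.ElementTree as ET

# Section names taken from the configuration file layout on the slide.
REQUIRED = (
    "machineDefinition",
    "systemDefinition",
    "injectionDefinition",
    "testDescription",
)


def missing_sections(xml_text):
    """List the required sections absent from a jobDescription document."""
    root = ET.fromstring(xml_text)
    if root.tag != "jobDescription":
        raise ValueError("root element must be <jobDescription>")
    present = {child.tag for child in root}
    return [tag for tag in REQUIRED if tag not in present]


COMPLETE = (
    "<jobDescription><machineDefinition/><systemDefinition/>"
    "<injectionDefinition/><testDescription/></jobDescription>"
)
INCOMPLETE = "<jobDescription><machineDefinition/><testDescription/></jobDescription>"

print(missing_sections(COMPLETE))
print(missing_sections(INCOMPLETE))
```

An empty list means all four sections are present. Whether D-Cloud itself requires every section is not stated in the slides, so the check is illustrative.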
Example of System Test Using D-Cloud: HA Server System
• High Availability Server
– Load-balancing of the access to Web servers
– Load balancer is redundant with stand-by operation.
• Load balancers monitor each other by heartbeat.
• The system behavior is evaluated under the faulty state.
– Obtain the evidence for the D-Case
– Responses to HTTP requests are observed on the client.
• D-Case Diagram
– G0 (Goal): HA Server is dependable.
– S0: …
– G-LVS (Sub-Goal): Load balancer continues to work when the fault occurs.
– E-LVS (Evidence): D-Cloud result: lv1 can replace lv0 after heartbeat period.
[Figure: client, load balancers (Linux Virtual Server) lv0 and stand-by lv1 connected by heartbeat, and web servers (Apache) ws0 and ws1, shown in the normal state and the faulty state]
Example Test Scenario for HA Server(1)
[Figure: client, load balancers (Linux Virtual Server) lv0 and lv1, web servers (Apache) ws0 and ws1]

Machine definition:
<machineDefinition>
  <machine>
    <name>LVS</name>
    <cpu>1</cpu>
    <mem>2048</mem>
    <nic>1</nic>
    <id>emi-1D8C0CAA</id>
  </machine>
  <machine>
    <name>WS</name>
    <cpu>1</cpu>
    <mem>2048</mem>
    <nic>1</nic>
    <id>emi-0ACC0C2D</id>
  </machine>
</machineDefinition>

System definition:
<systemDefinition>
  <systemconf>
    <name>systemA</name>
    <host>
      <hostname>lv0</hostname>
      <machinename>LVS</machinename>
      <config>lvconfig</config>
    </host>
    <host>
      <hostname>lv1</hostname>
      <machinename>LVS</machinename>
      <config>lvconfig</config>
    </host>
    <host>
      <hostname>ws0</hostname>
      <machinename>WS</machinename>
      <config>wsconfig</config>
    </host>
    <host>
      <hostname>ws1</hostname>
      <machinename>WS</machinename>
      <config>wsconfig</config>
    </host>
  </systemconf>
</systemDefinition>
Example Test Scenario for HA Server(2)
Injection definition:
<injectionDefinition>
  <injection>
    <name>injectionA</name>
    <fault>
      <location>network</location>  <!-- fault location = network -->
      <target>eth0</target>         <!-- faulty device = eth0 -->
      <kind>loss</kind>             <!-- fault type = packet loss -->
      <time>50</time>               <!-- duration of the fault events = 50 sec -->
    </fault>
  </injection>
</injectionDefinition>

Test description:
<testDescription>
  <run>
    <name>testA</name>
    <systemname>systemA</systemname>         <!-- refers to the system definition -->
    <halt when="300">down</halt>             <!-- halt "systemA" 300 sec after power-on -->
    <script>
      <on>lv0</on>
      <putFile>test.sh</putFile>             <!-- put "test.sh" on "lv0" -->
      <exec>test.sh</exec>                   <!-- execute "test.sh" on "lv0" -->
      <inject when="150">injectionA</inject> <!-- inject the "injectionA" event 150 sec after power-on -->
    </script>
  </run>
</testDescription>
</jobDescription>

Types of pre-defined fault injection

Devices    Contents                                  Elements
Hard disk  Error of specified sector                 badblock
           Specified sector is read-only             readonly
           Error detection by ECC                    ecc
           Received data contains error              corrupt
           Response of disk becomes slow             slow
Network    1bit error of packet                      1bit
           2bit error of packet                      2bit
           Error detection by CRC                    crc
           Packet loss                               loss
           NIC is not responding                     nic
Memory     Bit error                                 bit
           Byte at specified address contains error  byte
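The timing attributes in the scenario determine when the fault is active relative to power-on. The sketch below derives that time-line from the example values. It assumes that `<time>` is the fault duration in seconds counted from the `when` offset of `<inject>`, as the slide annotations suggest. The function `timeline` is hypothetical and not part of D-Cloud.

```python
import xml.etree.ElementTree as ET

# Scenario values copied from the HA server example (annotations omitted).
SCENARIO = """
<jobDescription>
  <injectionDefinition>
    <injection>
      <name>injectionA</name>
      <fault>
        <location>network</location>
        <target>eth0</target>
        <kind>loss</kind>
        <time>50</time>
      </fault>
    </injection>
  </injectionDefinition>
  <testDescription>
    <run>
      <name>testA</name>
      <systemname>systemA</systemname>
      <halt when="300">down</halt>
      <script>
        <on>lv0</on>
        <putFile>test.sh</putFile>
        <exec>test.sh</exec>
        <inject when="150">injectionA</inject>
      </script>
    </run>
  </testDescription>
</jobDescription>
"""


def timeline(xml_text):
    """Return (seconds after power-on, event) pairs sorted by time."""
    root = ET.fromstring(xml_text.strip())
    # Map each injection name to its fault duration in seconds.
    durations = {
        inj.findtext("name"): int(inj.findtext("fault/time"))
        for inj in root.iter("injection")
    }
    events = []
    for run in root.iter("run"):
        for inject in run.iter("inject"):
            start = int(inject.get("when"))
            name = inject.text.strip()
            events.append((start, f"inject {name}"))
            # Assumption: the fault ends `duration` seconds after injection.
            events.append((start + durations[name], f"end {name}"))
        halt = run.find("halt")
        if halt is not None:
            events.append((int(halt.get("when")), f"halt {run.findtext('systemname')}"))
    return sorted(events)


EVENTS = timeline(SCENARIO)
for when, what in EVENTS:
    print(f"t={when:>3}s  {what}")
```

Under these assumptions the packet loss on eth0 is active from 150 s to 200 s, which leaves 100 s to observe recovery before the system halts at 300 s.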
Screen Shot of D-Cloud System
[Screenshot: consoles of lv0 and lv1 during fault injection]
Example Using D-Cloud with SpecC Device Model
• Simulation with SpecC device model can be realized.
[Figure: Linux on VM (QEMU) runs a user program that uses select(), read(), and write() on the keypad and display device drivers. The drivers connect through TCP sockets to the SpecC simulator (SCRC), which runs the SpecC description of the keypad and display with status/control registers, an interrupt signal, and an output buffer. The screens show the hardware status and the Linux console on the VM.]