Deploying Server-side File System Monitoring at NERSC
Andrew Uselton
Contents
1 The Franklin Cray XT4
    Cerebro
    The Lustre Monitoring Tool
    The Lustre Dashboard
2 Data Analysis
    Monitoring Specific Tests or Intervals
    Data Mining for Average and Aggregate Behavior
3 A Simple Model for I/O
    Poisson Distributions
    Franklin’s Actual Distribution
    Late Breaking News
Monitoring the I/O Subsystem
[Diagram of the Franklin I/O subsystem. Labels: compute nodes (CN), network (Net), MDS, OSS nodes each with several OSTs, switches, fc links, RAID storage.]
Cerebro
[Diagram: an OSS and its OSTs, with the Cerebro modules installed under /usr/lib/cerebro/*]
cerebro_metric_lmt_mds.so
cerebro_metric_lmt_ost.so
cerebro_metric_lmt_oss.so
cerebro_monitor_lmt.so
LMT
[Diagram: data sources LMT reads on an OSS and its OSTs]
OSS:
    /proc/meminfo
    /proc/stat
OST, under /proc/fs/lustre/obdfilter/*/:
    stats
    uuid
    filesfree
    filestotal
    kbytesfree
    kbytestotal
    numrefs
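As a rough illustration of this kind of collection, the following Python sketch walks an obdfilter-style directory and reads the single-value files for each OST. The function name `read_ost_values`, the `root` argument, and the assumption that each listed file holds one integer are illustrative only; the LMT plug-ins themselves are Cerebro modules and may gather the data quite differently.

```python
import glob
import os

# File names taken from the slide; each is assumed to hold a single integer.
VALUE_FILES = ("filesfree", "filestotal", "kbytesfree", "kbytestotal")


def read_ost_values(root="/proc/fs/lustre/obdfilter"):
    """Return {ost_name: {file_name: int}} for every OST directory under root."""
    result = {}
    for ost_dir in sorted(glob.glob(os.path.join(root, "*"))):
        if not os.path.isdir(ost_dir):
            continue
        values = {}
        for name in VALUE_FILES:
            path = os.path.join(ost_dir, name)
            try:
                with open(path) as fh:
                    values[name] = int(fh.read().split()[0])
            except (OSError, ValueError, IndexError):
                # Skip files that are missing or do not start with an integer.
                continue
        result[os.path.basename(ost_dir)] = values
    return result
```

Passing a different `root` lets the function be tried against a copy of the directory tree on a machine without Lustre.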
An OSS Tuple
Cerebro Protocol Version
Host Name
CPU Utilization
Memory Utilization
OST Data Values
Cerebro Protocol Version
Host Name
UUID
Bytes Read
Bytes Written
Kbytes Free
Kbytes Used
Inodes Free
Inodes Used
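To make the OST record concrete, here is a minimal Python sketch that holds the nine values listed above and derives read and write rates from two successive samples. Several things here are assumptions rather than facts from the talk: the `;` field separator, the names `OstSample`, `parse_ost_sample` and `rates_mb_per_s`, the idea that the byte fields are cumulative counters, and the 5 second default interval (taken from the sampling interval mentioned later in the talk).

```python
from typing import NamedTuple


class OstSample(NamedTuple):
    """One OST record, with fields in the order listed on the slide."""
    protocol_version: str
    host_name: str
    uuid: str
    bytes_read: int
    bytes_written: int
    kbytes_free: int
    kbytes_used: int
    inodes_free: int
    inodes_used: int


def parse_ost_sample(text, sep=";"):
    """Parse one delimited record; the ';' separator is an assumption."""
    fields = text.strip().split(sep)
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    return OstSample(fields[0], fields[1], fields[2],
                     *(int(f) for f in fields[3:]))


def rates_mb_per_s(prev, curr, interval_s=5.0):
    """Read/write rates in MB/s, assuming the byte fields are cumulative."""
    mb = 1024 * 1024
    read = (curr.bytes_read - prev.bytes_read) / interval_s / mb
    write = (curr.bytes_written - prev.bytes_written) / interval_s / mb
    return read, write
```

Summing these per-OST rates over all OSTs at one timestamp would give an aggregate rate for the file system.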
MDS Operations
mysql> select * from OPERATION_INFO;
OPERATION_NAME    UNITS    OPERATION_NAME      UNITS
req_waittime      usec     mds_getattr_lock    usec
req_qdepth        reqs     mds_close           usec
req_active        reqs     mds_reint           usec
reqbuf_avail      bufs     mds_readpage        usec
ost_reply         usec     mds_connect         usec
ost_getattr       usec     mds_disconnect      usec
ost_setattr       usec     mds_getstatus       usec
ost_read          bytes    mds_statfs          usec
ost_write         bytes    mds_pin             usec
ost_create        usec     mds_unpin           usec
ost_destroy       usec     mds_sync            usec
ost_get_info      usec     mds_done_writing    usec
ost_connect       usec     mds_set_info        usec
ost_disconnect    usec     mds_quotacheck      usec
ost_punch         usec     mds_quotactl        usec
ost_open          usec     mds_getxattr        usec
ost_close         usec     mds_setxattr        usec
ost_statfs        usec     ldlm_enqueue        usec
...
The Lustre Dashboard
Four IOR Tests
[Figure: Aggregate OST rates from 2008-07-28 22:45:00. Read rate and write rate; Data Rate (MB/s), 0 to 12000, against Time (PDT), 22:45 to 23:18.]
24 Hours of LMT Data
[Figure: read rate and write rate over 24 hours; Data Rate (MB/s), 1000 to 10000.]
Daily Averages
[Figure: Average daily rates. Read and Write; Data Rate (GB/s), 0 to 4.5, against Time (PDT), 07/01 to 03/01.]
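A daily average of this kind can be computed by grouping the rate samples by calendar day. The Python sketch below shows one way to do that; the function name `daily_average_rates` and the `(timestamp, read_rate, write_rate)` input shape are assumptions for illustration, not the schema or query actually used for the plot.

```python
from collections import defaultdict
from datetime import datetime


def daily_average_rates(samples):
    """Average read/write rates per calendar day.

    samples: iterable of (datetime, read_rate, write_rate) tuples.
    Returns {date: (mean_read_rate, mean_write_rate)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for when, read_rate, write_rate in samples:
        acc = sums[when.date()]
        acc[0] += read_rate
        acc[1] += write_rate
        acc[2] += 1
    return {day: (r / n, w / n) for day, (r, w, n) in sorted(sums.items())}
```

This simple mean assumes samples are evenly spaced within each day; with gaps in the data, a time-weighted average would be a better choice.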
http://en.wikipedia.org/wiki/Poisson_distribution:
• $f_\lambda(k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$
• $C(m) = N \times f_\lambda(\mathrm{int}(m/M))$
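The two formulas translate directly into code. The following Python sketch evaluates the Poisson probability and the modelled count C(m); the function names `poisson_pmf` and `expected_count` are illustrative, and the example values in the usage note (lambda = 2, M = 125 MB, N = 250 M) are the ones quoted for the first Poisson plot in the talk.

```python
import math


def poisson_pmf(k, lam):
    """f_lambda(k) = lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_count(m, lam, M, N):
    """C(m) = N * f_lambda(int(m / M)); m and M must share the same unit."""
    return N * poisson_pmf(int(m / M), lam)
```

For example, `expected_count(250, 2, 125, 250_000_000)` gives the modelled number of observations, out of 250 M, that fall in the bin containing m = 250 MB.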
Poisson Distribution: λ = 2
[Figure: Poisson distribution, lambda = 2, M = 125MB, N = 250M. Count on a log scale, 10 to 100 M.]
Poisson Distribution: λ = 20
[Figure: Poisson distribution. Count on a log scale, 10 to 100 M, against m, the amount of data transferred during a 5 second interval, 0 to 2.0 GB.]
250 M LMT Observations
[Figure: Distribution of LMT observed rates, read and write. Count on a log scale, 100 to 10 M.]
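A distribution like this can be built by binning the amount of data moved in each sampling interval and counting the observations per bin. The Python sketch below does that; the function name `bin_counts` is illustrative, and the default bin width of 125 MB is borrowed from the M of the Poisson model, which is an assumption about how the observed data was binned.

```python
from collections import Counter


def bin_counts(amounts_mb, bin_mb=125):
    """Count observations per bin of width bin_mb.

    Returns a Counter mapping bin index (amount // bin_mb) to count.
    """
    return Counter(int(a // bin_mb) for a in amounts_mb)
```

Counts for bin index k can then be set beside N × f_λ(k) from the Poisson model to see where the observed distribution departs from it.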
Two weeks of recent observations
[Figure: Distribution of LMT observed rates, read and write. Count on a log scale, 10 to 1 M, against MB, 0 to 2500.]
I would like to acknowledge and thank:
Al Chu
The author of Cerebro.
Herb Wartens
The author of the Lustre Monitoring Tool plug-ins.
Both work at Lawrence Livermore National Lab, which supported the development of these tools. Both were very generous with their time as I deployed the software on Franklin.
The software is available from:
Both applications are open source and available from SourceForge.
Cerebro
http://sourceforge.net/projects/cerebro
LMT
http://sourceforge.net/projects/lmt/
If you would like hints and encouragement with
getting this software deployed, contact me:
Andrew Uselton (acuselton@lbl.gov)
If you get results from your deployment that you
would like to share, please do so.