
Deploying Server-side File System Monitoring at NERSC


Academic year: 2021


Deploying

Server-side File

System Monitoring at

NERSC

Andrew Uselton

The Franklin Cray XT4

Cerebro The Lustre Monitoring Tool The Lustre Dashboard

Data Analysis

Monitoring Specific Tests or Intervals Data Mining for Average and Aggregate Behavior

A Simple Model for I/O

Poisson Distributions Franklin’s Actual Distribution Late Breaking News Acknowledgements and References


Contents

1  The Franklin Cray XT4
     Cerebro
     The Lustre Monitoring Tool
     The Lustre Dashboard

2  Data Analysis
     Monitoring Specific Tests or Intervals
     Data Mining for Average and Aggregate Behavior

3  A Simple Model for I/O
     Poisson Distributions
     Franklin’s Actual Distribution
     Late Breaking News


Monitoring the I/O Subsystem

[Figure: diagram of the I/O subsystem. Labeled components: CN, network (Net), MDS, OSS, OST, fc, switch, RAID.]


Cerebro

[Figure: Cerebro plug-ins on an OSS and its OSTs. Modules shown under /usr/lib/cerebro/*:]

cerebro_metric_lmt_mds.so
cerebro_metric_lmt_oss.so
cerebro_metric_lmt_ost.so
cerebro_monitor_lmt.so


LMT

[Figure: data sources shown for an OSS and its OSTs.]

/proc/meminfo
/proc/stat
/proc/fs/lustre/obdfilter/*/
    stats
    uuid
    filesfree
    filestotal
    kbytesfree
    kbytestotal
    numrefs
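
The per-OST files named on this slide can be read with a few lines of Python. The following is only a sketch under stated assumptions: it assumes each of filesfree, filestotal, kbytesfree and kbytestotal holds a single integer, and the function name `read_ost_values` is hypothetical. It is not the LMT plug-in code, which is a Cerebro module.

```python
from pathlib import Path

# Counter files named on the slide; each is ASSUMED to hold one integer.
SINGLE_VALUE_FILES = ("filesfree", "filestotal", "kbytesfree", "kbytestotal")


def read_ost_values(obdfilter_root):
    """Return {ost_name: {file_name: int}} for each OST directory under the root.

    obdfilter_root would be /proc/fs/lustre/obdfilter on an OSS; any directory
    with the same layout works, which makes the function easy to try out.
    """
    results = {}
    for ost_dir in sorted(Path(obdfilter_root).iterdir()):
        if not ost_dir.is_dir():
            continue
        values = {}
        for name in SINGLE_VALUE_FILES:
            path = ost_dir / name
            if path.is_file():
                # Files that are missing are skipped rather than guessed at.
                values[name] = int(path.read_text().strip())
        results[ost_dir.name] = values
    return results
```

Pointing the function at a scratch directory that mimics the layout is a safe way to experiment before running it on a server.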


An OSS Tuple

Cerebro Protocol Version

Host Name

CPU Utilization

Memory Utilization


OST Data Values

Cerebro Protocol Version

Host Name

UUID

Bytes Read

Bytes Written

Kbytes Free

Kbytes Used

Inodes Free

Inodes Used
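
The OST fields above map naturally onto a small record type. The sketch below is illustrative only: the class name `OstSample`, the field names, and the function `transfer_rates` are hypothetical, and it assumes that Bytes Read and Bytes Written are cumulative counters, so that a rate is the difference between two successive samples divided by the sampling interval (5 seconds is used as the default, matching the interval mentioned later in the talk).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OstSample:
    """One OST observation; field names are hypothetical labels for the slide's list."""
    protocol_version: str
    host_name: str
    uuid: str
    bytes_read: int
    bytes_written: int
    kbytes_free: int
    kbytes_used: int
    inodes_free: int
    inodes_used: int


def transfer_rates(prev, curr, interval_s=5.0):
    """Return (read, write) rates in bytes/s between two samples of one OST.

    ASSUMPTION: the byte fields are cumulative counters. A negative
    difference (for example after a counter reset) is clamped to zero.
    """
    if prev.uuid != curr.uuid:
        raise ValueError("samples come from different OSTs")
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    read = max(curr.bytes_read - prev.bytes_read, 0) / interval_s
    write = max(curr.bytes_written - prev.bytes_written, 0) / interval_s
    return read, write
```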


MDS Operations

mysql> select * from OPERATION_INFO;

OPERATION_NAME    UNITS    OPERATION_NAME      UNITS
req_waittime      usec     mds_getattr_lock    usec
req_qdepth        reqs     mds_close           usec
req_active        reqs     mds_reint           usec
reqbuf_avail      bufs     mds_readpage        usec
ost_reply         usec     mds_connect         usec
ost_getattr       usec     mds_disconnect      usec
ost_setattr       usec     mds_getstatus       usec
ost_read          bytes    mds_statfs          usec
ost_write         bytes    mds_pin             usec
ost_create        usec     mds_unpin           usec
ost_destroy       usec     mds_sync            usec
ost_get_info      usec     mds_done_writing    usec
ost_connect       usec     mds_set_info        usec
ost_disconnect    usec     mds_quotacheck      usec
ost_punch         usec     mds_quotactl        usec
ost_open          usec     mds_getxattr        usec
ost_close         usec     mds_setxattr        usec
ost_statfs        usec     ldlm_enqueue        usec
...


The Lustre Dashboard


Four IOR Tests

[Figure: Aggregate OST rates from 2008-07-28 22:45:00. Read rate and write rate; y-axis: Data Rate (MB/s), 0 to 12000; x-axis: Time (PDT), 22:45 to 23:18.]


24 Hours of LMT Data

[Figure: read rate and write rate over 24 hours; y-axis: Data Rate (MB/s), ticks from 1000 to 10000.]


Daily Averages

[Figure: Average daily rates, Read and Write; y-axis: Data Rate (GB/s), 0 to 4.5; x-axis: Time (PDT), 07/01 to 03/01.]
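
A daily average of this kind can be computed by grouping timestamped rate samples by calendar day. The sketch below is an assumption about how one might do it, not the method used for the plot: the function name `daily_average_rates` and the `(timestamp, rate)` input shape are hypothetical, and every sample is weighted equally.

```python
from collections import defaultdict
from datetime import datetime


def daily_average_rates(samples):
    """Average rate per calendar day.

    samples: iterable of (timestamp, rate) pairs, where timestamp is a
    datetime and rate is a number in any consistent unit (e.g. GB/s).
    Returns {date: mean rate}, ordered by date.
    """
    totals = defaultdict(lambda: [0.0, 0])  # date -> [sum of rates, sample count]
    for timestamp, rate in samples:
        acc = totals[timestamp.date()]
        acc[0] += rate
        acc[1] += 1
    return {day: total / count for day, (total, count) in sorted(totals.items())}
```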


From http://en.wikipedia.org/wiki/Poisson_distribution:

$$f_\lambda(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

$$C(m) = N \times f_\lambda(\mathrm{int}(m/M))$$
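
The two formulas translate directly into code. In this sketch the function names are hypothetical, and the reading of the symbols is an assumption: N is taken as the total number of observations and M as the amount of data per unit of k, so that int(m/M) turns an amount of data m into a Poisson count. The values λ = 2, M = 125 MB and N = 250 M are those quoted on the talk's Poisson plot; treating 125 MB as 125 × 10^6 bytes is also an assumption.

```python
import math


def poisson_pmf(k, lam):
    """f_lambda(k) = lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def expected_count(m, lam, unit_size, n_obs):
    """C(m) = N * f_lambda(int(m / M)), with M = unit_size and N = n_obs."""
    return n_obs * poisson_pmf(int(m / unit_size), lam)


# Parameters quoted on the Poisson plot (byte interpretation of MB is ASSUMED).
LAMBDA = 2
M_BYTES = 125 * 10**6
N_OBS = 250 * 10**6

if __name__ == "__main__":
    for k in range(6):
        m = k * M_BYTES
        print(f"m = {m / 1e6:6.0f} MB  C(m) = {expected_count(m, LAMBDA, M_BYTES, N_OBS):,.0f}")
```

With λ = 2 the counts for k = 1 and k = 2 coincide, since 2^1/1! = 2^2/2!, which is a quick sanity check on the implementation.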


Poisson Distribution: λ = 2

[Figure: Poisson distribution, lambda = 2, M = 125MB, N = 250M; y-axis: count, log scale from 10 to 100 M.]


Poisson Distribution: λ = 20

[Figure: Poisson distribution; y-axis: count, log scale from 10 to 100 M; x-axis: m, the amount of data transferred during a 5 second interval, 0 to 2.0 GB.]


250 M LMT Observations

[Figure: Distribution of LMT observed rates, read and write; y-axis: Count, log scale from 100 to 10 M.]
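
A distribution like this one can be tallied by binning each observation's transferred amount into fixed-width bins and counting the observations per bin. The sketch below is an illustration under assumptions, not the analysis code behind the plot: the function name `bin_observations` is hypothetical, and the bin width is a free parameter.

```python
from collections import Counter


def bin_observations(amounts, bin_size):
    """Count observations per bin.

    amounts: iterable of non-negative amounts of data (any consistent unit).
    bin_size: width of each bin in the same unit.
    Returns {bin_index: count}, where bin_index = int(amount // bin_size).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter(int(amount // bin_size) for amount in amounts)
    return dict(sorted(counts.items()))
```

Plotting the counts on a logarithmic axis, as the slide does, keeps the rare high-rate bins visible next to the very common low-rate ones.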


Two weeks of recent observations

[Figure: Distribution of LMT observed rates, read and write; y-axis: Count, log scale from 10 to 1 M; x-axis: MB, 0 to 2500.]


I would like to acknowledge and thank:

Al Chu, the author of Cerebro.

Herb Wartens, the author of the Lustre Monitoring Tool plug-ins.

Both work at Lawrence Livermore National Lab, which supported the development of these tools. Both were very generous with their time as I deployed the software on Franklin.


The software is available from:

Both applications are open source and available from SourceForge.

Cerebro: http://sourceforge.net/projects/cerebro

LMT: http://sourceforge.net/projects/lmt/

If you would like hints and encouragement with getting this software deployed, contact me: Andrew Uselton (acuselton@lbl.gov)

If you get results from your deployment that you would like to share, please do so.
