Distributed Storage Caches and Distributed System Performance Analysis Tools. Data Intensive Computing Tutorial SuperComputing 98

(1)

Distributed Storage Caches and Distributed

System Performance Analysis Tools

Data Intensive Computing Tutorial

SuperComputing ‘98

Brian L. Tierney ([email protected]) Future Technologies Group

Lawrence Berkeley National Laboratory Berkeley, CA 94720

(2)

Outline

♦ Overview of Distributed Storage

♦ Data Handling Architecture

♦ Distributed Storage Applications

- Terrain Visualization

- Medical Imaging

- HENP

♦ A Distributed Storage System: The DPSS

♦ Next Generation Architectures

♦ Distributed Systems Performance Analysis

- NetLogger Components

- NTP

(3)

Distributed Storage

Why is distributed storage important for Data Intensive Computing?

• Researchers often are not at the same location as the

data source

• Compute cycles are often not at the same location as

(4)

Distributed Storage

Other Advantages of Distributed Storage:

♦ sharing of resources

♦ fault tolerance / load balancing through replicated data

at multiple sites, where a fault might be:

- host failure

- disk failure

- network failure

- software fault

- network congestion

- excessive CPU load

♦ added flexibility: provides the ability to move the data to

the compute cycles, or move the compute cycles to the data, depending on network speed

(5)

Distributed Storage Total Archive Visualization / Analysis Processing Resources User Site

Remote Access to Large Data Archives

Data Source Partial Replication of Archive Visualization / Analysis Processing Resources WAN Archive Site

(6)

Example Architecture for On-line Data Sources

A prototype architecture has evolved from our experiences:

• distributed satellite image processing in MAGIC testbed

• on-line angiography (x-ray video) systems for Kaiser

Hospital

• simulated on-line HENP detectors

Key features of the architecture:

• very high-speed cache that is distributed, scalable, and

dynamically configurable

• common, low-level, high data rate interface that supports

various application I/O semantics

• high-speed tertiary storage interface

• data cataloguing and access system

(7)

Example Architecture data sour ce visualization and steering applications parallel computational PKI access real-time data cache partition processing scratch partition application data cache partition cache: large, high-speed, network distributed (e.g., MSS high data rate interface

high data rate h.d.r . I/O semantics MSS MSS MSS (e.g., HPSS)

data cataloguing and access system

tertiary storage “data

mover” high data rate

(8)

Example Architecture

♦ Advantages of this architecture:

• first level processing can be done using resources at

the collaborators sites (this type of experiment typically involves several major institutions)

• large tertiary storage systems exhibit substantial

economies of scale, and so using a large tertiary

storage system at, say, a supercomputer center, should result in more economical storage, better access

(because of much larger near-line systems - e.g. lots of tape robots) and better media management.

(9)

Distributed Storage Applications

Background:

♦ We first developed a distributed storage system, called

the DPSS (Distributed Parallel Storage System) as part of the DARPA-sponsored MAGIC Gigabit Network Testbed (see: http://www.magic.net).

♦ The prototype high-speed application for this system

was TerraVision, developed at SRI.

♦ TerraVision uses tile images and digital elevation models

(10)

Distributed Storage Applications

Landscape (large image) represented by image tiles

kept in the DPSS

TerraVision determines and predicts the tiles intersected

by path of travel

TerraVision and the DPSS Cooperate to Visualize Landscape

TerraVision produces a realistic visualization of the landscape Path of travel 17 27 11 12 13 14 15 16 21 22 23 24 25 26 31 32 33 34 35 36 37 41 51 61 71 42 52 62 72 43 44 45 46 47 53 54 55 56 57 63 64 65 66 67 73 74 75 76 77 Human user controls path of travel Multiple DPSS servers at multiple sites operate in

parallel to supply the required tiles to TerraVision

The network delivers image tiles to

(11)

Medical Imaging Data

♦ The DPSS was used with an LBNL/Kaiser Permanente

collaboration focused on connecting remote, on-line instrument systems to “real-time” digital libraries.

♦ System characteristics:

• automatic generation of metadata

• automatic cataloguing of the data and the metadata as

the data is received

• transparent management of tertiary storage systems

where the original data is archived (via Unitree NFS interface)

• facilitation of cooperative research by providing

specified users at local and remote sites immediate as well as long-term access to the data

(12)

♦ System characteristics (cont.):

• mechanisms to incorporate the data into other

databases or documents

• cardio-angiography data was collected directly from a

Phillips scanner in the San Francisco Kaiser hospital Cardiac Catheterization Laboratory — when the data collection for a patient is complete (about once every 20–40 minutes), 500–1000 megabytes of digital video data was sent across the ATM network to the WALDO system at LBNL which makes the data available to

physicians in other Kaiser hospitals

(13)

NTON and BAGNet network

Lawrence Berkeley National Laboratory and

Kaiser Permanente On-line Health Care Imaging Experiment

in the

LBNL WALDO server and DPSS distributed cache for data processing,

cataloguing, and storage Kaiser San Francisco Hospital Cardiac

Catheterization Lab (digital video capture) Kaiser Oakland Hospital (physicians and databases) Kaiser Division of Research

Automatically generated user interfaces providing indexed access to the large data objects

(14)

High Energy and Nuclear Physics Data

♦ Data source: The STAR detector at RHIC (Brookhaven

National Lab).

• This detector puts out a steady state data stream of

20-40 MBytes/second.

♦ This application requires a data handling architecture

capable of supporting the processing and storage of over 2 TB / day:

(15)

HPSS (mass Storage

System) high-speed, distributed random

access cache detector interface archive management 20 MBy/s 20 MBy/s 2-10 MBy/s 2-10 MBy/s 0.1-2 MBy/s 23-30 MBy/s (typically twice the

raw data rate, so 40 MBy/s) analysis interface reconstruction interface reconstruction interface reconstruction interface analysis interface analysis interface

this model provides a standard high-performance data

interface for all sources of data

(16)

HPSS Mass Storage System

Distributed-Parallel Storage System servers

Data Placement Optimization • generate optimal indexing

strategy to organize and re-organize data layout

• manage third-party transfers to implement layout

data writer • implements third-party

transfers to DPSS servers

• implements data placement

data transfer agent • transfers data to

applications

• implements multi-level

block cache management MSS access methods

memory disk network interfaces bit-file movers • implement third-party transfers from MSS cache and tape

PDSF / many small SMP systems

Incorporation of DPSS into NERSC physics

analysis computing cluster

HENP analysis-1 HENP analysis-2 HENP analysis-3

data access (DBMS, e.g. Objectivity) (e.g., OOFS) data presentation block-level transport data access

(e.g. Malon object manager) data access methods

(17)

Current STAF/DPSS Performance

♦ A recent set of experiments were conducted over the

National Transparent Optical Network testbed — eight 2.4 gigabit/sec data channels around the San Francisco Bay.

♦ The application network was IP over OC-12 (622 Mbit/

sec) ATM.

♦ An application running on a Sun Enterprise-4000 SMP at

SLAC (Palo Alto) read data from four distributed disk

servers at LBNL (Berkeley), parsed the XDR records and placed the data into the application memory.

(18)

Distributed Storage Applications LBNL 0 5 Mi. 10 Mi. UC Berkeley to MAGIC testbed

tertiary (mass) storage

DPSS: network parallel disk system

LLNL

SRI Ames

San Jose OC-12 (622 Mb/s)

SLAC Fore ASX-1000

ATM switch NOW UCSF Kaiser Nortel Vector ATM switch OC12 OC12 OC12 OC12 Sprint ATL Burlingame Nortel Vector ATM switch OC12 OC3 OC3 OC12 ACTS SNL Nortel V ector A TM switch OC12 OC12 OC12 OC12 NTON

® 8×2.5 Gbit/s all optical

network testbed ® network circumference is about 300km) Cisco 1010L ATM switch OC12 OC12 Sun E4000 SMP C C C C C _{C C} CPU MEM I/O Ether 100 mbs X ATM OC-12 ATM switch

OC-12 data path OC-3 data path STAF application

platform

LBNL↔NTON↔SLAC

experiments: Hardware configuration

(19)

♦ Results:

• each DPSS server transfer rate is 14.25 MBytes/sec

• OC-12 receiver was able read data from 4 servers in

parallel at 57 MBytes/sec

- this is the rate of data delivered from datasets in a

distributed cache to the remote application memory, ready for analysis algorithms to commence

operation.

• this is equivalent to 4.5 TeraBytes/day!

• latency for a single 64 KByte data block is 25 ms, so

(20)

Distributed Parallel Storage System

Design Goals:

♦ support data-intensive applications

♦ provide very high data throughput

• parallelism at every level, including disk, SCSI bus, network,

and server

♦ high-speed WAN aware

♦ scalable

• throughput and capacity

♦ economical

• use only low-cost commodity hardware components

♦ location transparency

(21)

DPSS DPSS Architecture:

♦ A dynamically configurable network-striped disk array

designed to supply high speed data streams to other processes in the network

♦ Provides the functionality of a single, very large,

random-access, block-oriented I/O device (i.e.: a large virtual disk)

♦ At the user level, it is a semi-persistent cache of named

data-objects

♦ At the storage level, it is a logical block server

♦ Allows client applications complete freedom to

determine optimal data layout, replication, and dynamic reconfiguration

(22)

DPSS Implementation:

♦ zero memory copies of data blocks, and is all user level

code

♦ runs on Solaris, IRIX, DEC Unix, Linux, FreeBSD, Solaris

X86

♦ very large logical address/name space (16 bytes)

♦ highly distributed, a high degree of parallelism at every

level, and highly pipelined

♦ data blocks are declustered across both disks and

servers to maximize parallelism

♦ supports parallel reading and writing

♦ TCP Transport (UDP data transfers also supported)

(23)

DPSS ATM network interface DPSS disk server ATM workstation data blocks

ATM switch single high

bandwidth sink (or source) ATM network (interleaved cell streams representing multiple virtual circuits) DPSS ser v er AT M Client application API

physical block requests

data requests ATM network interface - data structure server

- data set access control - logical name translation - block-level access control DPSS disk server workstation data blocks DPSS disk server workstation data blocks DPSS ser v er AT M data fragment streams returned data stream logical block requests

(24)

DPSS logical to physical mapping disk servers disk servers disk servers b u f f e r various access semantic c l i e n t DPSS requests data

(25)

DPSS generate block map entries disk servers b u f f e r block placemen t c l i e n t

(26)

DPSS

returned data stream (“third-party” transfers directly from

the storage servers to the application) Application

(client) ♦ block storage ♦ block-level access control Disk Servers security context - 1 (system integrity & physical resources)

Application data structure access methods (data structure to logical block-id mappings - e.g.:

♦ JPEG video

♦ multi-res image pyramids

♦ Unix r/w

♦ XDR

data requests

security context - 2 (data use conditions)

Agent-based management of dataset

metadata - locations, state, etc.

Distributed-Parallel Storage System Architecture (data reading illustrated)

Agent-based management of redundant

Masters

Agent-based management of storage server and network state

vis a vis applications

mem buf

physical block

requests logical block

requests

Data Set Manager • user security context

establishment

• data set access control

• metadata

DPSS Master

Request Manager • logical to physical name

translation • cache management DPSS API (client-side library) Resource Manager

• allocate disk resources

• server/disk resource access control

(27)

DPSS

♦ Typical DPSS implementation

- 5 UNIX workstations (e.g. Sun Ultra I0s, Pentium 400)

- 4 - 6 Ultra-SCSI disks on 2 SCSI host adaptors

- a high-speed network (e.g.: ATM or 100 Mbit

ethernet)

• This configuration can deliver an aggregated data

stream to an application at about 500 Mbits/s (62 MBy/s) using these relatively low-cost, “off the shelf”

components by exploiting the parallelism of:

- five hosts,

- twenty disks,

- ten SCSI host adaptors

(28)

DPSS

♦ Sample Costs:

• server host = Sun Ultra 10S: $5000

- throughput = 11 - 14 MB/sec

• disk = 18 GB Ultra-wide Seagate Cheetah: $1500

• Note that cost is mainly dominated by disk price

TABLE 1.

Throughput Capacity Configuration Cost

10 MB/sec 33 GB 1 server, 2 disks $8K

50 MB/sec 165 GB 5 servers, 10 disks $40 K

50 MB/sec 1 TB 5 servers, 64 disks $121 K

100 MB/sec .5 TB 10 servers, 32 disks $98 K

(29)

Other Types of Distributed Storage

Other systems that provide distributed storage capabilities:

• AFS

• DCE / DFS

These system provide full file system functionality, plus security and scalablity.

But, they do not provide enough throughput for data intensive applications:

(30)

Next Generation Architecture

The next step is to add the following components to this data handling architecture:

• security • middleware - Globus • global naming - URN system - SRB from SDSC

(31)

Next Generation Architecture

Archival Storage

(HPSS, Unitree, Unix files, etc.)

“Network aware” Middleware broker (CORBA, Java RMI, SRB, etc.)

Application

(processing, analysis, visualization, etc.)

High Performance Data Intensive Computing Environment

Query Estimator

Placement Optimization Cache/Buffer

(e.g.: DPSS)

Object Data Base

(32)

Next Generation Architecture compute CPU CPU CPU CPU Mem SLAC compute CPU CPU CPU CPU Mem LBNL tertiary storage (HPSS)

Global Middleware Services

uniform access network cache QoS compute resources STAF Analysis) monitoring and management tertiary storage (ADSM) ANL compute CPU CPU CPU CPU Mem cache (DPSS) tertiary storage (HPSS) cache (DPSS)

(33)

(34)

Overview

♦ The Problem:

When building data intensive distributed services, we often observe unexpectedly low network throughput and/or high latency - the reasons for which are usually not obvious.

The bottlenecks can be in any of the following components:

- the applications

- the operating systems

- the device drivers, the network adapters on either the

sending or receiving host (or both)

- the network switches and routers, and so on

♦ The Solution:

Highly instrumented systems with precision timing information and analysis tools

(35)

Motivation

There are virtually no behavioral aspects of widely

distributed applications that can be taken for granted - they are fundamentally different from LAN-based distributed

applications.

• Techniques that work in the lab frequently do not work

in a WAN environment (even a testbed network) To characterize the wide area environment we have

developed a methodology for detailed, end-to-end,

top-to-bottom monitoring and analysis of significant events

involved in distributed systems data interchange.

• This has proven invaluable for isolating and correcting

performance bottlenecks, and even for debugging distributed parallel code.

(36)

NetLogger

♦ We have developed the NetLogger Toolkit: a set of tools

to aid in creating graphs that trace a data request throughout a distributed system.

• NetLogger makes it easy to modify distributed

applications to log interesting events at every critical point in a distributed system.

• NetLogger also includes tools for host and network

monitoring.

♦ The approach is novel in that it combines network, host,

and application-level monitoring to provide a complete view of the entire system

(37)

NetLogger

♦ Logged events are correlated with system behavior to

characters the performance of the system during actual operation

• facilitates bottleneck identification

♦ Using “life-lines” to visualize the data flow is the key to

easy interpretation of the results.

♦ We believe this type of monitoring is a critical

component to building reliable high performance data intensive systems

(38)

NetLogger End Processing Begin Processing End Read Begin Read Request data time NetLogger Event “Life-lines”

(39)

Netlogger Components

♦ Common log format

♦ Application libraries for generating NetLogger Messages

• Can send log messages to:

- file

- syslogd

- host/port (netlogd)

• C, C++ and Java are currently supported

♦ Event Visualization tools

♦ Management Agents

♦ Modified Unix network and OS monitoring tools to log

“interesting” events using the same log format

- netstat, vmstat, and tcpdump modified output results

(40)

NetLogger NetLogger log format:

We are using the IETF draft standard Universal Logger Message (ULM) format:

• a list of “field=value” pairs

• required fields: DATE, HOST, PROG, and LVL

- LVL is the severity level (Emergency, Alert, Error,

Usage, etc.)

• followed by optional user defined fields

NetLogger adds these required fields:

• NL.EVNT, a unique identifier for the event being logged. i.e.:

DPSS_SERV_IN, VMSTAT_USER_TIME, NETSTAT_RETRANSSEG

• NL.SEC, and NL.USEC, which are the seconds and

(41)

NetLogger Sample NetLogger ULM event:

DATE=19980430133038 HOST=foo.lbl.gov

PROG=testprog LVL=Usage NL.EVNT=SEND_DATA NL.SEC=893968238 NL.USEC=55784 SEND.SZ=49332

This says program named testprog on host foo.lbl.gov

performed event named SEND_DATA, size = 49332 bytes, at the time given.

User-defined data elements (any number) are used to store information about the logged event - for example:

NL.EVNT=SEND_DATA SEND.SZ=49332

- the number of bytes of data sent

NL.EVNT=NETSTAT_RETRANSSEGS NS.RTS=2

- the number of TCP retransmits since the previous

(42)

NetLogger NetLogger API:

Open calls:

NLhandle *lp = NULL; /* log to a local file */

lp = NetLoggerOpen(NL_FILE, program_name, log_filename, NULL, 0);

/* log to syslog */

lp = NetLoggerOpen(NL_SYSLOG, program_name, NULL, NULL, 0); /* log to “netlogd” on the specified host/port */

lp = NetLoggerOpen(NL_HOST, program_name, NULL, hostname, DPSS_NETLOGGER_PORT);

/* log to memory, then flush to host/port */

lp = NetLoggerOpen(NL_HOST_MEM, program_name, NULL, hostname, DPSS_NETLOGGER_PORT);

Write the log event:

NetLoggerWrite(lp, "EVENT_NAME", "F1=%d F2=%d F3=%d F4=%.2f", data1, data2, string, fdata);

(43)

NetLogger Sample Code:

/* log to a local file */

lp = NetLoggerOpen(method, progname, log_filename, NULL, 0); while (!done)

{

NetLoggerWrite(lp, "EVENT_START", "TEST.SIZE=%d", size); /* perform the task to be monitored */

done = do_something(data, size); NetLoggerWrite(lp, "EVENT_END"); }

(44)

NLV

NetLogger Visualization Tools

Exploratory, interactive analysis of the log data has proven to be the most important means of identifying problems.

We have developed a tool called nlv (NetLogger

Visualization).

nlv functionality:

• can display several types of NetLogger events at once

• user configurable: which events to plot, and the type of plot

to draw (lifeline, load-line, or point)

• play, pause, rewind, slow motion, zoom in/out, and so on

• nlv can be run post-mortem, or in “real-time”

Other NetLogger tools to analyze log files:

• perl scripts to extract information from log files • gnuplot to graph the results

(45)

(46)

(47)

Network Time Protocol

♦ For NetLogger timestamps to be meaningful, all systems

clocks must be synchronized.

♦ NTP is used to synchronize time of all hosts in the

system.

• NTP is from Dave Mills, U. of Delaware

(http://www.eecis.udel.edu/~ntp/)

♦ Must have NTP running on one or more primary servers,

and on a number of local-net hosts, acting as secondary time servers.

(48)

NTP

♦ Purpose of NTP

• conveys timekeeping information from the primary servers

to other time servers via the Internet

• cross-checks clocks and mitigates errors due to equipment

or propagation failures

♦ Host time servers will synchronize via another peer time

server, based on the following timing values:

• those determined by the peer relative to the primary

reference source of standard time

• those measured by the host relative to the peer

♦ NTP provides not only precision measurements of offset

and delay, but also definitive maximum error bounds, so that the user interface can determine not only the time, but the quality of the time as well.

(49)

Running NTP:

• All hosts run the xntpd daemon, which synchronizes

the clocks of each host both to GPS-based time servers and to each other.

• This allows us to synchronize the clocks of all hosts to

within about 250 microseconds of each other, but...

- systems have to stay up for a significant length of

time for the clocks to converge to 250 µs

- best to have a time server on the same network as all

hosts

- many different sys admins (harder to synchronize

than clocks)

• In practice, clock synchronization of 1ms is good

(50)

DPSS Monitoring Points DPSS master/ name trnslate Writer (output to net)

memory block cache

- recv blk list - search cache disk reader disk reader disk reader disk reader Client request blks receive blks TS-8 TS-1 TS-0 TS-3 TS-5 TS-6 TS-4 TS-7 TS = time stamp DPSS disk server TS-2 DPSS server DPSS server START from other disk servers

(51)

NetLogger Analysis of the DPSS DPSS life-lines:

• each line represents the history of a data block as it

moves through the end-to-end path

• data requests are sent from the application every 200

ms (the nearly vertical lines starting at app_send

monitor point)

• initial single lines fan out as the request lists are

(52)

NetLogger Analysis of the DPSS

♦ Analysis of the data block life-lines shows, e.g. (see next

figure):

A: if two lines cross in the area between start read and end

read, this indicates a read from one disk was faster than a

read from another disk

B: all the disks are the same type, the variation in read times are due to differences in disk seek times

C: average time to move data from the memory cache into the network interface is 8.65 ms

D: the average time in disk read queue is 5 ms

E: the average read rate from four disks is 8 MBy/sec

F: the average send rate (receiver limited, in this case) is 38.5 Mb/sec.

G: some requested data are found in the cache (were read from disk, but not sent in previous cycle because arrival of new request list flushes write queue)

(53)

NetLogger Analysis of the DPSS app_send master_in master_out server_in start_read end_read start_write app_receive TCP_retrans 8000 8200 B: fast disk read: 8 ms

C: 20 block average time to write

blocks to network: 8.65 ms

D: 20 block average time spent in

read queue: 5 ms

F: time for 20 blocks to get from one server

writer to the application reader total: 204 ms, avg: 10.2 ms 38.5 Mb/sec B: typical disk read: 22 ms Time (ms) Monitoring points “iss3.log” “iss2.log” G: cache hits (zero read time)

E: time to read 20 blocks from three disks

total:123 ms, avg: 6.15 ms 8 MBy/sec (63.7 Mb/sec) A net transit name xlate net transit read queue disk read write queue net transit length of the “pipeline” (≈ 60 ms)

(current servers are more than twice this rate)

(current value is about 30ms)

(54)

Two server, ATM LAN Event lifelines

server_in start_read end_read read queue disk read and cache search four parallel disk reads initiated completion of one read triggers the next one

B: fast disk read:

8 ms B: typica 22 fast read milliseconds behavior of one, specific disk

(55)

Other NetLogger Results: A DPSS scaling experiment:

• 10 clients accessing 10 different data sets

simultaneously

• show that all clients get an equal share of the DPSS

(56)

Correct operation of 10 parallel (simultaneous) processes reading 10 different data sets from one DPSS (each row is one process, each group is a request for 10 Mbytes of data,

0 10000 ms 20000 ms 30000 ms 40000 ms 50000 ms 60000 ms

(57)

A Case Study: TCP in the Early MAGIC Testbed The Problem:

♦ DPSS and TerraVision were working well in a LAN

environment, but failing to deliver high (even medium!) data rates to TerraVision when operated in the MAGIC WAN.

♦ We suspected that the ATM switches were dropping

cells, but they reported no cell loss.

♦ Network engineers claimed that the network was working

“perfectly”. For testing they were using network

performance tools such at ttcp and netperf, which are

somewhat useful, but don’t model real distributed

applications, which are complex, bursty, and have more than one connection in and/or out of a given host at one time.

(58)

MAGIC

Control test: 2 servers over ATM LAN

app send master in master out server in start read end read start write app receive TCP retrans TV cache in TV cache out 8000 8200 8400 8600 8800 9000 920 event number time (ms) “iss3.log” “iss2.log”

(59)

MAGIC

♦ TerraVision image tiles are distributed across DPSS

servers on the MAGIC network at the following sites:

- EROS Data Center, Sioux Falls, SD; - Sprint, Kansas City, MO;

- University of Kansas, Lawrence, KS; - SRI, Menlo Park, CA;

(60)

MAGIC MAGIC WAN experiment:

♦ Three disk server configuration DPSS gave the results

shown below:

♦ What was happening in this experiment is that TCP’s

normal ability to accommodate congestion is being defeated by an unreasonable network configuration:

• the final ATM switch where the three server streams come together had a

per port output buffer of only about 13K bytes

• the network MTU (minimum transmission unit) is 9180 Bytes (as is typical

for ATM networks)

• three sets of 9 KBy IP packets were converging on a link with less than

50% that amount of buffering available, resulting in most of the packets (roughly 65%) being destroyed by cell loss at the switch output port

The next generation of ATM switches (e.g.: Fore LC switch modules) had much more buffering: 32K cells (1500 KB), and this problem went away.

(61)

MAGIC

The MAGIC Network Performance Test Configuration

Ft. Leavenworth, KS Lawrence, KS Kansas City , KS Sioux Falls, SD Minneapolis, MN X Sprint TIOC OC-48 OC-12 OC-12 Sprint SONET Network _{700 Km.} ~15 ms latenc y inc luding tw o A T M s witc hes OC-48 ATM switch DPSS (sender 3) Receiver DPSS DPSS master ATM switch NTP server DPSS (sender 1) NTP server ATM switch ATM switch

(62)

MAGIC time (ms) 3,000 4,000 5,000 1,000 2,000 0 master in app send app receive TCP retrans master out server in start write end read start read

Three servers, ATM WAN, SS-10s as servers, tv_sim on SGI Onyx

“tvlog.edc” “tvlog.uswest” “tvlog.tioc” “edc.serv_flush.log” “tioc.serv_flush.log” “uswest.serv_flush.log” “edc.net.tcp.retrans.log” “tioc.net.tcp.retrans.log” “uswest.net.tcp.retrans.log”

(63)

(64)

For more information see:

http://www-didc.lbl.gov/DPSS