
(1)

HPC data becomes Big Data

Peter Braam

(2)

me

1983 - 2000 Academia

• Maths & Computer Science

Entrepreneur with startups (5x)

• 4 startups sold

• Lustre emerged

• Held executive jobs with acquirers

2014 – Independent, advise, research

(3)

Contents

§ Introduction market & key questions

§ Some Big Data problems & Algorithms

§ HPC storage

§ Cloud storage

(4)
(5)

Two Questions

§ Given an HPC storage system, how can it be used for Big Data Analysis?

§ What storage platforms are candidates to meet

(6)

IDC market data

§ % of sites using co-processors: 28.2% (2011), 76.9% (2013)

§ HPC sites performing big data analysis: 67%

§ % of compute cycles dedicated to big data: 30%

§ % of sites using cloud infrastructure for HPC: 18.8% (2011), 23.5% (2013)

§ Year over year growth in high density servers ($): 25.5%

§ Year over year growth in servers ($): -6.2%

(7)

Other facts

§ Flash and much faster persistent memory tiers are inevitably coming.

§ Multiple software challenges arise from this

§ Management of tiers

§ Much faster storage software to keep up with devices

§ Gap between disk and other system performance continues to increase

§ There is “embedded” processing on servers with attached storage and client-server processing with clients networked to servers.

(8)
(9)

Big Data Problems – samples

§ Input generally from simulation or sensors

§ Climate modeling – simulate then …

§ Find the hottest day each year in Cape Town

§ Find very low pressure spots (typhoons) on Earth

§ Genomics, Astronomy

§ Find patterns (e.g. strings, galaxies) in huge data sets

§ Pre-process data at TB/sec rates

§ Data management
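The "hottest day each year" query above is a single streaming pass that keeps one running maximum per year. The sketch below is only an illustration: the function name `hottest_day_per_year`, the record layout (date, temperature) and the sample values are assumptions, not part of the talk.

```python
from datetime import date


def hottest_day_per_year(readings):
    """Return {year: (day, temperature)} for the hottest reading in each year.

    `readings` is any iterable of (date, temperature_celsius) pairs, so it can
    be a generator that streams records from disk instead of a list in RAM.
    """
    best = {}
    for day, temp in readings:
        current = best.get(day.year)
        # Keep the first reading seen for a year, or any strictly hotter one.
        if current is None or temp > current[1]:
            best[day.year] = (day, temp)
    return best


# Made-up sample values, for illustration only (not real Cape Town data).
sample = [
    (date(2012, 1, 14), 33.5),
    (date(2012, 2, 3), 36.1),
    (date(2013, 1, 20), 35.0),
    (date(2013, 3, 2), 34.2),
]
result = hottest_day_per_year(sample)
```

Because only one entry per year is kept, memory use is independent of the input size; the cost is dominated by reading the data once.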

(10)

Big Data Problems – samples 2

§ Social network, advertising & intelligence

§ Most of these become graph problems, some very hard

§ Non-compliance in stock market transaction logs

§ Replace legacy consumer information data warehousing with modern analytics

§ Replacements of Teradata / Netezza sometimes difficult

(11)

Wide variations

§ Some problems (e.g. some graph problems) must be executed in RAM. Graph500 benchmark: 2000x speedup in 2.5 years

§ Other problems require many iterations through disk-resident data

§ Netezza analytics systems use FPGA’s for

(12)

Big Data Algorithms

§ Considerable variation

§ Machine learning

§ Bayesian analysis

§ Indexing, sorting – DB like

§ Graph algorithms

§ Maximal Information Coefficients – generalize regressions

§ Compressed sensing (aka sparse recovery)

§ Topological Data Analysis

(13)

Ogres …

§ Analogously to the Berkeley Dwarfs, big data problems have been classified; see:

Geoffrey Fox, Judy Qiu, “Understanding Big Data Applications and Architectures”, 1st JTC 1 SGBD Meeting, SDSC San Diego, March 19 2014

(14)

So…

Given these variations a single architecture is not likely to address all big data problems well.

(15)
(16)

HPC data

§ Traditional model – cluster file system and

§ Single Shared File (with # cores readers / writers)

§ File Per Process (and 1 process per core …)

§ Tightly coupled problems allow little scheduling of “tasklets” or redistribution of I/O

§ Problems…

§ Throughput == #server nodes x (speed of slowest node)
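The throughput rule above can be written down as a small model. This is a sketch of the slide's formula only; the function name `aggregate_throughput`, the node count and the per-node speeds are hypothetical values chosen for illustration.

```python
def aggregate_throughput(node_speeds_gb_s):
    """Throughput of a tightly coupled job: #server nodes x speed of slowest node."""
    if not node_speeds_gb_s:
        return 0.0
    return len(node_speeds_gb_s) * min(node_speeds_gb_s)


# Hypothetical cluster: 100 servers, each nominally delivering 3 GB/s.
uniform = [3.0] * 100
# Same cluster, but a single server runs at half speed.
one_slow = [3.0] * 99 + [1.5]

uniform_total = aggregate_throughput(uniform)
degraded_total = aggregate_throughput(one_slow)
```

In this model one server at half speed halves the throughput of the whole system, which is why uniformly good hardware matters so much for tightly coupled I/O.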

(17)

Results quite reasonable

§ Systems like Lustre, GPFS, Panasas

§ Use carefully configured and tested hardware

§ Fast networks

§ Deliver 80% of slowest hardware component

§ Pipelines from clients to disk are uniformly wide

§ Servers can deliver ~3GB/sec / controller

§ Achilles heels:

§ Metadata

§ Availability

(18)

A sample of hard cases…

§ First write, then read. Why the gap? Opening & creating files is too slow. Should run >2x faster!

§ First seen at ORNL in 2006.

§ Metadata performance on Sequoia and on Cove (50 & 5 SSD drives): low 1000’s to ~15K ops / sec

§ Maximum seen ever: ~50K ops

(19)

HPC hard cases ctd

Larger numbers of concurrent metadata clients are not easy.

Conclusion:

1. Problems with systems like Lustre remain

2. Sensitivity to uniformly good hardware

3. Honest data from the users & understanding exists

4. It has been used at very large scale

(20)

Cloud data into HPC file system

§ Intel’s FastForward project

§ Ingest massive ACG graphs through Hadoop

§ Represent ACG using an HDF5 adaptation layer (HAL) & in Lustre DAOS objects.

§ Then compute.

Acknowledgement: Figure from Intel’s hpdd.intel.com wiki

(21)
(22)

Hybrid solutions may be best

TACC “Wrangler” system

§ Big Data “companion” to Stampede

§ DSSD storage is PCI connected and has KV interface

§ 120 node Dell cluster with DSSD storage

275M IOPS

Undoubtedly

§ This will solve many big data problems well

§ There will be problems that don’t fit or for which

(23)

Typical Cloud Storage

§ Combines

§ memcached

§ key value stores or DB’s

§ Relational, Distributed Key Value, Embedded Key Value

§ MySQL, Cassandra / HBase, RocksDB / LevelDB

§ object stores (Swift, Ceph, …)

§ Results

§ Read heavy loads from one cluster

§ 100’s of servers serving 10M’s of requests/sec

§ Only the embedded DBs keep up with flash and NVRAM

§ Flash means: ~10us / read or write, RAM means <1us

(24)

Manageability

§ AWS elastic cloud – a masterpiece

§ Open source solutions do something similar

(25)

Tiered storage

When is tiered storage important?

§ For HPC, dumping RAM requires a flash cache

§ Likely of increased importance:

§ L1,2,3 – PCM – Flash – Disk – Tape

Tiered storage can use container concept

§ Cache misses fetch a container to faster memory

§ High bandwidth transfers container relatively quickly

§ One time latency – e.g. 1 sec

§ Then speed of faster tiers
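The container idea above amounts to paying one large latency once and amortizing it over many fast accesses. The sketch below assumes a 1 sec one-time fetch (the figure on this slide) and 10 us per access on the faster tier (an assumed flash-like value); the function name `amortized_access_time` is hypothetical.

```python
def amortized_access_time(accesses, fetch_latency_s=1.0, fast_access_s=10e-6):
    """Average time per access when one container fetch precedes `accesses` fast reads.

    fetch_latency_s: one-time cost of moving the container to the faster tier.
    fast_access_s:   assumed per-access cost once the container is resident.
    """
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return (fetch_latency_s + accesses * fast_access_s) / accesses


single = amortized_access_time(1)          # the fetch dominates completely
many = amortized_access_time(1_000_000)    # the fetch is almost fully amortized
```

With a single access the container fetch costs about 1 sec per access; with a million accesses the average is about 11 us, close to the speed of the faster tier.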

(26)

Cloud object stores - CEPH

§ Object is file with an “id” not with a name

§ CEPH manages

§ Removal and addition of storage

§ Failed nodes, racks

§ Quite clever load balancing and data placement

(27)

Cloud objects still to demonstrate

§ HPC bandwidth == #nodes x BW/node

§ only limited testing at scale, no models

§ Not yet clear: how it integrates with tiered storage

(28)

Data layout - placement

§ How to place many stripes?

§ Bottleneck in RAID arrays:

§ Rebuilding a drive proceeds at the BW of 1 drive – takes days

§ Parity de-clustering & distributed spare

§ Rebuild at BW of N drives (N = 60 / 600 / 6000?)

§ For e.g. 10+2 redundancy, speedup 60/10, 600/10, etc.

§ Benefit is large 5x – 100x+

§ Algorithms & math are hard: block mappings
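The speedup arithmetic on this slide (N drives divided by the data width of the stripe, e.g. 60/10 for 10+2) can be sketched as follows. The drive capacity and the effective rebuild rate are hypothetical values chosen for illustration, and the function names are not from the talk.

```python
def declustered_rebuild_speedup(num_drives, data_units):
    """Speedup estimate from the slide: N drives share the work of a data_units-wide stripe."""
    return num_drives / data_units


def rebuild_hours(capacity_tb, drive_bw_mb_s, speedup=1.0):
    """Hours to rebuild one drive of `capacity_tb` at `drive_bw_mb_s`, scaled by `speedup`."""
    seconds = capacity_tb * 1e6 / drive_bw_mb_s / speedup  # 1 TB = 1e6 MB
    return seconds / 3600


# Assumed values: an 8 TB drive rebuilt at an effective 25 MB/s under load.
speedup_60 = declustered_rebuild_speedup(60, 10)       # 10+2 over 60 drives
traditional = rebuild_hours(8, 25)                     # classic RAID rebuild
declustered = rebuild_hours(8, 25, speedup=speedup_60)
```

Under these assumptions the classic rebuild takes roughly 89 hours, while the declustered layout over 60 drives brings it down to roughly 15 hours.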

(29)

Data layout – erasure codes

§ How to rebuild a single stripe faster

§ Generalizes RAID, Reed-Solomon codes etc.

§ Benefits stripe reconstruction I/O 1-2x

§ Tons of attention and publications

§ If the network is the slowest component this is
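As a minimal illustration of stripe reconstruction, the sketch below uses the simplest erasure code, a single XOR parity block (RAID-5 style), which Reed-Solomon codes generalize to multiple parity blocks. It is not the scheme of any particular system; the block contents and names are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A stripe of four data blocks plus one parity block (a 4+1 layout).
data = [b"HPC ", b"data", b" is ", b"big!"]
parity = xor_blocks(data)

# Lose one data block, then rebuild it from the survivors plus the parity.
lost_index = 2
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])
```

Rebuilding one block here reads every surviving block of the stripe, which is the reconstruction I/O that more elaborate codes try to reduce.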

(30)
(31)

Conclusions

§ There are many Big Data algorithms

§ There are many cloud storage solutions

§ Big data on HPC – several vendors

§ New specialized solutions (DSSD)

§ More attention for modeling the problems & solutions

(32)

References
