
© 2013 IBM Corporation

Data Centric Systems (DCS)

Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics

High Performance Computing with Data Centric Systems


Data Centric Systems for a new Era


Powerful Information: Big Data and the New Era of Computing

Dimensions of data growth
• Volume: Terabytes to exabytes of existing data to process
• Variety: Structured, unstructured, text, multimedia
• Velocity: Streaming data, milliseconds to seconds to respond
• Veracity: Uncertainty from inconsistency, ambiguities, etc.

[Figure: Data volume is on the rise. Volume in exabytes, 2010 to 2015, by source: sensors and devices, VoIP, enterprise data, social media.]

Big Data and Exascale High Performance Computing are driving many similar computer systems requirements

© 2014 International Business Machines Corporation

Data is Becoming the World’s New Natural Resource

• 1 trillion connected objects and devices on the planet generating data by 2015
• 2.5 billion gigabytes of data generated every day
• Data is the new basis of competitive advantage


The Challenge is to move from Data to Insight to Decisions


Big Data Driving Common Requirements

High Performance Analytics
• Unstructured data
• Giga- to petascale data
• Primarily data mining algorithms

High Performance Computing
• Structured data
• Exascale data sets
• Primarily scientific calculations

Driver: Doing more with models
– Real-time decision making
– Uncertainty quantification
– Sensitivity analysis
– Metadata extraction

Driver: Enhanced context improves decision making
– Incorporate modeling and simulation for better predictions
– Incorporate sensor data

Evolving requirements

Data Centric Systems


Architecture


Big Data Requires a Data-Centric Architecture

Compute-Centric Model
• Data lives on disk and tape
• Move data to CPU as needed
• Deep storage hierarchy

Data-Centric Model
• Data lives in persistent memory
• Many CPUs surround and use the data
• Shallow/flat storage hierarchy

[Diagram labels: input, output, Massive Parallelism, Persistent Memory]

Major impact on hardware, systems software, and application design
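The difference between the two models can be illustrated with a toy Python sketch. It is not taken from the slides: `StorageNode`, `read_all`, `run_near_data`, and the 8-byte record size are hypothetical names and values chosen only for illustration. The sketch counts how many bytes cross the (simulated) interconnect when data is moved to the compute, versus when the computation is shipped to where the data lives.

```python
from dataclasses import dataclass
from typing import Callable, List

RECORD_BYTES = 8  # assumed size of one record, for illustration only


@dataclass
class StorageNode:
    """Hypothetical node that holds a data partition and can also compute on it."""
    records: List[int]
    bytes_sent: int = 0  # bytes this node has shipped over the simulated network

    def read_all(self) -> List[int]:
        # Compute-centric access: every record travels to the caller.
        self.bytes_sent += len(self.records) * RECORD_BYTES
        return list(self.records)

    def run_near_data(self, fn: Callable[[List[int]], int]) -> int:
        # Data-centric access: the function runs here, only the result travels.
        result = fn(self.records)
        self.bytes_sent += RECORD_BYTES
        return result


def compute_centric_sum(nodes: List[StorageNode]) -> int:
    """Pull all data to a central CPU, then reduce it there."""
    total = 0
    for node in nodes:
        total += sum(node.read_all())
    return total


def data_centric_sum(nodes: List[StorageNode]) -> int:
    """Reduce on each node, then combine the small partial results."""
    return sum(node.run_near_data(sum) for node in nodes)


def make_nodes(count: int = 4, per_node: int = 1000) -> List[StorageNode]:
    """Build `count` nodes holding consecutive integer records."""
    return [
        StorageNode(list(range(i * per_node, (i + 1) * per_node)))
        for i in range(count)
    ]


if __name__ == "__main__":
    a, b = make_nodes(), make_nodes()
    print(compute_centric_sum(a), sum(n.bytes_sent for n in a))
    print(data_centric_sum(b), sum(n.bytes_sent for n in b))
```

Both functions return the same total, but in this toy setup the data-centric variant moves one record-sized result per node instead of the whole partition. Real systems would also have to account for the cost of shipping code and for compute capacity near the data, which the sketch ignores.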

IBM Research

Data Centric Design Principles
• Principle 1: Minimize data motion
• Principle 2: Compute Everywhere
• Principle 3: Modularity
• Principle 4: Application-driven design
• Principle 5: Leverage OpenPOWER to Accelerate Innovation

Massive data requirements drive a composable architecture for big data, complex analytics, modeling and simulation. The DCS architecture will appeal to segments experiencing an explosion of data and the associated computational demands.


OpenPOWER Foundation

Mission: The OpenPOWER Consortium’s mission is to create an open ecosystem, using the POWER Architecture to share expertise, investment and validated and compliant server-class IP to serve the evolving needs of customers.
– Opening the architecture to give the industry the ability to innovate across the full Hardware and Software stack
  • Includes SOC design, Bus Specifications, Reference Designs, FW OS and Hypervisor Open Source
– Driving an expansion of enterprise class Hardware and Software stack for the data center
– Building a vibrant and mutually beneficial ecosystem for POWER

Example: POWER CPU + Tesla GPU


Building collaboration and innovation at all levels

Welcoming new members in all areas of the ecosystem

100+ inquiries and numerous active dialogues underway

30th member just signed

• Implementation / HPC / Research
• System / Software / Services
• I/O / Storage / Acceleration
• Boards / Systems
• Chip / SOC


DCS System Attributes
• Commercially Viable System
– High Performance Analytics (HPA) and HPC markets
– Scale from sub-rack to 100+ rack systems
– Composed of cost effective components
– Upgradeable and modular solutions
• Holistic System Design
– Storage, network, memory, and processor architecture and design
– Market demands, customer input, and workflow analysis influence design
• Extensive Software Stack
– Compiler, tool chain, and ecosystem support
– O/S, virtualization, system management
– Evolutionary approach to retain prior investments
• System Quality
– Reliability, availability, serviceability

New Data Centric Model


Software


Software Challenges Beyond 2017
• Exascale systems must address science and analytics mainstreams
– Broad range of users and skills
– Range of computation requirements: dense compute, visualization, etc.
• Applications will be workflow driven
– Workflows involve many different applications
– Complex mix of algorithms
– Strong capability requirements
– UQ, sensitivity analysis, validation and verification elements
– Usability / consumability
• Unique large-scale attributes must also be addressed
– Reliability
– Energy efficiency
• Underlying data scales pose a significant challenge, which complicates system requirements
– Convergence of analytics, modeling, simulation, visualization, and data management

Data Centric Model


© 2012 IBM Corporation

Role of the Programming Model in Future Systems
• The focus of recent programming models is on node-level complexity
• Computation and control are the primary drivers
– Data and communication are secondary considerations
– Little progress on system-wide programming models
  • MPI, SHMEM, PGAS ...
• Future data-centric systems will be workflow driven, with computation occurring at different levels of the memory and storage hierarchy
– New and challenging problems for software and application developers
• Programming models must evolve to encompass all aspects of data management and computation, requiring a data-centric programming abstraction where:
– Compute, data and communication are equal partners
– A high-level abstraction gives users the means to reason about data and associated memory attributes
– There will be co-existence with lower-level programming models

Some Strategic Directions – With Pragmatic Choices
• Refine the node-level model and extend to support system wide elements
• Separate programming model concerns from language concerns
• Revisit holistic language approach – build on the ‘endurance’ of HPCS languages and evolve the concepts developed there:
– Places, Asynchrony ...
• Investigate programming model concepts that are applicable across mainstream and HPC
• Focus on delivery through existing languages:
– Libraries, runtimes, compilers and tools
• Strive for cross vendor support
• Pursue open collaborative efforts – but in the context of real integrated system development


Workflows

© 2010 IBM Corporation

Establishing Workload-Optimized Systems Requirements

[Figure: conceptual map of workflows by operation type (floating point OPS, integer OPS) and spatial locality (low, high). Labeled regions: region defined by LINPACK; compute centric applications; data centric applications; conceptual view of data intensive with floating point workflow; conceptual view of data intensive-integer workflow; conceptual view of data fusion/uncertainty quantification workflow.]


DCS Workflows: mixed compute capabilities required

12/9/13

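[Figure: example DCS workflows by domain. Domains: All Source Analytics, Oil and Gas, Science, Financial Analytics. System sizes shown: 4-40 racks, 2-20 racks, 1-5+ racks, 1-100’s of racks. Workloads shown: Analytics, Reservoir, Graph Analytics, Throughput, Seismic, Value At Risk, Image Analysis, Capability.]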

Massively Parallel Compute Capability:

• Simple kernels

• Ops dominated (e.g. DGEMM, Linpack)

• Simple data access patterns

• Can be preplanned for high performance

Analytics Capability:

• Complex code

• Data Dependent Code Paths / Computation

• Lots of indirection / pointer chasing

• Often Memory System Latency Dependent

• C++ templated codes

• Limited opportunity for vectorization

• Limited scalability

• Limited threading opportunity
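The contrast between "simple data access patterns" that "can be preplanned" and "lots of indirection / pointer chasing" can be sketched in Python. This is an illustrative toy, not something from the slides; the function names `dense_access_order`, `pointer_chase_order`, and `random_cycle` are hypothetical. In the dense case the full sequence of indices is known before any data is read, so it can be scheduled or prefetched. In the pointer-chasing case each next index is only known after the previous element has been loaded, which is why such codes tend to be bound by memory latency.

```python
import random
from typing import List


def dense_access_order(n: int) -> List[int]:
    """Index order for streaming over n elements: known before any data is read."""
    return list(range(n))


def pointer_chase_order(next_index: List[int], start: int = 0) -> List[int]:
    """Follow links from `start` until the cycle closes.

    Each step depends on the value loaded in the previous step,
    so the order cannot be computed without touching the data.
    """
    order = []
    i = start
    while True:
        order.append(i)
        i = next_index[i]  # the next address is stored in the data itself
        if i == start:
            break
    return order


def random_cycle(n: int, seed: int) -> List[int]:
    """Build a link table forming one cycle over all n slots in shuffled order."""
    rng = random.Random(seed)
    rest = list(range(1, n))
    rng.shuffle(rest)
    path = [0] + rest
    next_index = [0] * n
    for a, b in zip(path, path[1:] + path[:1]):
        next_index[a] = b
    return next_index


if __name__ == "__main__":
    links = random_cycle(8, seed=1)
    print("dense:  ", dense_access_order(8))
    print("chasing:", pointer_chase_order(links))
```

The chased order visits every slot exactly once, just like the dense order, but the sequence is determined by the contents of `links` rather than by the loop structure.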


Degrees of Parallelism in global scope

[Figure: degrees of parallelism plotted against address scope. Axes: increasing address scope, increasing parallelism.]

Address scope levels shown: Register / Vector; Local Store; High Bandwidth Memory; Socket DRAM; SMP Memory; File System addressable Memory

Parallelism levels shown: Single Thread; Single Thread with SIMD; Vector SIMD with Predication; Multi Thread; SMP; Cluster

Global Address:
• H/W conventions
• Coherency mechanisms

Global Names:
• S/W conventions for object naming
• Example: K/V stores

Disjoint Address Spaces:
• I/O space
• Accelerator space

Distributed Computing:
• Active Memory
• Active Communication
• Active Storage

Summary – Data Centric Systems
• The ability to generate, access, manage and operate on ever larger amounts and varieties of data is changing the nature of traditional technical computing. Technical Computing is evolving, and the market is changing.
• The extraction of meaningful insights from Big Data, enabling real-time and predictive decision making across a range of industrial, commercial, scientific and government domains, requires computation techniques similar to those that have traditionally been characteristic of Technical Computing
– The era of Cognitive Supercomputers:
  • Convergence in many future workflow requirements
  – Big Data driven analytics, modeling, visualization, and simulation
• A time of significant disruption: data is emerging as the “critical” natural resource of this century
– An optimized full system design and integration challenge
• Innovation in multiple areas, building off of an open ecosystem working with our OpenPOWER partners:
– System architecture and design, modular building blocks
– Hardware technology
– Software enablement
– Best in class resilience
• Development of DCS systems in close collaboration with technology providers and partners in multiple areas with focus on:
– Workload-driven co-design
– New hardware and software technologies
– Programming models and tools
– Open-source software
