• No results found

Unified Performance Data Collection with Score-P

N/A
N/A
Protected

Academic year: 2021

Share "Unified Performance Data Collection with Score-P"

Copied!
18
0
0

Loading.... (view fulltext now)

Full text

(1)

Unified Performance Data Collection with

Score-P

Bert Wesarg1)

With contributions from

Andreas Knüpfer1), Christian Rössel2), and Felix Wolf3) 1)ZIH TU Dresden, 2)FZ Jülich, 3)GRS-SIM Aachen

(2)

SC‘12: Workshop on Extreme-Scale Performance Tools

Fragmentation of Tools Landscape

• Several performance tools co-exist

• Separate measurement systems and output formats • Complementary features and overlapping functionality • Redundant effort for development and maintenance • Limited or expensive interoperability

• Complications for user experience, support, training

Vampir VampirTrace OTF Scalasca EPILOG / CUBE TAU TAU native formats Periscope Online measurement

(3)

SC‘12: Workshop on Extreme-Scale Performance Tools

SILC Project Idea

• Start a community effort for a common infrastructure

– Score-P instrumentation and measurement system

– Common data formats OTF2 and CUBE4

• Developer perspective:

– Save manpower by sharing development resources – Invest in new analysis functionality and scalability

– Save efforts for maintenance, testing, porting, support, training

• User perspective:

– Single learning curve

– Single installation, fewer version updates – Interoperability and data exchange

• SILC project funded by BMBF

• Close collaboration PRIMA project funded by DOE

(4)

SC‘12: Workshop on Extreme-Scale Performance Tools

Partners

• Forschungszentrum Jülich, Germany

• German Research School for Simulation Sciences, Aachen, Germany

• Gesellschaft für numerische Simulation mbH Braunschweig, Germany

• RWTH Aachen University, Germany

• Technische Universität Dresden, Germany • Technische Universität München, Germany • University of Oregon, Eugene, USA

(5)

SC‘12: Workshop on Extreme-Scale Performance Tools

Score-P Functionality

• Provide typical functionality for HPC performance tools • MPI, OpenMP, and hybrid parallelism (and serial)

• Enhanced functionality (OpenMP 3.0, CUDA, highly scalable I/O)

• Instrumentation (various methods)

• Flexible measurement without re-compilation:

– Basic and advanced profile generation

– Event trace recording

– Online access to profiling data

(6)

SC‘12: Workshop on Extreme-Scale Performance Tools

Non-functional requirements

• Portability: all major HPC platforms • Scalability: petascale

• Low measurement overhead

• Easy and uniform installation through UNITE framework • Robustness and QA

(7)

SC‘12: Workshop on Extreme-Scale Performance Tools

Score-P Architecture

Score-P instrumentation wrapper

Application (MPI×OpenMP×CUDA)

Vampir Scalasca TAU Periscope

Compiler Compiler OPARI 2 POMP2 CUDA CUDA User User PDT TAU

Score-P measurement infrastructure

Event traces (OTF2) Call-path profiles (CUBE4, TAU)

Online interface

Hardware counter (PAPI, rusage)

PMPI

(8)

SC‘12: Workshop on Extreme-Scale Performance Tools

Score-P Release History and Roadmap

• 1.0 (Jan 2012)

– Baseline with MPI and OpenMP support

– Bugfix releases: 1.0.1 (Apr 2012) and 1.0.2 (Jun 2012)

• 1.1 (Oct 2012)

– Basic CUDA support – OpenMP Task profiling – ARM support

• 1.2 (Jun 2013)

– New generic trace events for RMA records (will be used for MPI 3.0 one-side, ARMCI, SHMEM, CUDA, any PGAS)

– HMPP, OmpSs, UPC? – Pthreads

– I/O?

(9)

SC‘12: Workshop on Extreme-Scale Performance Tools

Future Features and Management

• Scalability to maximum available CPU core count • Support for sampling, binary instrumentation

• Support for new programming models, e.g. PGAS • Support for new architectures

• Ensure a single official release version at all times which will always work with the tools

• Allow experimental versions for new features or research • Commitment to joint long-term cooperation

(10)

SC‘12: Workshop on Extreme-Scale Performance Tools

Performance Tools

Right now, the following performance tools support Score-P:

• Periscope – online bottleneck search

• Scalasca – call path profiling and wait-state analysis

• Vampir – trace visualization

(11)

SC‘12: Workshop on Extreme-Scale Performance Tools

• Scalable distributed tree-like architecture • Iterative on-line profiling using Score-P • Automatic search for performance

inefficiencies:

– MPI wait states – Single core stalls

– OpenMP overheads

– OpenMP scalability

• New BSD license

www.lrr.in.tum.de/periscope

(12)

SC‘12: Workshop on Extreme-Scale Performance Tools

• Scalable performance-analysis toolset for parallel codes • Automatic event trace analysis to identify potential

performance bottlenecks

– Focus on communication & synchronization

• Programming models

– MPI, OpenMP

– Future: support for PGAS and accelerators

www.scalasca.org

(13)

SC‘12: Workshop on Extreme-Scale Performance Tools

Vampir – Interactive Performance Analysis

Highly scalable interactive performance analysis with event trace visualization • As complement to

automatic analysis • Allows to explore

and examine all

aspects of dynamic parallel behavior

• Very scalable Server version

(14)

SC‘12: Workshop on Extreme-Scale Performance Tools

TAU Performance System®

• Profiling and tracing toolkit with automatic instrumentation, measurement and analysis support

• Uses Score-P for profiling and tracing using native OTF2 generation capability

• Fortran, C++, C, UPC, Java, Python, Chapel

• MPI, Pthreads, OpenMP, CUDA, OpenCL, OpenSHMEM, and a combination of threads and MPI

• Goal: to support all HPC platforms, compilers and runtime systems. Widely ported.

• 3D profile browser, ParaProf, data mining and cross experiment analysis tool, PerfExplorer, and performance database technology (PerfDMF)

• Interfaces with Score-P, PAPI, Vampir, MAQAO, and Scalasca • BSD style license

(15)

SC‘12: Workshop on Extreme-Scale Performance Tools

The Whole Ensemble

CUBE4 report Online interface Instr. target application Score-P PAPI Scalasca wait-state analysis OTF2 traces TAU PerfExplorer

Periscope TAU ParaProf

CUBE

(16)

SC‘12: Workshop on Extreme-Scale Performance Tools

The Whole Ensemble

CUBE4 report Online interface Instr. target application

Score-P

PAPI Scalasca wait-state analysis CUBE4 report OTF2 traces TAU PerfExplorer

Periscope TAU ParaProf

CUBE

(17)

SC‘12: Workshop on Extreme-Scale Performance Tools

Extreme Scale Performance Monitoring

• We need an extreme-scale performance monitor for extreme-scale performance tools

• This can’t be done alone • So …

(18)

SC‘12: Workshop on Extreme-Scale Performance Tools

Visit

www.score-p.org

References

Related documents

This research paper brings forward a non-standard convertible zero-coupon bond endowed with a set of distinctive features attached to it so as to strengthen the

In view of this, Nigeria should chose between encouraging the mad rush for university paper qualifications otherwise known as degrees and the proliferation of

Thus, agents buy commodities in spot markets and receive signals from commodity prices and asset payments, that allow them to improve the private information about the realization

Per cent of regional GDP 0 10 20 30 40 50 Arctic Russia Arctic Finland Arctic Sweden Arctic Norway Green- land Faroe Islands Iceland Arctic Canada Alaska 100 Per cent Other

Keywords: Cooperative Extension programs, Cooperative Extension, Extension, Health and Wellness Framework, ECOP Action Teams, health, public health, health care

For this reason, the initial values of the process parameters such as the input and output thicknesses, rolling speeds (AV) and thickness reductions (R) are extracted under

[r]

How much integration of residential and non-residential land uses is visible in this segment?. No integration A little integration Some integration A lot of integration Level