
Chapter 2. Architecture and technical overview

2.1 The IBM POWER7 processor

2.1.2 POWER7 processor core

Each POWER7 processor core implements aggressive out-of-order (OoO) instruction execution to drive high efficiency in the use of available execution paths. The POWER7 processor has an Instruction Sequence Unit that is capable of dispatching up to six instructions per cycle to a set of queues. Up to eight instructions per cycle can be issued to the Instruction Execution units. The POWER7 processor has a set of twelve execution units as follows:

• 2 fixed point units
• 2 load store units
• 4 double precision floating point units
• 1 vector unit
• 1 branch unit
• 1 condition register unit
• 1 decimal floating point unit
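As a quick consistency check on the list above, the following Python sketch tallies the execution units and records the dispatch and issue widths quoted in the text. The dictionary keys and constant names are illustrative labels chosen for this sketch, not IBM identifiers.

```python
# Execution units per POWER7 core, as listed in the text.
# Key names are illustrative labels, not official IBM identifiers.
EXECUTION_UNITS = {
    "fixed_point": 2,
    "load_store": 2,
    "double_precision_floating_point": 4,
    "vector": 1,
    "branch": 1,
    "condition_register": 1,
    "decimal_floating_point": 1,
}

MAX_DISPATCH_PER_CYCLE = 6  # instructions dispatched to the queues per cycle
MAX_ISSUE_PER_CYCLE = 8     # instructions issued to the execution units per cycle


def total_execution_units(units=EXECUTION_UNITS):
    """Return the total number of execution units in one core."""
    return sum(units.values())


if __name__ == "__main__":
    print("Execution units per core:", total_execution_units())
```

The total matches the twelve execution units stated in the text, and the issue width of eight stays below that total.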

The caches that are tightly coupled to each POWER7 processor core are:

• Instruction cache: 32 KB
• Data cache: 32 KB
• L2 cache: 256 KB, implemented in fast SRAM

Technology                         POWER7 processor
Die size                           567 mm²
Fabrication technology             45 nm lithography; copper interconnect; Silicon-on-Insulator; eDRAM
Components                         1.2 billion components (transistors) offering the equivalent function of 2.7 billion (for further details, see 2.1.6, "On-chip L3 cache innovation and Intelligent Cache")
Processor cores                    8
Max execution threads core/chip    4/32
L2 cache core/chip                 256 KB/2 MB
On-chip L3 cache core/chip         4 MB/32 MB
DDR3 memory controllers            2
SMP design-point                   32 sockets with IBM POWER7 processors
Compatibility                      With prior generation of POWER processor
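The per-chip values in the table follow from the per-core values multiplied by the eight cores on the chip. A minimal Python sketch of that arithmetic is shown here; the constant and function names are hypothetical and exist only for this illustration.

```python
# Per-core figures taken from the table; names are illustrative only.
CORES_PER_CHIP = 8
THREADS_PER_CORE = 4   # maximum execution threads per core (SMT4)
L2_KB_PER_CORE = 256   # L2 cache per core, in KB
L3_MB_PER_CORE = 4     # on-chip L3 cache per core, in MB


def chip_totals():
    """Derive the per-chip figures for a full 8-core POWER7 chip."""
    return {
        "threads": CORES_PER_CHIP * THREADS_PER_CORE,
        "l2_mb": CORES_PER_CHIP * L2_KB_PER_CORE / 1024,
        "l3_mb": CORES_PER_CHIP * L3_MB_PER_CORE,
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(name, value)
```

The results agree with the core/chip pairs in the table: 4/32 threads, 256 KB/2 MB of L2 cache, and 4 MB/32 MB of L3 cache.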

2.1.3 Simultaneous multithreading

An enhancement in the POWER7 processor is the addition of the SMT4 mode to enable four instruction threads to execute simultaneously in each POWER7 processor core. Thus, the instruction thread execution modes of the POWER7 processor are as follows:

• SMT1: single instruction execution thread per core
• SMT2: two instruction execution threads per core
• SMT4: four instruction execution threads per core

SMT4 mode enables the POWER7 processor to maximize the throughput of the processor core by offering an increase in processor-core efficiency. SMT4 mode is the latest step in an evolution of multithreading technologies introduced by IBM. The diagram in Figure 2-3 shows the evolution of simultaneous multithreading.

Figure 2-3 Evolution of simultaneous multithreading

The various SMT modes offered by the POWER7 processor allow flexibility, enabling users to select the threading technology that meets an aggregation of objectives such as performance, throughput, energy use, and workload enablement.

Intelligent Threads

The POWER7 processor features Intelligent Threads that can vary based on the workload demand. Either the system automatically selects, or the system administrator manually selects, whether a workload benefits from dedicating as much capability as possible to a single thread of work, or whether it benefits more from having capability spread across two or four threads of work. With more threads, the POWER7 processor can deliver more total capacity because more tasks are accomplished in parallel. With fewer threads, workloads that need very fast individual tasks can get the performance they need for maximum benefit.

[Figure 2-3 content: multithreading evolution, showing thread occupancy of the execution units FX0, FX1, FP0, FP1, LS0, LS1, BRX, and CRL for 1995 (single thread out of order), 1997 (hardware multithreading), 2003 (2-way SMT), and 2009 (4-way SMT).]

2.1.4 Memory access

Each POWER7 processor chip has two DDR3 memory controllers, each with four memory channels (enabling eight memory channels per POWER7 processor). Each channel operates at 6.4 Gbps and can address up to 32 GB of memory. Thus, each POWER7 processor chip is capable of addressing up to 256 GB of memory.
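The 256 GB figure is the product of the controller count, the channels per controller, and the capacity per channel. The Python sketch below reproduces that calculation; the function and parameter names are hypothetical and chosen only for illustration.

```python
def max_addressable_memory_gb(active_controllers=2,
                              channels_per_controller=4,
                              gb_per_channel=32):
    """Return the maximum addressable memory per chip, in GB.

    Defaults are the figures quoted in the text for a POWER7 chip
    with both DDR3 memory controllers active.
    """
    channels = active_controllers * channels_per_controller
    return channels * gb_per_channel


if __name__ == "__main__":
    print(max_addressable_memory_gb())
```

Passing `active_controllers=1` models a configuration with a single active controller. The value that results (128 GB) is an extrapolation that assumes the same 32 GB per channel; it is not a figure stated in the text.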

Figure 2-4 gives a simple overview of the POWER7 processor memory access structure.

Figure 2-4 Overview of POWER7 memory access structure

2.1.5 Flexible POWER7 processor packaging and offerings

POWER7 processors can be optimized for various workload types. For example, database workloads typically benefit from very fast processors that handle high transaction rates at high speeds. Web workloads typically benefit more from processors with many threads, which allow Web requests to be broken down into many parts and handled in parallel. POWER7 processors are designed to provide leadership performance in either case.

POWER7 processor 4-core and 6-core offerings

The base design for the POWER7 processor is an 8-core processor with 32 MB of on-chip L3 cache (4 MB per core). However, the architecture allows for differing numbers of processor cores to be active: 4 cores or 6 cores, as well as the full 8-core version.
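The 4 MB per core figure gives a simple way to estimate the L3 cache of each offering. The following Python sketch assumes that L3 capacity scales linearly with the number of active cores; the names are hypothetical, and the 4-core result is an extrapolation from that assumption rather than a value quoted here.

```python
L3_MB_PER_CORE = 4               # on-chip L3 cache per core, from the text
SUPPORTED_ACTIVE_CORES = (4, 6, 8)  # offerings named in the text


def l3_cache_mb(active_cores):
    """Estimate on-chip L3 cache (MB), assuming 4 MB per active core."""
    if active_cores not in SUPPORTED_ACTIVE_CORES:
        raise ValueError(f"unsupported core count: {active_cores}")
    return active_cores * L3_MB_PER_CORE


if __name__ == "__main__":
    for cores in SUPPORTED_ACTIVE_CORES:
        print(f"{cores}-core: {l3_cache_mb(cores)} MB L3")
```

Rejecting other core counts keeps the sketch limited to the three offerings the text describes.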

The L3 cache associated with the implementation is dependent on the number of active cores. For a 6-core version, this typically means that 6 x 4 MB (24 MB) of L3 cache is

Note: In some POWER7 processor-based systems, one memory controller is active with four memory channels being used.

[Figure 2-4 content: a POWER7 processor chip with eight P7 cores and dual integrated DDR3 memory controllers (high channel and DIMM utilization, advanced energy management, RAS advances); eight high-speed 6.4 Gbps channels (new low-power differential signalling); Advanced Buffer ASIC chips with a new DDR3 buffer chip architecture (larger capacity support of 32 GB/core, energy management support, RAS enablement); DDR3 DRAMs.]

Optimized for servers

The POWER7 processor forms the basis of a flexible computer platform and can be offered in a number of guises to address differing system requirements.

The POWER7 processor can be offered with a single active memory controller with four channels for servers where higher degrees of memory parallelism are not required. Similarly, the POWER7 processor can be offered with a variety of SMP bus capacities appropriate to the scaling-point of particular server models.

Figure 2-5 shows the various physical packaging options that are supported with POWER7 processors.

Figure 2-5 Outline of the POWER7 processor physical packaging