Chapter 2. Architecture and technical overview
2.1 The IBM POWER7 processor
2.1.2 POWER7 processor core
Each POWER7 processor core implements aggressive out-of-order (OoO) instruction execution to drive high efficiency in the use of available execution paths. The POWER7 processor has an Instruction Sequence Unit that is capable of dispatching up to six instructions per cycle to a set of queues. Up to eight instructions per cycle can be issued to the Instruction Execution units. The POWER7 processor has a set of twelve execution units as follows:
2 fixed point units
2 load store units
4 double precision floating point units
1 vector unit
1 branch unit
1 condition register unit
1 decimal floating point unit
The caches that are tightly coupled to each POWER7 processor core are:
Instruction cache: 32 KB
Data cache: 32 KB
L2 cache: 256 KB, implemented in fast SRAM
Technology                        POWER7 processor
Die size                          567 mm²
Fabrication technology            45 nm lithography
                                  Copper interconnect
                                  Silicon-on-Insulator
                                  eDRAM
Components                        1.2 billion components (transistors) offering the
                                  equivalent function of 2.7 billion (for further details,
                                  see 2.1.6, “On-chip L3 cache innovation and Intelligent
                                  Cache” on page 32)
Processor cores                   8
Max execution threads core/chip   4/32
L2 cache core/chip                256 KB/2 MB
On-chip L3 cache core/chip        4 MB/32 MB
DDR3 memory controllers           2
SMP design-point                  32 sockets with IBM POWER7 processors
Compatibility                     With prior generation of POWER processor
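The core/chip pairs in the summary above are related by simple scaling. Each chip-level figure is the per-core figure multiplied by the eight cores on the chip. The following minimal Python sketch shows that arithmetic and assumes all eight cores are active. The dictionary keys and the function name are illustrative only:

```python
# Per-core figures taken from the POWER7 technology summary above.
CORES_PER_CHIP = 8
PER_CORE = {
    "max_threads": 4,     # SMT4: four hardware threads per core
    "l2_cache_kb": 256,   # L2 cache per core
    "l3_cache_mb": 4,     # on-chip L3 cache per core
}

def per_chip_totals(cores: int = CORES_PER_CHIP) -> dict:
    """Scale each per-core resource by the number of cores on the chip."""
    return {name: value * cores for name, value in PER_CORE.items()}

totals = per_chip_totals()
print(totals)  # {'max_threads': 32, 'l2_cache_kb': 2048, 'l3_cache_mb': 32}
```

The results match the chip-level column: 32 threads, 2 MB (2048 KB) of L2, and 32 MB of L3.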
2.1.3 Simultaneous multithreading
An enhancement in the POWER7 processor is the addition of the SMT4 mode to enable four instruction threads to execute simultaneously in each POWER7 processor core. Thus, the instruction thread execution modes of the POWER7 processor are as follows:
SMT1: single instruction execution thread per core
SMT2: two instruction execution threads per core
SMT4: four instruction execution threads per core
SMT4 mode enables the POWER7 processor to maximize the throughput of the processor core by offering an increase in processor-core efficiency. SMT4 mode is the latest step in an evolution of multithreading technologies introduced by IBM. The diagram in Figure 2-3 shows the evolution of simultaneous multithreading.
Figure 2-3 Evolution of simultaneous multithreading
The various SMT modes offered by the POWER7 processor provide flexibility. Users can select the threading mode that best balances objectives such as performance, throughput, energy use, and workload enablement.
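Each mode fixes the number of threads per core. The number of hardware threads a chip exposes is therefore the product of active cores and threads per core. The following Python sketch assumes the operating system sees one logical processor per hardware thread. The enum and function names are hypothetical:

```python
from enum import IntEnum

class SMTMode(IntEnum):
    """POWER7 thread execution modes; each value is threads per core."""
    SMT1 = 1
    SMT2 = 2
    SMT4 = 4

def hardware_threads(active_cores: int, mode: SMTMode) -> int:
    """Hardware threads presented by one chip in the given SMT mode."""
    # The 4-, 6-, and 8-core offerings are the ones described in this chapter.
    if active_cores not in (4, 6, 8):
        raise ValueError("expected a 4-, 6-, or 8-core offering")
    return active_cores * int(mode)

for mode in SMTMode:
    print(mode.name, hardware_threads(8, mode))
# SMT1 8
# SMT2 16
# SMT4 32
```

In SMT4 mode, the full 8-core chip reaches the 32 threads listed in the technology summary.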
Intelligent Threads
The POWER7 processor features Intelligent Threads that can vary based on workload demand. The system can select automatically, or the system administrator can select manually, whether a workload benefits more from dedicating as much capability as possible to a single thread of work or from spreading that capability across two or four threads. With more threads, the POWER7 processor can deliver more total capacity because more tasks are accomplished in parallel. With fewer threads, workloads that need very fast individual tasks can get the performance they need for maximum benefit.
[Figure 2-3 panels: 1995, single thread out of order; 1997, hardware multithreading; 2003, 2-way SMT; 2009, 4-way SMT. Each panel shows the execution units FX0, FX1, FP0, FP1, LS0, LS1, BRX, and CRL. The legend distinguishes Thread 0 through Thread 3 executing from No Thread Executing.]
2.1.4 Memory access
Each POWER7 processor chip has two DDR3 memory controllers, each with four memory channels, giving eight memory channels per POWER7 processor. Each channel operates at 6.4 Gbps and can address up to 32 GB of memory. Thus, each POWER7 processor chip is capable of addressing up to 256 GB of memory (8 channels × 32 GB).
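The 256 GB limit follows directly from the channel count. The following Python sketch reproduces that arithmetic. It also derives the bound for systems in which only one memory controller (four channels) is active, assuming the same 32 GB per-channel limit applies. The constant and function names are illustrative:

```python
GB_PER_CHANNEL = 32           # maximum addressable memory per channel
CHANNELS_PER_CONTROLLER = 4   # each DDR3 memory controller drives four channels

def max_memory_gb(active_controllers: int = 2) -> int:
    """Upper bound on the memory one POWER7 chip can address."""
    channels = active_controllers * CHANNELS_PER_CONTROLLER
    return channels * GB_PER_CHANNEL

print(max_memory_gb())   # 256 (two controllers, eight channels)
print(max_memory_gb(1))  # 128 (one controller, four channels)
```

Dividing 256 GB across the eight cores gives 32 GB per core.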
Figure 2-4 gives a simple overview of the POWER7 processor memory access structure.
Figure 2-4 Overview of POWER7 memory access structure
2.1.5 Flexible POWER7 processor packaging and offerings
POWER7 processors have the unique ability to be optimized for various workload types. For example, database workloads typically benefit from very fast processors that handle high transaction rates at high speeds. Web workloads typically benefit more from processors with many threads, which allow Web requests to be broken down into many parts and handled in parallel. POWER7 processors uniquely have the ability to provide leadership performance in either case.
POWER7 processor 4-core and 6-core offerings
The base design for the POWER7 processor is an 8-core processor with 32 MB of on-chip L3 cache (4 MB per core). However, the architecture allows differing numbers of processor cores to be active: 4 cores or 6 cores, as well as the full 8-core version.
The L3 cache associated with the implementation depends on the number of active cores. For a 6-core version, this typically means that 6 × 4 MB (24 MB) of L3 cache is available.
Note: In some POWER7 processor-based systems, one memory controller is active with four memory channels being used.
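The L3 sizing for the 4-core and 6-core offerings can be sketched in Python. The sketch assumes L3 capacity scales linearly at 4 MB per active core, as the 6-core example implies. Under that assumption the 4-core figure of 16 MB is derived here, not quoted. The function name is hypothetical:

```python
L3_MB_PER_CORE = 4  # on-chip L3 capacity associated with each active core

def l3_cache_mb(active_cores: int) -> int:
    """Typical on-chip L3 capacity for a chip with the given active cores."""
    if active_cores not in (4, 6, 8):
        raise ValueError("expected a 4-, 6-, or 8-core offering")
    return active_cores * L3_MB_PER_CORE

for cores in (4, 6, 8):
    print(f"{cores}-core: {l3_cache_mb(cores)} MB L3")
# 4-core: 16 MB L3
# 6-core: 24 MB L3
# 8-core: 32 MB L3
```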
[Figure 2-4 content: a POWER7 processor chip with eight P7 cores and dual integrated DDR3 memory controllers (high channel and DIMM utilization, advanced energy management, RAS advances). Eight high-speed 6.4 Gbps channels using new low-power differential signalling connect the controllers to Advanced Buffer ASIC chips. These chips provide a new DDR3 buffer chip architecture, larger capacity support (32 GB/core), energy management support, and RAS enablement, and they attach to DDR3 DRAMs.]
Optimized for servers
The POWER7 processor forms the basis of a flexible computer platform and can be offered in a number of guises to address differing system requirements.
The POWER7 processor can be offered with a single active memory controller with four channels for servers where higher degrees of memory parallelism are not required. Similarly, the POWER7 processor can be offered with a variety of SMP bus capacities appropriate to the scaling-point of particular server models.
Figure 2-5 shows the various physical packaging options that are supported with POWER7 processors.
Figure 2-5 Outline of the POWER7 processor physical packaging