e200z335 Core Complex Overview
1.6 Microarchitecture Summary
The e200z3 processor has a four-stage pipeline for instruction execution.
1. Instruction fetch
2. Instruction decode/register file read/effective address calculation
3. Execute/memory access
4. Register writeback
These stages are pipelined, allowing single-clock instruction throughput for most instructions. The integer execution unit consists of a 32-bit arithmetic unit, a logic unit, a 32-bit barrel shifter, a mask-insertion unit, a condition register manipulation unit, a count-leading-zeros unit, a 32 × 32 hardware multiplier array, result feed-forward hardware, and support hardware for division.
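As a rough illustration of what single-clock throughput means for a four-stage pipeline, the following sketch counts cycles for a straight-line instruction sequence. It assumes an idealized case (no stalls, no multi-cycle instructions such as divides, no taken branches); the function name and the model are illustrative and are not part of the core documentation.

```python
PIPELINE_STAGES = 4  # fetch, decode/EA calculation, execute/memory access, writeback


def ideal_cycles(instruction_count: int, stages: int = PIPELINE_STAGES) -> int:
    """Cycles needed to complete a stall-free sequence of instructions.

    The first instruction needs one cycle per stage; each later
    instruction completes one cycle after its predecessor.
    """
    if instruction_count <= 0:
        return 0
    return stages + (instruction_count - 1)
```

Under these assumptions a 100-instruction sequence needs 103 cycles, so the cost per instruction approaches one clock as the sequence grows.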
Most arithmetic and logical operations are executed in a single cycle with the exception of the divide instructions. A count-leading-zeros unit operates in a single clock cycle.
The instruction unit contains a program counter incrementer and a dedicated branch address adder to minimize delays during change-of-flow operations. Sequential prefetching is performed to ensure a supply of instructions into the execution pipeline. Branch target prefetching is performed to accelerate taken branches. Prefetched instructions are placed into an instruction buffer capable of holding six instructions. Conditional branches that are not taken and not folded execute in a single cycle. Branches with successful target prefetching that are not folded have an effective execution time of 1 cycle. All other taken branches have an execution time of 2 clocks.
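The branch timings quoted above can be collected into a small cost model. This is only a sketch: it covers non-folded branches, uses the cycle counts stated in the paragraph, and the function names and the rate parameters are hypothetical.

```python
def branch_cycles(taken: bool, target_prefetched: bool = False) -> int:
    """Effective execution time, in clocks, of a non-folded branch."""
    if not taken:
        return 1  # not-taken conditional branch
    # Taken branch: 1 clock if the target was successfully prefetched, else 2.
    return 1 if target_prefetched else 2


def average_branch_cycles(taken_rate: float, prefetch_hit_rate: float) -> float:
    """Expected clocks per non-folded branch for the given rates (0.0 to 1.0)."""
    not_taken_cost = (1.0 - taken_rate) * branch_cycles(False)
    taken_cost = taken_rate * (
        prefetch_hit_rate * branch_cycles(True, True)
        + (1.0 - prefetch_hit_rate) * branch_cycles(True, False)
    )
    return not_taken_cost + taken_cost
```

For example, if half of the branches are taken and half of those hit in target prefetching, the model gives an average of 1.25 clocks per branch.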
Memory load and store operations are provided for byte, half-word, word (32-bit), and double-word data with automatic zero or sign extension of byte and half-word load data as well as optional byte reversal of data. These instructions can be pipelined to allow effective single-cycle throughput. Load and store multiple word instructions allow low-overhead context save and restore operations. The load/store unit (LSU) contains a dedicated effective address adder to optimize effective address generation.
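The effect of zero extension, sign extension, and byte reversal on load data can be modeled as follows. This is a behavioral sketch, not a description of the hardware: it assumes big-endian memory, returns the 32-bit register image, and the function name and parameters are hypothetical. It also allows combinations that individual load instructions may not provide.

```python
def load(memory: bytes, addr: int, size: int,
         signed: bool = False, byte_reversed: bool = False) -> int:
    """Return the 32-bit register image for a 1-, 2-, or 4-byte load."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4 bytes")
    raw = memory[addr:addr + size]
    if len(raw) != size:
        raise ValueError("access runs past the end of memory")
    # Byte reversal is modeled by interpreting the bytes in the opposite order.
    order = "little" if byte_reversed else "big"
    value = int.from_bytes(raw, order, signed=signed)
    # Sign extension shows up as set upper bits after masking to 32 bits.
    return value & 0xFFFF_FFFF
```

With memory holding the bytes 0x80 0x01, a zero-extended half-word load yields 0x00008001, a sign-extended load yields 0xFFFF8001, and a byte-reversed load yields 0x00000180.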
The condition register unit supports the condition register (CR) and condition register operations defined by the architecture. The CR consists of eight 4-bit fields that reflect the results of certain operations generated by instructions such as move, integer and floating-point compare, arithmetic, and logical instructions. The CR also provides a mechanism for testing and branching.
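To make the layout of the eight 4-bit fields concrete, the sketch below extracts one field from a 32-bit CR image. It assumes the usual Power architecture numbering, in which field 0 occupies the most significant four bits, and the integer-compare bit names LT, GT, EQ, and SO; the helper names are hypothetical.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")  # integer-compare naming, MSB first


def cr_field(cr: int, n: int) -> int:
    """Return the 4-bit value of CR field n (0 to 7) from a 32-bit CR image."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0..7")
    return (cr >> (28 - 4 * n)) & 0xF


def decode_cr_field(cr: int, n: int) -> dict:
    """Return the four bits of CR field n as a name -> bool mapping."""
    field = cr_field(cr, n)
    return {name: bool(field & (0x8 >> i)) for i, name in enumerate(CR_BIT_NAMES)}
```

A conditional branch that tests "equal" after a compare into field 0 corresponds to checking `decode_cr_field(cr, 0)["EQ"]` in this model.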
Vectored and autovectored interrupts are supported by the CPU. Vectored interrupt support is provided to allow multiple interrupt sources to have unique interrupt handlers invoked with no software overhead.
Table 1-3. Exceptions and Conditions (continued)
IVOR   Interrupt Type
8      System call
9      Unit unavailable
34     SPE round exception
Notes:
1 Vector to [p_rstbase[0:19]] || 0xFFC.
2 Autovectored external and critical input interrupts use this IVOR. Vectored interrupts supply an interrupt vector offset directly.
The SPE category supports vector instructions operating on 16- and 32-bit integer and fractional data types. The vector and scalar floating-point instructions operate on 32-bit IEEE-754 single-precision floating-point formats, and support single-precision floating-point operations in a pipelined fashion. The 64-bit GPRs are used for source and destination operands for all vector instructions, and there is a unified storage model for single-precision floating-point data types of 32 bits and the normal integer type. Low-latency integer and floating-point add, subtract, multiply, divide, compare, and conversion operations are provided, and most operations can be pipelined.
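The use of a 64-bit GPR as a two-element vector can be sketched as follows. The model treats each register as an upper and a lower 32-bit element and performs an element-wise modulo add, with no carry between elements. The helper names are hypothetical, and the sketch is a behavioral illustration rather than a definition of any particular SPE instruction.

```python
MASK32 = 0xFFFF_FFFF


def split_elements(gpr64: int) -> tuple:
    """Split a 64-bit GPR image into (upper, lower) 32-bit elements."""
    return (gpr64 >> 32) & MASK32, gpr64 & MASK32


def join_elements(upper: int, lower: int) -> int:
    """Build a 64-bit GPR image from two 32-bit elements."""
    return ((upper & MASK32) << 32) | (lower & MASK32)


def vector_add_words(a: int, b: int) -> int:
    """Element-wise 32-bit add; each element wraps around independently."""
    a_upper, a_lower = split_elements(a)
    b_upper, b_lower = split_elements(b)
    return join_elements(a_upper + b_upper, a_lower + b_lower)
```

Note that an overflow in the lower element does not propagate into the upper element, which is what distinguishes the vector operation from a 64-bit scalar add.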
1.6.1 Instruction Unit Features
The e200z3 instruction unit implements the following:
• 64-bit fetch path that supports fetching of two 32-bit or up to four 16-bit VLE instructions per clock
• Instruction buffer that holds up to seven sequential instructions
• Dedicated PC (program counter) incrementer supporting instruction fetches
• Branch processing unit with dedicated branch address adder and branch target buffer (BTB) supporting single-cycle execution of successfully predicted branches
• Target instruction buffer that holds up to two prefetched branch target instructions
1.6.2 Integer Unit Features
The integer unit supports single-cycle execution of most integer instructions:
• 32-bit AU for arithmetic and comparison operations
• 32-bit LU for logical operations
• 32-bit priority encoder for count-leading-zeros function
• 32-bit single-cycle barrel shifter for static shifts and rotates
• 32-bit mask unit for data masking and insertion
• Divider logic for signed and unsigned divide in 6–16 clocks with minimized execution timing
• 32 × 32 hardware multiplier array that supports single-cycle 32 × 32 → 32 multiply
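Two of the operations listed above have simple behavioral models, shown here as a sketch. The first returns the number of leading zero bits in a 32-bit value (32 for an input of zero); the second keeps the low 32 bits of a 32 × 32 product. The function names are hypothetical, and the code says nothing about how the hardware computes these results.

```python
MASK32 = 0xFFFF_FFFF


def count_leading_zeros32(value: int) -> int:
    """Number of zero bits above the most significant set bit of a 32-bit value."""
    value &= MASK32
    return 32 - value.bit_length()


def multiply_low32(a: int, b: int) -> int:
    """Low-order 32 bits of the product of two 32-bit operands."""
    return ((a & MASK32) * (b & MASK32)) & MASK32
```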
1.6.3 Load/Store Unit (LSU) Features
The e200z3 LSU supports load, store, and load multiple/store multiple instructions:
• 32-bit effective address adder for data memory address calculations
• Pipelined operation supports throughput of one load or store operation per cycle
• Dedicated 64-bit interface to memory supports saving and restoring of up to two registers per cycle for load multiple and store multiple word instructions
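The benefit of moving two registers per cycle during a context save or restore can be estimated with the sketch below. It assumes a load multiple word instruction that transfers registers rD through r31, ideal alignment, and no memory wait states; the function name is hypothetical, and the result is a lower bound on data transfer cycles rather than a documented instruction timing.

```python
import math


def multiple_word_transfer_cycles(first_reg: int, regs_per_cycle: int = 2) -> int:
    """Minimum data transfer cycles to move registers first_reg..r31."""
    if not 0 <= first_reg <= 31:
        raise ValueError("register number must be 0..31")
    register_count = 32 - first_reg
    return math.ceil(register_count / regs_per_cycle)
```

Under these assumptions, restoring r28 through r31 needs at least two transfer cycles instead of the four a 32-bit interface would require.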
1.6.4 Memory Management Unit (MMU) Features
The MMU is an implementation of the embedded.MMU category of the Power ISA, with the following feature set:
• 32-bit effective-to-real address translation
• 8-bit process identifier (PID)
• 16-entry, fully associative TLB (8-entry in the e200z335)
• Support for multiple page sizes from 4 Kbytes to 256 Mbytes (4 Kbytes to 4 Gbytes in the e200z335)
• Hardware assist for TLB miss exceptions
• Software managed by tlbre, tlbwe, tlbsx, tlbsync, and tlbivax instructions
• Entry flush protection
• Byte ordering (endianness) configurable on a per-page basis
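A fully associative lookup with variable page sizes and PID matching can be sketched as below. Several details are assumptions rather than statements from this section: the page size encoding (4^TSIZE Kbytes, so TSIZE 1 is 4 Kbytes, 9 is 256 Mbytes, and 11 is 4 Gbytes), the convention that a translation ID of 0 matches every PID, and all class, field, and function names. Address space and permission checks are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MASK32 = 0xFFFF_FFFF


def page_size_bytes(tsize: int) -> int:
    """Assumed encoding: page size is 4**TSIZE Kbytes."""
    return (4 ** tsize) * 1024


@dataclass
class TlbEntry:
    valid: bool
    tid: int    # translation ID; 0 is assumed to match any PID
    epn: int    # effective page base address
    rpn: int    # real page base address
    tsize: int  # page size selector


def translate(tlb: Sequence[TlbEntry], ea: int, pid: int) -> Optional[int]:
    """Return the real address for ea, or None to model a TLB miss."""
    for entry in tlb:
        if not entry.valid:
            continue
        if entry.tid != 0 and entry.tid != pid:
            continue
        offset_mask = page_size_bytes(entry.tsize) - 1
        page_mask = ~offset_mask & MASK32
        if (ea & page_mask) == (entry.epn & page_mask):
            return (entry.rpn & page_mask) | (ea & offset_mask)
    return None
```

A miss (the `None` result) is the point at which the hardware assist for TLB miss exceptions and the software-managed TLB instructions listed above would come into play.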
1.6.5 System Bus (Core Complex Interface) Features
The features of the core complex interface are as follows:
• Independent instruction and data buses
• Advanced microcontroller bus architecture (AMBA) and advanced high-performance bus (AHB2.v6)-Lite protocol
• 32-bit address bus plus attributes and control on each bus
• Instruction interface has 64-bit read data bus
• Data interface has separate unidirectional 64-bit read data bus and 64-bit write data bus
• Pipelined, in-order accesses for both buses
1.6.6 Nexus 3/2+ Module Features
The Nexus 3 (Nexus 2+ in e200z335) module provides real-time development capabilities for e200z3 and e200z335 processors in compliance with the IEEE-ISTO Nexus 5001-2003 standard. This module provides development support capabilities without requiring the use of address and data pins for internal visibility.
A portion of the pin interface (the JTAG port) is shared with the OnCE/Nexus1 unit. The IEEE-ISTO 5001-2003 standard defines an extensible auxiliary port, which is used in conjunction with the JTAG port in e200z3 and e200z335 processors.
1.7 Legacy Support of PowerPC Architecture
This section provides an overview of the architectural differences and compatibilities of the e200z3 core compared with the original PowerPC architecture. The two levels of the e200z3 core programming environment are as follows:
• User level: This defines the base user-level instruction set, registers, data types, memory conventions, and the memory and programming models seen by application programmers.
• Supervisor level: This defines supervisor-level resources typically required by an operating system, the memory management model, supervisor-level registers, and the exception model.
In general, the e200z3 core supports the user-level architecture from the original PowerPC architecture. The following sections are intended to highlight the main differences. For specific implementation details refer to the relevant chapter.
1.7.1 Instruction Set Compatibility
The following sections describe the user and supervisor instruction sets.
1.7.1.1 User Instruction Set
The e200z3 core family executes legacy user-mode binaries and object files except for the following:
• The e200z3 core supports vector and scalar single-precision floating-point operations. These instructions have different encoding than the original definition of the PowerPC architecture. Additionally, the e200z3 core uses GPRs for floating-point operations, rather than the FPRs defined by the UISA. Most porting of floating-point operations can be handled by recompiling.
• String instructions are not implemented on the e200z3 core; therefore, trap emulation must be provided to ensure backward compatibility.
1.7.1.2 Supervisor Instruction Set
The supervisor-mode instruction set in the original PowerPC architecture is compatible with the e200z3 core with the following exceptions:
• The MMU architecture is different, so some TLB manipulation instructions have different semantics.
• Instructions that support BATs and segment registers are not implemented.
1.7.2 Memory Subsystem
Both the Power ISA and the original version of the PowerPC architecture provide separate instruction and data memory resources. The e200z3 core provides optional additional cache control features, including cache locking. Note that the core implementations described in this document do not implement caches.
1.7.3 Interrupt Handling
Exception handling for the e200z3 core is generally the same as that defined in the original version of the PowerPC architecture, with the following differences:
• The Power ISA defines a new critical interrupt, providing an extra level of interrupt nesting. The critical interrupt includes critical input and watchdog timer time-out inputs.
• The debug interrupt, originally implementation-specific, is now included in the Power ISA. It defines the Return from Debug Interrupt instruction, rfdi, and two debug save/restore registers, DSRR0 and DSRR1.
• Processors built on the Power ISA can use IVPR and the IVORs to set exception vectors individually, but they can be set to the address offsets defined in the OEA to provide compatibility.
• Unlike the original version of the PowerPC architecture, the Power ISA does not define a reset vector; execution begins at a fixed virtual address, 0xFFFF_FFFC. The e200z3 allows this to be hard-wired to any page.
• Some Power ISA and e200z3 core-specific SPRs are different from those defined in the original PowerPC architecture, particularly those related to MMU functions. Much of this information has been moved to the new exception syndrome register (ESR).
• Timer services are generally compatible. However, the Power ISA defines a decrementer auto-reload feature, and two critical-type interrupts—the fixed-interval timer and the watchdog timer interrupts—all of which are implemented in the e200z3 core.
An overview of the interrupt and exception handling capabilities of the e200z3 core can be found in Section 1.5, “Interrupts and Exception Handling.”
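The way IVPR and an IVOR combine into a handler address can be sketched as follows. The bit assignment used here is an assumption based on the general Book E style definition (IVPR supplies the upper 16 address bits, the IVOR supplies a 16-byte-aligned offset in the lower 16 bits); the function name is hypothetical, and the sketch does not cover vectored interrupts that supply their offset directly.

```python
def interrupt_vector(ivpr: int, ivor: int) -> int:
    """Handler address formed from the IVPR prefix and an IVOR offset."""
    prefix = ivpr & 0xFFFF_0000   # upper 16 bits come from IVPR
    offset = ivor & 0x0000_FFF0   # offset bits; low four bits are treated as zero
    return prefix | offset
```

Setting every IVOR to the offset that the original architecture used for the same exception is how, in this model, a legacy vector layout could be reproduced.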
1.7.4 Memory Management
The e200z3 core implements a straightforward virtual address space that complies with the Power ISA MMU definition, which eliminates segment registers and block address translation resources. The Power ISA defines resources for multiple, variable page sizes that can be configured in a single implementation. TLB management is provided with new instructions and SPRs.
1.7.5 Reset
Cores built on the Power ISA do not share a common reset vector with the original PowerPC architecture. Instead, at reset, fetching begins at address 0xFFFF_FFFC. In addition to the Power ISA reset definition, the EIS and the e200z3 core define specific aspects of the MMU page translation and protection mechanisms. Unlike the original PowerPC core, as soon as instruction fetching begins, the e200z3 core is in virtual mode with a hardware-initialized TLB entry.
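The note to Table 1-3 gives the reset vector as [p_rstbase[0:19]] || 0xFFC, that is, a 20-bit page base concatenated with a fixed 12-bit offset. The following sketch evaluates that expression; the function name is hypothetical, and p_rstbase is treated simply as a 20-bit integer.

```python
def reset_vector(p_rstbase: int) -> int:
    """Address of the first instruction fetch after reset.

    p_rstbase is the 20-bit page base; the low 12 bits are fixed at 0xFFC.
    """
    if not 0 <= p_rstbase < (1 << 20):
        raise ValueError("p_rstbase must fit in 20 bits")
    return (p_rstbase << 12) | 0xFFC
```

With all 20 base bits set (0xFFFFF), the result is 0xFFFF_FFFC, the address quoted above.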
1.7.6 Little-Endian Mode
Unlike the original PowerPC architecture, where little-endian mode is controlled on a system basis, the Power ISA allows control of byte ordering on a memory-page basis. Additionally, the little-endian mode used in the Power ISA is true little-endian byte ordering (byte invariance).
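The difference between the two byte orderings for a single 32-bit store can be shown with a short sketch. The page attribute is modeled as a boolean parameter, and the function name is hypothetical.

```python
def store_word(value: int, little_endian_page: bool) -> bytes:
    """Bytes written to memory, lowest address first, for a 32-bit store."""
    order = "little" if little_endian_page else "big"
    return (value & 0xFFFF_FFFF).to_bytes(4, order)
```

Storing 0x12345678 to a big-endian page places 0x12 at the lowest address, while the same store to a little-endian page places 0x78 there.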