lec 2-arm cortex - processor core


Processor core

The processor core implements the ARMv7-M architecture. It has the following main features:

• Thumb-2 instruction set architecture (ISA) subset consisting of all base Thumb-2 instructions, 16-bit and 32-bit.

• Harvard processor architecture enabling simultaneous instruction fetch with data load/store.

• Three-stage pipeline.

• Single cycle 32-bit multiply.

• Hardware divide.

• Thumb and Debug states.

• Handler and Thread modes.

• Low latency ISR entry and exit:

— Processor state saving and restoration, with no instruction fetch overhead. Exception vector is fetched from memory in parallel with the state saving, enabling faster ISR entry.

— Support for late arriving interrupts.

— Tightly coupled interface to interrupt controller enabling efficient processing of late-arriving interrupts.

— Tail-chaining of interrupts, enabling back-to-back interrupt processing without the overhead of state saving and restoration between interrupts.

• Interruptible-continued LDM/STM, PUSH/POP.

• ARMv6 style BE8/LE support.
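
The state saving listed above can be pictured as a fixed block of eight words that the hardware pushes on the stack at exception entry. The sketch below lays that block out in C. The register list (R0 to R3, R12, LR, PC, xPSR) is the ARMv7-M basic frame and is not spelled out on these slides; the names exception_frame_t and frame_return_address are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the eight words stacked by hardware on exception
   entry in ARMv7-M (basic frame), lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* address to resume at on exception return */
    uint32_t xpsr;  /* program status register */
} exception_frame_t;

/* Read the stacked return address, given the stack pointer value that
   points at the start of the frame. */
static uint32_t frame_return_address(const uint32_t *stacked_sp)
{
    assert(stacked_sp != NULL);
    return stacked_sp[offsetof(exception_frame_t, pc) / sizeof(uint32_t)];
}
```

Because the hardware builds this frame itself, an ISR can be an ordinary C function with no assembly prologue.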


Registers

The processor contains:

• 13 general purpose 32-bit registers

• Link Register (LR)

• Program Counter (PC)

• Program Status Register, xPSR
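
As a rough illustration of what xPSR holds, the sketch below extracts a few fields from an xPSR value. The bit positions (flags in bits 31 to 27, Thumb bit 24, exception number in bits 8 to 0) are taken from the ARMv7-M register description, not from these slides, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define XPSR_N (1u << 31)   /* negative flag */
#define XPSR_Z (1u << 30)   /* zero flag */
#define XPSR_C (1u << 29)   /* carry flag */
#define XPSR_V (1u << 28)   /* overflow flag */
#define XPSR_Q (1u << 27)   /* saturation flag */
#define XPSR_T (1u << 24)   /* Thumb state bit */
#define XPSR_EXC_MASK 0x1FFu /* number of the active exception */

/* True when the Thumb state bit is set. */
static bool xpsr_thumb_state(uint32_t xpsr)
{
    return (xpsr & XPSR_T) != 0u;
}

/* Active exception number; 0 means Thread mode. */
static uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & XPSR_EXC_MASK;
}

/* A non-zero exception number means the core is in Handler mode. */
static bool xpsr_in_handler_mode(uint32_t xpsr)
{
    return xpsr_exception_number(xpsr) != 0u;
}
```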


Memory interface

The processor has a Harvard interface to enable simultaneous instruction fetches with data load/stores. Memory accesses are controlled by:

• A separate Load Store Unit (LSU) that decouples load and store operations from the Arithmetic and Logic Unit (ALU).

• A 3-word entry Prefetch Unit (PFU).

One word is fetched at a time. This can be two Thumb instructions, one word-aligned Thumb-2 instruction, or the upper/lower halfword of a halfword-aligned Thumb-2 instruction with one Thumb instruction, or the lower/upper halfword of another halfword-aligned Thumb-2 instruction. All fetch addresses from the core are word aligned. If a Thumb-2 instruction is halfword aligned, two fetches are necessary to fetch the Thumb-2 instruction. However, the 3-entry prefetch buffer ensures that a stall cycle is only necessary for the first halfword-aligned Thumb-2 instruction fetched.
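
The word-aligned fetch rule can be checked with a little arithmetic. The sketch below counts how many word fetches an instruction needs from its address and size; the function name fetches_needed is hypothetical, and the model ignores the prefetch buffer, which hides most of the extra fetches in practice.

```c
#include <assert.h>
#include <stdint.h>

/* Number of word-aligned 32-bit fetches needed to read an instruction of
   size_bytes (2 for Thumb, 4 for Thumb-2) that starts at addr. */
static unsigned fetches_needed(uint32_t addr, unsigned size_bytes)
{
    assert((addr & 1u) == 0u);                  /* halfword aligned */
    assert(size_bytes == 2u || size_bytes == 4u);
    uint32_t first_word = addr & ~3u;                      /* word holding the first byte */
    uint32_t last_word  = (addr + size_bytes - 1u) & ~3u;  /* word holding the last byte */
    return (unsigned)((last_word - first_word) / 4u) + 1u;
}
```

A Thumb-2 instruction at 0x100 sits in one word, while the same instruction at 0x102 straddles the words at 0x100 and 0x104 and so needs two fetches.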


NVIC

The NVIC is tightly coupled to the processor core. This facilitates low latency exception processing. The main features include:

• a configurable number of external interrupts, from 1 to 240

• a configurable number of bits of priority, from three to eight bits

• level and pulse interrupt support

• dynamic reprioritization of interrupts

• priority grouping

• support for tail-chaining of interrupts

• processor state automatically saved on interrupt entry, and restored on interrupt exit, with no instruction overhead.
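
The priority bits and priority grouping features can be sketched as bit manipulation on an 8-bit priority field. The code below assumes the ARMv7-M convention that implemented priority bits occupy the top of the field and that a PRIGROUP value selects where the field splits into group priority and subpriority; neither detail is stated on the slide, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Place a priority level into the 8-bit priority field when only
   prio_bits (3 to 8) are implemented; they occupy the top bits. */
static uint8_t nvic_encode_priority(unsigned level, unsigned prio_bits)
{
    assert(prio_bits >= 3u && prio_bits <= 8u);
    assert(level < (1u << prio_bits));
    return (uint8_t)(level << (8u - prio_bits));
}

/* Split an 8-bit priority field for a PRIGROUP value of 0 to 7:
   bits [7:prigroup+1] form the group (preemption) priority and
   bits [prigroup:0] form the subpriority. */
static void nvic_split_priority(uint8_t field, unsigned prigroup,
                                unsigned *group, unsigned *sub)
{
    assert(prigroup <= 7u);
    assert(group != 0 && sub != 0);
    unsigned sub_bits = prigroup + 1u;
    *sub = field & ((1u << sub_bits) - 1u);
    *group = (unsigned)field >> sub_bits;
}
```

With three priority bits, level 5 is stored as 0xA0. Only the group priority decides whether one interrupt can preempt another; the subpriority orders pending interrupts within the same group.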


BUS MATRIX

The bus matrix connects the processor and debug interface to the external buses. The bus matrix interfaces to the following external buses:

• ICode bus. This is for instruction and vector fetches from code space. This is a 32-bit AHB-Lite bus.

• DCode bus. This is for data load/stores and debug accesses to code space. This is a 32-bit AHB-Lite bus.

• System bus. This is for instruction and vector fetches, data load/stores and debug accesses to system space. This is a 32-bit AHB-Lite bus.

• PPB. This is for data load/stores and debug accesses to PPB space. This is a 32-bit APB (v2.0) bus.


The bus matrix also controls the following:

• Unaligned accesses. The bus matrix converts unaligned processor accesses into aligned accesses.

• Bit-banding. The bus matrix converts bit-band alias accesses into bit-band region accesses. It performs:

— bit field extract for bit-band loads

— atomic read-modify-write for bit-band stores.

• Write buffering. The bus matrix contains a one-entry write buffer to decouple bus stalls from the processor core.
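
The bit-band conversion can be sketched as address arithmetic. The code below assumes the standard Cortex-M3 SRAM bit-band region at 0x20000000 with its alias at 0x22000000, which the slide does not state; the function names are hypothetical, and bitband_store_model only models on a host what the bus matrix does in hardware.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_REGION_BASE 0x20000000u  /* assumed: start of 1MB bit-band region */
#define SRAM_BB_ALIAS_BASE  0x22000000u  /* assumed: start of 32MB alias region */

/* Alias word address for one bit of one byte in the SRAM bit-band region:
   each byte maps to 32 alias bytes, each bit to one 4-byte alias word. */
static uint32_t sram_bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BB_REGION_BASE);
    assert(byte_addr < SRAM_BB_REGION_BASE + 0x100000u);
    assert(bit < 8u);
    uint32_t byte_offset = byte_addr - SRAM_BB_REGION_BASE;
    return SRAM_BB_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}

/* Host-side model of a bit-band store: only bit 0 of the written value
   is used, and the other bits of the target word are preserved. */
static uint32_t bitband_store_model(uint32_t word, unsigned bit, uint32_t value)
{
    assert(bit < 32u);
    return (word & ~(1u << bit)) | ((value & 1u) << bit);
}
```

Writing to the alias word at 0x22006008, for example, changes only bit 2 of the byte at 0x20000300, with no software read-modify-write sequence that an interrupt could split.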


FPB

The FPB unit implements hardware breakpoints and patches accesses from code space to system space. The FPB has eight comparators as follows:

• Six instruction comparators that you can individually configure to either remap instruction fetches from code space to system space, or perform a hardware breakpoint.

• Two literal comparators that can remap literal accesses from code space to system space.


DWT

The DWT unit incorporates the following debug functionality:

• Four comparators that you can configure either as a hardware watchpoint, an ETM trigger, a PC sampler event trigger, or a data address sampler event trigger.

• Several counters or a data match event trigger for performance profiling.

• Configurable to emit PC samples at defined intervals, and to emit interrupt event information.


ITM

The ITM is an application-driven trace source that supports application event trace and printf style debugging.

The ITM provides the following sources of trace information:

• Software trace. Software can write directly to ITM stimulus registers. This causes packets to be emitted.

• Hardware trace. These packets are generated by the DWT, and emitted by the ITM.

• Time stamping. Timestamps are emitted relative to packets.
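
Software trace boils down to polling a stimulus register and writing to it. The sketch below assumes that reading a stimulus port returns bit 0 set when it can accept data, and that trace has already been enabled; the function name itm_send_word is hypothetical. The register is passed as a pointer so the same code runs on a host against an ordinary variable, whereas on hardware it would point at the stimulus port (port 0 is at 0xE0000000 on Cortex-M3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one word to an ITM stimulus port register.
   stim points at the port register, or at a stand-in variable on a host. */
static void itm_send_word(volatile uint32_t *stim, uint32_t value)
{
    assert(stim != NULL);
    /* Assumed behavior: bit 0 reads as 1 when the port can accept data. */
    while ((*stim & 1u) == 0u) {
        /* busy-wait until the port is ready */
    }
    *stim = value;  /* the write causes a trace packet to be emitted */
}
```

A printf style debug channel can be built by calling a routine like this once per character from the C library's low-level output hook.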


MPU

An optional MPU is available for the processor to provide memory protection. The MPU checks access permissions and memory attributes. It contains eight regions, and an optional background region that implements the default memory map attributes.

ETM

The ETM is a low-cost trace macrocell that supports instruction trace only.

TPIU

The TPIU acts as a bridge between the Cortex-M3 trace data from the ITM (and an ETM if present) and an off-chip Trace Port Analyzer. You can configure the TPIU to support either serial pin trace for low-cost debug, or multi-pin trace for higher bandwidth trace. The TPIU is CoreSight compatible.

SW/SWJ-DP

You can configure the processor to have SW-DP or SWJ-DP debug port interfaces. The debug port provides debug access to all registers and memory in the system,


Prefetch Unit

The purpose of the Prefetch Unit (PFU) is to:

• Fetch instructions in advance and forward PC relative branch instructions. Fetches are speculative in the case of conditional branches. The prefetch buffer has three word entries, so it can hold up to three Thumb-2 instructions.

• Detect Thumb-2 instructions and present these as a single instruction word.


Branch target forwarding (BRCHSTAT)

• Presents the memory transaction for the branch target earlier than when the opcode reaches the execute stage.

• Increases the performance of the core.

• It loses a fetch opportunity if it speculates on a conditional opcode. The additional penalty is one cycle of pipeline stalling.

• Branch forwarding can be thought of as assigning internal memory for the branch.

• Branch forwarding is more costly than a wait state.

– It gives control to the subroutine when there is a conditional branch.

– A refinement is to only predict backward conditional branches to accelerate loops.

– However, ARM compilers favor loops with an unconditional branch backwards at the bottom and a conditional forward branch that tests the loop limit, so the core fetch queue being ahead at the start of the loop already yields good behavior.


• The BRCHSTAT also includes other information about the next opcode to reach execute.

– BRCHSTAT with respect to execute opcodes is a


helps to avoid any trailing waitstates of the