lec 2-arm cortex - processor core

Academic year: 2020

Processor core

The processor core implements the ARMv7-M architecture. It has the following main features:

• Thumb-2 (ISA) subset consisting of all base Thumb-2 instructions, 16-bit and 32-bit.
• Harvard processor architecture enabling simultaneous instruction fetch with data load/store.
• Three-stage pipeline.
• Single cycle 32-bit multiply.
• Hardware divide.
• Thumb and Debug states.
• Handler and Thread modes.
• Low latency ISR entry and exit:
  - Processor state saving and restoration, with no instruction fetch overhead. The exception vector is fetched from memory in parallel with the state saving, enabling faster ISR entry.
  - Support for late-arriving interrupts.
  - Tightly coupled interface to the interrupt controller, enabling efficient processing of late-arriving interrupts.
  - Tail-chaining of interrupts, enabling back-to-back interrupt processing without the overhead of state saving and restoration between interrupts.
• Interruptible-continued LDM/STM, PUSH/POP.
• ARMv6 style BE8/LE support.
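
The benefit of tail-chaining can be illustrated with a rough cycle-count model. The figures used here (12 cycles for entry, 12 for exit, 6 for a tail-chained transition) are the values commonly quoted for a Cortex-M3 with zero-wait-state memory; they do not come from these slides and are treated as adjustable assumptions. The function name is hypothetical.

```python
def overhead_cycles(n_interrupts, entry=12, exit_=12, tail_chain=6,
                    tail_chaining=True):
    """Stacking/unstacking overhead for n back-to-back pending interrupts.

    Cycle counts are assumed defaults, not values taken from the slides.
    """
    if n_interrupts <= 0:
        return 0
    if tail_chaining:
        # One full entry, one full exit, and a short handover between
        # consecutive handlers (no unstack followed by restack).
        return entry + exit_ + (n_interrupts - 1) * tail_chain
    # Without tail-chaining every handler pays a full entry and exit.
    return n_interrupts * (entry + exit_)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, overhead_cycles(n), overhead_cycles(n, tail_chaining=False))
```

Under these assumptions the saving grows with the number of interrupts serviced back to back, which is why tail-chaining helps most in interrupt-heavy systems.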


Registers

The processor contains:

• 13 general purpose 32-bit registers
• Link Register (LR)
• Program Counter (PC)
• Program Status Register, xPSR


Memory interface

The processor has a Harvard interface to enable simultaneous instruction fetches with data load/stores. Memory accesses are controlled by:

• A separate Load Store Unit (LSU) that decouples load and store operations from the Arithmetic and Logic Unit (ALU).
• A 3-word entry Prefetch Unit (PFU). One word is fetched at a time. This can be two Thumb instructions, one word-aligned Thumb-2 instruction, or the upper/lower halfword of a halfword-aligned Thumb-2 instruction with one Thumb instruction, or the lower/upper halfword of another halfword-aligned Thumb-2 instruction. All fetch addresses from the core are word aligned. If a Thumb-2 instruction is halfword aligned, two fetches are necessary to fetch the Thumb-2 instruction. However, the 3-entry prefetch buffer ensures that a stall cycle is only necessary for the first halfword-aligned Thumb-2 instruction fetched.
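
The alignment rule above can be checked with a small host-side sketch. It only models which word-aligned addresses must be fetched for a single instruction; `fetch_words` is a hypothetical helper, and the model ignores the prefetch buffer itself.

```python
def fetch_words(addr, size_bytes):
    """Return the word-aligned fetch addresses covering one instruction.

    addr       -- halfword-aligned address of the instruction
    size_bytes -- 2 for a 16-bit Thumb instruction, 4 for 32-bit Thumb-2
    """
    if addr % 2 != 0:
        raise ValueError("instructions are at least halfword aligned")
    if size_bytes not in (2, 4):
        raise ValueError("instruction size must be 2 or 4 bytes")
    first = addr & ~0x3                     # word containing the first byte
    last = (addr + size_bytes - 1) & ~0x3   # word containing the last byte
    return list(range(first, last + 4, 4))


if __name__ == "__main__":
    print([hex(a) for a in fetch_words(0x100, 4)])  # word-aligned Thumb-2
    print([hex(a) for a in fetch_words(0x102, 4)])  # halfword-aligned Thumb-2
```

A word-aligned 32-bit instruction needs one fetch, while a halfword-aligned one straddles two words and therefore needs two.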


NVIC

The NVIC is tightly coupled to the processor core. This facilitates low latency exception processing. The main features include:

• a configurable number of external interrupts, from 1 to 240
• a configurable number of bits of priority, from three to eight bits
• level and pulse interrupt support
• dynamic reprioritization of interrupts
• priority grouping
• support for tail-chaining of interrupts
• processor state automatically saved on interrupt entry, and restored on interrupt exit, with no instruction overhead.
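
Priority grouping splits each 8-bit priority field into a group (preemption) priority and a subpriority. The sketch below assumes the ARMv7-M convention that a PRIGROUP value of n places the binary point so that bits [7:n+1] form the group priority and bits [n:0] the subpriority, and that unimplemented low-order bits read as zero. These details are not spelled out in the slides, and `split_priority` is a hypothetical helper.

```python
def split_priority(priority, prigroup, implemented_bits=8):
    """Split an 8-bit priority value into (group_priority, subpriority)."""
    if not 0 <= priority <= 0xFF:
        raise ValueError("priority must fit in 8 bits")
    if not 0 <= prigroup <= 7:
        raise ValueError("PRIGROUP is a 3-bit field")
    if not 3 <= implemented_bits <= 8:
        raise ValueError("between three and eight priority bits are implemented")
    # Only the top `implemented_bits` bits exist; the rest read as zero.
    mask = (0xFF << (8 - implemented_bits)) & 0xFF
    priority &= mask
    sub_bits = prigroup + 1
    group = priority >> sub_bits
    sub = priority & ((1 << sub_bits) - 1)
    return group, sub


if __name__ == "__main__":
    # Three implemented bits, two of them used for preemption.
    print(split_priority(0xFF, prigroup=5, implemented_bits=3))
```

Only the group priority decides whether one interrupt can preempt another; the subpriority orders interrupts that are pending at the same group priority.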


Bus matrix

The bus matrix connects the processor and debug interface to the external buses. The bus matrix interfaces to the following external buses:

• ICode bus. This is for instruction and vector fetches from code space. This is a 32-bit AHB-Lite bus.
• DCode bus. This is for data load/stores and debug accesses to code space. This is a 32-bit AHB-Lite bus.
• System bus. This is for instruction and vector fetches, data load/stores and debug accesses to system space. This is a 32-bit AHB-Lite bus.
• PPB. This is for data load/stores and debug accesses to PPB space. This is a 32-bit APB (v2.0) bus.


The bus matrix also controls the following:

• Unaligned accesses. The bus matrix converts unaligned processor accesses into aligned accesses.
• Bit-banding. The bus matrix converts bit-band alias accesses into bit-band region accesses. It performs:
  - bit field extract for bit-band loads
  - atomic read-modify-write for bit-band stores.
• Write buffering. The bus matrix contains a one-entry write buffer to decouple bus stalls from the processor core.
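
The bit-band mapping can be sketched as plain address arithmetic. The base addresses below are the standard Cortex-M3 SRAM bit-band region (0x20000000, 1 MB) and its alias region (0x22000000); they are not given in the slides. The helper names and the dictionary used as "memory" are hypothetical, and the store function only imitates the read-modify-write that the bus matrix performs atomically in hardware.

```python
SRAM_REGION_BASE = 0x20000000   # start of the 1 MB bit-band region
SRAM_ALIAS_BASE = 0x22000000    # start of the 32 MB alias region
REGION_SIZE = 0x00100000


def alias_address(byte_addr, bit):
    """Alias word address that maps to one bit of a bit-band region byte."""
    offset = byte_addr - SRAM_REGION_BASE
    if not 0 <= offset < REGION_SIZE:
        raise ValueError("address is outside the bit-band region")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be 0..7")
    # Each bit gets its own 32-bit alias word: 32 bytes per region byte.
    return SRAM_ALIAS_BASE + offset * 32 + bit * 4


def alias_to_bit(alias_addr):
    """Inverse mapping: alias word address -> (byte address, bit number)."""
    offset = alias_addr - SRAM_ALIAS_BASE
    if not 0 <= offset < REGION_SIZE * 32 or offset % 4 != 0:
        raise ValueError("not a word-aligned bit-band alias address")
    word_index = offset // 4
    return SRAM_REGION_BASE + word_index // 8, word_index % 8


def bitband_store(memory, alias_addr, value):
    """Model of a bit-band store: only bit 0 of the written value is used."""
    byte_addr, bit = alias_to_bit(alias_addr)
    old = memory.get(byte_addr, 0)
    if value & 1:
        memory[byte_addr] = old | (1 << bit)
    else:
        memory[byte_addr] = old & ~(1 << bit) & 0xFF


if __name__ == "__main__":
    print(hex(alias_address(0x200FFFFF, 7)))
```

On real hardware software simply reads or writes the alias word; the conversion shown here happens inside the bus matrix.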


FPB

The FPB unit implements hardware breakpoints and patches accesses from code space to system space. The FPB has eight comparators as follows:

• Six instruction comparators that you can individually configure to either remap instruction fetches from code space to system space, or perform a hardware breakpoint.
• Two literal comparators that can remap literal accesses from code space to system space.


DWT

The DWT unit incorporates the following debug functionality:

• Four comparators that you can configure either as a hardware watchpoint, an ETM trigger, a PC sampler event trigger, or a data address sampler event trigger.
• Several counters or a data match event trigger for performance profiling.
• Configurable to emit PC samples at defined intervals, and to emit interrupt event information.


ITM

The ITM is an application-driven trace source that supports application event trace and printf-style debugging.

The ITM provides the following sources of trace information:

• Software trace. Software can write directly to ITM stimulus registers. This causes packets to be emitted.
• Hardware trace. These packets are generated by the DWT, and emitted by the ITM.
• Time stamping. Timestamps are emitted relative to packets.
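
To make the software trace path concrete, the sketch below builds the packet that a write to a stimulus register would produce. It assumes the ARMv7-M instrumentation packet layout, in which the header byte carries the stimulus port number in bits [7:3] and a payload size code in bits [1:0] (01 for one byte, 10 for two, 11 for four). That layout is not described in these slides, and `swit_packet` is a hypothetical host-side helper, not part of any real API.

```python
def swit_packet(port, data):
    """Build a software instrumentation packet for one stimulus port write.

    port -- stimulus port number, 0..31
    data -- payload bytes as written to the register (1, 2 or 4 bytes)
    """
    if not 0 <= port <= 31:
        raise ValueError("stimulus port must be 0..31")
    size_code = {1: 0b01, 2: 0b10, 4: 0b11}.get(len(data))
    if size_code is None:
        raise ValueError("payload must be 1, 2 or 4 bytes")
    # Bit 2 of the header is left clear here to mark a software source.
    header = (port << 3) | size_code
    return bytes([header]) + bytes(data)


def trace_string(port, text):
    """printf-style use: one single-byte packet per character."""
    return b"".join(swit_packet(port, bytes([c])) for c in text.encode("ascii"))


if __name__ == "__main__":
    print(trace_string(0, "hi").hex())
```

A host tool decoding the trace stream reverses this step to recover the characters written on each port.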


MPU

An optional MPU is available for the processor to provide memory protection. The MPU checks access permissions and memory attributes. It contains eight regions, and an optional background region that implements the default memory map attributes.
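
The permission check can be pictured with a simplified model. It assumes the ARMv7-M rule that the highest-numbered matching region wins when regions overlap, reduces permissions to a single writable flag, and treats the background region as allowing every access. Real hardware distinguishes privileged from unprivileged accesses and carries further attributes. The `Region` class and `check_access` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    number: int      # region number, 0..7
    base: int        # base address
    size: int        # size in bytes
    writable: bool   # simplified permission: read-only or read/write


def check_access(regions, addr, is_write, background_enabled=False):
    """Return True if the access is permitted in this simplified model."""
    matches = [r for r in regions if r.base <= addr < r.base + r.size]
    if not matches:
        # No region hit: only the optional background region can allow it.
        return background_enabled
    # Assumed overlap rule: the highest-numbered matching region decides.
    winner = max(matches, key=lambda r: r.number)
    return winner.writable or not is_write


if __name__ == "__main__":
    regions = [
        Region(0, 0x20000000, 0x10000, writable=True),   # all of SRAM
        Region(1, 0x20000000, 0x400, writable=False),    # read-only guard
    ]
    print(check_access(regions, 0x20000100, is_write=True))
    print(check_access(regions, 0x20001000, is_write=True))
```

Overlapping a small read-only region on top of a larger writable one, as in the usage example, is a common way to protect a block of data without spending several regions on the surrounding memory.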

ETM

The ETM is a low-cost trace macrocell that supports instruction trace only.

TPIU

The TPIU acts as a bridge between the Cortex-M3 trace data from the ITM, an ETM if present, and an off-chip Trace Port Analyzer. You can configure the TPIU to support either serial pin trace for low-cost debug, or multi-pin trace for higher bandwidth trace. The TPIU is CoreSight compatible.

SW/SWJ-DP

You can configure the processor to have SW-DP or SWJ-DP debug port interfaces. The debug port provides debug access to all registers and memory in the system.


Prefetch Unit

The purpose of the Prefetch Unit (PFU) is to:

• Fetch instructions in advance and forward PC-relative branch instructions. Fetches are speculative in the case of conditional branches. The PFU buffers up to three Thumb-2 instructions.
• Detect Thumb-2 instructions and present these as a single instruction word.


Branch target forwarding (BRCHSTAT)

• Presents the branch-target memory transaction earlier, before the branch reaches the execute stage.
• Increases the performance of the core.
• If the forwarded opcode is conditional, the fetch is speculative: when the speculation is wrong, a fetch opportunity is lost. The additional penalty is one cycle of pipeline stalling.
• Branch forwarding can be thought of as assigning internal memory for the branch.
• Branch forwarding is costlier than a wait state.
  - It gives control to the subroutine when a conditional branch is present.
• A refinement is to predict only backward conditional branches, to accelerate loops. ARM compilers favor loops with an unconditional backward branch at the bottom and conditional forward-branch tests on the loop limit, so the core fetch queue being ahead at the start of the loop yields good behavior.


The BRCHSTAT also includes other information about the next opcode to reach execute. With respect to execute opcodes, BRCHSTAT is a hint unrelated to any transaction and can be asserted for multiple cycles. This helps prevent any trailing wait states of the controller prefetch from impacting the branch target when it is generated in execute.
