AXI Overview

(1)




Advanced Microcontroller Bus Architecture (AMBA)

– On-chip bus protocol from ARM
• On-chip interconnect specification for the connection and management of functional blocks, including processors and peripheral devices
– Introduced in 1996
– AMBA is a registered trademark of ARM Limited
– AMBA is an open standard

(3)

(4)



This presentation outlines the specific topics and sections that need to be understood.

For each topic, the corresponding section number in the AMBA® AXI™ and ACE™ Protocol Specification (Issue E, 22 February 2013) is provided.

The material is divided into 3 parts:
– Part A: AMBA AXI3 and AXI4 Protocol Specifications
– Part B: AMBA AXI4-Lite
– Part C: ACE Protocol Specification

Please go through the details in the AMBA specification.

(5)

Introduction to AXI Protocol
– Features (A1.1)
– Revisions (A1.2)
– AXI Architecture (A1.3)

Signal Descriptions
– Global Signals (A2.1)
– Write Address Channel Signals (A2.2)
– Write Data Channel Signals (A2.3)
– Write Response Channel Signals (A2.4)
– Read Address Channel Signals (A2.5)
– Read Data Channel Signals (A2.6)
– Low Power Interface Signals (A2.7)

Signal Interface Requirements
– Clock and Reset (A3.1)
– Basic read and write transactions (A3.2)
– Relationship between channels (A3.3)
– Transaction Structure (A3.4)

Transaction Attributes
– Transaction Attributes (A4.1)
– AXI3 memory attribute signaling (A4.2)
– AXI4 changes to memory attribute signaling (A4.3)
– Memory Types (A4.4)
– Mismatched memory attributes (A4.5)
– Transaction Buffering (A4.6)
– Access Permissions (A4.7)
– Legacy Considerations (A4.8)

Multiple Transactions
– AXI Transaction Identifiers (A5.1)
– Transaction ID (A5.2)
– Transaction Ordering (A5.3)
– Removal of Write Interleaving Support (A5.4)

AXI4 Ordering Model
– Definition of the ordering model (A6.1)
– Master Ordering (A6.2)
– Interconnect Ordering (A6.3)
– Slave Ordering (A6.4)
– Response before final destination (A6.5)
– Ordered write observation (A6.6)

Atomic Accesses
– Single Copy Atomicity Size (A7.1)
– Exclusive Accesses (A7.2)
– Locked Accesses (A7.3)
– Atomic Access Signaling (A7.4)

AXI4 Additional Signaling
– QoS Signaling (A8.1)
– Multiple Region Signaling (A8.2)
– User defined Signaling (A8.3)

Low Power Interface
– About Low Power interface (A9.1)
– Low Power Clock Control (A9.2)

Default signaling and Interoperability
– Interoperability principles (A10.1)
– Major interface categories (A10.2)
– Default Signal Values (A10.3)

(6)



Definition of AXI4-Lite (B1.1)



Interoperability (B1.2)



Defined Conversion Mechanism (B1.3)



Conversion, Protection and Detection (B1.4)

(7)



About ACE (C1)



Signal Descriptions (C2)



Channel Signaling (C3)



Coherency Transactions on Read and Write Address Channels (C4)



Snoop Transactions (C5)



Interconnect Requirements (C6)



Cache Maintenance (C7)



Barrier Transactions (C8)



Exclusive Accesses (C9)



Optional External Snoop Filtering (C10)



ACE-Lite (C11)



Distributed Virtual Memory (C12)



Interface Control (C13)



Master Design Recommendations (C14)

(8)

Part A

(9)

The AMBA AXI protocol is targeted at high-performance, high-frequency system designs

AXI key features
– Separate channels for:
• Read Address
• Read Data
• Write Address
• Write Data
• Write Response
– Support for unaligned data transfers using byte strobes
– Ability to issue multiple outstanding addresses
– Out-of-order (OO) transaction completion
– Support for data interleaving
– Advanced system cache support
• Specify whether a transaction is cacheable/bufferable
• Specify attributes such as write-back/write-through
– Enhanced protection support
• Secure/non-secure transaction specification
– Exclusive access (for semaphore operations)
– Register slices can easily be added for timing closure

(10)

(11)

Read address channel and write address channel
– Convey address and other control information
– Variable-length burst: 1 to 16 data transfers
• Exception: in AXI4, INCR bursts can have lengths of up to 256 transfers
– Transfer size within a burst: 8 to 1024 bits (i.e. 1 byte to 128 bytes)

Read data channel
– Conveys data and any read response information
– Data bus can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide
– Read response is signaled per transfer

Write data channel
– Data bus can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide

Write response channel
– Write response information, signaled once for the entire burst

NOTES:

Each channel is independent and uses two-way (VALID/READY) flow control.

(12)

(13)

AMBA AXI Read Channels

Master, on the read address channel: “Give me some data”

Slave, on the read data channel: “Here you go”

Independent channels synchronized with ID # or “tags”

(17)

(18)

AXI Write Channels

Master, on the write address channel: “I’m sending data. Please store it.”

Master, on the write data channel: “Here is the data.”

Slave, on the write response channel: “I received that data correctly.”

Independent channels synchronized with ID # or “tags”

(23)

AXI Flow-Control

• AXI uses a VALID/READY handshake on every channel
• Each channel has its own VALID/READY pair
• Information moves only when:
– The source has valid information (VALID high), and
– The destination is ready (READY high)
• On each channel the master or slave can limit the flow
• Flexible signaling functionality
– Inserting wait states
– Always ready
– Same-cycle acknowledge
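The handshake rule above can be sketched as a small cycle-level model. This is only an illustration, not part of the AXI specification: the function name `transfer_cycles` and the sample waveforms are assumptions. Each list entry is the level of VALID or READY sampled at one rising clock edge.

```python
def transfer_cycles(valid, ready):
    """Return the clock cycles in which a transfer occurs on one channel.

    valid[i] and ready[i] are the VALID and READY levels sampled at rising
    clock edge i. Information moves only when both are high.
    """
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]


# Inserting wait states: the destination holds READY low for two cycles,
# so the source must keep VALID high until the handshake completes.
wait_states = transfer_cycles(valid=[1, 1, 1, 0], ready=[0, 0, 1, 0])

# Always ready: READY is permanently high, so the transfer is immediate.
always_ready = transfer_cycles(valid=[0, 1, 0, 0], ready=[1, 1, 1, 1])

# Same-cycle acknowledge: READY is raised in the cycle where VALID appears.
same_cycle = transfer_cycles(valid=[0, 1, 0, 0], ready=[0, 1, 0, 0])

print(wait_states, always_ready, same_cycle)
```

In all three cases exactly one transfer takes place; only the cycle in which it happens differs.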

(24)


(25)

AXI Read

Read Address Channel

(26)

Read Burst Operation

– Read request is initiated
– Read request is accepted
– Slave is ready
– 1st data is transferred
– The last data is transferred

(27)



Separation of address and data channel

– Master provides only the start address of the burst
– Slave needs to generate the remaining addresses based on the burst type (FIXED, INCR, WRAP)

One Address for Burst

[Figure: ADDRESS and DATA timelines. One address per burst (A11, A21, A31), each with its own data beats (D11 to D14, D21 to D23, D31).]

(28)

Overlapping Read Bursts

Read request A

is accepted Read request B

is accepted via AR channel

while data A(0) is

(29)

AMBA AXI Write

Write Address Channel

Write Response Channel

Write Data Channel

(30)

Write Burst Operation

Write request A

(31)



To prevent a deadlock situation, you must observe the dependencies that exist between the handshake signals

In any transaction:
– The VALID signal of one AXI component must not be dependent on the READY signal of the other component in the transaction
– The READY signal can wait for assertion of the VALID signal

Dependencies between Channel Handshake Signals (AXI3)

(32)



The AXI3 protocol requires that the write response for any transaction must not be given until the clock cycle after the acceptance of the last data transfer

In addition, the AXI4 protocol requires that the write response for any transaction must not be given until the clock cycle after address acceptance

Dependencies between Channel Handshake Signals (AXI4)

[Figure: write channel handshake dependency diagrams for AXI3 and AXI4, including WLAST.]

(33)



AXI gives an ID tag to every transaction

Use of IDs

[Figure: ID tags on the Write Address, Write Data, Write Response, Read Address, and Read Data channels.]

(34)



Real implementation
– Transaction ID = <master ID, channel ID>
– Channel ID = original AXI transaction ID
– Master ID is needed to identify the initiating master among all the masters

Transaction ID Implementation

[Figure: masters such as CPU, Video Decoder, 3D Graphics, LCD Control, Video Mixer, and DMA connected through the Interconnect to the Memory Controller, each issuing transactions tagged with an ID (ID: 0, 2, 3, 4).]
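The <master ID, channel ID> scheme can be sketched as simple bit concatenation. This is a minimal illustration under assumptions: the names `extend_id` and `split_id`, the 4-bit ID width, and the master port numbers are hypothetical, and real interconnects may assign master IDs differently.

```python
def extend_id(master_index, axi_id, id_width):
    """Prepend the master's port number to its original AXI ID."""
    if not 0 <= axi_id < (1 << id_width):
        raise ValueError("AXI ID does not fit in id_width bits")
    return (master_index << id_width) | axi_id


def split_id(extended_id, id_width):
    """Recover (master_index, original AXI ID), e.g. to route a response back."""
    return extended_id >> id_width, extended_id & ((1 << id_width) - 1)


# Two masters both use ID 3; after extension the slave sees distinct IDs.
cpu_id = extend_id(master_index=0, axi_id=3, id_width=4)
dma_id = extend_id(master_index=5, axi_id=3, id_width=4)
print(cpu_id, dma_id, split_id(dma_id, id_width=4))
```

Because the original ID occupies the low bits, the interconnect can strip the master field from a response before returning it to the initiating master.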

(35)



Multiple Outstanding Addresses:

– By using IDs, a master can issue transactions without waiting for earlier transactions to complete.



Write Data Interleaving in AXI3 Slaves:

– With Write Data Interleaving, an AXI3 slave can accept interleaved write-data with different AWID values.

– This feature is not supported in AXI4

• All Write Data for a transaction must be provided in consecutive transfers on the write data channel.

• WID signal is not supported in AXI4



Out of Order completion

– Transactions with the same ID are completed in order

– Transactions with different IDs can be completed out of order

– Fast-responding slaves respond in advance of earlier transactions with slower slaves

– This is not a required feature. Simple masters and slaves can process one transaction at a time in the order they are issued
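The out-of-order completion rules can be checked with a short model. This is a sketch only: `completion_order_is_legal` and the (ID, tag) tuples are hypothetical names used for illustration, and the model ignores everything except the per-ID ordering rule.

```python
from collections import defaultdict, deque


def completion_order_is_legal(issued, completed):
    """Check that transactions sharing an ID complete in issue order.

    issued and completed are lists of (axi_id, tag) tuples, where tag is
    any label that identifies one transaction.
    """
    pending = defaultdict(deque)
    for axi_id, tag in issued:
        pending[axi_id].append(tag)
    for axi_id, tag in completed:
        # The oldest outstanding transaction with this ID must finish first.
        if not pending[axi_id] or pending[axi_id][0] != tag:
            return False
        pending[axi_id].popleft()
    return True


issued = [(1, "A"), (2, "B"), (1, "C")]
# B overtakes A: allowed, because the IDs differ.
legal = completion_order_is_legal(issued, [(2, "B"), (1, "A"), (1, "C")])
# C overtakes A: not allowed, because both use ID 1.
illegal = completion_order_is_legal(issued, [(1, "C"), (1, "A"), (2, "B")])
print(legal, illegal)
```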

(36)



Ordering by transaction ID

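– Slave can handle data transfers with different transaction IDs out of order
– The order within a single burst is maintained

Out-of-Order Transaction

[Figure: ADDRESS and RDATA timelines. Addresses A11, A21, A31 are issued; the read data bursts (D11 to D14, D21 to D23, D31) can return in a different order from the addresses.]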

(37)



Write request and data
– The write data can appear at an interface before the write address that relates to it

Two relationships that must be maintained are:
– Read data must always follow the address to which the data relates
– A write response must always follow the last write transfer in the write transaction to which the write response relates

(38)



There are no ordering restrictions between read and write transactions, even when they use the same AWID and ARID. If a master requires an ordering restriction, it must enforce the ordering itself.

(39)



The data for a sequence of read transactions with the same ARID must be returned in order:
– When reads with the same ARID are from the same slave, the slave must ensure that the read data is returned in the same order that the addresses are received
– When reads with the same ARID are from different slaves, the interconnect must ensure that the read data is returned in the same order that the master issued the addresses

Ordering Rules #3: Multiple reads with same ARID

[Figure: two topologies. A Master IP connected directly to a Slave IP, and a Master IP connected through the Interconnect to Slave IP 1 and Slave IP 2.]

(40)



Interleaving rule

– Write Data with different ID can be interleaved.

– This is supported only in AXI3

– The order within a single burst is maintained

Ordering Rules #4: Write Interleaving

[Figure: ADDRESS and WDATA timelines. Addresses A11, A21, A31 are issued; the write data beats D11 to D14, D21 to D23, and D31 are interleaved on WDATA.]

(41)



AXI4 does not support write interleaving

The master must issue write data in the same order as the write addresses

Why is WID removed in AXI4?
– Write data with different AWIDs follows the address order, and there is no write interleaving => no need for WID
– Responses to multiple writes with different IDs can be out of order with respect to the address order => BID remains

(42)



OKAY
◦ Normal access success, exclusive access failure, or exclusive access to a slave that does not support it

EXOKAY
◦ Exclusive access success

SLVERR
◦ The slave signals an error, for example: unsupported transfer size, write access to a read-only location, timeout condition in the slave, access to an address where no register is present, or access to a disabled or powered-down function

DECERR
◦ No slave decodes the address, so the default slave returns DECERR

(Note: For a write transaction, there is just one response for the entire burst, not one for each data transfer within the burst. In a read transaction, the slave can give a different response for each transfer within a burst.)
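The four responses map onto the 2-bit RRESP/BRESP encoding as sketched below. The enum name `Resp` and the helper `read_burst_ok` are hypothetical; the numeric values follow the standard AXI response encoding.

```python
from enum import IntEnum


class Resp(IntEnum):
    """2-bit AXI response encoding carried on RRESP[1:0] and BRESP[1:0]."""
    OKAY = 0b00
    EXOKAY = 0b01
    SLVERR = 0b10
    DECERR = 0b11


def read_burst_ok(rresp_per_beat):
    """A read burst carries one response per beat; all of them must be good."""
    return all(Resp(r) in (Resp.OKAY, Resp.EXOKAY) for r in rresp_per_beat)


# Third beat of a four-beat read burst reports a slave error.
burst_result = read_burst_ok([0b00, 0b00, 0b10, 0b00])
print(burst_result, Resp(0b11).name)
```

A write burst needs no such loop, since a single BRESP value covers the whole burst.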

(43)



AXI enables the insertion of a register slice in any channel at the cost of an additional cycle of latency
– Trade-off between latency and maximum frequency

A register slice can be used on any channel independently

A register slice incurs one cycle of latency per insertion

Register Slice for Timing Isolation

[Figure: AXI Master and AXI Slave with a register slice available on each channel: Write Address/Control (AWREADY), Write data (WREADY), Response (BREADY), Read Address/Control (ARREADY), Read data.]

(44)



A single clock signal, ACLK.

◦ All input signals are sampled on the rising edge of ACLK. All output signal changes must occur after the rising edge of ACLK.

◦ There must be no combinatorial paths between input and output signals on either master or slave interfaces.

A single active-low reset signal, ARESETn.

(45)



Transaction burst type determines address bus behavior

– FIXED, INCR, or WRAP

Unaligned Access

◦ The master uses the low-order address bits; the byte-lane strobes must be consistent with the information in those address bits

Optional lock signals on the address channels facilitate exclusive and atomic access protection



System cache support



Protection unit support

(46)

(47)

In a narrow transfer, the address and control information (WSTRB) determine which byte lanes the transfer uses.

WSTRB[n:0] signals, when high, specify which byte lanes are used

Example 1: A narrow transfer with 8-bit transfers
– Burst has 5 transfers
– Starting address is 0
– Transfer size is 8 bits
– Data bus width is 32 bits
– Burst type is INCR

Example 2: A narrow transfer with 32-bit transfers
– Burst has 3 transfers
– Starting address is 4
– Transfer size is 32 bits
– Data bus width is 64 bits
– Burst type is INCR
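The byte lanes used in the two examples can be worked out with a short calculation. This is a sketch for INCR bursts only; the function name `burst_strobes` is hypothetical, and the strobes are returned as integers in which bit n stands for byte lane n.

```python
def burst_strobes(start_addr, size_bytes, bus_bytes, num_beats):
    """Return the WSTRB mask for each beat of an INCR burst.

    size_bytes is the transfer size and bus_bytes the data bus width,
    both in bytes. Bit n of each mask corresponds to byte lane n.
    """
    strobes = []
    addr = start_addr
    for _ in range(num_beats):
        aligned = (addr // size_bytes) * size_bytes
        lower_lane = addr % bus_bytes
        upper_lane = (aligned + size_bytes - 1) % bus_bytes
        mask = 0
        for lane in range(lower_lane, upper_lane + 1):
            mask |= 1 << lane
        strobes.append(mask)
        # Every beat after the first starts on a size-aligned address.
        addr = aligned + size_bytes
    return strobes


# Example 1: five 8-bit transfers on a 32-bit bus, starting at address 0.
example1 = [format(s, "04b") for s in burst_strobes(0, 1, 4, 5)]
# Example 2: three 32-bit transfers on a 64-bit bus, starting at address 4.
example2 = [format(s, "08b") for s in burst_strobes(4, 4, 8, 3)]
print(example1)
print(example2)
```

In Example 1 the active lane walks across the bus and wraps back to lane 0 on the fifth beat; in Example 2 the transfers alternate between the upper and lower halves of the 64-bit bus.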

(48)



AXI Supports Unaligned Transfers

– An unaligned transfer is a transfer in which the first byte accessed is not aligned with the natural address boundary

– e.g. a 32-bit transfer that starts at address 0x1002 is not aligned to the natural 32-bit address boundary



Master can:

– Use low-order address lines to signal an unaligned start address OR

– Provide an aligned address and use byte-lane strobes to signal the unaligned start address
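For the 0x1002 example, the two signaling options describe the same first beat: an aligned address of 0x1000 with only the upper byte lanes enabled. A minimal sketch, assuming a hypothetical helper `first_beat` and a 32-bit data bus:

```python
def first_beat(start_addr, size_bytes, bus_bytes):
    """Return (aligned address, WSTRB mask) for the first beat of a burst.

    Bit n of the mask corresponds to byte lane n of the data bus.
    """
    aligned = start_addr - (start_addr % size_bytes)
    first_lane = start_addr % bus_bytes
    last_lane = (aligned + size_bytes - 1) % bus_bytes
    mask = sum(1 << lane for lane in range(first_lane, last_lane + 1))
    return aligned, mask


aligned_addr, wstrb = first_beat(start_addr=0x1002, size_bytes=4, bus_bytes=4)
print(hex(aligned_addr), format(wstrb, "04b"))
```

Only byte lanes 2 and 3 carry data in the first beat; later beats of the burst are aligned and use all four lanes.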

(49)

Unaligned Transfer (Cont’d)

(50)



Wrapping burst case

– The wrap boundary is aligned to the total size of the data to be transferred

• That is, to ( (size of each transfer in burst) x (number of transfers in burst) )

– After each transfer, the address increments in the same way as for an INCR burst

– If the incremented address is ( (wrap boundary) + (total size of data to be transferred) ), the address wraps around to the wrap boundary
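The address sequence a slave has to generate for each burst type can be sketched as follows. The function name `beat_addresses` is hypothetical. The checks on WRAP bursts (length of 2, 4, 8, or 16 and a start address aligned to the transfer size) reflect the protocol's restrictions on wrapping bursts.

```python
def beat_addresses(start_addr, size_bytes, num_beats, burst_type):
    """Return the address of every beat in a FIXED, INCR, or WRAP burst."""
    if burst_type == "FIXED":
        return [start_addr] * num_beats

    aligned = (start_addr // size_bytes) * size_bytes
    if burst_type == "INCR":
        # Beats after the first use size-aligned addresses.
        return [start_addr] + [aligned + i * size_bytes for i in range(1, num_beats)]

    if burst_type == "WRAP":
        if num_beats not in (2, 4, 8, 16):
            raise ValueError("WRAP burst length must be 2, 4, 8, or 16")
        if start_addr != aligned:
            raise ValueError("WRAP start address must be aligned to the transfer size")
        total = size_bytes * num_beats
        wrap_boundary = (start_addr // total) * total
        return [wrap_boundary + (start_addr - wrap_boundary + i * size_bytes) % total
                for i in range(num_beats)]

    raise ValueError("unknown burst type")


# Four 32-bit transfers starting at 0x34: the wrap boundary is 0x30.
wrap = [hex(a) for a in beat_addresses(0x34, 4, 4, "WRAP")]
incr = [hex(a) for a in beat_addresses(0x34, 4, 4, "INCR")]
print(wrap)
print(incr)
```

The WRAP burst reaches 0x40 (wrap boundary plus total size) after the third beat and returns to 0x30, while the INCR burst simply continues to 0x40.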

(51)



In AXI4, ARLEN[7:0] / AWLEN[7:0] allow INCR bursts of up to 256 beats

Burst in AXI3 protocol:

– Early termination of bursts is not supported.

– A burst must not cross a 4-kbyte boundary. This ensures that a burst is only destined for a single slave.

AXI4 protocol longer burst support:

– Bursts longer than 16 beats are only supported for the INCR burst type. Both WRAP and FIXED burst types remain constrained to a maximum burst length of 16 beats.

– Exclusive accesses are not permitted to use a burst length greater than 16.
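The burst rules above can be turned into two small checks. This is a sketch under assumptions: `crosses_4k` and `max_burst_len` are hypothetical helper names, the protocol is passed as a plain string, and the boundary check applies to INCR bursts.

```python
def crosses_4k(start_addr, size_bytes, num_beats):
    """True if an INCR burst would cross a 4-kbyte address boundary."""
    aligned = (start_addr // size_bytes) * size_bytes
    last_byte = aligned + size_bytes * num_beats - 1
    return (start_addr >> 12) != (last_byte >> 12)


def max_burst_len(protocol, burst_type, exclusive=False):
    """Maximum number of beats allowed for a burst."""
    if protocol == "AXI4" and burst_type == "INCR" and not exclusive:
        return 256
    return 16


# Four 32-bit beats from 0x0FF0 end at 0x0FFF: still inside the first 4 KB.
fits = crosses_4k(0x0FF0, 4, 4)
# Eight beats from the same address would run into the next 4 KB region.
too_long = crosses_4k(0x0FF0, 4, 8)
print(fits, too_long, max_burst_len("AXI4", "INCR"), max_burst_len("AXI4", "WRAP"))
```

A master that needs to move more data than fits before the boundary has to split the transfer into several bursts.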

(52)



AxCACHE[3:0] signals define the transaction attributes of a transfer

Transaction attributes control:

– How a transaction progresses through the system

– How any system-level cache handles the transaction

System Cache Support

ARCACHE[3:0] / AWCACHE[3:0]

[AXI3]

[Figure: CPU with register file (RF), L1 Data Cache, L1 Instruction Cache, Unified L2 Cache, and Memory.]

(53)

Bufferable bit (B): AxCACHE[0]
– The interconnect or any component can delay the transaction for any number of cycles. This is usually only relevant to writes.
– The transaction response may come from an intermediate point, such as a cache, rather than from the final destination. The cache is then responsible for updating the memory.

Cacheable bit (C): AxCACHE[1]
– The characteristics of the transaction at the final destination do not have to match the characteristics of the original transaction.
– For writes this means that a number of different writes can be merged together.
– For reads this means that a location can be pre-fetched or can be fetched just once for multiple read transactions.
– To determine if a transaction should be cached, this bit should be used in conjunction with the Read Allocate (RA) and Write Allocate (WA) bits.

Read Allocate bit (RA): AxCACHE[2]
– If ‘1’, the transaction should be looked up in the cache
– In case of a read miss, it is recommended to allocate an entry in the cache
– If C is low, RA must be low

Write Allocate bit (WA): AxCACHE[3]
– If ‘1’, the transaction should be looked up in the cache
– In case of a write miss, it is recommended to allocate an entry in the cache
– If C is low, WA must be low

System Cache Support

ARCACHE[3:0] / AWCACHE[3:0]
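The bit assignments above can be decoded as shown in this sketch. The function name `decode_axcache` and the dictionary keys are hypothetical; the bit positions and the "C low implies RA and WA low" rule are taken from the bullets above.

```python
def decode_axcache(axcache):
    """Split a 4-bit AxCACHE value into its AXI3 attribute bits."""
    bits = {
        "bufferable": bool(axcache & 0b0001),      # AxCACHE[0]
        "cacheable": bool(axcache & 0b0010),       # AxCACHE[1]
        "read_allocate": bool(axcache & 0b0100),   # AxCACHE[2]
        "write_allocate": bool(axcache & 0b1000),  # AxCACHE[3]
    }
    # RA and WA may only be set when the Cacheable bit is set.
    bits["legal"] = bits["cacheable"] or not (
        bits["read_allocate"] or bits["write_allocate"]
    )
    return bits


cacheable_bufferable = decode_axcache(0b0011)
ra_without_c = decode_axcache(0b0100)
print(cacheable_bufferable)
print(ra_without_c["legal"])
```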

(54)



The AxCACHE[1] bits are renamed:

– From ‘Cacheable’ to ‘Modifiable’, to better describe the required functionality
– The actual functionality is unchanged

Ordering requirements are defined for Non-modifiable transactions

– Ordering between transactions should be maintained if the transactions satisfy all of the following conditions:
• Transactions are Non-modifiable
• Transactions use the same ID
• Transactions target the same slave device

Meanings of RA and WA bits are updated:

– One bit indicates whether this transaction should be allocated in the cache

– The other bit indicates whether an allocation could have been made due to another transaction

(55)



For Read Transactions:

– RA bit means the same: the location could have been previously allocated in the cache. It is recommended that this transaction is allocated in the cache.

– WA bit is redefined: the location could have been previously allocated in the cache because of another transaction (either a write transaction or a transaction by another master)

For Write Transactions:

– WA bit means the same: the location could have been previously allocated in the cache. It is recommended that this transaction is allocated in the cache.

– RA bit is redefined: the location could have been previously allocated in the cache because of another transaction (either a read transaction or a transaction by another master)



This change means:

– For a same location, a read and a write transfer may have different values for

(56)



Normal or Privileged Mode: AxPROT[0]

◦ Indicates whether an access was made by a master in Privileged mode or in Unprivileged mode

◦ LOW indicates an access made by a master in Unprivileged mode

◦ HIGH indicates an access made by a master in Privileged mode

◦ A privileged processing mode typically has a greater level of access within a system.

Secure or Non-secure: AxPROT[1]

◦ LOW indicates a Secure access

◦ HIGH indicates a Non-secure access

◦ Used where a greater degree of differentiation between processing modes is required.

Instruction or data: AxPROT[2]

◦ LOW indicates a data access

◦ HIGH indicates an instruction access.

Protection Support
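The three AxPROT bits can be decoded as in this sketch. The function name `decode_axprot` and the dictionary keys are hypothetical; the bit meanings follow the list above.

```python
def decode_axprot(axprot):
    """Split a 3-bit AxPROT value into its protection attributes."""
    return {
        "privileged": bool(axprot & 0b001),   # AxPROT[0]: HIGH = privileged
        "non_secure": bool(axprot & 0b010),   # AxPROT[1]: HIGH = Non-secure
        "instruction": bool(axprot & 0b100),  # AxPROT[2]: HIGH = instruction
    }


# 0b010: unprivileged, Non-secure, data access.
attrs = decode_axprot(0b010)
print(attrs)
```

Note that the Secure case is the LOW level of AxPROT[1], so a value of 0b000 describes an unprivileged, Secure data access.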

(57)

Normal access, AxLOCK[1:0] = b00

Exclusive access, b01
– Exclusive read
– Exclusive write
– If there is no intervening write to the address region, the response is EXOKAY; otherwise the response is OKAY
– Usually used for read-modify-write

Locked access, b10
– Starts with b10 and ends with b00
– During this period, only the master that initiated the lock can access the address region

Atomic Access

(58)



Semaphore-type operation without requiring the bus to remain locked to a particular master for the duration of the operation

Usually used for read-modify-write operations

The slave must have additional logic to support exclusive access.

The basic process for an exclusive access is:

– A master performs an exclusive read from an address location.

– At some later time, the master attempts to complete the exclusive operation by performing an exclusive write to the same address location.

– The exclusive write access of the master is signaled as:

• Successful (EXOKAY) if no other master has written to that location between the read and write accesses.

• Failed (OKAY) if another master has written to that location between the read and write accesses. In this case the address location is not updated.

Exclusive Access

[Figure: timeline at Slave 1. Master 1 issues an exclusive read (E.RD) of 0x100, Master 2 writes (WR) 0x100, then Master 1 issues an exclusive write (E.WR) to 0x100, which receives OKAY.]
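The exclusive access process can be modeled with a toy monitor inside the slave. This is a simplified sketch, not the monitoring scheme from the specification: the class `ExclusiveMonitor`, its method names, and the one-reservation-per-master policy are all assumptions made for illustration.

```python
class ExclusiveMonitor:
    """Toy slave-side exclusive access monitor (one reservation per master)."""

    def __init__(self):
        self.memory = {}
        self.reserved = {}  # master name -> reserved address

    def exclusive_read(self, master, addr):
        self.reserved[master] = addr
        return self.memory.get(addr, 0)

    def write(self, master, addr, data):
        """Normal write: always performed; breaks other masters' reservations."""
        self.memory[addr] = data
        self._clear_others(master, addr)
        return "OKAY"

    def exclusive_write(self, master, addr, data):
        if self.reserved.get(master) != addr:
            return "OKAY"  # exclusive failure: the location is NOT updated
        self.memory[addr] = data
        del self.reserved[master]
        self._clear_others(master, addr)
        return "EXOKAY"  # exclusive success

    def _clear_others(self, master, addr):
        stale = [m for m, a in self.reserved.items() if a == addr and m != master]
        for other in stale:
            del self.reserved[other]


# The failing sequence: another master writes between E.RD and E.WR.
slave = ExclusiveMonitor()
slave.exclusive_read("Master 1", 0x100)
slave.write("Master 2", 0x100, 0xAA)
result = slave.exclusive_write("Master 1", 0x100, 0xBB)
print(result, hex(slave.memory[0x100]))
```

On an OKAY response the master would typically repeat the whole exclusive read and write sequence.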

(59)



The interconnect must ensure that only the locking master is allowed access to the slave region until an unlocked transfer from the same master completes

The master should have no other outstanding transactions waiting to complete before issuing a locked sequence

The final transaction effectively removes the lock

Locked Access

[AXI3]

[Figure: timeline of accesses to 0x100 by Master 1 (Lock), Master 2, and Master 1 (Unlock).]

(60)



AXI4 does not support locked accesses

All locked accesses from AXI3 masters need to be converted to normal accesses

(61)

Quality of Service signaling (AxQOS[3:0])
– 4-bit AxQOS signals are sent on the address channel for each transaction
– The protocol does not specify the exact use of the QoS identifier
– Recommendation: can be used as a priority indicator for the transaction
– The default value of b0000 indicates no participation in any QoS scheme

Region identifier signals (AxREGION[3:0])
– The 4-bit signals can uniquely identify up to 16 different regions
– The region identifier provides a decode of higher-order address bits
– Using region identifiers, a single physical interface on a slave can provide multiple (up to 16) logical interfaces, each with a different location in the system address map
– The interconnect should produce the AxREGION signals when performing the address decode function for a single slave that has multiple logical interfaces

User signals on each AXI channel for user-defined signaling
– AWUSER, WUSER, BUSER, ARUSER, RUSER
– The specification recommends not using them, to avoid interoperability issues

(62)



Optional extension to the AXI protocol

– Uses 3 level signals for the handshake between the system clock controller and the peripheral

Signals:

– CACTIVE (driven by peripheral)
• High => Peripheral requires a clock signal. The clock controller must enable the clock immediately.
• Low => Peripheral does not require the clock

– CSYSREQ (driven by clock controller)
• Low => Request for the peripheral to enter a low-power state
• High => Request for the peripheral to exit a low-power state

– CSYSACK (driven by peripheral)
• Low => Low-power entry request acknowledged by peripheral
• High => Low-power exit request acknowledged by peripheral

(63)



The peripheral can accept or deny the request from the system clock controller to enter a low-power state.

The level of the CACTIVE signal when the peripheral acknowledges the request by driving CSYSACK low indicates the acceptance or denial of the request (CACTIVE low: accepted; CACTIVE high: denied).
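The meaning of the three low-power signals at any instant can be summarized in a small decoder. This is a sketch only: the function name `low_power_state` and the state descriptions are hypothetical, and the accept/deny interpretation assumes CACTIVE low means the request was accepted and CACTIVE high means it was denied.

```python
def low_power_state(csysreq, csysack, cactive):
    """Interpret the sampled levels of CSYSREQ, CSYSACK, and CACTIVE."""
    if csysreq and csysack:
        return "normal operation"
    if not csysreq and csysack:
        return "low-power entry requested, waiting for acknowledge"
    if not csysreq and not csysack:
        # The peripheral has acknowledged; CACTIVE tells the outcome.
        return "low-power entry denied" if cactive else "low-power entry accepted"
    return "low-power exit requested, waiting for acknowledge"


accepted = low_power_state(csysreq=0, csysack=0, cactive=0)
denied = low_power_state(csysreq=0, csysack=0, cactive=1)
print(accepted)
print(denied)
```

Only in the accepted case can the clock controller remove the clock; after a denial it has to keep the clock running.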

(64)

Low Power Interface (C channel)

Acceptance of low-power request

(65)

Additional QoS signaling: AxQOS[3:0]

Additional 4-bit interface signals AxREGION
◦ Allow 16 different regions to be uniquely identified
◦ The region identifier should be constant within a 4 KB address space

Added USER signals
– AxUSER, RUSER, WUSER, BUSER

Removes support for locked transfers, so the AxLOCK signal is a single bit (Normal/Exclusive)

Removal of write interleaving support
– Removes the WID signal

Write response requirements are updated:
◦ AXI3: clock cycle after last data transfer
◦ AXI4: additionally, clock cycle after address acceptance

Support for burst lengths of up to 256 beats (for INCR bursts)

AWCACHE and ARCACHE signaling is updated

(66)

Part B

(67)



AXI4-Lite is a subset of the AXI4 protocol intended for communication with simpler, smaller control-register-style interfaces in components.

AXI4-Lite is a simpler AXI4 for on-chip devices requiring a more powerful interface than APB.

Features:
– All transactions have a burst length of 1
– All data accesses are the same size as the width of the data bus
– Support for a data bus width of 32 bits or 64 bits
– All accesses are equivalent to AWCACHE or ARCACHE equal to b0000 (i.e. Non-modifiable and Non-bufferable)
– Exclusive accesses are not supported
– AXI IDs are not supported, so all transactions must be in order
– As a result, the signal list is reduced

(68)



Subset of AXI signal set



Simple traditional signaling



Targeted applications: simple, low-performance peripherals
– GPIO
– UART

Signals not supported in AXI4-Lite:
– AWLEN, ARLEN
– AWSIZE, ARSIZE
– AWBURST, ARBURST
– AWLOCK, ARLOCK
– AWCACHE, ARCACHE
– WLAST, RLAST

(69)

Part C

(70)



Two problems for systems that contain caches:

1) Memory may be updated (by another master) after a cached master has taken a copy
• The cache no longer contains up-to-date data

2) In systems that contain write-back caches, the master may write to its local cached copy
• The memory no longer contains up-to-date data
• A second master reading from memory will see stale data

Coherency Problem

[Figure: Master1, Master2, and Master3, each with a Cache, connected through the Interconnect to Main Memory.]

(71)



Snooping Cache Coherency Protocols

– Transactions to a shared-region are ‘broadcast’ to all masters

– All masters ‘listen-in’ to all shared data-transactions originating from other masters

– When a master detects a read transaction for which it has the most up-to-date data, it provides the data to the requesting master; in the case of a write, it invalidates its own copy.



Directory based Cache Coherency Protocols

– A single ‘directory’ is maintained, which contains a list of where every cached line within the system is held.

– A master initiating a transaction first consults the directory to find where the data is cached and then directs cache coherency traffic to only those masters containing cached copies.

(72)



ACE is an extension to AXI

Aims to provide hardware-based cache coherency

Adds 3 new snoop channels (AC, CR, CD)

Adds additional signals to existing AXI channels

Also adds barrier support to enforce ordering of multiple outstanding transactions

(73)

The ACE protocol is realized using:
– A 5-state cache model to define the state of a cache line in the coherent system
– Additional snoop channels that enable communication with a cached master when another master is accessing a shared address location
• Read channels: (AR, R)
• Write channels: (AW, W, B)
• Snoop channels: (AC, CR, CD)
– Additional signaling on the existing AXI4 channels that enables new transactions and information to be conveyed

ACE supported policies
– 100% snoop
– Directory based
– Anything in between (snoop filter)

ACE adds the following to AXI
– Support for hardware coherency
– Support for cache maintenance

(74)

Terms used to describe the state of a cache line are:

Unique
– The cache line resides ONLY in this cache

Shared
– The cache line MAY be in another cache

Clean
– The cache controller does not have to update the main memory

Dirty
– The cache controller is responsible for updating the main memory

Invalid
– The cache line is not being used for caching data

Devices are not required to support all 5 states internally
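Combining the terms above gives the five states of the ACE cache model: Invalid, UniqueClean, UniqueDirty, SharedClean, and SharedDirty. The sketch below encodes them as flags; the `LineState` type and the two helper functions are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class LineState(NamedTuple):
    valid: bool
    unique: bool
    dirty: bool


STATES = {
    "Invalid": LineState(valid=False, unique=False, dirty=False),
    "UniqueClean": LineState(valid=True, unique=True, dirty=False),
    "UniqueDirty": LineState(valid=True, unique=True, dirty=True),
    "SharedClean": LineState(valid=True, unique=False, dirty=False),
    "SharedDirty": LineState(valid=True, unique=False, dirty=True),
}


def must_update_memory_on_evict(state_name):
    """A dirty line makes this cache responsible for updating main memory."""
    return STATES[state_name].dirty


def can_store_without_coherency_request(state_name):
    """Only a line held uniquely can be written without involving other caches."""
    state = STATES[state_name]
    return state.valid and state.unique


print(must_update_memory_on_evict("SharedDirty"))
print(can_store_without_coherency_request("SharedClean"))
```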

(75)

Non-shareable
– The domain contains a single master

Inner Shareable
– The domain can include additional masters

Outer Shareable
– The domain contains at least all masters in the inner domain
– Can include additional masters

System
– This domain includes all masters in the system

(76)



For Coherency Transactions:

– A master uses a shareability domain to determine which other masters might have a copy of the addressed location in their local cache

– The interconnect uses this information to determine, for any given transaction, which other masters must be snooped



For Barrier Transactions:

– The domain of a barrier transaction can be used to determine how far a barrier transaction must propagate

(77)

Snoop channels enable communication with a cached master when another master is accessing a shared address location

AC channel (Coherent Address Channel): input to master
– ACADDR is used to send the address of a snoop request to a cached master, accompanied by control signals

CR channel (Coherent Response Channel): output from master
– CRRESP is used by the master to signal its snoop response to the interconnect
– A narrow, 5-bit response that indicates whether an associated data transfer is expected on the CD channel

CD channel (Coherent Data Channel): output from master
– CDDATA is used by the master to provide the data in response to a snoop
– Optional for write-through caches
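The 5-bit snoop response can be unpacked as in this sketch. The function name `decode_crresp` is hypothetical; the field names and bit positions (bit 0 DataTransfer, bit 1 Error, bit 2 PassDirty, bit 3 IsShared, bit 4 WasUnique) are given here from the ACE signal descriptions and should be checked against the specification.

```python
CRRESP_FIELDS = ("DataTransfer", "Error", "PassDirty", "IsShared", "WasUnique")


def decode_crresp(crresp):
    """Split a 5-bit CRRESP value into named flags (bit 0 first)."""
    if not 0 <= crresp < (1 << len(CRRESP_FIELDS)):
        raise ValueError("CRRESP is a 5-bit value")
    return {name: bool((crresp >> bit) & 1) for bit, name in enumerate(CRRESP_FIELDS)}


# The snooped master keeps a shared copy and will supply data on the CD channel.
response = decode_crresp(0b01001)
print(response)
```

When DataTransfer is clear, the interconnect does not wait for anything on the CD channel for that snoop.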

(78)



ACE adds additional signals to existing AXI Channels:

– Read Address Channel and Write Address Channel

• ARSNOOP[3:0] / AWSNOOP[2:0]
- Indicate the type of snoop transaction for shareable transactions

• ARBAR[1:0] / AWBAR[1:0]
- Used for barrier signaling

• ARDOMAIN[1:0] / AWDOMAIN[1:0]
- Indicate which masters should be snooped for snoop transactions and which masters must be considered for ordering of barrier transactions

– Read Data Channel and Write Data Channel

• RRESP[3:2]
- Additional response bits for shared read transactions, indirectly driven by the CRRESP outputs from a snooped master

• RACK / WACK

(79)



In ACE, snoop requests must be responded to in order (the snoop channels have no ID signals)

The system interconnect is responsible for coordinating the progress of all shared (coherent) transactions:

– e.g. the interconnect may present snoop addresses to all masters in parallel, or it may present snoop addresses one at a time

– Access to system memory can be issued upon a snoop miss, or speculatively before all snoop responses have arrived

– One example of such a coherent interconnect is the CCI-400 interconnect developed by ARM

(80)



Interconnect:

– CCI (Cache Coherent Interconnect)



ACE Masters

– Masters with Caches



ACE-Lite Masters

– Components without caches snooping other caches



Slaves

– Components not initiating snoop transactions

Different kinds of Components

Example Cortex-A15 Coherent System with CCI-400 Interconnect

(81)

ACE introduces a large number of new transactions to AMBA 4.

Non-Shared Transactions
– These are the existing AXI read and write transactions
– Used for non-coherent, non-snooped transactions

Non-Cached Transactions
– ReadOnce
– WriteUnique
– WriteLineUnique

Cache Maintenance Transactions
– CleanShared
– CleanInvalid
– MakeInvalid

Shareable Read Transactions
– ReadShared
– ReadNotSharedDirty

Shareable Write Transactions
– MakeUnique
– ReadUnique
– CleanUnique

Write-back Transactions
– WriteBack
– WriteClean
– Evict

Transaction Groups

(82)



Initiating Master component issues a transaction



Depending on whether coherency support is required, either:

– The transaction is passed directly to a slave component

– The transaction is passed to the coherency support logic within the interconnect



Interconnect initiates the snoop transactions that are required



Each cached master that receives a snoop transaction provides a snoop response.



The interconnect determines whether a main memory access is required



The interconnect collates snoop responses and any required data



The initiating master completes the transaction

(83)



Master Component issues a read transaction on Read-Address channel



Interconnect determines whether any other cache holds a copy of the location, by snooping:

– i.e. It passes the shareable address to other caching masters that can hold a copy, on the Snoop Address Channel

– If any snooped master holds the requested cache line, then it:

• Responds on the snoop response channel

• Provides the snoop-data to the interconnect on the snoop data channel

– If no snooped master component holds the requested cache line:

• The interconnect initiates a transaction to main memory,

• The read data is supplied back to the initiating master on the AXI Read Data channel, as for standard transactions

– The master component indicates that the transaction has completed, using the RACK signal

(84)



Initiating master component requests a unique copy of the cache line by issuing a ‘MakeUnique’ transaction on the AXI Read Address Channel

– Interconnect passes the transaction to other caches on the Snoop Address Channel

– Snooped masters respond on ‘Snoop Response Channel’ to indicate that the cache line has been removed from their local caches

– A response is provided to the initiating master, using AXI Read Data channel (no data transfer occurs)

– MakeUnique removes copies of the cache-lines from other Caching Masters



Initiating master performs the store using standard AXI write channels



Initiating master asserts the RACK signal to indicate that the transaction has completed

(85)

(86)

The ARM architecture supports 2 types of barrier instructions:

– DMB (Data Memory Barrier):
• The DMB transaction can flow through the pipelined interconnect, but no re-ordering is allowed around the DMB
• Ensures that all memory transactions prior to the barrier are visible to other masters
• Everything before the DMB must be complete before anything after the DMB
• This was ensured by the ARM MPCore processor cluster
• In ACE, a DMB barrier may define a subset of masters that must be able to observe the barrier:
- This is indicated on the AxDOMAIN signals, which can indicate Inner, Outer, System, or Non-shareable

– DSB (Data Synchronization Barrier):
• DSB is used to stall the processor until previous transactions have completed
• Can be used, for example, to ensure that data written to a DMA command buffer in memory has reached its destination before kicking off the DMA via a peripheral register
• Is the most time-consuming barrier, since it stops the processor until transactions are complete

A master issues a barrier on both the Read Address channel and the Write Address channel simultaneously, using ARBAR and AWBAR signaling.

A barrier transaction has an address phase and a response phase, but no data transfer occurs.

Barriers enforce ordering because a master must not issue any read or write transaction until it has received a response for the barrier on both the read and write channels.

(87)



Full-ACE Master:

– Contains all ACE channels
– Can issue snoop requests and can be snooped by the interconnect
– e.g. ARM Cortex-A15 processor cluster

ACE-Lite Master:

– Does not include the AC, CR and CD channels
– But has the additional coherency signals on the existing AXI channels
– Can issue snoop requests but cannot itself be snooped
– e.g. a GPU or a coherent I/O device

(88)



ACE-Lite is a subset of ACE



Enables uncached masters to snoop ACE Coherent masters

– e.g. an AXI master interface such as Gigabit Ethernet that shares data with the CPU can directly read/write cached data shared within the CPU.

ACE-Lite masters have additional signals on the AXI channels, but do not have the three additional ACE snoop channels.

(89)



ACE protocol does not guarantee Coherency!

(90)



AMBA® AXI™ and ACE™ Protocol Specification (Issue E, 22 February 2013)

– http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0022e/index.html

(91)