AXI Overview
Advanced Microcontroller Bus Architecture (AMBA)
– On-chip bus protocol from ARM
• On-chip interconnect specification for the connection and management of functional blocks, including processors and peripheral devices
– Introduced in 1996
– AMBA is a registered trademark of ARM Limited
– AMBA is an open standard
This presentation outlines the specific topics and sections that need to be understood.
For each topic, the corresponding section number in the AMBA® AXI™ and ACE™ Protocol Specification (Issue E, 22 February 2013) is provided.
The specification is divided into 3 parts:
– Part A: AMBA AXI3 and AXI4 Protocol Specification
– Part B: AMBA AXI4-Lite
– Part C: ACE Protocol Specification
Please go through the details in the AMBA specification.
© 2010 Wipro Ltd - Confidential
Introduction to AXI Protocol
– Features (A1.1)
– Revisions (A1.2)
– AXI Architecture (A1.3)
Signal Descriptions
– Global Signals (A2.1)
– Write Address Channel Signals (A2.2)
– Write Data Channel Signals (A2.3)
– Write Response Channel Signals (A2.4)
– Read Address Channel Signals (A2.5)
– Read Data Channel Signals (A2.6)
– Low Power Interface Signals (A2.7)
Signal Interface Requirements
– Clock and Reset (A3.1)
– Basic read and write transactions (A3.2)
– Relationship between channels (A3.3)
– Transaction Structure (A3.4)
Transaction Attributes
– Transaction Attributes (A4.1)
– AXI3 memory attribute signaling (A4.2)
– AXI4 changes to memory attribute signaling (A4.3)
– Memory Types (A4.4)
– Mismatched memory attributes (A4.5)
– Transaction Buffering (A4.6)
– Access Permissions (A4.7)
– Legacy Considerations (A4.8)
Multiple Transactions
– AXI Transaction Identifiers (A5.1)
– Transaction ID (A5.2)
– Transaction Ordering (A5.3)
– Removal of Write Interleaving Support (A5.4)
AXI4 Ordering Model
– Definition of the ordering model (A6.1)
– Master Ordering (A6.2)
– Interconnect Ordering (A6.3)
– Slave Ordering (A6.4)
– Response before final destination (A6.5)
– Ordered write observation (A6.6)
Atomic Accesses
– Single Copy Atomicity Size (A7.1)
– Exclusive Accesses (A7.2)
– Locked Accesses (A7.3)
– Atomic Access Signaling (A7.4)
AXI4 Additional Signalling
– QoS Signaling (A8.1)
– Multiple Region Signaling (A8.2)
– User defined Signaling (A8.3)
Low Power Interface
– About Low Power interface (A9.1)
– Low Power Clock Control (A9.2)
Default signaling and Interoperability
– Interoperability principles (A10.1)
– Major interface categories (A10.2)
– Default Signal Values (A10.3)
Definition of AXI4-Lite (B1.1)
Interoperability (B1.2)
Defined Conversion Mechanism (B1.3)
Conversion, Protection and Detection (B1.4)
About ACE (C1)
Signal Descriptions (C2)
Channel Signaling (C3)
Coherency Transactions on Read and Write Address Channels (C4)
Snoop Transactions (C5)
Interconnect Requirements (C6)
Cache Maintenance (C7)
Barrier Transactions (C8)
Exclusive Accesses (C9)
Optional External Snoop Filtering (C10)
ACE-Lite (C11)
Distributed Virtual Memory (C12)
Interface Control (C13)
Master Design Recommendations (C14)
Part A
The AMBA AXI protocol is targeted at high-performance, high-frequency system designs
AXI key features
– Support for separate channels for:
• Read Address
• Read Data
• Write Address
• Write Data
• Write Response
– Support for unaligned data transfers using byte strobes
– Ability to issue multiple outstanding addresses
– Out-of-order (OO) transaction completion
– Support for data interleaving
– Advanced system cache support
• Specify if a transaction is cacheable/bufferable
• Specify attributes such as write-back/write-through
– Enhanced protection support
• Secure/non-secure transaction specification
– Exclusive access (for semaphore operations)
– Register slices can easily be added for timing closure
Read address channel and Write address channel
– Convey address and other control information
– Variable-length burst: 1 to 16 data transfers
• Exception: In AXI4, INCR bursts can have lengths of up to 256 transfers.
– Transfer size within a burst of 8 to 1024 bits (i.e. 1 byte to 128 bytes)
Read data channel
– Conveys data and any read response information
– Data bus can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide
– Read response is signaled per transfer
Write data channel
– Data bus can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide
Write response channel
– Write response information, signaled once for the entire burst
NOTES:
Each channel is independent and uses two-way (valid/ready) flow control.
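The burst length and transfer size ranges listed above can be worked through numerically. The following is a minimal sketch, assuming the usual field encodings (number of transfers = AxLEN + 1, bytes per transfer = 2 ** AxSIZE); the function name `burst_geometry` is hypothetical and not part of the slides.

```python
def burst_geometry(axlen, axsize):
    """Return (beats, bytes_per_beat, total_bytes) for one burst.

    Assumes the usual field encodings: beats = AxLEN + 1 and
    bytes per beat = 2 ** AxSIZE.
    """
    if not 0 <= axlen <= 255:
        raise ValueError("AxLEN is at most 8 bits wide (AXI4)")
    if not 0 <= axsize <= 7:
        raise ValueError("AxSIZE is a 3-bit field")
    beats = axlen + 1
    bytes_per_beat = 1 << axsize
    return beats, bytes_per_beat, beats * bytes_per_beat


# Longest AXI3 burst of 32-bit transfers: 16 beats of 4 bytes each.
axi3_max = burst_geometry(axlen=15, axsize=2)
# Longest AXI4 INCR burst at the widest transfer size (1024 bits).
axi4_max = burst_geometry(axlen=255, axsize=7)
```

The two calls correspond to the limits quoted above: 16 transfers for a general burst, 256 transfers for an AXI4 INCR burst, and 128 bytes for the widest transfer.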
AXI Read Channels
– Read address channel (master to slave): “Give me some data”
– Read data channel (slave to master): “Here you go”
– Independent channels, synchronized with ID # or “tags”
AXI Write Channels
– Write address channel (master to slave): “I’m sending data. Please store it.”
– Write data channel (master to slave): “Here is the data.”
– Write response channel (slave to master): “I received that data correctly.”
– Independent channels, synchronized with ID # or “tags”
AXI Flow-Control
• AXI uses a valid/ready handshake
• Each channel has its own valid/ready pair
• Information moves only when:
– The source has Valid information, and
– The destination is Ready
• On each channel, the master or slave can limit the flow
• Flexible signaling functionality
– Inserting wait states
– Always Ready
– Same Cycle Acknowledge
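The valid/ready rule above can be illustrated with a small cycle-by-cycle model. This is only a sketch: `handshake_cycles` is a hypothetical helper, and the lists stand for the levels of VALID and READY sampled at successive rising clock edges.

```python
def handshake_cycles(valid, ready):
    """Return the clock-cycle indices at which a transfer completes.

    valid[i] and ready[i] are the sampled levels of VALID and READY at
    rising clock edge i. A transfer happens only when both are high.
    """
    return [i for i, (v, r) in enumerate(zip(valid, ready)) if v and r]


# Destination inserts two wait states: VALID is held until READY arrives.
wait_states = handshake_cycles(valid=[1, 1, 1, 0], ready=[0, 0, 1, 0])

# Destination is always ready: the transfer completes in the first cycle
# in which VALID is asserted.
always_ready = handshake_cycles(valid=[0, 1, 0, 0], ready=[1, 1, 1, 1])
```

In the first case the transfer completes at cycle 2, in the second at cycle 1. Either side can therefore throttle a channel simply by holding its own signal low.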
AXI Read
Read Address Channel
Read Burst Operation
[Timing diagram: read request is initiated; read request is accepted; slave is ready; 1st data is transferred; the last data is transferred]
Separation of address and data channel
– Master provides only the start address of burst
– Slave needs to generate the remaining addresses based on the burst type (FIXED, INCR, WRAP)
One Address for Burst
[Timing diagram: ADDRESS carries A11, A21, A31; DATA carries D11 to D14, D21 to D23, D31]
Overlapping Read Bursts
Read request A
is accepted Read request B
is accepted via AR channel
while data A(0) is
AMBA AXI Write
– Write Address Channel
– Write Data Channel
– Write Response Channel
Write Burst Operation
[Timing diagram: write request A]
To prevent a deadlock situation, you must observe the dependencies that exist between the handshake signals
In any transaction:
– The VALID signal of one AXI component must not be dependent on the READY signal of the other component in the transaction
– The READY signal can wait for assertion of the VALID signal
Dependencies between Channel Handshake Signals (AXI3)
The AXI3 protocol requires that the write response for all transactions must not be given until the clock cycle after the acceptance of the last data transfer
In addition, the AXI4 protocol requires that the write response for all transactions must not be given until the clock cycle after address acceptance
Dependencies between Channel Handshake Signals (AXI4)
AXI gives an ID tag to every transaction
Use of IDs
[Figure: ID tags on the Write Address, Write Data, Write Response, Read Address, and Read Data channels]
Real implementation
– Transaction ID = <master ID, channel ID>
– Channel ID = original AXI transaction ID
– Master ID is needed to identify the initiating master among all the masters
Transaction ID Implementation
[Figure: masters such as CPU, video decoder, 3D graphics, LCD control, video mixer, and DMA issue ID-tagged transactions (IDs 0, 2, 3, 4) through the interconnect to the memory controller]
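The <master ID, channel ID> composition can be sketched as simple bit concatenation. This is an illustrative assumption about how an interconnect might do it: the helper names `extend_id` and `split_id` and the 4-bit ID width are hypothetical.

```python
def extend_id(master_index, axi_id, id_width):
    """Prepend the master index to the master's own AXI transaction ID."""
    if not 0 <= axi_id < (1 << id_width):
        raise ValueError("axi_id does not fit in id_width bits")
    return (master_index << id_width) | axi_id


def split_id(transaction_id, id_width):
    """Recover (master_index, axi_id) so a response can be routed back."""
    return transaction_id >> id_width, transaction_id & ((1 << id_width) - 1)


# Master 2 issues a transaction with its local ID 3 (4-bit IDs assumed).
wide_id = extend_id(master_index=2, axi_id=3, id_width=4)
# The interconnect later uses the upper bits to find the originating master.
origin = split_id(wide_id, id_width=4)
```

Two masters can then both use local ID 3 without clashing at the slave, because the upper bits differ.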
Multiple Outstanding Addresses:
– By using IDs, a master can issue transactions without waiting for earlier transactions to complete.
Write Data Interleaving in AXI3 Slaves:
– With Write Data Interleaving, an AXI3 slave can accept interleaved write-data with different AWID values.
– This feature is not supported in AXI4
• All Write Data for a transaction must be provided in consecutive transfers on the write data channel.
• WID signal is not supported in AXI4
Out of Order completion
– Transactions with the same ID are completed in order
– Transactions with different IDs can be completed out of order
– Fast-responding slaves respond in advance of earlier transactions with slower slaves
– This is not a required feature. Simple masters and slaves can process one transaction at a time in the order they are issued
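The two completion rules above (in order within an ID, free across IDs) can be expressed as a small checker. This is a sketch only; the `(txn_id, tag)` tuples and both function names are hypothetical.

```python
from collections import defaultdict


def per_id(sequence):
    """Group transaction tags by ID, preserving their order."""
    groups = defaultdict(list)
    for txn_id, tag in sequence:
        groups[txn_id].append(tag)
    return dict(groups)


def completion_order_is_legal(issued, completed):
    """True if transactions that share an ID complete in issue order."""
    return per_id(issued) == per_id(completed)


issued = [(0, "A"), (1, "B"), (0, "C")]
# B (ID 1) overtakes A and C (ID 0): allowed, the IDs differ.
legal = completion_order_is_legal(issued, [(1, "B"), (0, "A"), (0, "C")])
# C overtakes A although both use ID 0: not allowed.
illegal = completion_order_is_legal(issued, [(0, "C"), (1, "B"), (0, "A")])
```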
Ordering by transaction ID
– Slave can handle data transfers with different transaction IDs out of order
– The order within a single burst is maintained
Out-of-Order Transaction
[Timing diagram: ADDRESS carries A11, A21, A31; RDATA carries D11 to D14, D21 to D23, D31]
Write request and data
– The write data can appear at an interface before the write address that relates to it
Two relationships that must be maintained are:
– Read data must always follow the address to which the data relates
– A write response must always follow the last write transfer in the write transaction to which the write response relates
There are no ordering restrictions between read and write transactions with the same AWID and ARID. If a master requires an ordering restriction, then it must enforce the ordering itself.
The data for a sequence of read transactions with the same ARID must be returned in order:
– When reads with the same ARID are from the same slave, the slave must ensure that the read data is returned in the same order that the addresses are received
– When reads with the same ARID are from different slaves, the interconnect must ensure that the read data is returned in the same order that the master issued the addresses
Ordering Rules #3: Multiple reads with same ARID
[Figure: a master IP connected directly to a slave IP, and a master IP connected through the interconnect to slave IP 1 and slave IP 2]
Interleaving rule
– Write data with different IDs can be interleaved.
– This is supported only in AXI3
– The order within a single burst is maintained
Ordering Rules #4: Write Interleaving
[Timing diagram: ADDRESS carries A11, A21, A31; WDATA carries D11 to D14, D21 to D23, D31]
No support for write interleaving in AXI4
The master must issue write data in the same order as the write addresses
Removal of WID in AXI4, why?
– Write data with different AWIDs follows the address order and there is no write interleaving, so there is no need for WID
– Responses to multiple writes with different IDs can be out of order with respect to the address order, so BID remains
OKAY
◦ Normal access success / exclusive access failure / exclusive access to a slave that does not support it
EXOKAY
◦ Exclusive access success
SLVERR
◦ Slave generates an error response: unsupported transfer size, write access to a read-only location, timeout condition in the slave, access to an address where no register is present, or access to a disabled or powered-down function
DECERR
◦ The interconnect cannot decode the address to any slave, so a default slave gives the DECERR response
(Note: For a write transaction, there is just one response given for the entire burst and not for each data transfer within the burst. In a read transaction, the slave can give different responses for different transfers within a burst.)
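The four responses map onto the 2-bit RRESP/BRESP field. The sketch below assumes the standard encodings (OKAY = b00, EXOKAY = b01, SLVERR = b10, DECERR = b11), which the slides do not list; the `Resp` class and `read_burst_ok` helper are hypothetical names.

```python
from enum import IntEnum


class Resp(IntEnum):
    """Assumed 2-bit RRESP/BRESP encodings."""
    OKAY = 0b00
    EXOKAY = 0b01
    SLVERR = 0b10
    DECERR = 0b11


def read_burst_ok(rresp_per_beat):
    """A read burst carries one response per transfer; none may be an error."""
    return all(Resp(r) in (Resp.OKAY, Resp.EXOKAY) for r in rresp_per_beat)


clean_burst = read_burst_ok([0b00, 0b00, 0b00, 0b00])
# The second transfer of this burst reports SLVERR.
faulty_burst = read_burst_ok([0b00, 0b10, 0b00, 0b00])
```

A write burst, by contrast, would be checked against a single BRESP value, since only one response is returned for the whole burst.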
AXI enables the insertion of a register slice in any channel, at the cost of an additional cycle of latency
– Trade-off between latency and maximum frequency
A register slice can be used on any channel independently
A register slice incurs one cycle of latency per insertion
Register Slice for Timing Isolation
[Figure: register slices between the AXI master and the AXI slave on each channel: write address/control (AWREADY), write data (WREADY), write response (BREADY), read address/control (ARREADY), read data]
A single clock signal, ACLK.
◦ All input signals are sampled on the rising edge of ACLK. All output signal changes must occur after the rising edge of ACLK.
◦ There must be no combinatorial paths between input and output signals on both master and slave interfaces.
A single active-low reset, ARESETn.
Transaction burst type determines address bus behavior
– Fixed, increment, or wrap
Unaligned Access
◦ Master uses the lower address bits; the byte lane strobes must be consistent with the lower address bits
Optional address lock signals facilitate exclusive and atomic access protection
System cache support
Protection unit support
In a Narrow Transfer, the address and control (WSTRB) determine which byte lanes the transfer uses.
WSTRB[n:0] signals, when high, specify which byte lanes are used
Example 1: A narrow transfer with 8-bit transfers
– Burst has 5 transfers
– Starting address is 0
– Transfer size is 8 bits
– Data bus width is 32 bits
– Burst type is INCR
Example 2: A narrow transfer with 32-bit transfers
– Burst has 3 transfers
– Starting address is 4
– Transfer size is 32 bits
– Data bus width is 64 bits
– Burst type is INCR
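The two narrow-transfer examples can be worked through with a short calculation of which byte lanes each transfer occupies. This is a minimal sketch for INCR bursts only; `active_byte_lanes` is a hypothetical helper, with sizes given in bytes.

```python
def active_byte_lanes(start_addr, size_bytes, bus_bytes, beats):
    """Return, for each transfer of an INCR burst, the byte lanes it uses."""
    lanes = []
    addr = start_addr
    for _ in range(beats):
        first = addr % bus_bytes
        # A transfer ends at the last byte of its size-aligned container.
        last = ((addr // size_bytes) * size_bytes + size_bytes - 1) % bus_bytes
        lanes.append(list(range(first, last + 1)))
        # Later transfers start at the next size-aligned address.
        addr = (addr // size_bytes + 1) * size_bytes
    return lanes


# Example 1: 5 transfers, start address 0, 8-bit transfers, 32-bit bus.
example_1 = active_byte_lanes(start_addr=0, size_bytes=1, bus_bytes=4, beats=5)
# Example 2: 3 transfers, start address 4, 32-bit transfers, 64-bit bus.
example_2 = active_byte_lanes(start_addr=4, size_bytes=4, bus_bytes=8, beats=3)
```

Example 1 walks through lanes 0, 1, 2, 3 and then returns to lane 0. Example 2 alternates between the upper half (lanes 4 to 7) and the lower half (lanes 0 to 3) of the 64-bit bus.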
AXI Supports Unaligned Transfers
– An unaligned transfer is a transfer in which the first byte accessed is not aligned with the natural address boundary
– e.g. a 32-bit transfer that starts at address 0x1002 is not aligned to the natural 32-bit address boundary
Master can:
– Use the low-order address lines to signal an unaligned start address, OR
– Provide an aligned address and use the byte-lane strobes to signal the unaligned start address
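For the 0x1002 example above, the byte-lane strobes of the first transfer can be derived from the low-order address bits. The sketch below is an illustration under that assumption; `first_beat_wstrb` is a hypothetical name and the sizes are in bytes.

```python
def first_beat_wstrb(addr, size_bytes, bus_bytes):
    """Return the WSTRB mask for the first transfer of a write starting at addr."""
    first_lane = addr % bus_bytes
    # The first transfer stops at the end of its size-aligned container.
    last_lane = ((addr // size_bytes) * size_bytes + size_bytes - 1) % bus_bytes
    mask = 0
    for lane in range(first_lane, last_lane + 1):
        mask |= 1 << lane
    return mask


# 32-bit transfer at 0x1002 on a 32-bit bus: only byte lanes 2 and 3 are valid.
unaligned = first_beat_wstrb(addr=0x1002, size_bytes=4, bus_bytes=4)
# The same transfer starting at the aligned address 0x1000 uses all four lanes.
aligned = first_beat_wstrb(addr=0x1000, size_bytes=4, bus_bytes=4)
```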
Wrapping burst case
– The wrap boundary is aligned to the total size of the data to be transferred
• That is, to ((size of each transfer in burst) x (number of transfers in burst))
– After each transfer, the address increments in the same way as for an INCR burst
– If the incremented address is ((wrap boundary) + (total size of data to be transferred)), then the address wraps around to the wrap boundary
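The wrap rule above, together with the FIXED and INCR burst types, can be sketched as an address generator of the kind a slave needs. This is a simplified model: `burst_addresses` is a hypothetical name, and the WRAP restrictions it enforces (size-aligned start address, burst length of 2, 4, 8, or 16) are taken from the AXI specification rather than from the slides.

```python
def burst_addresses(start, size_bytes, beats, burst_type):
    """Return the address of every transfer in a burst."""
    if burst_type == "FIXED":
        return [start] * beats
    aligned = (start // size_bytes) * size_bytes
    if burst_type == "INCR":
        # Only the first transfer may be unaligned.
        return [start] + [aligned + i * size_bytes for i in range(1, beats)]
    if burst_type == "WRAP":
        if start != aligned or beats not in (2, 4, 8, 16):
            raise ValueError("WRAP needs an aligned start and 2, 4, 8 or 16 beats")
        total = size_bytes * beats
        boundary = (start // total) * total
        return [boundary + (start - boundary + i * size_bytes) % total
                for i in range(beats)]
    raise ValueError("unknown burst type")


# 4 transfers of 4 bytes starting at 0x34: the wrap boundary is 0x30.
wrap = burst_addresses(start=0x34, size_bytes=4, beats=4, burst_type="WRAP")
```

In this example the total size is 16 bytes, so the address sequence is 0x34, 0x38, 0x3C, and then wraps back to 0x30 instead of reaching 0x40.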
In AXI4, ARLEN[7:0] / AWLEN[7:0] allow INCR bursts of up to 256 beats
Burst in AXI3 protocol:
– Early termination of bursts is not supported.
– A burst must not cross a 4 KB boundary. This ensures that a burst is only destined for a single slave.
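A master can check the 4 KB rule before issuing an INCR burst. The following is a minimal sketch; `crosses_4kb_boundary` is a hypothetical helper and sizes are in bytes.

```python
def crosses_4kb_boundary(start, size_bytes, beats):
    """True if an INCR burst would touch more than one 4 KB region."""
    aligned = (start // size_bytes) * size_bytes
    last_byte = aligned + size_bytes * beats - 1
    # Address bits [11:0] select a byte within a 4 KB region.
    return (start >> 12) != (last_byte >> 12)


# 4 transfers of 4 bytes from 0xFF0 end exactly at 0xFFF: allowed.
fits = crosses_4kb_boundary(start=0xFF0, size_bytes=4, beats=4)
# One more transfer would reach 0x1000 and must be split into two bursts.
spills = crosses_4kb_boundary(start=0xFF0, size_bytes=4, beats=5)
```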
AXI4 protocol longer burst support:
– Bursts longer than 16 beats are only supported for the INCR burst type. Both WRAP and FIXED burst types remain constrained to a maximum burst length of 16 beats.
– Exclusive accesses are not permitted to use a burst length greater than 16.
AxCACHE[3:0] signals define the transaction attributes of a transfer
Transaction attributes control:
– How a transaction progresses through the system
– How any system-level cache handles the transaction
System Cache Support
ARCACHE[3:0] / AWCACHE[3:0]
[AXI3]
[Figure: CPU with register file (RF), L1 data cache, L1 instruction cache, unified L2 cache, and memory]
Bufferable bit (B): AxCACHE[0]
– The interconnect or any component can delay the transaction for any number of cycles. This is usually only relevant to writes.
– The transaction response may not come from the final destination, but from an intermediate point such as a cache. The cache is then responsible for updating the memory.
Cacheable bit (C): AxCACHE[1]
– The characteristics of the transaction at the final destination do not have to match the characteristics of the original transaction.
– For writes this means that a number of different writes can be merged together.
– For reads this means that a location can be pre-fetched or can be fetched just once for multiple read transactions.
– To determine if a transaction should be cached, this bit should be used in conjunction with the Read Allocate (RA) and Write Allocate (WA) bits.
Read Allocate bit (RA): AxCACHE[2]
– If ‘1’, the transaction should be looked up in the cache
– In case of a read miss, it is recommended to allocate an entry in the cache
– If C is low, RA must be low
Write Allocate bit (WA): AxCACHE[3]
– If ‘1’, the transaction should be looked up in the cache
– In case of a write miss, it is recommended to allocate an entry in the cache
– If C is low, WA must be low
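The AXI3 bit assignments above can be decoded mechanically. The sketch below follows the bit positions given in the slides and enforces the rule that RA and WA must be low when C is low; `decode_axcache_axi3` is a hypothetical name.

```python
def decode_axcache_axi3(axcache):
    """Split a 4-bit AXI3 AxCACHE value into its B, C, RA and WA attributes."""
    bits = {
        "bufferable": bool(axcache & 0b0001),      # AxCACHE[0]
        "cacheable": bool(axcache & 0b0010),       # AxCACHE[1]
        "read_allocate": bool(axcache & 0b0100),   # AxCACHE[2]
        "write_allocate": bool(axcache & 0b1000),  # AxCACHE[3]
    }
    if not bits["cacheable"] and (bits["read_allocate"] or bits["write_allocate"]):
        raise ValueError("RA and WA must be low when C is low")
    return bits


# b0011: cacheable and bufferable, with no allocation hint.
attrs = decode_axcache_axi3(0b0011)
```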
System Cache Support
ARCACHE[3:0] / AWCACHE[3:0]
The AxCACHE[1] bits are renamed:
– From ‘Cacheable’ to ‘Modifiable’, to better describe the required functionality
– Actual functionality is unchanged
Ordering requirements are defined for Non-Modifiable transactions
– Ordering between transactions should be maintained if the transactions satisfy all of the following conditions:
• Transactions are Non-Modifiable
• Transactions use the same ID
• Transactions target the same slave device
Meanings of RA and WA bits are updated:
– One bit indicates if this transaction should be allocated in the cache
– The other bit indicates if an allocation could have been made due to another transaction
For Read Transactions:
– RA bit means the same: The location could have been previously allocated in the cache. It is recommended that this transaction is allocated in the cache.
– WA bit is redefined: The location could have been previously allocated in the cache because of another transaction, either a write transaction or a transaction by another master
For Write Transactions:
– WA bit means the same: The location could have been previously allocated in the cache. It is recommended that this transaction is allocated in the cache.
– RA bit is redefined: The location could have been previously allocated in the cache because of another transaction, either a read transaction or a transaction by another master
This change means:
– For a same location, a read and a write transfer may have different values for
Normal or Privileged Mode: AxPROT[0]
◦ Indicates whether an access was made by a master in privileged mode or in unprivileged mode
◦ LOW indicates an access made by a master in unprivileged mode
◦ HIGH indicates an access made by a master in privileged mode
A privileged processing mode typically has a greater level of access within a system.
Secure or Non-secure: AxPROT[1]
◦ LOW indicates a Secure access
◦ HIGH indicates a Non-secure access
Used where a greater degree of differentiation between processing modes is required.
Instruction or data: AxPROT[2]
◦ LOW indicates a data access
◦ HIGH indicates an instruction access.
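The three protection bits can be decoded as follows. This is a small sketch using the bit positions and polarities listed above; `decode_axprot` is a hypothetical name.

```python
def decode_axprot(axprot):
    """Split a 3-bit AxPROT value into its three protection attributes."""
    return {
        "privileged": bool(axprot & 0b001),   # AxPROT[0]: HIGH = privileged
        "non_secure": bool(axprot & 0b010),   # AxPROT[1]: HIGH = non-secure
        "instruction": bool(axprot & 0b100),  # AxPROT[2]: HIGH = instruction
    }


# b011: a privileged, non-secure data access.
prot = decode_axprot(0b011)
```

Note that the secure bit is active low: b000 describes an unprivileged, secure data access.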
Protection Support
Normal access, AxLOCK[1:0]=b00
Exclusive access, b01
– Exclusive read
– Exclusive write
– If there is no intervening write to the address region, EXOKAY response. If not, OKAY response.
– Usually used for read-modify-write
Locked access, b10
– Start with b10, and end with b00
– During the period, only the lock-initiating master can access the address region
Atomic Access
Semaphore-type operation without requiring the bus to remain locked to a particular master for the duration of the operation
Usually used for read-modify-write kinds of operations
The slave must have additional logic to support exclusive access.
The basic process for an exclusive access is:
– A master performs an exclusive read from an address location.
– At some later time, the master attempts to complete the exclusive operation by performing an exclusive write to the same address location.
– The exclusive write access of the master is signaled as:
• Successful (EXOKAY) if no other master has written to that location between the read and write accesses.
• Failed (OKAY) if another master has written to that location between the read and write accesses. In this case the address location is not updated.
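The exclusive access process above can be modeled with a toy monitor inside the slave. This is a deliberately simplified sketch: the `ExclusiveMonitor` class, its one-reservation-per-master policy, and the dictionary used as memory are all assumptions made for illustration, not the monitor design required by the specification.

```python
class ExclusiveMonitor:
    """Toy slave-side monitor holding one reserved address per master."""

    def __init__(self):
        self.memory = {}
        self.reservations = {}  # master name -> reserved address

    def exclusive_read(self, master, addr):
        """Record a reservation and return (data, response)."""
        self.reservations[master] = addr
        return self.memory.get(addr, 0), "EXOKAY"

    def write(self, master, addr, data):
        """Normal write: updates memory and clears other masters' reservations."""
        self.memory[addr] = data
        stale = [m for m, a in self.reservations.items() if a == addr and m != master]
        for m in stale:
            del self.reservations[m]
        return "OKAY"

    def exclusive_write(self, master, addr, data):
        """Succeeds only if the master's reservation is still intact."""
        if self.reservations.get(master) != addr:
            return "OKAY"  # exclusive failure: memory is not updated
        self.write(master, addr, data)
        del self.reservations[master]
        return "EXOKAY"


slave = ExclusiveMonitor()
slave.memory[0x100] = 1
slave.exclusive_read("Master 1", 0x100)
slave.write("Master 2", 0x100, 7)                      # intervening write
failed = slave.exclusive_write("Master 1", 0x100, 2)   # reservation was lost
```

After the failed attempt, Master 1 would typically repeat the whole exclusive read and exclusive write sequence.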
Exclusive Access
[Timeline: Master 1 issues an exclusive read (E.RD) of 0x100; Master 2 writes (WR) 0x100; Master 1 then issues an exclusive write (E.WR) to 0x100 and Slave 1 responds OKAY]
The interconnect must ensure that only that master is allowed access to the slave region until an unlocked transfer from the same master completes
The master should have no other outstanding transactions waiting to complete before issuing a locked sequence
The final transaction effectively removes the lock
Locked Access
[AXI3]
[Timeline: accesses to 0x100 by Master 1 (Lock), Master 2, and Master 1 (Unlock)]
In AXI4, locked accesses are not supported
All locked accesses from AXI3 masters need to be converted to normal accesses
Quality of Service Signaling (AxQOS[3:0])
– 4-bit AxQOS signals are sent on the address channel for each transaction
– The protocol does not specify the exact use of the QoS identifiers
– Recommendation: can be used as a priority indicator for the transaction
– Default value of b0000 indicates no participation in any QoS scheme
Region Identifier Signals (AxREGION[3:0]):
– 4-bit signals can uniquely identify up to 16 different regions
– The region identifier provides a decode of the higher-order address bits
– Using region identifiers, a single physical interface on a slave can provide multiple (up to 16) logical interfaces, each with a different location in the system address map
– The interconnect should produce the AxREGION signals when performing the address decode function for a single slave that has multiple logical interfaces
User Signals on each AXI Channel for ‘User Defined’ Signaling
– (AWUSER, WUSER, BUSER, ARUSER, RUSER)
– The specification recommends not using them, to avoid interoperability issues
Optional extension to the AXI protocol
– Uses 3 level signals for the handshake between the system clock controller and the peripheral
Signals:
– CACTIVE: (driven by peripheral)
• High => Peripheral requires a clock signal. The clock controller must enable the clock immediately.
• Low => Peripheral does not require the clock
– CSYSREQ: (driven by clock controller)
• Low => Request for the peripheral to enter a low-power state
• High => Request for the peripheral to exit a low-power state
– CSYSACK: (driven by peripheral)
• Low => Low-power entry request acknowledged by the peripheral
• High => Low-power exit request acknowledged by the peripheral
The peripheral can accept or deny the request from the system clock controller to enter a low-power state.
The level of the CACTIVE signal when the peripheral acknowledges the request by driving CSYSACK low indicates the acceptance or denial of the request.
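The accept/deny decision can be written as a small classification of the three signal levels. The sketch assumes that CACTIVE low at the moment of acknowledgement means acceptance (the peripheral no longer needs the clock) and CACTIVE high means denial; `low_power_request_outcome` is a hypothetical name.

```python
def low_power_request_outcome(csysreq, csysack, cactive):
    """Classify the handshake state from the clock controller's point of view."""
    if csysreq != 0:
        return "no low-power request"
    if csysack != 0:
        return "pending"  # request issued, peripheral has not acknowledged yet
    # Acknowledged: the level of CACTIVE decides the outcome.
    return "accepted" if cactive == 0 else "denied"


# Peripheral acknowledges while still requiring its clock: request denied.
outcome = low_power_request_outcome(csysreq=0, csysack=0, cactive=1)
```

Only in the "accepted" case may the clock controller remove the peripheral's clock.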
Low Power Interface (C channel)
Acceptance of low-power request
Additional QoS signaling: AxQOS[3:0]
Additional 4-bit interface signals, AxREGION[3:0]
◦ Allow 16 different regions to be uniquely identified
◦ The region identifier should be constant within a 4 KB address space
Added USER signals
– AxUSER, RUSER, WUSER, BUSER
Removes support for locked transfers, so the AxLOCK signal is a single bit (Normal/Exclusive)
Removal of write interleaving support
– Removes the WID signal
Write response requirements are updated:
◦ AXI3: clock cycle after last data transfer
◦ AXI4: additionally, clock cycle after address acceptance
Support for burst lengths of up to 256 beats (for INCR bursts)
AWCACHE and ARCACHE signaling is updated
Part B
AXI4-Lite is a subset of the AXI4 protocol intended for communication with simpler, smaller control-register-style interfaces in components.
AXI4-Lite is a simpler AXI4 for on-chip devices requiring a more powerful interface than APB.
Features:
– All transactions have a burst length of 1
– All data accesses are the same size as the width of the data bus
– Support for a data bus width of 32 bits or 64 bits
– All accesses are equivalent to AWCACHE or ARCACHE equal to b0000 (i.e. non-modifiable and non-bufferable)
– Exclusive accesses are not supported
– AXI IDs are not supported: all transactions must be in order, so the signal list is reduced
Subset of AXI signal set
Simple traditional signaling
Targeted applications: simple, low-performance peripherals
– GPIO
– UART
Signals not supported in AXI4-Lite:
– AWLEN, ARLEN
– AWSIZE, ARSIZE
– AWBURST, ARBURST
– AWLOCK, ARLOCK
– AWCACHE, ARCACHE
– WLAST, RLAST
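The AXI4-Lite restrictions listed above can be collected into a simple compatibility check, for example when deciding whether a full AXI4 transaction can be passed to an AXI4-Lite slave unchanged. This is a sketch only; `axi4lite_violations` and its parameters are hypothetical.

```python
def axi4lite_violations(beats, size_bytes, bus_bits, axcache=0b0000, exclusive=False):
    """Return the list of AXI4-Lite restrictions a transaction would break."""
    problems = []
    if bus_bits not in (32, 64):
        problems.append("data bus must be 32 or 64 bits wide")
    if beats != 1:
        problems.append("burst length must be 1")
    if size_bytes * 8 != bus_bits:
        problems.append("access size must equal the data bus width")
    if axcache != 0b0000:
        problems.append("accesses are treated as AxCACHE = b0000")
    if exclusive:
        problems.append("exclusive accesses are not supported")
    return problems


# A single full-width access on a 32-bit bus satisfies every restriction.
register_access = axi4lite_violations(beats=1, size_bytes=4, bus_bits=32)
# A 4-beat exclusive burst breaks two of them.
burst_access = axi4lite_violations(beats=4, size_bytes=4, bus_bits=32, exclusive=True)
```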
Part C
Two problems for systems that contain caches:
1) Memory may be updated (by another master) after a cached master has taken a copy
• The cache no longer contains up-to-date data
2) In systems that contain write-back caches, the master may write to its local cached copy
• The memory no longer contains up-to-date data.
• A second master reading from memory will see stale data.
Coherency Problem
[Figure: Master1, Master2, and Master3, each with a cache, connected through the interconnect to main memory]
Snooping Cache Coherency Protocols
– Transactions to a shared-region are ‘broadcast’ to all masters
– All masters ‘listen-in’ to all shared data-transactions originating from other masters
– When a master detects a read transaction for which it has the most up-to-date data, it provides the data to the master requesting it; in the case of a write, it invalidates its own copy.
Directory based Cache Coherency Protocols
– A single ‘directory’ is maintained, which contains a list of where every cached line within the system is held.
– A master initiating a transaction first consults the directory to find where the data is cached and then directs cache coherency traffic to only those masters containing cached copies.
ACE is an extension to AXI
Aims at providing hardware-based cache coherency
Adds 3 new snoop channels
Adds additional signals to the existing AXI channels
Also adds barrier support to enforce ordering of multiple outstanding transactions
ACE Protocol is realized using:
– A 5-state cache model to define the state of a cache line in the coherent system
– Additional snoop channels that enable communication with a cached master when another master is accessing a shared address location
• Read Channels: (AR, R)
• Write Channels: (AW, W, B)
• Snoop Channels: (AC, CR, CD)
– Additional signaling on the existing AXI4 channels that enables new transactions and information to be conveyed
ACE Supported Policies
– 100% Snoop
– Directory Based
– Anything in-between (Snoop Filter)
ACE adds the following to AXI
– Support for hardware coherency
– Support for cache maintenance
Terms used to describe the state of a cache line are:
Unique
– The cache line resides ONLY in this cache
Shared
– The cache line MAY be in another cache
Clean
– The cache controller does not have to update the main memory
Dirty
– The cache controller is responsible for updating the main memory
Invalid
– The cache line is not being used for caching data
Devices are not required to support all 5 states internally
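The terms above combine into the five states of the ACE cache model. The sketch below assumes the usual state names (Invalid, UniqueClean, UniqueDirty, SharedClean, SharedDirty), which the slides do not spell out; the `LineState` class and helper are hypothetical.

```python
from enum import Enum


class LineState(Enum):
    """Assumed five-state model; each value is (valid, unique, dirty)."""
    INVALID = (False, False, False)
    UNIQUE_CLEAN = (True, True, False)
    UNIQUE_DIRTY = (True, True, True)
    SHARED_CLEAN = (True, False, False)
    SHARED_DIRTY = (True, False, True)

    def __init__(self, valid, unique, dirty):
        self.valid = valid
        self.unique = unique
        self.dirty = dirty


def must_update_memory_on_eviction(state):
    """A dirty line makes this cache responsible for updating main memory."""
    return state.dirty


needs_write_back = must_update_memory_on_eviction(LineState.UNIQUE_DIRTY)
```

A cache that does not implement every state internally can still take part, provided its externally visible behavior stays consistent with this model.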
Non-Shareable
– The domain contains a single master
Inner Shareable
– The domain can include additional masters
Outer Shareable
– The domain contains at least all masters in the inner domain – Can include additional masters
System
– This domain includes all masters in the system
For Coherency Transactions:
– A master uses a shareability domain to determine which other masters might have a copy of the addressed location in their local cache
– The interconnect uses this information to determine, for any given transaction, which other masters must be snooped
For Barrier Transactions:
– The domain of a barrier transaction can be used to determine how far a barrier transaction must propagate
Snoop Channels enable communication with a cached master when another master is accessing a shared address location
AC Channel (Coherent Address Channel): Input to Master
– ACADDR used for sending the address of snoop request to a cached master, accompanied with control signals
CR Channel (Coherent Response Channel): Output from Master
– CRRESP is used by the master to signal the responses to snoop to the interconnect
– A narrow, 5-bit response indicating whether an associated data transfer is expected on the CD channel
CD Channel (Coherent Data Channel): Output from Master
– CDDATA, used by the master to provide the data in response to a snoop.
– Optional for write-through caches
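The 5-bit snoop response can be decoded bit by bit. The sketch below assumes the CRRESP bit assignments from the ACE specification (bit 0 DataTransfer, bit 1 Error, bit 2 PassDirty, bit 3 IsShared, bit 4 WasUnique), which the slides do not list; `decode_crresp` is a hypothetical name.

```python
def decode_crresp(crresp):
    """Split a 5-bit CRRESP value into named flags (assumed bit order)."""
    names = ["data_transfer", "error", "pass_dirty", "is_shared", "was_unique"]
    return {name: bool((crresp >> bit) & 1) for bit, name in enumerate(names)}


# b00101: the snooped master will supply data on the CD channel and passes
# responsibility for the dirty line on with it.
response = decode_crresp(0b00101)
expects_cd_data = response["data_transfer"]
```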
ACE adds additional signals to existing AXI channels:
– Read Address Channel and Write Address Channel
• ARSNOOP[3:0] / AWSNOOP[2:0]
- Indicate the type of snoop transaction for shareable transactions
• ARBAR[1:0] / AWBAR[1:0]
- Are used for barrier signaling
• ARDOMAIN[1:0] / AWDOMAIN[1:0]
- Indicate which masters should be snooped for snoop transactions and which masters must be considered for the ordering of barrier transactions
– Read Data Channel and Write Data Channel
• RRESP[3:2]
- Additional response bits for shared read transactions, indirectly driven by the CRRESP outputs from a snooped master
• RACK / WACK
In ACE, snoop requests must be responded to in order (as the snoop channels do not have ID signals)
The system interconnect is responsible for coordinating the progress of all shared (coherent) transactions:
– e.g. the interconnect may present snoop addresses to all masters in parallel, OR it may present snoop addresses one at a time, serially
– Access to system memory can be issued upon a snoop miss, or speculatively before all snoop responses have arrived
– One example of such a coherent interconnect is the CCI-400 interconnect developed by ARM
Interconnect:
– CCI (Cache Coherent Interconnect)
ACE Masters
– Masters with Caches
ACE-Lite Masters
– Components without caches that can snoop other caches
Slaves
– Components not initiating snoop transactions
Different kinds of Components
Example Cortex-A15 Coherent System with CCI-400 Interconnect
ACE introduces a large number of new transactions to AMBA 4.
Non-Shared Transactions
– These are the existing AXI read and write transactions
– Used for non-coherent, non-snooped transactions
Non-Cached Transactions
– ReadOnce
– WriteUnique
– WriteLineUnique
Cache Maintenance Transactions
– CleanShared
– CleanInvalid
– MakeInvalid
Shareable Read Transactions
– ReadShared
– ReadNotSharedDirty
Shareable Write Transactions
– MakeUnique
– ReadUnique
– CleanUnique
Write-back Transactions
– WriteBack
– WriteClean
– Evict
Transaction Groups
The initiating master component issues a transaction
Depending on whether coherency support is required, either:
– The transaction is passed directly to a slave component, or
– The transaction is passed to the coherency support logic within the interconnect
The interconnect initiates the snoop transactions that are required
Each cached master that receives a snoop transaction provides a snoop response.
The interconnect determines whether a main memory access is required
The interconnect collates snoop responses and any required data
The initiating master completes the transaction
The master component issues a read transaction on the Read Address channel
The interconnect determines whether any other cache holds a copy of the location, by snooping:
– i.e. it passes the shareable address to other caching masters that can hold a copy, on the Snoop Address Channel
– If any snooped master holds the requested cache line, then it:
• Responds on the snoop response channel
• Provides the snoop data to the interconnect on the snoop data channel
– If no snooped master component holds the requested cache line:
• The interconnect initiates a transaction to main memory
• The read data is supplied back to the initiating master on the AXI Read Data channel, as for standard transactions
– The master component indicates that the transaction has completed, using the RACK signal
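The read flow above can be mimicked with a toy model of the interconnect's decision. This is a heavily simplified sketch: caches are plain dictionaries, snoops are performed serially, and `coherent_read` and the master names are hypothetical.

```python
def coherent_read(addr, initiator, caches, memory):
    """Return (data, source) for a shareable read issued by `initiator`.

    caches maps a master name to that master's cached lines (addr -> data).
    """
    for master, lines in caches.items():
        if master == initiator:
            continue  # the initiating master is not snooped
        if addr in lines:
            # Snoop hit: the data comes back on the snoop data channel.
            return lines[addr], master
    # Snoop miss everywhere: the interconnect reads main memory instead.
    return memory[addr], "memory"


caches = {"Master1": {}, "Master2": {0x80: 0xBEEF}, "Master3": {}}
memory = {0x80: 0x0000, 0xC0: 0x1234}

hit = coherent_read(0x80, "Master1", caches, memory)   # served by Master2's cache
miss = coherent_read(0xC0, "Master1", caches, memory)  # served by main memory
```

In the hit case the initiator receives the up-to-date value from Master2's cache rather than the stale value still held in memory.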
The initiating master component requests a unique copy of the cache line by issuing a ‘MakeUnique’ transaction on the AXI Read Address Channel
– The interconnect passes the transaction to other caches on the Snoop Address Channel
– Snooped masters respond on the Snoop Response Channel to indicate that the cache line has been removed from their local caches
– A response is provided to the initiating master using the AXI Read Data channel (no data transfer occurs)
– MakeUnique removes copies of the cache line from other caching masters
The initiating master performs the store using standard AXI write channels
The initiating master asserts the RACK signal to indicate that the transaction has completed
The ARM architecture supports 2 types of barrier instructions:
– DMB (Data Memory Barrier):
• The DMB transaction can flow on the pipelined interconnect, but no re-ordering is allowed around the DMB.
• Ensures that all memory transactions prior to the barrier are visible to other masters
• This prevents re-ordering around the DMB
• Everything before the DMB must be complete before anything after the DMB
• This was ensured by the ARM MPCore processor cluster
• In ACE, DMB barriers may define a subset of masters that must be able to observe the barrier:
- This is indicated on the AxDOMAIN signals. These can indicate: Inner, Outer, System, or Non-Shareable.
– DSB (Data Synchronization Barrier):
• DSB is used to stall the processor until previous transactions have completed
• Can be used, for example, to ensure that data written to a DMA command buffer in memory has reached its destination before kicking off the DMA via a peripheral register
• Is the most time-consuming barrier, since it stops the processor until transactions are complete
A master issues a barrier on both the Read Address channel and the Write Address channel simultaneously, using ARBAR and AWBAR signaling.
A barrier transaction has an address phase and a response phase, but no data transfer occurs.
Barriers enforce ordering because a master must not issue any read or write transaction until the master has received a response for the barrier on both the read and write channels
Full-ACE Master:
– Contains all ACE channels
– Can issue snoop requests and can be snooped by the interconnect
– e.g. ARM Cortex-A15 processor cluster
ACE-Lite Master:
– Does not include the AC, CR, and CD channels
– But has the additional coherency signals on the existing AXI channels
– Can issue snoop requests, but cannot itself be snooped
– e.g. a GPU or a coherent I/O device
ACE-Lite is a subset of ACE
Enables uncached masters to snoop ACE coherent masters
– e.g. an AXI master interface such as Gigabit Ethernet that shares data with the CPU can directly read/write cached data shared within the CPU.
ACE-Lite masters have additional signals on the AXI channels, but do not have the additional three ACE snoop channels.
ACE protocol does not guarantee Coherency!
AMBA® AXI™ and ACE™ Protocol Specification (Issue E, 22 February 2013)
– http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0022e/index.html