
Design of a Direct Memory Access Controller for a Cortex-M0 based System on Chip.


Academic year: 2020


ABSTRACT

SIDDALINGADEVARU, RESHMA BANGALORE. Design of a Direct Memory Access Controller for a Cortex-M0 based System on Chip. (Under the direction of Dr. W. Rhett Davis.)

This document presents the design of a Direct Memory Access Controller (DMAC) for a Cortex-M0 processor based System on Chip (SoC). The DMAC designed in this project can handle memory-to-memory data transfers of multiple lengths. A performance improvement of 37% was obtained when the DMAC was used to perform data transfers. A speedup of 1.8 per word transfer was observed when the DMAC performed the data transfer instead of the CPU. The design was synthesized with a 10 ns clock, and the DMAC's functionality was verified with the synthesized netlist. Logic synthesis was carried out at the Register Transfer Level in a 45 nm technology with a 10 ns clock, yielding a total synthesized area of 3173 µm² for the DMAC. Communication was established between the Processor, Memory Controller and the DMAC using AHB-Lite


© Copyright 2016 by Reshma Bangalore Siddalingadevaru


Design of a Direct Memory Access Controller for a Cortex-M0 based System on Chip

by

Reshma Bangalore Siddalingadevaru

A thesis submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Master of Science

Computer Engineering

Raleigh, North Carolina

2016

APPROVED BY:


DEDICATION


BIOGRAPHY

Reshma Bangalore Siddalingadevaru was born and brought up in Bangalore, India. She earned her Bachelor of Engineering degree in Electronics and Communication from RV College of Engineering in 2014. She joined the Master's program in Computer Engineering at North Carolina State University in Fall 2014. During the program, she discovered her true interests in Digital Design and Verification. She started working on her Master's thesis with


ACKNOWLEDGEMENTS

I would like to take this opportunity to express my gratitude towards everyone who helped

me throughout my project.

First of all, I would like to thank Dr. Rhett Davis for advising me through my thesis. His

guidance and support were invaluable in the completion of this project. I have been very

lucky to have taken two courses and a thesis under him as I have learnt a lot through the

process. I would also like to thank him for funding me in my last semester.

I would like to thank Dr. Paul Franzon for teaching ASIC Design and inspiring many people like me to pursue a career in Digital Design. I would also like to extend my sincere gratitude to Dr. Huiyang Zhou for teaching me Computer Architecture, which is one of the best courses I studied. I would also like to thank these professors for serving on my thesis committee. I would like to thank all my professors for making these two years highly rewarding for me.

I would like to thank my friends for their inputs and support throughout the project.

Last but not least, I would like to thank my family for having faith in me to complete


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1 INTRODUCTION
    1.1 Motivation
    1.2 Goal
    1.3 Thesis Outline
Chapter 2 SYSTEM OVERVIEW AND SPECIFICATION
    2.1 System Overview
    2.2 Master Interface
    2.3 Slave Interface
    2.4 System Specifications
Chapter 3 DMAC DESIGN PROCEDURE
    3.1 DMAC Unit Overview
    3.2 DMAC Unit Modules
        3.2.1 Arbiter
        3.2.2 DMA-Slave
        3.2.3 DMA-Master
    3.3 Critical Path
Chapter 4 STUB INTERFACE WITH THE DMAC
    4.1 Interface Procedure
    4.2 Limitation of the DRAM Controller
    4.3 Test Cases
        4.3.1 Only DMA transactions, no DRAM transactions
        4.3.2 DMA and DRAM transactions
    4.4 Synopsis
Chapter 5 CORTEX-M0 INTERFACE AND SYNTHESIS
Chapter 6 CONCLUSION
    6.1 Future Scope


LIST OF TABLES

Table 2.1 Description of Global AHB-Lite signals
Table 2.2 Description of AHB-Lite Master signals
Table 2.3 Description of AHB-Lite Slave signals
Table 3.1 Arbiter Interface
Table 3.2 DMA-Slave Interface
Table 3.3 DMA-Master Controller Interface


LIST OF FIGURES

Figure 2.1 System Overview
Figure 3.1 Detecting DMA transaction
Figure 3.2 Detecting DRAM transaction
Figure 3.3 DMA Slave Functionality
Figure 3.4 Block diagram of DMA-master
Figure 3.5 Control-path of the DMA-master
Figure 3.6 Source Address Generation
Figure 3.7 HADDR Generation
Figure 3.8 HWDATA Generation
Figure 3.9 HWRITE Generation
Figure 3.10 HTRANS Generation
Figure 4.1 Text file with only DMA transactions
Figure 4.2 Transaction dump when only DMAC transactions are issued by the AHBStub
Figure 4.3 DMA transfer starts after the DMAC registers are written to
Figure 4.4 DMA transfer mechanism
Figure 4.5 Text file with DRAM and DMA transactions
Figure 4.6 Transaction dump when both DRAM and DMAC transactions are issued
Figure 4.7 Execution of DRAM transactions before the DMA transfer
Figure 4.8 End of DMA transfer
Figure 4.9 Execution of DRAM transactions after the DMA transfer
Figure 5.1 Snippet of the C program when DMAC handles the transfer
Figure 5.2 Transaction dump of the first DMA transfer
Figure 5.3 Transaction dump of the second DMA transfer
Figure 5.4 Snippet of the C program when CPU handles the transfer

CHAPTER 1

INTRODUCTION

1.1 Motivation

Advances in the semiconductor industry have made it possible for an entire electronic system to fit on a single die. In this digital era of smartphones and tablets, it is necessary to build Systems-on-Chip (SoCs), which are powerful computers in a very compact form. Owing to their high level of integration, SoCs also reduce power consumption.

Due to the increased demand for SoC design and verification in industries, many leading


DMA Controller and this was the missing piece which I contributed through this project.

The DMAC is one of the most important components of an SoC. It bypasses the processor and routes data directly between external interfaces and memory, thereby increasing the data throughput of the SoC. The Central Processing Unit (CPU) initiates a DMA transfer, continues with other operations while the transfer is in progress, and receives a signal from the DMAC when the operation is done [5]. This feature is very useful when the CPU cannot keep up with the rate of data transfer, or when the CPU needs to perform useful work while waiting for a relatively slow I/O data transfer. The DMAC designed in this project can handle memory-to-memory data transfers.

1.2 Goal

The goal of this project was to design a Direct Memory Access Controller for a

Cortex-M0 Processor based SoC. The DMAC designed was capable of performing memory to

memory transfers. Various test cases were simulated to ensure successful DRAM and DMA operations. Performance analysis was done to demonstrate the efficiency of the DMAC.

The complete design was synthesized with no errors. Two AHB-Lite interconnects were

used to interface the DMAC and the DRAM controller to the processor.

1.3 Thesis Outline

The rest of this report is organized as follows. Chapter 2 gives an overview of the system and the AHB-Lite protocol. Chapter 3 describes the procedure used to design the DMAC unit. Chapter 4 discusses the interface of the AHBStub and DMAC to the DRAM Controller. Chapter 5 discusses the interface of the Cortex-M0 processor and DMAC to the DRAM Controller, and synthesis.

CHAPTER 2

SYSTEM OVERVIEW AND SPECIFICATION

2.1 System Overview

The AMBA Advanced High-performance Bus (AHB) Lite protocol was used to establish communication between the components of the SoC. It is a bus interface that supports a single master and enables high-bandwidth operation [1]. A single AHB-Lite bus was initially used, but the arbitration mechanism became very complicated, as the bus allows only a single master while both the CPU and the DMAC have to be the AHB master when initiating a transaction. Another


slave. In the second subsystem, the DMAC is the AHB-Lite master and the DRAM Controller is the AHB-Lite slave. For convenience, the buses of the two subsystems will be referred to as AHB-Lite-1 and AHB-Lite-2 respectively. Figure 2.1 shows a block diagram of this implementation. The DMAC has an Arbiter and a DMA-slave, which together implement the slave behavior of the DMAC. The DMA-master of the DMA unit acts as the master to the DRAM Controller.

Figure 2.1 System Overview

Table 2.1 describes the global signals in an AHB-Lite system.

Name       Description
HCLK       The bus clock times all bus transfers. All signal timings are related to the rising edge of HCLK.
HRESETn    The bus reset signal is active LOW and resets the system and the bus. This is the only active LOW AHB-Lite signal.

Table 2.1 Description of Global AHB-Lite signals

2.2 Master Interface

The Cortex-M0, being the bus master on AHB-Lite-1 bus, generates control signals, 32-bit

write data and address. The DMA-master does so on the AHB-Lite-2 bus. AHB master

interface signals are explained in Table 2.2. The Cortex-M0 does not generate locked


additional information about bus access, such as whether the transfer is an opcode fetch or a data access. In this system, the HPROT signal is always kept at 4'b0000. The AHB protocol supports bursts of 4, 8 and 16 beats, but the processor does not produce burst transfers and holds the HBURST signal at 3'b000. The HBURST signal on the AHB-Lite-2 bus is also held at 3'b000, as the DRAM controller cannot handle bursts either. HTRANS indicates the type of transfer: idle, busy, non-sequential or sequential. Only idle and non-sequential transfers are supported. The master uses an idle transfer (HTRANS=2'b00) when it does not want to perform a data transfer. A non-sequential transfer (HTRANS=2'b10) indicates a word transfer.
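The HTRANS usage described above can be summarized in a small Python sketch. The two-bit encodings are the standard AHB-Lite ones; the helper names `is_supported` and `is_data_transfer` are hypothetical and only illustrate the restriction to idle and non-sequential transfers in this system.

```python
from enum import IntEnum


class HTrans(IntEnum):
    """Standard AHB-Lite HTRANS[1:0] encodings."""
    IDLE = 0b00
    BUSY = 0b01
    NONSEQ = 0b10
    SEQ = 0b11


# Assumption taken from the text: only idle and non-sequential
# transfers are supported, since no bursts are generated.
SUPPORTED = {HTrans.IDLE, HTrans.NONSEQ}


def is_supported(htrans: int) -> bool:
    """Return True if this transfer type is used in the system."""
    return HTrans(htrans) in SUPPORTED


def is_data_transfer(htrans: int) -> bool:
    """Return True if the value requests an actual (word) transfer."""
    return HTrans(htrans) is HTrans.NONSEQ
```

For example, `is_data_transfer(0b10)` holds for a word transfer, while `is_data_transfer(0b00)` does not because the master is idle.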

Name            Description
HADDR[31:0]     The 32-bit system address bus.
HWRITE          Indicates the transfer direction. When HIGH this signal indicates a write transfer and when LOW a read transfer. It must remain constant throughout a burst transfer.
HSIZE[2:0]      Indicates the size of the transfer, which is typically byte, halfword, or word.
HBURST[2:0]     The burst type indicates if the transfer is a single transfer or forms part of a burst. Fixed length bursts of 4, 8 and 16 are supported.
HPROT[3:0]      The protection control signals provide additional information about a bus access and are primarily intended for use by any module that wants to implement some level of protection.
HTRANS[1:0]     Indicates the transfer type of the current transfer.
HMASTLOCK       When HIGH, this signal indicates that the current transfer is part of a locked sequence.
HWDATA[31:0]    The write data bus transfers data from the master to the slaves during write operations.

2.3 Slave Interface

The DMAC acts as a slave on the AHB-Lite-1 bus. The DRAM Controller is the slave on

the AHB-Lite-2 bus. HRDATA, HREADYOUT and HRESP are the outputs of a slave system.

Here, we do not consider HRESP and it is tied to zero. Table 2.3 explains the slave interface


Name            Description
HREADY          When HIGH, the HREADY signal indicates that a transfer has finished on the bus. This signal can be driven LOW to extend a transfer.
HRESP           When LOW, the HRESP signal indicates that the transfer status is OKAY. When HIGH, the HRESP signal indicates that the transfer status is ERROR.
HRDATA[31:0]    During read operations, the read data bus transfers data from the selected slave to the multiplexor. The multiplexor then transfers the data to the master.

Table 2.3 Description of AHB-Lite Slave signals

2.4 System Specifications

The DMAC has three registers: a source address register, a destination address register and a transfer length register. The DMAC passes the transactions issued by the CPU on to the DRAM Controller via AHB-Lite-2 without any delay. Upon a WRITE transaction to its source address register, the DMAC recognizes that it needs to perform a transfer and issues DMA transactions over the AHB-Lite-2 bus. Once its transfer length register is written to by the CPU, the DMAC, which behaves as the slave of AHB-Lite-1, pulls the HREADY signal LOW to stop the CPU from issuing any more transactions until the transfer is complete. After the transfer is complete, the DMAC continues to forward the transactions given by the CPU until the next

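The system behavior specified in Section 2.4 can be illustrated with an untimed behavioral sketch in Python. This is not the RTL of the thesis: the class `DmacModel`, the dictionary standing in for DRAM, and the `stall_events` counter are hypothetical. The register addresses are the ones reported with the test cases in Section 4.3.1.

```python
# Register addresses as reported with the test cases in Section 4.3.1.
SRC_REG = 0x40000010
DST_REG = 0x40000060
LEN_REG = 0x40000090
WORD_BYTES = 4


class DmacModel:
    """Behavioral (untimed) sketch of the DMAC as seen from the CPU."""

    def __init__(self, memory):
        self.memory = memory      # hypothetical DRAM: byte address -> 32-bit word
        self.src = 0
        self.dst = 0
        self.length = 0           # transfer length in words
        self.stall_events = 0     # times HREADY on AHB-Lite-1 was pulled LOW

    def cpu_write(self, addr, data):
        """Handle one WRITE transaction issued by the CPU."""
        if addr == SRC_REG:
            self.src = data
        elif addr == DST_REG:
            self.dst = data
        elif addr == LEN_REG:
            self.length = data
            self.stall_events += 1    # CPU is stalled while the transfer runs
            self._transfer()
        else:
            self.memory[addr] = data  # forwarded unchanged to the DRAM controller

    def _transfer(self):
        """Copy `length` words from the source to the destination address."""
        for i in range(self.length):
            offset = i * WORD_BYTES
            self.memory[self.dst + offset] = self.memory.get(self.src + offset, 0)


# Usage: an 8-word transfer from 0x00000010 to 0x00001000.
mem = {0x10 + WORD_BYTES * i: 0xA0 + i for i in range(8)}
dmac = DmacModel(mem)
dmac.cpu_write(SRC_REG, 0x00000010)
dmac.cpu_write(DST_REG, 0x00001000)
dmac.cpu_write(LEN_REG, 8)
```

The model collapses the whole transfer into a single call, so it shows only the ordering of events (register writes, stall, copy, resume) and none of the bus timing.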

CHAPTER 3

DMAC DESIGN PROCEDURE

3.1 DMAC Unit Overview

The DMAC is required to act as a slave in the first AHB-Lite system and as a master in the second. Hence, there are two parts in this design: the DMA-slave and the DMA-master. The DMA-master in turn has a Datapath, which implements the DMA transfer logic, and a Controller, which provides the control signals needed for the transfer. An arbiter is required to check each transaction issued by the CPU and determine whether it is intended for the DRAM controller or the DMAC. If the transaction is meant for the DRAM controller, the arbiter passes it on to the AHB-Lite-2 bus. If it is intended for a DMA transfer, the arbiter

3.2 DMAC Unit Modules

The following sections discuss the functionality and implementation of the sub-modules

in the DMAC.

3.2.1 Arbiter

The arbiter module checks every transaction issued by the CPU and distinguishes between transactions intended for AHB-Lite-2 and for the DMA-slave based on the address. If the transaction is to be passed on to the AHB-Lite-2 bus, the arbiter does so without any delay. If the transaction is required for a DMA transfer, the arbiter directs it to the DMA-slave. Figure 3.1 shows how the input address is assigned as a DMA address when it corresponds to the source, destination or transfer_length register of the DMAC. Figure 3.2 shows that when the input address does not match the DMAC registers, it is passed on as an AHB-Lite-2 transaction when HREADY is HIGH. Similarly, all the other output signals are set based on the input address value. Table 3.1 explains the signals in the


Port Name                                                          Direction   Description
hclk                                                               input       Top level clock input.
hreset_n                                                           input       Active LOW global reset.
hready                                                             input       Input from AHBLite1.
haddr_cpu[31:0]                                                    input       Address from the transaction issued by CPU.
hwdata_cpu[31:0]                                                   input       Data to be written.
hwrite_cpu                                                         input       If HIGH, write operation else read operation.
hprot_cpu[3:0], hmastlock_cpu, hburst_cpu[2:0]                     input       Control signals issued by the CPU.
hsize_cpu[2:0], htrans_cpu[1:0]                                    input       Indicates size of the transfer and the transfer type respectively.
haddr_cpu_o[31:0], haddr_dma_o[31:0]                               output      Address intended for DRAM and DMAC respectively.
hwdata_cpu_o[31:0], hwdata_dma_o[31:0]                             output      Data to be written to the DRAM Controller and to the DMAC registers respectively.
hwrite_cpu_o, hwrite_dma_o                                         output      If HIGH, write operation else read operation.
hsize_cpu_o[2:0], htrans_cpu_o[1:0], hsize_dma_o[2:0], htrans_dma_o[1:0]   output   Indicates size of the transfer and the transfer type for the DRAM and DMAC respectively.
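The address-based routing performed by the arbiter can be sketched in Python as follows. The register addresses are those reported in Section 4.3.1. The function name `route` and the `"hold"` result for a DRAM transaction seen while HREADY is LOW are assumptions made for illustration, as the text does not state what the arbiter outputs in that case.

```python
# DMAC register addresses as reported in Section 4.3.1.
DMAC_REGS = {
    0x40000010: "source",
    0x40000060: "destination",
    0x40000090: "transfer_length",
}


def route(haddr, hready=True):
    """Decide where a CPU transaction goes, based only on its address.

    Returns a (target, register) tuple. The "hold" target is a
    hypothetical placeholder for the case where HREADY is LOW.
    """
    if haddr in DMAC_REGS:
        return ("dma", DMAC_REGS[haddr])   # directed to the DMA-slave
    if hready:
        return ("dram", None)              # forwarded onto AHB-Lite-2
    return ("hold", None)
```

For instance, `route(0x40000010)` selects the DMA-slave's source register, while `route(0x00001000)` is passed straight to the DRAM controller.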

3.2.2 DMA-Slave

Figure 3.3 shows the functionality of the DMA-slave. It latches the address obtained from the arbiter when HREADY is HIGH. The address is then compared with the addresses of the DMAC registers, and the source, destination and transfer_length register values are set accordingly. It then forwards these register values to the DMA-master. The control signal dma_start is set to 1'b1 when the source register is written to. The signal hready_pulldown is set to 1'b1 upon writing to the transfer_length register, to pull the HREADY of AHB-Lite-1 LOW and stop the CPU from issuing any more transactions until the DMA transfer completes. All the output signals are pulled LOW when the transfer is done, and HREADY is restored so that the CPU can continue issuing transactions. Table 3.2 describes the DMA-Slave interface.

Port Name                                          Direction   Description
hclk                                               input       Top level clock input.
hreset_n                                           input       Active LOW global reset.
hreadyout_o                                        output      HREADY to AHB-Lite-1.
src_addr_reg, dest_addr_reg, transfer_length_reg   output      DMAC registers storing source address, destination address and transfer length respectively.
hready_dram                                        input       HREADY of AHB-Lite-2.
hready_pulldown                                    output      Control signal to pull the HREADY of AHB-Lite-1 LOW during DMA transfer.
dma_start                                          output      Control signal given to the DMA-master to start the transfer.
done                                               input       Interrupt given to the CPU and DMA-slave

Figure 3.3 DMA Slave Functionality
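The DMA-slave behavior described in this section can be approximated by the following Python sketch. It is a hypothetical software model, not the thesis RTL: the class `DmaSlave` and its methods are invented for illustration, while the signal names follow Table 3.2 and the register addresses follow Section 4.3.1. Clearing the registers on `done` is an assumption based on the statement that all output signals are pulled LOW.

```python
class DmaSlave:
    """Signal-level sketch of the DMA-slave (one call per bus write)."""

    SRC, DST, LEN = 0x40000010, 0x40000060, 0x40000090

    def __init__(self):
        self.src_addr_reg = 0
        self.dest_addr_reg = 0
        self.transfer_length_reg = 0
        self.dma_start = 0
        self.hready_pulldown = 0

    @property
    def hreadyout_o(self):
        """HREADY towards AHB-Lite-1: LOW while hready_pulldown is set."""
        return 0 if self.hready_pulldown else 1

    def write(self, haddr, hwdata, hready=1):
        """Latch a register write from the arbiter when HREADY is HIGH."""
        if not hready:
            return
        if haddr == self.SRC:
            self.src_addr_reg = hwdata
            self.dma_start = 1            # set on a write to the source register
        elif haddr == self.DST:
            self.dest_addr_reg = hwdata
        elif haddr == self.LEN:
            self.transfer_length_reg = hwdata
            self.hready_pulldown = 1      # stall the CPU until the transfer ends

    def done(self):
        """Model the 'done' input: pull all outputs LOW and release the CPU."""
        self.src_addr_reg = 0             # assumption: registers are outputs too
        self.dest_addr_reg = 0
        self.transfer_length_reg = 0
        self.dma_start = 0
        self.hready_pulldown = 0
```

Writing the three registers in order leaves `dma_start` and `hready_pulldown` set and `hreadyout_o` LOW; calling `done()` restores `hreadyout_o` to HIGH.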

3.2.3 DMA-Master

Figure 3.4 shows the block diagram of the DMA-master. It includes a data-path and a control-path and is the master of AHB-Lite-2. The inputs to the system are the source address, destination address and transfer length, which are given by the DMA-slave. The source address is the memory address from which data is read for the transfer. The destination address is the memory address to which the data is written. The transfer length indicates the number of words to be transferred. Only word transfers are implemented in this project as the DRAM Controller can handle only word


obtained by the DMA-slave, the data-path gives a DMA-start signal to the controller. Upon

receiving this signal, the control-path starts its functionality and provides the necessary

signals which control the timing of the transfer.


3.2.3.1 Controller

Figure 3.5 shows the state diagram of the DMA-master Controller. The 6-bit output indicated in the state diagram is a concatenation of the control signals generated. The functionality of these signals is explained in Table 3.3.

Port Name        Direction   Description
hclk             input       Top level clock input.
hreset_n         input       Active LOW global reset.
hready_i         input       HREADY from AHB-Lite-2.
dma_start        input       Signal issued by the DMA-slave at the start of the transfer.
start            output      The datapath starts the transfer when start is 1.
sel_control      output      Signal to choose between source and destination registers during the transfer.
buf_control      output      Signal to control the value of HWDATA of AHB-Lite-2.
write_control    output      Signal to control the value of HWRITE of AHB-Lite-2.
length_control   output      Signal used to increment the source and destination addresses after a word transfer.
trans_control    output      Signal to control the value of HTRANS of AHB-Lite-2.


3.2.3.2 Datapath

The datapath is responsible for controlling the AHB-Lite-2 signals during the DMA transfer.

The source and destination addresses obtained from the DMA-slave need to be incremented

by 4 bytes after every word transfer. This is controlled by the length_control signal provided

by the controller. The logic for source address increment is shown in Figure 3.6.

Figure 3.6 Source Address Generation
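The address update described above can be sketched in Python. The function names are hypothetical, and wrapping the address at 32 bits is an assumption that simply mirrors the width of HADDR. The increment of 4 bytes per word and the role of length_control come from the text.

```python
WORD_BYTES = 4
MASK32 = 0xFFFFFFFF  # assumption: addresses wrap at the 32-bit HADDR width


def next_address(current, length_control):
    """Advance an address by one word when length_control is asserted."""
    if length_control:
        return (current + WORD_BYTES) & MASK32
    return current


def source_addresses(src_addr_reg, transfer_length):
    """List the source address used for each word of a transfer."""
    addresses = []
    addr = src_addr_reg
    for _ in range(transfer_length):
        addresses.append(addr)
        addr = next_address(addr, 1)
    return addresses
```

For an 8-word transfer starting at 0x00000010, `source_addresses(0x10, 8)` yields 0x10, 0x14, and so on in steps of 4. The destination address is advanced in the same way.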

Figure 3.7 shows the logic that drives HADDR of AHB-Lite-2. Once the addresses for the next word transfer are ready, the datapath forwards them based on the sel_control signal issued by the controller. The datapath drives the chosen address onto HADDR during DMA operation. If the transaction is intended to be passed directly to the DRAM, the datapath does so based on the haddr_pulldown signal obtained from the DMA-slave. Note that only the delay of the multiplexor is added when DRAM transactions are passed on to the AHB-Lite-2 bus. This helps in maintaining the simulation time of the

Figure 3.7 HADDR Generation
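The HADDR selection can be pictured as two nested multiplexors, sketched below in Python. The signal polarities are assumptions, since the figure is not reproduced here: `haddr_pulldown = 1` is taken to mean that a DMA transfer is active, and `sel_control = 1` is taken to select the destination address.

```python
def haddr_mux(haddr_cpu, src_addr, dest_addr, sel_control, haddr_pulldown):
    """Select the address driven onto HADDR of AHB-Lite-2.

    Assumed polarities (not confirmed by the source):
      haddr_pulldown == 0 -> pass the CPU address through unchanged
      sel_control == 0    -> source address (read phase)
      sel_control == 1    -> destination address (write phase)
    """
    if not haddr_pulldown:
        return haddr_cpu
    return dest_addr if sel_control else src_addr
```

Because the pass-through case is a single multiplexor level, a CPU transaction reaches AHB-Lite-2 with only combinational delay, which matches the statement that only the multiplexor delay is added.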

Figure 3.8 shows the logic that controls HWDATA of AHB-Lite-2. Based on the buf_control signal issued by the controller and the hready_pulldown signal from the DMA-slave, the datapath drives the HWDATA value. During a DMA transfer, the output is the value

Figure 3.9 HWRITE Generation

The trans_control signal issued by the controller is responsible for controlling the value of the HTRANS signal of AHB-Lite-2. Figure 3.10 shows the corresponding logic. HTRANS was held at 2'b10 during a transfer, as only word transfers and not burst transfers were implemented in this design. It was not possible to implement bursts due to a bug in the base testbench used in this project. Due to this bug, during burst READ operations, the DRAM controller returned only the first data word throughout the length of the burst. Hence, the DMAC was made to perform only word transfers for both READ and WRITE transactions.

Table 3.4 explains the signals at the datapath interface.


Port Name                                          Direction   Description
hclk                                               input       Top level clock input.
hreset_n                                           input       Active LOW global reset.
hready_i                                           input       HREADY from AHB-Lite-2.
hrdata_i[31:0]                                     input       HRDATA from AHB-Lite-2.
src_addr_reg, dest_addr_reg, transfer_length_reg   input       DMAC register values storing source address, destination address and transfer length respectively, obtained from the DMA-slave.
dma_start                                          input       Control signal from the DMA-slave to initiate transfer.
hready_pulldown                                    input       Control signal from the DMA-slave to choose between the write data given by the CPU and the data read by the DMA-Datapath.
haddr_pulldown                                     input       Control signal from the DMA-slave to choose between the address given by the CPU and the address generated by the DMA-Datapath.
haddr_o[31:0]                                      output      Address given to HADDR of AHB-Lite-2.
hwdata_o[31:0]                                     output      Write data given to HWDATA of AHB-Lite-2.
hwrite_o                                           output      If HIGH, write operation else read operation.
hsize_o[2:0], htrans_o[1:0]                        output      Indicates size of the transfer and the transfer type.
done                                               output      Signal which indicates the end of DMA transfer.
haddr_cpu_o[31:0], hwdata_cpu_o[31:0], hwrite_cpu_o, hsize_-   input   Signals obtained by the CPU generated

3.3 Critical Path

The timing analysis was performed when the design was synthesized using the NanGate 45 nm Open Cell Library. The critical path was identified in the datapath module, between the 2nd bit and the 31st bit of the destination register. The destination register is updated in the same way as the source register, as shown in Figure 3.6. A schematic of this path was observed in Design Vision, and it was found to have a chain of NAND and NOT gates which contributed to the delay. The maximum delay through this path was calculated to be 6.1815 ns. This value determines the minimum clock period required by the system. For a clock of 10 ns, a positive slack of 3.5447 ns was observed for the system under slow operating

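The timing figures reported in Section 3.3 can be cross-checked with a short calculation, sketched here in Python. The three input values come from the text. The interpretation of the remaining margin as setup time and clock uncertainty is an assumption, as the thesis text available here does not break it down.

```python
critical_path_ns = 6.1815   # maximum delay through the critical path
clock_period_ns = 10.0      # synthesis clock period
reported_slack_ns = 3.5447  # positive slack reported at the 10 ns clock

# Upper bound on clock frequency if the path delay alone set the period.
max_freq_mhz = 1e3 / critical_path_ns

# Period not accounted for by path delay plus slack. Presumably this
# covers register setup time and clock uncertainty (an assumption).
implied_margin_ns = clock_period_ns - critical_path_ns - reported_slack_ns

print(f"max frequency ~ {max_freq_mhz:.1f} MHz")
print(f"implied margin ~ {implied_margin_ns:.4f} ns")
```

Under these numbers the path delay alone would allow a clock of roughly 161.8 MHz, and about 0.27 ns of the 10 ns period is consumed by terms other than the path delay and the slack.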

CHAPTER 4

STUB INTERFACE WITH THE DMAC

This chapter details the simulations used to verify the basic functionality of the DMAC. An AHBStub was initially used as a master instead of the CPU to make debugging of the system easier. The DRAM transactions generated by the processor need to be passed on to the DRAM Controller without any delay, as the processor stops issuing transactions in case of a delay or a missed transaction completion. The Stub was made to issue only certain transactions which could help in understanding the DMAC's functionality better. An AHBStub written in SystemVerilog was the master on the AHB-Lite-1 bus with the


bus.

4.1 Interface Procedure

The DMA transfer does not happen if the DRAM transactions are not handled, as the CPU does not issue further transactions if the previous ones are not complete. An AHBStub was used instead of the CPU as a master, which made debugging of the system easier. The AHBStub read the transactions from a text file and passed them on to the arbiter. The arbiter decided whether each transaction was intended for the DRAM or the DMA based on the address in the received transaction. The DRAM transactions are passed on to the DRAM controller via the AHB-Lite-2 bus as long as a DMA transfer is not initiated. Once the arbiter identifies the transactions intended for the DMA, it passes the address, data and control signals to the DMA slave, which then sends the source, destination and transfer_length register values to the DMA master. Once a DMA transfer is initiated, the DMA slave pulls the HREADY of the AHB-Lite-1 bus LOW to stop the AHBStub from issuing any more transactions. It gives control back to the AHBStub after the DMA transfer.

4.2 Limitation of the DRAM Controller

When any program was run on the CPU, it was observed that the DRAM transaction preceding a WRITE to the console (address range 0x40000000 - 0x5FFFFFFF) was not completed. Due to this bug, a delay was added before writing to the source register of the DMAC, as its address is 0x40000010, which can cause the previous DRAM transaction to not complete.

4.3 Test Cases

The following sections describe the various test cases used to test the functionality of the

4.3.1 Only DMA transactions, no DRAM transactions

The first test shows a simple transfer of 8 words. Figure 4.1 shows the text file with the transactions sent to the DMA unit. The time in Figure 4.1 indicates the time after which the stub starts attempting to issue the transaction, not the actual time of the transaction. Figure 4.2 shows a transaction dump of the Verilog RTL simulations; it is the output of a simple process that snoops both AHB-Lite buses and prints the transactions. In Figure 4.2, the first 3 transactions write values to the source register, destination register and transfer_length register of the DMAC respectively. This indicates that the DMA unit recognizes address 0x40000010 as the source register, 0x40000060 as the destination register and 0x40000090 as the transfer_length register. Addr_1 refers to the transactions between the AHBStub and the DMA unit (via AHB-Lite-1), and addr_2 refers to the transactions between the DMA unit and the DRAM controller (via AHB-Lite-2). The subsequent transactions in the simulation show a DMA transfer of 8 words from source address 0x00000010 to destination address 0x00001000. Because of the initial delay required by the DRAM Controller, the first READ on the AHB-Lite-2 bus occurs 99 cycles after the simulation leaves RESET (1015 ns), or after a previous WRITE if there is one. Figure 4.3 shows the start of the simulation, in which the registers of the DMAC are written to. Once that is done, the datapath drives the address and write data onto the AHB-Lite-2 bus, starting the DMA transfer. Figure 4.4 shows the simulation from 2005 ns where the first data is read to 2225 ns where

Figure 4.4 DMA transfer mechanism

4.3.2 DMA and DRAM transactions

The purpose of this test was to show that DMA transfers work properly with other transactions issued immediately before and afterwards, with no delay. Figure 4.5 shows the text file with the transactions to be sent to the DMAC. It has 3 DRAM transactions before the source, destination and transfer_length registers of the DMA unit are written to. There are 3 more DRAM transactions after the DMAC registers are written to. These transactions are expected to pass directly through the DMA unit to the DRAM controller. For the reason explained in Section 4.2, a delay has been included before the WRITE to the source register to ensure the completion of the previous DRAM transaction. Figure 4.6 shows the transaction dump corresponding to the transfers. It is observed that the first READ on the AHB-Lite-2 bus occurs at 3155 ns which is 99 cycles after the previous WRITE on the bus (2045 ns). It is seen that both READ and WRITE transactions intended for the DRAM are completed without any delay. The arbiter forwards the DRAM transactions onto


at 3025 ns following consecutive writes to destination and transfer_length registers. This initiates the DMA transfer of 8 words from source address 0x00000010 to destination address 0x00001000. The remaining 3 DRAM transactions are completed after the DMA transfer. A delay has been added before the last 3 DRAM transactions, i.e., they are made to start at 5000 ns even though the DMA transfer completes before that. This is due to a limitation of the DMAC: it performs an extra READ transaction to the last value of the source address when it is actually supposed to pass the DRAM transaction forward. To avoid the non-completion of the DRAM transaction following the DMA transfer, a delay needs to be added.

Figure 4.6 Transaction dump when both DRAM and DMAC transactions are issued

DMA transfer happens. At 2035 ns a READ happened from address 0x00000020 and at 2045

ns, a WRITE transaction happened to address 0x00000010. Figure 4.8 shows a part of the

DMA transfer that follows. It depicts the transfer of the last 4 words. It can be seen that the

’done’ signal of the datapath goes high after the last address is written to.

Figure 4.8 End of DMA transfer

Figure 4.9 shows the waveform indicating the execution of the DRAM transactions after

Figure 4.9 Execution of DRAM transactions after the DMA transfer

4.4 Synopsis

The DMAC was successfully interfaced with the AHBStub and the DRAM Controller, and its functionality was verified. The DMAC could switch between DMA mode and DRAM mode

CHAPTER 5

CORTEX-M0 INTERFACE AND SYNTHESIS

This chapter details the procedure for interfacing the DMAC to the CPU and discusses the results obtained. The Cortex-M0 processor is a very low gate count, highly energy efficient processor that is intended for microcontroller and deeply embedded applications that require an area-optimized processor [3]. The processor's Thumb instruction set combines high code density with 32-bit performance. Two AHB-Lite interconnects are used for communication on the bus. The processor generates the transactions to be given to the DRAM

5.1 System Overview

The AHBStub, which was used for debugging, was replaced by the Cortex-M0 processor. It is very important for the DMA unit to forward the transactions issued by the CPU to the DRAM without any delay, as the system does not function correctly if a transaction is delayed or lost. The system switches to DMA mode when the CPU writes to the source register of the DMA unit. All transactions from the CPU are stalled until the DMA transfer is complete by pulling the HREADY of the AHB-Lite-1 bus low. Once the transfer is complete, the arbiter resumes forwarding the transactions onto the AHB-Lite-2 bus. The input to the CPU is a C program which writes to the dedicated registers of the DMA unit. The system can perform multiple DMA transfers, as it switches from normal mode to DMA mode whenever the source register is written to. Figure 5.1 shows a snippet of the C program given as input to the CPU. It contains 2 DMA transfers, with a delay after each transfer. The first is a transfer of 16 words from address 0x00000040 to address 0x00004000. The second is a transfer of 8 words from address 0x00000010 to address 0x00001000.
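The programming sequence described above can be illustrated with a small behavioral model. This is a hedged Python sketch, not the C program of Figure 5.1: the register names (SRC, DST, LEN), the order in which they are written, and the initial memory contents are assumptions made for the example. Only the two transfers themselves (16 words from 0x00000040 to 0x00004000 and 8 words from 0x00000010 to 0x00001000) and the fact that a transfer starts when the length register is written (Section 5.2.1) are taken from the text.

```python
WORD_BYTES = 4  # the Cortex-M0 bus carries 32-bit words


class DmacModel:
    """Toy memory-to-memory DMA model; register names are hypothetical."""

    def __init__(self, memory):
        self.memory = memory  # dict: byte address -> 32-bit word
        self.regs = {"SRC": 0, "DST": 0, "LEN": 0}
        self.reads = 0        # READ transactions issued by the DMAC
        self.writes = 0       # WRITE transactions issued by the DMAC

    def write_reg(self, name, value):
        """Model a CPU write to one of the dedicated DMAC registers."""
        self.regs[name] = value
        if name == "LEN":     # assumption: writing the length starts the transfer
            self._run_transfer()

    def _run_transfer(self):
        """Copy LEN words, one READ followed by one WRITE per word."""
        for i in range(self.regs["LEN"]):
            offset = i * WORD_BYTES
            word = self.memory.get(self.regs["SRC"] + offset, 0)
            self.reads += 1
            self.memory[self.regs["DST"] + offset] = word
            self.writes += 1


def program_transfer(dmac, src, dst, length):
    """Hypothetical register write order: source, destination, length."""
    dmac.write_reg("SRC", src)
    dmac.write_reg("DST", dst)
    dmac.write_reg("LEN", length)


# Assumed test pattern in the source region (not from the thesis).
memory = {addr: addr ^ 0xA5A5A5A5 for addr in range(0, 0x80, WORD_BYTES)}
dmac = DmacModel(memory)
program_transfer(dmac, 0x00000040, 0x00004000, 16)  # first transfer
program_transfer(dmac, 0x00000010, 0x00001000, 8)   # second transfer
print(dmac.reads, dmac.writes)
```

The model ignores bus timing and the delay loops entirely; it only shows which words end up where and that each word costs one READ and one WRITE transaction.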


5.2. RESULTS CHAPTER 5. CORTEX-M0 INTERFACE AND SYNTHESIS

5.2 Results

5.2.1 DMAC performing the data transfer

This test shows the operation of the DMAC when interfaced with the CPU. Two DMA transfers, of length 16 and 8, are shown here. The binary file corresponding to the program in Figure 5.1 was loaded into the memory. The program contains "for" loops before and after the WRITES to the DMAC registers, for the reasons explained in Section 4.2 and Section 4.3.2 respectively. These loops add delays before and after the transfer and ensure the completion of important DRAM transactions. If the "for" loop before the WRITES to the DMAC registers is removed, the simulation stalls, as the DRAM transaction needs to be completed before the DMA transfer starts. If the last "for" loop after the second DMA transfer is removed, both transfers complete; however, the simulation stalls because the following DRAM transaction does not complete. The total RTL simulation time taken for execution was recorded to be 178495 ns.

The transaction dump representing the first transfer is shown in Figure 5.2. DMA operation starts when the length register is written to at 11885 ns. Until then, DRAM transactions are simply passed through without any delay. The transfer of 16 words from address 0x00000040 to address 0x00004000 is completed at 13685 ns. The transaction dump showing the second transfer is shown in Figure 5.3. The transfer begins when the length register is written to at 14215 ns. The transfer of 8 words from address 0x00000010 to address 0x00001000 is completed at


Figure 5.3 Transaction dump of the second DMA transfer

as the master and the DRAM Controller as the slave. Upon running the code, a simulation time of 244465 ns was noted, which is 37% more than the simulation time with the DMAC (178495 ns). With the DMAC, one set of READ and WRITE operations in a transfer took 11 cycles, which is fewer than the 20 cycles taken by the CPU to perform a single word transfer. As shown in Figure 5.5, a READ from address 0x00000040 starts at 13745 ns and the WRITE to address 0x00004000 completes at 13945 ns, indicating a transfer of one word in 200 ns, i.e., 20 cycles at the 10 ns clock. This amounts to a speedup of 1.8 (20 cycles versus 11 cycles) per word transfer with the DMAC.
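The figures quoted in this section can be cross-checked with a few lines of arithmetic. The following Python sketch only recomputes ratios from the timestamps and simulation times stated in the text; the variable names are illustrative, and the 10 ns clock period is the one used for simulation and synthesis. The average over the first transfer uses the 11885 ns and 13685 ns timestamps from Section 5.2.1 and is measured from the write to the length register, so it may include a small start-up overhead.

```python
CLOCK_NS = 10  # clock period used in the thesis

# Whole-program simulation times reported in the text.
cpu_only_ns = 244465
with_dmac_ns = 178495
extra_time = (cpu_only_ns - with_dmac_ns) / with_dmac_ns    # CPU-only vs DMAC
time_saved = (cpu_only_ns - with_dmac_ns) / cpu_only_ns     # same gap, other base

# Single-word transfer by the CPU (Figure 5.5 timestamps).
cpu_cycles_per_word = (13945 - 13745) // CLOCK_NS

# Per-word cost with the DMAC, as stated in the text.
dmac_cycles_per_word = 11
speedup = cpu_cycles_per_word / dmac_cycles_per_word

# Cross-check: average over the first 16-word DMA transfer.
first_transfer_cycles = (13685 - 11885) // CLOCK_NS
avg_dmac_cycles = first_transfer_cycles / 16

print(f"CPU-only run takes {extra_time:.1%} longer than the DMAC run")
print(f"DMAC run is {time_saved:.1%} shorter than the CPU-only run")
print(f"CPU: {cpu_cycles_per_word} cycles/word, speedup {speedup:.2f}")
print(f"First DMA transfer: {avg_dmac_cycles:.2f} cycles/word on average")
```

Note that the 37% figure is relative to the DMAC run time; expressed relative to the CPU-only run, the same difference corresponds to roughly 27% less simulation time.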


5.3. SYNTHESIS CHAPTER 5. CORTEX-M0 INTERFACE AND SYNTHESIS

Figure 5.5 Transfer of one word by the CPU

5.2.3 Only DRAM transactions using DMAC

This test shows that DRAM transactions are handled by the DMAC without any delay. A simple Fibonacci program was first run on the CPU in a system without the DMAC. The same program was then run on the system with the DMAC. The two simulation times were identical. This can be explained using Figure 3.7, where haddr_cpu_o, the address obtained from the transaction issued by the CPU, is passed on to the AHB-Lite-2 bus with a delay corresponding only to a multiplexer.
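The pass-through behaviour can be pictured as a purely combinational selection. The Python sketch below is an assumption-laden illustration rather than the arbiter RTL: only haddr_cpu_o is a signal name from the text, while dma_active, haddr_dma and hready_bus2 are hypothetical names, and forwarding HREADY unchanged in normal mode is an assumption.

```python
def arbiter_outputs(dma_active, haddr_cpu_o, haddr_dma, hready_bus2):
    """Return (address driven on AHB-Lite-2, HREADY seen by the CPU).

    Behavioral sketch only; dma_active, haddr_dma and hready_bus2
    are hypothetical signal names.
    """
    if dma_active:
        # DMA mode: the DMAC drives the second bus and the CPU is stalled
        # by holding HREADY of AHB-Lite-1 low.
        return haddr_dma, False
    # Normal mode: the CPU address is selected by the multiplexer with no
    # registered delay; HREADY is assumed to be forwarded unchanged.
    return haddr_cpu_o, hready_bus2


normal = arbiter_outputs(False, 0x00000100, 0x00000040, True)
stalled = arbiter_outputs(True, 0x00000100, 0x00000040, True)
print(normal, stalled)
```

Because normal mode involves no state, a program that never writes to the DMAC registers sees the same cycle counts as a system without the DMAC, which matches the identical simulation times reported above.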

5.3 Synthesis

The design was synthesized using Synopsys Design Compiler (45 nm technology). The clock period was set at 10 ns. The design was latch free and devoid of major LINT errors. Setup and hold constraints were met. The synthesizable Verilog netlists of the processor, DRAM controller and AHB-Lite were obtained from the ECE 720 curriculum. The system was re-simulated at gate level with the synthesized netlists using the NanGate 45nm Open Cell Library, and DMA functionality was ensured. The behaviors of the RTL simulation and the gate-level simulation were identical, with the same simulation times. An area of 3173 µm² was obtained for the DMAC.


5.4. SUMMARY CHAPTER 5. CORTEX-M0 INTERFACE AND SYNTHESIS

5.4 Summary

The DMAC was interfaced with the CPU and multiple DMA transfers were tested. A performance improvement of 37% was obtained when the system was simulated using the DMAC. A speedup of 1.8 was seen per word transfer when the DMAC was used for data transfer. There was no delay while executing programs which did not need data transfers. The only limitation in this design is the addition of delays before and after DMA transfers.

CHAPTER 6

CONCLUSION

In this thesis, the design of a Direct Memory Access Controller for a Cortex-M0 based System on Chip has been illustrated. Successful memory to memory transfers were carried out by the DMAC for multiple transfer lengths. The DMAC was able to switch from the DMA transfer mode to a normal DRAM operation mode if there was no signal from the CPU to perform a transfer. A performance improvement of 37% was obtained when the system used the DMAC to perform data transfers. A speedup of 1.8 was observed per word transfer when the DMAC performed the data transfer rather than the CPU. There was no delay while executing programs which did not need data transfers. The design was synthesized with a 10 ns clock and the DMAC's functionality was ensured with the synthesized netlist.

The limitation of this design is the necessity to add delays before and after a DMA transfer to ensure the completion of DRAM transactions issued just prior to and just after the transfer.

6.1. FUTURE SCOPE CHAPTER 6. CONCLUSION

6.1 Future Scope

• Using a DRAM Controller which can handle burst data. The Synopsys DRAM Controller used in the SoC cannot handle burst transactions, although the AHB-Lite interconnect can pass burst data. The DMAC would be more useful if burst transactions could be handled by the DRAM Controller, because of the lower overhead.

• Integrating a cache for the Cortex-M0 processor. The cache can store multiple words and issue burst data to the AHB-Lite interconnect, as the processor itself cannot issue burst data. With these modifications, the DMAC can be further improved to handle


BIBLIOGRAPHY

[1] ARM. AMBA 3 AHB-Lite Protocol. 1.0. 2006.

[2] ARM. Cortex-M0 Devices. Generic User Guide. 2009.

[3] ARM. Cortex-M0 Technical Reference Manual. 2009.

[4] Shivashankar, K. ARM Cortex-M0 Design Start.
