Multi Channel DMA for PCI Express* Intel® FPGA IP Design Example User Guide
Updated for Intel® Quartus® Prime Design Suite: 21.2
IP Version: H-Tile IP version 21.1.0, P-Tile IP version 2.0.0
Contents
1. Terms and Acronyms
2. Design Example Detailed Description
2.1. Design Example Overview
2.2. Hardware and Software Requirements
2.3. PIO Using MCDMA Bypass Mode
2.3.1. Avalon-ST PIO Using MCDMA Bypass Mode
2.3.2. Avalon-MM PIO Using MCDMA Bypass Mode
2.4. Avalon-ST Packet Generate/Check
2.4.1. Four-Port Avalon-ST Packet Generate/Check
2.4.2. Single-Port Avalon-ST Packet Generate/Check
2.5. Avalon-ST Device-side Packet Loopback
2.5.1. Simulation Results
2.5.2. Hardware Test Results
2.6. Avalon-MM DMA
2.6.1. Simulation Results
2.6.2. Hardware Test Results
3. Design Example Quick Start Guide
3.1. Design Example Directory Structure
3.2. Generating the Example Design using Intel Quartus Prime
3.2.1. Procedure
3.3. Simulating the Design Example
3.3.1. Testbench Overview
3.3.2. Example Testbench Flow for DMA Test with Avalon-ST Packet Generate/Check Design Example
3.3.3. Run the Simulation Script
3.3.4. View the Results
3.4. Compiling the Example Design in Intel Quartus Prime
3.5. Running the Design Example Application on a Hardware Setup
3.5.1. Program the FPGA
3.5.2. Quick Start Guide
4. Multi Channel DMA for FPGA IP Design Example User Guide Archives
5. Document Revision History for the Multi Channel DMA for FPGA IP Design Example User Guide
Multi Channel DMA for PCI Express* Intel® FPGA IP Design Example User
Guide Send Feedback
1. Terms and Acronyms
Table 1. Acronyms
Term Definition
PCIe* Peripheral Component Interconnect Express (PCI Express*)
DMA Direct Memory Access
MCDMA Multi Channel Direct Memory Access
PIO Programmed Input/Output
UIO User Space Input/Output
VFIO Virtual Function Input/Output
DPDK Data Plane Development Kit
H2D Host-to-Device
D2H Device-to-Host
H2DDM Host-to-Device Data Mover
D2HDM Device-to-Host Data Mover
QCSR Queue Control and Status Register
GCSR General Control and Status Register
IP Intellectual Property
HIP Hard IP
PD Packet Descriptor
QID Queue Identification
TIDX Queue Tail Index (pointer)
HIDX Queue Head Index (pointer)
TLP Transaction Layer Packet
IMMWR Immediate Write Operation
MRRS Maximum Read Request Size
CvP Configuration via Protocol
PBA Pending Bit Array
Avalon®-MM Avalon Memory-Mapped Interface
Avalon-ST Avalon Streaming Interface
2. Design Example Detailed Description
2.1. Design Example Overview
The Multi Channel DMA for PCI Express IP Design Examples demonstrate a Multi Channel DMA solution for Intel® Stratix® 10 GX/MX devices using the H-Tile PCIe Gen3 Hard IP, and for Intel Stratix 10 DX and Intel Agilex™ devices using the P-Tile PCIe Gen4 Hard IP, together with soft IP implemented in the FPGA fabric.
You can generate the design example from the Example Designs tab of the Multi Channel DMA for PCI Express IP Parameter Editor. You can choose either Avalon-ST or Avalon-MM as the user interface type. When the Avalon-MM interface type is selected, you can allocate up to 2048 DMA channels (with a maximum of 512 channels per function). For the Avalon-ST 4-port interface, one channel is allocated per port. For the Avalon-ST 1-port interface, you can allocate up to 256 channels in both the H-Tile and P-Tile variants. You can also configure the size of PCIe BAR2, which is mapped to the Avalon-MM PIO Master port.
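The channel limits above can be expressed as a small validation routine. The following is a minimal sketch, assuming a simple model in which each PCIe function requests a channel count; the function name validate_channels and the interface labels "avmm"/"avst" are hypothetical and are not part of the Parameter Editor or the MCDMA software.

```python
# Hypothetical helper (not part of the MCDMA software): encodes the
# channel-allocation limits stated in the overview.
AVMM_MAX_TOTAL = 2048          # Avalon-MM: up to 2048 (2K) channels in total
AVMM_MAX_PER_FUNCTION = 512    # Avalon-MM: at most 512 channels per function
AVST_1PORT_MAX_TOTAL = 256     # Avalon-ST 1-port: up to 256 channels


def validate_channels(interface, ports, channels_per_function):
    """Raise ValueError if the requested allocation breaks a stated limit.

    interface: "avmm" or "avst" (hypothetical labels)
    ports: number of Avalon-ST ports (1 or 4); ignored for Avalon-MM
    channels_per_function: channel count for each PCIe function
    """
    total = sum(channels_per_function)
    if interface == "avmm":
        if any(n > AVMM_MAX_PER_FUNCTION for n in channels_per_function):
            raise ValueError("Avalon-MM: more than 512 channels in one function")
        if total > AVMM_MAX_TOTAL:
            raise ValueError("Avalon-MM: more than 2048 channels in total")
    elif interface == "avst" and ports == 4:
        # One channel is allocated per port, so the total equals the port count.
        if total != ports:
            raise ValueError("Avalon-ST 4-port: exactly one channel per port")
    elif interface == "avst" and ports == 1:
        if total > AVST_1PORT_MAX_TOTAL:
            raise ValueError("Avalon-ST 1-port: more than 256 channels")
    else:
        raise ValueError("unsupported interface/port combination")
```

For example, four functions of 512 channels each (2048 in total) pass, while adding a fifth such function exceeds the 2K total and raises an error.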
Table 2. Supported Design Example Configurations for H-tile
The supported Hard IP modes are Gen3 x16, 512-bit and Gen3 x8, 256-bit.

| Design Example | User Mode | Interface Type | Number of Ports | Total Channels Supported | SR-IOV Support | Simulation | Synthesis | Driver Support (1) |
|---|---|---|---|---|---|---|---|---|
| Avalon-MM DMA | Multi channel DMA | Avalon-MM | 1 | Up to 2K (2) | No | Up to 2K channels | Up to 2K channels | Custom, DPDK, Kernel Mode |
| Avalon-MM DMA | Multi channel DMA | Avalon-MM | 1 | Up to 2K (2) | Yes | 1 Physical Function and its Virtual Functions | Up to 2K channels | Custom, DPDK |
| Device-side Packet Loopback | Multi channel DMA | Avalon-ST | 4 | 4 | No | 1 channel per port | 1 channel per port | Custom, DPDK, Kernel Mode |
| Packet Generate/Check | Multi channel DMA | Avalon-ST | 4 | 4 | No | 1 channel per port | 1 channel per port | Custom, DPDK, Kernel Mode |
| PIO using MQDMA Bypass Mode | Multi channel DMA | Avalon-ST | 4 | N/A | No | Yes | Yes | Custom, DPDK, Kernel Mode |
| Device-side Packet Loopback | Multi channel DMA | Avalon-ST | 1 | Up to 256 | No | Up to 256 channels | Up to 256 channels | Custom, DPDK, Kernel Mode |
| Packet Generate/Check | Multi channel DMA | Avalon-ST | 1 | Up to 256 | No | Up to 256 channels | Up to 256 channels | Custom, DPDK, Kernel Mode |
| PIO using MQDMA Bypass Mode | Multi channel DMA | Avalon-ST | 1 | N/A | No | Yes | Yes | Custom, DPDK, Kernel Mode |
| Device-side Packet Loopback | Multi channel DMA | Avalon-ST | 1 | Up to 256 | Yes | 1 Physical Function and its Virtual Functions | Up to 256 channels | Custom, DPDK |
| Packet Generate/Check | Multi channel DMA | Avalon-ST | 1 | Up to 256 | Yes | 1 Physical Function and its Virtual Functions | Up to 256 channels | Custom, DPDK |
| PIO using MQDMA Bypass Mode | Bursting Master | Avalon-MM | N/A | N/A | No | Yes | Yes | Custom, DPDK, Kernel Mode |

(1) The Custom driver is used to generate the hardware test results in this document. Every hardware test result is based on the perfq_app command.
(2) 512 channels per function.

Intel Corporation. All rights reserved. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Intel warrants performance of its FPGA and semiconductor products to current specifications in accordance with Intel's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Intel. Intel customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.
*Other names and brands may be claimed as the property of others.
ISO 9001:2015 Registered
Table 3. Supported Design Example Configurations for P-tile
The supported Hard IP modes are Gen4 x16, 512-bit; Gen4 x8, 256-bit; Gen3 x16, 512-bit; and Gen3 x8, 256-bit.

| Design Example | User Mode | Interface Type | Number of Ports | Total Channels Supported | SR-IOV Support | Simulation | Synthesis | Driver Support (3) |
|---|---|---|---|---|---|---|---|---|
| Avalon-MM DMA | Multi channel DMA | Avalon-MM | 1 | Up to 2K (2) | No | Up to 2K channels | Up to 2K channels | Custom, DPDK, Kernel Mode |
| Avalon-MM DMA | Multi channel DMA | Avalon-MM | 1 | Up to 2K (2) | Yes | Not supported | Up to 2K channels | Custom, DPDK |
| Device-side Packet Loopback | Multi channel DMA | Avalon-ST | 4 | 4 | No | 1 channel per port | 1 channel per port | Custom, DPDK, Kernel Mode |
| Packet Generate/Check | Multi channel DMA | Avalon-ST | 4 | 4 | No | 1 channel per port | 1 channel per port | Custom, DPDK, Kernel Mode |
| PIO using MQDMA Bypass Mode | Multi channel DMA | Avalon-ST | 4 | N/A | No | Yes | Yes | Custom, DPDK, Kernel Mode |
| Device-side Packet Loopback | Multi channel DMA | Avalon-ST | 1 | Up to 256 | No | Up to 64 channels | Up to 256 channels | Custom, DPDK, Kernel Mode |
| Packet Generate/Check | Multi channel DMA | Avalon-ST | 1 | Up to 256 | No | Up to 64 channels | Up to 256 channels | Custom, DPDK, Kernel Mode |
| PIO using MQDMA Bypass Mode | Multi channel DMA | Avalon-ST | 1 | N/A | No | Yes | Yes | Custom, DPDK, Kernel Mode |
| Device-side Packet Loopback | Multi channel DMA | Avalon-ST | 1 | Up to 256 | Yes (4) | Not supported | Up to 256 channels | Custom, DPDK |
| Packet Generate/Check | Multi channel DMA | Avalon-ST | 1 | Up to 256 | Yes (4) | Not supported | Up to 256 channels | Custom, DPDK |
| PIO using MQDMA Bypass Mode | Bursting Master | Avalon-MM | N/A | N/A | No | Yes | Yes | Custom, DPDK, Kernel Mode |

(3) The Custom driver is used to generate the hardware test results in this document. Every hardware test result is based on the perfq_app command.
2.2. Hardware and Software Requirements
• Intel Quartus® Prime Pro Edition Software version 21.2
• OS: CentOS Linux 7.4
• Kernel: 3.10.0-693
• ModelSim(5), VCS, or Xcelium(6)
• Intel Stratix 10 MX or GX FPGA Development Kit supporting H-Tile PCIe Gen3
• Intel Stratix 10 DX ES or Intel Agilex F-Series ES FPGA Development Kit supporting P-Tile PCIe Gen4 / Gen3
For details on the design example simulation steps and on running the hardware test, refer to the Quick Start Guide.
For more information on development kits, refer to FPGA Development Kits on the Intel website.
(4) Refer to the P-tile Avalon Streaming Intel FPGA IP for PCI Express User Guide for supported physical function or virtual function combinations.
(5) Supports H-tile only
(6)
2.3. PIO Using MCDMA Bypass Mode
2.3.1. Avalon-ST PIO Using MCDMA Bypass Mode
2.3.1.1. Four-Port Avalon-ST PIO Using MCDMA Bypass Mode
Figure 1. Four-port Avalon-ST PIO Using MCDMA Bypass Mode
[Block diagram: Design Example Platform Designer system. resetIP (ninit_done), MEM_PIO on the Avalon-MM rx_pio_master port, and the Multi Channel DMA for PCI Express IP (DMA H2D, DMA D2H, PCIe HIP), which connects to the host over hip_serial. Avalon-ST ports h2d_st_0 to h2d_st_3 and d2h_st_0 to d2h_st_3 are unconnected.]
This design example enables the Avalon-MM PIO master, which bypasses the DMA path.
The Avalon-MM PIO master allows the application to perform single, non-bursting register read/write operations on the on-chip memory.
This design example only supports PIO functionality and does not perform DMA operations. Hence, the Avalon-ST DMA ports are not connected.
The design example includes the Multi Channel DMA for PCI Express IP Core with the parameters you specified and the following components:
• resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire FPGA fabric enters user mode.
• MEM_PIO – On-chip memory for the PIO operation. It is connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port, which is mapped to PCIe BAR2.
Transfer mode option supported in the test application software (perfq_app) command line:
• PIO test: -o
For a description of which driver(s) to use with this design example, refer to Driver Support.
2.3.1.1.1. Simulation Result
The testbench writes a 4 KB incrementing pattern to the on-chip memory and reads it back through the Avalon-MM PIO interface. The testbench for this design example does not simulate the H2D/D2H data movers.
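The check that the testbench performs can be modeled in software as follows. This is a minimal sketch, not the testbench code: PioMemoryModel and pio_incrementing_test are hypothetical names, and the 32-bit access width and the per-word increment of the pattern are assumptions rather than details taken from the design example.

```python
# Hypothetical model of the PIO check. MEM_PIO is modeled as a list of
# 32-bit words; the word width is an assumption, not taken from the IP.
PIO_TEST_BYTES = 4 * 1024  # 4 KB, the amount the testbench writes


class PioMemoryModel:
    """Stand-in for MEM_PIO: one 32-bit word per single, non-bursting access."""

    def __init__(self, size_bytes):
        self.words = [0] * (size_bytes // 4)

    def write32(self, offset, value):
        self.words[offset // 4] = value & 0xFFFFFFFF

    def read32(self, offset):
        return self.words[offset // 4]


def pio_incrementing_test(mem, nbytes=PIO_TEST_BYTES):
    """Write 0, 1, 2, ... one word at a time, read back, return mismatch count."""
    for offset in range(0, nbytes, 4):
        mem.write32(offset, offset // 4)
    return sum(1 for offset in range(0, nbytes, 4)
               if mem.read32(offset) != offset // 4)
```

A result of 0 corresponds to a clean read-back of all 1024 words.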
Figure 2. Simulation Log
Figure 3. Simulation Waveform
2.3.1.1.2. Hardware Test Result
The Custom Driver and DPDK Driver were used to generate the following output:
Figure 4. PIO Test
-o option
2.3.1.2. Single-Port Avalon-ST PIO Using MCDMA Bypass Mode
This design example supports the same PIO functionality as the design example described in Four-Port Avalon-ST PIO Using MCDMA Bypass Mode. However, this design example has only one Avalon-ST DMA port (instead of four), which is not connected, as shown in the following block diagram.
Figure 5. Single-Port Avalon-ST PIO Using MCDMA Bypass Mode
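[Block diagram: Design Example Platform Designer system. resetIP (ninit_done), MEM_PIO on the Avalon-MM rx_pio_master port, and the Multi Channel DMA for PCI Express IP (DMA H2D, DMA D2H, PCIe HIP), which connects to the host over hip_serial. Avalon-ST ports h2d_st_0 and d2h_st_0 are unconnected.]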
2.3.2. Avalon-MM PIO Using MCDMA Bypass Mode
Figure 6. Avalon-MM PIO using MCDMA Bypass mode
[Block diagram: Design Example Platform Designer system. resetIP (ninit_done), MEM_PIO on the Avalon-MM rx_pio_master port, and the Multi Channel DMA for PCI Express IP (DMA H2D with h2ddm_master, DMA D2H with d2hdm_master, PCIe HIP), which connects to the host over hip_serial.]
This design example enables the Avalon-MM PIO master, which bypasses the DMA path.
The Avalon-MM PIO master allows the application to perform single, non-bursting register read/write operations on the on-chip memory.
This design example only supports PIO functionality and does not perform DMA operations (similar to the design examples in Avalon-ST PIO Using MCDMA Bypass Mode). Hence, the Avalon-MM DMA ports are not connected.
The design example includes the Multi Channel DMA for PCI Express IP Core with the parameters you specified and other supporting components:
• resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire FPGA fabric enters user mode.
• MEM_PIO – On-chip memory for the PIO operation. It is connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port, which is mapped to PCIe BAR2.
Transfer mode option supported in the test application software (perfq_app) command line:
• PIO test: -o
For a description of which driver(s) to use with this design example, refer to Driver Support.
2.3.2.1. Simulation Results
The testbench writes a 4 KB incrementing pattern to the on-chip memory and reads it back through the Avalon-MM PIO interface. The testbench for this design example does not simulate the H2D/D2H data movers.
Figure 7. Simulation Log
Figure 8. Simulation Waveform
2.3.2.2. Hardware Test Results
Figure 9. PIO Test
-o option
2.4. Avalon-ST Packet Generate/Check
2.4.1. Four-Port Avalon-ST Packet Generate/Check
Figure 10. Avalon-ST Packet Generate/Check
[Block diagram: Design Example Platform Designer system. resetIP (ninit_done), MEM_PIO on the Avalon-MM rx_pio_master port, and GEN_CHK connected over Avalon-ST to h2d_st_0 to h2d_st_3 and d2h_st_0 to d2h_st_3 of the Multi Channel DMA for PCI Express IP (DMA H2D, DMA D2H, PCIe HIP), which connects to the host over hip_serial.]
This design example performs H2D and D2H multi channel DMA through Avalon-ST streaming, as well as PIO operations. The Multi Channel DMA for PCI Express IP core provides four independent Avalon-ST Source/Sink ports. DMA channels and Avalon-ST ports have a 1:1 mapping.
This design example instantiates a packet generator and checker module.
For H2D (Tx) DMA, the host populates the descriptor rings, allocates Tx packet buffers in the host memory, and fills the Tx buffers with a predefined pattern. When the application updates the Queue Tail Pointer register (Q_TAIL_POINTER), the MCDMA IP starts the H2D DMA and sends the received data to the packet checker module, which verifies the data integrity.
For D2H (Rx) DMA, packets generated from a packet generator module are transferred to the host memory, where the host checks the data integrity.
For Bidirectional DMA, the packet generator and checker modules transmit/receive the packets simultaneously.
In addition, the design example enables the Avalon-MM PIO master, which bypasses the DMA path. It allows the application to perform single, non-bursting register read/write operations on the on-chip memory block. The test application software, perfq_app, also uses the Avalon-MM PIO Master port to configure the Packet Generator and Checker.
The design example includes the Multi Channel DMA for PCI Express IP Core with the parameters you specified and the following components:
• resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire FPGA fabric enters user mode.
• MEM_PIO – On-chip memory for the PIO operation. It is connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port, which is mapped to PCIe BAR2.
• GEN_CHK – Packet Generator and Checker for MCDMA. It is connected to the MCDMA Avalon-ST Source (h2d_st_x) and Avalon-ST Sink (d2h_st_x) ports.
Transfer mode options supported in the test application software (perfq_app) command line:
• PIO test: -o
• DMA test: -t (Tx), -r (Rx), -z (Bidirectional)
For a description of which driver(s) to use with this design example, refer to Driver Support.
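The H2D flow described in this section (the host fills descriptors, then updates Q_TAIL_POINTER, after which the IP consumes descriptors and forwards the data to the checker) can be sketched as a ring indexed by TIDX and HIDX. This is a hypothetical software model for illustration only: H2DQueueModel, host_post, and ip_process are not driver or IP APIs, and keeping one ring slot empty to tell a full ring from an empty one is a modeling choice, not documented IP behavior.

```python
# Hypothetical software model of one H2D queue (not the MCDMA driver API).
class H2DQueueModel:
    """Descriptor ring indexed by TIDX (host-owned) and HIDX (IP-owned)."""

    def __init__(self, ring_size):
        self.ring = [None] * ring_size
        self.size = ring_size
        self.tidx = 0  # next slot the host fills; published via Q_TAIL_POINTER
        self.hidx = 0  # next slot the IP processes

    def host_post(self, buffers):
        """Host: fill one descriptor per buffer, then publish the tail once."""
        tail = self.tidx
        for buf in buffers:
            nxt = (tail + 1) % self.size
            if nxt == self.hidx:  # one slot kept empty so full != empty
                raise RuntimeError("descriptor ring full")
            self.ring[tail] = buf
            tail = nxt
        self.tidx = tail  # models the single Q_TAIL_POINTER register write

    def ip_process(self, checker):
        """IP: pass the data of every posted descriptor to the checker."""
        count = 0
        while self.hidx != self.tidx:
            checker(self.ring[self.hidx])
            self.hidx = (self.hidx + 1) % self.size
            count += 1
        return count
```

The tail is published only after all descriptors are filled, so in this model the IP never sees a half-written descriptor.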
2.4.1.1. Simulation Results
Note: For a detailed description of the testbench for this design example, refer to Example Testbench Flow for DMA Test with Avalon-ST Packet Generate/Check Design Example.
Figure 11. H2D Simulation Log
Figure 12. H2D Simulation Waveform
Figure 13. D2H Simulation Log
Figure 14. D2H Simulation Waveform
2.4.1.2. Hardware Test Results
Figure 15. PIO Test
-o option
Figure 16. H2D Avalon-ST Streaming
-t option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
Note: In the example above, the perfq_app command transfers a total of 1 GB in the H2D direction (-t) across four channels, with a payload of 8192 bytes in each descriptor. Without the -v (data validation) option, the command displays the bandwidth.
The -p option specifies the payload size. The maximum payload size varies depending on the design example:
• Loopback: 1 MB
• Avalon-MM:
— With validation enabled: ((total available memory) / #channels)
— With validation disabled: 1 MB
• Avalon-ST Packet Generate/Check: 131072 bytes
The -s option specifies the transfer size.
• For the loopback, Avalon-ST, and Avalon-MM design examples (except for the Avalon-ST Packet Generate/Check design example), there is no limit on the transfer size.
• For the Avalon-ST Packet Generate/Check design example, the number of descriptors (transfer size / packet size) should be a multiple of 64 (64 is the default file size).
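The -p and -s rules above can be checked before launching a test. The following minimal sketch assumes that "1 MB" means 2**20 bytes and that the transfer size is an exact multiple of the payload size; the helper names are hypothetical and are not perfq_app options or functions.

```python
# Hypothetical pre-check of perfq_app -p (payload) and -s (transfer size)
# values against the documented limits. Assumption: 1 MB = 2**20 bytes.
MB = 1 << 20
PKTGEN_MAX_PAYLOAD = 131072   # Avalon-ST Packet Generate/Check
DESCRIPTOR_MULTIPLE = 64      # descriptor count must be a multiple of 64


def max_payload(design, validate=False, total_mem=0, channels=1):
    """Largest -p value for a design example ("loopback", "avmm", "pktgen")."""
    if design == "loopback":
        return MB
    if design == "avmm":
        # With validation: (total available memory) / number of channels.
        return total_mem // channels if validate else MB
    if design == "pktgen":
        return PKTGEN_MAX_PAYLOAD
    raise ValueError(f"unknown design example: {design}")


def pktgen_sizes_ok(transfer_size, payload):
    """Check -s/-p for Packet Generate/Check: descriptors = -s / -p."""
    if payload <= 0 or payload > PKTGEN_MAX_PAYLOAD:
        return False
    descriptors, remainder = divmod(transfer_size, payload)
    # Assumption: -s should be an exact multiple of -p.
    return (remainder == 0 and descriptors > 0
            and descriptors % DESCRIPTOR_MULTIPLE == 0)
```

Assuming 1 GB = 2**30 bytes, the H2D example above (1 GB transfer, 8192-byte payload) yields 2**30 / 8192 = 131072 descriptors, which is 2048 × 64, so the check passes.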
The -a option specifies the number of threads. You can provide any number that is a factor of the total number of queues, so that the traffic is distributed equally among the available cores in the system.
For example, for a system with 64 channels of bidirectional traffic, there is a maximum of 128 possible queues. Hence, the -a option can accept these values: 1, 2, 4, 8, 16, 32, 64, 128. If you use -a 128, the 128 queues are distributed among 128 cores. However, if the number of cores in the system is limited, you can use smaller values for -a. If you use -a 4, the 128 queues are distributed among 4 cores (with each core handling 32 queues). A higher number of queues per core leads to a decrease in performance.
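The relationship between channels, queues, and the -a value in the example above can be sketched as follows. The assumption that each channel contributes one queue per active direction comes from the 64-channel/128-queue example; the function names are hypothetical and are not part of perfq_app.

```python
# Hypothetical sketch of how -a (thread count) relates to the queue count.
# Assumption: each channel contributes one queue per active direction.
def total_queues(channels, bidirectional):
    return channels * (2 if bidirectional else 1)


def valid_thread_counts(queues):
    """All -a values that split the queues evenly: the factors of `queues`."""
    return [n for n in range(1, queues + 1) if queues % n == 0]


def queues_per_thread(queues, threads):
    if queues % threads:
        raise ValueError("-a must be a factor of the total number of queues")
    return queues // threads
```

For 64 bidirectional channels this reproduces the values listed above, and -a 4 gives 32 queues per thread.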
Figure 17. H2D Avalon-ST Streaming with Data Validation Enabled
-t with -v option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
Figure 18. D2H Avalon-ST Streaming
-r option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
Figure 19. D2H Avalon-ST Streaming with Data Validation Enabled
-r with -v option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
Figure 20. Bidirectional Avalon-ST Streaming
-z option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
Figure 21. Bidirectional Avalon-ST Streaming with Data Validation Enabled
-z with -v option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
2.4.2. Single-Port Avalon-ST Packet Generate/Check
This design example is a packet generator/checker with a single-port Avalon Streaming interface that supports multiple channels without any interleaving. You can use it with the perfq_app test application to evaluate functionality and measure MCDMA performance. In the H2D direction, the design example checks the received packets, and software then reads the status registers to confirm that there are no errors. In the D2H direction, the design example generates packets and forwards them to the host by means of PCIe memory writes (MWr).
For a description of which driver(s) to use with this design example, refer to Driver Support.
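One way to picture "multiple channels without any interleaving" on a single Avalon-ST port is a stream check in which each packet, from start of packet to end of packet, belongs to one channel and is not cut into by another packet. The following is a hypothetical sketch: the Beat fields loosely mirror the Avalon-ST channel, startofpacket, and endofpacket signals, but the exact port signals of the MCDMA IP are not assumed here.

```python
from typing import NamedTuple


class Beat(NamedTuple):
    """One transfer cycle on the single Avalon-ST port (hypothetical fields)."""
    channel: int
    sop: bool  # start of packet
    eop: bool  # end of packet


def packets_not_interleaved(beats):
    """True if each packet's beats arrive contiguously, SOP through EOP."""
    current = None  # channel of the packet in progress; None between packets
    for beat in beats:
        if current is None:
            if not beat.sop:
                return False  # beat outside any packet
            current = beat.channel
        elif beat.sop or beat.channel != current:
            return False  # a new packet or another channel cut in before EOP
        if beat.eop:
            current = None
    return current is None  # the final packet must be closed
```

Packets from different channels may follow one another; only overlap inside a packet is rejected.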
2.4.2.1. Simulation Waveforms
2.4.2.2. Simulation Log
2.5. Avalon-ST Device-side Packet Loopback
Figure 22. Avalon-ST Device-side Packet Loopback
[Block diagram: Design Example Platform Designer system. resetIP (ninit_done), MEM_PIO on the Avalon-MM rx_pio_master port, and Avalon-ST loopback FIFOs FIFO_ST0 to FIFO_ST3 connecting h2d_st_0 to h2d_st_3 with d2h_st_0 to d2h_st_3 of the Multi Channel DMA for PCI Express IP (DMA H2D, DMA D2H, PCIe HIP), which connects to the host over hip_serial.]
This design example performs H2D and D2H multi channel DMA through Avalon-ST streaming. The Multi Channel DMA for PCI Express IP core provides four independent Avalon-ST Source/Sink ports. DMA channels and Avalon-ST ports have a 1:1 mapping.
For H2D streaming, Multi Channel DMA sends the data to Avalon-ST loopback FIFOs via four Avalon-ST Source ports. For D2H streaming, Multi Channel DMA receives the data from Avalon-ST loopback FIFOs via Avalon-ST Sink ports.
In this device-side loopback example, the Host first sets up memory locations within the Host memory. Data from the Host memory is then sent to the device-side memory by the Multi Channel DMA for PCI Express IP via H2D DMA operations. Finally, the IP loops this data back to the Host memory using D2H DMA operations.
In addition, the design example enables the Avalon-MM PIO master, which bypasses the DMA path. It allows the application to perform single, non-bursting register read/write operations on the on-chip memory block.
The design example includes the Multi Channel DMA for PCI Express IP Core with the parameters you specified and the following components:
• resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire FPGA fabric enters user mode.
• MEM_PIO – On-chip memory for the PIO operation. It is connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port, which is mapped to PCIe BAR2.
• FIFO_ST0, FIFO_ST1, FIFO_ST2, and FIFO_ST3 – Avalon-ST FIFOs for streaming loopback. They are connected to the MCDMA Avalon-ST Source (h2d_st_x) and Avalon-ST Sink (d2h_st_x) ports.
Transfer mode options supported in the test application software (perfq_app) command line:
• PIO test: -o
• DMA test: -i (performance loopback operation in which Tx and Rx run in two different threads), -v (enable data validation, which performs a data integrity check)
— -i without -v flag displays the throughput per channel
For a description of which driver(s) to use with this design example, refer to Driver Support.
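The device-side loopback path can be modeled as one FIFO per port: H2D pushes packets into FIFO_STn and D2H drains them back to the host, after which a -v style check compares sent and received data. This is a minimal sketch with hypothetical function names; unlike the -i test, which runs Tx and Rx in separate threads, the model runs the two directions sequentially.

```python
from collections import deque


def device_side_loopback(tx_per_port):
    """Model: H2D fills FIFO_STn from host buffers, D2H drains it back.

    tx_per_port: one list of packets (bytes) per Avalon-ST port.
    Returns the packets the host receives, per port, in arrival order.
    """
    fifos = [deque() for _ in tx_per_port]      # FIFO_ST0..FIFO_ST3
    for fifo, packets in zip(fifos, tx_per_port):
        fifo.extend(packets)                    # H2D via h2d_st_n
    rx_per_port = []
    for fifo in fifos:
        rx = []
        while fifo:
            rx.append(fifo.popleft())           # D2H via d2h_st_n
        rx_per_port.append(rx)
    return rx_per_port


def loopback_data_ok(tx_per_port, rx_per_port):
    """Data-integrity check in the spirit of -v: received equals sent."""
    return tx_per_port == rx_per_port
```

Because each FIFO is first-in, first-out, packets return to the host in the order in which they were sent on that port.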
2.5.1. Simulation Results
Figure 23. Simulation Log
Figure 24. H2D Simulation Waveform
Figure 25. D2H Simulation Waveform
2.5.2. Hardware Test Results
The Custom Driver, DPDK Driver, and Kernel Mode Driver were used to generate the following output:
Figure 26. PIO Test
-o option
Figure 27. Performance Test
-i option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
Figure 28. Data Validation Test
-i with -v option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
2.6. Avalon-MM DMA
Figure 29. Avalon-MM DMA
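[Block diagram: Design Example Platform Designer system. resetIP (ninit_done), MEM_PIO on the Avalon-MM rx_pio_master port, and MEM connected over Avalon-MM to h2ddm_master (DMA H2D) and d2hdm_master (DMA D2H) of the Multi Channel DMA for PCI Express IP, whose PCIe HIP connects to the host over hip_serial.]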
This design example performs H2D and D2H multi channel DMA through the Avalon-MM interface. The Multi Channel DMA for PCI Express IP core provides one Avalon-MM Write Master port and one Avalon-MM Read Master port. You can allocate up to 2K DMA channels when generating this design example.
This example design contains on-chip memories to support PIO and H2D/D2H DMA operations.
For the H2D (Tx) DMA, the host populates the descriptor rings, allocates Tx packet buffers in the host memory, and fills the Tx buffers with a predefined pattern. When the application updates the Queue Tail Pointer register (Q_TAIL_POINTER), the MCDMA IP starts the H2D DMA and writes the received data to the on-chip memory.
For the D2H (Rx) DMA, the host initializes the FPGA on-chip memory with a predefined pattern. The MCDMA IP reads the packet data from the on-chip memory and transmits it to the host memory.
For bidirectional DMA, H2D is started before D2H and then both DMAs operate simultaneously.
In addition, the design example enables the Avalon-MM PIO master, which bypasses the DMA path. It allows the application to perform single, non-bursting register read/write operations on the on-chip memory block.
The design example includes the Multi Channel DMA for PCI Express IP Core with the parameters you specified and the following components:
• resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire FPGA fabric enters user mode.
• MEM_PIO – On-chip memory for the PIO operation. It is connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port, which is mapped to PCIe BAR2.
• MEM – Dual-port on-chip memory. One port is connected to the Avalon-MM Write Master (h2ddm_master) and the other port to the Avalon-MM Read Master (d2hdm_master).
Transfer mode options supported in the test application software (perfq_app) command line:
• PIO test: -o
• DMA test: -t (Tx), -r (Rx)
For a description of which driver(s) to use with this design example, refer to Driver Support.
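The Avalon-MM DMA flow can be sketched as a shared memory that h2ddm_master writes and d2hdm_master reads. The sketch assumes, hypothetically, that MEM is split evenly into one window per channel, which matches the "(total available memory) / #channels" payload limit listed for the Avalon-MM design example; AvmmDmaModel and its methods are illustrative names, not driver APIs.

```python
# Hypothetical model of the Avalon-MM DMA example. Assumption: MEM is split
# evenly into one window per channel.
class AvmmDmaModel:
    def __init__(self, mem_bytes, channels):
        self.mem = bytearray(mem_bytes)      # dual-port on-chip memory (MEM)
        self.window = mem_bytes // channels  # bytes available per channel

    def _base(self, channel, length):
        if length > self.window:
            raise ValueError("payload larger than the per-channel window")
        return channel * self.window

    def h2d(self, channel, payload):
        """H2D: h2ddm_master writes the host buffer into MEM."""
        base = self._base(channel, len(payload))
        self.mem[base:base + len(payload)] = payload

    def d2h(self, channel, length):
        """D2H: d2hdm_master reads MEM; the bytes go back to host memory."""
        base = self._base(channel, length)
        return bytes(self.mem[base:base + length])
```

As in the bidirectional test, calling h2d before d2h on the same channel makes the read return the data just written.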
2.6.1. Simulation Results
No simulation for this Avalon-MM DMA design example is available in the current Intel Quartus Prime release.
2.6.2. Hardware Test Results
The Custom Driver, DPDK Driver, and Kernel Mode Driver were used to generate the following output:
Figure 30. PIO Test
-o option
Figure 31. H2D Avalon-MM Write
-t option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
Figure 32. H2D Avalon-MM Write with Data Validation Enabled
-t -v option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
Figure 33. D2H Avalon-MM Read
-r option. Note: This hardware test was run with the Intel Stratix 10 GX H-tile PCIe Gen3 x16 configuration.
3. Design Example Quick Start Guide
Using Intel Quartus Prime software, you can generate a design example for the Multi Channel DMA for PCI Express (PCIe) IP core.
The generated design example reflects the parameters that you specify. The design example automatically creates the files necessary to simulate and compile in the Intel Quartus Prime software. You can download the compiled design to your FPGA Development Board. To download to custom hardware, update the Intel Quartus Prime Settings File (.qsf) with the correct pin assignments.
Figure 34. Design Example Development Steps
[Flow: Design Example Generation, followed by either Compilation (Simulator) and Functional Simulation, or Compilation (Quartus Prime) and Hardware Testing.]
3.1. Design Example Directory Structure
Table 4. Directory Structure
Directory / File | Sub-directory / File | Sub-directory / File | Sub-directory / File | Sub-directory / File | Note
pcie_ed
sim
pcie_ed.v Design example
top-level HDL
<simulators> <simulation scripts> pcie_ed simulation directory
synth pcie_ed.v Design example
top-level HDL
<Components automatically generated by Platform Designer>
pcie_ed_tb pcie_ed_tb sim pcie_ed_tb.v Testbench
including Intel FPGA BFM
<simulators> <simulation script>
Testbench simulation directory
ip pcie_ed_tb DUT_pcie_tb_ip Intel FPGA BFM
(RP)
pcie_ed_tb.qsys Testbench
Platform Designer file
pcie_ed.ipx
software
dpdk
dpdk
drivers net
examples mcdma-test patches v20.05-rc1 Licenses license_bsd.txt
version.txt
kernel
common
driver kmod mcdma-driver
Kernel driver mqdma-driver
Licenses license_bsd.txt
user
cli
perfq_app
<test application
software> Test Application
README Readme file
sample ref.c Reference API
flow devmem
simple_app testapp
common
include regs MCDMA and Pkt
Gen/Chk registers mk
src
libmqdma <user space library files> User space library Licenses
Readme Readme file
readme Readme file
ip pcie_ed <Design example IP components>
pcie_ed.qpf Quartus project
file
pcie_ed.qsf Quartus setting
file
pcie_ed.qsys Design example
Platform Designer file
3.2. Generating the Example Design using Intel Quartus Prime
Figure 35. Design Example Generation
[Flow: Start Parameter Editor; Specify IP Variation and Select Device; Select Design Parameters; Specify Example Design and Select Target Board; Initiate Design Generation.]
3.2.1. Procedure
1. In the Intel Quartus Prime Pro Edition software, create a new project (File → New Project Wizard).
2. Specify the Directory, Name, and Top-Level Entity.
3. For Project Type, accept the default value, Empty project. Click Next.
4. For Add Files click Next.
5. For Family, Device & Board Settings, select Intel Stratix 10 (GX/SX/MX/TX/DX) or Intel Agilex F-Series and the Target Device for your design.
Note: The selected device is only used if you select None in Step 10c below.
6. Click Finish.
7. In the IP Catalog locate and add the Multi Channel DMA for PCI Express (Intel Stratix 10 GX/MX devices) or Multi Channel DMA P-Tile for PCI Express (Intel Stratix 10 DX and Intel Agilex devices) which brings up the IP Parameter Editor.
8. In the New IP Variant dialog box, specify a name for your IP. Click Create.
9. On the IP Settings tabs, specify the parameters for your IP variation.
10. On the Example Designs tab, make the following selections:
a. For Example Design Files, turn on the Simulation and Synthesis options. If you do not need these simulation or synthesis files, leaving the corresponding option(s) turned off significantly reduces the example design generation time.
b. For Generated HDL Format, only Verilog is available in the current release.
c. For Target Development Kit, select the appropriate option.
Note: If you select None, the generated design example targets the device specified. Otherwise, the design example uses the device on the selected development board. If you intend to test the design in hardware, make the appropriate pin assignments in the .qsf file.
d. For Currently Selected Example Design, select a design example from the pull-down menu. The available design examples depend on the User Mode and Interface type settings in MCDMA Settings under the IP Settings tab. Available design examples for the Multi Channel DMA mode and Avalon-ST Interface type:
• PIO using MQDMA Bypass Mode
• Packet Generate/Check
• Device-side Packet Loopback
Available design examples for the Multi Channel DMA mode and Avalon-MM Interface type:
• PIO using MQDMA Bypass Mode
• AVMM DMA
11. Select Generate Example Design to create a design example that you can simulate and download to hardware. If you target one of the Intel FPGA development kits, the device on that board supersedes the device previously selected in the Intel Quartus Prime Pro Edition project if the devices are different.
When prompted to specify the directory for your example design, you can accept the default directory, ./intel_pcie_mcdma_0_example_design, or choose another directory.
12. Click Close on the Generate Example Design Completed message.
13. Close the IP Parameter Editor. Click File → Exit. When prompted with Save changes?, you do not need to save the .ip. Click Don’t Save.
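Step 10d ties the list of available design examples to the User Mode and Interface type settings. As an illustration only, that rule for the Multi Channel DMA mode can be captured as a small lookup table. The AVAILABLE_EXAMPLES dictionary and the available_examples function are hypothetical names, not part of the IP or of Intel Quartus Prime, and the table covers only the two combinations listed in step 10d.

```python
# Hypothetical lookup of the design examples offered in the
# "Currently Selected Example Design" pull-down (step 10d).
# Only the Multi Channel DMA user mode is covered here.
AVAILABLE_EXAMPLES = {
    ("Multi Channel DMA", "Avalon-ST"): [
        "PIO using MQDMA Bypass Mode",
        "Packet Generate/Check",
        "Device-side Packet Loopback",
    ],
    ("Multi Channel DMA", "Avalon-MM"): [
        "PIO using MQDMA Bypass Mode",
        "AVMM DMA",
    ],
}


def available_examples(user_mode: str, interface_type: str) -> list:
    """Return the design examples listed for a User Mode / Interface type pair."""
    try:
        return list(AVAILABLE_EXAMPLES[(user_mode, interface_type)])
    except KeyError:
        raise ValueError(
            f"no design examples documented for {user_mode!r} / {interface_type!r}"
        ) from None


if __name__ == "__main__":
    for example in available_examples("Multi Channel DMA", "Avalon-MM"):
        print(example)
```

Combinations that this section does not document raise ValueError instead of guessing.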
3.3. Simulating the Design Example
3.3.1. Testbench Overview
Figure 36. Testbench Platform Designer View
Testbench Platform Designer file path: pcie_ed_tb/pcie_ed_tb.qsys
The design example, pcie_ed_inst, is generated with a x16 link. The Intel FPGA BFM, DUT_pcie_tb, supports up to a x8 link, so the testbench simulation runs with the link down-trained to x8. To simulate a x16 link, use a third-party BFM.
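The x16 versus x8 behavior follows from PCIe link-width negotiation: the link trains to the widest width that both link partners support. The following sketch is a simplified model of that rule, assuming all lanes are healthy (real link training can fall back further on lane errors). The negotiated_width function is hypothetical and is not part of the testbench.

```python
# Simplified model of PCIe link-width negotiation: the link trains to the
# widest lane count that both partners support. Assumes every lane is healthy.
VALID_WIDTHS = (1, 2, 4, 8, 16)


def negotiated_width(endpoint_max: int, root_port_max: int) -> int:
    """Return the lane count the link trains to under this simplified model."""
    for width in (endpoint_max, root_port_max):
        if width not in VALID_WIDTHS:
            raise ValueError(f"unsupported link width: x{width}")
    return min(endpoint_max, root_port_max)


if __name__ == "__main__":
    # x16 design example (pcie_ed_inst) against the x8 Intel FPGA BFM
    print(f"x{negotiated_width(16, 8)}")  # prints "x8"
```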
The testbench uses a Root Port driver module, altpcietb_bfm_rp_gen3_x8.sv (path: pcie_ed_tb/ip/pcie_ed_tb/DUT_pcie_tb_ip/altera_pcie_s10_tbed_191/sim), to exercise the target memory and DMA channel in the Endpoint. You can modify this module to vary the transactions sent to the example Endpoint design or to your own design.
For more information about Intel FPGA BFM, refer to Intel Stratix 10 Avalon streaming and SR-IOV Interface for PCI Express Solutions User Guide (Section 9.3 Root Port BFM Overview).
Related Information
Intel Stratix 10 Avalon streaming and SR-IOV Interface for PCI Express Solutions User Guide (Section 9.3 Root Port BFM Overview)
3.3.2. Example Testbench Flow for DMA Test with Avalon-ST Packet Generate/Check Design Example
The DMA testbench for the Avalon-ST Packet Generate/Check design example demonstrates the following two major tasks:
• Host-to-Device: Transferring packets stored in the host memory to the Packet Checker in the design example user logic, where a checker module verifies the integrity of the packet
• Device-to-Host: Packets generated from a Generator module are transferred to the host memory where the host checks the packet integrity
Note: This testbench transfers one packet with a length of 4096 bytes.
The DMA testbench for the design example completes the following tasks for each of the 4 ports supported by the DUT:
1. Set up 4096 bytes of incrementing data pattern for testing data movement from the host to the device and then back to the host.
2. Write the expected packet length value (4096 bytes) to the Packet Generator and Checker in the design example user logic through the PIO. The Packet Checker module uses this value to test packet integrity.
3. MSI-X is enabled and configured to launch a memory write that signals the end of each descriptor's DMA transaction. The Write-Back function is disabled for the simulation.
4. Set up the H2D (Host-to-Device) queue in the Multi Channel DMA.
5. Set up three H2D descriptors in the host memory, with the source addresses pointing to the incrementing data pattern locations in the host memory. The descriptors indicate the start of packet (SOF) and end of packet (EOF) markers along with the packet length.
6. As the last step of queue programming, the testbench writes the Multi Channel DMA tail pointer register, which triggers the Multi Channel DMA to start the H2D DMA transaction.
7. The previous step instructs the H2D Data Mover to fetch the descriptors from the host memory.
8. The Multi Channel DMA H2D Data Mover reads the data from the host memory and forwards the packet to the Packet Generator and Checker through the Avalon-ST interface.
9. The checker module receives the packet and checks its integrity by testing the data pattern, verifying that the length is as expected, and confirming proper receipt of the end-of-packet marker. If the packet is correct, the good packet count is incremented by 1; otherwise, the bad packet count is incremented.
10. The testbench does a PIO read access of the Good Packet Count and Bad Packet Count registers and displays the test success or failure status.
11. MSI-X write commands are triggered for every descriptor completion, and the testbench checks them for proper receipt.
12. Next, set up the D2H (Device-to-Host) Queue.
13. Set up three D2H descriptors in the host memory, with the destination addresses pointing to a new address space in host memory that is pre-filled with all zeroes.
14. As the last step of queue programming, the testbench writes the Multi Channel DMA tail pointer register, which triggers the Multi Channel DMA to start the D2H DMA transaction.
15. The previous step instructs the H2D Data Mover to fetch the descriptors from the host memory to start the D2H DMA transaction.
16. The Multi Channel DMA D2H Data Mover reads the incoming packet from the Packet Generator and writes the data to the host memory according to the descriptors fetched in the previous step.
17. MSI-X write commands are triggered for every descriptor completion, and the testbench checks them for proper receipt.
18. The testbench compares the data written back to the system memory in the D2H task with the standard incrementing pattern and declares test success or failure.
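To make the data flow of the steps above concrete, the following sketch models one H2D and one D2H transfer of a 4096-byte packet. It is a behavioral approximation, not the testbench code. It assumes that the incrementing pattern is byte-wise (byte i holds i modulo 256) and that the single packet is split nearly evenly across the three descriptors, and it leaves out the tail pointer write, PIO accesses, and MSI-X handling. The names Descriptor, build_descriptors, PacketChecker, run_h2d, and run_d2h are hypothetical.

```python
from dataclasses import dataclass
from typing import List

PACKET_LEN = 4096  # packet length used by the testbench


def incrementing_pattern(length: int) -> bytes:
    """Assumed pattern: byte i holds i modulo 256."""
    return bytes(i & 0xFF for i in range(length))


@dataclass
class Descriptor:
    addr: int    # byte offset of the buffer in host memory
    length: int  # number of bytes this descriptor moves
    sof: bool    # start-of-packet marker
    eof: bool    # end-of-packet marker


def build_descriptors(total_len: int, num_desc: int = 3) -> List[Descriptor]:
    """Split one packet across num_desc descriptors (assumed near-even split)."""
    chunk = -(-total_len // num_desc)  # ceiling division
    descs, offset = [], 0
    while offset < total_len:
        length = min(chunk, total_len - offset)
        descs.append(Descriptor(offset, length, sof=offset == 0,
                                eof=offset + length == total_len))
        offset += length
    return descs


class PacketChecker:
    """Counts good and bad packets, like the checker in the user logic."""

    def __init__(self, expected_len: int):
        self.expected_len = expected_len  # value written through the PIO in step 2
        self.good = 0
        self.bad = 0
        self._buf = bytearray()

    def receive(self, data: bytes, sof: bool, eof: bool) -> None:
        if sof:
            self._buf = bytearray()  # a new packet starts
        self._buf += data
        if eof:
            # Good only if both the length and the data pattern match.
            if bytes(self._buf) == incrementing_pattern(self.expected_len):
                self.good += 1
            else:
                self.bad += 1


def run_h2d(host_mem: bytes, descs: List[Descriptor], checker: PacketChecker) -> None:
    """H2D data mover: read each descriptor's buffer and stream it with SOF/EOF."""
    for d in descs:
        checker.receive(host_mem[d.addr:d.addr + d.length], d.sof, d.eof)


def run_d2h(generated: bytes, descs: List[Descriptor]) -> bool:
    """D2H data mover: write the generated packet into zero-filled host memory,
    then compare the result with the expected incrementing pattern."""
    host_mem = bytearray(len(generated))  # pre-filled with zeroes
    for d in descs:
        host_mem[d.addr:d.addr + d.length] = generated[d.addr:d.addr + d.length]
    return bytes(host_mem) == incrementing_pattern(len(generated))


if __name__ == "__main__":
    descs = build_descriptors(PACKET_LEN)
    checker = PacketChecker(PACKET_LEN)
    run_h2d(incrementing_pattern(PACKET_LEN), descs, checker)
    print("H2D good/bad:", checker.good, checker.bad)  # prints "H2D good/bad: 1 0"
    print("D2H pass:", run_d2h(incrementing_pattern(PACKET_LEN), descs))  # prints "D2H pass: True"
```

Corrupting a single byte of the host buffer before calling run_h2d leaves checker.good at 0 and increments checker.bad, which mirrors how the checker separates good and bad packets.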
If no errors occur, the simulation reports:
Simulation stopped due to successful completion
3.3.3. Run the Simulation Script
Figure 37. Simulation Script (Change to Testbench Directory → Run <Simulation Script> → Analyze Results)
1. Change to the testbench simulation directory, pcie_ed_tb/pcie_ed_tb/sim/<simulator>.
2. Run the simulation script for the simulator of your choice. Refer to the table below.
3. Analyze the results.
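If you run the simulation repeatedly, you may prefer to script these three steps. The following sketch encodes the per-simulator directories and commands from Table 5, runs the command with subprocess, and searches the output for the success message. It is a hedged sketch: the helper names are hypothetical, the ModelSim entry assumes that chaining the console commands in a single -do string (ending with quit -f) works with your ModelSim version, and the backslash-space sequences are passed literally, which is what the shell does for the double-quoted arguments in Table 5.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

SUCCESS_MSG = "Simulation stopped due to successful completion"

# Run directory (relative to pcie_ed_tb/pcie_ed_tb/sim) and argv per simulator.
# The ModelSim entry chains the console steps of Table 5 in one -do string;
# that batching is an assumption, not a documented flow.
SIMULATORS = {
    "modelsim": ("mentor",
                 ["vsim", "-c", "-do",
                  "do msim_setup.tcl; ld_debug; run -all; quit -f"]),
    "vcs": ("synopsys/vcs",
            ["sh", "vcs_setup.sh",
             "USER_DEFINED_COMPILE_OPTIONS=",
             # Backslash-space is kept literally, as the shell does inside "..."
             r"USER_DEFINED_ELAB_OPTIONS=-xlrm\ uniq_prior_final",
             "USER_DEFINED_SIM_OPTIONS="]),
    "xcelium": ("xcelium",
                ["sh", "xcelium_setup.sh",
                 "USER_DEFINED_SIM_OPTIONS=",
                 r"USER_DEFINED_ELAB_OPTIONS=-timescale\ 1ns/1ps\ -NOWARN\ CSINFI"]),
}


def sim_command(example_design: Path, simulator: str) -> Tuple[Path, List[str]]:
    """Return (working directory, argv) for the chosen simulator."""
    subdir, argv = SIMULATORS[simulator]
    cwd = Path(example_design) / "pcie_ed_tb" / "pcie_ed_tb" / "sim" / subdir
    return cwd, list(argv)


def simulation_passed(log_text: str) -> bool:
    """True if the simulator output contains the success message."""
    return SUCCESS_MSG in log_text


def run_simulation(example_design: Path, simulator: str) -> bool:
    """Run the simulator script and report whether the run succeeded."""
    cwd, argv = sim_command(example_design, simulator)
    result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    return simulation_passed(result.stdout + result.stderr)
```

For example, run_simulation(Path("intel_pcie_mcdma_0_example_design"), "vcs") returns True when the VCS run prints the success message.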
Table 5. Steps to Run the Simulation
ModelSim
Simulation directory: <example_design>/pcie_ed_tb/pcie_ed_tb/sim/mentor/
Instructions:
1. Invoke vsim by typing vsim, which brings up a console window where you can run the following commands.
2. do msim_setup.tcl
Note: Alternatively, instead of doing Steps 1 and 2, you can type: vsim -c -do msim_setup.tcl
3. ld_debug
4. run -all
5. A successful simulation ends with the following message:
"Simulation stopped due to successful completion!"
Note: ModelSim currently supports the BAM and PIO example designs only.
VCS
Simulation directory: <example_design>/pcie_ed_tb/pcie_ed_tb/sim/synopsys/vcs
Instructions:
1. sh vcs_setup.sh USER_DEFINED_COMPILE_OPTIONS="" USER_DEFINED_ELAB_OPTIONS="-xlrm\ uniq_prior_final" USER_DEFINED_SIM_OPTIONS=""
2. A successful simulation ends with the following message:
"Simulation stopped due to successful completion!"
Xcelium
Simulation directory: <example_design>/pcie_ed_tb/pcie_ed_tb/sim/xcelium
Instructions:
1. sh xcelium_setup.sh USER_DEFINED_SIM_OPTIONS="" USER_DEFINED_ELAB_OPTIONS="-timescale\ 1ns/1ps\ -NOWARN\ CSINFI"
2. A successful simulation ends with the following message:
"Simulation stopped due to successful completion!"
Note: Support for all design example variants in the Xcelium simulator has not been confirmed.
3.3.4. View the Results
To view the Simulation Logs, Simulation Waveforms and Hardware Test Results for each design example, refer to Design Example Detailed Description on page 4.
3.4. Compiling the Example Design in Intel Quartus Prime
To compile the example design, follow these steps:
1. Navigate to the design example directory, intel_pcie_mcdma_0_example_design, and open the Intel Quartus Prime project file, pcie_ed.qpf, in the Intel Quartus Prime Pro Edition software.
2. On the Processing menu, select Start Compilation (or use the Start Compilation button circled in green in Figure 38).
Figure 38. Design Example Compilation
3.5. Running the Design Example Application on a Hardware Setup
The following list details the development kits used for testing:
• Intel Stratix 10 GX/MX devices using the H-Tile PCIe Gen3 hard IP
• Intel Stratix 10 DX
• Intel Agilex devices using the P-Tile PCIe Gen4 Hard IP and soft IP
3.5.1. Program the FPGA
1. Connect an FPGA programming cable to the Intel Stratix 10 FPGA Development Board.
2. On the Tools menu, select Programmer.
3. In the Programmer, click Hardware Setup and verify that the Intel Stratix 10 FPGA Development Board is detected in the Hardware Settings tab and the JTAG Settings tab.
4. Select Auto Detect to detect the JTAG device chain.
5. Select the target FPGA device in the JTAG chain, select Change File, and select the pcie_ed.sof file.
6. Select Start to start programming.
Figure 39. Programming Stratix 10 MX FPGA Development Board
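The Programmer steps above can also be performed from the command line with the quartus_pgm executable that ships with Intel Quartus Prime. The following sketch only assembles and runs a quartus_pgm JTAG programming command. The wrapper names are hypothetical, and you should confirm the cable name and the optional device position suffix (@N) against quartus_pgm --help and the cable list from quartus_pgm -l on your installation.

```python
import subprocess
from typing import List, Optional


def pgm_command(sof: str = "pcie_ed.sof", cable: Optional[str] = None,
                device_index: Optional[int] = None) -> List[str]:
    """Assemble a quartus_pgm command that programs sof over JTAG."""
    argv = ["quartus_pgm"]
    if cable is not None:
        argv += ["-c", cable]  # cable name or number, as listed by quartus_pgm -l
    operation = f"p;{sof}"     # "p" = program the device with this file
    if device_index is not None:
        operation += f"@{device_index}"  # device position in the JTAG chain
    return argv + ["-m", "JTAG", "-o", operation]


def program_fpga(sof: str = "pcie_ed.sof", cable: Optional[str] = None,
                 device_index: Optional[int] = None) -> int:
    """Run quartus_pgm and return its exit code (0 means success)."""
    return subprocess.run(pgm_command(sof, cable, device_index)).returncode
```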
3.5.2. Quick Start Guide
3.5.2.1. Software Test Setup
The following host configuration is used to test the functionality of the design example:
Kernel 3.10
Operating System CentOS Linux release 7.8
GCC Version 4.8.5
3.5.2.2. Driver Support
The table below summarizes the driver support for the MCDMA design example variants. It uses the following acronyms:
• User Space I/O (UIO): A kernel base module that the PCIe device uses to expose its resources to user space.
• Virtual Function I/O (VFIO) driver: An IOMMU/device agnostic framework for exposing direct device access to user space in a secure, IOMMU-protected environment.
• Data Plane Development Kit (DPDK): Consists of libraries to accelerate packet processing workloads running on a wide variety of CPU architectures.
Table 6. Driver Support for MCDMA Design Examples
Columns: Custom Driver | DPDK Driver | Kernel Mode Driver
Description:
• Custom Driver: Also known as the user mode driver, this driver supports both the UIO and VFIO base kernel modules. It provides custom APIs and can be used without depending on any framework.
• DPDK Driver: This DPDK Poll Mode Driver (PMD) uses the DPDK framework and exposes the device as an Ethernet device. It supports both the UIO and VFIO base kernel modules. Existing DPDK applications can be integrated with the MCDMA PMD.
• Kernel Mode Driver: The Kernel Mode Driver (KMD) exposes the MCDMA IP through the char dev interface and performs DMA transfers using standard char dev file operations.
Directory/Driver Path: <example_design>/software/user | <example_design>/software/dpdk | <example_design>/software/kernel/
SR-IOV Support: Yes | Yes | No
MCDMA AVMM 1-port DMA Design Example: Yes, up to 2K channels | Yes, up to 8 channels | Yes, up to 2K channels
MCDMA AVMM 1-port DMA with SR-IOV Design Example: Yes, up to 2K channels | Yes, up to 2K channels | No
Bursting Master AVMM PIO using MQDMA Bypass Mode Design Example: Yes | Yes | No
MCDMA AVST 4-port Device-side Packet Loopback Design Example: Yes, 1 channel per port | Yes, 1 channel per port | Yes, 1 channel per port
MCDMA AVST 4-port Packet Generate/Check Design Example: Yes, 1 channel per port | Yes, 1 channel per port | Yes, 1 channel per port
MCDMA AVST 4-port PIO using MQDMA Bypass Mode Design Example: Yes | Yes | No
MCDMA AVST 1-port Device-side Packet Loopback Design Example: Yes | Yes, same as custom mode driver | Yes
MCDMA AVST 1-port Packet Generate/Check Design Example: Yes | Yes, same as custom mode driver | Yes
MCDMA AVST 1-port PIO using MQDMA Bypass Mode Design Example: Yes | Yes | No
MCDMA AVST 1-port Device-side Packet Loopback with SR-IOV Design Example: Yes | Yes | No
MCDMA AVST 1-port Packet Generate/Check with SR-IOV Design Example: Yes | Yes | No
3.5.2.3. MCDMA Custom Driver
3.5.2.3.1. Prerequisites
Configuration Changes from BIOS
Enable the following parameters in the BIOS:
1. KVM
2. VT-d (or AMD-Vi, the AMD IOMMU, for AMD processors)
3. SR-IOV Enable
4. ARI (Alternative Routing ID Interpretation)
Make sure the IOMMU is enabled on the host by using the following command:
$ virt-host-validate | grep IOMMU
QEMU: Checking for device assignment IOMMU support : PASS
QEMU: Checking if IOMMU is enabled by kernel : PASS
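When you validate several hosts, the IOMMU check above can be scripted. The following sketch filters the virt-host-validate output for IOMMU lines and requires each of them to end in PASS. The function names are hypothetical, and the sketch assumes plain (uncolored) output in the format shown above.

```python
import subprocess


def iommu_checks_pass(output: str) -> bool:
    """True if there is at least one IOMMU line and every IOMMU line ends in PASS."""
    lines = [line.strip() for line in output.splitlines() if "IOMMU" in line]
    return bool(lines) and all(line.endswith("PASS") for line in lines)


def host_iommu_ok() -> bool:
    """Run virt-host-validate and evaluate only its IOMMU checks."""
    # The text is inspected instead of the exit code, because unrelated
    # checks can also influence the exit status.
    result = subprocess.run(["virt-host-validate"], capture_output=True, text=True)
    return iommu_checks_pass(result.stdout)
```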
External Packages
To create a VM environment with QEMU, install the following software:
1. Use the command below to install the packages:
$ yum install qemu-kvm qemu-img libvirt virt-install libvirt-client
2. Download the QEMU software from the official site.
Note: For testing in a VM, you need to generate the necessary qcow2 disk image file.
Set the Boot Parameters
Follow the steps below to modify the default hugepages setting in the GRUB files:
1. Edit the /etc/default/grub file.
Append the required parameters to the GRUB_CMDLINE_LINUX line in the /etc/default/grub file so that the line reads as follows:
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb default_hugepagesz=1G hugepagesz=1G hugepages=40 panic=1 intel_iommu=on iommu=pt"
This is what the file looks like after the edit:
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb default_hugepagesz=1G hugepagesz=1G hugepages=40 panic=1 intel_iommu=on iommu=pt"
GRUB_DISABLE_RECOVERY="true"
If memory allocation fails when Virtual Functions are created, add the following boot parameters:
"pci=hpbussize=10,hpmemsize=2M,nocrs,realloc=on"
To bind the device to vfio-pci and use IOMMU, enable the following parameter:
intel_iommu=on
To use UIO and not enable the IOMMU lookup, add the following parameter:
iommu=pt
To use the AMD platform and the UIO driver, add the following parameter at boot time:
iommu=soft
2. Generate GRUB configuration files.
To check whether the boot system is legacy or EFI-based, check for the existence of the following directory:
$ ls -al /sys/firmware/efi
If this directory exists, the boot system is EFI-based. Otherwise, it is a legacy system.
a. In case of a legacy system, execute the following command:
$ grub2-mkconfig -o /boot/grub2/grub.cfg
b. In case of an EFI-based system, execute the following command:
$ grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
3. Reboot the system.
4. Verify the changes above:
$ cat /proc/cmdline
5. Set the huge pages:
$ echo 40 > /proc/sys/vm/nr_hugepages
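After rebooting, you can confirm both the boot parameters and the huge page reservation programmatically. The following sketch compares a kernel command line against the parameters used in the GRUB example above and computes the reserved huge page memory from /proc/meminfo text (40 pages of 1 GiB each gives 40 GiB). REQUIRED_PARAMS reflects the Intel-host UIO example only; adjust it for your setup, for example iommu=soft on an AMD platform with UIO. All names are hypothetical.

```python
from typing import Dict, List, Optional

# Parameters from the GRUB_CMDLINE_LINUX example above (Intel host, iommu=pt).
REQUIRED_PARAMS = {
    "default_hugepagesz": "1G",
    "hugepagesz": "1G",
    "hugepages": "40",
    "intel_iommu": "on",
    "iommu": "pt",
}


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Map key=value tokens to their value; bare flags map to None.
    If a key repeats, the last occurrence wins."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split on the first "=" only
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, required: Dict[str, str] = REQUIRED_PARAMS) -> List[str]:
    """List required key=value pairs that are absent or have another value."""
    params = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in required.items() if params.get(k) != v]


def hugepage_bytes(meminfo: str) -> int:
    """Total reserved huge page memory, in bytes, from /proc/meminfo text."""
    fields: Dict[str, List[str]] = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        fields[key.strip()] = rest.split()
    total = int(fields["HugePages_Total"][0])
    size_kb = int(fields["Hugepagesize"][0])  # reported in kB
    return total * size_kb * 1024
```

For example, missing_params(open("/proc/cmdline").read()) returns an empty list when every required parameter is present, and hugepage_bytes(open("/proc/meminfo").read()) / 2**30 reports the reservation in GiB.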
3.5.2.3.2. Software Setup
Installing the Linux Kernel Driver and Enabling VFs
1. If you are using UIO, load the UIO driver (this step is not required if you are using VFIO):
$ modprobe uio
2. Build and load the mqdma kernel driver:
a. $ cd software/kernel
b. $ make clean all -C driver/kmod/mqdma-driver
c. $ insmod driver/kmod/mqdma-driver/ifc_uio.ko
3. Verify whether the driver is loaded:
$ lspci -d 1172:000 -v | grep ifc_uio
The output shows: Kernel driver in use: ifc_uio
4. Enable Virtual Functions based on your requirements:
$ echo 2 > /sys/bus/pci/devices/<bdf>/max_vfs
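Steps 3 and 4 can be wrapped in small helpers when you repeat the setup. The following sketch extracts the bound driver name from lspci -v output and writes the requested VF count to the max_vfs attribute used in step 4. It assumes that the ifc_uio driver exposes max_vfs as described above; the helper names and the sysfs_root parameter (added so the logic can be exercised against a scratch directory) are hypothetical. Writing to /sys requires root privileges.

```python
import re
from pathlib import Path
from typing import Optional

# Loose domain:bus:device.function check, for example 0000:01:00.0
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")
DRIVER_RE = re.compile(r"Kernel driver in use:\s*(\S+)")


def driver_in_use(lspci_output: str) -> Optional[str]:
    """Extract the bound driver name from `lspci -v` output, if any."""
    match = DRIVER_RE.search(lspci_output)
    return match.group(1) if match else None


def enable_vfs(bdf: str, count: int, sysfs_root: Path = Path("/sys")) -> Path:
    """Write the VF count to <sysfs_root>/bus/pci/devices/<bdf>/max_vfs."""
    if not BDF_RE.match(bdf):
        raise ValueError(f"not a full BDF: {bdf!r}")
    if count < 0:
        raise ValueError("VF count must be non-negative")
    path = sysfs_root / "bus" / "pci" / "devices" / bdf / "max_vfs"
    path.write_text(f"{count}\n")  # same as: echo <count> > .../max_vfs
    return path
```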
Currently, UIO is the default. To use the VFIO driver, modify UIO_SUPPORT in common/mk/common.mk as follows:
__cflags += -UUIO_SUPPORT
1. Load the vfio-pci module:
$ modprobe vfio-pci
2. Bind the device to vfio-pci:
a. If the device is bound to ifc_uio, unbind it with the following command:
$ echo "<bdf>" > /sys/bus/pci/devices/<bdf>/driver/unbind
For example: echo "0000:01:00.0" > /sys/bus/pci/devices/0000\:01\:00.0/driver/unbind
b. Bind the device to vfio-pci:
$ echo <bdf> > /sys/bus/pci/drivers/vfio-pci/bind
For example: echo "0000:01:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
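The unbind and bind commands above amount to two sysfs writes, which the following sketch performs from Python. It mirrors the documented echo commands; the function name and the sysfs_root parameter are hypothetical. Because Python writes to the path directly rather than through a shell, the colons in the BDF need no backslash escaping. On many kernels, vfio-pci accepts a bind request only for a device whose vendor and device IDs it already recognizes (for example, after they are written to /sys/bus/pci/drivers/vfio-pci/new_id); if the bind write fails with "No such device", that is a likely cause.

```python
from pathlib import Path


def rebind_to_vfio(bdf: str, sysfs_root: Path = Path("/sys")) -> None:
    """Unbind a PCI function from its current driver, then bind it to vfio-pci.
    Mirrors the two echo commands above; requires root on a real system."""
    current_driver = sysfs_root / "bus" / "pci" / "devices" / bdf / "driver"
    if current_driver.exists():
        # Same as: echo "<bdf>" > /sys/bus/pci/devices/<bdf>/driver/unbind
        (current_driver / "unbind").write_text(bdf)
    # Same as: echo <bdf> > /sys/bus/pci/drivers/vfio-pci/bind
    (sysfs_root / "bus" / "pci" / "drivers" / "vfio-pci" / "bind").write_text(bdf)
```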
Build and Install User Space Library
1. Build the library