3.4 Comparison with the GNU Radio Framework
3.4.2 CPU Load
In this section, the CPU loads of GNU Radio and the C-coded system are compared. The measurements were made under the conditions listed below:
• Hardware:
  – A desktop computer with:
    – Intel Celeron CPU 1.2 GHz
    – 256 KB cache
    – 281.5 MB DRAM
    – 7.3 GB hard disk
• Software:
  – Ubuntu 7.10
  – Linux 2.6.22-15-generic kernel
  – gcc-4.1.2-16ubuntu2
  – oprofile 0.9.3
  – GNU Radio SVN version -r 8198
• Compilation optimization level:
  – GNU Radio: -O2
  – C-coded system: -O2
• Other conditions to be noted:
  – Run-time for writing/sending the output to file/USRP is not considered.
Execution Time
The overall run-times were measured for GNU Radio and the C-coded system while transmitting 62499 (≈ 6.25e4) packets of 16 bytes each, and 3124999 (≈ 50·6.25e4) packets of 16 bytes each.
The run-time measurements are compared in Table 3.8 and shown in Fig. 3.6. The overall run-time for GNU Radio was measured using the Linux time command.
For the same OFDM transmitter application, the run-time of GNU Radio is about 10 times longer than that of the C-coded system. Correspondingly, the throughput of the C-coded system is about 10 times larger than that of the GNU Radio system.
                          GNU Radio      C-coded
For 62499 (≈ 6.25e4) packets
  real                    0m52.097s      0m4.901s
  user                    0m43.315s      0m4.312s
  sys                     0m2.312s       0m0.484s
  Throughput (bytes/sec)  19194.66       204036.73
                          ≈ 19.19e3      ≈ 10·19.19e3
For 3124999 (≈ 50·6.25e4) packets
  real                    41m42.440s     4m1.180s
  user                    23m58.195s     3m30.013s
  sys                     1m46.507s      0m24.118s
  Throughput (bytes/sec)  19980.49       207313.97
                          ≈ 19.98e3      ≈ 10·19.98e3
Table 3.8: The overall runtimes and throughputs of GNU Radio and C-coded system, while transmitting 6.25e4 and 50·6.25e4 packets with 16 bytes each.
Figure 3.6: The overall runtimes of GNU Radio and C-coded system in CPU seconds.
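The throughput values in Table 3.8 can be reproduced from the packet count, the payload size of 16 bytes and the measured real time. The following is a minimal sketch of that arithmetic; the helper names to_seconds and throughput are illustrative and not part of the measured systems.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a stamp in the format of the `time` command, e.g. '41m42.440s'."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def throughput(packets: int, payload_bytes: int, real: str) -> float:
    """Payload bytes per second of wall-clock ('real') time."""
    return packets * payload_bytes / to_seconds(real)


# 'real' times taken from Table 3.8, keyed by the number of packets.
REAL_TIMES = {
    "GNU Radio": {62499: "0m52.097s", 3124999: "41m42.440s"},
    "C-coded": {62499: "0m4.901s", 3124999: "4m1.180s"},
}

for system, by_packets in REAL_TIMES.items():
    for packets, real in by_packets.items():
        rate = throughput(packets, 16, real)
        print(f"{system:10s} {packets:8d} packets: {rate:12.2f} bytes/sec")
```

Dividing the C-coded throughput by the GNU Radio throughput for the same packet count gives the factor of about 10 quoted in the text.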
CPU Load Profile
Next, the CPU loads of the individual processes in GNU Radio and the C-coded system were profiled with OProfile.
For the profiling, 3124999 (≈ 50·6.25e4) packets of 16 bytes each were generated. The profiling was done with the following commands and options:
$ sudo opcontrol --setup --vmlinux=/home/user/linux-2.6.24/vmlinux
$ sudo opcontrol --start
$ sudo opcontrol --dump
$ opreport --symbols
With the first command above, the Linux kernel is also profiled. The event to be measured can be set with the --event option of --setup. For example, with the setting
$ sudo opcontrol --setup --event=CPU_CLK_UNHALTED:860000:0:1:1
approximately 1000 samples per second are produced on a Pentium processor. However, in Linux 2.6 kernels on CPUs without OProfile support for the hardware performance counters, the driver falls back to using the timer interrupt for profiling [19]. That is, OProfile runs in timer interrupt mode, and in this mode the number of samples to be produced cannot be configured. The OProfile website [19] suggests enabling the local APIC (Advanced Programmable Interrupt Controller). This was not successful on my machine, because GRUB (GRand Unified Bootloader) automatically removed the option for enabling the local APIC. The local APIC was therefore disabled during the profiling.
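The figure of approximately 1000 samples per second follows from the count field of the event specification: when the hardware counters are available, one sample is taken every count occurrences of the event. The sketch below parses the event string and estimates the sampling rate. The field order name:count:unitmask:kernel:user follows the opcontrol convention; the 860 MHz clock is an assumption chosen only to reproduce the quoted rate, and the function names are illustrative.

```python
from typing import NamedTuple


class EventSpec(NamedTuple):
    name: str
    count: int       # one sample is taken every `count` events
    unit_mask: int
    kernel: bool     # profile kernel-space code
    user: bool       # profile user-space code


def parse_event(spec: str) -> EventSpec:
    """Split an opcontrol event string of the form name:count:unitmask:kernel:user."""
    name, count, unit_mask, kernel, user = spec.split(":")
    # Base 0 accepts decimal as well as 0x-prefixed unit masks.
    return EventSpec(name, int(count), int(unit_mask, 0), kernel == "1", user == "1")


def samples_per_second(clock_hz: float, count: int) -> float:
    """Sampling rate for a clock-cycle event, assuming the CPU is never halted."""
    return clock_hz / count


spec = parse_event("CPU_CLK_UNHALTED:860000:0:1:1")
print(samples_per_second(860e6, spec.count))   # assumed 860 MHz clock
print(samples_per_second(1.2e9, spec.count))   # 1.2 GHz, as for the CPU listed above
```

This estimate applies only to counter-based profiling. In timer interrupt mode, as used for the measurements here, the count field has no effect.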
The profiling ran for the times shown in Table 3.9.
For 3124999 (≈ 50·6.25e4) packets
          GNU Radio      C-coded
  real    36m58.226s     3m56.161s
  user    35m8.808s      3m30.785s
  sys     1m42.210s      0m22.361s
Table 3.9: Run-time for the profiling with OProfile while transmitting 6.25e4 and 50·6.25e4 packets with 16 bytes each.
The profiling results for GNU Radio and the C-coded system, while transmitting 6.25e4 and 50·6.25e4 packets of 16 bytes each, are shown in Table 3.10.
As seen in Table 3.10, OProfile itself does not produce much overhead, only about 0.05%-0.06%.
In Table 3.10, the samples that relate to GNU Radio but not directly to the OFDM transmitter algorithm are summed up under the name 'GNU Radio specifics'. Samples with the application names libgnuradio-core.so.0.0.0, _gnuradio_swig_py_general.so and _gnuradio_swig_py_runtime.so, but with symbol names not related to the OFDM transmitter algorithm, were counted as 'GNU Radio specifics'. Samples specific only to Python were not considered here. The GNU Radio specifics require about 20% more overhead than the IFFT. Of the 81 'GNU Radio specific' entries, the seven with the most samples are listed in Table 3.11 in order of the number of samples collected.
                        GNU Radio    C-coded
CRC                     3262         322
make packet             5275         216
modulation              10487        4628
insert preamble         1696         2427
IFFT                    50515        19425
add cyclic prefix       21462        28
amplify                 42971        2738
OProfile                121          10
GNU Radio specifics     75115        -
Table 3.10: The CPU load of each functional unit, in number of samples captured by OProfile, measured for GNU Radio and the C-coded system while they transmitted 6.25e4 and 50·6.25e4 packets of 16 bytes each.
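To compare the units in Table 3.10 on a common scale, the sample counts can be converted into percentages of the samples listed for each system. The sketch below does this with the table values; the percentages refer only to the rows of the table, not to all samples OProfile collected, and the names TABLE_3_10 and shares are illustrative.

```python
from typing import Dict

# unit: (GNU Radio samples, C-coded samples), copied from Table 3.10
TABLE_3_10 = {
    "CRC": (3262, 322),
    "make packet": (5275, 216),
    "modulation": (10487, 4628),
    "insert preamble": (1696, 2427),
    "IFFT": (50515, 19425),
    "add cyclic prefix": (21462, 28),
    "amplify": (42971, 2738),
    "OProfile": (121, 10),
    "GNU Radio specifics": (75115, 0),  # "-" in the table: not applicable
}


def shares(column: int) -> Dict[str, float]:
    """Percentage of the listed samples per unit (column 0: GNU Radio, 1: C-coded)."""
    total = sum(row[column] for row in TABLE_3_10.values())
    return {unit: 100.0 * row[column] / total for unit, row in TABLE_3_10.items()}


gnuradio = shares(0)
for unit, percent in sorted(gnuradio.items(), key=lambda item: item[1], reverse=True):
    print(f"{unit:20s} {percent:6.2f} %")
```

Calling shares(1) gives the same breakdown for the C-coded column.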
Num. samples   App. name                       Symbol name
17644          libgnuradio-core.so.0.0.0       gr_single_threaded_scheduler::main_loop()
3646           libgnuradio-core.so.0.0.0       gr_block_detail::input(unsigned int)
2543           _gnuradio_swig_py_runtime.so    _wrap_gr_py_msg_queue__insert_tail
2095           _gnuradio_swig_py_runtime.so    _wrap_message_from_string
1678           libgnuradio-core.so.0.0.0       __i686.get_pc_thunk.bx
1629           libgnuradio-core.so.0.0.0       gr_msg_queue::insert_tail(boost::shared_ptr<gr_message>)
1542           _gnuradio_swig_py_general.so    _wrap_gr_ofdm_mapper_bcv_sptr_msgq
Table 3.11: The seven GNU Radio specific entries that produce the most overhead.
The first line in Table 3.11 relates to the GNU Radio scheduler, the second to the abstraction of GNU Radio C++ modules, the third and fourth are wrapper functions, the fifth is not yet known, the sixth relates to packet handling, and the seventh relates to the wrapper function of the BPSK modulation.
Fig. 3.7 compares the number of samples for the GNU Radio OFDM transmitter algorithm with the number of samples for the GNU Radio specifics. Fig. 3.8 shows the seven GNU Radio specifics that produce the most overhead.
No single GNU Radio specific entry causes much overhead. Summed over all 81 entries, however, the total overhead becomes larger than that of the most expensive module of the OFDM transmitter algorithm itself.
Therefore, most of the overhead of GNU Radio compared to the C-coded system comes from the GNU Radio specifics.
Figure 3.7: Profile result for OFDM Tx algorithm of GNU Radio.
Figure 3.8: Profile result for GNU Radio specific applications.
of C-coded system. As mentioned in Chp. 2.2.1 and Chp. 3.4.1, the IFFT of GNU Radio is composed of 17 C++ functions. It was not straightforward to measure, collect and summarize all the IFFT-related functions and kernel calls. Therefore, some IFFT-related functions and kernel calls are missing from the measurement, specifically the memcpy() functions, which were not attributed to the IFFT module.
Apart from the fact that the OProfile profiling result for GNU Radio was not precise, the OProfile results in CPU seconds vary each time the system is profiled, and even the percentage of each process varies, sometimes considerably. The problems regarding OProfile were discussed in Chapter 3.1.1. A more accurate analysis of the GNU Radio framework is left as future work.
4 Porting the System to the SFF SDR Board
4.1 Decision of DSP/FPGA Function Mapping
While porting the OFDM transmitter system to the Lyrtech SFF SDR, several issues were considered for the arrangement of the system modules in the DSP and the FPGA.
First, the computational complexity of each module is considered when deciding whether to implement it in software or in hardware. While a software implementation is in general easier, several IP cores are available from the FPGA vendors for users to build on. Still, the turnaround times differ greatly: modifying and compiling the software takes approximately 1 minute, whereas modifying a hardware model and generating the bitstream takes 1-2 hours.
Second, the communication between the DSP and the FPGA is considered. In the Lyrtech SFF SDR platform, the VPSS (Video Processing Subsystem) data port is used as the communication interface between the FPGA and the DSP. The VPSS data port operates at a maximum speed of 37.5 MHz with a 16-bit data bus for each of input and output, which corresponds to 37.5 MHz · 16 bit = 600 Mbit/s per direction. More details regarding the VPSS data port are given in Chp. 4.2.3.
Fig. 4.1 shows the relation between the net bitrate of the OFDM transmitter system and the output data rate of each of its modules. The differences in the behaviors of Source, CRC and Make Packet are so small compared to those of BPSK Modulation, Insert Preamble, IFFT and Add Cyclic Prefix that they are not depicted in the figure. Insert Preamble and IFFT behave the same. The output data rate of the VPSS, which is the maximum capacity of the DSP→FPGA link, is drawn as a blue line. As can be seen in Fig. 4.1, the capacity limit of the bus determines the point in the data processing chain from which an FPGA implementation is required.
For example, when the whole processing chain up to Add Cyclic Prefix is mapped to the DSP, the maximum capacity of the DSP→FPGA link limits the maximum net bitrate of the OFDM system to 37.500 Mbit/s. When only the Add Cyclic Prefix module is mapped to the FPGA and the rest of the processing chain is placed in the DSP, the maximum net bitrate of the system is 46.875 Mbit/s.
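The two limits above can be read as a simple link budget: the output data rate of the last module on the DSP must not exceed the capacity of the DSP→FPGA link. The sketch below derives, from the quoted maximum net bitrates, how many link bits each net bit costs at the two partition boundaries, and checks whether a given net bitrate fits. The expansion factors are back-calculated from the numbers in the text rather than taken from the module definitions, and all names are illustrative.

```python
VPSS_CLOCK_MHZ = 37.5   # maximum speed of the VPSS data port (from the text)
VPSS_BUS_BITS = 16      # data bus width per direction (from the text)


def link_capacity_mbit() -> float:
    """Raw capacity of the DSP -> FPGA link in Mbit/s."""
    return VPSS_CLOCK_MHZ * VPSS_BUS_BITS


# Maximum net bitrates quoted in the text, keyed by the last module
# that still runs on the DSP.
QUOTED_MAX_NET_MBIT = {
    "Add Cyclic Prefix": 37.5,
    "IFFT": 46.875,
}


def expansion_factor(last_dsp_module: str) -> float:
    """Link bits needed per net bit, derived from the quoted limits."""
    return link_capacity_mbit() / QUOTED_MAX_NET_MBIT[last_dsp_module]


def fits_on_link(net_mbit: float, last_dsp_module: str) -> bool:
    """True if the module output at this net bitrate fits on the link."""
    return net_mbit * expansion_factor(last_dsp_module) <= link_capacity_mbit()


for module in QUOTED_MAX_NET_MBIT:
    print(module, expansion_factor(module), fits_on_link(40.0, module))
```

With these numbers, a net bitrate of 40 Mbit/s does not fit on the link if Add Cyclic Prefix runs on the DSP, but it fits once Add Cyclic Prefix is moved to the FPGA.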
Based on the considerations above, it was decided to put Insert Preamble, IFFT and Add Cyclic Prefix in the FPGA, and the remaining modules in the DSP. CRC, Make Packet and BPSK Modulation are done in software on the DSP. With this HW/SW co-design of the system, the output of the BPSK modulation module is transferred over the VPSS from DSP to FPGA, with its maximum capacity of 600 Mbit/s. This results in a maximum potential net bitrate of the OFDM system of 93.750 Mbit/s, which is drawn as a green line in Fig. 4.1.
Figure 4.1: Relational behavior between the net bitrate of the system and that of each module. (Horizontal axis: net bitrate of the OFDM system in Mbit/s. Vertical axis: output data rate of each module in Mbit/s. Curves: source, BPSK Mod., IFFT and Insert Preamble, Add Cycl. Prfx., VPSS. Annotations: maximum capacity of the DSP→FPGA link, max. potential net bitrate with complex transfer and with real transfer, modules in HW and in SW.)
HW/SW co-design for Enhancement of Net Bitrate
To raise the maximum capacity of the system further, additional considerations about the FPGA design were made. The detailed FPGA design is discussed in Chp. 4.2.2, but since it influences the maximum capacity of the system, it is briefly discussed here. The imaginary part of the output of the BPSK module is always 0. Taking advantage of this fact, only the real part of the BPSK output is transferred over the VPSS to the FPGA. The imaginary part, which is 0, is supplied in the FPGA as a constant value at the imaginary input of the IFFT module. With this design, the maximum output data rate of the BPSK module doubles, which in turn doubles the maximum capacity of the OFDM system to a net bitrate of 187.500 Mbit/s. This maximum capacity is drawn as a red line in Fig. 4.1.
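The effect of the real-only transfer can be illustrated with a small model of the link. In the sketch below, the sender either puts both parts of every BPSK symbol on the link or only the real part, and the receiver restores the complex IFFT input by inserting a constant 0 as the imaginary part. The bit-to-symbol mapping (0 → +1, 1 → -1) and all function names are assumptions for illustration; the actual DSP and FPGA implementations are not shown here.

```python
from typing import List


def bpsk_map(bits: List[int]) -> List[complex]:
    """Map bits to BPSK symbols (assumed convention: 0 -> +1, 1 -> -1)."""
    return [complex(1 - 2 * bit, 0) for bit in bits]


def to_link_words(symbols: List[complex], real_only: bool) -> List[float]:
    """DSP side: serialize symbols into words for the DSP -> FPGA link."""
    if real_only:
        return [s.real for s in symbols]
    return [part for s in symbols for part in (s.real, s.imag)]


def from_link_words(words: List[float], real_only: bool) -> List[complex]:
    """FPGA side: rebuild the complex IFFT input from the link words."""
    if real_only:
        # The imaginary part is supplied as a constant 0.
        return [complex(word, 0.0) for word in words]
    return [complex(re, im) for re, im in zip(words[0::2], words[1::2])]


bits = [0, 1, 1, 0, 1, 0, 0, 1]
symbols = bpsk_map(bits)
complex_words = to_link_words(symbols, real_only=False)
real_words = to_link_words(symbols, real_only=True)

# Half as many words cross the link, and the IFFT input is unchanged.
print(len(complex_words), len(real_words))
print(from_link_words(real_words, real_only=True) == symbols)
```

Because each symbol occupies one link word instead of two, the same link capacity carries twice as many symbols, which corresponds to the step from 93.750 Mbit/s to 187.500 Mbit/s.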