Parallel frequency search implementation

IFFT N N PCS FFT

3.3.3 Parallel frequency search implementation

Code NCO

fFPGA

NP NNC

fFPGA

FFT

N^PCS NFFT

IFFT

_NNPCSFFT

IFFT

NN^PCSFFT

IFFT

_NNPCS_FFT

Σ

^N^P

Σ

^N^P

Σ

^N^P

···

| |

···

Σ

^N^NC

Σ

^N^NC

Σ

^N^NC

M M

···

M Code

generator Carrier

NCO

FFT

_NNPCSFFT

sin/cos mapping

z^–1

···

z^{–(N –1)}^B

Figure 3.9: Implementation of the parallel code search architecture using shift in the frequency domain.

and so on and so forth. But in this case, the Doppler effect is more important, since with a Doppler range of ± 2 kHz, a shift of one quarter of a chip may happen after 192.5 ms only. This implementation is denoted PCS* in the following.

3.3.3 Parallel frequency search implementation

The direct implementation of the parallel frequency search method is given Fig. 3.10a. It follows closely Fig. 2.11, but as previously, we show the generation of the local signals and the non-coherent accumulation.

Following the same idea used in Section 3.3.1, we can duplicate the elements to have several branches in order to test several code delays simultaneously. The corresponding implementa-tion considering NB branches is shown in Fig. 3.10b.

Regarding the implementation of the elements, there is nothing new except the ping-pong buffer. This is a buffer, which has a writing order different from the reading order. This is because after the multiplexer there are first the first points of each branch, then the second points of each branch, etc., whereas the FFT should be first fed with all the points of the first branch, then with all the points of the second branch, etc. Also, since data can be written at addresses not yet read, it is necessary to use two buffers, one being read while the other is written to, which alternate their roles (which is often called a ping-pong buffer).

As mentioned in Chapter 2, according to the values selected for NAand N_{F F T}^{P F S}, only a portion

s_b(nT_s)

Σ

^N^A

sin/cos mapping

Code generator

Code NCO Carrier

NCO

| | Σ

^N^NC

FFT

NN^PFSFFT

(a)

N^PFS N_C N_NC N_FFT N_C N_NC N^PFS N_C

sb(nTs)

sin/cos mapping

Code generator

Code NCO Carrier

NCO

···

z⁻¹ z⁻¹

···

^MUX

fFPGA fFPGA

Σ

^N^NC

| | M

Σ Σ Σ

···

NB fFPGA NFBS NB fFPGA

N_FFT N_C NFBS NB fFPGA

FFTNN^PFSFFT

Ping-Pong buffer NA

(b)

Figure 3.10: Implementation of the parallel frequency search architecture, (a) direct, (b) using duplication.

of the FFT bins may be necessary to cover the search space. The number of bins kept is then denoted NF B S, which is equal to the number of frequency bins NF B only if the entire frequency search space is covered by the FFT. The rate after the FFT is thus reduced by N_{F F T}^{P F S}/NF B S. Finally there are the magnitude computation and the non-coherent accumulator based on memory blocks. The memory inside the non-coherent accumulator has NBNF B S

addresses in this case.

In this implementation, the resource usage of the logical elements is relatively similar to the serial search architecture because the accumulators are a little bit smaller and there is just one supplementary FFT, but the memory is far more used. However, there are two limitations with the implementation depicted in Fig. 3.10b. First, as for the serial search implementation, the number of branches is limited by the number of accumulations performed by the accumulator before the multiplexer. But here NAis much lower than NC. However, as before, this limit

may be easily circumvented by duplicating the multiplexer and the latter blocks. The second limitation is if zero-padding is used, because in this case the data rate after the FFT is higher than the data rate after the multiplexer. So, we should take care also to not obtain a rate higher than the FPGA rate. Otherwise, this limit can be circumvented in the same manner as before by duplicating the multiplexer and the latter blocks.

3.3.4 Parallelization

Now that the implementations of the different acquisition architectures have been described, we can determine the parallelization of each implementation, and the time needed to explore the entire acquisition search space.

With a coherent integration time of TC= NCTSand a number of non-coherent accumulations NNC, the total integration time is TT = TCNNC, and the number of samples to process is NCNNC. For a system running at the sampling frequency and using not any parallelism (i.e. the serial search without duplication), the time to test one code delay and one carrier frequency is thus NCNNCTS = TT, and the time to explore the entire search space is thus TTNC BNF B, with NC Bthe number of code bins and NF Bthe number of frequency bins.

Until now, we did not discuss the question of the data transition. So here, we will consider two cases, either the data are ignored which implies a loss, or the alternate half-bits method is used, which requires to double the length of the input signal (see (van Diggelen [2009] and (Psiaki [2001]) respectively for more details). To differentiate these two cases, we use a parameter d , which is equal to 2 for the alternate half-bits method, and 1 otherwise.

Now, if we consider a system running at a higher frequency, the processing time will be divided by a factor G_{F PG A}= ^f^{F PG A}_f_S . Then, if we consider a parallelism, the processing time will be divided by a factor P that indicates the number of bins processed in parallel. Therefore, the time to explore the entire search space is

TE= d TT

GF PG A

NC BNF B

P . (3.1)

Note that the time to load a new code or to modify the carrier and code frequencies on the channel, and the latency in the processing are not taken into account in this formula. Indeed, the loading time is very small, typically on the order of dozens of cycles. The latency is mainly due to the FFTs and corresponds to the size of the elements, i.e., a few thousands of clock cycles, whereas the input signal used is typically composed of hundreds of thousands of samples if high sensitivity is intended (since high sensitivity requires long integration times).

Therefore, the latency represents only a low percentage of TE. Note also that the time needed to store the signal into a memory before the processing (which is equal to d TT) is also not taken into account in Eq. (3.1).

For the serial search (SS), the parallelization comes from the duplication of the elements, and

corresponds to the number of branches implemented. Therefore, TE ,SS= d TT

GF PG A

NC BNF B

NB,SS

. (3.2)

For the parallel code search (PCS), the parallelization comes from the duplication of the elements and from the FFTs for the correlation. Therefore,

TE ,PC S= d TT

GF PG A

NC BNF B

N_{B,PC S}N_{F F T}^{PC S}. (3.3)

If the length of the FFT corresponds to the number of code bins, the equation simplifies to T_{E ,PC S}= d T_T

GF PG A

N_{F B} NB,PC S

In document Resource-efficient parallel acquisition architectures for modernized GNSS signals (Page 91-94)