Digital Signal Processing


José Manoel de Seixas, João Baptista de Oliveira e Souza Filho, Rodrigo Coura Torres, Michel Pompeu Tcheou

Signal Processing Laboratory, COPPE/EE/UFRJ, Rio de Janeiro, CP 68504, 21945-970, Brazil
E-mails: {seixas, nash, torres, pompeu}@lps.ufrj.br

1 Introduction

Signal processing techniques are widely used in experimental physics. This is particularly true in high energy physics experiments, which have become very complex over the years because of the large amount of data to be processed in real time, stringent noise requirements, extremely high event rates, etc.

The lab session described here aims at presenting an overview of signal processing techniques, with emphasis on digital signal processing. As a practical example, a muon detection application is developed using statistical signal processing: a matched filter based detection system is implemented in a digital signal processor environment.

This chapter is organized as follows. Section 2 presents a theoretical overview of signal processing with emphasis on digital signal processing. In Section 3, an introduction to digital signal processors (DSPs) is given and, as an example, the ADSP-21160 processor from Analog Devices is presented, together with its development tools. Section 4 presents the application example, giving first a quick introduction to matched filter theory. Section 5 compares the detection performance of the different discriminators developed and Section 6 describes the lab session itself. Finally, in Section 7 some conclusions are derived. The lab session guide is presented in Appendices A and B.

2 What is Signal Processing?

When we hear the word signal, we can think of a phenomenon that evolves over time and carries some kind of information, for instance: speech, a heart beat, the average temperature over a city, the energy deposition of a particle in a calorimeter, etc. Most of these signals are continuous in time, but some are discrete, like the financial indicators in the stock market[18].

Signal processing techniques[17] aim at studying the behavior of signals and how to extract useful information from them. When a physicist looks at the energy deposition of an unknown particle in a calorimeter, for instance, he observes the signal shape, duration, amplitude and other features that may help him to figure out which particle reached the detector. In doing so, the physicist is performing signal processing.

In electronics engineering, most signal processing is applied to electric signals. So, some kind of transducer is needed to convert a given phenomenon (temperature, sound, etc.) into an electric signal, allowing the engineer to apply his techniques and retrieve the information he wants. A good example is a microphone, which converts a mechanical wave (sound) into an electric signal, which can then be applied to an amplification system. This example can be viewed as a trivial signal processing system[27].

2.1 Digital signal processing

One difficulty in analog signal processing for continuous-time systems is that, for every target project, the engineer must develop some kind of customized hardware, which can significantly increase the project complexity. Digital signal processing[25] emerges as a solution to that problem. It uses a digital (programmable) device to perform the signal processing task. In this way, most of the project complexity lies in software development, minimizing the need for customized hardware. The main advantages of digital signal processing are:

• Programmability: since a digital system uses programmable devices, it is easier to change the system configuration or operation. For instance, changing a low-pass filter into a high-pass filter is just a matter of replacing the filter’s coefficients. This can be done without any kind of hardware modification.

• Stability: digital devices are less sensitive to noise and to other effects, like temperature change and ageing. A digital system can also be programmed to compensate for deviations in the characteristics of the analog parts used in a given hybrid design.

• Cost Reduction: since a programmable device can be configured to implement a large set of functions, the number of hardware components in the system can be reduced, and consequently, the overall cost of the project can be minimized.

• Implementation of Adaptive Systems: a digital device can easily adapt itself to environment changes. The adaptive algorithm simply computes the new operational values, which are stored into memory, overwriting the previous values.
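As a rough illustration of the programmability argument above, the following Python sketch (not taken from the lab material) runs the same FIR filtering routine with two coefficient sets. The function name `fir_filter` and the 3-tap coefficients are assumptions chosen for illustration; the high-pass set is obtained from the low-pass one by spectral inversion.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum_k coeffs[k] * x[n - k].

    Samples before the start of the input are taken as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        output.append(acc)
    return output


# Illustrative coefficient sets (assumed values, not from the lab guide).
LOW_PASS = [1 / 3, 1 / 3, 1 / 3]       # 3-tap moving average
# Spectral inversion: negate the low-pass taps and add 1 to the centre tap.
HIGH_PASS = [-1 / 3, 2 / 3, -1 / 3]

constant = [1.0] * 8                            # zero-frequency input
alternating = [(-1.0) ** n for n in range(8)]   # highest-frequency input

# Same routine in both cases; only the coefficient table changes.
lp_const = fir_filter(constant, LOW_PASS)
hp_const = fir_filter(constant, HIGH_PASS)
lp_alt = fir_filter(alternating, LOW_PASS)
hp_alt = fir_filter(alternating, HIGH_PASS)
```

Once the start-up transient has passed, the low-pass set passes the constant input and attenuates the alternating one, while the high-pass set does the opposite.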

Any digital device can be used for digital signal processing applications. The most used options are:


• General Purpose Personal Computer (PC): a PC can easily be programmed, but its major obstacle for real-time or online signal processing is that it is designed for general applications, so no specific optimization for digital signal processing can be made, since it would limit other applications. Performance is increased mainly by raising the clock speed, which increases power consumption and limits the use of this kind of processor in portable devices.

• Field Programmable Gate Array (FPGA): FPGAs are very fast digital devices, since they are programmed at the logic gate level. The execution time is simply the time the input signal takes to propagate to the output. However, an FPGA can be very complex to program for some digital signal processing applications: matrix multiplication, for instance, would have to be programmed using logical instructions, which can, in practice, make the implementation rather cumbersome. Code maintenance and upgrade can also be difficult, since the FPGA’s programming language is considerably low level[14].

• Microcontroller Units (MCU): an MCU is a digital device that has some kind of inner structure for algorithm execution, like an ALU, a shifter and internal memory[23]. It can be programmed in Assembly language, but its inner structure is designed for simple applications, such as control decisions taken after comparing the input signal with some kind of reference value. A more complex application, like a digital filter, is not well suited to this device.

• Digital Signal Processors (DSPs): a DSP is a programmable device that has been developed specifically for digital signal processing applications, as will be seen in the next section.

3 Digital Signal Processor (DSP)

A digital signal processor[24] is a digital device for which both hardware and software are fully optimized for digital signal processing applications. Unlike a general purpose personal computer processor, which has to support a variety of applications, a DSP is used exclusively for digital signal processing and can therefore exploit some inherent characteristics of this type of application, like:

• Large number of multiply and accumulate (inner product) operations.

• Strong algorithm iteration.
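To make the first characteristic concrete, the sketch below writes an inner product as an explicit sequence of multiply-accumulate steps. It is a plain Python illustration; the name `inner_product_mac` is hypothetical, and the step counter merely stands in for the MAC operations that a DSP would execute in dedicated hardware.

```python
def inner_product_mac(x, h):
    """Inner product of two equal-length sequences using explicit MAC steps.

    Returns the result and the number of multiply-accumulate steps used.
    """
    if len(x) != len(h):
        raise ValueError("sequences must have the same length")
    acc = 0.0
    mac_steps = 0
    for xi, hi in zip(x, h):
        acc += xi * hi      # one multiply-accumulate (MAC) step
        mac_steps += 1
    return acc, mac_steps


result, steps = inner_product_mac([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
```

An N-tap digital filter evaluates one such inner product per output sample, which is why the cost of a single MAC step dominates the overall processing time.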


So, the DSP inner structure was developed to perform these operations efficiently, in such a way that more complex computation tasks can be performed in fewer clock cycles. Since the clock frequency is strongly connected to power requirements, DSPs can perform more instructions per watt, which makes them very attractive for portable devices. The main features of DSPs are:

• Dedicated Hardware for Multiply and Accumulate (MAC): a DSP has dedicated hardware to perform a multiply and accumulate operation in one single clock cycle. A personal computer (PC) typically takes 10 clock cycles to perform the same operation. So, roughly, a DSP with a 100 MHz clock frequency may perform the same amount of MAC operations per unit time as a PC with a 1000 MHz clock frequency.

• Internal Memory with Multiple Buses: general purpose microprocessors do not have internal memory. Besides, they implement the von Neumann architecture for memory access. This architecture allows only one memory access per clock cycle, since it has only one memory bus. A DSP has an internal memory connected through multiple buses (Harvard architecture)[2]. Therefore, in one single clock cycle, an instruction and its data operands can be accessed at the same time, at full clock speed, since the memory resides inside the chip.

• Strong Pipeline Utilization: pipelining is a computing resource that breaks an instruction execution, which can take several clock cycles, into single-cycle execution blocks that are executed in parallel[21]. Normally, an instruction goes through three steps:

1. Fetch: where the instruction is read from memory.

2. Decode: where the processor adjusts its inner state (register set up, for instance) in order to execute the instruction.

3. Execution: where the instruction is finally executed.

So, without a pipeline, each instruction would always require three clock cycles to complete. With a pipeline, a register is inserted between each step, so that, while the decode phase is being applied to instruction n, the fetch phase is, at the same time, being applied to instruction n + 1. With this optimization, for linear code, the first instruction takes three clock cycles to generate its output, since the pipeline is initially empty, but after that an instruction output is generated at every clock cycle. The pipeline concept can be visualized in Table 1. More details about pipelining in digital designs can be found in[19].


Without pipeline

Cycle     1  2  3  4  5  6  7  8  9
Fetch     A        B        C
Decode       A        B        C
Execute         A        B        C

With pipeline

Cycle     1  2  3  4  5  6  7  8  9
Fetch     A  B  C  D  E  F  G  H  I
Decode       A  B  C  D  E  F  G  H
Execute         A  B  C  D  E  F  G

Table 1: Example of non-pipelined (top) and pipelined (bottom) systems. We can see that, for the pipelined version, after the first instruction, each clock cycle outputs the result of an instruction (instructions are represented by the letters A, B, C, D, etc.).

• Instruction Cache: DSPs store the most frequently used instructions in a cache memory next to their computational units, in order to avoid fetching them from the internal memory. Since a DSP can transfer data through its instruction bus, if the instruction has already been stored in the cache memory, the instruction bus can be used to transport data, allowing the DSP to access more data per clock cycle.

• Hardware Implementation of Iterations: digital signal processing algorithms are normally very iterative. A general purpose microprocessor usually has to implement, in software, the code that tests and decrements the counter, and the code responsible for ending the loop after the counter reaches zero. In DSPs, loop control is done in hardware, releasing the processing units from that overhead, so that the DSP can be devoted exclusively to processing data, increasing the system performance.

• Hardware Implementation of Circular Buffers: a circular buffer is useful in applications like digital filters, where the delay line must be updated at every iteration. A circular buffer allows only one value to be updated at every iteration, decreasing the time required for output generation. Circular buffers are implemented with modular operations, and in general purpose microprocessors these modular operations are performed in software. DSPs have an internal structure that implements circular buffers in hardware, increasing the overall speed of applications. An example of circular buffer operation is presented in Figure 1, where a vector of size 11 must be accessed continuously with a pointer increment (offset) of 4. The first accessed position is “0”, followed by positions “4” and “8”. In a regular data access the next position would be “12”, but since we are using a circular buffer, the next position is “1”. So, in circular buffers, the next access position p(n + 1) is the remainder of the division of p(n) + inc by size:

$$p(n+1) = \bigl(p(n) + \mathit{inc}\bigr) \bmod \mathit{size} \tag{1}$$

where p(n) is the current vector position, inc is the pointer increment (4, in this example), and size is the vector size (11, in this example).

Figure 1: Example of circular buffer operation.

• Multiple Processing Elements: some DSPs have multiple processing elements, which allow the implementation of SIMD (Single Instruction, Multiple Data) algorithms[21]. In this mode, a given instruction is executed in each processing element, but it is applied to different data. As an illustrative example, consider a digital filter that will be applied to a stereo sound signal. With SIMD, each channel can be assigned to a processing element, so the outputs of the two channels are generated in parallel. Another use for the SIMD architecture is in vector operations that can be broken into smaller pieces. Suppose an inner product has to be computed in an N-dimensional vector space. With SIMD, the two vectors can be halved and each half assigned to a processing element, so one processing element calculates the partial result of the inner product on the first half, and the other processing element performs the same operation on the second half. At the end, the two partial results are summed, generating the final result. So, with SIMD, this inner product would take only (N/2 + 1) clock cycles, instead of N clock cycles for a SISD (Single Instruction, Single Data) architecture.


• DSPs also provide instructions typically used in digital signal processing applications, like multiplication with accumulation, 1/x, 1/√x, min, max, and others. All of them are typically executed in one single clock cycle.

3.1 The ADSP-21160 digital signal processor

As an example of a digital signal processor, we discuss the main features of the ADSP-21160 from Analog Devices[5]. The ADSP-21160 is a high performance 32-bit DSP from the SHARC family. SHARC stands for “Super Harvard Architecture” and designates a family of high performance DSPs. Besides having multiple buses for unconstrained data flow, a member of this family must have a high performance processing core that executes each instruction in one single clock cycle.

The ADSP-21160 is a system-on-a-chip, containing an internal dual-ported memory, integrated I/O peripherals and an additional processing element for SIMD support[8], which makes this processor suitable for demanding real-time applications in areas such as industrial control, military, audio and medical systems.
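A minimal Python sketch of the SIMD inner product described in Section 3, assuming two processing elements and an even vector length. The function names are hypothetical, and the cycle count simply follows the (N/2 + 1) estimate quoted in the text; it is a model, not a measurement on the ADSP-21160.

```python
def partial_mac(x, h):
    """Inner product of one half of the data, as one processing element would compute it."""
    acc = 0.0
    for xi, hi in zip(x, h):
        acc += xi * hi
    return acc


def simd_inner_product(x, h):
    """Split an inner product across two processing elements (PEx and PEy).

    Returns the result and the modelled cycle count N/2 + 1.
    """
    n = len(x)
    if n != len(h) or n % 2 != 0:
        raise ValueError("expects two vectors of the same even length")
    half = n // 2
    # On a SIMD machine these two partial sums would be computed in lockstep.
    partial_x = partial_mac(x[:half], h[:half])   # work assigned to PEx
    partial_y = partial_mac(x[half:], h[half:])   # work assigned to PEy
    cycles = half + 1                             # N/2 MACs plus one final addition
    return partial_x + partial_y, cycles


value, cycles = simd_inner_product([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
```

Python runs the two halves one after the other, so the sketch shows only how the work is divided, not the speed-up itself.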

The ADSP-21160 has the following main features:

• 80 MHz clock speed with peak performance of 480 MFLOPS: by means of internal optimization, it is possible to achieve peaks of 480 MFLOPS, which makes this DSP very attractive for complex real-time applications.

• 4 Mbit internal memory: this DSP has a large amount of internal memory, allowing the implementation of designs that demand deep memory.

• 2 Watts power consumption: this is a low power consumption when compared with general purpose personal computers (approximately 12 Watts[22]), making this DSP an attractive solution for portable devices.

• Small dimensions (24 x 27 mm): project portability can be increased by reducing the overall project size.

The ADSP-21160 block diagram is presented in Figure 2. There, we can see four main blocks: core processor, internal memory, external port and I/O processor.

3.1.1 Core processor

The core processor of the ADSP-21160 consists of two processing elements (each with three computation units and a data register file), a program sequencer, two data address generators, a timer, and an instruction cache. All digital signal processing occurs in the processor core, which comprises the following units:

• Processing Elements: two processing elements (PEx and PEy) are available for this DSP to allow SIMD support. Each processing element contains three computational units (an ALU, a multiplier with MAC support and a shifter), and also a data register file. The computational units can process data in three formats: 32-bit fixed point, 32-bit floating point and 40-bit floating point.

The ALU performs arithmetic and logical operations, while the multiplier performs both fixed-point and floating-point multiplications, and also MAC operations. The shifter performs binary shifts, bit manipulation and exponent derivation operations on 32-bit operands. All computational units perform their operations in one single clock cycle, and all of them are connected in parallel, so that the output of a computational unit can be the input of any other unit in the next cycle.

Each processing element also contains a data register file, which is a set of registers used to store intermediate results, avoiding the overhead of saving and restoring intermediate values from memory, and also releasing the memory buses for other data access operations.

• Program Sequence Control: all program execution is controlled by this unit through its four functional blocks: program sequencer, data address generators, timer and instruction cache. This control unit is a powerful optimization, since it allows the processing units to focus only on data processing, while the control unit provides all the data and instructions needed by the processing elements.

– Program Sequencer: the program sequencer supplies instruction addresses to program memory. It also performs more complex operations like loop control, so this DSP can implement loops in hardware, with zero software overhead, since the program sequencer performs all loop management in parallel.

– Data Address Generators: the data address generators (DAGs) provide memory addresses when data are transferred between memory and registers. Dual data address generators enable the processor to output simultaneous addresses for two operand reads or writes. One data address generator provides addresses for the program memory, and the other for the data memory. These data address generators can be used to implement circular buffers, controlling both the increment and the modular operation of the pointers, which reduces overhead, increases performance, and simplifies implementation.

– Timer: the programmable interval timer provides periodic interrupt generation. When enabled, the timer decrements a 32-bit count register every cycle. When this count register reaches zero, the ADSP-21160 generates an interrupt. The count register is automatically reloaded from a 32-bit period register and the count resumes immediately.

– Instruction Cache: frequently used instructions can be stored in a 32-word instruction cache. This feature is important because, once the instruction is found in the cache, the program memory bus is free to access data stored in the program memory, allowing the DSP to execute an instruction and access two data operands in the same clock cycle. The DSP automatically determines which instructions are stored in the cache.
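The circular-buffer addressing that the data address generators perform in hardware can be imitated in software. The following Python sketch uses the values from the example of Figure 1 (buffer size 11, increment 4); the function name `circular_addresses` is an assumption made for illustration, and the modulo operation written here is exactly the software overhead that a DSP avoids.

```python
def circular_addresses(start, inc, size, count):
    """Return `count` successive positions of a circular buffer pointer.

    Each new position is (previous + inc) modulo size.
    """
    positions = []
    p = start
    for _ in range(count):
        positions.append(p)
        p = (p + inc) % size   # wrap around at the end of the buffer
    return positions


# Example of Figure 1: 11-element buffer, pointer increment of 4.
sequence = circular_addresses(start=0, inc=4, size=11, count=6)
```

The first four positions (0, 4, 8, 1) match the access pattern described for Figure 1.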

3.1.2 Dual-Ported internal memory (SRAM)

The ADSP-21160 provides 4 megabits of on-chip SRAM, organized as two blocks of 2 Mbits each, which can be configured for different combinations of code and data storage. Each memory block is dual-ported for single-cycle, independent accesses by the core processor and I/O processor or DMA controller. The dual-ported memory and separate on-chip buses allow two data transfers from the core and one from I/O, all in a single cycle.

The internal memory can be organized in order to store instructions and data, although it is more efficient to store instructions in the program memory, and data in the data memory. This assures single cycle access to code and data. But data can also be stored in program memory, so they can be accessed together with data from data memory, if the instruction is already residing in the instruction cache.

The internal memory has also an additional bus connected to the I/O processor, allowing data to be accessed without impacting the accesses from the processing elements.

3.1.3 External port

The external port is usually used to connect the DSP with external memory modules, a host processor, etc. The 4-gigaword off-chip address space is part of the processor’s unified address space, so that the developer simply points to the data he wants, and the processor automatically determines whether the data are located on-chip or off-chip, performing the data packing and transfer without core processor intervention. This allows the connection with storage devices, a host processor or even another DSP with minimal additional hardware, minimizing the overall system cost.

• Host Processor Interface: the ADSP-21160’s host interface allows easy connection to standard microprocessor buses, both 16-bit and 32-bit, with little additional hardware required. Transfers can be performed at half of the internal clock rate and can be either asynchronous or synchronous. Data can be transferred by DMA or directly by the core processor.

• Multiprocessor System Interface: the ADSP-21160, like the other members of the SHARC family, is optimized for multiprocessor applications. The memory space is organized in such a way that the internal memories of all DSPs in a project are mapped onto each DSP. Therefore, the DSP automatically determines whether the required data reside on-chip or off-chip and takes the necessary actions to perform the access. For multiprocessing purposes, the DSP also has a broadcast area: once a datum is stored in that area, the DSP automatically sends it to all DSPs in the multiprocessing system.

3.1.4 I/O processor

The ADSP-21160’s Input/Output Processor (IOP) includes two serial ports, six link ports, and a DMA controller. The I/O processor controls booting operations, allowing the DSP to boot from the external port (with data from an 8-bit EPROM or a host processor) or from a link port. Alternatively, a no-boot mode lets the DSP start by executing instructions from external memory without booting. The I/O processor is divided into the following blocks:

• Serial Ports: the ADSP-21160 features two synchronous serial ports that can be used to interface with devices such as CODECs, sensors, and others. The serial port can operate at a maximum speed of half the clock frequency and data can be transferred automatically by using DMA transfers.

• Link Ports: the ADSP-21160 features six 8-bit link ports that provide additional I/O capabilities. Link ports are especially useful for point-to-point interprocessor communication in multiprocessing systems. The link ports can operate independently and simultaneously.

• DMA Controller: the DMA controller is responsible for direct memory access (DMA) transfers. DMA operations are quite useful, since they release the core processor from transferring data. As an example, imagine an application that performs real-time calculation of a 512-point FFT, with the data arriving through the serial port, which is connected to an A/D converter. Without DMA, the DSP would have to wait for the transfer of 512 samples before starting the FFT calculation, since the core processor would be responsible for fetching the values received by the serial port, one by one. With the DMA facility, the core processor is released from that operation, since each arriving value is automatically transferred from the serial port to the internal memory. So, while the DSP is performing the FFT calculation on one block of data, the DMA controller is, at the same time, storing the values of the next block, so that, when the DSP finishes its calculation, the next block of data is already available.
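The overlap between DMA transfers and computation is commonly organized as a double (ping-pong) buffer. The Python sketch below only models the bookkeeping, since plain Python runs the two activities one after the other rather than concurrently; the names `ping_pong_process` and `process_block` are hypothetical, and a block sum stands in for the FFT.

```python
def ping_pong_process(samples, block_size, process_block):
    """Model double buffering: one buffer is filled while the other is processed.

    `samples` plays the role of the serial-port stream; on a real DSP the
    filling would be done by the DMA controller in parallel with the core.
    An incomplete trailing block is discarded.
    """
    buffers = [[], []]
    fill = 0                      # index of the buffer currently being filled
    results = []
    for sample in samples:
        buffers[fill].append(sample)          # "DMA" stores one arriving sample
        if len(buffers[fill]) == block_size:
            ready = fill
            fill = 1 - fill                   # swap the roles of the two buffers
            buffers[fill] = []                # the next block goes to the other buffer
            results.append(process_block(buffers[ready]))  # "core" works on the full block
    return results


# A block sum stands in for the 512-point FFT of the text (small block for brevity).
block_sums = ping_pong_process(range(10), block_size=4, process_block=sum)
```

With ten input samples and a block size of four, two complete blocks are processed and the last two samples remain in the buffer that is still being filled.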

3.1.5 JTAG port

The JTAG port supports the IEEE 1149.1 Joint Test Action Group (JTAG) standard for system test. This standard defines a method for serially scanning the I/O status of each component in a system. Emulators use the JTAG port to monitor and control the DSP during emulation. Emulators using this port provide full-speed emulation, with access to inspect and modify memory, registers, and processor stacks. JTAG-based emulation is non-intrusive and does not affect target system loading or timing. Several JTAG emulator boards are available today for use with DSPs[3]. These boards can be attached to general purpose personal computers, providing a simple way of debugging and replacing code through the DSP’s JTAG port, even when the DSP is already inserted into some kind of customized hardware.

3.2 ADSP-21160 EZ-KIT lite evaluation board

When a DSP based system is developed, some design steps are followed in order to optimize system implementation. First, the performance of the chosen processor is evaluated by means of software simulation. This is useful because it allows the code to be debugged and the suitability of the DSP for the application to be evaluated. However, the simulation phase cannot fully evaluate how the DSP will interact with other devices in the actual application. So, although it is possible to determine whether the DSP is suitable for the application, it is not possible to test the entire application, since the simulation of the connection between the DSP and other devices is very limited. This requires the development of a physical prototype.

To overcome this problem, and still keep the cost of a prototype as low as possible, evaluation boards are produced by DSP manufacturers and also by third-party companies. Since these boards are mass produced, their final cost is considerably low.

Figure 3: Block diagram of the ADSP-21160 EZ-KIT Lite evaluation board. Extracted from[7].

So, the developer can build a prototype around an evaluation board, which allows the DSP to be connected to other devices.

The block diagram of the ADSP-21160 EZ-KIT Lite Evaluation Board[7] is presented in Figure 3. This evaluation board contains an ADSP-21160 DSP for evaluation and some standard devices required in a design:

• External memory modules (4 Mbits total).

• Flash memory module for stand-alone applications (4 Mbits total).

• CODEC with maximum sample rate of 48 kHz.

• JTAG connector.

• Indicator LEDs, controlled by software.

• Switches for hardware interrupt generation.

• Link port and serial port connectors.

• Connectors for other devices not contained on the evaluation board.

3.3 Development tool

For the implementation of a DSP based project, it is essential to use a software tool that minimizes the time required for project development. VisualDSP++ provides complete graphical control of the edit, build, and debug process. In this integrated environment, you can move easily between editing, building, and debugging activities[10]. In Figure 4, a debug window of this development tool is presented, showing parts of a digital filter project.

Figure 4: VisualDSP++ development environment.

The main features of this tool are[9]:

• Easy-to-use debugging activities: in one single user interface you can perform edit and debug activities. You can also simulate processors and switch easily between them, allowing quick initial validation of DSPs.

• Multiple language support: you can debug programs written in C, C++, or Assembly languages, and view your program in machine code. For programs written in C/C++, you can view the source in C/C++ or mixed C/C++ and Assembly, and display the values of local variables or evaluate expressions (global and local) based on the current context.

• Effective debug control: you can set breakpoints on symbols and addresses and then step through the program execution to find problems in coding logic. You can set watchpoints (conditional breakpoints) on registers, stacks, and memory locations to identify when they are accessed.

• Tools for improving performance: you can use profiling tools to identify system bottlenecks and program optimization needs. You can plot vectors to view data arrays graphically. You can also generate interrupts, inputs and outputs (data streams) to simulate real-world application conditions.

• Multiprocessor debugging: you can easily manage and debug any number of processors from the same debug session. Fully synchronous multiprocessor operations such as step, run, and halt allow cycle-accurate debugging of complex systems. You can arrange processors into logical groups for better control of system behavior.

• Support for Real-Time Operating Systems: this tool offers the resources needed for the implementation of real-time operating systems[13], making it possible for the DSP to execute multiple complex tasks simultaneously, either on a single processor or by means of a set of processing elements controlled by a master processor.

• Third-party function libraries: the development tool contains a set of common digital signal processing functions like FFT, FIR and IIR filters, matrix operations and others. All functions are written in hand-written Assembly language for maximum performance, exploiting all the processor’s resources. These functions can also be called from C/C++ functions.

4 A Matched Filter for Muon Detection

In order to handle the huge amount of data generated by the LHC[16], a sophisticated online triggering system is being developed for the ATLAS experiment[15], one of the main LHC detectors. The trigger system is divided into three levels of operation and should dramatically reduce the background noise rate. The first level (LVL1) accepts the full LHC bunch-crossing rate (40 MHz) and should produce a maximum rate of 100 kHz. To perform this task, LVL1 makes use of the calorimeter information with reduced granularity (through trigger tower signals) and fast muon detectors[1]. The second and third levels further reduce the acceptance rate to ≈ 100 Hz[12].
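As a quick worked check, using only the rates quoted above, the reduction factors required from the trigger levels follow from simple ratios:

$$\frac{40\ \text{MHz}}{100\ \text{kHz}} = 400, \qquad \frac{100\ \text{kHz}}{100\ \text{Hz}} = 1000, \qquad \frac{40\ \text{MHz}}{100\ \text{Hz}} = 4 \times 10^{5}.$$

So LVL1 has to reduce the rate by a factor of at least 400, the second and third levels together by a further factor of about 1000, and the complete trigger chain by a factor of about 4 × 10^5.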

In order to reduce the rate of fake triggers, the use of an additional muon trigger at LVL1, provided by the scintillating tile calorimeter (Tilecal), is being considered. As a case study, we propose the development of a muon detection system based on Tilecal information, which would be ready to satisfy the stringent LVL1 requirements in terms of speed and detection efficiency. Two main aspects of system development will be addressed: filter design and full detection system implementation. In the following, we describe the main characteristics of the hadronic calorimeter detector and the muon signal it produces. Afterwards, we cover some methods for signal detection and detail the main results of the proposed matched filter design approach. Finally, we discuss the implementation of the system on a DSP platform. It should be mentioned that the actual design may go analog, in order to profit from the analog signal provided by Tilecal, so that processing speed can be optimized.

4.1 Hadronic calorimeter (TILECAL)

The Tile calorimeter (Tilecal) consists of a cylindrical structure divided into 64 modules. In the longitudinal direction, Tilecal is divided into three parts: the barrel and two extended barrels[11]. Tilecal modules are segmented into three sampling layers in the radial direction, and each module comprises 45 cells for the barrel and 16 cells for the extended barrel, with double signal readout, which means that each cell provides two electrical signals. The detector layout is shown in Figure 5.*

The signal from the third sampling layer (D-cells) is furnished by Tilecal as a muon trigger.

Muons deposit very little energy in the calorimeter. Therefore, muon detection using the Tilecal signal will be performed under severely low signal-to-noise ratio (SNR) conditions. An important feature of the muon energy deposition in the calorimeter is that the amount of energy deposited in the third layer cells of the barrel modules is smaller than that deposited in the extended barrel ones, due to the different lengths of the muon path. Thus, in the extended barrel, the signal-to-noise ratio is considerably better than in the barrel.

∗ Collider experiments typically make use of a specific cylindrical coordinate system: (η, φ, z). Direction φ represents the rotation around the collision axis, z, and η, which is known as pseudorapidity, corresponds to the projection direction of a particle produced in a given collision.

Figure 5: Detector layout (Tilecal cells A1-A16, B1-B15, C1-C9 and D0-D6; axes: Z (m), R (m) and η).

4.2 Muon Signal

At CERN, an experimental setup has been used to acquire Tilecal signals during detector calibration runs. Muon signals were acquired from both barrel and extended barrel modules of Tilecal (see Figure 6). The trigger tower signals were digitized by a fast ADC (40 MHz) with 12-bit amplitude resolution. A total of 16 samples were produced for each event. Since muon signals are severely corrupted by background noise, online detection is a critical process under LHC conditions. Therefore, optimal filter detection of the muon signal, which maximizes the signal-to-noise ratio, looks attractive.

Our database is formed from experimental projective data collected at η = 0.45 (D2 cell), comprising 19907 (muon data) and 19896 (background noise) events. Due to a problem in the acquisition system, only 14 of the 16 samples available for each event were used in the filter design. In order to improve the signal-to-noise ratio, we have also summed the two readout signals from each detector cell.

4.3 Muon signal detection

For comparison, we consider three muon detection approaches: peak value and both deterministic and stochastic matched filters. They will be briefly described in this section.


4.3.1 Peak detector

Peak detector is a simple and useful technique for signal detection. For an incoming event, the detector compares the amplitude of the pulse signal with a detection threshold, which is determined from the background noise distribution. If the peak value is above the threshold, a muon signal is detected. Although simple, peak detection results in classifiers with reasonable efficiency in many applications.
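The peak detection rule can be sketched in a few lines. This is only an illustrative sketch: the function names, the toy sample values and the rule used to derive the threshold from the noise distribution (mean of the noise peaks plus k standard deviations) are assumptions made here, not the procedure used in the lab code.

```python
import statistics


def peak_threshold(noise_events, k=3.0):
    """Assumed rule: mean of the noise peaks plus k standard deviations."""
    peaks = [max(event) for event in noise_events]
    return statistics.mean(peaks) + k * statistics.pstdev(peaks)


def peak_detect(event, threshold):
    """Flag a muon when the largest sample of the event exceeds the threshold."""
    return max(event) > threshold


# Toy background events (hypothetical ADC counts, not Tilecal data).
noise_events = [[2, 1], [4, 0], [3, 4], [4, 2], [5, 1], [0, 5], [7, 3], [9, 2]]
threshold = peak_threshold(noise_events, k=1.0)
print(threshold, peak_detect([1, 8], threshold), peak_detect([7, 0], threshold))
```

A larger k lowers the false alarm rate at the price of missing more low-amplitude muon pulses.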

4.3.2 Matched filters

Matched filter detection treats the received signal using classical hypothesis testing theory[20]. In our case, two hypotheses (binary detection) should be considered:

H1: r(k) = s(k) + n(k), k = 1 . . . N (2)

H0: r(k) = n(k) (3)

where s(k) and n(k) are samples of muon and noise signals, respectively. For a given acquisition window, the number of samples (N ) is fixed (14, in our case).

The decision is based on the evaluation of the likelihood ratio:

\[ \Lambda(r) = \frac{f_{r|H_1}(r)}{f_{r|H_0}(r)} \tag{4} \]

which is compared with a detection threshold (η). In this formula, f_{r|Hi} represents the distribution of the received signal under hypothesis Hi[26]. Classical criteria to define the η value have been used[30]:

• Bayes

Using the Bayes criterion, costs are assigned to decisions, producing the following equation:

\[ \eta = \frac{P(H_0)\,(C_{10} - C_{00})}{P(H_1)\,(C_{01} - C_{11})} \tag{5} \]

where P(Hi) is the probability of hypothesis Hi occurring and Cij is the cost assigned to the decision taken in favor of hypothesis i when hypothesis j is actually true.

• MAP (Maximum a posteriori probability)

This is a simplification of the Bayes decision rule in which the costs of correct and wrong decisions are set to zero and one, respectively. The drawback of both the Bayes and MAP criteria is that they require the a priori probabilities, which are normally not available.


• Minimax

This rule is normally used when the costs are known but the a priori probabilities are not. The decision threshold is derived by minimizing the expected cost corresponding to the worst-case value of P(H1).

• NP (Neyman-Pearson)

Under the NP criterion, η is chosen in order to maximize the detection probability while keeping the false alarm probability below some specified value.
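As a small numerical illustration of the Bayes and MAP criteria listed above, the threshold of Equation 5 can be evaluated directly. The function names and the prior probabilities used in the example are hypothetical; the cost convention (Cij is the cost of deciding i when j is true) follows the text.

```python
def bayes_threshold(p_h0, p_h1, c00=0.0, c01=1.0, c10=1.0, c11=0.0):
    """Equation 5: eta = P(H0)(C10 - C00) / (P(H1)(C01 - C11))."""
    return (p_h0 * (c10 - c00)) / (p_h1 * (c01 - c11))


def map_threshold(p_h0, p_h1):
    """MAP: costs of correct and wrong decisions are zero and one."""
    return bayes_threshold(p_h0, p_h1)


# Hypothetical priors: background nine times more frequent than muons.
print(map_threshold(0.9, 0.1))
# Same priors, but a false alarm (C10) is taken as twice as costly as a miss.
print(bayes_threshold(0.9, 0.1, c10=2.0))
```

With equal priors the MAP threshold reduces to η = 1, so the decision simply favors the more likely hypothesis.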

To derive the decision equations, both signal and noise are typically considered as multivariate Gaussian processes. In the deterministic approach, the following approximations are made: s(k) is estimated from the mean value of r(k) under hypothesis H1, and the noise is considered white[28]. This produces the following decision rule:

\[ \sum_{n=1}^{N} r(n)\,s(n) > \frac{1}{2}\sum_{n=1}^{N} s(n)^2 + \frac{N_o}{2}\,\ln(\eta) \quad \text{(decide for } H_1\text{)} \tag{6} \]

where N_o/2 is the noise variance. As can be seen, the deterministic approach is attractive due to its inherent simplicity.
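The deterministic matched filter reduces to a correlation between the received samples and a fixed template. The sketch below assumes that the template is the sample-wise mean of events recorded under H1 and treats the decision level as a free parameter (for instance, one chosen by the Neyman-Pearson criterion); the function names and the toy numbers are illustrative only.

```python
def mean_pulse(signal_events):
    """Estimate s(n) as the sample-wise mean of events recorded under H1."""
    count = len(signal_events)
    length = len(signal_events[0])
    return [sum(event[n] for event in signal_events) / count for n in range(length)]


def matched_filter_output(event, template):
    """Correlate the received event r(n) with the template s(n)."""
    return sum(r * s for r, s in zip(event, template))


def decide_h1(event, template, level):
    """Decide for H1 when the correlation exceeds the decision level."""
    return matched_filter_output(event, template) > level


# Toy events with three samples each (hypothetical values).
template = mean_pulse([[0, 2, 4], [2, 4, 0]])
print(template)
print(decide_h1([0, 3, 2], template, level=10.0))
print(decide_h1([1, 0, 0], template, level=10.0))
```

For 14-sample events this costs 14 multiply-accumulate operations per event, which is why the approach maps well onto a DSP.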

In the stochastic approach, the muon signal (s) is treated as a random process (which is actually the case) and, as such, it is expanded using the Karhunen-Loève (KL) series[26]. The KL series is a Fourier-like series that decouples the time and statistical dependencies of the random process. Considering Ks as the correlation matrix of s, Ks can be factored as (similarity transformation)[29]:

\[ K_s = \Phi_s \Lambda_s \Phi_s^T \tag{7} \]

where Λs is a diagonal matrix whose entries are the eigenvalues (λi) of Ks, sorted in decreasing order. The columns of the matrix Φs correspond to the eigenvectors of Ks.

Using the directions of Φs that are most relevant in terms of signal representation (higher energy, i.e., higher eigenvalues), s can be reconstructed by truncating the KL series at the M-th term, as follows:

\[ s_M = \Phi_M \Phi_M^T s \tag{8} \]

where ΦM is formed by selecting the M most energetic columns of Φs.

The stochastic approach produces the following decision equations[31]:

\[ d = I_R + I_D \tag{9} \]

where:

\[ I_R = \frac{1}{N_o} \sum_{i=1}^{M} \frac{\lambda_i^2}{\lambda_i^2 + \frac{N_o}{2}} \sum_{n=1}^{N} \left( r(n)\,\phi_i(n) \right)^2, \qquad M \ll N \tag{10} \]

\[ I_D = \sum_{i=1}^{M} \frac{1}{\lambda_i^2 + \frac{N_o}{2}} \sum_{n=1}^{N} r(n)\,\phi_i(n)\,m(n)\,\phi_i(n) \tag{11} \]

for which m(n) is the mean value of s(n). Forming the detection value (d), ID and IR account for the detection of the deterministic and random parts of the signal process, respectively. It is interesting to observe that components with higher signal-to-noise ratio, i.e., with higher values of λi²/(λi² + N_o/2), contribute more significantly to the IR part.

Sometimes the covariance matrix of the noise (KN) shows that the noise process is not white. The matched filter is only optimal in the sense of signal-to-noise ratio if the signal to be detected is corrupted by additive white noise. Therefore, when the signal is corrupted by colored noise, a whitening filter is required, so that the detection task is again performed in a white noise environment[32]. In this case, the filter should be matched to the signal obtained at the output of the whitening filter when the signal to be detected is applied to its input. Using the whitening filter, the received signal is linearly transformed according to the formula:

\[ r_w = W r \tag{13} \]

where W is the whitening transform given by:

\[ W = \Lambda_N^{-\frac{1}{2}}\,\Phi_n \tag{14} \]

considering ΛN and Φn as the matrices related to the similarity transformation of the noise covariance matrix.

5 Detection Performance

We compare the three detection approaches discussed above for the muon detection problem: peak detector and deterministic and stochastic matched filters. The stochastic approach used a whitening filter for the background noise. The decision criterion adopted was Neyman-Pearson, for which the false alarm probability was fixed at 10%. Results are summarized in Table 2. As we can see, the stochastic matched filter has the best performance among the three approaches. It is interesting to observe that the matched filter technique permits the implementation of robust detection systems, even if the signal distribution is not Gaussian and the noise is not white, as is actually the case for the muon detection problem.

The selection of the detection criterion is dependent on the application requirements. If a simple and easy implementation is needed, peak detector can be a good choice. If performance is critical, stochastic matched filter is indicated. Deterministic matched filter is an intermediate solution, exhibiting high detection performance allied to easy and fast implementation.

Method              Efficiency
Peak Detector       88.5 %
Deterministic MF    90.5 %
Stochastic MF       93.5 %

Table 2: Comparison of detection methods: detection efficiency for 10% false alarm probability.
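The efficiencies in Table 2 are obtained at a fixed false alarm probability. A minimal sketch of how such an operating point can be found from data is given below; it assumes that the detector outputs for background and muon events are available as plain lists, and the function names and toy values are hypothetical.

```python
def np_threshold(noise_outputs, max_false_alarm=0.10):
    """Lowest observed level exceeded by at most max_false_alarm of the noise outputs."""
    ordered = sorted(noise_outputs)
    allowed = int(max_false_alarm * len(ordered))  # noise events allowed above the level
    return ordered[max(0, len(ordered) - 1 - allowed)]


def efficiency(signal_outputs, threshold):
    """Fraction of signal events whose detector output exceeds the threshold."""
    return sum(1 for value in signal_outputs if value > threshold) / len(signal_outputs)


# Toy detector outputs (hypothetical values, not the Tilecal results).
noise_outputs = list(range(1, 21))
signal_outputs = [15, 17, 19, 21, 23, 25, 27, 29, 31, 33]
threshold = np_threshold(noise_outputs, max_false_alarm=0.10)
print(threshold, efficiency(noise_outputs, threshold), efficiency(signal_outputs, threshold))
```

Applying the same procedure to the output of each detector gives efficiencies that are directly comparable, since all of them are quoted at the same false alarm probability.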

6 The Lab Session

The work for each lab session was split into two phases: the first concerns the design of the digital matched filter and the second covers its implementation on the DSP platform. Both phases were divided into several steps in order to allow students to understand the concepts and the techniques used. For each step, students run the code provided, analyze performance plots and answer some questions.

6.1 Matched filter design

For the matched filter design, the student works with three didactic modules: data analysis, whitening filter design and matched filter development. For this, scripts were developed for the MATLAB environment.

For the data analysis module, students perform both qualitative (observe signal shape, duration and amplitude) and quantitative (plot histograms and parameters) analysis of the muon signals. Figure 7 illustrates the distribution of the sample values of the muon signal. Students have to face questions concerning the signal-to-noise ratio in this experiment and whether the distributions for muon and background events would be Gaussian or not.

For the whitening filter module, the students have to observe the covariance matrix of the noise distribution. The students should characterize it (whether it comes from a white process or not) and give a rough estimate of the noise variance. They should also understand the resulting covariance matrix produced when the whitening filter is applied.

[Figure 7: distribution of the sample values of the muon signal; axes: amplitude value, incidence.]

For the design phase, students analyze histograms of the d-value (Equation 9) considering muon and background events, aiming to establish an efficient range for the detection threshold (η). This first evaluation of the d-value considers all components (M = N) in the KL series expansion. In order to select η by the Neyman-Pearson criterion, students evaluate a design curve, which shows how the detection (PD) and false alarm (PF) probabilities vary as a function of the η value. This plot is reproduced in Figure 8.

Figure 8: Detection threshold selection: how both PD and PF vary as a function of η.

Using the selected η, students can plot the probability of detection as a function of the number of components (NC), selecting the minimum number of components for achieving a given level of detection efficiency. This is reproduced in Figure 9. Since the η value depends on NC, students iterate to design an optimal filter that complies with both complexity (number of components) and performance requirements.

Figure 9: Probability of detection as a function of the number of retained components in the KL expansion.

6.2 DSP implementation

In this phase, the students first analyze the implementation of a muon detector based on peak detection. The system was coded in C and uses the VISUAL-DSP software and the EZ-KIT development board. Some relevant aspects concerning the implementation are discussed with the students, such as the internal architecture of DSP devices, basic operation of DSP tools, why we use a DSP for this detection system, code compilation and optimization, performance analysis, and real-time application development. The students become familiar with the VISUAL-DSP interface and have to explain how the peak detector code works, evaluating its computational cost. In addition, they have to determine the maximum event rate achieved by this detection system when it is compiled with optimization switches.

Next, a prototype of a muon detection system based on matched filters is shown. This prototype illustrates the online operation of the system. The implementation uses the EZ-KIT Lite development board[4], which is based on the ADSP-21065 device and is quite similar to the EZ-KIT platform. Through push-buttons, the student selects the signal type (muon or background) and the amount (single or burst) of events to be processed by the system, while the LEDs flash according to the classification results.

7 Conclusions

In this lab session, we covered some aspects of digital signal processing. Digital signal processors were presented and compared to other devices such as FPGAs, microcontrollers and general purpose processors. A detailed overview of ADSP-21160 DSP architecture and its development tools was also given.

As a practical design, the lab detailed the implementation of a muon detection system, using data from Tilecal. The theory of signal detection and practical aspects concerning the development and operation of the detection system were also discussed.

A Data Exploitation and Filter Design

Objective:

The student will implement a muon detector system based on a matched filtering technique. Experimental calorimeter data will be used for filter development. Several stages which concern the development of the matched filter (data analysis, specification and performance evaluation) will be exercised.

Module I: Data Analysis

In this module, the student will observe the waveforms associated to physics events and background noise.

• Step 1 - Analyzing events qualitatively
Type the command show events in the MATLAB command window. This command loads the dataset used and produces some useful plots. Data are loaded in two matrices: physics (muon data) and non-physics (background noise). Each line in the matrices corresponds to one single event. Observe the waveforms. Describe them in terms of shape, time duration and amplitude. Is the signal-to-noise ratio (SNR) high?

• Step 2 - Analyzing probability distributions
Use the command show pdf to show event distributions. Characterize the observed distributions. Using the zoom facility of each graphic window, evaluate the mean and RMS values for physics and background events. Should they be Gaussian?

Module II: Whitening filter design

Noise characterization is fundamental for proper matched filter development.

• Step 1 - Characterizing white noise covariance matrix.
Type the command produce cov white noise. This command will exhibit a covariance matrix for a synthesized white Gaussian noise. Using the command show cov reference noise, observe the graphic produced.

• Step 2 - Observing the covariance matrix for noise.
Repeat Step 1 above using the commands: produce cov non physics and show cov non physics. Observe the covariance matrix produced. Would you say that background events are white in practice? Give a rough estimate for the variance of noise events.

• Step 3 - Whitening
Principal Components Analysis (PCA) can be used for whitening background noise events. Consider x as an arbitrary event (column vector). PCA will produce a transformed vector y according to the rule:

y = Tx (15)

This projects incoming events onto directions (eigenvectors of the covariance matrix) of increasing energy representation. For the whitening operation, the covariance matrix of the transformed events y becomes white if the T matrix is chosen as follows:

T = Q.Λ−1


• Step 3.1 - Observing original and transformed events.
Type the command: show event transformed. Observe the graphics.

• Step 3.2 - Showing covariance matrix of transformed events.
Repeat Step 3.1 using the commands: produce cov transformed events and show cov transformed events. What do you observe? Is the noise process white now?
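The effect observed in Module II can be checked numerically on a small case. The sketch below is an assumption-laden illustration rather than the lab scripts: it restricts events to two samples so that the eigen-decomposition has a closed form, builds the whitening matrix with rows equal to the covariance eigenvectors scaled by the inverse square root of their eigenvalues (the form of Equation 14), and verifies that the transformed covariance is the identity. The function names and the example covariance are hypothetical.

```python
import math


def whitening_matrix_2x2(cov):
    """Whitening matrix for a symmetric positive-definite 2x2 covariance."""
    (a, b), (_, c) = cov
    theta = 0.5 * math.atan2(2.0 * b, a - c)   # rotation that diagonalizes cov
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_gap = math.hypot(0.5 * (a - c), b)
    lam1 = 0.5 * (a + c) + half_gap            # larger eigenvalue
    lam2 = 0.5 * (a + c) - half_gap            # smaller eigenvalue
    # Rows: eigenvectors scaled by 1/sqrt(eigenvalue).
    return [[cos_t / math.sqrt(lam1), sin_t / math.sqrt(lam1)],
            [-sin_t / math.sqrt(lam2), cos_t / math.sqrt(lam2)]]


def transform_covariance(w, cov):
    """Covariance of y = W x, that is, W * cov * W^T."""
    wk = [[sum(w[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(wk[i][k] * w[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical colored-noise covariance (correlated samples, unequal variances).
noise_cov = [[4.0, 2.0], [2.0, 3.0]]
w = whitening_matrix_2x2(noise_cov)
print(transform_covariance(w, noise_cov))
```

The printed matrix is the identity up to rounding, which is what a white covariance matrix looks like after the transformation.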

Module III: Filter Design

Matched filters produce a decision variable whose evaluation permits the detection of target events. For stochastic signal detection, the decision variable (d) is given by:

d = IR + ID (17)

where these components correspond to the random and deterministic parts of the process.

An incoming event is considered as signal if

d > η (18)

where η is the decision threshold. For our problem, physics events that satisfy the last equation contribute to the detection probability (PD). The noise signals which also satisfy this equation are misclassified as signal and contribute to the false-alarm probability (PF).
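The definitions of PD and PF above translate directly into event counting. The sketch below assumes that the decision variable d has already been computed for every physics and background event; the function names and the toy d-values are hypothetical. Sweeping η produces the points of a ROC curve.

```python
def detection_rates(signal_d, noise_d, eta):
    """Return (PD, PF): fractions of signal and noise events with d > eta."""
    pd = sum(1 for d in signal_d if d > eta) / len(signal_d)
    pf = sum(1 for d in noise_d if d > eta) / len(noise_d)
    return pd, pf


def roc_points(signal_d, noise_d, etas):
    """One (PD, PF) pair per candidate threshold."""
    return [detection_rates(signal_d, noise_d, eta) for eta in etas]


# Toy decision values (hypothetical).
signal_d = [3, 5, 6, 8]
noise_d = [1, 2, 4, 5]
print(detection_rates(signal_d, noise_d, eta=4.5))
print(roc_points(signal_d, noise_d, etas=[0.5, 4.5, 8.5]))
```

Raising η lowers both probabilities, so the threshold is a trade-off between missed muons and false alarms.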

• Step 1 - Defining the decision threshold (all components)
In order to establish the decision threshold, the Neyman-Pearson (NP) criterion is often used. According to NP, η is chosen to maximize the detection probability while maintaining PF below a specified value. Type show hist eta. Observe the histograms produced and the messages presented in the MATLAB command window. Define a range for η.

• Step 2 - Choosing ETA (all components)
Type start eta = v1 and end eta = v2, with the v1 and v2 values determined in Step 1 above. Type choose eta. Analyze the plots shown and define an η value for a false-alarm probability fixed at 10%. Compare PD and PF for the chosen η value.

Type plot ROC curve. Explain the resulting graph.

• Step 3 - Establishing the number of principal components (signal compaction)
It is necessary to determine the number of principal components used to produce the ID and IR variables. Type eta = v1 (v1 value defined in the last item). Then type evaluate nc mf. Identify the number of relevant components necessary for proper operation of the matched filter.

• Step 4 - Design your matched filter.
Choose the number of components and the η value. Evaluate filter performance by computing PD and PF.

OBS: In order to modify the number of components in ID and IR computation, you should modify the N C variable. Example: typing N C = 2 <ENTER> in MATLAB command window, you select two components.

B DSP Execution

Module IV: Running a Peak Detector in DSP

Objective:

The student will execute an online muon detection system using signal peak information. This system was built using DSP technology.

Basic Understanding of the Application:

• Using Visual DSP Interface, open the project peak detector.dpj.
• Observe the main routine. What does it do?

Performance Analysis and Optimizations:

In this module, we evaluate the computational cost of the proposed detection system. The functions init computational cost measure and measure computational cost will be used to estimate the number of machine cycles involved in peak detection.

• Step 1: Compile the application and verify the number of machine cycles used by the routine which detects the peak of the incoming events. For a clock frequency of 80 MHz, what is the maximum event rate achieved?
• Step 2: Go to PROJECT > PROJECT OPTIONS > COMPILE. Enable the flags INTERPROCEDURAL OPTIMIZATION and ENABLE OPTIMIZATION. Re-compile the application and repeat Step 1.
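The event rate asked for in Step 1 follows from the cycle count: if one event takes a given number of machine cycles, the processor can handle at most the clock frequency divided by that number of events per second. The short sketch below illustrates the arithmetic; the cycle counts are hypothetical placeholders, to be replaced by the values measured on the board.

```python
def max_event_rate(clock_hz, cycles_per_event):
    """Upper bound on events per second when events are processed back to back."""
    return clock_hz / cycles_per_event


CLOCK_HZ = 80e6  # 80 MHz clock, as stated in Step 1

# Hypothetical cycle counts before and after enabling compiler optimization.
for label, cycles in (("unoptimized", 200), ("optimized", 50)):
    rate = max_event_rate(CLOCK_HZ, cycles)
    print(f"{label}: {cycles} cycles per event, {rate / 1e3:.0f} kHz")
```

This bound ignores data transfer and interrupt overhead, so the rate sustained in a real-time system would be somewhat lower.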


References

[1] Level-1 trigger technical report. Technical report, ATLAS TDR-12, 1998.

[2] J. G. Ackenhusen. Real Time Signal Processing: Design and Implementation of Signal Processing Systems. Prentice Hall, 1999.

[3] Analog Devices. JTAG emulator boards. http://www.analog.com/processors/resources/crosscore/emulators/index.html.

[4] Analog Devices. ADSP-21065L EZ-KIT Lite Evaluation System Manual, 3 edition, December 2000.

[5] Analog Devices. ADSP-21160M Datasheet, 2001.

[6] Analog Devices. ADSP-21160: SHARC DSP Hardware Reference, 2 edition, May 2002.

[7] Analog Devices. ADSP-21160 EZ-KIT Lite: Evaluation System Manual, 3 edition, January 2003.

[8] Analog Devices. Visual DSP++ 3.0 Manual: C/C++ Compiler and Library Manual for SHARC DSPs, 4 edition, January 2003.

[9] Analog Devices. Visual DSP++ 3.0 Manual: Getting Started Guide for SHARC DSPs, 4 edition, January 2003.

[10] Analog Devices. Visual DSP++ 3.0 User Guide for SHARC DSPs, 4 edition, January 2003.

[11] ARIZTIZABAL, F., et al. Calorimeter with longitudinal tile configuration. Nuclear Instrumentation and Methods, A349, 1994.

[12] ATLAS HLT/DAQ/DCS Group. ATLAS: High-level trigger, data acquisition and controls. Technical report, CERN, October 2003.

[13] Arnold Berger. Embedded Systems Design: An Introduction to Processes, Tools and Techniques. CMP Books, 2002.

[14] Stephen Brown and Zvonko Vranesic. Fundamentals of Digital Logic with VHDL Design. McGraw-Hill, 2000.

[15] CERN. The atlas home page. http://atlas.web.cern.ch/Atlas/Welcome.html.

[16] CERN. The large hadron collider project.

[17] J. H. McClellan, R. W. Schafer, and M. A. Yoder. DSP First: A Multimedia Approach. Prentice Hall, 1998.

[18] Paulo Sergio Ramirez Diniz, Eduardo Antônio Barbosa da Silva, and Sergio Lima Netto. Digital Signal Processing: System Analysis and Design. Cambridge University Press, 2002.

[19] D. D. Gajski. Principles of Digital Design. Prentice Hall, 1997.

[20] C.W. Helstrom. Elements of Signal Detection and Estimation. Prentice-Hall, 1995.

[21] Kenn Hwang and Felix Antony Briggs. Computer Architecture and Parallel Processing. McGraw-Hill, 5th edition, 1989.

[22] Intel. Intel Pentium M Processor on 90nm Process with 2-MB L2 Cache, May 2004.

[23] John Iovine. PIC Microcontroller Project Book. McGraw-Hill, 2000.

[24] G. Marven and G. Ewers. A Simple Approach to Digital Signal Processing. John Wiley & Sons, 1996.

[25] Alan V. Oppenheim and Ronald W. Schafer. Discrete-Time Signal Processing. Prentice Hall, 1989.

[26] A. Papoulis. Probability, Random Variables and Stochastic Processes. McGraw-Hill, 1984.

[27] Adel S. Sedra and Kenneth C. Smith. Microelectronic Circuits. Oxford University Press, 1998.

[28] K. Sam Shanmugan and A. M. Breipohl. Random Signals - Detection, Estimation and Data Analysis. John Wiley & Sons, 1988.

[29] G. Strang. Linear Algebra and Its Applications. Saunders, 1980.

[30] H. L. Van Trees. Detection, Estimation, and Modulation Theory - Part I: Detection, Estimation and Linear Modulation Theory. John Wiley & Sons, 1968.

[31] H. L. Van Trees. Detection, Estimation, and Modulation Theory - Part III: Radar-Sonar Signal Processing and Gaussian Signals in Noise. John Wiley & Sons, 1971.