Implementation Analysis of the Hardware/Software Codesign
6.2 Main contributions
to accurately estimate the UAV position as well as the maximum of three observed landmarks for the duration of the simulation. The Serial design offers up to a 1.3×
speedup and the Parallel (2 PE) design offers up to a 1.7× speedup over a Matlab implementation when three landmarks are being observed.
Chapter 5 presents implementation results for a variety of situations, including resource usage, power consumption and timing latency, demonstrating the flexibility of the codesign library, which allows for different parameterisation schemes. The first example application is the same nanosatellite application presented earlier, with results for the Serial design and the 1, 2, 5 and 10 PE cases of the Parallel and Pipeline designs. The second application is a theoretical application where the number of observation variables is greater than the number of state variables in order to explore any potential biases; results for the Serial design and the 1, 2, 5 and 10 PE cases of the Parallel and Pipeline designs are given. The final application is another theoretical application where the number of processing elements per module is varied; four different parameterisation schemes are explored, for the Parallel design only. A closer look at the impact of the number of processing elements on the sig_gen, predict and update steps of the Parallel design is given; it is seen that the Cholesky Decomposition largely does not benefit from more processing elements and acts as a drag on performance. Finally, the impact of the number of augmented state variables on the latency of the design is examined. The Serial design is shown to have similar time complexity, O(M^2.8), to microprocessor-based implementations of the UKF.
The Parallel design reduces to quadratic complexity for numbers of augmented state variables comparable to the number of processing elements but tends to O(M^2.5) for larger numbers. For numbers of augmented state variables equal to the number of processing elements or less, the complexity is below quadratic.
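Complexity exponents such as O(M^2.8) are typically read off as the slope of latency against the number of augmented state variables on a log-log scale. The following is a minimal sketch of that fit, not the procedure used in this thesis; the function name fit_exponent and the latency values are hypothetical, and the data are synthetic rather than measured.

```python
import math


def fit_exponent(sizes, latencies):
    """Return the least-squares slope of log(latency) against log(size).

    For latency ~ c * M**p the slope is an estimate of the exponent p.
    """
    xs = [math.log(m) for m in sizes]
    ys = [math.log(t) for t in latencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic latencies generated from an exact power law (assumption:
# these are NOT measurements from the thesis, only illustrative input).
sizes = [4, 8, 16, 32, 64]
synthetic_latencies = [1e-6 * m ** 2.8 for m in sizes]

exponent = fit_exponent(sizes, synthetic_latencies)
print(f"fitted exponent: {exponent:.3f}")
```

With measured latencies the fitted slope would only be approximate, and fitting separate ranges of M is one way to expose a change in behaviour such as the one described for the Parallel design.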
6.2 Main contributions
The main contribution of this thesis is the hardware/software (HW/SW) codesign of the Unscented Kalman Filter (UKF). A need for fast and accurate state estimation
for small aerospace systems was identified. The need for high performance in these systems is offset by the desire to limit overall costs, which leads to a reduction in available physical space, computing power and electrical power; it is strongly desired to simplify development processes as well. Hardware approaches, such as using a Field Programmable Gate Array (FPGA), can provide the level of performance required and, if using a System-on-Chip, can adhere to severe physical and electrical power constraints; however, FPGAs increase development complexity compared to software approaches and so do not necessarily reduce costs.
A HW/SW codesign takes the performance gains of a hardware approach and combines them with the flexibility and portability of a software approach. The portability means the development costs of subsequent aerospace systems are reduced, potentially back down to feasible levels. When the HW/SW codesign methodology is applied to a prolific state estimation algorithm such as the UKF, the result is a high performance state estimation implementation that is also widely applicable and could be used in a generic state estimation library.
The proposed HW/SW codesign of the UKF described in this thesis splits the application-specic and the non-application application-specic parts of the UKF algorithm and implements the application-specic parts in software while implementing the non-application-specic parts in hardware as a parameterisable IP core. This allows the HW/SW codesign to make use of the simpler software development processes when moving to a new application, while still enjoying hardware acceleration for the remainder of the algorithm. The proposed HW/SW codesign includes three variations: the Serial de-sign, the Parallel design and the Pipeline design. The Serial design is the most basic and only provides a direct implementation of the UKF; the Serial design uses the least amount of resources. The Parallel design makes use of parallelism in its major datapaths to provide performance boosts; the Parallel design can use a low or high amount of resources depending on the exact parameterisation scheme. The Pipeline design makes use of top-level parallelism, in addition to parallelised datapaths, to calculate multiple instances of the UKF at once; the Pipeline design uses the most amount of resources. The overall theme of these variants is that a system designer
can choose the balance between resources used and performance as they desire. Thus, the proposed HW/SW codesign is both a portable and scalable implementation of the UKF.
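The split between application-specific and generic parts can be illustrated in software alone. The sketch below is only an analogy, assuming a basic unscented transform with a single hypothetical scaling parameter kappa: sig_gen and predict stand in for the generic steps that the IP core would implement, while constant_velocity is a made-up application-specific process model supplied by the user. The signatures are assumptions and do not reflect the actual IP core interface; state augmentation, process noise and the update step are omitted for brevity.

```python
import math


def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sig_gen(mean, cov, kappa=1.0):
    """Generic step: build 2n+1 sigma points and their weights."""
    n = len(mean)
    scale = n + kappa
    root = cholesky([[scale * c for c in row] for row in cov])
    points = [list(mean)]
    for j in range(n):
        column = [root[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, column)])
        points.append([m - c for m, c in zip(mean, column)])
    weights = [kappa / scale] + [0.5 / scale] * (2 * n)
    return points, weights


def predict(points, weights, process_model):
    """Generic step: propagate sigma points, then recombine them."""
    moved = [process_model(p) for p in points]
    n = len(moved[0])
    mean = [sum(w * p[i] for w, p in zip(weights, moved)) for i in range(n)]
    cov = [
        [
            sum(w * (p[i] - mean[i]) * (p[j] - mean[j])
                for w, p in zip(weights, moved))
            for j in range(n)
        ]
        for i in range(n)
    ]
    return mean, cov


def constant_velocity(state, dt=0.1):
    """Application-specific part (hypothetical): position/velocity model."""
    position, velocity = state
    return [position + dt * velocity, velocity]


points, weights = sig_gen([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
mean, cov = predict(points, weights, constant_velocity)
print(mean, cov)
```

Moving to a new application in this sketch means replacing only constant_velocity; sig_gen and predict are untouched, which mirrors the portability argument made above.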
The proposed HW/SW codesign is implemented in two illustrative example applications for validation. A nanosatellite application with two related situations, a single nanosatellite and a nanosatellite constellation, is presented. Here, the UKF is part of the attitude determination subsystem of the nanosatellite. The HW/SW codesign is found to completely replicate the UKF with no functionality issues and provides modest performance boosts over similar purely software implementations. The second example application is the state estimation part within a Simultaneous Localisation and Mapping (SLAM) system on a small Unmanned Aerial Vehicle (UAV).
The HW/SW codesign once again provides modest performance boosts over purely software implementations. These two example applications are representative of the aerospace systems the HW/SW codesign is targeted at and they show the HW/SW codesign does indeed boost performance while retaining portability.
A series of deeper analyses of the HW/SW codesign's physical implementation is also presented. The HW/SW codesign is implemented for a variety of parameterisation schemes in three example applications. The implementation of the nanosatellite application used for validation is expanded; a theoretical application with a large number of observation variables is implemented; and, finally, an application where the number of processing elements (PEs) varies between modules is implemented. Two further analyses of the effect of the number of processing elements and augmented state variables on the latency of the IP core are given. These example applications and analyses show the flexibility of the IP core, allowing the system designer to optimise the performance of the IP core if they desire, but still providing adequate performance if they do not. They also show that the HW/SW codesign, at worst, scales as well as an ordinary software implementation of the UKF but, at best, scales far better; the choice is up to the system designer to use resources to gain additional performance.
Thus, this thesis describes a scalable, portable, FPGA-based implementation of the UKF which makes use of HW/SW codesign techniques to provide a foundation for a