Optimizations for Specific Curves or Algorithms

6.5 Literature Review of ECC Implementations

6.5.4 Optimizations for Specific Curves or Algorithms

over all sets of parameters, e.g., coordinate systems. However, structures optimized for certain operations have been presented, and they provide considerable speed increases at the expense of reduced generality. A specific structure for Montgomery point multiplication was presented in [241]. Point addition and point doubling are computed in parallel with processing units built around one multiplier and several registers. The control logic is very simple: only the bit k_i is used for selecting the inputs of the point addition and point doubling processing units; see Alg. 6.2. However, the structure does not include logic for (6.12), i.e., it outputs only the X and Z coordinates of the result point, Q. A similar approach was presented in [262], but that implementation also included a specific processing unit for coordinate conversions.
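The control flow of Montgomery point multiplication can be sketched in Python as a generic ladder. This is only a minimal illustration, not the structure of [241] or a transcription of Alg. 6.2: the function name `montgomery_ladder` and the abstract `add` and `double` callables are assumptions made for this sketch, and the usage lines substitute integers modulo a small number for curve points so that the result is easy to check. In hardware, the two operations of each iteration would run in parallel, and only the X and Z coordinates would be tracked.

```python
def montgomery_ladder(k, P, add, double):
    """Return k*P for an integer k >= 1 using abstract group operations.

    `add` and `double` are hypothetical stand-ins for the point addition
    and point doubling processing units.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Invariant maintained throughout the loop: p2 - p1 == P.
    p1, p2 = P, double(P)
    # Scan the bits of k from the second most significant one downwards.
    for i in range(k.bit_length() - 2, -1, -1):
        if (k >> i) & 1:
            # Bit k_i = 1: the addition result goes to p1, and p2 is doubled.
            p1, p2 = add(p1, p2), double(p2)
        else:
            # Bit k_i = 0: p1 is doubled, and the addition result goes to p2.
            p1, p2 = double(p1), add(p1, p2)
    return p1


# Toy usage: integers modulo n stand in for curve points (an assumption
# made only so that the result can be checked by ordinary arithmetic).
n = 1009
add_mod = lambda a, b: (a + b) % n
double_mod = lambda a: (2 * a) % n
result = montgomery_ladder(77, 5, add_mod, double_mod)
```

Both branches perform exactly one addition and one doubling, so the bit k_i only decides which operands feed each unit and where the results are stored.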

The structure of Fig. 6.2 can be optimized for Koblitz curves by attaching squarers to the registers holding Q [178]. This enhances the performance on Koblitz curves without sacrificing generality. The architecture of VI can be seen as an adaptation of the ideas of [241, 262] to Koblitz curves, although the architecture also utilizes point operation interleaving allowing even further improvements in speed. The structure of VII builds upon VI and introduces specific units for precomputations, τ-adic conversions, for-loop computation, and coordinate conversion.
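The benefit of squarers on Koblitz curves comes from the Frobenius map, which sends a point (x, y) to (x², y²) and therefore costs only field squarings. The following Python sketch illustrates this on a toy example. The field F_{2^4} with p(x) = x^4 + x + 1, the curve parameter a = 1, and all function names are assumptions chosen for illustration; they are far too small for real use and are not taken from [178] or from the cited publications.

```python
M = 4            # toy field degree (assumption; real designs use, e.g., m = 163)
POLY = 0b10011   # p(x) = x^4 + x + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply two elements of F_{2^M} in polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY   # reduce modulo p(x)
    return r


def gf_sqr(a):
    """Square a field element (a dedicated squarer in hardware)."""
    return gf_mul(a, a)


def on_koblitz_curve(x, y, a=1):
    """Check y^2 + xy = x^3 + a*x^2 + 1 with a in {0, 1}."""
    x2 = gf_sqr(x)
    lhs = gf_sqr(y) ^ gf_mul(x, y)
    rhs = gf_mul(x2, x) ^ (x2 if a else 0) ^ 1
    return lhs == rhs


def frobenius(point):
    """Frobenius map: (x, y) -> (x^2, y^2), two squarings and nothing else."""
    x, y = point
    return gf_sqr(x), gf_sqr(y)


# Enumerate the affine points of the toy curve and map each one.
points = [(x, y) for x in range(1 << M) for y in range(1 << M)
          if on_koblitz_curve(x, y)]
images = [frobenius(p) for p in points]
```

Because the curve coefficients lie in F_2, every image is again a point on the curve, which is why the map can replace point doublings in τ-adic point multiplication.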

in [144], can be seen as optimizations for specific algorithms. For example, a(x)b(x)/c(x) mod p(x) is useful for point operations in affine coordinates, A, but it is of little use with projective coordinates.
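To make the combined operation concrete, the following Python sketch evaluates a(x)b(x)/c(x) mod p(x) in a toy binary field. It is a software illustration under stated assumptions, not the hardware algorithm of [144]: the field F_{2^4} with p(x) = x^4 + x + 1 and the names `gf_mul`, `gf_inv`, and `gf_muldiv` are hypothetical, and the division is done by a separate inversion followed by a multiplication.

```python
M = 4            # toy field degree (assumption for illustration)
POLY = 0b10011   # p(x) = x^4 + x + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply two elements of F_{2^M} in polynomial basis."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY   # reduce modulo p(x)
    return r


def gf_inv(a):
    """Invert a nonzero element as a^(2^M - 2) (Fermat's little theorem)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    result = 1
    for _ in range(M - 1):
        a = gf_mul(a, a)            # a^2, a^4, ..., a^(2^(M-1))
        result = gf_mul(result, a)  # accumulates a^(2 + 4 + ... + 2^(M-1))
    return result


def gf_muldiv(a, b, c):
    """Compute a*b/c in F_{2^M}; c must be nonzero."""
    return gf_mul(gf_mul(a, b), gf_inv(c))


# The slope of an affine point addition, (y1 + y2)/(x1 + x2), is one such
# division; the coordinates below are arbitrary field elements, not
# necessarily points of any particular curve.
x1, y1, x2, y2 = 0b0011, 0b0101, 0b0110, 0b1001
slope = gf_muldiv(y1 ^ y2, 1, x1 ^ x2)
```

Affine point addition and doubling need one such division per operation, whereas projective formulas avoid divisions until the final coordinate conversion.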

6.5.5 Comparisons

Details of published ECC designs have been collected into Table 6.3. Fair comparison is extremely difficult (cf. AES implementations), and all issues discussed in Sec. 5.3.3 also cause difficulties for ECC comparisons. However, comparing ECC designs is even more difficult because the variety of implemented algorithms is larger and this variety has a great impact on implementation results. For instance, curves and field sizes have a major influence on both speed and area requirements, as can be seen, for example, from the tables in VI, which list implementation results with different field sizes. Nevertheless, certain implementations that represent the state of the art are given in the following, together with estimates of how the designs presented in III–VII, X, and XI compare with other published designs.

To the author’s knowledge, the fastest design for general curves was presented by Chelton and Benaissa in [57], where a point multiplication time of 19.55 µs was achieved on NIST B-163 with a Virtex-4 FPGA. The fastest design for Koblitz curves is given in VI for NIST K-163, on which point multiplication takes 4.91 µs in a Stratix II FPGA, excluding τ-adic conversions. When throughput is considered instead of computation time, the implementations presented in V and VII outperform other published designs because, with the exception of [270], they are the only ones optimized for high throughput; for the other designs, throughput is simply the inverse of computation time. The most compact implementations given in Table 6.3 use parameters which are insecure, but compact implementations using secure parameters are given, for example, by Liu et al. in [173] and Orlando and Paar in [220]. Compact implementations for both general and Koblitz curves are also listed in IV. However, very compact FPGA-based implementations are still missing from the literature. Support for arbitrary irreducible polynomials was achieved efficiently by Eberle et al. in [87] with a technique called partial reduction. They also included fixed reduction circuitry for specific irreducible polynomials in order to accelerate computation [87, 114, 115]. When support for both F_{2^m} and F_p is needed, a good option is the implementation presented in [255].

Table 6.3 shows that the designs of this thesis compare favorably with other designs from the literature. The implementations are faster than most other published implementations and, in fact, VI and VII, which utilize Koblitz curves, represent the fastest published ECC implementations. The fastest general curve implementation, which was presented in [57], outperforms the implementations presented in IV, but it uses a polynomial basis, whereas IV uses a normal basis. Table 6.3 also shows that Koblitz curves give considerable enhancements in speed compared to general curves. They are more than twice as fast as general curves, even including the time of τ-adic conversions. Koblitz curves are therefore highly feasible alternatives in applications requiring very fast computation.

The area requirements of the implementations of this thesis are quite large. They prevent using the implementations in constrained applications, but the area requirements are in line with other published implementations. Besides, the values given in Table 6.3 reflect only the area requirements of the fastest implementations, and smaller implementations were also presented in the publications. All architectures presented in the publications are also scalable in the sense that speed can be traded off for smaller area. In the publications, the target has been high speed, which has resulted in large area requirements but, for example, Tables VII and VIII of IV show that compact designs can be realized with the same architectures.

The use of fixed parameters, especially fixed fields, gives a large speed advantage compared to flexible designs and, hence, the implementations of this thesis are not comparable with flexible designs. It has been shown that fixed designs are at least about twice as fast as flexible designs [87]. However, because the implementations of this thesis use FPGAs, the disadvantages of fixed designs are reduced, as flexibility can be achieved in many cases by reprogramming the FPGA.

Chapter 7 Results

This chapter summarizes the main research results of this thesis and identifies their significance to the research field of cryptographic implementation. Details, such as exact computation times, are not considered in this chapter, but they are available in the appended publications. The contributions of this thesis are twofold, as they relate to either AES or ECC, and they are discussed separately in Secs. 7.1 and 7.2, respectively.

7.1 AES-related Contributions

AES was discussed in I, which surveyed and compared AES implementations, and II, which presented a fast AES implementation for FPGAs. The following two sections discuss their results.

In document Studies on high-speed hardware implementation of cryptographic algorithms (Page 102-105)