
Update Subtask Design Alternatives

In document Group Key Management (Page 129-133)

6 Real-Time and Batch Rekeying Processors

7.5 Design Approach and Performance Features

7.5.2 Update Subtask Design Alternatives

As mentioned before, the subtask Update includes all the processing steps starting with the instruction decoding and ending at determining the hash value of the last rekeying submessage. This subtask is both control-intensive, regarding the tree management, and computation-intensive, regarding the cryptographic key generation, encryption, and secure hashing. The realization of the subtask Update is affected by numerous factors, e.g.:

1. The tree management mode: static or dynamic.

2. The realization of cryptographic primitives: software, hardware, or hardware with resource sharing.

3. Memory and caching: BRAM or SDRAM, with or without caching.

4. Bus structure: PLB, OPB, or OCM.

Finding the optimal design alternative is a hard problem because of the strong interaction among these factors. For a comprehensive analysis, 108 design alternatives for the subtask Update were realized and evaluated. For each solution, the execution times of the worst-case disjoin and worst-case join operations were measured as a function of the group size, which ranges from 0 to 131,072. The estimation of execution times is based on Algorithm 7.2. For an efficient measurement, the test software (see Figure 7.6) performs a routine that initializes the group with 131,072 members stepwise and interrupts this initialization at the points corresponding to a worst-case join or worst-case disjoin. At these points, the join or disjoin operation is executed and measured before the initialization is continued. Lastly, the test software prepares the measured timing data for presentation using Mathematica [Wo07] and sends these data to the host over the UART interface. In the following, some representative design alternatives are discussed, which illustrate some interesting issues in the design of the HiFlexRP:

7.5.2.1 Bus selection and caching

As mentioned in Chapter 4, the embedded processor PPC405 features a Harvard architecture with dedicated data and instruction memory interfaces. System memories can be connected either to the Processor Local Bus (PLB) or to the On-Chip Memory bus (OCM). The decision on the appropriate memory bus must take the overall system and the running application into consideration. While the PLB offers 64-bit data buses, compared to 32-bit in the case of OCM, the latter is dedicated to memories, i.e. unlike the PLB, it is not shared with other system components. Another decision criterion relates to the cacheability of these memories. Because the OCM bus is dedicated to storage resources, memories connected to this bus do not support caching, in contrast to PLB memories. Therefore, several design alternatives with different memory and bus configurations are evaluated for the HiFlexRP design. In particular, the worst-case join costs were measured for three systems: one with OCM memories, one with cached PLB memories, and one with non-cached PLB memories. All these systems were tested in two cases:

1. Hardware-only realizations of the cryptographic primitives for key generation, encryption, and secure hashing, see Figure 7.7.

2. Software-only realizations of these operations, see Figure 7.8.

All these design alternatives use static tree management. Other implementations with dynamic tree management, which deliver comparable results, are not reported here for brevity.

Recall that the execution times measured here relate only to the subtask Update, i.e. the time needed to determine the digital signature of the hash value is not included. The diagrams presented in Figure 7.7 and Figure 7.8 confirm the logarithmic relation between rekeying costs and group size in the LKH algorithm. Note that the x-axis in these diagrams has a semi-logarithmic scale.
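The logarithmic relation can be illustrated with a minimal sketch. It assumes a balanced binary key tree, which is a simplification made here and not necessarily the tree layout of the HiFlexRP, and it uses the number of tree levels between a leaf and the root as a rough proxy for the rekeying cost. The function name `lkh_path_length` is hypothetical.

```python
def lkh_path_length(group_size: int, degree: int = 2) -> int:
    """Levels between a leaf and the root of a balanced key tree.

    Assumption: the tree is balanced with the given degree. The result is
    only a proxy for rekeying cost, not a measured execution time.
    """
    depth = 0
    capacity = 1
    # Grow the tree level by level until it can hold all members.
    # Integer arithmetic avoids floating-point rounding in log().
    while capacity < group_size:
        capacity *= degree
        depth += 1
    return depth


if __name__ == "__main__":
    # Each doubling of the group size adds exactly one level.
    for n in (2, 16, 1024, 131072):
        print(n, lkh_path_length(n))
```

Under these assumptions, each doubling of the group size adds one level to the path, which is why such costs appear as a roughly straight line when the x-axis is logarithmic.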

Obviously, the best performance of the subtask Update is obtained by using PLB instruction and data memories with caching, regardless of the implementation of the security modules and the tree management mode. In contrast, OCM memories are superior to PLB memories if the latter are used without caching. Therefore, the next experiments were performed on systems with cached PLB memories.

Figure 7.7. Worst-case join cost with hardware security modules

Figure 7.8. Worst-case join costs with software security functions

7.5.2.2 Hardware vs. software security modules

An essential design issue for the HiFlexRP relates to accelerating the time-consuming cryptographic operations. Figure 7.9 compares the performance of the subtask Update for three realizations of the cryptographic primitives: a software-only, a hardware-only, and a mixed HW/SW realization. The HW/SW alternative, referred to as HS in Figure 7.9, is based on a shared AES core for the different security functions. All these system realizations are based on dynamic tree management. Obviously, the hardware-only implementation provides the highest performance compared to the other two design alternatives. Recall that this performance substantially influences the system security and QoS. Furthermore, lower join and disjoin costs enable supporting larger dynamic groups. In spite of its high performance compared to software, the HS alternative restricts the design flexibility, e.g. when a different hardware key generation module is to be used.

Figure 7.9. Worst-case join costs (HW vs. SW vs. HS)

7.5.2.3 Static vs. dynamic tree management

In Section 7.4.1, a quantitative comparison between static and dynamic tree management modes regarding memory utilization was given. With 4.5 MB needed to store tree auxiliary data in the dynamic case compared to 32 KB in the static case, the superiority of the latter is evident. Regarding performance, Section 7.4.2 illustrated the dynamic tree management and qualitatively outlined the differences to the static management mode. To evaluate this pre-estimation, a timing measurement was performed for two design alternatives with pure hardware security modules, see Figure 7.10. Obviously, the static tree management is superior to the dynamic tree management with respect to rekeying performance, too.
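To put the two memory figures in proportion, the following short calculation may help. It assumes binary units (1 KB = 2^10 bytes, 1 MB = 2^20 bytes) and that both figures refer to the maximum group size of 131,072 members used in the measurements. Neither assumption is stated explicitly in this section, so the per-member values are only indicative.

```python
# Assumption: both memory figures refer to the maximum group size.
MAX_MEMBERS = 131_072

# Assumption: binary units (1 KB = 2**10 bytes, 1 MB = 2**20 bytes).
DYNAMIC_BYTES = 4.5 * 1024**2   # tree auxiliary data, dynamic mode
STATIC_BYTES = 32 * 1024        # tree auxiliary data, static mode

# How many times more auxiliary memory the dynamic mode needs.
ratio = DYNAMIC_BYTES / STATIC_BYTES

# Auxiliary data per group member under the assumptions above.
dynamic_bytes_per_member = DYNAMIC_BYTES / MAX_MEMBERS
static_bits_per_member = STATIC_BYTES * 8 / MAX_MEMBERS

if __name__ == "__main__":
    print(f"dynamic/static ratio: {ratio:.0f}x")
    print(f"dynamic: {dynamic_bytes_per_member:.0f} bytes per member")
    print(f"static:  {static_bits_per_member:.0f} bits per member")
```

Under these assumptions, the dynamic mode needs 144 times the auxiliary memory of the static mode, i.e. about 36 bytes per member compared to about 2 bits per member.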
