7.2 Proof-of-Concept Voting Implementation
7.2.4 Performance
To claim practical performance for the addition aspect of voting, the following question must be answered in the affirmative.
Question 3 Can all eligible voters cast their vote during the voting period, and
the result be available shortly after the ballot is finished?
Some of the terms in the question above need scoping, because their definitions can vary. For this analysis, the voting period is 24 hours, with the majority of the votes cast during a 12-hour period, for a voting population of around five million. The distribution of votes cast greatly affects when the results are available: if all votes are cast in the last hour, the results cannot be expected to be finalised when the voting period ends. The requirement is therefore framed as follows: if all votes are cast at 5am, the tally should be complete by 5pm, which reduces the analysis to throughput (votes per second). All results in this section are from a cloud implementation in AWS using C5.large instances, each consisting of 2 virtual central processing units, 4GB of memory, and up to 10Gbps network performance.
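The scope above implies a minimum average throughput. A minimal sketch of that arithmetic follows, assuming exactly five million votes and the full 12-hour window; the constant and function names are illustrative and are not taken from the implementation.

```python
# Assumed scope from the text: about five million voters, 12-hour window.
VOTERS = 5_000_000
PEAK_WINDOW_HOURS = 12


def required_votes_per_second(voters: int, window_hours: float) -> float:
    """Average rate needed to tally every vote within the window."""
    return voters / (window_hours * 3600)


rate = required_votes_per_second(VOTERS, PEAK_WINDOW_HOURS)
print(f"required throughput: {rate:.1f} votes per second")
# prints "required throughput: 115.7 votes per second"
```

Any measured throughput below this figure cannot tally the whole population within twelve hours, which is the yardstick used for the results that follow.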
For the enhanced privacy model consisting of three fragment servers, the sizes of the result LUT and of a single obfuscation LUT were 32KB and 32B respectively. These tables fit easily into main memory and into the higher levels of cache. With double the number of fragment servers, the redundancy model result LUT is 16MB, but the obfuscation LUTs drop to 16B each, as the number of states has halved (the reduction covers only three bits plus a carry, whereas without redundancy it covers four bits plus the carry). These tables can still reside in memory, so the memory requirements of the two implementations are very small.
The biggest overhead for any FRIBs implementation will be the network, including latency, bandwidth, packet loss and the cost of moving data to/from the networking stack. This is shown in Figure 7.5, where the time taken to process a single vote is compared across different RTTs and different numbers of parallel voting tallies. Focusing on the single 24-bit tally, the difference between 0ms RTT (actually 0.05ms) and 1ms (1.05ms) is practically nothing; therefore, at these latencies the time to process a single vote is primarily processing time on the central processing unit and the cost of transferring data to/from the networking stack.
Figure 7.5: Votes added per second, for a three-server model. [Plot: time for each addition (ms) against RTT between each server (ms), with one line per number of parallel tallies, 1 to 10.]
However, once more latency is added, the time to process a single vote increases at a near-linear rate. Computing multiple 24-bit tallies in parallel hides much of the networking overhead, as the other data lines in Figure 7.5 show. An example is ten parallel tallies, where the time to add a single vote stays consistent even with varying latencies. The remaining network overhead would therefore just be bandwidth availability and packet loss. Note that sending the data over a secure channel rather than an unsecured channel did not noticeably affect throughput.
Adding a vote to the single 24-bit tally takes approximately 0.013 seconds for a 10ms RTT between each fragment server, giving a throughput of 75.4 votes per second. Within a 12-hour period, over 3 million votes could be tallied, which is a nearly acceptable rate. However, with ten tallies in parallel, the throughput increases to 173.7 votes per second, allowing for over 7.5 million votes to be tallied in 12 hours. This is an acceptable throughput for the use case of New Zealand or Singapore. Once the tallies are complete, the ten tallies can be summed together within a FRIBs environment or in plain text. Note that when parallelising multiple tallies, the fragment servers wait for ten vote fragments in the queue, then reduce the ten tallies in parallel to saturate the network. The aim is to prevent the threads from waiting (going to sleep) while new data arrives from the network. The number of parallel tallies will vary depending on the implementation and the network quality between the fragment servers.

Figure 7.6: Votes added per second, for a six-server model with built-in redundancy. [Plot: time for each addition (ms) against RTT between each server (ms), with one line per number of parallel tallies, 1 to 10.]
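The batching behaviour described above (wait for ten queued votes, update ten tallies in parallel, then sum the tallies at the end) can be sketched as follows. This is only an illustration under stated assumptions: `add_vote` is a hypothetical stand-in that performs a plain modular addition rather than a FRIBs reduction between fragment servers, and all names are invented for this example.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

TALLY_BITS = 24          # tally width used in the text
PARALLEL_TALLIES = 10    # number of parallel tallies used in the text
MODULUS = 1 << TALLY_BITS


def add_vote(tally, vote):
    """Hypothetical stand-in for one FRIBs addition: a plain modular add."""
    return (tally + vote) % MODULUS


def process_batches(votes, total_votes):
    """Drain the queue in batches, updating one tally per queued vote."""
    tallies = [0] * PARALLEL_TALLIES
    remaining = total_votes
    with ThreadPoolExecutor(max_workers=PARALLEL_TALLIES) as pool:
        while remaining > 0:
            # Block until a full batch (or the final, smaller batch) arrives.
            size = min(PARALLEL_TALLIES, remaining)
            batch = [votes.get() for _ in range(size)]
            remaining -= size
            # Submit one addition per tally so they are all in flight together.
            tallies[:size] = list(pool.map(add_vote, tallies[:size], batch))
    return tallies


ballots = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 3 + [1, 1, 0]  # 33 single-bit votes
votes = queue.Queue()
for ballot in ballots:
    votes.put(ballot)

tallies = process_batches(votes, len(ballots))
final = sum(tallies)  # the parallel tallies are summed once voting ends
print(final)
```

In a real deployment each addition would involve network round trips between fragment servers, which is where running the ten updates concurrently hides latency; the thread pool here only mirrors that structure.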
The redundancy model results for the time to process a single vote are given in Figure 7.6. Similar results to Figure 7.5 are observed; however, with more servers and more data transferred, the throughput is lower.³ The maximum throughput achieved was 30 votes per second, meaning only 1.3 million votes could be summed in twelve hours. For other applications or scenarios, this performance would be acceptable, especially because it has built-in redundancy, but for the voting use case, the enhanced privacy model with redundancy is too slow for practical performance. Instead, it would be better to run three enhanced privacy model instances, giving nine fragment servers in total. The performance would be superior and would also allow for one instance to be compromised, as the other two instances would give the same result.

³ Figure 7.6 has a small anomaly for RTT values between 1 and 4. The results were obtained by averaging additions for all x parallel tallies for each RTT, meaning each tally is running at the same time per RTT; therefore, any variation in the network at that time could cause the hump seen for RTT values 1 and 2. Instead of averaging the results over multiple days to remove this hump, the results were left as a single run to demonstrate how the network can affect performance.
In Chapter 3, partially homomorphic encryption was shown to give the best performance for secure voting in the cloud. With the homomorphic addition taking under a millisecond, its raw throughput is unmatched; however, because the zero-knowledge proof is required to guarantee votes are a single bit, it must also be included in the throughput calculation. Re-evaluating the performance of the partially homomorphic encryption scheme on the same AWS instance (one of the fragment servers used for testing in this chapter) gives a result of 22ms for a 2048-bit key, and 117ms for a 4096-bit key. Given the instances are dual-core, the throughput achieved is 91 or 17 votes per second, depending on key size. Using the same number of servers as the enhanced privacy model results in six tallies, two per server, with a throughput of 273 or 51 votes per second. Therefore, using the same compute power as the FRIBs implementation, partially homomorphic encryption can still produce a greater throughput; however, the client-side performance for generating the zero-knowledge proof is still an issue, especially within web browsers. FRIBs, by contrast, has client-side performance similar to that of plain text, and when combined with the flexibility offered by FRIBs (allowing for different operations to be computed, including advanced voting specifications), this compensates for the slightly slower throughput.
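The throughput figures in this section can be placed side by side with a short calculation. The sketch below assumes that partially homomorphic encryption throughput scales linearly with cores and servers, which is how the figures above are derived; the FRIBs rates are the measured values quoted in the text, and the function and dictionary names are illustrative only.

```python
PEAK_WINDOW_SECONDS = 12 * 3600


def phe_throughput(seconds_per_vote, cores=2, servers=1):
    """Votes per second, assuming one independent tally per core."""
    return servers * cores / seconds_per_vote


# FRIBs values are the measured rates quoted in the text;
# the PHE values are derived from the reported per-vote times.
rates = {
    "FRIBs, 1 tally, 3 servers": 75.4,
    "FRIBs, 10 tallies, 3 servers": 173.7,
    "FRIBs with redundancy, 6 servers": 30.0,
    "PHE 2048-bit key, 3 servers": phe_throughput(0.022, servers=3),
    "PHE 4096-bit key, 3 servers": phe_throughput(0.117, servers=3),
}

for name, rate in rates.items():
    millions = rate * PEAK_WINDOW_SECONDS / 1e6
    print(f"{name}: {rate:.0f} votes/s, {millions:.2f} million in 12 h")
```

Note that this compares server-side tallying only; the client-side cost of generating the zero-knowledge proof is not modelled.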