
4.3 Other Improvements

4.3.3 Length Management

Recall that by default, each addition and each multiplication increases the bitlength: addition increases it by 1, whereas multiplication results in a bitlength that is the sum of the two input lengths. When performing several multiplications consecutively, this can easily lead to enormous bitlengths. However, in a scenario where the size of the values can be estimated, there is a way around this.
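To make this growth concrete, here is a small plaintext sketch that only tracks bitlengths under the two rules stated above. It assumes two's complement operands, where an addition of unequal lengths first sign-extends the shorter operand; the helper names (`add_len`, `mul_len`, `fits`, `chain_len`, `squaring_len`) are hypothetical and nothing here is encrypted.

```python
def add_len(a: int, b: int) -> int:
    """Bitlength of a sum: operands aligned to the longer length, plus one carry bit."""
    return max(a, b) + 1


def mul_len(a: int, b: int) -> int:
    """Bitlength of a product: the sum of the two input bitlengths."""
    return a + b


def fits(x: int, n: int) -> bool:
    """True if x is representable in n-bit two's complement."""
    return -(1 << (n - 1)) <= x < (1 << (n - 1))


def chain_len(start: int, factor: int, k: int) -> int:
    """Bitlength after k consecutive multiplications by factor-bit values."""
    length = start
    for _ in range(k):
        length = mul_len(length, factor)
    return length


def squaring_len(start: int, k: int) -> int:
    """Bitlength after squaring a value k times in a row (it doubles every time)."""
    length = start
    for _ in range(k):
        length = mul_len(length, length)
    return length


if __name__ == "__main__":
    lo = -(1 << 7)  # most negative 8-bit value
    # -256 does not fit in 8 bits, but fits in add_len(8, 8) = 9 bits
    print(fits(lo + lo, 8), fits(lo + lo, add_len(8, 8)))  # False True
    # 16384 needs all mul_len(8, 8) = 16 bits
    print(fits(lo * lo, 15), fits(lo * lo, mul_len(8, 8)))  # False True
    print(chain_len(16, 16, 10))  # 176
    print(squaring_len(16, 5))    # 512
```

Repeated squaring is the worst case: the length doubles with every step, so a 16-bit value already needs 512 bits after five squarings.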

To see that such an assumption is realistic, consider the scenario of machine learning as a service, where the person working on the encrypted data is the person who has the algorithm for building the model. Here, it is reasonable to assume that some factors of the model are known, e.g. from experience. For example, in the data set we will work with in Section 5.3, the value w_0 always takes some value near 10000 no matter which subset of test subjects we choose; thus, when computing on encrypted data, we might utilize this knowledge about our algorithm.

In such cases, the service provider who is doing the computations can put a bound on the lengths (i.e., he is certain that some value will be smaller than 2^q in absolute value for some q). When this is the case, we can reduce the bitlength of the encrypted values to q + 1 by discarding the excess bits: in Two's Complement, we can delete the most significant bits (which will all be 0 for a positive and 1 for a negative number) until we reach the desired length, whereas for Sign-Magnitude we discard the bits following the MSB (which will all be 0). To save even more computation time, we integrated this into our multiplication routine for Section 5.3, so that we save not only space but also effort, because in each step we only compute bits up to the bound.
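The two shortening rules can be illustrated with a plaintext sketch on bit lists. It assumes a bitwise encoding stored least significant bit first; under encryption each list entry would be a separate ciphertext, so deleting entries requires no homomorphic operations. All function names are hypothetical, and the assertions inside the shortening helpers only work on plaintext, which is exactly why the bound q has to be known in advance.

```python
from typing import List

Bits = List[int]  # least significant bit first; under bitwise FHE one ciphertext per entry


def twos_encode(x: int, n: int) -> Bits:
    assert -(1 << (n - 1)) <= x < (1 << (n - 1)), "x does not fit in n bits"
    return [(x >> i) & 1 for i in range(n)]


def twos_decode(bits: Bits) -> int:
    value = sum(b << i for i, b in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


def sm_encode(x: int, n: int) -> Bits:
    """Sign-Magnitude: n-1 magnitude bits followed by the sign bit (the MSB)."""
    assert abs(x) < (1 << (n - 1)), "x does not fit in n bits"
    return [(abs(x) >> i) & 1 for i in range(n - 1)] + [int(x < 0)]


def sm_decode(bits: Bits) -> int:
    mag = sum(b << i for i, b in enumerate(bits[:-1]))
    return -mag if bits[-1] else mag


def shorten_twos(bits: Bits, q: int) -> Bits:
    """Two's Complement: keep the q+1 lowest bits, deleting the most significant ones."""
    kept, dropped = bits[: q + 1], bits[q + 1 :]
    # Plaintext-only check: every deleted bit equals the new sign bit.
    assert all(b == kept[-1] for b in dropped)
    return kept


def shorten_sm(bits: Bits, q: int) -> Bits:
    """Sign-Magnitude: keep the q lowest magnitude bits and the sign bit (MSB)."""
    kept, dropped = bits[:q], bits[q:-1]
    # Plaintext-only check: the discarded bits following the MSB are all 0.
    assert not any(dropped)
    return kept + [bits[-1]]


if __name__ == "__main__":
    # |-5| < 2**3, so 16 bits can be cut down to q + 1 = 4 bits
    print(twos_decode(shorten_twos(twos_encode(-5, 16), 3)))  # -5
    print(sm_decode(shorten_sm(sm_encode(-5, 16), 3)))        # -5
```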

This shortening operation can be viewed as the inverse of the sign extension introduced in Definition 3.1, and it makes the entire algorithm significantly faster (see Section 5.3), as we have reduced the quadratic growth of the bitlength in multiplication.
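As a rough illustration, the sketch below uses two's complement bit lists (least significant bit first) to show that shortening undoes sign extension, and how a shift-and-add multiplication can be restricted to the q + 1 low-order bits. It is a plaintext stand-in built only from XOR/AND/OR on single bits, the kind of gates a bitwise FHE scheme evaluates; it is not claimed to be the multiplication routine used in Section 5.3, and all names are hypothetical.

```python
from typing import List

Bits = List[int]  # two's complement, least significant bit first


def encode(x: int, n: int) -> Bits:
    assert -(1 << (n - 1)) <= x < (1 << (n - 1)), "x does not fit in n bits"
    return [(x >> i) & 1 for i in range(n)]


def decode(bits: Bits) -> int:
    value = sum(b << i for i, b in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


def sign_extend(bits: Bits, n: int) -> Bits:
    """Pad with copies of the sign bit up to n bits (no-op if already >= n bits)."""
    return bits + [bits[-1]] * (n - len(bits))


def shorten(bits: Bits, n: int) -> Bits:
    """Inverse of sign_extend: delete the most significant bits, keeping n."""
    return bits[:n]


def add_mod(a: Bits, b: Bits) -> Bits:
    """Ripple-carry addition of equal-length inputs; the final carry is discarded."""
    out, carry = [], 0
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return out


def bounded_mul(a: Bits, b: Bits, q: int) -> Bits:
    """Multiply two's complement numbers, computing only the low q+1 bits.

    Exact whenever the true product lies in [-2**q, 2**q - 1]."""
    m = q + 1
    a = shorten(sign_extend(a, m), m)  # bring both operands to exactly m bits
    b = shorten(sign_extend(b, m), m)
    acc = [0] * m
    for i in range(m):
        # partial product a * b_i, shifted left by i and cut off at m bits
        partial = [0] * i + [bit & b[i] for bit in a[: m - i]]
        acc = add_mod(acc, partial)
    return acc


if __name__ == "__main__":
    x = encode(-5, 4)
    print(decode(sign_extend(x, 16)), shorten(sign_extend(x, 16), 4) == x)  # -5 True
    print(decode(bounded_mul(encode(-7, 16), encode(9, 16), 7)))  # -63
```

Working modulo 2^(q+1) is what makes the early cutoff safe: the low q + 1 bits of a two's complement product do not depend on the discarded high bits, so the result is exact whenever the true product respects the bound. Since this schoolbook loop uses on the order of (q + 1)^2 bit operations, fixing the length at q + 1 also keeps the cost of each multiplication constant instead of letting it grow with every step.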

4.4 CONCLUSION

In conclusion, we have developed two different ways of incorporating rational numbers into our FHE computations. The Fractional Encoding allows division of encrypted numbers, but has performance issues in practice. Our scaling procedure instead allows us to handle rationals as though they were integers, merely adding to the multiplication procedure a new component that keeps the precision constant. This reduces the bitlength of the result while also hiding the function that was applied from the decrypting party. Additionally, it increases usability because the computing party does not need to keep track of the power of the scaling factor associated with each ciphertext.
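A minimal plaintext sketch of scaling with constant precision follows. It assumes, purely for illustration, a scaling factor of 2^P with a hypothetical P = 8 fractional bits; the factor actually used in this work may differ. After each multiplication, the product (which carries the squared factor) is shifted right by P bits, so every value keeps the same scaling factor and no per-ciphertext bookkeeping of its power is needed. With a bitwise two's complement encoding, this arithmetic right shift amounts to dropping the P lowest bits, which also shortens the result.

```python
P = 8  # hypothetical number of fractional bits: scaling factor 2**P
SCALE = 1 << P


def to_fixed(x: float) -> int:
    """Encode a rational as an integer scaled by 2**P (rounded to nearest)."""
    return round(x * SCALE)


def from_fixed(v: int) -> float:
    return v / SCALE


def fixed_add(a: int, b: int) -> int:
    # Both operands share the factor 2**P, so plain integer addition suffices.
    return a + b


def fixed_mul(a: int, b: int) -> int:
    # The raw product carries 2**(2P); an arithmetic right shift by P
    # (floor division, i.e. dropping the P lowest bits) restores 2**P.
    return (a * b) >> P


if __name__ == "__main__":
    a, b = to_fixed(1.5), to_fixed(-2.25)
    print(from_fixed(fixed_mul(a, b)))  # -3.375
    print(from_fixed(fixed_add(a, b)))  # -0.75
    # Repeated multiplication keeps the same factor; only truncation error accumulates.
    x = to_fixed(0.9)
    y = x
    for _ in range(3):
        y = fixed_mul(y, x)
    print(from_fixed(y), 0.9 ** 4)
```

Because the shift rounds toward negative infinity, small truncation errors accumulate over long chains of multiplications, as the last comparison illustrates; a larger P trades bitlength for precision.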

In addition, we have showcased some other improvements to speed up the comparison and multiplication subroutines when certain assumptions hold true. The next chapter will present the impact of most of these improvements on the runtimes of real-world applications from the field of Machine Learning: Section 5.4.3 uses Fractional Encoding, Section 5.4.5 examines the impact of the approximate comparison, Section 5.3 shows the impact of length management, and the entire chapter (except Section 5.4.3) uses scaling with constant precision.

APPLICATION TO MACHINE LEARNING

The previous chapters have dealt with encodings for FHE computations, which comprised the green box in Figure 1.1 in Section 1.3. In this chapter, we move on to the red box – namely applying the results of the previous chapters to algorithms from the field of Machine Learning, adapting the algorithms as necessary to improve performance or make them executable under FHE at all.

To this end, we first cover some preliminaries like the required background on Machine Learning and our runtime specifications in Section 5.1.

We then examine our first Machine Learning algorithm, the Linear Means Classifier, in Section 5.2. In this section, we assume that the model has already been trained, and we use this model to make predictions for new, unknown encrypted cases. The results will showcase the impact of our Hybrid Encoding from Section 3.5.

We then turn to the other task in supervised Machine Learning, namely training the model, which we do for the Perceptron (a simple Neural Network) in Section 5.3. This will show the importance of the bounding procedure from Section 4.3.3 and again the improvement due to our Hybrid Encoding.

Lastly, we will move to the area of unsupervised learning by executing the K-Means-Algorithm, a clustering algorithm, on encrypted data in Section 5.4. We attempt to use the Fractional Encoding from Section 4.1, but we will see that our concerns from Section 4.1.3 about this encoding were indeed valid. We thus opt to change the underlying K-Means-Algorithm instead to avoid division, resulting in an FHE-friendly algorithm that achieves the same task with accuracy similar to the original one.

This chapter is largely taken from [JA16] and [JA18].

5.1 PRELIMINARIES

In this section, we will discuss some preliminaries: First, we will cover the basics of Machine Learning, including related work concerning Machine Learning on encrypted data. We then also present our implementation specifications, which were used for all the runtimes given in this work.

In document Machine learning on encrypted data (Page 124-128)