Homomorphic evaluation of arithmetic circuits is exactly what FHE and SHE schemes were created for. In practice, those circuits imply the use of real numbers which seems inconsistent with the fact that the plaintext space of the most efficient and commonly used FHE/SHE schemes is of the form

$$R_t = \mathbb{Z}_t[X]/(X^d+1).$$

Thus, basically we must encode real values into polynomials, which is done by expansion with respect to a base $b$ and then replacing all occurrences of $b$ by $X$. But then the problem arises of what SHE parameters to choose. Indeed, each homomorphic operation increases the encoding coefficients, so after a few consecutive multiplications this may lead to wrapping around modulo $t$ and invalid decoding results. To solve this problem $t$ must be chosen big enough. However, at the same time it should be small to suppress the growth of the inherent noise inside ciphertexts, in order to decrypt correctly. In practice, this problem is solved by the Chinese Remainder theorem (CRT) that splits a homomorphic algorithm into a number of threads with different plaintext spaces each modulo a factor of $t$. Nevertheless, to reduce the number of threads it is desirable to be able to take $t$ as small as possible.

To decrease the plaintext modulus one should choose an encoding scheme such that the encoding coefficients are as small as possible. The first such algorithm was presented by Dowlin et al. and studied by our colleagues Costache et al. This approach uses a balanced base $b=3$ representation where each coefficient belongs to the set $\{-1,0,1\}$. However, this choice is not optimal, because the degrees of the outcomes of the homomorphic computations are much smaller than required by the security recommendations of the underlying SHE scheme.

On the picture above the grey zone denotes the available plaintext space while the dark one is what actually was used. The lion's share of the plaintext space stays untouched. To use that space more efficiently, one might reduce the degree $d$ but it induces a significant decrease in the security level. Hence, informally, the optimisation problem consists in "squashing" the grey zone such that it almost coincides with the dark one as shown on the picture below.

Our idea is to encode real numbers with long but very sparse polynomials. In this setting the non-zero coefficients group together to a lesser extent, thereby hopefully decreasing the required plaintext modulus. One such representation is already well-known, namely, w-NAF. It is a base $b=2$ representation with coefficients from the set $\{0, \pm1, \pm3, \dots, \pm2^{w-1}-1\}$ and the property that any $w$ consecutive coefficients have at most one non-zero. Bigger $w$'s reduce the average number of non-zero digits but increase their size, which is problematic for our needs as it imposes the coefficients to grow exponentially. To address this issue we switch from the base $b=2$ to a smaller non-integral base $b>1$. Then it will be possible to construct a w-NAF-like encoding but only with the smallest possible set of digits $\{-1,0,1\}$. We called this representation "w-NIBNAF" which can be read as "non-integral base w-NAF".

For any real number $a$ the w-NIBNAF algorithm generates a ternary sequence $a_0, \dots, a_n$ in the following way:

The algorithm stops when $t < \varepsilon$ for some real $\varepsilon$ which determines the precision of the encoding.

Our w-NIBNAF representation drastically reduces the average number of non-zero coefficients per representation. We encoded 10 000 random integers in the range from $-2^{40}$ to $2^{40}$ with the precision $\varepsilon = 1$ and found that even for 1-NIBNAF the percentage of non-zeros is around 50 percent while it is 67 percent for balanced ternary expansions. It means that the w-NIBNAF encoding leads to smaller coefficients in the worst case where the input values of an arithmetic circuit are chosen such that the circuit output contains the maximal possible coefficient.

To illustrate the practical impact of the new encoding scheme we applied 950-NIBNAF to a non-trivial arithmetic circuit containing multiplications and additions. The choice of the circuit was motivated by the load forecasting use-case of the HEAT discussed in our previous paper. In a nutshell, the prediction was performed there by a neural-network-like structure that can be expressed by a polynomial with real coefficients and degree 16. To preserve correct evaluation of the algorithm with balanced ternary expansion, it required to take the plaintext modulus around 104 bits long. This huge modulus imposes to use a CRT decomposition of the plaintext space that boosts up the execution time of the homomorphic algorithm to 32 seconds per run. Evaluated with 950-NIBNAF encodings the same circuit needs only 6 bits for a plaintext modulus, and as a consequence does not require to split the plaintext space. Hence, one run takes only 2.5 seconds that is a speed-up by a factor 13.

A more thorough explanation with sharp bounds on the maximal coefficients can be found via ePrint IACR archive.

The KU Leuven and NXP teams of the HEAT project.

To decrease the plaintext modulus one should choose an encoding scheme such that the encoding coefficients are as small as possible. The first such algorithm was presented by Dowlin et al. and studied by our colleagues Costache et al. This approach uses a balanced base $b=3$ representation where each coefficient belongs to the set $\{-1,0,1\}$. However, this choice is not optimal, because the degrees of the outcomes of the homomorphic computations are much smaller than required by the security recommendations of the underlying SHE scheme.

On the picture above the grey zone denotes the available plaintext space while the dark one is what actually was used. The lion's share of the plaintext space stays untouched. To use that space more efficiently, one might reduce the degree $d$ but it induces a significant decrease in the security level. Hence, informally, the optimisation problem consists in "squashing" the grey zone such that it almost coincides with the dark one as shown on the picture below.

Our idea is to encode real numbers with long but very sparse polynomials. In this setting the non-zero coefficients group together to a lesser extent, thereby hopefully decreasing the required plaintext modulus. One such representation is already well-known, namely, w-NAF. It is a base $b=2$ representation with coefficients from the set $\{0, \pm1, \pm3, \dots, \pm2^{w-1}-1\}$ and the property that any $w$ consecutive coefficients have at most one non-zero. Bigger $w$'s reduce the average number of non-zero digits but increase their size, which is problematic for our needs as it imposes the coefficients to grow exponentially. To address this issue we switch from the base $b=2$ to a smaller non-integral base $b>1$. Then it will be possible to construct a w-NAF-like encoding but only with the smallest possible set of digits $\{-1,0,1\}$. We called this representation "w-NIBNAF" which can be read as "non-integral base w-NAF".

For any real number $a$ the w-NIBNAF algorithm generates a ternary sequence $a_0, \dots, a_n$ in the following way:

- Define the base $b_w$ as the unique positive root of the polynomial \[F_w(x) = x^{w+1} - x^w - x - 1.\]
- Find a power $r$ such that $t = -b_w^r$ or $b_w^r$ is the closest number to $a$.
- $a_r =$ sgn$(t)$
- Go to step 2 with $a \leftarrow a - t$.

The algorithm stops when $t < \varepsilon$ for some real $\varepsilon$ which determines the precision of the encoding.

Our w-NIBNAF representation drastically reduces the average number of non-zero coefficients per representation. We encoded 10 000 random integers in the range from $-2^{40}$ to $2^{40}$ with the precision $\varepsilon = 1$ and found that even for 1-NIBNAF the percentage of non-zeros is around 50 percent while it is 67 percent for balanced ternary expansions. It means that the w-NIBNAF encoding leads to smaller coefficients in the worst case where the input values of an arithmetic circuit are chosen such that the circuit output contains the maximal possible coefficient.

To illustrate the practical impact of the new encoding scheme we applied 950-NIBNAF to a non-trivial arithmetic circuit containing multiplications and additions. The choice of the circuit was motivated by the load forecasting use-case of the HEAT discussed in our previous paper. In a nutshell, the prediction was performed there by a neural-network-like structure that can be expressed by a polynomial with real coefficients and degree 16. To preserve correct evaluation of the algorithm with balanced ternary expansion, it required to take the plaintext modulus around 104 bits long. This huge modulus imposes to use a CRT decomposition of the plaintext space that boosts up the execution time of the homomorphic algorithm to 32 seconds per run. Evaluated with 950-NIBNAF encodings the same circuit needs only 6 bits for a plaintext modulus, and as a consequence does not require to split the plaintext space. Hence, one run takes only 2.5 seconds that is a speed-up by a factor 13.

A more thorough explanation with sharp bounds on the maximal coefficients can be found via ePrint IACR archive.

The KU Leuven and NXP teams of the HEAT project.

## No comments:

## Post a Comment