Securing implementations of feedback-shift-register-based ciphers using compiler optimizations and co-processors


Universidad Politécnica de Madrid
Escuela Técnica Superior de Ingenieros de Telecomunicación

Doctoral Thesis

Securing Implementations of Feedback-Shift-Register-Based Ciphers Using Compiler Optimizations and Co-Processors

Author: Pedro José Malagón Marzo, Telecommunication Engineer
Advisor: José Manuel Moya Fernández, Doctor of Telecommunication Engineering

2015


Ph.D. Thesis

Title: Securing Implementations of Feedback-Shift-Register-Based Ciphers Using Compiler Optimizations and Co-Processors
Author: Pedro José Malagón Marzo
Advisor: José Manuel Moya Fernández
Department: Departamento de Ingeniería Electrónica

Examination committee: President: Secretary: Member: Member: Member: Substitute: Substitute:

The committee members named above agree to award the grade of:

Madrid, 2015.


To Julia. To Martín. To my family.


Acknowledgments

First of all, I want to thank my advisor, José Manuel Moya, for the support he has given me throughout the development of this thesis, which goes far beyond what I could have expected. His optimism, advice and guidance have been essential to finishing this work.

Second, I thank the colleagues I have had since I started working in the LSI group, especially the Green members with whom I have shared the last few years. Thanks to your support I have been able to get this far. Above all, I thank Juan-Mariano de Goyeneche, who has shared these years as an assistant lecturer with me and has always been there to support me, regardless of his mood.

I want to thank the Departamento de Ingeniería Electrónica of the ETSI Telecomunicación for the opportunity it has given me to be an assistant lecturer, both from the teaching point of view and for developing my research work. In particular, I thank the coordinators of the laboratories in which I have taken part, who have helped me personally and professionally.

Finally, I want to thank my family for their support during all these years, especially in the worst moments, when they gave me encouragement, support and the occasional push forward.


Abstract

Feedback Shift Registers (FSRs) have traditionally been used to implement pseudorandom sequence generators. These generators are used in stream ciphers in systems with tight resource constraints, such as Remote Keyless Entry. When electronic devices communicate, the primary channel is the one used to transmit the information. Side-Channel Attacks (SCAs) use additional information leaking from the actual implementation, including power consumption, electromagnetic emissions or timing information. SCAs are a serious threat to FSR-based applications, as an attacker usually has physical access to the devices.

The main objective of this Ph.D. thesis is to provide a set of countermeasures that can be applied automatically using the available resources, avoiding a significant cost overhead and extending the useful life of deployed systems. Where possible, we propose to take advantage of the inherent parallelism of FSR-based algorithms, as the state of an FSR differs from its previous value in only 1 bit. We have contributed at three different levels: at architecture level (using a reconfigurable co-processor), at compiler level (using compiler optimizations), and at bit level (making the most of the resources available in the processor).

We have developed a framework to evaluate implementations of an algorithm, including the effects introduced by the compiler. We consider the presence of an expert attacker with extensive knowledge of the application and the device. Regarding SCA, we have presented a new differential SCA that performs better than traditional SCA on software FSR-based algorithms, where the leaked values are similar between rounds.

SORU2 is a reconfigurable vector co-processor. It has been developed to reduce energy consumption in loop-based applications with parallelism. In addition, we propose its use for secure implementations of FSR-based algorithms. It adds no dedicated cost overhead, as the co-processor is not exclusively dedicated to the encryption algorithm. We present a co-processor configuration that executes multiple simultaneous encryptions, using different implementations and keys. Starting from a basic implementation, which is shown to be vulnerable to SCA, we obtain an implementation against which the SCAs we applied were unsuccessful.

At compiler level, we use the framework to evaluate the effect of sequences of compiler optimization passes on a software implementation. Many optimization passes are available, and the optimization sequences are combinations of these passes, so the number of possible sequences is extremely high. The framework includes an algorithm for selecting the interesting sequences that require detailed evaluation. As existing compiler optimizations transform the software implementation, different optimization sequences automatically generate different implementations. We propose to switch randomly between the generated implementations to increase the resistance against SCA. We propose two countermeasures. The results show that, although they increase the resistance against SCA, the resulting implementations are not secure.

At bit level, we propose to exploit the bit-level parallelism of FSR-based implementations using a pseudo-bitslice implementation in a wireless node processor. The bitslice implementation is obtained automatically from the Algebraic Normal Form of the algorithm. The results show a performance improvement and avoid timing information leakage, but increase the vulnerability to differential SCA. We provide a secure version of the algorithm by randomly discarding part of the data obtained. The performance overhead is negligible when compared to the original implementations.

To summarize, we have proposed a set of original countermeasures at different levels that introduce randomness in FSR-based algorithms while avoiding a heavy overhead on the resources required.

Summary

Algorithms based on Feedback Shift Registers (FSR) have been used as pseudorandom stream generators in resource-constrained applications such as keyless entry systems. The primary channel is the one used to transmit information. The emergence of Side-Channel Attacks (SCA), which exploit information unintentionally leaked through side channels such as power consumption, electromagnetic emissions or execution time, poses a serious threat to these applications, since the devices are accessible to an attacker.

The objective of this thesis is to provide a set of protections that can be applied automatically and that use already available resources, avoiding a substantial increase in cost and extending the useful life of applications that may already be deployed. We exploit the parallelism present in FSR algorithms, since there is only 1 bit of difference between the states of consecutive rounds. We make contributions at three levels: at system level, using a reconfigurable co-processor; through the compiler; and at bit level, taking advantage of the resources available in the processor.

We propose a framework that allows us to evaluate implementations of an algorithm, including the effects introduced by the compiler, assuming an expert attacker. In the field of attacks, we have proposed a new differential attack that is better suited to the conditions of software FSR implementations, in which the power consumption of different rounds is very similar.

SORU2 is a reconfigurable vector co-processor proposed to reduce energy consumption in loop-based applications with parallelism. We additionally propose using SORU2 to execute FSR-based algorithms securely. Because it is reconfigurable, it does not entail a resource overhead, as it is not dedicated exclusively to the encryption algorithm. We propose a configuration that executes multiple similar encryption algorithms simultaneously, with different implementations and keys. Starting from an unprotected implementation, which we show to be completely vulnerable to SCA, we obtain an implementation that is secure against the attacks we have carried out.

At compiler level, we propose a mechanism to evaluate the effects of compiler optimization sequences on an implementation. The number of possible compiler optimization sequences is extremely high. The proposed framework includes an algorithm for selecting the optimization sequences to consider. Because compiler optimizations transform implementations, different implementations can be generated automatically, which we combine to increase security against SCA. We propose 2 mechanisms for applying these countermeasures, which increase the security of the original implementation, although the results cannot be considered secure.

Finally, we have proposed the bit-level parallel execution of the algorithm on a processor. We use the algebraic normal form of the algorithm, which is parallelized automatically. The implementation of the evaluated algorithm improves performance and prevents information from leaking through data-dependent execution. However, it is more vulnerable to differential attacks than the original implementation. We propose a modification of the algorithm to obtain a secure implementation by randomly discarding part of the executions of the algorithm. This implementation does not introduce a performance overhead compared with the original implementations.

In short, we have proposed several original mechanisms at different levels to introduce randomness in implementations of FSR algorithms without substantially increasing the resources required.

Contents

1. Introduction
   1.1 Side-channel Attacks
   1.2 Sources of Information Leakage
   1.3 Overview of Countermeasures
   1.4 FSR-based Ciphers
   1.5 Objectives
   1.6 Contributions
   1.7 Publications
       1.7.1 List of publications on security using Reconfigurable Hardware
       1.7.2 List of publications on security using Compiler Optimizations
       1.7.3 List of publications on WSN
   1.8 Structure

2. Related work
   2.1 Power Analysis Attacks
       2.1.1 Simple Power Analysis
       2.1.2 Differential Power Analysis
   2.2 DPA countermeasures
       2.2.1 Logic level countermeasures
       2.2.2 Architecture level countermeasures
       2.2.3 Software countermeasure
       2.2.4 Attacks on Hiding countermeasures
   2.3 SCA in FSR-based Ciphers
       2.3.1 Countermeasures on FSR-based Algorithms
   2.4 Conclusions

3. Methodology
   3.1 KeeLoq
   3.2 LLVM
   3.3 MSP430
   3.4 Metrics

4. Countermeasure proposal I: reconfigurable co-processor
   4.1 Introduction
   4.2 SORU2 organization
       4.2.1 SORU2 Decode stage
       4.2.2 SORU2 Execution stage
       4.2.3 SORU2 Write back stage
       4.2.4 Programming interface
   4.3 SORU simulation platform
       4.3.1 General structure
       4.3.2 Particular structure for the SORU2 co-processor
   4.4 Attack avoidance with SORU2
       4.4.1 Low-power characteristics
       4.4.2 Non determinism
   4.5 Results
   4.6 Conclusions

5. Countermeasure proposal II: existing compiler optimization
   5.1 Introduction
   5.2 Evaluation of optimization passes
   5.3 Countermeasures Using standard compiler optimizations
       5.3.1 Combination of subsequences
       5.3.2 Loop-unroll
   5.4 Conclusions

6. Countermeasure proposal III: bit-level parallelism optimization
   6.1 Introduction
   6.2 Implementation
       6.2.1 LUT-based implementations
       6.2.2 ANF-based implementations
   6.3 Results
   6.4 Automation
   6.5 Conclusions

7. Conclusions
   7.1 Conclusions
   7.2 Future Work

Appendices
   A LLVM optimization passes
   B KeeLoq implementations

List of Figures

1.1 Structure of a typical side-channel attack
1.2 SPICE simulation result of a 2 input NAND gate in static CMOS logic. Figure from [86]. Used with permission.
1.3 LFSR vs NLFSR

2.1 Correlation Power Analysis (CPA) attack results

3.1 Proposed analysis framework for SCA resistance
3.2 KeeLoq packet structure. Retrieved from [177]
3.3 KeeLoq structure. Retrieved from [187]
3.4 KeeLoq learning mechanism. Retrieved from [177]
3.5 LLVM compilation flow
3.6 CPA over KeeLoq software implementation
3.7 CPA vs DCPA

4.1 Overview of a SORU2 system
4.2 SORU2 datapath
4.3 SORU2 Simulator System Blocks
4.4 CPA on SORU: basic KeeLoq configuration
4.5 Resistance against CPA on SORU: fill with random or extra KeeLoq (same key)
4.6 Resistance against CPA on SORU: x3 KeeLoq configuration
4.7 Resistance against CPA on SORU: x2 KeeLoq configuration
4.8 SORU: x4 KeeLoq register evolution
4.9 Resistance against CPA on SORU: x4 KeeLoq configuration
4.10 Resistance against CPA on SORU: x4 combined KeeLoq configuration

5.1 Optimized bitcode generation process: positive accumulative
5.2 Distribution function of starting instant for round 9
5.3 CPA on the o_009_pa implementation
5.4 Resistance to CPA of the o_009_pa implementation
5.5 DCPA of the o_009_pa implementation
5.6 Summary of successful DCPA against optimized programs
5.7 Distribution function of execution time per round
5.8 Distribution function of execution time for combined implementations
5.9 DSCA on combined implementations: 2 sequences
5.10 DSCA on combined implementations: 2 sequences
5.11 Number of samples with a similar deviation ratio from common code
5.12 Resistance of implementation with a partial loop unrolling of 2 iterations with mspsim
5.13 Resistance of implementation with a partial loop unrolling of 2 iterations with mspsim with 5 cycle window integration
5.14 Resistance with a partial loop unrolling of 2 iterations with LLVM interpreter
5.15 Resistance of implementation that switches randomly between 3 implementations (2, 3 iterations and no unrolling) with LLVM interpreter
5.16 Histogram of manipulation of intermediate value
5.17 Correlation trace for correct key guess using windowed CPA
5.18 Resistance of implementation that switches randomly between 3 implementations (2, 3 iterations and no unrolling) attacking with window of size 10 with LLVM interpreter

6.1 CPA of LUT-based implementation
6.2 CPA of basic ANF-based implementation
6.3 Bitslice setup of consecutive blocks
6.4 CPA of bitslice implementation
6.5 Update data
6.6 CPA of sec-bitslice implementation
6.7 Data memory access in the inner and outer loops

A.1 CPA on keeloq_tb041 generated implementations
A.2 Zoomed CPA on keeloq_tb041 generated implementations
A.3 Resistance against CPA of keeloq_tb041 generated implementations
A.4 Resistance against zoomed CPA of keeloq_tb041 generated implementations
A.5 DCPA on keeloq_tb041 generated implementations
A.6 Resistance against DCPA of keeloq_tb041 generated implementations

List of Tables

2.1 SPA summary
2.2 DSCA summary
2.3 DPL summary

4.1 Benchmark tests and simulation results for the maximum number of elements

5.1 Performance evaluation
5.2 Extra cycles associated to each input data bit

6.1 Performance evaluation of LUT-based implementations
6.2 Extra cycles when input bit active of LUT-based implementations
6.3 Summary of resistance against CPA of LUT-based implementations
6.4 Performance evaluation of basic ANF implementation
6.5 Extra cycles when input bit active of ANF-based
6.6 Performance evaluation of bitslice implementation
6.7 Extra cycles when input bit active of bitslice
6.8 Performance evaluation of secured bitslice implementation
6.9 Performance evaluation of KeeLoq implementations
6.10 Extra cycles when input bit active of LUT-based implementations
6.11 Summary of resistance against CPA of KeeLoq implementations

A.1 Performance evaluation


Chapter 1. Introduction

In traditional cryptanalysis, encryption is viewed as a black-box operation that transforms plaintext into ciphertext using a secret key. Algorithm designers assume that input and output data might be available to attackers, but that no other information related to the key is available. Many existing encryption algorithms have no known practical weaknesses, and the only way to recover the secret key is to try all possible combinations. As a result, as long as the number of combinations is large enough that a complete search becomes de facto impossible, the encryption algorithm is said to be secure. For instance, the RSA public-key algorithm with a 2048-bit key can be used at least until the year 2030, before the computing power needed to factor a 2048-bit integer is expected to become available.

However, the cipher has to be implemented on a real device, which will leak additional information that can be used to determine the secret key. Indeed, just as feel or sound can help to find the combination of a padlock, the power consumption or time delay of the device can reveal the value of the secret key. For instance, early smart card systems implemented the modular exponentiation of the RSA algorithm using the textbook version of the square-and-multiply algorithm, in which the multiplication is only executed if the exponent bit equals 1. Since a multiplication does not have the same power signature as a squaring operation, it was possible to find out, by observing the power consumption, when a multiplication took place and thus to read off the private key from a single power trace.

The power consumption and the electromagnetic emissions of any hardware circuit can be modelled as a function of the switching activity of the wires inside it.
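The square-and-multiply leak described earlier in this chapter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any particular smart card: the function names are hypothetical, and the string of "S" and "M" symbols stands in for the distinguishable power signatures of squaring and multiplication in a single trace.

```python
def square_and_multiply(base, exponent, modulus):
    """Textbook left-to-right square-and-multiply.

    Returns the modular power and a hypothetical operation trace in which
    'S' marks a squaring and 'M' marks a multiplication.
    """
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:           # exponent bits, most significant first
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":                      # data-dependent step: only for 1 bits
            result = (result * base) % modulus
            trace.append("M")
    return result, "".join(trace)


def recover_exponent(trace):
    """Read the exponent off the trace: 'SM' encodes a 1 bit, a lone 'S' a 0 bit."""
    bits = []
    i = 0
    while i < len(trace):
        if trace[i] != "S":
            raise ValueError("malformed trace")
        if i + 1 < len(trace) and trace[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


value, trace = square_and_multiply(7, 0b101101, 1009)
print(trace, recover_exponent(trace))
```

An observer who can tell the two operations apart needs only this one trace to recover the exponent, which is why implementations avoid executing an operation conditionally on a secret bit.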
Since the switching activity (and hence the power consumption) is data dependent, special care must be taken when sensitive data has to be communicated between System on Chip (SoC) components or to the outside. A common approach to implementing tamper-resistant systems involves the use of a separate secure co-processor module [149], which is dedicated to processing all sensitive information in the system. Any sensitive information that needs to be sent out of the secure co-processor is encrypted.

Attacks that use additional information leaking from the actual implementation are known as Side-Channel Attacks (SCAs). First proposed in 1996 [100], they have since been used to extract cryptographic material of symmetric and public-key encryption algorithms running on microprocessors, Digital Signal Processors (DSPs), Field Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs) and high-performance processors from variations in power consumption, time delay or electromagnetic radiation. Note that SCAs are non-invasive attacks: they observe the device in normal operation without physically harming it, so making the device tamper-proof does not protect against them. For certain side-channel attacks, it is not even necessary to possess the device or to be in close proximity. This is the case of a remote attack that successfully recovered key material from an OpenSSL web server by exploiting the non-constant execution time caused by conditional branches in the algorithm [37].

A careful design can increase the resistance against attacks. Security adds a new dimension to embedded system design in addition to already existing dimensions such as area, performance and power consumption optimization [148]. SCA resistance, which can be a show-stopper for achieving security, is no exception, and the strategies that have prevailed so far rarely come cheap. Integrating a co-processor with hardware countermeasures in the SoC only to compute the sensitive operations of an encryption algorithm is an example. It is also not well understood how to analyze the strength of a design, and the precise cost of a mitigation strategy is seldom fully and clearly communicated. This makes the increase in resistance hard to quantify and the design trade-offs difficult to make.

This is the main context of this Ph.D. thesis: the proposal and evaluation of low-effort SCA countermeasures offering interesting trade-offs for embedded systems. This Ph.D. thesis focuses on providing countermeasures for FSR-based encryption algorithms, a concrete type of encryption algorithm typically used in systems with limited resources for applications such as Remote Keyless Entry (RKE) systems. The following introductory sections present the basics of side-channel attacks, countermeasures and FSR-based ciphers.

1.1. Side-channel Attacks

A cryptographic primitive can be considered from two points of view: on the one hand, it can be viewed as an abstract mathematical object or black box; on the other hand, a primitive will be implemented in hardware or in software that will run on a given processor, in a given environment, and will therefore present specific characteristics. The first point of view leads to classical cryptanalysis; the second one leads to physical security analysis. Physical attacks on cryptographic devices take advantage of implementation-specific characteristics to recover the secret parameters involved in the computation. They are therefore much less general (since they are specific to a given implementation) but often much more powerful than classical cryptanalysis, and are taken very seriously by manufacturers of cryptographic devices.

Such physical attacks are numerous and can be classified in many ways. The literature usually sorts them along two orthogonal axes:

1. Invasive vs. non-invasive: invasive attacks require depackaging the chip to get direct access to its internal components; a typical example is connecting a wire to a data bus to observe the data transfers. Non-invasive attacks only exploit externally available information (the emission of which is often unintentional) such as running time, power consumption or electromagnetic emissions.

2. Active vs. passive: active attacks try to tamper with the proper functioning of the device; for example, fault-induction attacks try to induce errors in the computation. In contrast, passive attacks simply observe the behavior of the device during processing without disturbing it.

SCAs are closely related to the existence of physically observable phenomena caused by the execution of computing tasks in present microelectronic devices. For example, microprocessors consume time and power to perform their assigned tasks. They also generate an electromagnetic field, dissipate heat and even make some noise [73]. As a matter of fact, there are plenty of information sources leaking from actual computers that can consequently be exploited by malicious adversaries. This Ph.D. thesis is mainly focused on power consumption, which is a frequently considered side channel in practical attacks. The results obtained will be valid for the electromagnetic field as well, as it is closely related to power consumption.

SCA is one of the most successful approaches to reveal secret data, such as cryptographic keys, from black-box secure cryptographic algorithms implemented in embedded devices. Differential Side-Channel Attack (DSCA) exploits (small) differences in a set of measurements using statistical analysis and is particularly well suited for the power analysis of block cipher implementations.
A side-channel attack works as follows (see Figure 1.1): it compares observations of the side-channel leakage (i.e., measurement samples of the supply current, execution time, or electromagnetic radiation) with estimations of that leakage. The estimations come from a leakage model of the device, which requires a guess of the secret key. The correct key is found by identifying the best match between the measurements and the leakage estimations for the different key guesses. Furthermore, by limiting the leakage model to a small piece of the algorithm, only a small part of the key must be guessed at a time, and the complete key can be found using a divide-and-conquer approach. For instance, an attack on the Advanced Encryption Standard (AES) generally estimates the leakage caused by a single key byte. As a result, the 128-bit key can be found with a mere $16 \cdot 2^{8} = 4096$ tests, corresponding to the $2^{8}$ possible values of each of the 16 key bytes, instead of the $2^{128}$ tests of an exhaustive search. Finally, as the observations might be noisy and the model might be approximate, statistical methods are often used to derive the secret from many measurements.

Figure 1.1: Structure of a typical side-channel attack. (Blocks in the diagram: input, cryptographic device, unknown secret key, measurements, leakage model, key fragment guesses, estimations, statistical analysis.)

In the last decade, various attack methodologies have been put forward. As a consequence of the need for secure embedded devices such as smart cards, mobile phones and PDAs, research is also conducted in the field of SCA prevention. Early countermeasures include algorithmic masking schemes, noise generators, and random process interrupts. All of them have in common that they do not address the issue of side-channel leakage directly, but aim at obfuscating the observables. Most of these countermeasures have been proven to be either insecure or circumventable, e.g., with higher-order attacks or with digital signal processing. In recent years, research and industry have started to approach the issue of side-channel leakage right where it arises: at the gate level. There is a considerable body of research on gate-level masking schemes, which again aim at obfuscating the leakage, and on differential logic styles, which focus on reducing the leakage. However, attacks against these secured logic styles have been published, most of them exploiting circuit “anomalies” such as glitches or the early propagation effect.

This Ph.D. thesis focuses on the power consumption side channel, covering both power analysis attacks and countermeasures against them. Power analysis attacks exploit the physical dependency between the power consumption of a device and the data it is processing. Since these attacks are non-invasive and passive, and can generally be performed using relatively cheap equipment, they pose a serious threat to the security of most devices that handle sensitive information. Such devices range from personal computers to small embedded devices such as smart cards and Radio Frequency Identification (RFID) devices. Their proliferation in an ever larger spectrum of applications has turned physical security and side-channel leakage into a real, practical concern.

For this purpose, we start by covering the basics of side-channel attacks. We discuss the origin of unintended leakages in recent microelectronic technologies and describe how simple measurement setups can be used to recover and exploit these physical features. Then, we introduce some of the proposed countermeasures to avoid these leakages or to make them difficult to analyze.
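The divide-and-conquer procedure described above can be illustrated with a small simulation. The sketch below is not taken from the thesis. It assumes a toy 16-bit key split into four nibbles, the 4-bit S-box of the PRESENT cipher as a stand-in nonlinear step, a Hamming-weight leakage model with Gaussian noise, one already aligned leakage sample per nibble, and Pearson correlation as the matching statistic. All function names are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a small nonlinear step.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def simulate_traces(key_nibbles, n_traces, noise_sigma, rng):
    """Assumed device model: one sample per nibble, HW(SBOX[p ^ k]) + noise."""
    inputs, traces = [], []
    for _ in range(n_traces):
        p = [rng.randrange(16) for _ in key_nibbles]
        inputs.append(p)
        traces.append([hamming_weight(SBOX[pi ^ ki]) + rng.gauss(0.0, noise_sigma)
                       for pi, ki in zip(p, key_nibbles)])
    return inputs, traces


def recover_key(inputs, traces):
    """Attack each nibble independently: 4 * 2**4 guesses instead of 2**16."""
    recovered = []
    for pos in range(len(inputs[0])):
        measured = [t[pos] for t in traces]

        def score(guess):
            # Leakage estimation for this key-fragment guess.
            estimated = [hamming_weight(SBOX[p[pos] ^ guess]) for p in inputs]
            return pearson(estimated, measured)

        recovered.append(max(range(16), key=score))
    return recovered


rng = random.Random(1)            # fixed seed so the experiment is repeatable
secret = [0x3, 0xA, 0x7, 0xE]     # hypothetical 16-bit key as four nibbles
inputs, traces = simulate_traces(secret, 500, 0.5, rng)
recovered = recover_key(inputs, traces)
```

In this noise regime the correct guess is expected to stand out clearly for every nibble. With real measurements the attacker would first have to locate and align the relevant samples, which the sketch takes for granted.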

Figure 1.2: SPICE simulation result of a 2-input NAND gate in static CMOS logic. Figure from [86]. Used with permission.

1.2. Sources of Information Leakage

The single most important parameter used for in-chip or off-chip secret communications is the secret key used for encryption and decryption of data. While the key stays constant for the duration of the encryption, the input values of each sub-round of the encryption are always changing. The input-dependent behavior of a regular digital circuit leaks enough power consumption information for a skilled adversary to obtain the secret key.

Currently, the most widely used logic style to implement digital integrated circuits is CMOS (Complementary Metal-Oxide Semiconductor). A main characteristic of CMOS logic is that it requires primarily dynamic power, while its static power consumption is almost zero (see Figure 1.2). The dynamic power consumption is caused by transitions of the logic signals in the CMOS circuit. The type and the probability of signal transitions depend on the logical function of the circuit and on the processed data. As a result, the power consumption of a CMOS circuit depends on the data being processed and hence Differential Power Analysis (DPA) attacks, as described in [99], are possible.

In current CMOS technology, the power consumption of a circuit has two main components: dynamic power consumption and static leakage power consumption. Analyzing both components helps to find ways of better controlling the total power consumption. Dynamic power leaks information mainly through two mechanisms:

• A digital circuit functions by evaluating the input voltage level and setting the output voltage level based on the input through a set of logic gates. In current CMOS technology, the logic value of a gate actually depends on the amount of charge stored on the parasitic capacitor at the output of the logic gate. A fully charged capacitor represents logic-high (logic-1), whereas a depleted capacitor represents logic-low (logic-0). For each binary logic gate, there can only be four types of transitions on the gate output wire (0 to 0, 0 to 1, 1 to 0 and 1 to 1). Only one of them, from logic-0 to logic-1, actually draws current from the power supply to charge the parasitic capacitor. By monitoring the current consumed by the digital circuit at all times, we can estimate the relative number of logic gates that are switching at any given time. This gives us some information about the circuit based on its power consumption.

• Parasitic capacitances are not uniform for every gate. They depend on the type of gate, the fanout of the gate, and the length of the wire or net between the gate and the gates it drives. Taking the length of wires as an example, even if two identical gates with the same fanout are connected to the same set of successor gates, the capacitance will differ if the wires are routed differently. If both gates have a power-consuming transition, the amount of power consumed will be different, thus leaking important power information about the circuit.

Static leakage power consumption is a characteristic of the process used to manufacture the circuit. The exact amount of leakage of a given gate within a circuit cannot be controlled by a logic designer. If all gates were manufactured exactly the same, static power leakage would not pose a threat. But this is not the case in the real world: process variation plays an important role in the balance of static power leakage. As process variation increases, the variation in the amount of charge leaked by each gate during a fixed period of time also increases. Unfortunately, the effect of process variations on static power leakage cannot be evaluated at design time, and can only be seen through actual measurements on the finished product, whether it is an ASIC design or a design for an FPGA.

The current consumed by the circuit can be measured directly, using a resistor, or indirectly, by measuring the generated EM radiation with high-sensitivity scopes. The circuit might even generate noise that depends on the transitions. Moreover, even if no information is available on the details of the current footprint, the implementation might execute different functional units or instructions depending on critical data. This is another source of information leakage when an attacker can observe these imbalances, even from a remote place. A difference in the executed instruction path can translate into a difference in the time needed to complete the encryption algorithm if the different paths are not balanced in time. Even if the paths are balanced in time, an attacker might be able to distinguish which path has been selected when their consumption patterns differ, which leaks information about the data that has been manipulated.
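The two dynamic leakage mechanisms described in this section can be sketched in a few lines. The code below is a simplified illustration under assumptions that are not in the original text: a register is modelled as an integer, each bit stands for one gate output, and the per-bit capacitance values are arbitrary numbers chosen for the example. The function names are hypothetical.

```python
def charging_transitions(prev_state, next_state):
    """Count the bits that switch from 0 to 1, the only transition that
    draws current from the supply in the simplified CMOS model."""
    return bin(~prev_state & next_state).count("1")


def weighted_charge(prev_state, next_state, capacitances):
    """Same count, but each bit is weighted by its (assumed) parasitic
    capacitance; capacitances[i] belongs to bit i."""
    rising = ~prev_state & next_state
    return sum(c for i, c in enumerate(capacitances) if (rising >> i) & 1)


def leakage_trace(states):
    """One leakage value per clock cycle for a sequence of register states."""
    return [charging_transitions(a, b) for a, b in zip(states, states[1:])]


# Four clock cycles of a 4-bit register.
trace = leakage_trace([0b0000, 0b1111, 0b1111, 0b0000, 0b0101])

# Two updates with the same number of rising bits but different wires:
caps = [1.0, 1.5, 0.8, 2.0]                     # arbitrary illustrative values
charge_a = weighted_charge(0b0000, 0b0101, caps)  # bits 0 and 2 rise
charge_b = weighted_charge(0b0000, 0b1010, caps)  # bits 1 and 3 rise
```

Here `trace` reflects how many outputs are charged in each cycle, while `charge_a` and `charge_b` differ although both updates raise two bits, which mirrors the effect of non-uniform parasitic capacitances.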

1.3. Overview of Countermeasures

Countermeasures against side-channel attacks span a wide variety of solutions. However, in the present state of the art, no single technique provides perfect security. Consequently, protecting implementations against physical attacks aims at making the attacks harder. In this context, the implementation cost of a countermeasure is of primary importance and must be evaluated with respect to the additional security obtained.

Side-channel attacks work because there is some leakage that depends on intermediate values of the executed cryptographic algorithm. Therefore, the goal of a countermeasure is to avoid, or at least to reduce, these dependencies. Depending on the way these dependencies are avoided, current countermeasures against side-channel attacks can be classified into two main families: hiding techniques and masking techniques.

In the case of hiding, data-dependent leakages are avoided by breaking the link between the leaked magnitude and the processed data values. Hence, protected devices execute cryptographic algorithms in the same way as unprotected devices, but hiding countermeasures make it difficult to find exploitable information in the power traces. The power consumption of a cryptographic device can be made independent of the processed data values in two different ways: by randomizing the power consumption in each clock cycle, or by equalizing the power consumption in every clock cycle. Both techniques can be applied in the time dimension (by shuffling operations or inserting dummy operations, for example) and in the amplitude dimension (by increasing the non-data-dependent power consumption or by reducing the data-dependent power consumption). In most cases these countermeasures significantly increase the required resources, but they are quite robust and attack-independent.
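Hiding in the time dimension can be sketched as follows. The example is an assumption-laden illustration rather than a countermeasure from the thesis: it shuffles the order of independent byte-wise XOR operations and interleaves dummy operations on a decoy buffer. The function name and parameters are hypothetical, and Python code like this only shows the scheduling idea. It says nothing about the actual power profile of a compiled implementation.

```python
import secrets


def add_key_shuffled(state, key, n_dummy=4, rng=None):
    """XOR each state byte with its key byte in a random order, mixed with
    dummy operations, so a given byte is not processed at a fixed instant."""
    rng = rng or secrets.SystemRandom()   # OS-backed randomness source
    out = list(state)
    decoy = [rng.randrange(256) for _ in range(n_dummy)]

    # Non-negative slots are real operations, negative slots are dummies.
    schedule = list(range(len(state))) + [-(j + 1) for j in range(n_dummy)]
    rng.shuffle(schedule)

    for slot in schedule:
        if slot >= 0:
            out[slot] = state[slot] ^ key[slot]
        else:
            # Dummy operation: same kind of work on data unrelated to the key.
            decoy[-slot - 1] ^= rng.randrange(256)
    return out


result = add_key_shuffled([1, 2, 3, 4], [4, 3, 2, 1])
```

The functional result does not depend on the random schedule. Only the instant at which each byte is handled changes between executions, which is what forces an attacker to collect more traces.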
On the other hand, masking techniques attempt to remove the correlation between power consumption and secret data by randomizing the intermediate values that the device processes. This is achieved by concealing each intermediate value with a random number (the mask). The operation performed with the mask depends on the cryptographic algorithm, but it is usually the Boolean exclusive-or function, the modular addition, or the modular multiplication. An advantage of this approach is that it can be implemented at the algorithm level without changing the power consumption characteristics of the cryptographic device. It can also be implemented at the logic level.

1.4. FSR-based Ciphers

FSR-based ciphers are included in security components of portable wireless digital communication systems, including GSM’s A5/1 [35], Bluetooth’s E0 [161] and Remote Keyless Entry (RKE) implementations with KeeLoq encryption systems [93]. These systems must cipher every single transmitted bit without a significant performance impact, battery-life reduction or buffering needs.

In 1945, Shannon proved that if a sequence of plaintext digits is combined, one at a time, with a keystream composed of a truly random sequence of digits, the system is perfectly secure, in the sense that the resulting ciphertext digits give no information about the plaintext to a passive eavesdropper [160]. In practice, however, the implementation of such a system, known as the one-time pad, is quite cumbersome. The key must be generated fully at random, it must be at least as long as the plaintext, it can never be reused, neither in whole nor in part, and it must be kept secret, with the evident difficulties of concealing and securely distributing such long keys to the parties.

Although one-time pads have sometimes been used in the context of espionage (and not always in the right way), simpler schemes that reduce these problems with the keys have been developed. Instead of generating, storing and transmitting keys as long as the plaintext, the keystream that is combined with the plaintext digits, as in the one-time pad, is generated pseudorandomly by some kind of finite state automaton whose initial state is determined by a seed. This seed constitutes the new, much smaller, secret key. The keystream is no longer truly random but pseudorandom (knowing the current state is enough to calculate the next state), so Shannon’s proof of security no longer holds, but the results are usually good enough.

For the internals of the finite state automata that generate the keystreams, shift registers are generally used because they have very low hardware complexity and low power consumption, and they are very fast. This makes them especially suitable for battery-operated devices that need to provide some kind of encryption, such as Wireless Sensor Networks (WSN) or RKE applications.

A shift register is a cascade of registers sharing the same clock, where the output of one register is directly connected to the input of the next one in the chain. The register at one end is configured as the output of the shift register, while the one at the other end is its input. To be used as a keystream generator, the input bit of the shift register is generated as a linear combination (usually an exclusive-or, XOR) of bits of the previous state of the shift register. This linear combination of some bits of the overall shift register constitutes the feedback that generates the next input bit, and the whole system, structured in this way, is a Linear Feedback Shift Register (LFSR).

The characteristics of LFSRs are quite favorable. Their mathematics are well known, they generate periodic sequences (i.e., there is an integer $T$ such that the output bits satisfy $s_t = s_{t+T}\ \forall t$) with good statistical properties, and their period is quite long: $T = 2^{n} - 1$ for a maximal-length LFSR of length $n$. For the keystream to be secure it cannot be reused, neither in whole nor in part. So, given that the output of the LFSR is a sequence of period $T$, $T$ must be very long in order to avoid repeating the sequence before the plaintext finishes, and that requirement is met.

The linear feedback at the origin of those nice properties is also the cause of a fundamental weakness: by observing just $2n$ output bits of the LFSR it is possible, using linear algebra or the ad hoc Berlekamp-Massey algorithm [120], to reconstruct the feedback structure and the internal state of the LFSR, breaking the encryption algorithm. To avoid this, several solutions exist that in one way or another try to hide the linearity of an internal LFSR, converting it into a Non-Linear Feedback Shift Register (NLFSR).

Figure 1.3: LFSR vs NLFSR.

FSR-based algorithms combine pseudorandom sequence generators with the input data that needs to be secured and a key known by transmitter and receiver to generate a ciphertext. There are two main families of symmetric key encryption algorithms that can use FSR structures: stream ciphers and block ciphers.

Stream ciphers apply the algorithm and the secret key to each binary digit of an input data stream, updating the state of the FSR after each encoding. The goal of stream ciphers is to generate different outputs for the same input if computed at different instants. As the algorithms depend on pseudorandom sequences, both transmitter and receiver need to be initialized with the same seed to be synchronized, and the seed should not be repeated. The algorithm uses a secret symmetric key, which both devices share, and an initial value (IV), which is shared at the beginning of the synchronization process and should not be repeated.

On the other hand, block ciphers are deterministic encryption algorithms that always generate the same output for a given input, via repeated application of an invertible transformation. The transformation is known as a round and it is applied several times, the number of rounds being a characteristic of the algorithm. Applications using block ciphers typically introduce counters in the input in order to avoid an exact replica of the message. FSR-based block ciphers use the FSR structure inside the round transformation. The main difference with stream ciphers is that the FSR structure has no history; i.e., neither the state nor the seed carries information about previous executions. The algorithm does not require synchronization mechanisms. However, the system using the algorithm might require synchronization if a counter is used inside the input data and it is not provided in plaintext.

The algorithms can be divided according to the interaction between the FSR, the key and the input data. They can be combined “a priori”, generating a pseudorandom sequence that depends on the key and the input data, or “a posteriori”, obtaining from the pseudorandom sequence generator, customized by a symmetric key, a value that is combined with the input data to obtain the ciphertext.

Successful practical attacks have been performed on most of these encryption algorithms; they are presented in Section 2.3. There are only a few countermeasure proposals for these algorithms, and most of them suggest the use of secured logic, described in Section 2.2, which is more appropriate for hardware implementations.
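The LFSR keystream generator described in this section can be sketched as follows. This is a minimal illustration, not code from the thesis: it assumes a 4-bit Fibonacci LFSR whose feedback is the XOR of bits 0 and 1 (recurrence $s_{t+4} = s_t \oplus s_{t+1}$, characteristic polynomial $x^4 + x + 1$), and it combines the keystream with the data "a posteriori" by XOR. All names are hypothetical.

```python
def lfsr_step(state, taps, nbits):
    """Advance the register one clock: return (new_state, output_bit).
    Bit 0 is the output end; the feedback bit enters at bit nbits - 1."""
    out = state & 1
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1      # linear (XOR) feedback
    return (state >> 1) | (feedback << (nbits - 1)), out


def state_period(seed, taps, nbits):
    """Number of clocks until the register returns to the seed state.
    Assumes bit 0 is among the taps, so the state update is invertible."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr_step(state, taps, nbits)
        steps += 1
        if state == seed:
            return steps


def keystream(seed, taps, nbits, length):
    bits, state = [], seed
    for _ in range(length):
        state, out = lfsr_step(state, taps, nbits)
        bits.append(out)
    return bits


def xor_cipher(bits, seed, taps, nbits):
    """Encrypt or decrypt a list of bits by XOR with the LFSR keystream."""
    return [b ^ k for b, k in zip(bits, keystream(seed, taps, nbits, len(bits)))]


TAPS, NBITS, SEED = (0, 1), 4, 0b0001        # seed must be non-zero
period = state_period(SEED, TAPS, NBITS)      # 2**4 - 1 for this polynomial
plaintext = [1, 0, 1, 1, 0, 0, 1, 0]
ciphertext = xor_cipher(plaintext, SEED, TAPS, NBITS)
decrypted = xor_cipher(ciphertext, SEED, TAPS, NBITS)
```

Both parties must start from the same seed, which is the synchronization requirement mentioned above for stream ciphers. A 4-bit register is of course far too small for real use and is chosen only so that the full period can be checked by hand.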

1.5. Objectives

The objective of this Ph.D. thesis is to provide countermeasures that increase the resistance of FSR-based ciphers against side-channel attacks. The research is focused on hiding countermeasures that introduce randomness in the execution and can be applied automatically by the compiler. It is directed at obtaining good countermeasures in terms of the performance-security and complexity-security trade-offs (from the developer's point of view). The main goal is to obtain countermeasures that are automatic (or, at least, add little complexity for the developer) and suitable for existing architectures. Three methods are considered and evaluated:

• Co-processor: many embedded systems have an external co-processor that performs operations. This co-processor is typically memory-mapped or connected through an internal bus. We will propose and evaluate countermeasures that use the external co-processor to perform the algorithm that manipulates the sensitive information.

• Existing compiler optimizations: we will analyze and evaluate the effect of existing compiler optimizations on the resistance against SCA. A profiling step can be carried out, prior to distribution, to check which combination of optimization passes provides the best trade-off between performance and security for a concrete software implementation of an encryption algorithm.

• New compiler optimizations: the objective in this area is to propose new compiler optimizations that take data dependency into account. The hypothesis is that random execution of operations increases the resistance against SCA. At compile time, the software implementation can be transformed to offer more operations or input data to choose from. We must implement bit-level dataflow analysis for FSR-based cipher algorithms. To the best of our knowledge, bit-level dataflow analysis has not been done before in software compilers.

1.6. Contributions

The first contribution of this Ph.D. thesis is the implementation of a framework that automates the evaluation of the resistance of software implementations against SCA. The framework can be customized to include different analyses or leakage models. It is divided into blocks that are specified for a concrete solution: compiler, possible optimizations, target and attack. There are other contributions regarding the blocks of the framework:

• We have implemented a modular co-processor simulator with flexible accuracy. The modules that do not provide leakage information are implemented in a high-level language, while the modules manipulating sensitive data are simulated with clock-cycle accuracy. We use it to simulate the SORU2 co-processor.

• We have adapted an existing instruction-accurate simulator for the MSP430 target to be cycle-accurate. There were no cycle-accurate simulators for the MSP430 when this thesis started.

• We have introduced a new differential side-channel attack, named Differential Correlation Power Attack, in order to avoid the effect of “ghost peaks” in FSR-based algorithms. These algorithms manipulate very similar data at multiple instants, so it is easier to find correlations between the traces and operations that are not the target data manipulation. These correlation peaks mask the correlation peak targeted by the attack and reduce the efficiency of the attack.

The second contribution of this Ph.D. thesis is the proposal of using a reconfigurable vector unit to implement FSR-based algorithms with higher security. We have cooperated in the design and simulation of SORU, a reconfigurable vector unit for embedded systems. SORU is a co-processor that can be connected to any target. We have implemented the control mechanisms in the compiler to configure the SORU co-processor. We have proposed and evaluated different configurations using our framework. While the basic ones are very vulnerable to SCA, we have introduced randomness in the execution that reduces this vulnerability.

The third contribution of this thesis is the framework to evaluate existing compiler optimization passes and their effect on the security against SCA. We conclude that there are no generic optimization sequences that make an unsecured implementation secure. However, there are differences in the resistance against SCA of implementations generated using different optimization passes, and neither disabling optimization nor using the standard optimization sequence is the best solution. We propose a mechanism to introduce randomness in the execution path using generic compiler optimizations, reducing the vulnerability against SCA. The mechanism switches randomly between software implementations generated from selected optimization sequences.

The fourth contribution of this thesis is the introduction of a new compiler optimization that makes FSR-based algorithms secure. To the best of our knowledge, it is the first time that an FSR-based algorithm has been implemented in software using bitslicing to increase the speed of execution. We have implemented an optimization pass that automatically generates the bitslice implementation from a Look-Up Table (LUT) definition. Previous bitslice implementations of other encryption algorithms are considered a countermeasure against SCA. However, we have proven that the bitslice implementation we propose leaks no timing information but is more vulnerable to statistical analysis than conventional implementations. Therefore, we present an implementation derived from the bitslice implementation that is more secure against both timing and statistical SCA.

Thus, the global contribution of this thesis is the introduction of randomness to improve the security of FSR-based algorithms against SCA. We exploit the available hardware resources and provide previously unexplored mechanisms to secure the implementations.
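The idea of deriving a bitslice implementation from a LUT definition can be sketched as follows. This is not the optimization pass of the thesis, whose details are not given here. It is a generic multiplexer tree (Shannon expansion) that evaluates an n-input LUT on many lanes at once using only bitwise operations. The 5-input table 0x3A5C742E is the constant commonly quoted for the KeeLoq nonlinear function. Treating input 0 as the least significant bit of the table index is an assumption of this example, and all names are hypothetical.

```python
def lut_lookup(table, bits):
    """Reference: conventional table lookup for one set of input bits."""
    index = sum(b << j for j, b in enumerate(bits))
    return (table >> index) & 1


def bitslice_lut(table, inputs, width):
    """Evaluate the LUT on `width` lanes in parallel.
    inputs[j] is a word whose bit k is input variable j of lane k.
    Returns a word whose bit k is the LUT output of lane k."""
    mask = (1 << width) - 1
    n = len(inputs)
    # Leaves of the mux tree: all-ones or all-zeros words per table entry.
    level = [mask if (table >> i) & 1 else 0 for i in range(1 << n)]
    for x in inputs:
        not_x = ~x & mask
        # 2-to-1 multiplexer on every lane: pick odd entry where x is 1.
        level = [(level[2 * i] & not_x) | (level[2 * i + 1] & x)
                 for i in range(len(level) // 2)]
    return level[0]


NLF = 0x3A5C742E     # 32-entry truth table, bit i = output for index i
WIDTH = 32

# Lane k receives input index k, so the 32 lanes cover every input combination.
inputs = [sum(((k >> j) & 1) << k for k in range(WIDTH)) for j in range(5)]
sliced = bitslice_lut(NLF, inputs, WIDTH)

# The same results computed one lane at a time with the conventional lookup.
reference = sum(lut_lookup(NLF, [(k >> j) & 1 for j in range(5)]) << k
                for k in range(WIDTH))
```

No data-dependent memory access or branch is executed, which is consistent with the absence of timing leakage stated above. A real pass would also fold the constant leaves to shorten the expression, which this sketch does not attempt.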

1.7. Publications

The results of the thesis, together with other related research, have been published in international conferences and journals. The list of publications can be divided into three major categories: the proposal using reconfigurable hardware, the implementations of security based on compiler optimizations, and the research on WSN applications, frameworks and security. The aim of this section is to present these publications.

The preliminary overview of the state of the art in the field of SCA and related countermeasures done for this thesis has been published as a book chapter:

José Manuel Moya, Juan-Mariano de Goyeneche, Pedro Malagón. Security Issues in SoC Communication. Book chapter in Communication Architectures for Systems-on-Chip (Embedded Systems Series). 2010, CRC Press.

1.7.1. List of publications on security using Reconfigurable Hardware

Jose M. Moya, Javier Rodríguez, Julio Martín, Pedro Malagón, Juan C. Vallejo, Alvaro Araujo, Juan-M. de Goyeneche, Agustín Rubio, Elena Romero, Daniel Villanueva, Octavio Nieto-Taladriz, Carlos A. Lopez Barrio. A low-power reconfigurable architecture for adaptable embedded systems. In Proceedings of the HiPEAC Workshop on Reconfigurable Computing (WRC 2009), Paphos, Cyprus. January 2009.

Jose M. Moya, Javier Rodríguez, Julio Martín, Pedro Malagón, Juan C. Vallejo, Alvaro Araujo, Juan-M. de Goyeneche, Agustín Rubio, Elena Romero, Daniel Villanueva, Octavio Nieto-Taladriz, Carlos A. Lopez Barrio. SORU: A reconfigurable vector unit for adaptable embedded systems. In Proceedings of International Workshop on Applied Reconfigurable Computing (ARC 2009), Karlsruhe, Germany. March 2009.

Jose M. Moya, Zorana Bankovic, Álvaro Araujo, Juan-Mariano de Goyeneche, Marina Zapater, Pedro Malagón, David Fraga, Juan Carlos Vallejo, Elena Romero, Javier Blesa, Daniel Villanueva, Octavio Nieto-Taladriz, Carlos A. López-Barrio. The SORU2 Reconfigurable Coprocessor and Its Applications for Embedded Systems Security. In Proceedings of the 25th Conference on Design of Circuits and Integrated Systems (DCIS 2010), Lanzarote, Spain. November 2010.

Marina Zapater, Pedro Malagón, Jose M. Moya, Juan-Mariano de Goyeneche, Álvaro Araujo, David Fraga, Juan Carlos Vallejo, Elena Romero, Javier Blesa, Daniel Villanueva, Octavio Nieto-Taladriz, Carlos A. López-Barrio. System simulation platform for the design of the SORU reconfigurable coprocessor. In Proceedings of the 25th Conference on Design of Circuits and Integrated Systems (DCIS 2010), Lanzarote, Spain. November 2010.

Moreover, a patent using SORU for security was applied for in Spain. Inventors (in order of signature): José Manuel Moya Fernández, Álvaro Araujo Pinto, Octavio Nieto-Taladriz García, David Fraga Aydillo, Juan-Mariano de Goyeneche y Vázquez de Seyas, Juan Carlos Vallejo López, Pedro Malagón Marzo, Agustín Rubio Mingorance, Elena Romero Perales, Daniel Villanueva González. Dispositivo para la mejora de la seguridad de microprocesadores en sistemas empotrados. Patent no.: P200900442.

My contribution to these articles, which include many authors, is centered on three aspects:

1. Simulator: I have contributed to the development of the SystemC + TLM2.0 simulator, especially the Load/Store Units and the execution stage of SORU, including the instruction definition and parser.

2. Compiler support: I have developed the mechanism to communicate the main processor with the SORU simulator using intrinsics of the LLVM compiler.

3. BRU configuration: I have developed the SORU configurations considered in this Ph.D. thesis, including the secured KeeLoq implementation.

1.7.2. List of publications on security using Compiler Optimizations

Pedro Malagón, Juan-Mariano de Goyeneche, Marina Zapater, Jose M. Moya, Zorana Bankovic, David Fraga. Effects of compiler optimizations on side-channel attacks. In Proceedings of the 26th Conference on Design of Circuits and Integrated Systems (DCIS 2011), Albufeira, Portugal, November 2011.

Pedro Malagón, Juan-Mariano de Goyeneche, Marina Zapater, Jose M. Moya, Zorana Bankovic. Improving resistance against side-channel analysis in low-cost devices using compiler optimizations. In Proceedings of the 5th International Symposium on Ubiquitous Computing and Ambient Intelligence (UCAmi 2011). Riviera Maya, México, December 2012.

Pedro Malagón, Juan-Mariano de Goyeneche, Marina Zapater, Jose M. Moya, Zorana Bankovic. Compiler Optimizations as a Countermeasure against Side-Channel Analysis in MSP430-Based Devices. Sensors 2012, 12, 7994-8012.

Pedro Malagón, Juan-Mariano de Goyeneche, Jose M. Moya. Exploiting parallelism opportunities in non-parallel architectures to improve NLFSR software implementations.
In Proceedings of the 27th Conference on Design of Circuits and Integrated Systems (DCIS 2013), San Sebastián, Spain, November 2013.

1.7.3. List of publications on WSN

Pedro Malagón, Juan C. Vallejo, Jose M. Moya, Alvaro Araujo, Octavio Nieto-Taladriz. Dynamic environment evaluation for reliable AmI applications based on untrusted sensors. In Proceedings of the International Conference on Emerging Security Information, Systems and Technologies, Valencia, Spain. June 2007.

Silvia Jiménez, Antonio Cobo, Alvaro Araujo, Pedro Malagón, Octavio Nieto-Taladriz, Paula de Toledo, Francisco del Pozo. Wireless Sensor Network to Support Home Care. Book chapter in Encyclopedia of healthcare information systems, 2008.

Jose M. Moya, Juan C. Vallejo, Pedro Malagón, Álvaro Araujo, Juan-M. de Goyeneche, Octavio Nieto-Taladriz. A scalable security framework for reliable AmI applications based on untrusted sensors. In Proceedings of the International Conference on Wired/Wireless Internet Communications (WWIC 2009), Twente, Netherlands. May 2009.

Javier Blesa, Pedro Malagon, Alvaro Araujo, Jose M. Moya, Juan C. Vallejo, Juan M. de Goyeneche, Elena Romero, Daniel Villanueva, Octavio Nieto-Taladriz. Modular framework for smart home applications. In Proceedings of the International Workshop on Ambient Assisted Living in IWANN’09 (IWAAL 2009), Salamanca, Spain. June 2009.

Arash Parsa, Ali O. Ercan, Pedro Malagon, Fred Burghardt, Jan Rabaey, Adam Wolisz. Connectivity Brokerage: From Coexistence to Collaboration. In Proceedings of the IEEE Radio and Wireless Symposium (RWS 2010), New Orleans, USA. January 2010.

Javier Blesa, Pedro Malagón, Juan-Mariano de Goyeneche, Jose M. Moya, Alvaro Araujo. Distributed platform for wireless sensor networks monitoring. In Proceedings of the 2nd International Workshop on Ambient Assisted Living in CEDI2010 (IWAAL 2010), Valencia, Spain. September 2010.

Zorana Bankovic, David Fraga, José Manuel Moya, Juan Carlos Vallejo, Álvaro Araujo, Pedro Malagón, Juan-Mariano de Goyeneche, Daniel Villanueva, Elena Romero, Javier Blesa. Detecting and Confining Sybil Attack in Wireless Sensor Networks based on Reputation Systems coupled with Self-organizing Maps. In Proceedings of the 6th IFIP Conference on Artificial Intelligence Applications & Innovations (AIAI’10), Larnaca, Cyprus. October 2010.

Zorana Bankovic, Juan Carlos Vallejo, Pedro Malagón, José M. Moya, Álvaro Araujo. Eliminating Routing Protocol Anomalies in Wireless Sensor Networks using AI Techniques. In Proceedings of the 3rd Workshop on Artificial Intelligence and Security (AISec’10), Chicago, USA. October 2010.

Iván Álvarez, Pedro Malagón, Marina Zapater, Juan-Mariano de Goyeneche, José M. Moya. RFID performance in localization systems. In Proceedings of the 3rd International Workshop on Ambient Assisted Living (IWAAL 2011), Málaga, Spain. June 2011.

Zorana Bankovic, José M. Moya, Juan Carlos Vallejo, David Fraga, Pedro Malagón. Holistic Solution for Confining Insider Attacks in Wireless Sensor Networks using Reputation Systems coupled with Clustering Techniques. In Proceedings of 10th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom 2011), Changsha, China. November 2011.

Marina Zapater, David Fraga, Pedro Malagón, Zorana Bankovic, and Jose M. Moya. Self-organizing maps versus growing neural gas in detecting anomalies in data centers. Logic Journal of the IGPL, 2014.

1.8. Structure

The methodology followed to complete this Ph.D. thesis can be divided into three stages: basic knowledge, analysis framework setup, and countermeasure research and contributions. The Ph.D. thesis is structured according to the stages of the methodology.

The basic knowledge needed and the related work reviewed are presented in Chapter 2. Firstly, the chapter offers a panoramic view of the existing side-channel attacks and their fundamentals. A survey is mandatory in order to offer appropriate contributions. Secondly, a survey of existing countermeasures at different levels of abstraction of the electronic system is presented. Finally, the chapter presents existing attacks and countermeasures on FSR-based ciphers, which are concrete applications of some of the previously described attacks and countermeasures.

The analysis framework setup is described in Chapter 3. The framework includes the complete process applied to evaluate the resistance against SCA. The experiments in this thesis are applied to concrete targets and encryption algorithms, described in detail in Chapter 3.

The countermeasure research and contributions are presented in Chapters 4, 5 and 6. Chapter 4 presents a contribution based on the inclusion of a pipelined co-processor. Chapter 5 presents the effect of different compiler optimizations on the resistance against SCA and proposes combinations of optimization passes to increase this resistance.
Chapter 6 presents an implementation style that increases resistance against SCA with small overhead or even performance enhancement. This style can be applied automatically using a new compiler optimization pass. Finally, Chapter 7 draws some conclusions on the contributions of this Ph.D. thesis.


(35) Chapter 2. Related work This chapter reviews the related work published in the area of power side-channel analysis. The most important existing attacks are described, with details on their structure and the information leakage they exploit. The main existing countermeasures are presented, divided by the abstraction level at which they are applied. This chapter reviews with special emphasis the attacks and countermeasures applied to FSR-based ciphers.. 2.1. Power Analysis Attacks. Side-Channel Attack (SCA) are non-invasive attacks that exploit information nonintentional leaked from the physical environment of a cryptosystem to acquire knowledge about its secrets. These robust attacks are often called “external monitoring attacks” As we mention in section 1.1 we focus on Power Analysis Attacks (PAA), which exploit the physical dependency of the power consumption of a device and the data it is processing. Power Analysis attacks are performed by measuring the power consumption of a device as it operates, and then using these measurements to determine secret information processed (such as secret keys and/or user PINs). The attacks and countermeasures available for power analysis can also be applied to Electro-Magnetic Analysis (EMA) [71]. The information leaked in both cases is related to the current consumption. Electro-Magnetic Analysis (EMA) typically provides lower Signal-Noise Ratio (SNR) in the measurement, although it can provide more local information, considering only the leakage in a concrete hardware element of the device. Throughout the Ph.D thesis we mention PAA although we consider also EMA. In this section we describe the most important PAA according to the process applied to the measurements in order to increase the knowledge on the secret data. Simple Power Analysis (SPA) attacks recover the secret keys from direct observation of individual power consumption measurements. 
They are most effective when there is a significant amount of sensitive information leakage. Stochastic methods create a precise model of the target by observing the execution of the algorithms in an identical device. The secret can be recovered by comparing an

individual measurement with the model previously created. DSCA attacks employ statistical techniques to extract information from multiple power consumption measurements. They are highly effective at extracting secrets even when the information available within any individual measurement is much smaller than unknown electrical activity, measurement error, and other noise sources. The most relevant DSCA attacks are DPA, CPA and Mutual Information Analysis (MIA).

Under these attacks, the device under attack performs its ordinary cryptographic processing operations. As a result, the attacks generally cannot be stopped through traditional anti-tamper mechanisms such as intrusion sensors or other attack detectors. The above-mentioned attacks are effective against small single-chip devices, large SoCs, and multi-chip products. For systems where the cryptographic processing is only a small contributor to the overall variation in power consumption, SPA is typically used first to locate the interval in which it is executed, which reduces the size of each observation.

These SCA are normally classified as requiring a low to moderate degree of attacker sophistication. The hardware typically used for the process consists of a PC and a digital storage oscilloscope. Suitable oscilloscopes are widely available, and used units sell for under $500. Once automated, SPA and stochastic attacks are virtually instantaneous, and typical DSCA attacks on unprotected devices take from a few minutes to a few hours to complete.

Both SPA and DSCA are serious threats to wireless nodes and, as presented in Section 2.3, have been applied to obtain information on real devices executing FSR-based algorithms. We describe them in order to detect the vulnerabilities they exploit and to select an appropriate evaluation method for our proposals.

2.1.1. Simple Power Analysis

Simple Power Analysis (SPA) is a side-channel attack first introduced by Kocher et al.
in [101] as “a technique that involves directly interpreting power consumption measurements collected during cryptographic operations”. The goal of SPA attacks is to obtain information about the device under attack from a few power traces, or even just one. The information revealed ranges from the algorithm being executed to, in a completely successful attack, the cryptographic key.

Suppose the attacker locates an instant at which an instruction that manipulates sensitive information is executed (e.g. loading part of the secret key into the accumulator). Depending on the Hamming Weight (HW) (the number of ’1’ bits) of the key data manipulated, the amplitude of the power trace at that instant varies. An expert attacker who has a reference consumption model can estimate the HW of the key data from the power amplitude.

Another scenario is an implementation where the instructions executed depend on the data (e.g. a conditional branch depending on a bit value). If the attacker has information about the implementation and locates the execution of the algorithm in the power trace, he can derive the data processed from the duration of the cycles. If the execution duration is different for different instructions, it is possible to assign sections of the power trace to

the specific instructions executed. There are thus two different leakage sources for an attack, amplitude and timing, although authors typically use the term SPA for the amplitude-based attacks.

SPA attacks require detailed knowledge about the implementation of the algorithm in the device. The attack process starts with a thorough analysis of the target device and its implementation of the algorithm. Useful information includes the algorithm implementation, the points of interest and the characteristics of the target device. Descriptions of profiling methods date back to 1999, when Biham and Shamir [28] described a method to map parts of a power trace to the key scheduling operation of the AES algorithm. Power traces of a device executing the AES algorithm were analyzed on a device similar to the target of the attack. Profiling does not require many traces of the device under attack, which is one of the limitations that SPA overcomes, but traces of similar available devices are needed to gain experience and create a model.

In [68], Fahn and Pearson describe the profiling stage of their attack, Inferential Power Analysis (IPA). They included every round of the Data Encryption Standard (DES) algorithm in the profiling stage, since each round executes the same code with different arguments. In [122] and [6], the authors describe the process followed to extract a model of an 8-bit microcontroller before attacking a device. In [7], the authors obtain their own model of smart cards from experience, after finding that the published models were not suitable for their devices.

Simple Power Analysis includes three major attack families: Visual Inspection, Template and Collision Attacks.

Visual Inspection requires the attacker to have extensive personal knowledge of the implementation of the algorithm and of the device. An example is [101], where the authors highlight the visual recognition of the DES algorithm and its 16 rounds.
In higher-resolution views, the authors identify the different rotations of the key from the repetition of a specific pattern inside the power trace of a round. Moreover, they distinguish between instructions, so they can determine whether a conditional branch skips a jump instruction. All these results can provide useful information for other attacks, even if they fail to extract the key.

It is possible to reverse engineer the code executed by a device while it runs the encryption algorithm using SCA, a technique known as SCA Reverse Engineering (SCARE). It was first applied in [146] using Self-Organized Maps. In [66], the authors extract the code executed by an 8-bit PIC microcontroller with hidden Markov models. If the software implementation is known and there are data-dependent or key-dependent branches, this technique makes it possible to distinguish which path has been taken, which gives information on the data manipulated.

Template attacks were introduced by Chari et al. [43]. In a template-based power analysis attack, the attacker is assumed to know the power consumption characteristics of some instructions of a device. This characterization is stored and is called a template. Templates are then used as follows. In a template-based DPA attack, the attacker matches the templates based on different key hypotheses with the recorded power traces. The templates that match best indicate the key. This type of attack is the best attack in an

information-theoretic sense (see [43]). Template attacks compare the power trace of the attack with templates created from previous analysis, following the maximum-likelihood decision rule. In the profiling stage, the characterization phase, a template is created from power traces as a multivariate normal distribution, defined by its mean vector and its covariance matrix. Templates can be built for an instruction-operand pair or for a data-key pair. Once every possible value has its template (e.g. every key-value pair has its template), the comparison with the power trace of the device under attack can be done. This second stage, the matching phase, involves evaluating, for every template, the probability density function of its multivariate normal distribution at the measured trace. The template with the highest probability indicates the correct key.

Some difficulties emerge when considering the practical characterization of a device. Power traces from the same source data (instruction-operand, data-key) are grouped, the points of interest are set, and the mean vector and covariance matrix are calculated. More points of interest provide more information, but the size of the covariance matrix grows quadratically with their number. The attacker must reach a compromise that depends on the device under attack. These attacks were first described in [43].

Collision attacks exploit the coincidence of an intermediate value in two different encryption runs. If two different plaintexts (or ciphertexts) have a common intermediate value detected through SPA, the collection of possible key values is reduced to a subset. There is more than one point at which the collision of the two encryptions can be detected, because intermediate values are manipulated at several points: when they are loaded into the accumulator, operated on, saved to memory, and so on. To detect a collision, the previous attacks are used (mainly template attacks). Collision attacks were first applied by Wiemers and Schramm et al.
[159], identifying collisions in a power trace of a DES implementation, followed by their application to AES [157]. These attacks were enhanced in [108] by including almost-collisions in the attack (many more points of interest that reduce the possible key values) for Feistel ciphers. In order to reduce the number of traces required, the attacker also looks for one-byte collisions inside one execution. In [29], the target is the S-box transformation of AES, which is executed 16 times per round, 160 times per execution. The attack has a 99% probability of success with only 7 traces, known plaintexts and $2^{34.74}$ offline operations.

The cryptanalytic method employed determines the number of offline operations needed and the power of the attack. In [29], a linear system of equations of the operations that collide is used, built from the linear operations of the S-box. In [33], a set of non-linear systems of equations is built using linear and non-linear operations of the S-box.

Collision detection might not be feasible with a high noise level. Multiple-Differential Side-Channel Collision Attack (MDCA) [31] combines methods to detect collisions with the above-mentioned cryptanalytic methods. The proposed methods include averaging, binary voting and ternary voting. Several measurements with the same input are required, so the approach is infeasible in some scenarios. The binary voting scheme consists of discarding the traces that differ from the rest by more than a set threshold before averaging the traces. The ternary voting scheme performs the voting by comparison with a reference trace, which requires a profiling stage.
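
The characterization and matching phases of the template attacks described earlier in this section can be illustrated with a minimal Python sketch. It is not taken from [43] or from any of the cited works. It assumes a hypothetical leakage model in which three points of interest have an amplitude proportional to the Hamming weight of the processed byte plus Gaussian noise. The gains, the noise level, the candidate values and all function names are illustration choices only. Each template is a mean vector and a covariance matrix, and matching selects the candidate whose multivariate normal distribution gives the highest log-likelihood for the attack trace.

```python
import math
import random

# Assumption: amplitude at each point of interest is gain * HW(value) plus
# Gaussian noise. Gains and noise level are arbitrary, not measured values.
GAINS = [1.0, 0.8, 1.2]
NOISE_SIGMA = 0.5


def simulate_trace(value, rng):
    """Return a synthetic trace with one sample per point of interest."""
    hw = bin(value).count("1")
    return [g * hw + rng.gauss(0.0, NOISE_SIGMA) for g in GAINS]


def mean_vector(traces):
    n = len(traces)
    return [sum(t[i] for t in traces) / n for i in range(len(traces[0]))]


def covariance_matrix(traces, mean):
    """Unbiased sample covariance; its size grows quadratically with the points."""
    n, d = len(traces), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for t in traces:
        diff = [t[i] - mean[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in row] for row in cov]


def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_likelihood(trace, mean, cov):
    """Log of the multivariate normal density evaluated at the trace."""
    d = len(mean)
    low = cholesky(cov)
    diff = [trace[i] - mean[i] for i in range(d)]
    z = []  # forward substitution: solve low * z = diff
    for i in range(d):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def build_templates(profiling):
    """Characterization phase: one (mean, covariance) pair per candidate."""
    templates = {}
    for label, traces in profiling.items():
        mean = mean_vector(traces)
        templates[label] = (mean, covariance_matrix(traces, mean))
    return templates


def match(trace, templates):
    """Matching phase: maximum-likelihood decision over all templates."""
    return max(templates, key=lambda lab: log_likelihood(trace, *templates[lab]))


if __name__ == "__main__":
    rng = random.Random(2015)
    candidates = [0x00, 0x0F, 0xFF]  # hypothetical values with distinct HW
    profiling = {v: [simulate_trace(v, rng) for _ in range(200)] for v in candidates}
    templates = build_templates(profiling)
    attack_trace = simulate_trace(0x0F, rng)
    print("best matching candidate:", hex(match(attack_trace, templates)))
```

The sketch works with log-likelihoods and a Cholesky factorization instead of evaluating the density directly, which avoids numerical underflow when many points of interest are used. Under this leakage model, candidates with the same Hamming weight produce identical templates and cannot be told apart, so a real attack would need leakage that separates them or a further key-recovery step.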
