Abstract
Public-key cryptography plays a vital role in the security of today's global digital information systems. However, with the development of quantum computing and the emergence of Shor's algorithm, the security of public-key cryptography may be seriously threatened. Cryptographic algorithms that can resist attacks even from an adversary with access to a quantum computer have therefore begun to attract the attention of the cryptography community. The National Institute of Standards and Technology (NIST) has launched a global call for post-quantum cryptography standards. Among the submitted algorithms, lattice-based schemes achieve a good trade-off among security, key size, and operation speed, which makes them the most promising candidates for post-quantum encryption. CRYSTALS-KYBER, a lattice-based Key Encapsulation Mechanism (KEM), passed three rounds of this global call. For post-quantum cryptographic algorithms, hardware implementation efficiency is an important evaluation criterion. This paper therefore explores the hardware design and optimization space of the three main modules of CRYSTALS-KYBER (key generation, key encapsulation, and key decapsulation) under different parameter sets, using high-level synthesis (HLS) tools. As a high-level hardware design method, HLS makes it efficient and convenient to explore hardware implementations of different algorithms. This paper uses HLS tools to analyze the software implementation of CRYSTALS-KYBER, tries different combinations of optimization strategies to improve the HLS hardware results, and finally obtains the most optimized hardware structure. In addition, this paper provides a combined Tcl and Perl script that automatically searches for the best optimization strategy and obtains the corresponding hardware structure.
The experimental results show that moderately optimizing the loops and timing constraints can greatly improve the performance of the resulting hardware. Compared with the state-of-the-art software implementation, the design in this paper shows a clear performance advantage. Compared with the state-of-the-art HLS implementation, our optimizations of Kyber-512 improve performance by up to 75% for key encapsulation and 55.1% for key decapsulation. Compared with the baseline, performance improves by 44.2% for key generation. For the other two parameter sets (Kyber-768 and Kyber-1024), the same optimization effect is obtained.
Affiliation: University of Chinese Academy of Sciences; State Key Laboratory of Computer Architecture