Peer-reviewed, open access
  • High-Speed NTT Accelerator ...
    Nguyen, Trong-Hung; Kieu-Do-Nguyen, Binh; Pham, Cong-Kha; Hoang, Trong-Thuc

    IEEE Access, 2024, Volume: 12
    Journal Article

    The efficiency of polynomial multiplication strongly affects the performance of lattice-based post-quantum cryptosystems. In this research, we propose a high-speed hardware architecture to accelerate polynomial multiplication based on the Number Theoretic Transform (NTT) in CRYSTALS-Kyber and CRYSTALS-Dilithium. We design a Digital Signal Processing (DSP) architecture for modular multiplication in butterfly and Point-Wise Multiplication (PWM) operations. Our method reduces the critical path delay of an $n$-bit multiplier to that of a $(2n-2)$-bit adder, optimizing both area and speed. These dedicated DSPs are employed in the butterfly and PWM operations, completely eliminating the pre-processing and post-processing steps of the NTT transforms. Furthermore, we introduce a novel unified pipelined architecture for the NTT and Inverse NTT (INTT) transformations of Kyber and Dilithium, with corresponding high-speed (Radix-2) and ultra-high-speed (Radix-4) versions. Lastly, we construct a complete hardware accelerator for polynomial matrix-vector multiplication in Kyber. Field-Programmable Gate Array (FPGA) implementation results show that our designs improve execution time by $3.4\times$ to $9.6\times$ for the NTT transforms in Dilithium and by $1.36\times$ to $34.16\times$ for Kyber polynomial multiplication, compared with previously reported studies. Additionally, the hardware footprint results indicate that our proposed architectures achieve a 44% to 96% improvement in Area-Time Product (ATP). The proposed architectures are efficient and well suited to high-performance lattice-based cryptography systems.
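
As a software point of reference for the operations named in the abstract, the following is a minimal Python sketch; it is not the authors' hardware design. It multiplies two polynomials modulo x^n + 1 using a radix-2 NTT with explicit pre- and post-processing (weighting by powers of psi), which are the steps the abstract says the proposed design eliminates. All function names are hypothetical. The parameters q = 8380417, n = 256 and psi = 1753 are assumed from the public Dilithium specification, not from this record; Kyber's NTT is structured differently and is not covered here. Python 3.8 or later is assumed for pow(x, -1, q).

```python
# Illustrative sketch only: NTT-based multiplication in Z_q[x] / (x^n + 1).
# Parameters assumed from the public Dilithium specification.
Q = 8380417   # prime modulus
N = 256       # polynomial degree bound
PSI = 1753    # primitive 2N-th root of unity mod Q (PSI**N == -1 mod Q)


def ntt(a, omega, q):
    """Recursive radix-2 NTT: out[i] = sum_j a[j] * omega^(i*j) mod q.

    omega must be a primitive len(a)-th root of unity mod q,
    and len(a) must be a power of two.
    """
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % q                  # modular multiplication in the butterfly
        out[i] = (even[i] + t) % q          # butterfly: sum output
        out[i + n // 2] = (even[i] - t) % q  # butterfly: difference output
        w = w * omega % q
    return out


def negacyclic_mul(a, b, psi, q):
    """Return a * b mod (x^n + 1, q) via NTT with explicit pre/post-processing."""
    n = len(a)
    omega = psi * psi % q
    # Pre-processing: weight coefficient i by psi^i.
    a_w = [x * pow(psi, i, q) % q for i, x in enumerate(a)]
    b_w = [x * pow(psi, i, q) % q for i, x in enumerate(b)]
    fa = ntt(a_w, omega, q)
    fb = ntt(b_w, omega, q)
    # Point-wise multiplication (PWM) in the NTT domain.
    fc = [x * y % q for x, y in zip(fa, fb)]
    # Inverse NTT: forward transform with omega^-1, then scale by n^-1.
    c_w = ntt(fc, pow(omega, -1, q), q)
    n_inv = pow(n, -1, q)
    psi_inv = pow(psi, -1, q)
    # Post-processing: undo the weighting with psi^-i.
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c_w)]


def schoolbook_negacyclic(a, b, q):
    """O(n^2) reference: multiply, then reduce using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c
```

Calling negacyclic_mul(a, b, PSI, Q) on two length-256 coefficient lists should agree with schoolbook_negacyclic(a, b, Q). The sketch keeps the psi weighting as separate passes purely to make those steps visible; how the paper folds them into its butterfly and PWM units is not described in this record.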