The Kalray MPPA-256 processor is based on a recent low-energy manycore architecture. In this article, we investigate its performance in multiprecision arithmetic for number-theoretic applications. We ...have developed a library for modular arithmetic that takes full advantage of the particularities of this architecture. This is in turn used in an implementation of the ECM, an algorithm for integer factorization using elliptic curves. For parameters corresponding to a cryptanalytic context, our implementation compares well to state-of-the-art implementations on GPU, while using much less energy.
In this paper, we present an efficient FPGA implementation of the SHA-3 hash function candidate Shabal 7. Targeted at the recent Xilinx Virtex-5 FPGA family, our design achieves a relatively high ...throughput of 2 Gbit/s at a cost of only 153 slices, yielding a throughput-vs.-area ratio of 13.4 Mbit/s per slice. Our work can also be ported to Xilinx Spartan-3 FPGAs, on which it supports a throughput of 800 Mbit/s for only 499 slices, or equivalently 1.6 Mbit/s per slice.
According to the SHA-3 Zoo website 1, this work is among the smallest reported FPGA implementations of SHA-3 candidates, and ranks first in terms of throughput per area.
As FPGAs are increasingly being used for floating-point computing, the feasibility of a library of floating-point elementary functions for FPGAs is discussed. An initial implementation of such a ...library contains parameterized operators for the logarithm and exponential functions. In single precision, those operators use a small fraction of the FPGA’s resources, have a smaller latency than their software equivalent on a high-end processor, and provide about ten times the throughput in pipelined version. Previous work had shown that FPGAs could use massive parallelism to balance the poor performance of their basic floating-point operators compared to the equivalent in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism. The presented library is freely available from
http://www.ens-lyon.fr/LIP/Arenaire/.
For applications requiring a large dynamic, real numbers may be represented either in floating-point, or in the logarithm number system (LNS). Which system is best for a given application is ...difficult to know in advance, because the cost and performance of LNS operators depend on the target accuracy in a highly non linear way. Therefore, a comparison of the pros and cons of both number systems in terms of cost, performance and overall accuracy is only relevant on a per-application basis. To make such a comparison possible, two concurrent libraries of parameterized arithmetic operators, targeting recent field-programmable gate arrays, are presented. They are unbiased in the sense that they strive to reflect the state-of-the-art for both number systems. These libraries are freely available at www.ens-lyon.fr/LIP/Arenaire/.
This paper is devoted to the design of fast parallel accelerators for the cryptographic \eta_T pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose ...here a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies that we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of coprocessors for the \eta_T pairing over {\bf F}_{2^m} and {\bf F}_{3^m}. Thanks to a careful choice of algorithms for the tower field arithmetic associated with the \eta_T pairing, we manage to keep the pipelined multiplier at the heart of each coprocessor busy. A final exponentiation is still required to obtain a unique value, which is desirable in most cryptographic protocols. We supplement our pairing accelerators with a coprocessor responsible for this task. An improved exponentiation algorithm allows us to save hardware resources. According to our place-and-route results on Xilinx FPGAs, our designs improve both the computation time and the area-time trade-off compared to previously published coprocessors.
Arithmetic Operators for Pairing-Based Cryptography Beuchat, Jean-Luc; Brisebarre, Nicolas; Detrey, Jérémie ...
Cryptographic Hardware and Embedded Systems - CHES 2007,
01/2007
Book Chapter, Conference Proceeding
Recenzirano
Odprti dostop
Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. Software implementations being ...rather slow, the study of hardware architectures became an active research area. In this paper, we first study an accelerator for the ηT pairing over \documentclass12pt{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$\mathbb{F}_3x/(x^{97}+x^{12}+2)$\end{document}. Our architecture is based on a unified arithmetic operator which performs addition, multiplication, and cubing over \documentclass12pt{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$\mathbb{F}_{3^{97}}$\end{document}. This design methodology allows us to design a compact coprocessor (1888 slices on a Virtex-II Pro 4 FPGA) which compares favorably with other solutions described in the open literature. We then describe ways to extend our approach to any characteristic and any extension field.
This paper is devoted to the design of fast parallel accelerators for the cryptographic Tate pairing in characteristic three over supersingular elliptic curves. We propose here a novel hardware ...implementation of Miller’s loop based on a pipelined Karatsuba-Ofman multiplier. Thanks to a careful selection of algorithms for computing the tower field arithmetic associated to the Tate pairing, we manage to keep the pipeline busy. We also describe the strategies we considered to design our parallel multiplier. They are included in a VHDL code generator allowing for the exploration of a wide range of operators. Then, we outline the architecture of a coprocessor for the Tate pairing over \documentclass12pt{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$\mathbb{F}_{3^m}$\end{document}. However, a final exponentiation is still needed to obtain a unique value, which is desirable in most of the cryptographic protocols. We supplement our pairing accelerator with a coprocessor responsible for this task. An improved exponentiation algorithm allows us to save hardware resources.
According to our place-and-route results on Xilinx FPGAs, our design improves both the computation time and the area-time trade-off compared to previoulsy published coprocessors.
In this paper, we focus on the relation collection step of the Function Field Sieve (FFS), which is to date the best algorithm known for computing discrete logarithms in small-characteristic finite ...fields of cryptographic sizes. Denoting such a finite field by F p n , where p is much smaller than n, the main idea behind this step is to find polynomials of the form a(t)-b(t)x in F p tx which, when considered as principal ideals in carefully selected function fields, can be factored into products of low-degree prime ideals. Such polynomials are called "relations", and current record-sized discrete-logarithm computations need billions of those. Collecting relations is therefore a crucial and extremely expensive step in FFS, and a practical implementation thereof requires heavy use of cache-aware sieving algorithms, along with efficient polynomial arithmetic over F p t. This paper presents the algorithmic and arithmetic techniques which were put together as part of a new public implementation of FFS, aimed at medium-to record-sized computations.