High Throughput Lattice-Based Signatures on GPUs: Comparing Falcon and Mitaka, IEEE Transactions on Parallel and Distributed Systems


IEEE Transactions on Parallel and Distributed Systems (IF 5.3), Pub Date: 2024-02-20, DOI: 10.1109/tpds.2024.3367319
Wai-Kong Lee ₁ , Raymond K. Zhao ₂ , Ron Steinfeld ₃ , Amin Sakzad ₃ , Seong Oun Hwang ₁


The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, which has since attracted efforts to optimize its implementation on various hardware architectures for practical applications. Recently, Mitaka was proposed as an alternative to Falcon, allowing parallel execution of most of its operations. These recent advancements motivate us to develop high-throughput implementations of the Falcon and Mitaka signature schemes on Graphics Processing Units (GPUs), a massively parallel architecture widely available on cloud service platforms. In this article, we propose the first parallel implementation of Falcon on various GPUs. We develop an iterative version of the sampling process in Falcon, which is also the most time-consuming Falcon operation. This allows us to implement Falcon signature generation without relying on expensive recursive function calls on GPUs. In addition, we propose a parallel random sample generation approach to accelerate Mitaka on GPUs. We evaluate our implementation techniques on state-of-the-art GPU architectures (RTX 3080, A100, T4 and V100). Experimental results show that our Falcon-512 implementation achieves 58,595 signatures/second and 2,721,562 verifications/second on an A100 GPU, which is $20.03\times$ and $29.51\times$ faster than the highly optimized AVX2 implementation on CPU. Our Mitaka implementation achieves 161,985 signatures/second and 1,421,046 verifications/second on the same GPU. Due to the adoption of a parallelizable sampling process, Mitaka signature generation enjoys $\approx 2$–$20\times$ higher throughput than Falcon on various GPUs. The high-throughput signature generation and verification achieved by this work can be very useful in various emerging applications, including the Internet of Things.
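The abstract quotes GPU throughputs and speedup factors but not the CPU baseline itself. As a rough reading aid, the sketch below back-calculates what those figures imply. It rests on two assumptions that the abstract does not state outright: that "faster" means a ratio of throughputs, and that $20.03\times$ refers to signing and $29.51\times$ to verification (taken from the order in which they appear). The function and constant names are illustrative only.

```python
# Figures quoted in the abstract for the A100 GPU, in operations per second.
FALCON_SIGN = 58_595
FALCON_VERIFY = 2_721_562
MITAKA_SIGN = 161_985
MITAKA_VERIFY = 1_421_046

# Speedups over the AVX2 CPU implementation, as quoted in the abstract.
# ASSUMPTION: first factor is signing, second is verification.
FALCON_SIGN_SPEEDUP = 20.03
FALCON_VERIFY_SPEEDUP = 29.51


def implied_baseline(gpu_throughput: float, speedup: float) -> float:
    """Back-calculate CPU throughput, ASSUMING speedup = GPU / CPU throughput."""
    return gpu_throughput / speedup


def amortized_microseconds(throughput: float) -> float:
    """Average time per operation at full throughput (not single-op latency)."""
    return 1e6 / throughput


cpu_sign = implied_baseline(FALCON_SIGN, FALCON_SIGN_SPEEDUP)
cpu_verify = implied_baseline(FALCON_VERIFY, FALCON_VERIFY_SPEEDUP)

# Mitaka relative to Falcon on the same GPU, from the quoted figures.
mitaka_sign_ratio = MITAKA_SIGN / FALCON_SIGN
mitaka_verify_ratio = MITAKA_VERIFY / FALCON_VERIFY

if __name__ == "__main__":
    print(f"Implied CPU Falcon-512 signing:      {cpu_sign:,.0f} ops/s")
    print(f"Implied CPU Falcon-512 verification: {cpu_verify:,.0f} ops/s")
    print(f"Mitaka / Falcon signing on A100:      {mitaka_sign_ratio:.2f}x")
    print(f"Mitaka / Falcon verification on A100: {mitaka_verify_ratio:.2f}x")
    print(f"Amortized Falcon signing time: {amortized_microseconds(FALCON_SIGN):.2f} us")
```

Under these assumptions, the Mitaka-to-Falcon signing ratio on the A100 comes out a little under $3\times$, which sits within the quoted $\approx 2$–$20\times$ range across GPUs, while the quoted Mitaka verification throughput on the A100 is lower than Falcon's.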


Updated: 2024-02-20
