-
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-04-18 Ahmed F. AbouElhamayed, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed S. Abdelfattah
Conventional multiply-accumulate (MAC) operations have long dominated computation time for deep neural networks (DNNs), especially convolutional neural networks (CNNs). Recently, product quantization (PQ) has been applied to these workloads, replacing MACs with memory lookups to pre-computed dot products. To better understand the efficiency tradeoffs of product-quantized DNNs (PQ-DNNs), we create a
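The MAC-to-lookup substitution at the heart of PQ can be sketched in a few lines of plain Python. The sizes, codebooks, and weights below are illustrative placeholders, not taken from the paper:

```python
import random

random.seed(0)

# Illustrative sizes (not from the paper): 16-dim inputs split into
# 4 subvectors, each quantized against a codebook of 8 prototypes.
D, N_SUB, K = 16, 4, 8
d = D // N_SUB
codebooks = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(K)]
             for _ in range(N_SUB)]
weights = [random.gauss(0, 1) for _ in range(D)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Offline: pre-compute the dot product of every prototype with the
# matching slice of the weight vector (this table replaces the MACs).
tables = [[dot(codebooks[s][j], weights[s * d:(s + 1) * d]) for j in range(K)]
          for s in range(N_SUB)]

def pq_dot(x):
    """Approximate dot(x, weights) with table lookups instead of MACs."""
    acc = 0.0
    for s in range(N_SUB):
        sub = x[s * d:(s + 1) * d]
        # Encode: index of the nearest prototype for this subvector.
        code = min(range(K),
                   key=lambda j: sum((a - b) ** 2
                                     for a, b in zip(codebooks[s][j], sub)))
        acc += tables[s][code]  # memory lookup, no multiply-accumulate
    return acc
```

If an input subvector coincides with a prototype, the lookup is exact; otherwise the quantization error of the encoding step carries through to the dot product.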
-
Toward FPGA Intellectual Property (IP) Encryption from Netlist to Bitstream ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-04-12 Daniel Hutchings, Adam Taylor, Jeffrey Goeders
Current IP encryption methods offered by FPGA vendors use an approach where the IP is decrypted during the CAD flow, and remains unencrypted in the bitstream. Given the ease of accessing modern bitstream-to-netlist tools, encrypted IP is vulnerable to inspection and theft from the IP user. While the entire bitstream can be encrypted, this is done by the user, and is not a mechanism to protect confidentiality
-
HierCGRA: A Novel Framework for Large-Scale CGRA with Hierarchical Modeling and Automated Design Space Exploration ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-04-08 Sichao Chen, Chang Cai, Su Zheng, Jiangnan Li, Guowei Zhu, Jingyuan Li, Yazhou Yan, Yuan Dai, Wenbo Yin, Lingli Wang
Coarse-grained reconfigurable arrays (CGRAs) are promising design choices in computation-intensive domains since they can strike a balance between energy efficiency and flexibility. A typical CGRA comprises processing elements (PEs) that can execute operations in applications and interconnections between them. Nevertheless, most CGRAs suffer from the ineffectiveness of supporting flexible architecture
-
R-Blocks: an Energy-Efficient, Flexible, and Programmable CGRA ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-04-08 Barry de Bruin, Kanishkan Vadivel, Mark Wijtvliet, Pekka Jääskeläinen, Henk Corporaal
Emerging data-driven applications in the embedded, e-Health, and internet of things (IoT) domain require complex on-device signal analysis and data reduction to maximize energy efficiency on these energy-constrained devices. Coarse-grained reconfigurable architectures (CGRAs) have been proposed as a good compromise between flexibility and energy efficiency for ultra-low power (ULP) signal processing
-
DANSEN: Database Acceleration on Native Computational Storage by Exploiting NDP ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-04-04 Sajjad Tamimi, Arthur Bernhardt, Florian Stock, Ilia Petrov, Andreas Koch
This paper introduces DANSEN, the hardware accelerator component for neoDBMS, a full-stack computational storage system designed to manage on-device execution of database queries/transactions as a Near-Data Processing (NDP)-operation. The proposed system enables Database Management Systems (DBMS) to offload NDP-operations to the storage while maintaining control over data through a native storage interface
-
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-04-04 Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang
Recent advancements in large language models (LLMs) boasting billions of parameters have generated a significant demand for efficient deployment in inference workloads. While hardware accelerators for Transformer-based models have been extensively studied, the majority of existing approaches rely on temporal architectures that reuse hardware units for different network layers and operators. However
-
HLPerf: Demystifying the Performance of HLS-based Graph Neural Networks with Dataflow Architectures ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-04-02 Chenfeng Zhao, Clayton J. Faber, Roger D. Chamberlain, Xuan Zhang
The development of FPGA-based applications using HLS is fraught with performance pitfalls and large design space exploration times. These issues are exacerbated when the application is complicated and its performance is dependent on the input data set, as is often the case with graph neural network approaches to machine learning. Here, we introduce HLPerf, an open-source, simulation-based performance
-
PTME: A Regular Expression Matching Engine Based on Speculation and Enumerative Computation on FPGA ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-04-01 Mingqian Sun, Guangwei Xie, Fan Zhang, Wei Guo, Xitian Fan, Tianyang Li, Li Chen, Jiayu Du
Fast regular expression matching is an essential task for deep packet inspection. In previous works, regular expression matching engines on FPGA struggled to achieve an ideal balance between resource consumption and throughput. Speculation and enumerative computation exploit the statistical properties of deterministic finite automata, allowing for more efficient pattern matching. Existing related
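The enumerative idea can be illustrated without any hardware detail: a later chunk of the input is run from every possible DFA state in parallel, and the correct continuation is selected once the earlier chunk's end state is known. A toy sketch with a hypothetical DFA (not from the paper):

```python
def run(dfa, state, chunk):
    """Step a DFA (dict: state -> {symbol -> state}) through a chunk."""
    for ch in chunk:
        state = dfa[state][ch]
    return state

def enumerative_match(dfa, start, text, split):
    """Match two chunks 'in parallel': the tail chunk is run from every
    state (enumeration); the right result is selected once the head's end
    state is known. On FPGA, the per-state runs are concurrent units."""
    head, tail = text[:split], text[split:]
    tail_map = {s: run(dfa, s, tail) for s in dfa}  # enumerate all states
    return tail_map[run(dfa, start, head)]

# Toy DFA recognizing the substring "ab" (state 2 is an accepting sink).
DFA = {0: {'a': 1, 'b': 0},
       1: {'a': 1, 'b': 2},
       2: {'a': 2, 'b': 2}}
```

The result always equals sequential matching; the statistical properties mentioned in the abstract determine how cheaply the enumeration can be pruned in practice.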
-
Design and implementation of hardware-software architecture based on hashes for SPHINCS+ ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-27 Jonathan López-Valdivieso, René Cumplido
Advances in quantum computing have posed a future threat to today’s cryptography. With the advent of these quantum computers, security could be compromised. Therefore, the National Institute of Standards and Technology (NIST) has issued a request for proposals to standardize algorithms for post-quantum cryptography (PQC), which is considered difficult to solve for both classical and quantum computers
-
FADO: Floorplan-Aware Directive Optimization Based on Synthesis and Analytical Models for High-Level Synthesis Designs on Multi-Die FPGAs ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-20 Linfeng Du, Tingyuan Liang, Xiaofeng Zhou, Jinming Ge, Shangkun Li, Sharad Sinha, Jieru Zhao, Zhiyao Xie, Wei Zhang
Multi-die FPGAs are widely adopted for large-scale accelerators, but optimizing high-level synthesis designs on these FPGAs faces two challenges. First, the delay caused by die-crossing nets creates an NP-hard floorplanning problem. Second, traditional directive optimization cannot consider resource constraints on each die or the timing issue incurred by the die-crossings. Furthermore, the high algorithmic
-
Designing an IEEE-compliant FPU that supports configurable precision for soft processors ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-15 Chris Keilbart, Yuhui Gao, Martin Chua, Eric Matthews, Steven J.E. Wilton, Lesley Shannon
Field Programmable Gate Arrays (FPGAs) are commonly used to accelerate floating-point (FP) applications. Although researchers have extensively studied FPGA FP implementations, existing work has largely focused on standalone operators and frequency-optimized designs. These works are not suitable for FPGA soft processors which are more sensitive to latency, impose a lower frequency ceiling, and require
-
L-FNNG: Accelerating Large-Scale KNN Graph Construction on CPU-FPGA Heterogeneous Platform ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-14 Chaoqiang Liu, Xiaofei Liao, Long Zheng, Yu Huang, Haifeng Liu, Yi Zhang, Haiheng He, Haoyan Huang, Jingyi Zhou, Hai Jin
Due to the high complexity of constructing exact k-nearest neighbor graphs, approximate construction has become a popular research topic. The NN-Descent algorithm is one of the representative in-memory algorithms. To effectively handle large datasets, existing state-of-the-art solutions combine the divide-and-conquer approach and the NN-Descent algorithm, where large datasets are divided into multiple
-
XVDPU: A High-Performance CNN Accelerator on the Versal Platform Powered by the AI Engine ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-13 Xijie Jia, Yu Zhang, Guangdong Liu, Xinlin Yang, Tianyu Zhang, Jia Zheng, Dongdong Xu, Zhuohuan Liu, Mengke Liu, Xiaoyang Yan, Hong Wang, Rongzhang Zheng, Li Wang, Dong Li, Satyaprakash Pareek, Jian Weng, Lu Tian, Dongliang Xie, Hong Luo, Yi Shan
Today, convolutional neural networks (CNNs) are widely used in computer vision applications. However, the trends of higher accuracy and higher resolution generate larger networks. The requirements of computation or I/O are the key bottlenecks. In this article, we propose XVDPU: the AI Engine (AIE)-based CNN accelerator on Versal chips to meet heavy computation requirements. To resolve the IO bottleneck
-
ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-13 Yuanlong Xiao, Dongjoon Park, Zeyu Jason Niu, Aditya Hota, André Dehon
Partial Reconfiguration (PR) is a key technique in the application design on modern FPGAs. However, current PR tools heavily rely on the developer to manually conduct PR module definition, floorplanning, and flow control at a low level. The existing PR tools do not consider High-Level-Synthesis languages either, which are of great interest to software developers. We propose HiPR, an open-source framework
-
GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-13 Jonas Dann, Daniel Ritter, Holger Fröning
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data analytics. While FPGAs denote a promising solution through flexible memory hierarchies and massive parallelism, we argue that current graph processing
-
The Open-source DeLiBA2 Hardware/Software Framework for Distributed Storage Accelerators ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-13 Babar Khan, Carsten Heinz, Andreas Koch
With the trend towards ever larger “big data” applications, many of the gains achievable by using specialized compute accelerators become diminished due to the growing I/O overheads. While there have been several research efforts into computational storage and FPGA implementations of the NVMe interface, to our knowledge, there have been only very limited efforts to move larger parts of the Linux block
-
Design, Calibration, and Evaluation of Real-time Waveform Matching on an FPGA-based Digitizer at 10 GS/s ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-13 Jens Trautmann, Paul Krüger, Andreas Becher, Stefan Wildermann, Jürgen Teich
Digitizing side-channel signals at high sampling rates produces huge amounts of data, while side-channel analysis techniques only need those specific trace segments containing Cryptographic Operations (COs). For detecting these segments, waveform-matching techniques have been established comparing the signal with a template of the CO’s characteristic pattern. Real-time waveform matching requires highly
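As a software analogue, the matching step amounts to a sliding dot product between the digitized trace and the CO template, with a detection wherever the score peaks. A minimal sketch (a streaming hardware matcher would update these scores sample by sample rather than recomputing windows):

```python
def match_scores(signal, template):
    """Sliding dot-product score of a template against a signal."""
    m = len(template)
    return [sum(s * t for s, t in zip(signal[i:i + m], template))
            for i in range(len(signal) - m + 1)]

def best_offset(signal, template):
    """Offset at which the template matches the signal best."""
    scores = match_scores(signal, template)
    return max(range(len(scores)), key=scores.__getitem__)
```

At 10 GS/s the point of the FPGA design is to evaluate these window scores in real time so that only the matching trace segments are stored.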
-
DONGLE 2.0: Direct FPGA-Orchestrated NVMe Storage for HLS ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-03-05 Linus Y. Wong, Jialiang Zhang, Jing (Jane) Li
Rapid growth in data size poses significant computational and memory challenges to data processing. FPGA accelerators and near-storage processing have emerged as compelling solutions for tackling the growing computational and memory requirements. Many FPGA-based accelerators have shown to be effective in processing large data sets by leveraging the storage capability of either host-attached or FPGA-attached
-
ScalaBFS2: A High Performance BFS Accelerator on an HBM-enhanced FPGA Chip ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-29 Kexin Li, Shaoxian Xu, Zhiyuan Shao, Ran Zheng, Xiaofei Liao, Hai Jin
The introduction of High Bandwidth Memory (HBM) to the FPGA chip makes it possible for an FPGA-based accelerator to leverage the huge memory bandwidth of HBM to improve its performance when implementing a specific algorithm. This is especially true for the Breadth-First Search (BFS) algorithm, which demands high bandwidth when accessing the graph data stored in memory. Different from traditional FPGA-DRAM
-
AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-19 Siva Satyendra Sahoo, Salim Ullah, Akash Kumar
With the increasing application of machine learning (ML) algorithms in embedded systems, there is a rising necessity to design low-cost computer arithmetic for these resource-constrained systems. As a result, emerging models of computation, such as approximate and stochastic computing, that leverage the inherent error-resilience of such algorithms are being actively explored for implementing ML inference
-
Eciton: Very Low-power Recurrent Neural Network Accelerator for Real-time Inference at the Edge ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-12 Jeffrey Chen, Sang-Woo Jun, Sehwan Hong, Warrick He, Jinyeong Moon
This article presents Eciton, a very low-power recurrent neural network accelerator for time series data within low-power edge sensor nodes, achieving real-time inference with a power consumption of 17 mW under load. Eciton reduces memory and chip resource requirements via 8-bit quantization and hard sigmoid activation, allowing the accelerator as well as the recurrent neural network model parameters
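The hard sigmoid mentioned in the abstract replaces the transcendental sigmoid with a clipped line, needing only one multiply, one add, and two comparisons in hardware. A common form uses slope 0.2 and offset 0.5 (an assumption here; the paper may use different constants):

```python
def hard_sigmoid(x):
    """Piecewise-linear sigmoid surrogate: clip(0.2*x + 0.5, 0, 1)."""
    return max(0.0, min(1.0, 0.2 * x + 0.5))
```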
-
Introduction to the FPL 2021 Special Section ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-12 Diana Göhringer, Georgios Keramidas, Akash Kumar
-
An Efficient FPGA-based Depthwise Separable Convolutional Neural Network Accelerator with Hardware Pruning ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-12 Zhengyan Liu, Qiang Liu, Shun Yan, Ray C. C. Cheung
Convolutional neural networks (CNNs) have been widely deployed in computer vision tasks. However, the computation and resource intensive characteristics of CNN bring obstacles to its application on embedded systems. This article proposes an efficient inference accelerator on Field Programmable Gate Array (FPGA) for CNNs with depthwise separable convolutions. To improve the accelerator efficiency, we
-
Evaluating the Impact of Using Multiple-Metal Layers on the Layout Area of Switch Blocks for Tile-Based FPGAs in FinFET 7nm ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-12 Sajjad Rostami Sani, Andy Ye
A new area model for estimating the layout area of switch blocks is introduced in this work. The model is based on a realistic layout strategy. As a result, it not only takes into consideration the active area that is needed to construct a switch block but also the number of metal layers available and the actual dimensions of these metals. The model assigns metal layers to the routing tracks in a way
-
An All-digital Compute-in-memory FPGA Architecture for Deep Learning Acceleration ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-12 Yonggen Li, Xin Li, Haibin Shen, Jicong Fan, Yanfeng Xu, Kejie Huang
Field Programmable Gate Array (FPGA) is a versatile and programmable hardware platform, which makes it a promising candidate for accelerating Deep Neural Networks (DNNs). However, FPGA’s computing energy efficiency is low due to the domination of energy consumption by interconnect data movement. In this article, we propose an all-digital Compute-in-memory FPGA architecture for deep learning acceleration
-
Exploring FPGA Switch-Blocks without Explicitly Listing Connectivity Patterns ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-12 Stefan Nikolić, Paolo Ienne
Increased lower metal resistance makes physical aspects of Field-Programmable Gate Array (FPGA) switch-blocks more relevant than before. The need to navigate a design space where each individual switch can have significant impact on the FPGA’s performance in turn makes automated switch-pattern exploration techniques increasingly appealing. However, most existing exploration techniques have a fundamental
-
High-Efficiency Compressor Trees for Latest AMD FPGAs ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-02-10 Konstantin J. Hoßfeld, Hans Jakob Damsgaard, Jari Nurmi, Michaela Blott, Thomas B. Preußer
High-fan-in dot product computations are ubiquitous in highly relevant application domains, such as signal processing and machine learning. Particularly, the diverse set of data formats used in machine learning poses a challenge for flexible efficient design solutions. Ideally, a dot product summation is composed from a carry-free compressor tree followed by a terminal carry-propagate addition. On
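The carry-free structure the abstract refers to can be illustrated with 3:2 carry-save compressors, which reduce three addends to two without propagating carries between bit positions; only the final two-operand add is carry-propagate. A bit-parallel software sketch (illustrative, not the paper's mapping to AMD FPGA primitives):

```python
def csa(a, b, c):
    """3:2 compressor on integers: three addends -> (sum, carry) with
    a + b + c == sum + carry and no carry chain between bit positions."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def compressor_tree_sum(addends):
    """Reduce many addends with CSAs, then one carry-propagate add."""
    addends = list(addends)
    while len(addends) > 2:
        a, b, c, *rest = addends
        s, carry = csa(a, b, c)
        addends = rest + [s, carry]
    return sum(addends)  # the single terminal carry-propagate addition
```

Each compression step removes one operand, so an n-input sum needs n-2 compressors before the terminal addition.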
-
A Hardware Accelerator for the Semi-Global Matching Stereo Algorithm: An Efficient Implementation for the Stratix V and Zynq UltraScale+ FPGA Technology ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 John Kalomiros, John Vourvoulakis, Stavros Vologiannidis
The semi-global matching stereo algorithm is a top performing algorithm in stereo vision. The recursive nature of the computations involved in this algorithm introduces an inherent data dependency problem, hindering the progressive computations of disparities at pixel clock. In this work, a novel hardware implementation of the semi-global matching algorithm is presented. A hardware structure of parallel
-
Designing Deep Learning Models on FPGA with Multiple Heterogeneous Engines ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Miguel Reis, Mário Véstias, Horácio Neto
Deep learning models are becoming more complex and heterogeneous with new layer types to improve their accuracy. This brings a considerable challenge to the designers of accelerators of deep neural networks. There have been several architectures and design flows to map deep learning models on hardware, but they are limited to a particular model and/or layer types. Also, the architectures generated
-
A Partitioned CAM Architecture with FPGA Acceleration for Binary Descriptor Matching ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Parastoo Soleimani, David W. Capson, Kin Fun Li
An efficient architecture for image descriptor matching that uses a partitioned content-addressable memory (CAM)-based approach is proposed. CAM is frequently used in high-speed content-matching applications. However, due to its lack of functionality to support approximate matching, conventional CAM is not directly useful for image descriptor matching. Our modifications improve the CAM architecture
-
Tailor: Altering Skip Connections for Resource-Efficient Inference ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Olivia Weng, Gabriel Marcano, Vladimir Loncar, Alireza Khodamoradi, Abarajithan G, Nojan Sheybani, Andres Meza, Farinaz Koushanfar, Kristof Denolf, Javier Mauricio Duarte, Ryan Kastner
Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this article, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network’s skip
-
Programmable Analog System Benchmarks Leading to Efficient Analog Computation Synthesis ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Jennifer Hasler, Cong Hao
This effort develops the first rich suite of analog and mixed-signal benchmarks of various sizes and domains, intended for use with contemporary analog and mixed-signal designs and synthesis tools. Benchmarking enables analog-digital co-design exploration as well as extensive evaluation of analog synthesis tools and the generated analog/mixed-signal circuit or device. The goals of this effort are defining
-
FDRA: A Framework for a Dynamically Reconfigurable Accelerator Supporting Multi-Level Parallelism ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Yunhui Qiu, Yiqing Mao, Xuchen Gao, Sichao Chen, Jiangnan Li, Wenbo Yin, Lingli Wang
Coarse-grained reconfigurable architectures (CGRAs) have emerged as promising accelerators due to their high flexibility and energy efficiency. However, existing open source works often lack integration of CGRAs with CPU systems and corresponding toolchains. Moreover, there is rare support for the accelerator instruction pipelining to overlap data communication, computation, and configuration across
-
Strega: An HTTP Server for FPGAs ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Fabio Maschi, Gustavo Alonso
The computer architecture landscape is being reshaped by the new opportunities, challenges, and constraints brought by the cloud. On the one hand, high-level applications profit from specialised hardware to boost their performance and reduce deployment costs. On the other hand, cloud providers maximise the CPU time allocated to client applications by offloading infrastructure tasks to hardware accelerators
-
Reprogrammable Non-Linear Circuits Using ReRAM for NN Accelerators ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Rafael Fão De Moura, Luigi Carro
As the massive usage of artificial intelligence techniques spreads in the economy, researchers are exploring new techniques to reduce the energy consumption of Neural Network (NN) applications, especially as the complexity of NNs continues to increase. Using analog Resistive RAM devices to compute matrix-vector multiplication in O(1) time complexity is a promising approach, but it is true that these
-
Automated Buffer Sizing of Dataflow Applications in a High-level Synthesis Workflow ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Alexandre Honorat, Mickaël Dardaillon, Hugo Miomandre, Jean-François Nezan
High-Level Synthesis (HLS) tools are mature enough to provide efficient code generation for computation kernels on FPGA hardware. For more complex applications, multiple kernels may be connected by a dataflow graph. Although some tools, such as Xilinx Vitis HLS, support dataflow directives, they lack efficient analysis methods to compute the buffer sizes between kernels in a dataflow graph. This article
-
Montgomery Multiplication Scalable Systolic Designs Optimized for DSP48E2 ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-27 Louis Noyez, Nadia El Mrabet, Olivier Potin, Pascal Veron
This article describes an extensive study of the use of DSP48E2 Slices in Ultrascale FPGAs to design hardware versions of the Montgomery Multiplication algorithm for the hardware acceleration of modular multiplications. Our fully scalable systolic architectures result in parallelized, DSP48E2-optimized scheduling of operations analogous to the FIOS block variant of the Montgomery Multiplication. We
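For reference, the reduction underlying all Montgomery multipliers (REDC) can be stated in a few lines; the word-serial FIOS scheduling that the article maps onto DSP48E2 slices interleaves these steps, but the arithmetic identity is the same. The parameters below are toy-scale illustrations:

```python
def montgomery_setup(n, k):
    """Pre-compute constants for odd modulus n and R = 2**k > n."""
    R = 1 << k
    n_prime = (-pow(n, -1, R)) % R   # n * n' == -1 (mod R)
    return R, n_prime

def redc(t, n, k, R, n_prime):
    """Montgomery reduction: returns t * R^{-1} mod n, for t < n*R."""
    m = ((t & (R - 1)) * n_prime) & (R - 1)
    u = (t + m * n) >> k             # exact division by R
    return u - n if u >= n else u

def mont_mul(a, b, n, k, R, n_prime):
    """a*b mod n: to Montgomery form, multiply + reduce, convert back."""
    a_bar = (a * R) % n              # done once per operand in practice
    b_bar = (b * R) % n
    prod_bar = redc(a_bar * b_bar, n, k, R, n_prime)
    return redc(prod_bar, n, k, R, n_prime)
```

Because REDC replaces the trial division of a classical modular reduction with masks, shifts, and multiplies, it maps naturally onto DSP-slice multiply-add datapaths.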
-
High Throughput FPGA-Based Object Detection via Algorithm-Hardware Co-Design ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-15 Anupreetham Anupreetham, Mohamed Ibrahim, Mathew Hall, Andrew Boutros, Ajay Kuzhively, Abinash Mohanty, Eriko Nurvitadhi, Vaughn Betz, Yu Cao, Jae-Sun Seo
Object detection and classification is a key task in many computer vision applications such as smart surveillance and autonomous vehicles. Recent advances in deep learning have significantly improved the quality of results achieved by these systems, making them more accurate and reliable in complex environments. Modern object detection systems make use of lightweight convolutional neural networks (CNNs)
-
A Hardware Design Framework for Computer Vision Models Based on Reconfigurable Devices ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2024-01-15 Zimeng Fan, Wei Hu, Fang Liu, Dian Xu, Hong Guo, Yanxiang He, Min Peng
In computer vision, the joint development of the algorithm and computing dimensions cannot be separated. Models and algorithms are constantly evolving, while hardware designs must adapt to new or updated algorithms. Reconfigurable devices are recognized as important platforms for computer vision applications because of their reconfigurability. There are two typical design approaches: customized and
-
CSAIL2019 Crypto-Puzzle Solver Architecture ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-12-29 Sergey Gribok, Bogdan Pasca, Martin Langhammer
The CSAIL2019 time-lock puzzle is an unsolved cryptographic challenge introduced by Ron Rivest in 2019, replacing the solved LCS35 puzzle. Solving these types of puzzles requires large amounts of intrinsically sequential computations, with each iteration performing a very large (3072-bit for CSAIL2019) modular multiplication operation. The complexity of each iteration is several times greater than
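The intrinsically sequential workload is t iterated modular squarings, i.e., computing 2^(2^t) mod n: each squaring depends on the previous result, so the only route to a faster solution is a faster single squaring, which is what such a solver accelerates. A toy-scale sketch with a tiny modulus (the CSAIL2019 modulus is 3072 bits and its factorization is secret):

```python
import math

# Toy RSA-style modulus with KNOWN factors, used here only to verify the
# result; in the real puzzle p and q are secret, which is precisely what
# forces the long sequential computation.
p, q = 1019, 1021
n = p * q
t = 10_000

x = 2
for _ in range(t):        # t strictly sequential modular squarings
    x = (x * x) % n

# Shortcut available only to whoever knows the factorization:
# reduce the exponent 2^t modulo lambda(n) = lcm(p-1, q-1).
lam = math.lcm(p - 1, q - 1)
shortcut = pow(2, pow(2, t, lam), n)
```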
-
AEKA: FPGA Implementation of Area-Efficient Karatsuba Accelerator for Ring-Binary-LWE-based Lightweight PQC ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-12-11 Tianyou Bao, Pengzhou He, Jiafeng Xie, H S. Jacinto
Lightweight PQC-related research and development have gradually gained attention from the research community recently. The Ring-Binary-Learning-with-Errors (RBLWE)-based encryption scheme (RBLWE-ENC) is a promising lightweight PQC built on small parameter sets to fit related applications, but those parameters do not favor deploying popular fast algorithms such as the number theoretic transform. To solve this problem, in this
-
Introduction to the Special Section on FCCM 2022 ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-12-05 Jing Li, Martin Herbordt
No abstract available.
-
CHIP-KNNv2: A Configurable and High-Performance K-Nearest Neighbors Accelerator on HBM-based FPGAs ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-12-05 Kenneth Liu, Alec Lu, Kartik Samtani, Zhenman Fang, Licheng Guo
The k-nearest neighbors (KNN) algorithm is an essential algorithm in many applications, such as similarity search, image classification, and database query. With the rapid growth in the dataset size and the feature dimension of each data point, processing KNN becomes more compute and memory hungry. Most prior studies focus on accelerating the computation of KNN using the abundant parallel resource
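The kernel in question is, at its core, an exhaustive distance computation followed by a top-k selection; a brute-force reference version makes the compute and memory pattern explicit (how the accelerator streams points from HBM banks is, of course, the paper's contribution, not shown here):

```python
import heapq
import math

def knn(query, points, k):
    """Brute-force k-nearest neighbors: all pairwise distances to the
    query, then a top-k selection, returning indices nearest-first."""
    dists = ((math.dist(query, p), i) for i, p in enumerate(points))
    return [i for _, i in heapq.nsmallest(k, dists)]
```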
-
TAPA: A Scalable Task-parallel Dataflow Programming Framework for Modern FPGAs with Co-optimization of HLS and Physical Design ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-12-05 Licheng Guo, Yuze Chi, Jason Lau, Linghao Song, Xingyu Tian, Moazin Khatti, Weikang Qiao, Jie Wang, Ecenur Ustun, Zhenman Fang, Zhiru Zhang, Jason Cong
In this article, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that allows users to easily express flexible and complex inter-task communication structures. Second, TAPA adopts a coarse-grained floorplanning
-
High-efficiency TRNG Design Based on Multi-bit Dual-ring Oscillator ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-12-05 Yingchun Lu, Yun Yang, Rong Hu, Huaguo Liang, Maoxiang Yi, Huang Zhengfeng, Yuanming Ma, Tian Chen, Liang Yao
Unpredictable true random numbers are required in security technology fields such as information encryption, key generation, mask generation for anti-side-channel analysis, algorithm initialization, and so on. At present, true random number generators (TRNGs) cannot supply random bits quickly enough because of their low-speed bit generation. Therefore, it is necessary to design a faster TRNG. This work presents
-
Covert-channels in FPGA-enabled SmartSSDs ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-12-04 Theodoros Trochatos, Anthony Etim, Jakub Szefer
Cloud computing providers today offer access to a variety of devices, which users can rent and access remotely in a shared setting. Among these devices are SmartSSDs, which are solid-state disks (SSDs) augmented with an FPGA, enabling users to instantiate custom circuits within the FPGA, including potentially malicious circuits for power and temperature measurement. Normally, cloud users have no remote
-
Across Time and Space: Senju’s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-11-29 Emanuele Del Sozzo, Davide Conficconi, Kentaro Sano
Stencil-based applications play an essential role in high-performance systems as they occur in numerous computational areas, such as partial differential equation solving. In this context, Iterative Stencil Loops (ISLs) represent a prominent and well-known algorithmic class within the stencil domain. Specifically, ISL-based calculations iteratively apply the same stencil to a multi-dimensional point
-
On the Malicious Potential of Xilinx’ Internal Configuration Access Port (ICAP) ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-11-17 Nils Albartus, Maik Ender, Jan-Niklas Möller, Marc Fyrbiak, Christof Paar, Russell Tessier
FPGAs have become increasingly popular in computing platforms. With recent advances in bitstream format reverse engineering, the scientific community has widely explored static FPGA security threats. For example, it is now possible to convert a bitstream to a netlist, revealing design information, and apply modifications to the static bitstream based on this knowledge. However, a systematic study of
-
HyBNN: Quantifying and Optimizing Hardware Efficiency of Binary Neural Networks ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-11-07 Geng Yang, Jie Lei, Zhenman Fang, Yunsong Li, Jiaqing Zhang, Weiying Xie
Binary neural network (BNN), where both the weight and the activation values are represented with one bit, provides an attractive alternative to deploy highly efficient deep learning inference on resource-constrained edge devices. However, our investigation reveals that, to achieve satisfactory accuracy gains, state-of-the-art (SOTA) BNNs, such as FracBNN and ReActNet, usually have to incorporate various
-
Constraint-Aware Multi-Technique Approximate High-Level Synthesis for FPGAs ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-10-09 Marcos T. Leipnitz, Gabriel L. Nazar
Numerous approximate computing (AC) techniques have been developed to reduce the design costs in error-resilient application domains, such as signal and multimedia processing, data mining, machine learning, and computer vision, to trade-off computation accuracy with area and power savings or performance improvements. Selecting adequate techniques for each application and optimization target is complex
-
FPGA-based Deep Learning Inference Accelerators: Where Are We Standing? ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-10-09 Anouar Nechi, Lukas Groth, Saleh Mulhem, Farhad Merchant, Rainer Buchty, Mladen Berekovic
Recently, artificial intelligence applications have become part of almost all emerging technologies around us. Neural networks, in particular, have shown significant advantages and have been widely adopted over other approaches in machine learning. In this context, high processing power is deemed a fundamental challenge and a persistent requirement. Recent solutions facing such a challenge deploy hardware
-
RapidStream 2.0: Automated Parallel Implementation of Latency–Insensitive FPGA Designs Through Partial Reconfiguration ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-09-01 Licheng Guo, Pongstorn Maidee, Yun Zhou, Chris Lavin, Eddie Hung, Wuxi Li, Jason Lau, Weikang Qiao, Yuze Chi, Linghao Song, Yuanlong Xiao, Alireza Kaviani, Zhiru Zhang, Jason Cong
Field-programmable gate arrays (FPGAs) require a much longer compilation cycle than conventional computing platforms such as CPUs. In this article, we shorten the overall compilation time by co-optimizing the HLS compilation (C-to-RTL) and the back-end physical implementation (RTL-to-bitstream). We propose a split compilation approach based on the pipelining flexibility at the HLS level, which allows
-
Topgun: An ECC Accelerator for Private Set Intersection ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-09-01 Guiming Wu, Qianwen He, Jiali Jiang, Zhenxiang Zhang, Yuan Zhao, Yinchao Zou, Jie Zhang, Changzheng Wei, Ying Yan, Hui Zhang
Elliptic Curve Cryptography (ECC), one of the most widely used asymmetric cryptographic algorithms, has been deployed in Transport Layer Security (TLS) protocol, blockchain, secure multiparty computation, and so on. As one of the most secure ECC curves, Curve25519 is employed by some secure protocols, such as TLS 1.3 and Diffie-Hellman Private Set Intersection (DH-PSI) protocol. High-performance implementation
-
An FPGA Accelerator for Genome Variant Calling ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-09-01 Tiancheng Xu, Scott Rixner, Alan L. Cox
In genome analysis, it is often important to identify variants from a reference genome. However, identifying variants that occur with low frequency can be challenging, as it is computationally intensive to do so accurately. LoFreq is a widely used program that is adept at identifying low-frequency variants. This article presents a design framework for an FPGA-based accelerator for LoFreq. In particular
-
Resource Sharing in Dataflow Circuits ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-09-01 Lana Josipović, Axel Marmet, Andrea Guerrieri, Paolo Ienne
To achieve resource-efficient hardware designs, high-level synthesis (HLS) tools share (i.e., time-multiplex) functional units among operations of the same type. This optimization is typically performed in conjunction with operation scheduling to ensure the best possible unit usage at each point in time. Dataflow circuits have emerged as an alternative HLS approach to efficiently handle irregular and
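The sharing the abstract describes, time-multiplexing one functional unit among several operations of the same type, can be sketched as a toy scheduler (a generic illustration with hypothetical names, not the paper's dataflow-circuit mechanism):

```python
def share_units(ops, num_units):
    """Greedily assign each operation (with an earliest-start cycle) to a
    time slot on one of `num_units` shared units; returns {op: (unit, cycle)}."""
    busy = {}       # (unit, cycle) -> op occupying that slot
    schedule = {}
    for op, ready in sorted(ops.items(), key=lambda kv: kv[1]):
        cycle = ready
        while True:
            for unit in range(num_units):
                if (unit, cycle) not in busy:
                    busy[(unit, cycle)] = op
                    schedule[op] = (unit, cycle)
                    break
            else:
                cycle += 1  # all units occupied this cycle: delay the op
                continue
            break
    return schedule

# Four multiplies ready at cycle 0, but only two multiplier units:
# two ops execute at cycle 0, the other two are delayed to cycle 1.
sched = share_units({"m0": 0, "m1": 0, "m2": 0, "m3": 0}, num_units=2)
```

The trade-off the abstract alludes to is visible here: halving the unit count costs an extra cycle of latency for the displaced operations, and a scheduler must weigh that cost against the area saved.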
-
Parallelising Control Flow in Dynamic-scheduling High-level Synthesis ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-09-01 Jianyi Cheng, Lana Josipović, John Wickerson, George A. Constantinides
Recently, there has been a trend toward using high-level synthesis (HLS) tools to generate dynamically scheduled hardware. The generated hardware is made up of components connected using handshake signals, which schedule the components at runtime as inputs become available. Such approaches promise superior performance on “irregular” source programs, such as those whose control flow depends
-
Logic Shrinkage: Learned Connectivity Sparsification for LUT-Based Neural Networks ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-09-01 Erwei Wang, Marie Auffret, Georgios-Ilias Stavrou, Peter Y. K. Cheung, George A. Constantinides, Mohamed S. Abdelfattah, James J. Davis
Field-programmable gate array (FPGA)–specific deep neural network (DNN) architectures using native lookup tables (LUTs) as independently trainable inference operators have been shown to achieve favorable area-accuracy and energy-accuracy trade-offs. The first work in this area, LUTNet, exhibited state-of-the-art performance for standard DNN benchmarks. In this article, we propose the learned optimization
-
A Reconfigurable Architecture for Real-time Event-based Multi-Object Tracking ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-09-01 Yizhao Gao, Song Wang, Hayden Kwok-Hay So
Although advances in event-based machine vision algorithms have demonstrated unparalleled capabilities in performing some of the most demanding tasks, their implementations under stringent real-time and power constraints in edge systems remain a major challenge. In this work, a reconfigurable hardware-software architecture called REMOT, which performs real-time event-based multi-object tracking on
-
Increasing the Robustness of TERO-TRNGs Against Process Variation ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-07-27 Christian Skubich, Peter Reichel, Marc Reichenbach
The transition effect ring oscillator (TERO) is a popular design for building entropy sources because it is compact, built from digital elements only, and very well suited to FPGAs. However, it is known to be quite sensitive to process variation. Although such sensitivity is useful for building physical unclonable functions, it interferes with the application as an entropy source. In this article, we investigate
-
BLOOP: Boolean Satisfiability-based Optimized Loop Pipelining ACM Trans. Reconfig. Technol. Syst. (IF 2.3) Pub Date : 2023-07-27 Nicolai Fiege, Peter Zipf
Modulo scheduling is the premier technique for throughput maximization of loops in high-level synthesis by interleaving consecutive loop iterations. The number of clock cycles between data insertions is called the initiation interval (II). For throughput maximization, this value should be as low as possible; therefore, its minimization is the main optimization goal. Despite its long historical existence
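The II lower bound that drives this minimization is standard in modulo scheduling: the resource-constrained minimum II is the per-resource-type ratio of operations to units, and the recurrence-constrained minimum II comes from dependence cycles. A minimal sketch of these generic bounds (the textbook MII computation, not BLOOP's SAT-based formulation):

```python
from math import ceil

def min_ii(op_counts, unit_counts, recurrences):
    """Lower bound on the initiation interval (II) of a modulo schedule.
    op_counts/unit_counts: operations and available units per resource type.
    recurrences: (total latency, dependence distance) per dependence cycle."""
    # ResMII: each resource type can start unit_counts[r] ops per cycle.
    res_mii = max(ceil(op_counts[r] / unit_counts[r]) for r in op_counts)
    # RecMII: a cycle of latency L carried over d iterations forces II >= L/d.
    rec_mii = max((ceil(lat / dist) for lat, dist in recurrences), default=1)
    return max(res_mii, rec_mii)

# 6 multiplies on 2 multipliers -> ResMII = 3; a dependence cycle with
# latency 4 carried over a distance of 2 iterations -> RecMII = 2; MII = 3.
print(min_ii({"mul": 6}, {"mul": 2}, [(4, 2)]))  # -> 3
```

An exact scheduler then searches for a feasible schedule at this II and increases it only when no such schedule exists, which is the optimization problem the abstract describes.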