An automated framework for selectively tolerating SDC errors based on rigorous instruction-level vulnerability assessment


Future Generation Computer Systems (IF 7.5), Pub Date: 2024-04-06, DOI: 10.1016/j.future.2024.04.006
Hussien Al-haj Ahmad, Yasser Sedaghat

Recent trends in processor manufacturing technologies have significantly increased the vulnerability of embedded systems operating in harsh environments to soft errors. These errors can cause Silent Data Corruptions (SDCs), which silently produce erroneous execution results, disturb the system’s execution, and can lead to severe financial, human, or environmental disasters. Fault tolerance techniques that take into account the performance and constraints of safety-critical systems are therefore essential to improve system reliability efficiently. Given the significant overhead imposed by conventional techniques (e.g., performance loss, increased memory usage, and additional hardware costs), researchers have developed cost-effective software-based fault tolerance techniques. However, as detection rates grow, these techniques can significantly increase code size and execution time, which creates a challenge. This paper proposes an automated framework for selective fault tolerance of SDCs in software running on different architectures. The framework comprises a sequence of techniques that are executed automatically, one after another. It offers a software-based technique that operates at the microarchitecture level and evaluates the vulnerability of program instructions to SDC errors. The framework conducts vulnerability assessment at the binary code level using a non-intrusive, runtime fault injection mechanism. It can inject faults at different granularity levels to maximize fault activation, including fine-grained injection into specific instruction fields or encoding bits and coarse-grained injection into the entire software system. The framework makes only minor modifications to the software under test, enabling it to run at near-native speed. Once SDC-vulnerable instructions are identified, the framework protects them selectively and automatically using a compiler extension, achieving a better trade-off between SDC detection and overhead by avoiding overprotection. Our framework was evaluated through a large number of fault injection experiments on real-world benchmark programs using the cycle-accurate Gem5 simulator. Leveraging the accurate vulnerability assessment results provided by our framework, the proposed selective technique reduces SDC errors by up to 99% while protecting only 45% of the program’s static instructions, with a performance overhead ranging from 8% to 35%.
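
The general idea of instruction-level vulnerability assessment followed by selective protection can be illustrated with a small, self-contained sketch. It is not the authors' framework and does not use Gem5. Everything here is an assumption made for illustration: the toy register machine, the `PROGRAM` listing, the fault model (one bit flipped in the value an instruction writes, instead of a flip in the binary encoding), and the function names. The 0.45 budget only echoes the figure quoted in the abstract.

```python
from collections import Counter

WIDTH = 16                    # toy register width in bits (assumption)
MASK = (1 << WIDTH) - 1
N = 6                         # loop bound of the toy program
MAX_STEPS = 500               # step limit used to detect hangs

# Hypothetical toy program: out = (0*0 + 1*1 + ... + (N-1)*(N-1)) & 0xFF
PROGRAM = [
    ("li",   "acc", 0,     None),   # 0: acc = 0
    ("li",   "i",   0,     None),   # 1: i = 0
    ("mul",  "t",   "i",   "i"),    # 2: t = i * i
    ("add",  "acc", "acc", "t"),    # 3: acc = acc + t
    ("addi", "i",   "i",   1),      # 4: i = i + 1
    ("blt",  None,  "i",   2),      # 5: if i < N: goto 2
    ("andi", "out", "acc", 0xFF),   # 6: out = acc & 0xFF
]

def run(fault=None):
    """Execute PROGRAM. fault=(pc, occurrence, bit) flips one bit of the value
    written by that dynamic instance. Returns (output, execution counts)."""
    regs = {"acc": 0, "i": 0, "t": 0, "out": 0}
    counts = Counter()
    pc = steps = 0
    while pc < len(PROGRAM):
        steps += 1
        if steps > MAX_STEPS:
            raise TimeoutError("step limit exceeded")
        op, dst, a, b = PROGRAM[pc]
        occurrence = counts[pc]
        counts[pc] += 1
        if op == "blt":
            pc = b if regs[a] < N else pc + 1
            continue
        if op == "li":
            value = a
        elif op == "mul":
            value = regs[a] * regs[b]
        elif op == "add":
            value = regs[a] + regs[b]
        elif op == "addi":
            value = regs[a] + b
        else:  # "andi"
            value = regs[a] & b
        value &= MASK
        if fault is not None and fault[:2] == (pc, occurrence):
            value ^= 1 << fault[2]          # inject the single-bit flip
        regs[dst] = value
        pc += 1
    return regs["out"], counts

def classify(fault, golden):
    """Compare a faulty run against the fault-free (golden) output."""
    try:
        out, _ = run(fault)
    except TimeoutError:
        return "hang"
    return "masked" if out == golden else "sdc"

def sdc_rates():
    """Exhaustively inject into every dynamic instance and bit position and
    return the SDC rate per static instruction that writes a register."""
    golden, counts = run()
    rates = {}
    for pc, (_op, dst, _a, _b) in enumerate(PROGRAM):
        if dst is None:
            continue  # branches write no register in this simple fault model
        outcomes = [classify((pc, occ, bit), golden)
                    for occ in range(counts[pc]) for bit in range(WIDTH)]
        rates[pc] = outcomes.count("sdc") / len(outcomes)
    return rates

def select_for_protection(rates, budget=0.45):
    """Pick the most SDC-prone static instructions within a budget expressed
    as a fraction of all static instructions."""
    k = max(1, int(budget * len(PROGRAM)))
    ranked = sorted(rates, key=rates.get, reverse=True)
    return sorted(ranked[:k])

if __name__ == "__main__":
    rates = sdc_rates()
    for pc, rate in rates.items():
        print(pc, PROGRAM[pc][0], f"{rate:.2f}")
    print("protect:", select_for_protection(rates))
```

In this toy model, flips in the upper bits of `acc` and `t` are masked by the final `& 0xFF`, so those instructions show lower SDC rates than the instruction that writes the output directly. That difference is what lets a selective scheme leave some instructions unprotected.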


Updated: 2024-04-06
