Systemizing Interprocedural Static Analysis of Large-scale Systems Code with Graspan, ACM Transactions on Computer Systems



Systemizing Interprocedural Static Analysis of Large-scale Systems Code with Graspan
ACM Transactions on Computer Systems (IF 1.5), Pub Date: 2021-07-29, DOI: 10.1145/3466820
Zhiqiang Zuo₁, Kai Wang₂, Aftab Hussain₃, Ardalan Amiri Sani₃, Yiyu Zhang₁, Shenming Lu₁, Wensheng Dou₄, Linzhang Wang₅, Xuandong Li₅, Chenxi Wang₂, Guoqing Harry Xu₂


There is more than a decade-long history of using static analysis to find bugs in systems such as Linux. Most of the existing static analyses developed for these systems are simple checkers that find bugs based on pattern matching. Despite the presence of many sophisticated interprocedural analyses, few of them have been employed to improve checkers for systems code due to their complex implementations and poor scalability. In this article, we revisit the scalability problem of interprocedural static analysis from a “Big Data” perspective. That is, we turn sophisticated code analysis into Big Data analytics and leverage novel data processing techniques to solve this traditional programming language problem. We propose Graspan, a disk-based parallel graph system that uses an edge-pair centric computation model to compute dynamic transitive closures on very large program graphs. We develop two backends for Graspan, namely, Graspan-C running on CPUs and Graspan-G on GPUs, and present their designs in the article. Graspan-C can analyze large-scale systems code on any commodity PC, while, if GPUs are available, Graspan-G can be readily used to achieve orders of magnitude speedup by harnessing a GPU’s massive parallelism. We have implemented fully context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases written in multiple languages, such as Linux and Apache Hadoop, demonstrates that their Graspan implementations are language-independent, scale to millions of lines of code, and are much simpler than their original implementations. Moreover, we show that these analyses can be used to uncover many real-world bugs in large-scale systems code.
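
To make the idea of an edge-pair centric, grammar-guided transitive closure more concrete, here is a minimal sketch. It is not Graspan's implementation: it runs in memory on a single thread, with none of the disk-based partitioning or GPU parallelism the abstract describes. The function name `edge_pair_closure` and the dictionary encoding of grammar rules are assumptions made for illustration only. The sketch repeatedly joins pairs of adjacent labeled edges and adds a new edge whenever a production matches the two labels, until no new edges appear.

```python
from collections import defaultdict


def edge_pair_closure(edges, binary_rules, unary_rules=None):
    """Compute a grammar-guided transitive closure over labeled edges.

    edges:        iterable of (src, label, dst) tuples.
    binary_rules: dict mapping (label1, label2) -> set of result labels,
                  i.e. a production  result ::= label1 label2.
    unary_rules:  dict mapping label -> set of result labels,
                  i.e. a production  result ::= label.
    Returns the set of all edges reachable by applying the rules.
    (Hypothetical encoding, chosen for this sketch.)
    """
    unary_rules = unary_rules or {}
    out_edges = defaultdict(set)  # src -> {(label, dst)}
    in_edges = defaultdict(set)   # dst -> {(label, src)}
    closed = set()
    worklist = list(edges)

    while worklist:
        edge = worklist.pop()
        if edge in closed:
            continue
        closed.add(edge)
        src, label, dst = edge
        out_edges[src].add((label, dst))
        in_edges[dst].add((label, src))

        # Unary productions relabel the edge itself.
        for new_label in unary_rules.get(label, ()):
            worklist.append((src, new_label, dst))

        # The new edge is the first of a pair: src -label-> dst -l2-> nxt.
        for l2, nxt in list(out_edges[dst]):
            for new_label in binary_rules.get((label, l2), ()):
                worklist.append((src, new_label, nxt))

        # The new edge is the second of a pair: prev -l1-> src -label-> dst.
        for l1, prev in list(in_edges[src]):
            for new_label in binary_rules.get((l1, label), ()):
                worklist.append((prev, new_label, dst))

    return closed


# Plain transitive closure: E ::= E E over the chain a -> b -> c -> d.
chain = [("a", "E", "b"), ("b", "E", "c"), ("c", "E", "d")]
closure = edge_pair_closure(chain, {("E", "E"): {"E"}})
print(sorted(closure))

# A dataflow-style grammar: N ::= n | N n (value flow through assignments).
flow = edge_pair_closure(
    [("x", "n", "y"), ("y", "n", "z")],
    binary_rules={("N", "n"): {"N"}},
    unary_rules={"n": {"N"}},
)
print(("x", "N", "z") in flow)
```

Encoding an analysis as a grammar over edge labels is what lets the same closure routine serve different analyses: in this sketch, only the rule dictionaries change between the two examples.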


Updated: 2021-07-29
