
Effective Detection of Sleep-in-atomic-context Bugs in the Linux Kernel

Authors:
Jia-Ju Bai, Tsinghua University, Beijing, China
Julia Lawall, Sorbonne University/Inria/LIP6, Paris, France (ORCID: 0000-0001-7507-6542)
Shi-Min Hu, Tsinghua University, Beijing, China (ORCID: 0000-0001-7507-6542)

ACM Transactions on Computer Systems, Volume 36, Issue 4, Article No. 10, pp. 1–30. https://doi.org/10.1145/3381990

Published: 17 April 2020

Abstract

Atomic context is an execution state of the Linux kernel in which kernel code monopolizes a CPU core. In this state, the Linux kernel may only perform operations that cannot sleep, as otherwise a system hang or crash may occur. We refer to this kind of concurrency bug as a sleep-in-atomic-context (SAC) bug. In practice, SAC bugs are hard to find, as they do not cause problems in all executions.
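The point that SAC bugs do not show up in every execution can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `Cpu`, `update_device`, and `SleepInAtomicError` are hypothetical names invented here, and the check loosely mirrors the idea behind the kernel's `might_sleep()` annotation rather than reproducing any kernel code.

```python
# Toy user-space model of atomic context; not kernel code.
# All names below are hypothetical and exist only for illustration.

class SleepInAtomicError(RuntimeError):
    """Raised when a possibly-sleeping operation runs in atomic context."""


class Cpu:
    def __init__(self):
        # A depth above zero models "a spinlock is held", i.e. atomic context.
        self.atomic_depth = 0

    def spin_lock(self):
        self.atomic_depth += 1

    def spin_unlock(self):
        self.atomic_depth -= 1

    def might_sleep(self, what):
        # Runtime check: only fires if this line is actually executed.
        if self.atomic_depth > 0:
            raise SleepInAtomicError(f"{what} may sleep in atomic context")


def update_device(cpu, need_realloc):
    """Hypothetical driver routine that sleeps only on a rare path."""
    cpu.spin_lock()
    try:
        if need_realloc:
            # Rare path: a blocking allocation while the lock is held.
            cpu.might_sleep("blocking allocation")
        return "updated"
    finally:
        cpu.spin_unlock()
```

Calling `update_device(Cpu(), False)` completes normally, while `update_device(Cpu(), True)` raises. A test run that never takes the rare path would not reveal the bug, which is one reason a static approach is attractive here.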

In this article, we propose a practical static approach named DSAC to effectively detect SAC bugs in the Linux kernel. DSAC uses three key techniques: (1) a summary-based analysis to identify the code that may be executed in atomic context, (2) a connection-based alias analysis to identify the set of functions referenced by a function pointer, and (3) a path-check method to filter out repeated reports and false bugs. We evaluate DSAC on Linux 4.17 and find 1,159 SAC bugs. We manually check all the bugs and find that 1,068 bugs are real. We have randomly selected 300 of the real bugs and sent them to kernel developers. Of these, 220 have been confirmed, and 51 of our patches, fixing 115 bugs, have been applied.
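The general idea of finding code that may run in atomic context and then checking whether it can reach a sleeping operation can be sketched on a toy call graph. This is not DSAC's algorithm: the graph, the function names, and the `find_sac_candidates` helper are all assumptions made for illustration, and the sketch ignores function pointers, locks acquired mid-function, and path checking.

```python
from collections import deque

# Hypothetical call graph (caller -> callees); every name is invented.
CALL_GRAPH = {
    "drv_irq_handler": ["drv_update"],
    "drv_update": ["drv_alloc_buf", "drv_write_reg"],
    "drv_alloc_buf": ["blocking_alloc"],
    "drv_write_reg": [],
    "drv_probe": ["blocking_alloc"],
}
ATOMIC_ENTRIES = {"drv_irq_handler"}  # assumed to run in atomic context
MAY_SLEEP = {"blocking_alloc"}        # assumed to be able to sleep


def find_sac_candidates(call_graph, atomic_entries, may_sleep):
    """Return call paths from an atomic entry to a function that may sleep."""
    reports = []
    visited = set()
    queue = deque((entry, (entry,)) for entry in sorted(atomic_entries))
    while queue:
        func, path = queue.popleft()
        if func in visited:
            continue
        # Analyze each function once; a crude stand-in for reusing summaries.
        visited.add(func)
        for callee in call_graph.get(func, []):
            if callee in may_sleep:
                reports.append(path + (callee,))
            else:
                queue.append((callee, path + (callee,)))
    return reports


for path in find_sac_candidates(CALL_GRAPH, ATOMIC_ENTRIES, MAY_SLEEP):
    print(" -> ".join(path))
```

In this toy input, `drv_probe` also calls the sleeping function, but it is not reported because it is not reachable from an atomic entry point. A real analysis would additionally need to resolve indirect calls and discard infeasible paths, which is what the abstract's second and third techniques address.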


Index Terms

1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Reliability
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems


Published in
ACM Transactions on Computer Systems Volume 36, Issue 4
Section: Best of ATC 2019 and Regular Paper
November 2018
115 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/3394910
Editor:
Michael Swift
University of Wisconsin, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 April 2020
- Accepted: 1 December 2019
- Revised: 1 September 2019
- Received: 1 October 2018

Author Tags
Static analysis
atomic context
bug detection
operating system
Qualifiers
- research-article
- Research
- Refereed