research-article

Effective Detection of Sleep-in-atomic-context Bugs in the Linux Kernel

Published: 17 April 2020

Abstract

Atomic context is an execution state of the Linux kernel in which kernel code monopolizes a CPU core. In this state, the Linux kernel may only perform operations that cannot sleep, as otherwise a system hang or crash may occur. We refer to this kind of concurrency bug as a sleep-in-atomic-context (SAC) bug. In practice, SAC bugs are hard to find, as they do not cause problems in all executions.
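To make the definition concrete, here is a minimal user-space sketch in Python; it is not kernel code. It models atomic context as a nesting counter raised by a spinlock, and a sleeping operation that checks the counter, loosely in the spirit of the kernel's `might_sleep()` debugging check. All names in the sketch (`Cpu`, `spin_lock`, `sleeping_alloc`, `handler`, `SleepInAtomicContext`) are hypothetical and chosen only for illustration.

```python
class SleepInAtomicContext(Exception):
    """Raised when a sleeping operation runs while the CPU is in atomic context."""


class Cpu:
    """Toy model of one CPU core; atomic_depth > 0 means atomic context."""

    def __init__(self):
        self.atomic_depth = 0

    def spin_lock(self):
        # Holding a spinlock is one way to enter atomic context.
        self.atomic_depth += 1

    def spin_unlock(self):
        self.atomic_depth -= 1

    def in_atomic(self):
        return self.atomic_depth > 0

    def sleeping_alloc(self):
        # Stands in for any operation that may sleep.
        if self.in_atomic():
            raise SleepInAtomicContext("sleeping operation called in atomic context")
        return "buffer"


def handler(cpu, rare_condition):
    """The sleeping call happens only on a rarely taken branch."""
    cpu.spin_lock()
    try:
        if rare_condition:
            return cpu.sleeping_alloc()  # SAC bug: may sleep while the lock is held
        return None
    finally:
        cpu.spin_unlock()
```

Calling `handler(Cpu(), False)` completes normally, while `handler(Cpu(), True)` raises `SleepInAtomicContext`. This mirrors the point that a SAC bug does not cause problems in every execution, which is why testing alone tends to miss it.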

In this article, we propose a practical static approach named DSAC to effectively detect SAC bugs in the Linux kernel. DSAC uses three key techniques: (1) a summary-based analysis to identify the code that may be executed in atomic context, (2) a connection-based alias analysis to identify the set of functions referenced by a function pointer, and (3) a path-check method to filter out repeated reports and false bugs. We evaluate DSAC on Linux 4.17, where it reports 1,159 SAC bugs. Manually checking all of them, we find that 1,068 are real. We randomly selected 300 of the real bugs and sent them to kernel developers; 220 of these have been confirmed, and 51 of our patches, fixing 115 bugs, have been applied.
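The abstract only names the three techniques, so the following Python sketch is not DSAC's algorithm. It is a heavily simplified illustration of the general idea, under several assumptions: each function is a straight-line list of operations (no branches, so there is no path checking), a hand-written map from function pointers to targets stands in for the alias analysis, and a set stands in for filtering repeated reports. The operation encoding and the names `may_sleep_summaries`, `find_sac_candidates`, and `pointer_targets` are hypothetical.

```python
SLEEPING_PRIMITIVES = {"msleep", "mutex_lock"}  # assumed seed set of functions that may sleep


def callees(op, pointer_targets):
    """Return the functions an operation may call; indirect calls use the pointer map."""
    kind = op[0]
    if kind == "call":
        return {op[1]}
    if kind == "icall":
        return set(pointer_targets.get(op[1], ()))
    return set()


def may_sleep_summaries(program, pointer_targets):
    """Fixed-point computation of which functions may (transitively) sleep."""
    sleepy = set(SLEEPING_PRIMITIVES)
    changed = True
    while changed:
        changed = False
        for name, ops in program.items():
            if name in sleepy:
                continue
            if any(callees(op, pointer_targets) & sleepy for op in ops):
                sleepy.add(name)
                changed = True
    return sleepy


def find_sac_candidates(program, pointer_targets):
    """Report (caller, callee) pairs where a may-sleep function is called under a lock."""
    sleepy = may_sleep_summaries(program, pointer_targets)
    reports = set()  # a set drops repeated reports of the same pair
    for name, ops in program.items():
        depth = 0
        for op in ops:
            if op[0] == "lock":
                depth += 1
            elif op[0] == "unlock":
                depth = max(0, depth - 1)
            elif depth > 0:
                for target in callees(op, pointer_targets) & sleepy:
                    reports.add((name, target))
    return sorted(reports)
```

In this encoding, a function that takes a lock and then calls a helper whose indirect call may reach a sleeping function is reported, while the same sleeping call made after the unlock is not. The per-function summary is what lets the check at the call site see through the helper without re-analyzing its body.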


        • Published in

          ACM Transactions on Computer Systems, Volume 36, Issue 4
          Section: Best of ATC 2019 and Regular Paper
          November 2018
          115 pages
          ISSN: 0734-2071
          EISSN: 1557-7333
          DOI: 10.1145/3394910

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 April 2020
          • Accepted: 1 December 2019
          • Revised: 1 September 2019
          • Received: 1 October 2018

          Qualifiers

          • research-article
          • Research
          • Refereed
