research-article

Test Generation Strategies for Building Failure Models and Explaining Spurious Failures

Authors:
Baharin A. Jodat, University of Ottawa, Ottawa, Canada (ORCID: 0009-0006-0110-8488)
Abhishek Chandar, University of Ottawa, Ottawa, Canada (ORCID: 0009-0006-5089-4059)
Shiva Nejati, University of Ottawa, Ottawa, Canada (ORCID: 0000-0002-0281-8231)
Mehrdad Sabetzadeh, University of Ottawa, Ottawa, Canada (ORCID: 0000-0002-4711-8319)

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 4, Article No. 93, pp. 1–32. https://doi.org/10.1145/3638246

Published: 17 April 2024

Abstract

Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic. Failures resulting from invalid or unrealistic test inputs are spurious. Avoiding spurious failures improves the effectiveness of testing in exercising the main functions of a system, particularly for compute-intensive (CI) systems where a single test execution takes significant time. In this article, we propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures. We examine two alternative strategies for building failure models: (1) machine learning (ML)-guided test generation and (2) surrogate-assisted test generation. ML-guided test generation infers boundary regions that separate passing and failing test inputs and samples test inputs from those regions. Surrogate-assisted test generation relies on surrogate models to predict labels for test inputs instead of executing every input. We propose a novel surrogate-assisted algorithm that uses multiple surrogate models simultaneously, and dynamically selects the prediction from the most accurate model. We empirically evaluate the accuracy of failure models inferred based on surrogate-assisted and ML-guided test generation algorithms. Using case studies from the domains of cyber-physical systems and networks, we show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%, significantly outperforming ML-guided test generation and two baselines. Further, our approach learns failure-inducing rules that identify genuine spurious failures as validated against domain knowledge.
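The idea of labelling test inputs with several surrogates and dynamically trusting the most accurate one can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' algorithm: the three surrogates (1-nearest-neighbour, k-nearest-neighbour mean, inverse-distance weighting) are hypothetical stand-ins, "most accurate" is approximated by leave-one-out mean absolute error on inputs already executed, a fitness value of at least zero is assumed to mean "pass", and a prediction is trusted only when its magnitude exceeds the selected surrogate's error. Otherwise the hypothetical `run_system` callback executes the real (expensive) system.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, float]  # (test input, fitness value)
Surrogate = Callable[[List[Sample], Point], float]


def nearest_neighbour(data: List[Sample], x: Point) -> float:
    """1-NN stand-in surrogate: fitness of the closest executed input."""
    return min(data, key=lambda s: math.dist(s[0], x))[1]


def knn_mean(data: List[Sample], x: Point, k: int = 3) -> float:
    """k-NN stand-in surrogate: mean fitness of the k closest executed inputs."""
    closest = sorted(data, key=lambda s: math.dist(s[0], x))[:k]
    return sum(y for _, y in closest) / len(closest)


def inverse_distance(data: List[Sample], x: Point, p: float = 2.0) -> float:
    """Inverse-distance-weighting stand-in surrogate."""
    num = den = 0.0
    for xi, yi in data:
        d = math.dist(xi, x)
        if d == 0.0:
            return yi  # exact match: reuse the executed result
        w = 1.0 / d ** p
        num += w * yi
        den += w
    return num / den


SURROGATES: List[Surrogate] = [nearest_neighbour, knn_mean, inverse_distance]


def loo_error(model: Surrogate, data: List[Sample]) -> float:
    """Leave-one-out mean absolute error of a surrogate on executed inputs."""
    errs = [abs(model(data[:i] + data[i + 1:], x) - y)
            for i, (x, y) in enumerate(data)]
    return sum(errs) / len(errs)


def label_inputs(candidates: Sequence[Point],
                 run_system: Callable[[Point], float],
                 seed_data: List[Sample]) -> Tuple[List[Tuple[Point, str]], int]:
    """Label each candidate 'pass' or 'fail', executing the system only when the
    currently most accurate surrogate is not confident. Returns the labelled
    dataset and the number of real executions."""
    if len(seed_data) < 2:
        raise ValueError("need at least two executed inputs to estimate error")
    executed = list(seed_data)
    labelled: List[Tuple[Point, str]] = []
    runs = 0
    for x in candidates:
        # Re-select the surrogate with the lowest error for every prediction.
        err, best = min(((loo_error(m, executed), m) for m in SURROGATES),
                        key=lambda pair: pair[0])
        pred = best(executed, x)
        if abs(pred) > err:  # pred +/- err stays on one side of 0: trust it
            y = pred
        else:  # too close to the pass/fail boundary: run the real system
            y = run_system(x)
            executed.append((x, y))
            runs += 1
        labelled.append((x, "pass" if y >= 0 else "fail"))  # assumed: >= 0 passes
    return labelled, runs


# Usage with a toy, hypothetical system that passes iff x[0] <= 0.5.
def toy_system(x: Point) -> float:
    return 0.5 - x[0]


rng = random.Random(0)
seed = [((v,), toy_system((v,))) for v in (0.0, 0.25, 0.75, 1.0)]
candidates = [(0.0,), (1.0,)] + [(rng.random(),) for _ in range(40)]
dataset, runs = label_inputs(candidates, toy_system, seed)
print(f"{len(dataset)} inputs labelled with {runs} real executions")
```

Because every real execution is appended to the training data, the error estimates, and hence the choice of surrogate, can change from one prediction to the next. The resulting (input, label) dataset is the kind of input a rule learner, such as a decision tree, would then consume to infer interpretable failure-inducing rules.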


Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
    2. Machine learning approaches
      1. Rule learning
2. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering
    2. Software verification and validation
      1. Empirical software validation


Published in
ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 4
May 2024
940 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3613665
Editor: Mauro Pezzè, USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 April 2024
- Online AM: 21 December 2023
- Accepted: 8 December 2023
- Revised: 1 November 2023
- Received: 15 June 2023
Author Tags
Search-based testing
machine learning
surrogate models
failure models
test-input validity
spurious failures

Article Metrics
- 0 total citations (this publication has not been cited yet)
- 133 total downloads
- 133 downloads in the last 12 months
- 37 downloads in the last 6 weeks
