Abstract
Ensuring the safety of autonomous vehicles (AVs) is of utmost importance, and testing them in simulated environments is a safer option than conducting in-field operational tests. However, generating an exhaustive test suite to identify critical test scenarios is computationally expensive, as the representation of each test is complex and contains various dynamic and static features, such as the AV under test, road participants (vehicles, pedestrians, and static obstacles), environmental factors (weather and light), and the road’s structural features (lanes, turns, road speed, etc.). In this article, we present a systematic technique that uses Instance Space Analysis (ISA) to identify the significant features of test scenarios that affect their ability to reveal the unsafe behaviour of AVs. ISA identifies the features that best differentiate safety-critical scenarios from normal driving and visualises the impact of these features on test scenario outcomes (safe/unsafe) in two dimensions. This visualisation helps to identify untested regions of the instance space and provides an indicator of the quality of the test suite in terms of the percentage of feature space covered by testing. To test the predictive ability of the identified features, we train five Machine Learning classifiers to classify test scenarios as safe or unsafe. The high precision, recall, and F1 scores indicate that our proposed approach is effective in predicting the outcome of a test scenario without executing it and can be used for test generation, selection, and prioritisation.
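The precision, recall, and F1 scores mentioned above can be made concrete with a minimal sketch. It is illustrative only and is not the evaluation code used in this article: the function name `precision_recall_f1`, the label strings, and the six example outcomes are hypothetical, and treating "unsafe" as the positive class is an assumption.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "unsafe",  # assumption: "unsafe" is the class of interest
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: unsafe scenarios predicted as unsafe.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: safe scenarios predicted as unsafe.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: unsafe scenarios predicted as safe.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical outcomes for six test scenarios (not data from the article).
actual = ["unsafe", "safe", "unsafe", "safe", "unsafe", "safe"]
predicted = ["unsafe", "safe", "safe", "safe", "unsafe", "unsafe"]

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this setting, recall on the unsafe class is arguably the more safety-relevant score, because a false negative is an unsafe scenario that the classifier would wrongly allow to be skipped.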