Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow, Empirical Software Engineering


Empirical Software Engineering (IF 4.1), Pub Date: 2023-11-29, DOI: 10.1007/s10664-023-10389-6
Florian Tambon, Amin Nikanjam, Le An, Foutse Khomh, Giuliano Antoniol

Abstract

Deep Learning (DL) frameworks are now widely used, simplifying the creation of complex models as well as their integration into various applications, even among non-DL experts. However, like any other program, they are prone to bugs. This paper deals with the subcategory of bugs named silent bugs: they lead to wrong behavior but do not cause system crashes or hangs, nor do they show an error message to the user. Such bugs are even more dangerous in DL applications and frameworks due to the “black-box” and stochastic nature of DL systems (i.e., the end user cannot understand how the model makes decisions). This paper presents the first empirical study of silent bugs in TensorFlow, specifically its high-level API Keras, and their impact on users’ programs. We extracted closed issues related to the Keras API from the TensorFlow GitHub repository. Out of the 1,168 issues that we gathered, 77 were reproducible silent bugs affecting users’ programs. We categorized the bugs based on their effects on users’ programs and the components where the issues occurred, using information from the issue reports. We then derived a threat level for each issue, based on the impact it had on users’ programs. To assess the relevance of the identified categories and the impact scale, we conducted an online survey with 103 DL developers. The participants generally agreed that silent bugs in DL frameworks have a significant impact on users, and they acknowledged our findings (i.e., the categories of silent bugs and the proposed impact scale).
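To make the definition of a silent bug concrete, the sketch below shows a toy case in plain Python. It is a hypothetical illustration only: it is not taken from the paper, and it does not reproduce any actual Keras or TensorFlow issue. The function names `mean_silent` and `mean_correct` are invented for this example. The buggy function accepts a `weights` argument but never uses it, so the call succeeds, no error is shown, and the result is simply wrong.

```python
# Hypothetical toy example, not taken from the paper or from Keras/TensorFlow.

def mean_silent(values, weights=None):
    """Weighted mean with a silent bug: `weights` is accepted but never used."""
    return sum(values) / len(values)


def mean_correct(values, weights=None):
    """Weighted mean that honors `weights` when they are given."""
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


values = [1.0, 2.0, 3.0]
weights = [0.0, 0.0, 1.0]

silent_result = mean_silent(values, weights)    # runs without any error
correct_result = mean_correct(values, weights)  # trusted reference result

# Comparing against a trusted reference is one way to expose the bug,
# since nothing in the buggy call itself signals a problem.
bug_detected = abs(silent_result - correct_result) > 1e-9
print(silent_result, correct_result, bug_detected)
```

Because nothing crashes, hangs, or raises, the wrong value would propagate unnoticed unless it is checked against an independent reference, which matches the abstract's description of why such bugs are hard to spot.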



Updated: 2023-12-02
