Paper Title
Bugs in Machine Learning-based Systems: A Faultload Benchmark
Paper Authors
Paper Abstract
The rapid growth in applying Machine Learning (ML) across various domains has drawn increasing attention to the quality of ML components. This has in turn led to a growth of techniques and tools aimed at improving the quality of ML components and integrating them safely into ML-based systems. Although most of these tools rely on bugs' lifecycle, there is no standard benchmark of bugs to assess their performance, compare them, and discuss their strengths and weaknesses. In this study, we first investigate the reproducibility and verifiability of bugs in ML-based systems and identify the most important factors for each. Then, we explore the challenges of generating a benchmark of bugs for ML-based software systems and provide a bug benchmark, named defect4ML, that satisfies all criteria of a standard benchmark: relevance, reproducibility, fairness, verifiability, and usability. This faultload benchmark contains 100 bugs reported by ML developers on GitHub and Stack Overflow, using two of the most popular ML frameworks: TensorFlow and Keras. defect4ML also addresses important challenges in Software Reliability Engineering of ML-based software systems, namely: 1) fast changes in frameworks, by providing bugs for different versions of the frameworks; 2) code portability, by delivering similar bugs in different ML frameworks; 3) bug reproducibility, by providing fully reproducible bugs with complete information about required dependencies and data; and 4) lack of detailed information on bugs, by presenting links to the bugs' origins. defect4ML can be of interest to ML-based systems practitioners and researchers for assessing their testing tools and techniques.
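As a rough illustration of what a reproducible faultload entry involves, the sketch below models a single bug record carrying a pinned framework version, exact dependencies, and a link back to the bug's origin, echoing the abstract's points on reproducibility and bug provenance. The BugEntry class and all field names are hypothetical assumptions for illustration only, not defect4ML's actual schema or API.

```python
# A minimal sketch of a reproducible faultload entry, assuming a record needs:
# a pinned framework version, exact dependencies, and a link to the bug's origin.
# All names here are illustrative, not the benchmark's real data model.
from dataclasses import dataclass


@dataclass
class BugEntry:
    bug_id: str                # stable identifier inside the benchmark
    framework: str             # e.g. "TensorFlow" or "Keras"
    framework_version: str     # pinned version, so the failure reproduces
    origin_url: str            # link back to the GitHub issue / Stack Overflow post
    dependencies: list[str]    # exact package versions needed to reproduce
    buggy_snippet: str         # minimal code that triggers the failure
    fixed_snippet: str         # corresponding corrected code


# Example usage with placeholder values (not a real defect4ML entry).
entry = BugEntry(
    bug_id="keras-example-001",
    framework="Keras",
    framework_version="2.3.1",
    origin_url="https://stackoverflow.com/q/00000000",  # placeholder URL
    dependencies=["tensorflow==2.1.0", "numpy==1.18.1"],
    buggy_snippet="model.compile(loss='categorical_crossentropy', optimizer='adam')",
    fixed_snippet="model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')",
)
print(entry.bug_id, entry.framework, entry.framework_version)
```

A structure along these lines would let a testing tool check out one bug at a time, install the pinned dependencies, and verify both the failing and the fixed behavior, which is what the reproducibility and verifiability criteria above demand.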