Paper title
What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness
Paper authors
Abstract
Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers' time and break their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts have proposed strategies and tools to detect and isolate flaky tests. However, flaky tests are rarely fixed, as developers struggle to localise and understand their causes. Additionally, developers working with large codebases often need to know the sources of non-determinism to preserve code quality, i.e., to avoid introducing technical debt linked with non-deterministic behaviour, and to avoid introducing new flaky tests. To aid with these tasks, we propose re-targeting Fault Localisation techniques to the flaky component localisation problem, i.e., pinpointing program classes that cause the non-deterministic behaviour of flaky tests. In particular, we employ Spectrum-Based Fault Localisation (SBFL), a coverage-based fault localisation technique commonly adopted for its simplicity and effectiveness. We also utilise other data sources, such as change history and static code metrics, to further improve the localisation. Our results show that augmenting SBFL with change and code metrics ranks flaky classes in the top-1 and top-5 suggestions in 26% and 47% of the cases, respectively. Overall, we successfully reduced the average number of classes inspected to locate the first flaky class to 19% of the total number of classes covered by flaky tests. Our results also show that localisation methods are effective in major flakiness categories, such as concurrency and asynchronous waits, indicating their general ability to identify flaky components.
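To make the SBFL idea concrete, the following is a minimal sketch of coverage-based ranking at class level. It rests on assumptions that the abstract does not state: flaky tests play the role that failing tests play in classic SBFL, stable tests play the role of passing tests, and suspiciousness is computed with the Ochiai formula, a commonly used SBFL formula that may differ from the one the paper uses. The function name, test names, and class names are hypothetical, and the change-history and code-metric signals mentioned above are not modelled.

```python
import math


def ochiai_ranking(coverage, flaky_tests):
    """Rank classes by Ochiai suspiciousness (illustrative sketch only).

    coverage:    dict mapping a test name to the set of class names it covers.
    flaky_tests: set of test names treated as "failing" in the SBFL sense
                 (an assumption of this sketch, not a detail from the abstract).
    """
    total_flaky = sum(1 for test in coverage if test in flaky_tests)
    classes = set().union(*coverage.values()) if coverage else set()

    scores = {}
    for cls in classes:
        # ef: flaky tests that cover the class; ep: stable tests that cover it.
        ef = sum(1 for t, cov in coverage.items() if t in flaky_tests and cls in cov)
        ep = sum(1 for t, cov in coverage.items() if t not in flaky_tests and cls in cov)
        denom = math.sqrt(total_flaky * (ef + ep))
        scores[cls] = ef / denom if denom > 0 else 0.0

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical coverage data: two flaky tests and two stable tests.
coverage = {
    "test_a": {"Scheduler", "Cache"},
    "test_b": {"Scheduler", "Parser"},
    "test_c": {"Parser", "Cache"},
    "test_d": {"Parser"},
}
ranking = ochiai_ranking(coverage, flaky_tests={"test_a", "test_b"})
for cls, score in ranking:
    print(f"{cls}: {score:.3f}")
```

In this toy data, Scheduler is covered by both flaky tests and by no stable test, so it is ranked first. A developer following the ranking would inspect it before Cache and Parser, which is the kind of inspection effort the top-1 and top-5 figures measure.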