Title

Unit Test Case Generation with Transformers and Focal Context

Authors

Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, Neel Sundaresan

Abstract

Automated unit test case generation tools facilitate test-driven development and support developers by suggesting tests intended to identify flaws in their code. Existing approaches are usually guided by test coverage criteria, generating synthetic test cases that are often difficult for developers to read or understand. In this paper, we propose AthenaTest, an approach that aims to generate unit test cases by learning from real-world focal methods and developer-written test cases. We formulate unit test case generation as a sequence-to-sequence learning task, adopting a two-step training procedure consisting of denoising pretraining on a large unsupervised Java corpus and supervised finetuning on the downstream translation task of generating unit tests. We investigate the impact of natural language and source code pretraining, as well as of the focal context information surrounding the focal method. Both techniques provide improvements in terms of validation loss, with pretraining yielding a 25% relative improvement and focal context providing an additional 11.1% improvement. We also introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java, which comprises 780K test cases mined from 91K open-source repositories on GitHub. We evaluate AthenaTest on five Defects4J projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts. We execute the test cases, collect test coverage information, and compare them with test cases generated by EvoSuite and GPT-3, finding that our approach outperforms GPT-3 and achieves coverage comparable to EvoSuite. Finally, we survey professional developers on their preferences in terms of readability, understandability, and testing effectiveness of the generated tests, showing an overwhelming preference for AthenaTest.
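
The two-step formulation in the abstract can be pictured with a short sketch. The snippet below is a minimal illustration, not the authors' released code: a generic BART-style encoder-decoder from the Hugging Face transformers library is finetuned to map a focal method plus its focal context to a developer-written test case. The "facebook/bart-base" checkpoint, the toy Calculator example, and the decoding settings are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the AthenaTest implementation) of the supervised
# finetuning step: an encoder-decoder Transformer maps a focal method plus its focal
# context to a developer-written unit test.
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Source: focal method followed by focal context (enclosing class name, other method
# signatures, fields), flattened into a single sequence. Toy example for illustration.
source = (
    "public int add(int a, int b) { return a + b; } "
    "class Calculator { int add(int a, int b); int sub(int a, int b); int value; }"
)
# Target: a developer-written JUnit test for the focal method.
target = "@Test public void testAdd() { assertEquals(5, new Calculator().add(2, 3)); }"

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=512).input_ids

# One finetuning step: standard sequence-to-sequence cross-entropy loss.
loss = model(**inputs, labels=labels).loss
loss.backward()

# At inference time, candidate test cases are decoded from the focal method + context.
candidates = model.generate(inputs.input_ids, max_length=128, num_beams=5)
print(tokenizer.decode(candidates[0], skip_special_tokens=True))
```

In the evaluation described above, multiple candidate tests are generated per focal method (up to 30 attempts) and then compiled and executed, keeping only the passing test cases.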
