Paper Title

Software Artifact Mining in Software Engineering Conferences: A Meta-Analysis

Authors

Zeinab Abou Khalil, Stefano Zacchiroli

Abstract

Background: Software development results in the production of various types of artifacts: source code, version control system metadata, bug reports, mailing list conversations, test data, etc. Empirical software engineering (ESE) has thrived mining those artifacts to uncover the inner workings of software development and improve its practices. But which artifacts are studied in the field is a moving target, which we study empirically in this paper.

Aims: We quantitatively characterize the most frequently mined and co-mined software artifacts in ESE research and the research purposes they support.

Method: We conduct a meta-analysis of artifact mining studies published in 11 top conferences in ESE, for a total of 9621 papers. We use natural language processing (NLP) techniques to characterize the types of software artifacts that are most often mined and their evolution over a 16-year period (2004-2020). We analyze the combinations of artifact types that are most often mined together, as well as the relationship between study purposes and mined artifacts.

Results: We find that: (1) mining happens in the vast majority of analyzed papers, (2) source code and test data are the most mined artifacts, (3) there is an increasing interest in mining novel artifacts, together with source code, (4) researchers are most interested in the evaluation of software systems and use all possible empirical signals to support that goal.
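The abstract does not spell out the NLP pipeline, so the following is only a minimal sketch of the general idea of counting mined and co-mined artifact types across paper texts. It assumes simple keyword matching; the keyword table `ARTIFACT_KEYWORDS` and the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists (an assumption for illustration only);
# the paper's actual artifact taxonomy and NLP method are not given here.
ARTIFACT_KEYWORDS = {
    "source code": ["source code"],
    "test data": ["test data", "test case"],
    "bug reports": ["bug report", "issue report"],
    "mailing lists": ["mailing list"],
    "vcs metadata": ["version control", "commit"],
}


def detect_artifacts(text):
    """Return the set of artifact types whose keywords occur in the text."""
    lowered = text.lower()
    return {
        name
        for name, keywords in ARTIFACT_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def count_mined_and_co_mined(papers):
    """Count how often each artifact type, and each pair of types, appears."""
    single = Counter()
    pairs = Counter()
    for text in papers:
        found = detect_artifacts(text)
        single.update(found)
        # Sort so that each unordered pair maps to one canonical key.
        pairs.update(combinations(sorted(found), 2))
    return single, pairs


if __name__ == "__main__":
    sample = [
        "We mine source code and bug reports.",
        "Commits and source code are analyzed.",
        "A survey of developers.",
    ]
    single, pairs = count_mined_and_co_mined(sample)
    print(single.most_common())
    print(pairs.most_common())
```

Substring matching is crude (it ignores negation, synonyms, and context), so a real study would likely need a more careful NLP approach; the sketch only shows how per-artifact and pairwise counts relate.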
