Improved Automatic Summarization of Subroutines via Attention to File Context

Paper Title

Improved Automatic Summarization of Subroutines via Attention to File Context

Authors

Sakib Haque, Alexander LeClair, Lingfei Wu, Collin McMillan

Abstract

Software documentation largely consists of short, natural language summaries of the subroutines in the software. These summaries help programmers quickly understand what a subroutine does without having to read the source code themselves. The task of writing these descriptions is called "source code summarization" and has been a target of research for several years. Recently, AI-based approaches have superseded older, heuristic-based approaches. Yet, to date, these AI-based approaches assume that all the content needed to predict summaries is inside the subroutine itself. This assumption limits performance because many subroutines cannot be understood without surrounding context. In this paper, we present an approach that models the file context of subroutines (i.e., other subroutines in the same file) and uses an attention mechanism to find words and concepts to use in summaries. We show in an experiment that our approach extends and improves several recent baselines.
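The idea of attending to file context can be illustrated with a minimal sketch. This is not the authors' model: it assumes plain dot-product attention, and the names `attend`, `query`, and `file_context`, along with the toy two-dimensional vectors, are hypothetical. Each vector in `file_context` stands in for an encoding of another subroutine in the same file, and `query` stands in for the encoding of the subroutine being summarized.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context_vectors):
    """Dot-product attention over a list of context vectors.

    Returns the attention weights and the weighted sum of the context
    vectors. This is a generic illustration, not the paper's architecture.
    """
    scores = [sum(q * c for q, c in zip(query, vec)) for vec in context_vectors]
    weights = softmax(scores)
    dim = len(query)
    summary = [
        sum(w * vec[i] for w, vec in zip(weights, context_vectors))
        for i in range(dim)
    ]
    return weights, summary


# Hypothetical toy encodings: one target subroutine, three neighbors in the file.
query = [1.0, 0.0]
file_context = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights, summary = attend(query, file_context)
print(weights)  # the neighbor most similar to the query gets the largest weight
print(summary)  # context vector that a decoder could consult when choosing words
```

In this sketch, the neighboring subroutine whose encoding is most similar to the target receives the largest weight, which is one simple way a model could pull words and concepts from the surrounding file.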
