A Transformer-based Approach for Source Code Summarization

Paper title

A Transformer-based Approach for Source Code Summarization

Authors

Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Abstract

Generating a readable summary that describes the functionality of a program is known as source code summarization. In this task, it is crucial to learn code representations by modeling the pairwise relationships between code tokens so as to capture their long-range dependencies. To learn code representations for summarization, we explore the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies. In this work, we show that although the approach is simple, it outperforms state-of-the-art techniques by a significant margin. We perform extensive analysis and ablation studies that reveal several important findings, e.g., absolute encoding of source code token positions hinders summarization performance, while relative encoding significantly improves it. We have made our code publicly available to facilitate future research.
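
The abstract contrasts absolute and relative position encoding inside self-attention. As a rough illustration only, the sketch below shows one common form of relative encoding, in which a learned embedding indexed by the clipped offset j - i is added to each key before scoring. This is not the authors' released code: the function names, the clipping distance `k`, the omission of learned query/key/value projections, and the omission of relative value embeddings are all simplifying assumptions made here.

```python
import math
import random


def clipped_relative_positions(n, k):
    """Return R with R[i][j] = clip(j - i, -k, k) + k, an index in [0, 2k]."""
    return [[max(-k, min(k, j - i)) + k for j in range(n)] for i in range(n)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relative_self_attention(x, rel_key_emb, k):
    """Single-head self-attention with relative position keys (hypothetical helper).

    x           : n token vectors of size d (projections omitted for brevity)
    rel_key_emb : 2k + 1 vectors of size d, one per clipped offset
    Score for the pair (i, j) is x_i . (x_j + a_{clip(j - i)}) / sqrt(d).
    """
    n, d = len(x), len(x[0])
    rel = clipped_relative_positions(n, k)
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            key = [xj + aj for xj, aj in zip(x[j], rel_key_emb[rel[i][j]])]
            scores.append(dot(x[i], key) / math.sqrt(d))
        weights = softmax(scores)
        # Output is a weighted average of the (unprojected) token vectors.
        out.append([sum(weights[j] * x[j][c] for j in range(n)) for c in range(d)])
    return out


random.seed(0)
d, k = 4, 2
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
rel_key_emb = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(2 * k + 1)]
attended = relative_self_attention(tokens, rel_key_emb, k)
```

Because the score depends only on the offset between two tokens, not on where either token sits in the sequence, shifting a code fragment to a different position leaves its pairwise attention scores unchanged. An absolute encoding, which adds a position vector to each token before attention, does not have this property.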
