Paper Title


A Structural Model for Contextual Code Changes

Authors

Shaked Brody, Uri Alon, Eran Yahav

Abstract


We address the problem of predicting edit completions based on a learned model that was trained on past edits. Given a code snippet that is partially edited, our goal is to predict a completion of the edit for the rest of the snippet. We refer to this task as the EditCompletion task and present a novel approach for tackling it. The main idea is to directly represent structural edits. This allows us to model the likelihood of the edit itself, rather than learning the likelihood of the edited code. We represent an edit operation as a path in the program's Abstract Syntax Tree (AST), originating from the source of the edit to the target of the edit. Using this representation, we present a powerful and lightweight neural model for the EditCompletion task. We conduct a thorough evaluation, comparing our approach to a variety of representation and modeling approaches that are driven by multiple strong models such as LSTMs, Transformers, and neural CRFs. Our experiments show that our model achieves a 28% relative gain over state-of-the-art sequential models and 2x higher accuracy than syntactic models that learn to generate the edited code, as opposed to modeling the edits directly. Our code, dataset, and trained models are publicly available at https://github.com/tech-srl/c3po/.
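The core idea of the abstract, representing an edit as a path between a source node and a target node in the program's AST, can be illustrated with a small sketch. This is not the paper's implementation (C3PO operates on C# ASTs with a richer edit-operation vocabulary); it only shows, using Python's built-in `ast` module, how such a path can be extracted: up from the source node to the lowest common ancestor, then down to the target node.

```python
import ast

def path_to(tree, pred):
    """Depth-first path of AST nodes from the root to the first node
    satisfying `pred`, or None if no such node exists."""
    def walk(node, path):
        path = path + [node]
        if pred(node):
            return path
        for child in ast.iter_child_nodes(node):
            found = walk(child, path)
            if found:
                return found
        return None
    return walk(tree, [])

def ast_path(tree, src_pred, tgt_pred):
    """AST path from the edit's source node to its target node:
    up to their lowest common ancestor, then down to the target,
    rendered as a sequence of node-type names."""
    src = path_to(tree, src_pred)
    tgt = path_to(tree, tgt_pred)
    # Length of the shared prefix (compared by node identity).
    i = 0
    while i < min(len(src), len(tgt)) and src[i] is tgt[i]:
        i += 1
    nodes = list(reversed(src[i:])) + tgt[i - 1:]
    return [type(n).__name__ for n in nodes]

tree = ast.parse("x = a + b")
path = ast_path(
    tree,
    lambda n: isinstance(n, ast.Name) and n.id == "a",  # edit source
    lambda n: isinstance(n, ast.Name) and n.id == "b",  # edit target
)
print(path)  # ['Name', 'BinOp', 'Name']
```

Here the path from `a` to `b` climbs one level to their common parent (the `BinOp` node) and descends to the target, giving a compact structural description of where an edit moves in the tree.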
