Deep API Learning Revisited

Paper Title

Deep API Learning Revisited

Authors

James Martin, Jin L. C. Guo

Abstract

Understanding correct API usage sequences is one of the most important tasks for programmers working with unfamiliar libraries. However, programmers often struggle to find the appropriate information because of poor-quality API documentation or ineffective query-based search strategies. To help solve this issue, researchers have proposed various methods that suggest a sequence of APIs given a natural language query representing the programmer's information need. Among such efforts, Gu et al. adopted a deep learning method, in particular an RNN Encoder-Decoder architecture, to perform this task and obtained promising results on common APIs in Java. In this work, we aim to reproduce their results and apply the same methods to APIs in Python. Additionally, we compare the performance with a more recent Transformer-based method, i.e., CodeBERT, on the same task. Our experiment reveals a clear drop in performance measures when careful data cleaning is performed. Owing to its pretraining on a large number of source code files and its effective encoding technique, CodeBERT outperforms the method by Gu et al. to a large extent.
