Paper Title

Generation-Augmented Query Expansion For Code Retrieval

Authors

Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen

Abstract

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing documentation-code pairs by embedding them into a latent space, without incorporating external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process of sketching an answer before searching, we use a powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that rather than merely retrieving the target code snippet according to the documentation query, it is helpful to augment the documentation query with its generated counterpart: code snippets produced by the code generation model. To the best of our knowledge, this is the first attempt to leverage a code generation model to enhance the code retrieval task. We achieve new state-of-the-art results on the CodeSearchNet benchmark and surpass the baselines significantly.
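To make the idea of expanding a documentation query with generated code more concrete, here is a toy Python sketch. It is not the authors' implementation. Several assumptions apply: `embed` is a bag-of-tokens stand-in for a pre-trained encoder; `generator` is a hypothetical placeholder for a code generation model; and the query is combined with the generated snippet by plain string concatenation, since the abstract does not specify how the two are fused.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; "order_elements" -> ["order", "elements"].
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # ASSUMPTION: toy bag-of-tokens vector standing in for a pre-trained encoder.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (missing tokens count as 0).
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query: str,
    corpus: List[str],
    generator: Optional[Callable[[str], str]] = None,
) -> List[str]:
    # Rank code snippets by similarity to the (optionally expanded) query.
    if generator is not None:
        # ASSUMPTION: expansion = query text concatenated with the generated code.
        query = query + "\n" + generator(query)
    q_vec = embed(query)
    return sorted(corpus, key=lambda code: cosine(q_vec, embed(code)), reverse=True)


if __name__ == "__main__":
    corpus = [
        "def flip(xs):\n    return xs[::-1]",
        "def order_elements(elements):\n    return sorted(elements)",
    ]
    query = "reverse the order of elements"

    # HYPOTHETICAL generator: a canned snippet in place of a real code generation model.
    def fake_generator(q: str) -> str:
        return "def reverse(xs):\n    return xs[::-1]"

    print(search(query, corpus)[0].splitlines()[0])  # def order_elements(elements):
    print(search(query, corpus, fake_generator)[0].splitlines()[0])  # def flip(xs):
```

With the plain query, `order_elements` ranks first because it shares the surface words "order" and "elements". When the query is expanded with the generated `xs[::-1]` sketch, the ranking shifts toward `flip`. This is a small-scale illustration of the "sketch an answer before searching" intuition and says nothing about the paper's actual models or results.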
