Paper Title

Input Repair via Synthesis and Lightweight Error Feedback

Authors

Lukas Kirschner, Ezekiel Soremekun, Rahul Gopinath, Andreas Zeller

Abstract

Often, input data may ostensibly conform to a given input format but cannot be parsed by a conforming program, for instance because of human error or data corruption. In such cases, a data engineer is tasked with input repair, i.e., she has to manually repair the corrupt data so that it follows the given format and can therefore be processed by the conforming program. Such manual repair can be time-consuming and error-prone. In particular, input repair is challenging without an input specification (e.g., an input grammar) or program analysis. In this work, we show that incorporating lightweight failure feedback (e.g., input incompleteness) into parsers is sufficient to repair any corrupt input data with maximal closeness to the semantics of the input data. We propose an approach (called FSYNTH) that leverages lightweight error feedback and input synthesis to repair invalid inputs. FSYNTH is grammar-agnostic and does not require program analysis. Given a conforming program and any invalid input, FSYNTH provides a set of repairs prioritized by the distance of the repair from the original input. We evaluate FSYNTH on 806 (real-world) invalid inputs using four well-known input formats, namely INI, TinyC, SExp, and cJSON. In our evaluation, we found that FSYNTH recovers 91% of valid input data. FSYNTH is also highly effective and efficient in input repair: it repairs 77% of invalid inputs within four minutes. It is up to 35% more effective than DDMax, the previously best-known approach. Overall, our approach addresses several limitations of DDMax, both in what it can repair and in the set of repairs it offers.
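
To make the idea of repair driven by lightweight feedback more concrete, the following is a minimal sketch and not the FSYNTH algorithm itself. It assumes a toy format (balanced parentheses around `a` atoms) and a hypothetical validator `feedback` that reports only whether a prefix is complete, incomplete, or incorrect. The "incorrect" status, the names `feedback`, `repair`, `ALPHABET`, and `max_edits`, and the best-first search over insertions and deletions are all illustrative assumptions. The search returns a repair with the fewest edits, which loosely mirrors prioritizing repairs by their distance from the original input.

```python
import heapq
from itertools import count

# Hypothetical alphabet from which characters may be synthesized.
ALPHABET = "()a "


def feedback(text):
    """Toy stand-in for a parser that gives lightweight feedback.

    Returns 'complete' (valid input), 'incomplete' (a valid prefix that
    needs more input), or 'incorrect' (no extension can make it valid).
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "incorrect"
        elif ch not in "a ":
            return "incorrect"
    return "complete" if depth == 0 and text else "incomplete"


def repair(broken, max_edits=4):
    """Best-first search for a valid string with the fewest edits.

    Each state is (edits, tiebreak, repaired_prefix, index_into_broken).
    Keeping the next input character is free; deleting it or inserting
    a synthesized character costs one edit.
    """
    tie = count()
    queue = [(0, next(tie), "", 0)]
    seen = set()
    while queue:
        edits, _, prefix, i = heapq.heappop(queue)
        if (prefix, i) in seen:
            continue
        seen.add((prefix, i))
        status = feedback(prefix)
        if status == "incorrect":
            continue  # prune: this prefix can never become valid
        if i == len(broken) and status == "complete":
            return prefix, edits
        if i < len(broken):
            # Keep the next character of the original input.
            heapq.heappush(queue, (edits, next(tie), prefix + broken[i], i + 1))
        if edits < max_edits:
            if i < len(broken):
                # Delete the next character of the original input.
                heapq.heappush(queue, (edits + 1, next(tie), prefix, i + 1))
            for ch in ALPHABET:
                # Synthesize (insert) a character at this position.
                heapq.heappush(queue, (edits + 1, next(tie), prefix + ch, i))
    return None  # no repair found within the edit budget
```

For example, `repair("(a (a)")` returns some valid string one edit away from the input, together with the edit count `1`. Which of the equally close repairs comes back depends on the tie-breaking order, so a practical tool would more plausibly return several candidates ranked by distance.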
