Paper Title

Adaptive Testing of Computer Vision Models

Paper Authors

Irena Gao, Gabriel Ilharco, Scott Lundberg, Marco Tulio Ribeiro

Paper Abstract

Vision models often fail systematically on groups of data that share common semantic characteristics (e.g., rare objects or unusual scenes), but identifying these failure modes is a challenge. We introduce AdaVision, an interactive process for testing vision models which helps users identify and fix coherent failure modes. Given a natural language description of a coherent group, AdaVision retrieves relevant images from LAION-5B with CLIP. The user then labels a small amount of data for model correctness, which is used in successive retrieval rounds to hill-climb towards high-error regions, refining the group definition. Once a group is saturated, AdaVision uses GPT-3 to suggest new group descriptions for the user to explore. We demonstrate the usefulness and generality of AdaVision in user studies, where users find major bugs in state-of-the-art classification, object detection, and image captioning models. These user-discovered groups have failure rates 2-3x higher than those surfaced by automatic error clustering methods. Finally, finetuning on examples found with AdaVision fixes the discovered bugs when evaluated on unseen examples, without degrading in-distribution accuracy, and while also improving performance on out-of-distribution datasets.
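
The abstract describes AdaVision's core retrieval step: given a natural-language description of a candidate failure group, CLIP is used to pull matching images from a large pool (LAION-5B in the paper). The sketch below illustrates only that text-to-image retrieval step, not the full interactive system; the function name `retrieve_images`, the precomputed `image_embeddings` tensor, and the `top_k` parameter are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of CLIP-based retrieval for a natural-language group description,
# assuming a precomputed matrix of CLIP image embeddings for a candidate image pool
# (the paper retrieves from LAION-5B; `image_embeddings` here is a stand-in).
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def retrieve_images(description: str, image_embeddings: torch.Tensor, top_k: int = 50):
    """Rank a pool of images by CLIP similarity to a group description.

    image_embeddings: (N, D) tensor of L2-normalized CLIP image features,
    assumed to be precomputed offline for the retrieval pool.
    Returns the indices of the top_k most similar images.
    """
    with torch.no_grad():
        tokens = clip.tokenize([description]).to(device)
        text_feat = model.encode_text(tokens).float()
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        # Cosine similarity between the description and every candidate image.
        sims = image_embeddings.to(device) @ text_feat.T  # shape (N, 1)
        return sims.squeeze(1).topk(top_k).indices.cpu()

# Hypothetical usage with a group description from the testing loop:
# idx = retrieve_images("a photo of a dog on a skateboard", image_embeddings)
```

In the full system, the retrieved images would be shown to the user for correctness labeling, and the labels would steer subsequent retrieval rounds toward higher-error regions.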
