Title
Sketchformer: Transformer-based Representation for Sketched Structure
Authors
Abstract
Sketchformer is a novel transformer-based representation for encoding free-hand sketch input in vector form, i.e. as a sequence of strokes. Sketchformer effectively addresses multiple tasks: sketch classification, sketch-based image retrieval (SBIR), and the reconstruction and interpolation of sketches. We report several variants exploring continuous and tokenized input representations, and contrast their performance. Our learned embedding, driven by a dictionary-learning tokenization scheme, yields state-of-the-art performance on classification and image retrieval tasks when compared against baseline representations driven by LSTM sequence-to-sequence architectures: SketchRNN and its derivatives. We show that sketch reconstruction and interpolation are improved significantly by the Sketchformer embedding for complex sketches with longer stroke sequences.
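As a rough illustration of the dictionary-learning tokenization the abstract refers to: each pen displacement (dx, dy) in a stroke sequence can be quantized to the index of its nearest codeword in a learned dictionary, turning a continuous sketch into a discrete token sequence for the transformer. The codebook and function names below are illustrative assumptions, not the paper's actual implementation; in the real scheme the dictionary is learned from sketch data rather than hand-written.

```python
import math

# Toy codebook standing in for a learned stroke-displacement dictionary.
# These four codewords (unit moves right, up, left, down) are illustrative;
# the paper learns its dictionary from data.
CODEBOOK = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

def tokenize(displacements):
    """Map each (dx, dy) pen displacement to its nearest codeword index."""
    tokens = []
    for dx, dy in displacements:
        dists = [math.hypot(dx - cx, dy - cy) for cx, cy in CODEBOOK]
        tokens.append(dists.index(min(dists)))
    return tokens

def detokenize(tokens):
    """Reconstruct approximate displacements from token indices."""
    return [CODEBOOK[t] for t in tokens]

stroke = [(0.9, 0.1), (0.1, 1.2), (-1.1, 0.0)]
print(tokenize(stroke))  # → [0, 1, 2]
```

Quantizing to a shared dictionary is what lets the model treat sketches like a token sequence (analogous to words in NLP transformers), at the cost of a small reconstruction error visible in `detokenize`.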