Paper Title
GGViT:Multistream Vision Transformer Network in Face2Face Facial Reenactment Detection
Paper Authors
Paper Abstract
Detecting manipulated facial images and videos on social networks is an urgent problem. Video compression on social media destroys pixel-level details that could otherwise be used to detect forgeries, so it is crucial to detect manipulated faces in videos of varying quality. We propose a new multi-stream network architecture named GGViT, which uses global information to improve the generalization of the model: an embedding of the whole face extracted by ViT guides each stream network. Extensive experiments show that the proposed model achieves state-of-the-art classification accuracy on the FF++ dataset and improves substantially in cross-compression-rate scenarios, raising accuracy on Raw/C23, Raw/C40, and C23/C40 by 24.34%, 15.08%, and 10.14%, respectively.