FusionCount: Efficient Crowd Counting via Multiscale Feature Fusion

Paper title

FusionCount: Efficient Crowd Counting via Multiscale Feature Fusion

Authors

Yiming Ma, Victor Sanchez, Tanaya Guha

Abstract

State-of-the-art crowd counting models follow an encoder-decoder approach. Images are first processed by the encoder to extract features. Then, to account for perspective distortion, the highest-level feature map is fed to extra components to extract multiscale features, which are the input to the decoder to generate crowd densities. However, in these methods, features extracted at earlier stages during encoding are underutilised, and the multiscale modules can only capture a limited range of receptive fields, albeit with considerable computational cost. This paper proposes a novel crowd counting architecture (FusionCount), which exploits the adaptive fusion of a large majority of encoded features instead of relying on additional extraction components to obtain multiscale features. Thus, it can cover a more extensive scope of receptive field sizes and lower the computational cost. We also introduce a new channel reduction block, which can extract saliency information during decoding and further enhance the model's performance. Experiments on two benchmark databases demonstrate that our model achieves state-of-the-art results with reduced computational complexity.
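To make the idea of fusing encoder features from several stages more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how the adaptive fusion or the channel reduction block is built, so the softmax-weighted sum below is only one common way to combine feature maps, and the helper names (`upsample_nearest`, `fuse`, `count_from_density`) and the toy values are assumptions made for illustration. The last helper relies on the standard convention that a crowd count is the sum of the predicted density map.

```python
import math


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(feature_maps, scores):
    """Weighted sum of same-sized 2D maps (hypothetical fusion rule).

    In a trained model the scores would be learned or predicted from the
    features; here they are plain numbers supplied by the caller.
    """
    weights = softmax(scores)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(w * fm[i][j] for w, fm in zip(weights, feature_maps))
         for j in range(width)]
        for i in range(height)
    ]


def count_from_density(density):
    """The estimated count is the sum over the density map."""
    return sum(sum(row) for row in density)


# Toy example: a fine 4x4 map from an early stage and a coarse 2x2 map
# from a later stage, brought to the same resolution before fusion.
early = [[1.0] * 4 for _ in range(4)]
late = upsample_nearest([[3.0, 3.0], [3.0, 3.0]], factor=2)
fused = fuse([early, late], scores=[0.0, 0.0])  # equal scores, equal weights
print(count_from_density(fused))
```

With equal scores the two maps are simply averaged; unequal scores would shift the result towards one stage, which is the sense in which such a fusion can be called adaptive.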
