Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment

Paper Title

Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment

Authors

Parmar, Paritosh; Gharat, Amol; Rhodin, Helge

Abstract

Maintaining proper form while exercising is important for preventing injuries and maximizing muscle mass gains. Detecting errors in workout form naturally requires estimating the human body pose. However, off-the-shelf pose estimators struggle to perform well on videos recorded in gym scenarios due to factors such as camera angles, occlusion from gym equipment, illumination, and clothing. To aggravate the problem, the errors to be detected in workouts are very subtle. To that end, we propose to learn exercise-oriented image and video representations from unlabeled samples such that a small dataset annotated by experts suffices for supervised error detection. In particular, our domain knowledge-informed self-supervised approaches (pose contrastive learning and motion disentangling) exploit the harmonic motion of the exercise actions and capitalize on the large variances in camera angles, clothes, and illumination to learn powerful representations. To facilitate our self-supervised pretraining and supervised finetuning, we curated a new exercise dataset, Fitness-AQA (https://github.com/ParitoshParmar/Fitness-AQA), comprising three exercises: BackSquat, BarbellRow, and OverheadPress. It has been annotated by expert trainers for multiple crucial and typically occurring exercise errors. Experimental results show that our self-supervised representations outperform off-the-shelf 2D- and 3D-pose estimators and several other baselines. We also show that our approaches can be applied to other domains/tasks such as pose estimation and dive quality assessment.
