Paper Title
Bounded Future MS-TCN++ for surgical gesture recognition
Paper Authors
Paper Abstract
In recent times, there has been growing development of video-based applications for surgical purposes. Some of these applications can work offline after the end of the procedure, while others must react immediately. However, there are also cases where the response should occur during the procedure, yet some delay is acceptable. The performance gap between online and offline methods is well known in the literature. Our goal in this study was to characterize the performance-delay trade-off and to design an MS-TCN++-based algorithm that can exploit it. To this end, we used our open surgery simulation dataset, containing 96 videos of 24 participants performing a suturing task on a variable tissue simulator. In this study, we used video data captured from the side view. The networks were trained to identify the performed surgical gestures. The naive approach is to reduce the MS-TCN++ depth; as a result, the receptive field shrinks, and so does the number of required future frames. We show that this method is sub-optimal, mainly for small delays. The second method limits the accessible future in each temporal convolution. This gives flexibility in the network design and, as a result, achieves significantly better performance than the naive approach.
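The second method described above (bounding the accessible future in each temporal convolution) can be realized with asymmetric padding. The sketch below is a minimal illustration, not the authors' implementation; it assumes a PyTorch-style MS-TCN++ dilated residual layer with kernel size 3, and the names `BoundedFutureDilatedLayer` and `future_window` are hypothetical.

```python
# Minimal sketch of a "bounded future" dilated residual layer in the spirit of
# MS-TCN++. Assumption: PyTorch, kernel size 3, per-layer lookahead capped at
# `future_window` frames (0 = fully causal, 2*dilation = offline acausal layer).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundedFutureDilatedLayer(nn.Module):
    """Dilated temporal convolution whose receptive field extends at most
    `future_window` frames past the current frame."""
    def __init__(self, channels, dilation, future_window):
        super().__init__()
        # No built-in padding; we pad asymmetrically in forward().
        self.conv_dilated = nn.Conv1d(channels, channels, kernel_size=3,
                                      dilation=dilation, padding=0)
        self.conv_1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        total_pad = 2 * dilation                         # keeps the output length equal to the input length
        self.pad_right = min(future_window, total_pad)   # frames taken from the future
        self.pad_left = total_pad - self.pad_right       # the rest comes from the past

    def forward(self, x):                 # x: (batch, channels, time)
        out = F.pad(x, (self.pad_left, self.pad_right))
        out = F.relu(self.conv_dilated(out))
        out = self.conv_1x1(out)
        return x + out                    # residual connection, as in MS-TCN++

# Example: a stack with dilations 1, 2, 4, ... and a fixed per-layer lookahead.
# The total delay of the stack is the sum of the per-layer future windows,
# which is the quantity traded off against accuracy in the abstract.
layers = nn.Sequential(*[BoundedFutureDilatedLayer(64, 2 ** i, future_window=2)
                         for i in range(4)])
features = torch.randn(1, 64, 300)       # (batch, channels, frames)
out = layers(features)                   # same temporal length as the input
```

Setting `future_window = 0` recovers a causal (zero-delay) layer, while `future_window = 2 * dilation` recovers the standard offline layer; intermediate values give the performance-delay trade-off studied in the paper, without reducing network depth.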