Please, Don't Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status

Paper title

Please, Don't Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status

Author

Bestgen, Yves

Abstract

This paper argues for the widest possible use of bootstrap confidence intervals for comparing NLP system performance, in place of state-of-the-art (SOTA) status and statistical significance testing. Their main benefits are to draw attention to the difference in performance between two systems and to help assess the degree of superiority of one system over another. Two case studies, one comparing several systems and the other based on a K-fold cross-validation procedure, illustrate these benefits. A Python module for obtaining these confidence intervals, along with a second function implementing the Fisher-Pitman test for paired samples, is freely available on PyPI.
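To make the two techniques named in the abstract concrete, here is a minimal sketch of a percentile bootstrap confidence interval for the mean difference between two systems scored on the same test items, together with a Monte Carlo sign-flip permutation test on the paired differences (the idea behind the Fisher-Pitman test for paired samples). This is not the author's PyPI module: the function names, parameters, and the choice of the percentile method are assumptions made for illustration only.

```python
import numpy as np


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean paired difference (A minus B).

    Hypothetical helper, not the API of the module mentioned in the abstract.
    Returns (observed_mean_difference, lower_bound, upper_bound).
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("scores must be 1-D arrays of equal length")
    diffs = a - b
    rng = np.random.default_rng(seed)
    # Resample test items with replacement; each pair stays together
    # because we resample the per-item differences.
    idx = rng.integers(0, diffs.size, size=(n_resamples, diffs.size))
    boot_means = diffs[idx].mean(axis=1)
    alpha = 1.0 - level
    low, high = np.quantile(boot_means, [alpha / 2, 1.0 - alpha / 2])
    return float(diffs.mean()), float(low), float(high)


def paired_permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo sign-flip test on paired differences.

    Hypothetical helper. Under the null hypothesis the sign of each paired
    difference is arbitrary, so signs are flipped at random.
    """
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    rng = np.random.default_rng(seed)
    observed = abs(diffs.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    # Small tolerance guards against floating-point ties; the add-one
    # correction keeps the estimated p-value strictly positive.
    extreme = np.sum(permuted >= observed - 1e-12)
    return float((extreme + 1) / (n_permutations + 1))


if __name__ == "__main__":
    # Made-up per-item scores for two hypothetical systems.
    system_a = [0.80, 0.70, 0.90, 0.60, 0.75]
    system_b = [0.70, 0.65, 0.85, 0.50, 0.70]
    diff, low, high = paired_bootstrap_ci(system_a, system_b)
    p_value = paired_permutation_test(system_a, system_b)
    print(f"difference = {diff:.3f}, 95% CI = [{low:.3f}, {high:.3f}], p = {p_value:.3f}")
```

Reporting the difference together with its interval, rather than only which system scored higher or whether a test was significant, is the practice the abstract advocates. The scores in the usage example are invented and carry no empirical meaning.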
