Paper Title
Predicting Human Psychometric Properties Using Computational Language Models
Paper Authors
Paper Abstract
Transformer-based language models (LMs) continue to achieve state-of-the-art performance on natural language processing (NLP) benchmarks, including tasks designed to mimic human-inspired "commonsense" competencies. To better understand the degree to which LMs can be said to have certain linguistic reasoning skills, researchers are beginning to adapt the tools and concepts of psychometrics. But to what extent can benefits flow in the other direction? In other words, can LMs be of use in predicting the psychometric properties of test items when those items are given to human participants? If so, the benefit for psychometric practitioners is enormous, as it can reduce the need for multiple rounds of empirical testing. We gather responses from numerous human participants and LMs (transformer- and non-transformer-based) on a broad diagnostic test of linguistic competencies. We then calculate standard psychometric properties of the items in the diagnostic test twice, once from the human responses and once from the LM responses. Finally, we determine how well these two sets of estimates correlate. We find that transformer-based LMs predict the human psychometric data consistently well across most categories, suggesting that they can be used to gather human-like psychometric data without the need for extensive human trials.
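The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say which psychometric properties are computed, so this example assumes one common choice from classical test theory: item difficulty, taken here as the proportion of respondents who answer an item correctly. The response matrices, the function names `item_difficulty` and `pearson`, and the use of Pearson correlation are all assumptions made for illustration. They are not the authors' data or method.

```python
from math import sqrt


def item_difficulty(responses):
    """Return the proportion of correct answers for each item.

    `responses` holds one row per respondent (a human or an LM);
    each row is a list of 0/1 scores, one per test item.
    """
    n_respondents = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[j] for row in responses) / n_respondents
        for j in range(n_items)
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up toy data: 4 human respondents and 3 LMs on the same 4 items.
human_responses = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
lm_responses = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]

# Estimate the same item property from each population separately.
human_difficulty = item_difficulty(human_responses)
lm_difficulty = item_difficulty(lm_responses)

# Correlate the two sets of item-level estimates.
r = pearson(human_difficulty, lm_difficulty)
print(human_difficulty, lm_difficulty, r)
```

Under these assumptions, a correlation near 1 would mean the LM population ranks items by difficulty much as the human population does. Note that `pearson` divides by zero if either list has no variance, for example when every item has the same difficulty.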