Forecasting Future World Events with Neural Networks

Paper Title

Forecasting Future World Events with Neural Networks

Authors

Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks

Abstract

Forecasting future world events is a challenging but valuable task. Forecasts of climate, geopolitical conflict, pandemics and economic indicators help shape policy and decision making. In these domains, the judgment of expert humans contributes to the best forecasts. Given advances in language modeling, can these forecasts be automated? To this end, we introduce Autocast, a dataset containing thousands of forecasting questions and an accompanying news corpus. Questions are taken from forecasting tournaments, ensuring high quality, real-world importance, and diversity. The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts (avoiding leakage from the future). Motivated by the difficulty of forecasting numbers across orders of magnitude (e.g. global cases of COVID-19 in 2022), we also curate IntervalQA, a dataset of numerical questions and metrics for calibration. We test language models on our forecasting task and find that performance is far below a human expert baseline. However, performance improves with increased model size and incorporation of relevant information from the news corpus. In sum, Autocast poses a novel challenge for large language models and improved performance could bring large practical benefits.
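The date-organized news corpus is what makes leakage-free evaluation possible: a model answering a question as of a given day should only see articles published up to that day. The following is a minimal sketch of that filtering step, not the authors' implementation. The `Article` class, the `articles_available_on` function, and the sample data are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    # Hypothetical record type; the real corpus schema may differ.
    published: date
    text: str


def articles_available_on(corpus, forecast_date):
    """Return articles published on or before forecast_date, oldest first.

    Anything published later is dropped, so a forecast made "as of"
    forecast_date cannot draw on information from the future.
    """
    visible = [a for a in corpus if a.published <= forecast_date]
    return sorted(visible, key=lambda a: a.published)


# Made-up sample data for illustration only.
corpus = [
    Article(date(2022, 6, 20), "report after the forecast date"),
    Article(date(2022, 1, 5), "early report"),
    Article(date(2022, 3, 1), "later report"),
]

visible = articles_available_on(corpus, date(2022, 3, 1))
for article in visible:
    print(article.published.isoformat(), article.text)
```

In this sketch the cutoff is inclusive, which assumes an article published on the forecast date was available to the forecaster; a stricter setup could use `<` instead.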
