Paper Title

Predicting Day-Ahead Stock Returns using Search Engine Query Volumes: An Application of Gradient Boosted Decision Trees to the S&P 100

Author

Bockel-Rickermann, Christopher

Abstract

The internet has changed the way we live, work and make decisions. Because it is the major modern resource for research, detailed data on internet usage contains vast amounts of behavioral information. This paper asks whether that information can be used to predict future returns of stocks on financial capital markets. In an empirical analysis, it implements gradient boosted decision trees to learn relationships between abnormal returns of stocks within the S&P 100 index and lagged predictors derived from historical financial data, as well as search term query volumes on the internet search engine Google. The models predict whether a stock's day-ahead return will exceed the index median. Over the period from 2005 to 2017, each of the disparate datasets proves informative. The evaluated models achieve average areas under the receiver operating characteristic curve (AUC) between 54.2% and 56.7%, above the 50% expected from random guessing. In a simple statistical arbitrage strategy, the models are used to create daily trading portfolios of ten stocks, resulting in annual performance of more than 57% before transaction costs. With ensembles of different datasets topping the performance ranking, the results further question the weak-form and semi-strong-form efficiency of modern financial capital markets. Even though transaction costs are not included, the approach adds to the existing literature. It gives guidance on how to use and transform data on internet usage behavior for financial and economic modeling and forecasting.
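The evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the gradient boosted model itself is omitted, and the tickers, returns, predicted probabilities and function names below are all hypothetical. The sketch assumes a stock is labeled positive when its next-day return lies strictly above the cross-sectional median, computes AUC as the share of positive/negative pairs ranked correctly, and assumes the portfolio simply holds the k highest-scored stocks (the paper uses ten; the toy example uses two).

```python
from statistics import median


def median_excess_labels(returns):
    """Label a stock 1 if its return is strictly above the cross-sectional median, else 0."""
    m = median(returns.values())
    return {ticker: int(r > m) for ticker, r in returns.items()}


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k(scores, k):
    """Return the k tickers with the highest predicted probability."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical one-day data: realized next-day returns and model scores.
# In practice the scores would come from a fitted classifier.
next_day_returns = {"AAA": 0.012, "BBB": -0.004, "CCC": 0.003, "DDD": -0.010}
predicted_prob = {"AAA": 0.61, "BBB": 0.55, "CCC": 0.47, "DDD": 0.40}

labels = median_excess_labels(next_day_returns)
tickers = list(next_day_returns)
auc = roc_auc([labels[t] for t in tickers], [predicted_prob[t] for t in tickers])
portfolio = top_k(predicted_prob, k=2)

print(labels)     # {'AAA': 1, 'BBB': 0, 'CCC': 1, 'DDD': 0}
print(auc)        # 0.75
print(portfolio)  # ['AAA', 'BBB']
```

An AUC of 0.5 corresponds to random ranking, which is why the reported 54.2% to 56.7% range indicates a modest but positive edge.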
