Paper Title

What Emotions Make One or Five Stars? Understanding Ratings of Online Product Reviews by Sentiment Analysis and XAI

Paper Author

So, Chaehan

Paper Abstract

When people buy products online, they primarily base their decisions on the recommendations of others given in online reviews. The current work analyzed these online reviews with sentiment analysis and used the extracted sentiments as features to predict product ratings with several machine learning algorithms. These predictions were disentangled by various methods of explainable AI (XAI) to understand whether the model showed any bias during prediction. Study 1 benchmarked these algorithms (k-nearest neighbors, support vector machines, random forests, gradient boosting machines, XGBoost) and identified random forests and XGBoost as the best algorithms for predicting the product ratings. In Study 2, the analysis of global feature importance identified the sentiment joy and the negative emotional valence as the most predictive features. Two XAI visualization methods, local feature attributions and partial dependence plots, revealed several incorrect prediction mechanisms at the instance level. Performing the benchmarking as classification, Study 3 identified a high no-information rate of 64.4%, which indicated high class imbalance as the underlying reason for the identified problems. In conclusion, good performance by machine learning algorithms must be interpreted with caution because the dataset, as encountered in this work, could be biased towards certain predictions. This work demonstrates how XAI methods reveal such prediction bias.
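
The no-information rate mentioned for Study 3 is the share of the most frequent class, that is, the accuracy obtained by always predicting that class. The following minimal Python sketch illustrates the idea. The rating counts are hypothetical and are not taken from the paper's dataset, and the function name `no_information_rate` is an assumption made for this illustration.

```python
from collections import Counter


def no_information_rate(labels):
    """Share of the most frequent class: the accuracy of always predicting it."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical star ratings (NOT the paper's data): 1000 reviews skewed towards 5 stars.
toy_ratings = [1] * 50 + [2] * 30 + [3] * 70 + [4] * 200 + [5] * 650

nir = no_information_rate(toy_ratings)
majority_class = Counter(toy_ratings).most_common(1)[0][0]
print(f"majority class: {majority_class}, no-information rate: {nir:.3f}")
```

On this toy data the rate is 0.650, so a classifier reaching about 65% accuracy would be doing no better than always guessing the majority rating. Read the same way, the abstract's 64.4% means that accuracy figures only become informative once they clearly exceed that baseline.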
