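Exploring the Potential of Feature Density in Estimating Machine Learning Classifier Performance with Application to Cyberbullying Detection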

Paper Title

Exploring the Potential of Feature Density in Estimating Machine Learning Classifier Performance with Application to Cyberbullying Detection

Authors

Eronen, Juuso, Ptaszynski, Michal, Masui, Fumito, Leliwa, Gniewosz, Wroczynski, Michal

Abstract

In this research, we analyze the potential of Feature Density (FD) as a way to comparatively estimate machine learning (ML) classifier performance prior to training. The goal of the study is to help solve the problem of resource-intensive training of ML models, which is becoming a serious issue due to continuously increasing dataset sizes and the ever-rising popularity of Deep Neural Networks (DNN). The constantly increasing demand for more powerful computational resources also affects the environment, as training large-scale ML models is causing alarmingly growing amounts of CO2 emissions. Our approach is to optimize the resource-intensive training of ML models for Natural Language Processing by reducing the number of required experiment iterations. We expand on previous attempts at improving classifier training efficiency with FD, while also providing insight into the effectiveness of various linguistically backed feature preprocessing methods for dialog classification, specifically cyberbullying detection.

