Causal Feature Selection with Dimension Reduction for Interpretable Text Classification

Paper Title

Causal Feature Selection with Dimension Reduction for Interpretable Text Classification

Authors

Guohou Shan, James Foulds, Shimei Pan

Abstract

Text features that are correlated with class labels, but do not directly cause them, are sometimes useful for prediction, but they may not be insightful. As an alternative to traditional correlation-based feature selection, causal inference could reveal more principled, meaningful relationships between text features and labels. To help researchers gain insight into text data, e.g., for social science applications, in this paper we investigate a class of matching-based causal inference methods for text feature selection. Features used in document classification are often high dimensional; however, existing causal feature selection methods use Propensity Score Matching (PSM), which is known to be less effective in high-dimensional spaces. We propose a new causal feature selection framework that combines dimension reduction with causal inference to improve text feature selection. Experiments on both synthetic and real-world data demonstrate the promise of our methods in improving classification and enhancing interpretability.
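The idea of pairing dimension reduction with propensity score matching can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not say which reduction technique or propensity model the paper uses, so the random projection, the gradient-descent logistic regression, and every function name below (`random_projection`, `fit_propensity`, `att_by_matching`) are assumptions chosen only to keep the example dependency-free. The sketch treats the presence of one word as the "treatment", reduces the remaining word indicators to a few dimensions, estimates propensity scores on the reduced covariates, and matches each treated document to its nearest control.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def random_projection(rows, k, seed=0):
    """Project rows onto k random Gaussian directions.

    Assumption: stands in for whatever dimension reduction step is used.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    proj = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(p[j] * r[j] for j in range(d)) for p in proj] for r in rows]


def fit_propensity(z, t, steps=300, lr=0.1):
    """Estimate P(t = 1 | z) with logistic regression (batch gradient descent)."""
    n, k = len(z), len(z[0])
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * k, 0.0
        for zi, ti in zip(z, t):
            err = sigmoid(b + sum(wj * zj for wj, zj in zip(w, zi))) - ti
            for j in range(k):
                gw[j] += err * zi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return [sigmoid(b + sum(wj * zj for wj, zj in zip(w, zi))) for zi in z]


def att_by_matching(x, y, feature, k=2):
    """Average effect of `feature` on the label among treated documents.

    Each treated document is matched (with replacement) to the control
    document whose propensity score is closest.
    """
    t = [row[feature] for row in x]
    covariates = [row[:feature] + row[feature + 1:] for row in x]
    scores = fit_propensity(random_projection(covariates, k), t)
    treated = [i for i, ti in enumerate(t) if ti == 1]
    control = [i for i, ti in enumerate(t) if ti == 0]
    if not treated or not control:
        raise ValueError("need both treated and control documents")
    diffs = []
    for i in treated:
        j = min(control, key=lambda c: abs(scores[c] - scores[i]))
        diffs.append(y[i] - y[j])
    return sum(diffs) / len(diffs)


# Toy corpus: 16 documents over 3 binary word indicators (all combinations, twice).
docs = [[i % 2, (i // 2) % 2, (i // 4) % 2] for i in range(16)]
labels = [row[0] for row in docs]  # the label is fully determined by word 0

# Every treated document has label 1 and every control has label 0,
# so the estimated effect of word 0 is 1.0.
print(att_by_matching(docs, labels, feature=0))
```

In a feature selection setting, one would compute such an estimate for each candidate word and keep the words with the largest estimated effects. A real pipeline would likely swap the random projection for a learned reduction and use a library implementation of the propensity model.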
