Paper Title

Principal Component Analysis and Factor Analysis for Feature Selection in Credit Rating

Authors

Yang, Shenghuan; Florescu, Ionut; Islam, Md Tariqul

Abstract

The credit rating is an evaluation of a company's credit risk: it assesses the company's ability to repay its debt and predicts the likelihood that the debtor will default. Many features influence a credit rating, so it is essential to select the substantive ones in order to explore the main reasons for credit rating changes. To address this issue, this paper uses Principal Component Analysis and Factor Analysis as feature selection algorithms to select important features, group similar features together, and obtain a minimum set of features for four sectors: Financial, Energy, Health Care, and Consumer Discretionary. The paper uses two data sets, Financial Ratio and Balance Sheet, with two mappings, Detailed Mapping and Coarse Mapping, that convert the target variable (credit rating) into a categorical variable. To measure the accuracy of credit rating prediction, a Random Forest Classifier is trained and tested on each feature set. The results show that the Financial Ratio feature sets yield higher accuracy than the Balance Sheet feature sets. In addition, Factor Analysis significantly reduces the number of features while achieving almost the same accuracy, which dramatically decreases the time spent on analyzing data. Using Factor Analysis, we also identify seven dominant factors in the Financial Ratio data and ten dominant factors in the Balance Sheet data that affect credit rating change, which better explain the reasons for credit rating changes.
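The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is available. Everything here is a stand-in: the data is synthetic (not the paper's Financial Ratio or Balance Sheet sets), the three classes only mimic a coarse rating mapping, and `N_FACTORS` and `evaluate` are hypothetical names. The sketch feeds PCA and Factor Analysis scores directly to a Random Forest, which may differ from the paper's exact feature selection procedure.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature table (NOT the paper's data):
# 600 "firms", 30 features, 3 classes mimicking a coarse rating mapping.
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, n_redundant=10,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: seven components, echoing the seven dominant factors the
# abstract reports for Financial Ratio; the right number is data dependent.
N_FACTORS = 7


def evaluate(reducer=None):
    """Fit scaler -> optional reducer -> Random Forest; return test accuracy."""
    steps = [StandardScaler()]
    if reducer is not None:
        steps.append(reducer)
    steps.append(RandomForestClassifier(n_estimators=200, random_state=0))
    model = make_pipeline(*steps)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


results = {
    "all features": evaluate(),
    "PCA": evaluate(PCA(n_components=N_FACTORS, random_state=0)),
    "Factor Analysis": evaluate(
        FactorAnalysis(n_components=N_FACTORS, random_state=0)
    ),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Comparing the three accuracies shows how much predictive power survives when 30 features are reduced to seven components. The numbers from synthetic data say nothing about the paper's reported results.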
