Unsupervised machine learning via transfer learning and k-means clustering to classify materials image data

Title

Unsupervised machine learning via transfer learning and k-means clustering to classify materials image data

Authors

Cohn, Ryan; Holm, Elizabeth

Abstract

Unsupervised machine learning offers significant opportunities for extracting knowledge from unlabeled data sets and for achieving maximum machine learning performance. This paper demonstrates how to construct, use, and evaluate a high-performance unsupervised machine learning system for classifying images in a popular microstructural dataset. The Northeastern University Steel Surface Defects Database includes micrographs of six different defects observed on hot-rolled steel in a format that is convenient for training and evaluating models for image classification. We use the VGG16 convolutional neural network pre-trained on the ImageNet dataset of natural images to extract feature representations for each micrograph. After applying principal component analysis to extract signal from the feature descriptors, we use k-means clustering to classify the images without needing labeled training data. The approach achieves $99.4\% \pm 0.16\%$ accuracy, and the resulting model can be used to classify new images without retraining. This approach demonstrates an improvement in both performance and utility compared to a previous study. A sensitivity analysis is conducted to better understand the influence of each step on the classification performance. The results provide insight toward applying unsupervised machine learning techniques to problems of interest in materials science.
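The steps named in the abstract (feature vectors, principal component analysis, k-means clustering, then classifying new images with the fitted model) can be illustrated with a minimal NumPy-only sketch. This is not the authors' code. Synthetic vectors stand in for the VGG16 feature descriptors, and the data sizes, the choice of 5 principal components, the farthest-point initialization, and the majority-vote mapping from clusters to class labels are all assumptions made for illustration. Ground-truth labels are used only to score the clustering, never to fit it.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return (mean, components) from an SVD of the centered data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Project samples onto the stored principal components."""
    return (X - mean) @ components.T


def nearest_centroid(Z, centroids):
    """Index of the closest centroid (squared Euclidean distance) per sample."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(Z, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization
    (an assumption of this sketch, chosen for reproducibility)."""
    centroids = Z[:1].copy()
    for _ in range(k - 1):
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids = np.vstack([centroids, Z[d2.argmax()]])
    for _ in range(n_iter):
        assign = nearest_centroid(Z, centroids)
        updated = np.array([
            Z[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def cluster_to_label(assign, labels, k):
    """Map each cluster to its most common ground-truth label (evaluation only)."""
    mapping = np.zeros(k, dtype=int)
    for j in range(k):
        members = labels[assign == j]
        if members.size:
            mapping[j] = np.bincount(members).argmax()
    return mapping


# Synthetic stand-in for CNN feature descriptors: 6 well-separated classes.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 6, 50, 64
class_centers = rng.normal(scale=10.0, size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), per_class)
features = class_centers[labels] + rng.normal(size=(labels.size, dim))

# Fit PCA and k-means without touching the labels.
mean, components = fit_pca(features, n_components=5)
Z = project(features, mean, components)
centroids = kmeans(Z, k=n_classes)
assign = nearest_centroid(Z, centroids)

# Score the clustering against the held-back labels.
mapping = cluster_to_label(assign, labels, n_classes)
accuracy = (mapping[assign] == labels).mean()

# Classify unseen samples by reusing the stored PCA and centroids (no refitting).
new_features = class_centers[[2, 4]] + rng.normal(size=(2, dim))
new_pred = mapping[nearest_centroid(project(new_features, mean, components), centroids)]
print(f"clustering accuracy on synthetic data: {accuracy:.3f}")
```

Because the fitted PCA mean, components, and centroids are kept, a new sample needs only a projection and a nearest-centroid lookup, which is the sense in which a model of this kind can label new images without retraining. Real CNN features are far less cleanly separated than this synthetic data, so the accuracy printed here says nothing about the result reported in the abstract.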

Title