Learning from Sparse Datasets: Predicting Concrete's Strength by Machine Learning

Paper Title

Learning from Sparse Datasets: Predicting Concrete's Strength by Machine Learning

Authors

Ouyang, Boya; Li, Yuhai; Song, Yu; Wu, Feishu; Yu, Huizi; Wang, Yongzhe; Bauchy, Mathieu; Sant, Gaurav

Abstract

Despite enormous efforts over the last decades to establish the relationship between concrete proportioning and strength, a robust knowledge-based model for accurate concrete strength predictions is still lacking. As an alternative to physical or chemical-based models, data-driven machine learning (ML) methods offer a new solution to this problem. Although this approach is promising for handling the complex, non-linear, non-additive relationship between concrete mixture proportions and strength, a major limitation of ML lies in the fact that large datasets are needed for model training. This is a concern as reliable, consistent strength data is rather limited, especially for realistic industrial concretes. Here, based on the analysis of a large dataset (>10,000 observations) of measured compressive strengths from industrially produced concretes, we compare the ability of select ML algorithms to "learn" how to reliably predict concrete strength as a function of the size of the dataset. Based on these results, we discuss the competition between how accurate a given model can eventually be (when trained on a large dataset) and how much data is actually required to train this model.
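The comparison described in the abstract, prediction error as a function of training-set size, is commonly drawn as a learning curve. The following is only a minimal sketch of that idea, under loud assumptions: the data are synthetic (a single hypothetical mix variable and a made-up nonlinear strength function, not the paper's industrial dataset), and the two models (a straight-line fit and a k-nearest-neighbor regressor) are stand-ins chosen for brevity, not necessarily the algorithms the authors compare. All function names are illustrative.

```python
import math
import random


def make_data(n, rng):
    """Synthetic toy data (assumption): one mix variable x, noisy nonlinear 'strength' y."""
    xs = [rng.uniform(0.3, 0.7) for _ in range(n)]
    ys = [120.0 * math.exp(-3.0 * x) + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbor regression: average y over the k closest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def rmse(model, xs, ys):
    """Root-mean-square error of a prediction function on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def learning_curves(sizes=(10, 30, 100, 300, 1000), n_test=300, seed=0):
    """Train each model on growing subsets and score it on one fixed held-out test set."""
    rng = random.Random(seed)
    x_train, y_train = make_data(max(sizes), rng)
    x_test, y_test = make_data(n_test, rng)
    curves = {"linear": [], "knn": []}
    for n in sizes:
        xs, ys = x_train[:n], y_train[:n]
        curves["linear"].append(rmse(fit_linear(xs, ys), x_test, y_test))
        curves["knn"].append(rmse(fit_knn(xs, ys), x_test, y_test))
    return curves


if __name__ == "__main__":
    for name, errs in learning_curves().items():
        print(name, [round(e, 2) for e in errs])
```

In a toy setting like this, the rigid model tends to level off early at an error set by its bias, while the flexible model keeps improving as more observations arrive. That is the kind of trade-off between final accuracy and data requirement that the abstract refers to; the numbers produced here say nothing about real concrete data.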
