Title
Design choice and machine learning model performances
Authors
Abstract
A growing number of publications present the joint application of Design of Experiments (DOE) and machine learning (ML) as a methodology for collecting and analyzing data on a specific industrial phenomenon. However, the literature shows that the choice of design for data collection and of model for data analysis is often not driven by statistical or algorithmic advantages, and few studies provide guidelines on which designs and ML models to use together. This article discusses the choice of design in relation to ML model performance. The study considers 12 experimental designs, 7 families of predictive models, 7 test functions that emulate physical processes, and 8 noise settings, both homoscedastic and heteroscedastic. The results can have an immediate impact on the work of practitioners by providing guidelines for practical applications of DOE and ML.