Paper Title
A Data-Driven Method for Automated Data Superposition with Applications in Soft Matter Science
Paper Authors
Paper Abstract
The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently by one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, non-parametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise, and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability -- specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
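The two-step idea in the abstract (learn a Gaussian process model of one data set, then pick the coordinate transformation that maximizes the posterior) can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It assumes a single horizontal shift as the only transformation, a fixed squared-exponential kernel with hand-picked hyperparameters, a weak Gaussian prior on the shift, a grid search in place of a proper optimizer, and synthetic noise-free data drawn from a tanh curve. All names (`GP`, `neg_log_posterior`, `TRUE_SHIFT`, and so on) are hypothetical.

```python
import math

def rbf(a, b, length=1.0, amp=1.0):
    """Squared-exponential kernel (hyperparameters are assumed, not learned)."""
    return amp * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class GP:
    """Zero-mean Gaussian process regression with a fixed kernel."""
    def __init__(self, xs, ys, noise=1e-4):
        self.xs, self.noise = xs, noise
        K = [[rbf(a, b) + (noise if i == j else 0.0)
              for j, b in enumerate(xs)] for i, a in enumerate(xs)]
        self.L = cholesky(K)
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, ys))

    def predict(self, x):
        """Posterior predictive mean and variance (noise included) at x."""
        k = [rbf(x, a) for a in self.xs]
        mean = sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        var = rbf(x, x) - sum(vi * vi for vi in v) + self.noise
        return mean, max(var, 1e-12)

def neg_log_posterior(gp, xs, ys, shift, prior_sd=10.0):
    """Negative log posterior of a horizontal shift applied to (xs, ys)."""
    total = 0.5 * (shift / prior_sd) ** 2  # weak Gaussian prior (assumed)
    for x, y in zip(xs, ys):
        m, v = gp.predict(x + shift)
        total += 0.5 * math.log(2 * math.pi * v) + 0.5 * (y - m) ** 2 / v
    return total

# Synthetic data: both sets sample one master curve, offset by TRUE_SHIFT.
TRUE_SHIFT = 0.5
x1 = [-2.0 + 0.2 * i for i in range(16)]
y1 = [math.tanh(x) for x in x1]
x2 = [-2.0 + 0.2 * i for i in range(11)]
y2 = [math.tanh(x + TRUE_SHIFT) for x in x2]

gp = GP(x1, y1)  # step 1: statistical model of the reference data set

# Step 2: maximum a posteriori shift, found here by a simple grid search.
shifts = [0.01 * i for i in range(101)]
best_shift = min(shifts, key=lambda s: neg_log_posterior(gp, x2, y2, s))

# Laplace-style uncertainty from the finite-difference curvature at the optimum.
h = 0.01
f0 = neg_log_posterior(gp, x2, y2, best_shift)
fp = neg_log_posterior(gp, x2, y2, best_shift + h)
fm = neg_log_posterior(gp, x2, y2, best_shift - h)
curvature = (fp - 2 * f0 + fm) / h ** 2
shift_sd = 1.0 / math.sqrt(curvature) if curvature > 0 else float("inf")

print(f"estimated shift = {best_shift:.2f}, approx. std = {shift_sd:.4f}")
```

The grid search keeps the sketch dependency-free. In practice a gradient-based optimizer and learned kernel hyperparameters would be the more natural choice, and real measurements would carry noise that the `noise` term would need to reflect.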