Paper title
Conjecturing-Based Discovery of Patterns in Data
Paper authors
Paper abstract
We propose the use of a conjecturing machine that suggests feature relationships in the form of bounds involving nonlinear terms for numerical features and Boolean expressions for categorical features. The proposed Conjecturing framework recovers known nonlinear and Boolean relationships among features from data; in both settings, the true underlying relationships are revealed. We then compare the method to a previously proposed framework for symbolic regression on the ability to recover equations that are satisfied among features in a dataset. Finally, the framework is applied to patient-level data on COVID-19 outcomes to suggest possible risk factors that are confirmed in the medical literature.
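To make the idea of a "bound involving nonlinear terms" concrete, the following is a toy sketch only, not the Conjecturing framework described in the abstract. It assumes a hypothetical dataset of rows `(x, y, target)` and a hand-picked set of candidate terms, and keeps every candidate `t` for which `target <= t(x, y)` holds on all rows and is met with equality on at least one row. The data, the candidate terms, the function name `upper_bound_conjectures`, and the tightness criterion are all illustrative assumptions. Boolean expressions over categorical features are not covered here.

```python
import math

# Hypothetical toy data (an assumption): rows of (x, y, target), with target = x * y.
rows = [(1.0, 2.0, 2.0), (2.0, 3.0, 6.0), (3.0, 1.0, 3.0), (4.0, 0.5, 2.0)]

# Candidate nonlinear terms over the numerical features (illustrative choice).
candidates = {
    "x + y": lambda x, y: x + y,
    "x * y": lambda x, y: x * y,
    "x**2 + y**2": lambda x, y: x**2 + y**2,
    "sqrt(x) + y": lambda x, y: math.sqrt(x) + y,
}


def upper_bound_conjectures(rows, candidates, tol=1e-9):
    """Return names of terms t such that target <= t(x, y) on every row
    and the bound is attained (within tol) on at least one row."""
    kept = []
    for name, term in candidates.items():
        # Gap between the candidate bound and the target on each row.
        gaps = [term(x, y) - target for x, y, target in rows]
        holds = all(g >= -tol for g in gaps)      # bound is never violated
        tight = any(abs(g) <= tol for g in gaps)  # bound is attained somewhere
        if holds and tight:
            kept.append(name)
    return kept


conjectures = upper_bound_conjectures(rows, candidates)
print(conjectures)
```

On this toy data only the term that generated the target survives both checks. `x**2 + y**2` is a valid upper bound but is never attained, so the tightness filter discards it, while `x + y` and `sqrt(x) + y` are each violated on at least one row.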