Probabilistic Grammars for Equation Discovery

Title

Probabilistic Grammars for Equation Discovery

Authors

Jure Brence, Ljupčo Todorovski, Sašo Džeroski

Abstract

Equation discovery, also known as symbolic regression, is a type of automated modeling that discovers scientific laws, expressed in the form of equations, from observed data and expert knowledge. Deterministic grammars, such as context-free grammars, have been used to limit the search spaces in equation discovery by providing hard constraints that specify which equations to consider and which not. In this paper, we propose the use of probabilistic context-free grammars in equation discovery. Such grammars encode soft constraints, specifying a prior probability distribution on the space of possible equations. We show that probabilistic grammars can be used to elegantly and flexibly formulate the parsimony principle, which favors simpler equations, through probabilities attached to the rules in the grammars. We demonstrate that the use of probabilistic, rather than deterministic, grammars in the context of a Monte-Carlo algorithm for grammar-based equation discovery leads to more efficient equation discovery. Finally, by specifying prior probability distributions over equation spaces, the foundations are laid for Bayesian approaches to equation discovery.
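
As an illustration of the idea in the abstract, the following is a minimal sketch of how a probabilistic context-free grammar can act as a prior over expressions. The toy grammar, its rule probabilities, and the function name `sample` are all assumptions made for this example; they are not taken from the paper. Each recursive rule carries a probability below 1, so longer expressions are drawn less often, which is one simple way to express a preference for parsimony.

```python
import random

# Hypothetical toy grammar (not from the paper). Each nonterminal maps to a
# list of (probability, right-hand side) pairs; probabilities per nonterminal
# sum to 1. Symbols that are not keys of GRAMMAR are terminals.
GRAMMAR = {
    "E": [(0.4, ["E", "+", "T"]), (0.6, ["T"])],
    "T": [(0.3, ["T", "*", "V"]), (0.7, ["V"])],
    "V": [(0.5, ["x"]), (0.5, ["y"])],
}


def sample(symbol, rng):
    """Expand `symbol` at random; return (tokens, probability of the derivation)."""
    if symbol not in GRAMMAR:
        # Terminal symbol: emitted as-is with probability 1.
        return [symbol], 1.0
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    # Pick one production according to the rule probabilities.
    p, rhs = rng.choices(rules, weights=weights, k=1)[0]
    tokens, prob = [], p
    for child in rhs:
        child_tokens, child_prob = sample(child, rng)
        tokens.extend(child_tokens)
        # The derivation probability is the product of all rule probabilities used.
        prob *= child_prob
    return tokens, prob


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        tokens, prob = sample("E", rng)
        print(" ".join(tokens), prob)
```

In this sketch, the shortest expressions (`x` or `y`) each have derivation probability 0.6 · 0.7 · 0.5 = 0.21, and every additional `+` or `*` multiplies the probability by further factors below 1. A Monte-Carlo search of the kind the abstract mentions could draw candidate expressions this way and then score each one against data; that scoring step is not shown here.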
