论文标题
关于复杂性,数据和模型的随机思考
Random thoughts about Complexity, Data and Models
论文作者
论文摘要
在过去的十年中,数据科学和机器学习一直在增长。我们认为,要充分利用这一令人兴奋的领域,我们应该抵制假设可以将预测降低到蛮力数据分析的诱惑。这归功于以下事实:如下所述,建模需要掌握选择相关变量的艺术。更具体地说,我们通过重点关注算法复杂性所起的作用来研究“数据和模型”之间的微妙关系,这有助于使数学上严格的长期观念,即了解经验现象是描述在比数据本身更简单的术语中生成数据的规则。 评估算法复杂性与算法学习之间关系的关键问题是与急需的澄清有关的可压缩性,确定性和可预测性的概念。为此,我们将说明混乱系统的进化定律是压缩的,但是它的通用初始条件不是,这使得时间序列通常是由混沌系统产生的,通常不可压缩。因此,了解控制经验现象的规则不足以预测其结果。反过来,这意味着要理解现象的更多不仅仅是学习 - 即使是从数据中学习)。只有在我们有能力进行“良好建模”的情况下,这才能实现。 显然,算法复杂性的概念取决于图灵对计算的开创性分析。这激发了我们对基于类比的抽象建模的这个极为有说服力的例子的评论,尽管如此,它还是由经验事实大量了解的。
Data Science and Machine learning have been growing strong for the past decade. We argue that to make the most of this exciting field we should resist the temptation of assuming that forecasting can be reduced to brute-force data analytics. This owes to the fact that modelling, as we illustrate below, requires mastering the art of selecting relevant variables. More specifically, we investigate the subtle relation between "data and models" by focussing on the role played by algorithmic complexity, which contributed to making mathematically rigorous the long-standing idea that to understand empirical phenomena is to describe the rules which generate the data in terms which are "simpler" than the data itself. A key issue for the appraisal of the relation between algorithmic complexity and algorithmic learning is to do with a much needed clarification on the related but distinct concepts of compressibility, determinism and predictability. To this end we will illustrate that the evolution law of a chaotic system is compressibile, but a generic initial condition for it is not, making the time series generated by chaotic systems incompressible in general. Hence knowledge of the rules which govern an empirical phenomenon are not sufficient for predicting its outcomes. In turn this implies that there is more to understanding phenomena than learning -- even from data alone -- such rules. This can be achieved only in those cases when we are capable of "good modelling". Clearly, the very idea of algorithmic complexity rests on Turing's seminal analysis of computation. This motivates our remarks on this extremely telling example of analogy-based abstract modelling which is nonetheless heavily informed by empirical facts.