Paper title
Autofocused oracles for model-based design
Paper authors
Abstract
Data-driven design is making headway into a number of application areas, including protein, small-molecule, and materials engineering. The design goal is to construct an object with desired properties, such as a protein that binds to a therapeutic target, or a superconducting material with a higher critical temperature than previously observed. To that end, costly experimental measurements are being replaced with calls to high-capacity regression models trained on labeled data, which can be leveraged in an in silico search for design candidates. However, the design goal necessitates moving into regions of the design space beyond where such models were trained. Therefore, one can ask: should the regression model be altered as the design algorithm explores the design space, in the absence of new data? Herein, we answer this question in the affirmative. In particular, we (i) formalize the data-driven design problem as a non-zero-sum game, (ii) develop a principled strategy, which we refer to as autofocusing, for retraining the regression model as the design algorithm proceeds, and (iii) demonstrate the promise of autofocusing empirically.
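The abstract does not say how the regression model is retrained. As a rough illustration of the question it poses, the sketch below assumes one familiar way to refocus a model without new labels: reweighting the existing training points by the density ratio between the current search distribution and the training distribution, then refitting. This is an assumption made for illustration and should not be read as the authors' procedure. The one-dimensional design space, the quadratic `true_property`, the Gaussian `train_dist` and `search_dist`, and all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, train, search):
    """Density ratio search(x) / train(x), rescaled to average 1.

    `train` and `search` are (mean, std) pairs (assumed Gaussian in this toy).
    """
    raw = [gaussian_pdf(x, *search) / gaussian_pdf(x, *train) for x in xs]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def fit_weighted_line(xs, ys, ws):
    """Closed-form weighted least squares for y ~ intercept + slope * x."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(fit, x):
    """Evaluate a fitted (intercept, slope) pair at x."""
    intercept, slope = fit
    return intercept + slope * x


def true_property(x):
    """Hypothetical ground-truth property that the line cannot fit globally."""
    return x * x


rng = random.Random(0)
train_dist = (0.0, 1.0)   # where the labeled data came from (assumed)
search_dist = (1.5, 0.5)  # where the design algorithm currently looks (assumed)

# Fixed labeled data set: no new measurements are taken after this point.
xs = [rng.gauss(*train_dist) for _ in range(2000)]
ys = [true_property(x) + rng.gauss(0.0, 0.1) for x in xs]

# Model trained once on the data versus model refit with focus on the search region.
plain_fit = fit_weighted_line(xs, ys, [1.0] * len(xs))
focused_fit = fit_weighted_line(
    xs, ys, importance_weights(xs, train_dist, search_dist)
)

target = search_dist[0]
plain_error = abs(predict(plain_fit, target) - true_property(target))
focused_error = abs(predict(focused_fit, target) - true_property(target))
print(f"plain error at x={target}: {plain_error:.3f}")
print(f"focused error at x={target}: {focused_error:.3f}")
```

In this toy, both fits use exactly the same labeled points. Only the weighting changes, so the refit model spends its limited capacity on the region the search is currently exploring. A narrower search distribution than the training distribution keeps the weights bounded here. In general, density-ratio weights can have high variance, which any practical scheme would need to control.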