Machine learning on DNA-encoded libraries: A new paradigm for hit-finding

Paper title

Machine learning on DNA-encoded libraries: A new paradigm for hit-finding

Authors

McCloskey, Kevin; Sigel, Eric A.; Kearnes, Steven; Xue, Ling; Tian, Xia; Moccia, Dennis; Gikunju, Diana; Bazzaz, Sana; Chan, Betty; Clark, Matthew A.; Cuozzo, John W.; Guié, Marie-Aude; Guilinger, John P.; Huguet, Christelle; Hupp, Christopher D.; Keefe, Anthony D.; Mulhern, Christopher J.; Zhang, Ying; Riley, Patrick

Abstract

DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value through screening of libraries with up to billions of unique small molecules. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from a large commercial collection and a virtual library of easily synthesizable compounds. We train models using only DEL selection data and apply automated or automatable filters, with chemist review restricted to the removal of molecules with potential for instability or reactivity. We validate this approach with a large prospective study (nearly 2000 compounds tested) across three diverse protein targets: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). The approach is effective, with an overall hit rate of ~30% at 30 µM and discovery of potent compounds (IC50 < 10 nM) for every target. The model makes useful predictions even for molecules dissimilar to the original DEL, and the compounds identified are diverse, predominantly drug-like, and different from known ligands. Collectively, the quality and quantity of DEL selection data; the power of modern machine learning methods; and access to large, inexpensive, commercially available libraries create a powerful new approach for hit finding.
