Paper Title
SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials
Paper Authors
Paper Abstract
Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high-quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready-to-use potential functions for molecular simulations.
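The abstract's references to energies, forces, and chemical accuracy can be illustrated with a small, self-contained sketch. This is not the authors' training code. The Morse-like toy energy, the scalar 1D forces, the `force_weight` parameter, and all helper names are hypothetical, chosen only to show three things: (a) forces are the negative gradient of the energy, (b) energy and force errors are commonly combined into a single training loss, and (c) a "chemical accuracy" check is often taken to mean an energy MAE below 1 kcal/mol, here applied to energies expressed in hartree.

```python
"""Toy sketch: energy and force targets for an ML potential.

Assumptions (not taken from the SPICE paper): a hypothetical 1D Morse
energy stands in for a DFT reference, forces are scalars rather than
per-atom 3D vectors, and "chemical accuracy" means MAE < 1 kcal/mol.
"""
import math

HARTREE_TO_KCAL_PER_MOL = 627.509  # standard unit conversion
CHEMICAL_ACCURACY_KCAL_PER_MOL = 1.0  # conventional threshold


def toy_energy(r, d=0.17, a=1.0, r0=1.4):
    """Hypothetical Morse energy (hartree) of a diatomic at distance r (bohr)."""
    return d * (1.0 - math.exp(-a * (r - r0))) ** 2


def force_from_energy(energy, r, h=1e-5):
    """Force = -dE/dr, estimated here by a central finite difference."""
    return -(energy(r + h) - energy(r - h)) / (2.0 * h)


def energy_force_loss(e_pred, e_ref, f_pred, f_ref, force_weight=1.0):
    """Energy MSE plus weighted force MSE; force_weight is a hypothetical knob."""
    e_mse = sum((p - q) ** 2 for p, q in zip(e_pred, e_ref)) / len(e_ref)
    f_mse = sum((p - q) ** 2 for p, q in zip(f_pred, f_ref)) / len(f_ref)
    return e_mse + force_weight * f_mse


def within_chemical_accuracy(e_pred, e_ref):
    """True if the mean absolute energy error (hartree) is below 1 kcal/mol."""
    mae = sum(abs(p - q) for p, q in zip(e_pred, e_ref)) / len(e_ref)
    return mae * HARTREE_TO_KCAL_PER_MOL < CHEMICAL_ACCURACY_KCAL_PER_MOL


if __name__ == "__main__":
    distances = [1.2, 1.4, 1.6, 1.8]
    e_ref = [toy_energy(r) for r in distances]
    f_ref = [force_from_energy(toy_energy, r) for r in distances]
    # A stand-in "model" whose energies are off by 0.001 hartree (~0.63 kcal/mol).
    e_pred = [e + 0.001 for e in e_ref]
    print(within_chemical_accuracy(e_pred, e_ref))  # True
    print(energy_force_loss(e_pred, e_ref, f_ref, f_ref))
```

Training on forces as well as energies supplies many more labels per conformation (three force components per atom instead of one scalar energy), which is one reason a dataset that provides both is useful. How the two loss terms are weighted is a modeling choice, not something the dataset fixes.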