Paper Title

Designing Feature Vector Representations: A Case Study from Chemistry

Authors

Signe Sidwall Thygesen, Daniel Witschard, Andreas Kerren, Talha Bin Masood, Ingrid Hotz

Abstract

We present a case study investigating feature descriptors for the analysis of multivariate ensemble data from chemistry. The data for each ensemble member consist of three parts: its design parameters, field data resulting from numerical simulations, and physical properties of the molecules. We focus on feature-based methods because they have the potential to reduce data complexity and facilitate comparison and clustering. However, there are many ways to design a feature vector representation, and no option is obviously preferable. To better understand the different representations, we analyze their similarities and differences, focusing on three characteristics derived from each representation: the distribution of pairwise distances, the clustering tendency, and the rank order of the pairwise distances. The results partially confirm the expected behavior, but also yield some surprising observations that can inform the future development of feature representations in the chemical domain.
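As a rough illustration of one of the three characteristics named in the abstract, the rank order of pairwise distances, the sketch below compares two feature representations of the same ensemble members. It is not taken from the paper: the Euclidean metric, the use of Spearman rank correlation as the agreement measure, and all function names and toy vectors (`rank_agreement`, `rep_a`, `rep_b`) are assumptions made purely for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_agreement(rep_a, rep_b):
    """Spearman correlation between the pairwise-distance rankings of two
    representations of the same members (hypothetical comparison measure)."""
    return pearson(ranks(pairwise_distances(rep_a)),
                   ranks(pairwise_distances(rep_b)))


# Toy data (made up): four ensemble members in two hypothetical representations.
rep_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
rep_b = [(2 * x, 2 * y) for x, y in rep_a]  # uniformly scaled copy of rep_a
print(rank_agreement(rep_a, rep_b))
```

Because `rep_b` is a uniformly scaled copy of `rep_a`, the ordering of pairwise distances is unchanged and the agreement is close to 1. Two representations that order the member pairs differently would score lower, even if their raw distance distributions looked similar.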
