Paper Title
Data Separability for Neural Network Classifiers and the Development of a Separability Index
Authors
Abstract
In machine learning, the performance of a classifier depends on both the classifier model and the dataset. For a specific neural network classifier, the training process varies with the training set used: some training data make the training accuracy converge quickly to high values, while other data lead to slow convergence to lower accuracy. To quantify this phenomenon, we created the Distance-based Separability Index (DSI), which is independent of the classifier model, to measure the separability of datasets. In this paper, we take the case in which data from different classes are mixed together in the same distribution to be the most difficult for classifiers to separate, and we show that the DSI can indicate whether data belonging to different classes have similar distributions. When we compare our proposed approach with several existing separability/complexity measures on synthetic and real datasets, the results show that the DSI is an effective separability measure. We also discuss possible applications of the DSI in the fields of data science, machine learning, and deep learning.
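The abstract does not give the formula for the DSI, so the following is only a minimal sketch of how a distance-based, classifier-independent separability score could be built. It assumes one plausible construction: for each class, compare the distribution of within-class distances with the distribution of distances to points of other classes, using the two-sample Kolmogorov-Smirnov statistic, then average over classes. The function names (`pairwise_distances`, `ks_statistic`, `separability_score`, `gaussian_blob`), the Euclidean metric, and the synthetic data are all assumptions made for illustration and may differ from the authors' definition.

```python
import math
import random
from bisect import bisect_right
from itertools import combinations


def pairwise_distances(a, b=None):
    """Euclidean distances within `a` (when b is None) or between `a` and `b`."""
    if b is None:
        return [math.dist(p, q) for p, q in combinations(a, 2)]
    return [math.dist(p, q) for p in a for q in b]


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in xs + ys:
        cdf_x = bisect_right(xs, v) / len(xs)
        cdf_y = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap


def separability_score(points, labels):
    """Assumed score: mean over classes of the KS gap between
    within-class distances and between-class distances."""
    scores = []
    for c in sorted(set(labels)):
        inside = [p for p, l in zip(points, labels) if l == c]
        outside = [p for p, l in zip(points, labels) if l != c]
        intra = pairwise_distances(inside)
        between = pairwise_distances(inside, outside)
        scores.append(ks_statistic(intra, between))
    return sum(scores) / len(scores)


def gaussian_blob(center, n, rng):
    """Hypothetical helper: n 2-D points drawn around `center` with unit variance."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


rng = random.Random(0)
labels = [0] * 60 + [1] * 60

# Two classes drawn from the same distribution (the hard case in the abstract).
mixed = gaussian_blob((0, 0), 60, rng) + gaussian_blob((0, 0), 60, rng)
# Two classes drawn from well-separated distributions.
apart = gaussian_blob((0, 0), 60, rng) + gaussian_blob((8, 0), 60, rng)

mixed_score = separability_score(mixed, labels)
apart_score = separability_score(apart, labels)
print(f"same distribution:      {mixed_score:.3f}")
print(f"separated distributions: {apart_score:.3f}")
```

Under these assumptions the score lies between 0 and 1. When both classes come from the same distribution, within-class and between-class distances look alike and the score stays low; when the classes are far apart, the two distance distributions barely overlap and the score approaches 1. No classifier is trained at any point, which is the sense in which such a measure is independent of the classifier model.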