Paper title
Cluster analysis and outlier detection with missing data
Paper authors
Paper abstract
A mixture of multivariate contaminated normal (MCN) distributions is a useful model-based clustering technique for data sets with mild outliers. However, this model can only be fitted to complete data sets, which are often unavailable in real applications. In this paper, we develop a framework for fitting a mixture of MCN distributions to incomplete data sets, i.e., data sets with some values missing at random. We employ the expectation-conditional maximization algorithm for parameter estimation. Using a simulation study, we compare the results of our model with those of a mixture of Student's t distributions for incomplete data.
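
To illustrate how an MCN component can be evaluated on an incomplete observation, the following is a minimal sketch and not the paper's estimation procedure. It assumes the commonly used form of the contaminated normal density, alpha * N(mu, Sigma) + (1 - alpha) * N(mu, eta * Sigma) with eta > 1, and it assumes missing entries are coded as NaN. The function names and the parameter names `alpha` and `eta` are illustrative choices that do not appear in the abstract. Because the marginal of a multivariate normal over a subset of coordinates is again normal, the density of the observed coordinates uses only the matching sub-vector of the mean and sub-matrix of the covariance.

```python
import numpy as np


def normal_pdf(x, mean, cov):
    """Multivariate normal density evaluated at a single point x."""
    d = x.shape[0]
    diff = x - mean
    # Solve cov @ sol = diff instead of inverting cov explicitly.
    sol = np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    log_pdf = -0.5 * (d * np.log(2.0 * np.pi) + logdet + diff @ sol)
    return float(np.exp(log_pdf))


def mcn_pdf_observed(x, mean, cov, alpha, eta):
    """Contaminated normal density of the observed coordinates of x.

    Assumptions (not taken from the abstract): NaN marks a missing value,
    alpha is the proportion of typical points, and eta > 1 inflates the
    covariance of the outlier ("bad") component.
    """
    obs = ~np.isnan(x)
    if not obs.any():
        return 1.0  # nothing observed, so the marginal density is 1
    x_o = x[obs]
    mean_o = mean[obs]
    cov_oo = cov[np.ix_(obs, obs)]
    good = normal_pdf(x_o, mean_o, cov_oo)
    bad = normal_pdf(x_o, mean_o, eta * cov_oo)
    return alpha * good + (1.0 - alpha) * bad


if __name__ == "__main__":
    mean = np.zeros(2)
    cov = np.eye(2)
    x_incomplete = np.array([0.0, np.nan])  # second coordinate is missing
    print(mcn_pdf_observed(x_incomplete, mean, cov, alpha=0.9, eta=4.0))
```

In a mixture, each component would contribute its mixing proportion times this observed-data density. A full fitting procedure would additionally need the conditional expectations of the missing values, which this sketch does not cover.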