Paper Title
Hyperparameter Sensitivity in Deep Outlier Detection: Analysis and a Scalable Hyper-Ensemble Solution
Paper Authors
Abstract
Outlier detection (OD) literature exhibits numerous algorithms as it applies to diverse domains. However, given a new detection task, it is unclear how to choose which algorithm to use, or how to set its hyperparameter(s) (HPs) in unsupervised settings. HP tuning is an ever-growing problem with the arrival of many new detectors based on deep learning, which usually come with a long list of HPs. Surprisingly, the issue of model selection in the outlier mining literature has been "the elephant in the room"; it is a significant factor in unlocking the utmost potential of deep methods, yet little has been said or done to systematically tackle the issue. In the first part of this paper, we conduct the first large-scale analysis of the HP sensitivity of deep OD methods, and through more than 35,000 trained models, quantitatively demonstrate that model selection is inevitable. Next, we design an HP-robust and scalable deep hyper-ensemble model called ROBOD that assembles models with varying HP configurations, bypassing the choice paralysis. Importantly, we introduce novel strategies to speed up ensemble training, such as parameter sharing, batch/simultaneous training, and data subsampling, that allow us to train fewer models with fewer parameters. Extensive experiments on both image and tabular datasets show that ROBOD achieves and retains robust, state-of-the-art detection performance as compared to its modern counterparts, while taking only $2$-$10$\% of the time required by the naive hyper-ensemble with independent training.
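To illustrate the hyper-ensemble idea the abstract describes (train one detector per HP configuration, then aggregate the outlier scores), below is a minimal sketch of the naive independent-training baseline. It is an assumption-laden toy: rank-`k` PCA reconstruction error stands in for a deep autoencoder detector, the HP grid varies only the bottleneck size `k`, and scores are z-normalized per configuration before averaging. None of this code comes from the ROBOD paper; it only sketches the baseline that ROBOD's parameter sharing, batch training, and subsampling are designed to speed up.

```python
import numpy as np

def pca_outlier_scores(X, n_components):
    """Reconstruction-error outlier scores under a rank-k PCA model
    (a linear stand-in for a deep autoencoder detector)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components]                      # top-k principal directions
    recon = Xc @ V.T @ V                       # project and reconstruct
    return np.sum((Xc - recon) ** 2, axis=1)   # per-point squared error

def hyper_ensemble_scores(X, hp_grid=(1, 2, 4, 8)):
    """Naive hyper-ensemble: one independently trained detector per HP
    configuration; z-normalize each score vector, then average."""
    all_scores = []
    for k in hp_grid:
        s = pca_outlier_scores(X, k)
        s = (s - s.mean()) / (s.std() + 1e-12)  # make configs comparable
        all_scores.append(s)
    return np.mean(all_scores, axis=0)

# Synthetic data: inliers near a 2-D subspace, 5 planted outliers at the end.
rng = np.random.default_rng(0)
Z = rng.normal(size=(195, 2))
W = rng.normal(size=(2, 10))
inliers = Z @ W + 0.1 * rng.normal(size=(195, 10))
outliers = 3.0 * rng.normal(size=(5, 10))
X = np.vstack([inliers, outliers])

scores = hyper_ensemble_scores(X)
print("mean outlier score:", scores[-5:].mean())
print("mean inlier score: ", scores[:-5].mean())
```

Each detector here trains independently on the full data, which is exactly the cost the paper reports ROBOD reducing to a small fraction via shared parameters and joint training.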