Paper Title

Correlated Feature Selection with Extended Exclusive Group Lasso

Authors

Yuxin Sun, Benny Chain, Samuel Kaski, John Shawe-Taylor

Abstract

In many high-dimensional classification or regression problems set in a biological context, the complete identification of the set of informative features is often as important as predictive accuracy, since it can provide mechanistic insight and conceptual understanding. Lasso and related algorithms have been widely used because their sparse solutions naturally identify a set of informative features. However, Lasso performs erratically when features are correlated. This limits the use of such algorithms in biological problems, where features such as genes often work together in pathways, leading to sets of highly correlated features. In this paper, we examine the performance of a Lasso derivative, the exclusive group Lasso, in this setting. We propose fast algorithms to solve the exclusive group Lasso, and introduce a solution for the case where the underlying group structure is unknown. The solution combines stability selection with random group allocation and the introduction of artificial features. Experiments with both synthetic and real-world data highlight the advantages of the proposed methodology over Lasso in the comprehensive selection of informative features.
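The abstract does not state the penalty itself. As a rough illustration only, the sketch below assumes the form commonly given for the exclusive group Lasso in the literature: the sum, over groups, of the squared L1 norm of each group's coefficients. It also shows one simple way to allocate features to groups at random when the true grouping is unknown. The function names, the near-equal group sizes, and the seeding are hypothetical choices for this example and are not taken from the paper.

```python
import random


def exclusive_group_lasso_penalty(weights, groups):
    """Sum over groups of (L1 norm of the group's coefficients) squared.

    Assumed penalty form; the abstract does not spell it out.
    """
    group_l1 = {}
    for w, g in zip(weights, groups):
        group_l1[g] = group_l1.get(g, 0.0) + abs(w)
    return sum(total ** 2 for total in group_l1.values())


def random_group_allocation(n_features, n_groups, seed=None):
    """Assign each feature a group label at random, with near-equal group sizes.

    Hypothetical helper: the paper's actual allocation scheme may differ.
    """
    labels = [i % n_groups for i in range(n_features)]
    random.Random(seed).shuffle(labels)
    return labels


# Two groups of two features each: (|1| + |-2|)^2 + (|0| + |3|)^2 = 9 + 9
weights = [1.0, -2.0, 0.0, 3.0]
groups = [0, 0, 1, 1]
print(exclusive_group_lasso_penalty(weights, groups))  # 18.0

# A fresh random grouping, as might be drawn once per stability-selection run
print(random_group_allocation(10, 3, seed=0))
```

Because the L1 norm is squared within each group, the penalty encourages sparsity inside a group rather than between groups. If correlated features share a group, they compete with each other while every group tends to keep at least one nonzero coefficient.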
