Paper title
Beta-CoRM: A Bayesian Approach for $n$-gram Profiles Analysis
Paper authors
Paper abstract
$n$-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. A novel class of Bayesian generative models for $n$-gram profiles used as binary attributes has been designed to address this. The flexibility of the proposed modelling allows a straightforward approach to feature selection within the generative model. Furthermore, a slice sampling algorithm is derived for a fast inferential procedure, which is applied to synthetic and real data scenarios and shows that feature selection can improve classification accuracy.
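To make the phrase "$n$-gram profiles used as binary attributes" concrete, the following is a minimal sketch of one way such profiles could be built: each sequence is mapped to a 0/1 vector recording which $n$-grams occur in it at least once. The function names (`ngrams`, `binary_profiles`) and the toy sequences are illustrative assumptions, not taken from the paper, and the paper's own preprocessing may differ.

```python
from typing import List, Sequence, Set, Tuple


def ngrams(seq: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of distinct n-grams (as tuples) occurring in seq."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def binary_profiles(
    seqs: List[Sequence[str]], n: int
) -> Tuple[List[Tuple[str, ...]], List[List[int]]]:
    """Build a presence/absence matrix over all n-grams observed in seqs.

    Hypothetical helper: rows are sequences, columns are n-grams in
    sorted order, and an entry is 1 if the n-gram occurs in the sequence.
    """
    per_seq = [ngrams(s, n) for s in seqs]
    vocab = sorted(set().union(*per_seq))
    matrix = [[1 if g in grams else 0 for g in vocab] for grams in per_seq]
    return vocab, matrix


# Toy sequences of different lengths (made up for illustration).
sequences = [list("abcab"), list("bca"), list("abababab")]
vocab, X = binary_profiles(sequences, n=2)
```

Sequences of different lengths end up as rows of equal length, which is what allows them to be compared for clustering or classification.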