Paper title

Statistical hypothesis testing versus machine-learning binary classification: distinctions and guidelines

Authors

Li, Jingyi Jessica; Tong, Xin

Abstract

Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example.
