Paper Title
Data Mining Techniques in Predicting Breast Cancer
Authors
Abstract
Background and Objective: Breast cancer, which accounts for 23% of all cancers, threatens communities in developing countries because of poor awareness and treatment. Early diagnosis greatly improves the chances of treating the disease. The present study was conducted to improve the prediction process and to extract the main factors affecting breast cancer. Materials and Methods: Data on eight attributes were collected for 130 Libyan women diagnosed with the disease at different clinical stages. Six data mining algorithms were applied to predict the disease based on clinical stage. All the algorithms achieved high accuracy, but the decision tree provided the highest; the decision tree diagram was then used to build a rule from each leaf node. Variable ranking was applied to extract the significant variables and to support the final rules for predicting the disease. Results: All applied algorithms achieved high predictive performance, with differing accuracies. Rules 1, 3, 4, 5 and 9 each yielded a pure subset and were therefore confirmed as significant rules. Only five input variables contributed to building the rules, and not all of them had a significant impact. Conclusion: Tumor size plays a vital role in constructing all the rules and has a significant impact. The variables inheritance, breast side and menopausal status had an insignificant impact in this analysis, but they may yield notable findings under a different data analysis strategy.
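The two ideas the abstract relies on, ranking input variables and checking whether a rule's leaf holds a pure subset, can be illustrated with a minimal sketch. The abstract does not name the ranking criterion or the tree algorithm, so information gain is an assumption here. The records, attribute names (`tumor_size`, `menopause`, `stage`) and values are made up for illustration and are not the study's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction from splitting rows on one categorical attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def is_pure(labels):
    """A leaf is pure when every record in it has the same class."""
    return len(set(labels)) == 1


# Hypothetical toy records, NOT taken from the study.
ROWS = [
    {"tumor_size": "small", "menopause": "pre", "stage": "early"},
    {"tumor_size": "small", "menopause": "post", "stage": "early"},
    {"tumor_size": "large", "menopause": "pre", "stage": "late"},
    {"tumor_size": "large", "menopause": "post", "stage": "late"},
]

# Rank the candidate variables by information gain (assumed criterion).
gains = {a: information_gain(ROWS, a, "stage") for a in ("tumor_size", "menopause")}
ranking = sorted(gains, key=gains.get, reverse=True)

# The leaf reached by the rule "tumor_size == small" in this toy data.
small_leaf = [r["stage"] for r in ROWS if r["tumor_size"] == "small"]

if __name__ == "__main__":
    print(ranking, is_pure(small_leaf))
```

In this toy data, splitting on `tumor_size` separates the classes completely while `menopause` carries no information, which mirrors the kind of contrast the conclusion describes. A real analysis would use the study's eight attributes and its actual tree.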