Spam four ways: Making sense of text data

Paper title

Spam four ways: Making sense of text data

Authors

Horton, Nicholas J., Chao, Jie, Finzer, William, Palmer, Phebe

Abstract

The world is full of text data, yet text analytics has not traditionally played a large part in statistics education. We consider four different ways to provide students with opportunities to explore whether email messages are unwanted correspondence (spam). Text from subject lines is used to identify features that can be used in classification. The approaches include use of a Model Eliciting Activity, exploration with CODAP, modeling with a specially designed Shiny app, and coding more sophisticated analyses using R. The approaches vary in their use of technology and code, but all share the common goal of using data to make better decisions and to assess the accuracy of those decisions.
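
To make the idea of classifying by subject-line features concrete, here is a minimal sketch in Python. It is not taken from the paper, which uses CODAP, a Shiny app, and R. The subject lines, the keyword rule, and the function names are all hypothetical and were invented for illustration. The sketch flags a message as spam when its subject line contains a chosen word, then counts correct and incorrect decisions to assess accuracy.

```python
import re

# Hypothetical toy data invented for this sketch: (subject line, is_spam).
SUBJECTS = [
    ("WIN a FREE cruise now!!!", True),
    ("Meeting agenda for Tuesday", False),
    ("Free money, act now", True),
    ("Your free trial of the journal", False),
    ("Lunch on Friday?", False),
    ("URGENT: claim your prize!", True),
]


def words(subject):
    """Split a subject line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z']+", subject.lower())


def predict_spam(subject, keyword="free"):
    """Assumed one-feature rule: call it spam if the keyword appears."""
    return keyword in words(subject)


def confusion_counts(data, keyword="free"):
    """Tally true/false positives and negatives for the keyword rule."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for subject, is_spam in data:
        predicted = predict_spam(subject, keyword)
        if predicted and is_spam:
            counts["tp"] += 1
        elif predicted and not is_spam:
            counts["fp"] += 1
        elif not predicted and is_spam:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def accuracy(counts):
    """Share of all messages that the rule classified correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


if __name__ == "__main__":
    result = confusion_counts(SUBJECTS)
    print(result, round(accuracy(result), 3))
```

On this toy data the single-word rule produces both a false positive (a legitimate message that mentions a free trial) and a false negative (spam that never uses the word), which is the kind of trade-off students would weigh when choosing features.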
