Paper Title
Extracting Topics from Open Educational Resources
Authors
Abstract
In recent years, Open Educational Resources (OERs) have been identified as critical to meeting the growing global demand for education. Because they are available in a wide range of contexts, OERs have high potential to serve learners in many different circumstances. However, the generally low quality of OER metadata is one of the main reasons for the lack of personalised services such as search and recommendation, and as a result the applicability of OERs remains limited. At the same time, learners need OER metadata about covered topics (subjects) in order to build effective learning pathways towards their individual learning objectives. In this paper, we therefore report on a work-in-progress project that proposes an OER topic extraction approach, based on text mining techniques, to generate high-quality OER metadata about topic distribution. The approach consists of three steps: 1) collecting 123 lectures from Coursera and Khan Academy in the area of data science related skills, 2) applying Latent Dirichlet Allocation (LDA) to the collected resources in order to extract existing topics related to these skills, and 3) defining the topic distributions covered by a particular OER. To evaluate our model, we used a dataset of educational resources from YouTube and compared our topic distribution results with target topics defined manually with the help of three experts in the area of data science. Our model extracted topics with an F1-score of 79%.
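The abstract does not say how LDA was implemented or how the F1-score was computed, so the following is only a minimal sketch of steps 2 and 3 and of the evaluation idea, not the authors' pipeline. It assumes collapsed Gibbs sampling as the inference method (one common choice for LDA), a toy tokenised corpus in place of the 123 lectures, hypothetical hyperparameters `alpha` and `beta`, and a set-based F1 between predicted and expert-defined topic labels. All function names and topic labels are illustrative.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (assumed inference method).

    docs: list of token lists. Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens of word v in topic k
    topic_total = [0] * n_topics                            # all tokens in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic distribution (step 3 in the abstract).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


def f1_score(predicted, target):
    """Set-based F1 between predicted topics and expert-defined target topics."""
    predicted, target = set(predicted), set(target)
    true_pos = len(predicted & target)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(target)
    return 2 * precision * recall / (precision + recall)


# Toy corpus standing in for lecture transcripts (illustrative only).
toy_docs = [
    "regression model loss gradient regression".split(),
    "cluster kmeans distance cluster centroid".split(),
    "gradient loss model training regression".split(),
    "centroid distance kmeans cluster".split(),
]
theta = lda_gibbs(toy_docs, n_topics=2)

# Hypothetical topic labels for one resource versus expert-defined targets.
example_f1 = f1_score(
    {"regression", "clustering", "statistics"},
    {"regression", "clustering"},
)
```

In this sketch each row of `theta` is the topic distribution of one resource. Mapping LDA topics to human-readable labels, which is needed before comparing against expert-defined targets, is left out because the abstract does not describe it.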