Topic Modeling on User Stories using Word Mover's Distance

Paper Title

Topic Modeling on User Stories using Word Mover's Distance

Authors

Kim Julian Gülle, Nicholas Ford, Patrick Ebel, Florian Brokhausen, Andreas Vogelsang

Abstract

Requirements elicitation has recently been complemented with crowd-based techniques, which continuously involve large, heterogeneous groups of users who express their feedback through a variety of media. Crowd-based elicitation has great potential for engaging with (potential) users early on, but it also produces large sets of raw, unstructured feedback. Consolidating and analyzing this feedback is a key challenge in turning it into sensible user requirements. In this paper, we focus on topic modeling as a means to identify topics within a large set of crowd-generated user stories and compare three approaches: (1) a traditional approach based on Latent Dirichlet Allocation, (2) a combination of word embeddings and principal component analysis, and (3) a combination of word embeddings and Word Mover's Distance. We evaluate the approaches on a publicly available set of 2,966 user stories written and categorized by crowd workers. We found that the combination of word embeddings and Word Mover's Distance is the most promising. Depending on the word embeddings we use in our approaches, we manage to cluster the user stories in two ways: one that is closer to the original categorization and another that allows new insights into the dataset, for example, finding potentially new categories. Unfortunately, no measure exists to rate the quality of our results objectively. Still, our findings provide a basis for future work towards analyzing crowd-sourced user stories.
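To give a rough feel for the distance used in approach (3), the following is a minimal sketch and not the authors' implementation. Exact Word Mover's Distance solves an optimal transport problem between the words of two documents; this sketch swaps in the simpler relaxed variant, a known lower bound in which every word moves all of its weight to the nearest word of the other document. The 2-D embeddings, the example stories, and the function names are all hypothetical and chosen only for illustration; a real pipeline would load pretrained word vectors.

```python
import math

# Hypothetical 2-D toy embeddings (assumption); real use would load
# pretrained word vectors instead.
EMBEDDINGS = {
    "user": (1.0, 0.0),
    "customer": (0.9, 0.1),
    "login": (0.0, 1.0),
    "password": (0.1, 0.9),
    "report": (-1.0, 0.0),
}


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_sided_cost(source, target):
    """Average distance from each source word to its nearest target word.

    Each token carries weight 1 / len(source), i.e. a normalized bag of words.
    """
    return sum(
        min(euclidean(EMBEDDINGS[s], EMBEDDINGS[t]) for t in target)
        for s in source
    ) / len(source)


def relaxed_wmd(doc_a, doc_b):
    """Relaxed WMD: the larger (tighter) of the two one-sided lower bounds."""
    return max(one_sided_cost(doc_a, doc_b), one_sided_cost(doc_b, doc_a))


# Hypothetical, already tokenized user stories with stop words removed.
story_a = ["user", "login"]
story_b = ["customer", "password"]
story_c = ["report"]

print(relaxed_wmd(story_a, story_b))  # small: the words are near-synonyms here
print(relaxed_wmd(story_a, story_c))  # larger: the words are far apart
```

Pairwise distances of this kind can be fed into any clustering method that accepts a precomputed distance matrix; which clustering method the paper uses is not stated in the abstract.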
