Paper Title
Modeling and mitigating human annotation errors to design efficient stream processing systems with human-in-the-loop machine learning
Paper Authors
Paper Abstract
High-quality human annotations are necessary for creating effective machine learning-driven stream processing systems. We study hybrid stream processing systems based on a Human-In-The-Loop Machine Learning (HITL-ML) paradigm, in which one or more human annotators and an automatic classifier (trained at least partially by the human annotators) label an incoming stream of instances. This is typical of many near-real-time social media analytics and web applications, including the annotation of social media posts by digital volunteer groups during emergencies. From a practical perspective, low-quality human annotations result in wrong labels for retraining automated classifiers and indirectly contribute to the creation of inaccurate classifiers. Considering human annotation as a psychological process allows us to address these limitations. We show that human annotation quality depends on the ordering of instances shown to annotators and can be improved by local changes in the instance sequence/order provided to the annotators, yielding a more accurate annotation of the stream. We adapt a theoretically motivated human error framework of mistakes and slips to the human annotation task to study the effect of ordering instances (i.e., an "annotation schedule"). Further, we propose an error-avoidance approach to the active learning paradigm for stream processing applications that is robust to these likely human errors (in the form of slips) when deciding a human annotation schedule. We support the human error framework using crowdsourcing experiments and evaluate the proposed algorithm against standard active learning baselines through extensive experiments on classification tasks of filtering relevant social media posts during natural disasters.
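The abstract does not spell out the loop itself; the sketch below is a hypothetical Python illustration of the kind of hybrid stream-processing loop described above, in which an automatic classifier labels confident instances, uncertain ones are deferred to a human annotator, and a `schedule` hook re-orders each batch before it is shown to the annotator. The function names (`classify`, `annotate`, `retrain`, `schedule`) and the confidence-threshold routing rule are assumptions for illustration, not the authors' method.

```python
# Minimal illustrative sketch of a hybrid HITL-ML stream loop.
# NOT the paper's algorithm: all names and the routing rule are placeholders.

from typing import Callable, Dict, Iterable, List, Tuple


def hitl_stream_loop(
    stream: Iterable[str],
    classify: Callable[[str], Tuple[str, float]],    # instance -> (label, confidence)
    annotate: Callable[[List[str]], List[str]],      # human annotator labels a batch
    retrain: Callable[[List[str], List[str]], None], # feed human labels back to the model
    schedule: Callable[[List[str]], List[str]],      # re-order the batch before display
    confidence_threshold: float = 0.9,
    batch_size: int = 8,
) -> Dict[str, str]:
    """Label an incoming stream with an automatic classifier plus a human annotator."""
    labels: Dict[str, str] = {}
    queue: List[str] = []

    def flush(batch_queue: List[str]) -> None:
        batch = schedule(batch_queue)          # the "annotation schedule" hook
        human_labels = annotate(batch)
        labels.update(dict(zip(batch, human_labels)))
        retrain(batch, human_labels)           # human labels retrain the classifier

    for instance in stream:
        label, confidence = classify(instance)
        if confidence >= confidence_threshold:
            labels[instance] = label           # trust the automatic label
        else:
            queue.append(instance)             # defer to the human annotator
        if len(queue) >= batch_size:
            flush(queue)
            queue = []

    if queue:                                  # flush any remaining instances
        flush(queue)
    return labels
```

In this sketch, the `schedule` hook is where an ordering policy such as the slip-robust, error-avoidance schedule proposed in the paper would be applied; the policy itself is left abstract here.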