Paper title
TARexp: A Python Framework for Technology-Assisted Review Experiments
Paper authors
Paper abstract
Technology-assisted review (TAR) is an important industrial application of information retrieval (IR) and machine learning (ML). While a small TAR research community exists, the complexity of TAR software and workflows is a major barrier to entry. Drawing on past open source TAR efforts, as well as design patterns from IR and ML open source software, we present an open source Python framework for conducting experiments on TAR algorithms. Key characteristics of this framework are declarative representations of workflows and experiment plans, the ability for components to play variable numbers of workflow roles, and state maintenance and restart capabilities. Users can draw on reference implementations of standard TAR algorithms while incorporating novel components to explore their research interests. The framework is available at https://github.com/eugene-yang/tarexp.