Paper Title
Predicting Legal Proceedings Status: Approaches Based on Sequential Text Data
Paper Authors
Paper Abstract
The objective of this paper is to develop predictive models that classify Brazilian legal proceedings into three possible status classes: (i) archived proceedings, (ii) active proceedings, and (iii) suspended proceedings. Solving this problem is intended to help public and private institutions manage large portfolios of legal proceedings, providing gains in scale and efficiency. In this paper, a legal proceeding is made up of a sequence of short texts called "motions." We combined several natural language processing (NLP) and machine learning techniques to solve the problem. Although NLP for Portuguese can be challenging due to a lack of resources, our approaches performed remarkably well in the classification task, achieving a maximum accuracy of .93 and top average F1 scores of .89 (macro) and .93 (weighted). Furthermore, we were able to extract and interpret the patterns learned by one of our models, and to quantify how those patterns relate to the classification task. The interpretability step is important in legal applications of machine learning and gives us an exciting insight into how black-box models make decisions.
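The abstract reports two F1 averages, macro and weighted. The sketch below illustrates how the two differ on a three-class status task. It is not the paper's evaluation code: the helper names and the toy labels are invented for illustration, and the numbers it produces have nothing to do with the reported .89 and .93. Macro F1 gives every class equal weight, while weighted F1 weights each class by its number of true examples, so a poorly predicted rare class lowers the macro score more than the weighted one.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by true-class counts."""
    scores = f1_per_class(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Toy, made-up labels (NOT data from the paper): an imbalanced three-class
# problem in which the rare "suspended" class is never predicted correctly.
y_true = ["archived"] * 6 + ["active"] * 3 + ["suspended"] * 1
y_pred = ["archived"] * 6 + ["active", "active", "archived"] + ["active"]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

On imbalanced data such as this toy example, the weighted average comes out higher than the macro average because the majority class dominates it.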