Paper title
Legal Case Document Similarity: You Need Both Network and Text
Paper authors
Paper abstract
Estimating the similarity between two legal case documents is an important and challenging problem with various downstream applications, such as prior-case retrieval and citation recommendation. There are two broad approaches to the task: citation network-based and text-based. Prior citation network-based approaches (PCNet) consider citations only to prior cases (also called precedents). This misses important signals inherent in Statutes (the written laws of a jurisdiction). In this work, we propose Hier-SPCNet, which augments PCNet with a heterogeneous network of Statutes. We incorporate domain knowledge for legal document similarity into Hier-SPCNet, thereby obtaining state-of-the-art results for network-based legal document similarity. Both textual and network similarity provide important signals for legal case similarity, but until now only trivial attempts have been made to unify the two signals. In this work, we apply several methods for combining textual and network information to estimate legal case similarity. We perform extensive experiments over legal case documents from the Indian judiciary, where the gold-standard similarity between document pairs is judged by law experts from two reputed law institutes in India. Our experiments establish that our proposed network-based methods significantly improve the correlation with domain experts' opinion when compared to the existing methods for network-based legal document similarity. Our best-performing combination method (which combines network-based and text-based similarity) improves the correlation with domain experts' opinion by 11.8% over the best text-based method and by 20.6% over the best network-based method. We also establish that our best-performing method can be used to recommend or retrieve citable and similar cases for a source (query) case, and that these recommendations are well appreciated by legal experts.