Paper title
A text analysis for Operational Risk loss descriptions
Paper authors
Paper abstract
Financial institutions manage operational risk (OpRisk) by carrying out the activities required by regulation, such as collecting loss data, calculating capital requirements, and reporting. To this end, the loss amount, date, organizational unit involved, event type, and a description are recorded in the OpRisk database for each OpRisk event. In recent years, operational risk functions have been required to go beyond their regulatory tasks and manage operational risk proactively, preventing or mitigating its impact. Because OpRisk databases also contain event descriptions, one area of opportunity is to extract information from these texts. The present work introduces, for the first time, a structured workflow for applying text analysis techniques (one of the main Natural Language Processing tasks) to OpRisk event descriptions in order to identify managerial clusters, more granular than the regulatory categories, that represent the root causes of the underlying risks. We complement and enrich the established framework of statistical methods based on quantitative data. Specifically, after delicate tasks such as data cleaning, text vectorization, and semantic adjustment, we apply dimensionality reduction methods and several clustering models, comparing the algorithms' performance and weaknesses. Our results improve retrospective knowledge of loss events and can help mitigate future risks.
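The workflow named in the abstract (cleaning, vectorization, clustering) can be illustrated with a minimal, standard-library-only sketch. The abstract does not say which vectorizer or clustering algorithms the authors used, so everything here is an assumption made for illustration: TF-IDF weighting, a cosine-based (spherical) k-means, the stop-word list, and the four invented event descriptions. The semantic adjustment and dimensionality reduction steps are omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller, domain-tuned one.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "by", "at", "and", "for", "after"}


def clean(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def normalise(vec):
    """Scale a sparse vector (dict of term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def tfidf(docs):
    """Return one unit-length sparse TF-IDF vector per tokenised document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency per term
    return [
        normalise({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                   for t, c in Counter(doc).items()})
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, k, iterations=20):
    """Spherical k-means with deterministic farthest-point initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector least similar to its closest existing centroid.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            total = {}
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            if total:  # keep the old centroid if a cluster ends up empty
                centroids[j] = normalise(total)
    return labels


# Invented event descriptions, for illustration only (not taken from any OpRisk database).
descriptions = [
    "Customer card cloned at ATM, fraudulent withdrawals refunded",
    "Fraudulent card withdrawals after ATM skimming device found",
    "Payment system outage delayed settlement of transactions",
    "System outage in payment platform caused delayed transactions",
]

labels = kmeans(tfidf([clean(d) for d in descriptions]), k=2)
print(labels)
```

On this toy input the two card-fraud descriptions fall into one cluster and the two system-outage descriptions into the other. On real loss descriptions, the choice of vectorizer, number of clusters, and algorithm would have to be compared empirically, which is the comparison the abstract describes.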