Metrics reloaded: Recommendations for image analysis validation

Paper title

Metrics reloaded: Recommendations for image analysis validation

Authors

Maier-Hein, Lena; Reinke, Annika; Godau, Patrick; Tizabi, Minu D.; Buettner, Florian; Christodoulou, Evangelia; Glocker, Ben; Isensee, Fabian; Kleesiek, Jens; Kozubek, Michal; Reyes, Mauricio; Riegler, Michael A.; Wiesenfarth, Manuel; Kavur, A. Emre; Sudre, Carole H.; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Rädsch, Tim; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew; Cardoso, M. Jorge; Cheplygina, Veronika; Cimini, Beth A.; Collins, Gary S.; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Haase, Robert; Hashimoto, Daniel A.; Hoffman, Michael M.; Huisman, Merel; Jannin, Pierre; Kahn, Charles E.; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kenngott, Hannes; Kofler, Florian; Kopp-Schneider, Annette; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A.; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus; Martel, Anne L.; Mattson, Peter; Meijering, Erik; Menze, Bjoern; Moons, Karel G. M.; Müller, Henning; Nichyporuk, Brennan; Nickel, Felix; Petersen, Jens; Rajpoot, Nasir; Rieke, Nicola; Saez-Rodriguez, Julio; Sánchez, Clara I.; Shetty, Shravya; van Smeden, Maarten; Summers, Ronald M.; Taha, Abdel A.; Tiulpin, Aleksei; Tsaftaris, Sotirios A.; Van Calster, Ben; Varoquaux, Gaël; Jäger, Paul F.

Abstract

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases.
