Exploratory Landscape Analysis is Strongly Sensitive to the Sampling Strategy

Paper title

Exploratory Landscape Analysis is Strongly Sensitive to the Sampling Strategy

Authors

Quentin Renau, Carola Doerr, Johann Dreo, Benjamin Doerr

Abstract

Exploratory landscape analysis (ELA) supports supervised learning approaches for automated algorithm selection and configuration by providing sets of features that quantify the most relevant characteristics of the optimization problem at hand. In black-box optimization, where an explicit problem representation is not available, the feature values need to be approximated from a small number of sample points. In practice, uniformly sampled random point sets and Latin hypercube constructions are commonly used sampling strategies. In this work, we analyze how the sampling method and the sample size influence the quality of the feature value approximations and how this quality impacts the accuracy of a standard classification task. While, not unexpectedly, increasing the number of sample points gives more robust estimates for the feature values, to our surprise we find that the feature value approximations for different sampling strategies do not converge to the same value. This implies that approximated feature values cannot be interpreted independently of the underlying sampling strategy. As our classification experiments show, this also implies that the feature approximations used for training a classifier must stem from the same sampling strategy as those used for the actual classification tasks. As a side result we show that classifiers trained with feature values approximated by Sobol' sequences achieve higher accuracy than any of the standard sampling techniques. This may indicate improvement potential for ELA-trained machine learning models.
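
The setup described in the abstract (approximate a feature value from sample points, then vary the sampling strategy and the sample size) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the shifted sphere test function, the skewness of the objective values as a stand-in for an ELA feature, and the sample sizes are not taken from the paper, and the sketch makes no attempt to reproduce the paper's features, benchmark set, or findings. Only uniform random sampling and a basic Latin hypercube construction are covered; Sobol' sequences are left out to keep the example within the standard library.

```python
import random
import statistics


def uniform_sample(n, dim, rng):
    """n points drawn independently and uniformly from [0, 1)^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def latin_hypercube_sample(n, dim, rng):
    """n points with one point per 1/n-wide stratum in every coordinate."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across coordinates
        columns.append([(s + rng.random()) / n for s in strata])
    return [[columns[d][i] for d in range(dim)] for i in range(n)]


def sphere(x):
    """Hypothetical test function (shifted sphere), not from the paper."""
    return sum((xi - 0.3) ** 2 for xi in x)


def skewness(values):
    """Skewness of the objective values; a stand-in for an ELA-style feature."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def feature_estimates(sampler, n, dim, repetitions, seed):
    """Repeat the feature approximation on independently drawn point sets."""
    rng = random.Random(seed)
    return [
        skewness([sphere(x) for x in sampler(n, dim, rng)])
        for _ in range(repetitions)
    ]


if __name__ == "__main__":
    samplers = [("uniform", uniform_sample), ("LHS", latin_hypercube_sample)]
    for name, sampler in samplers:
        for n in (30, 300, 3000):
            est = feature_estimates(sampler, n, dim=5, repetitions=20, seed=1)
            print(
                f"{name:8s} n={n:5d} "
                f"median={statistics.median(est):+.3f} "
                f"spread={statistics.pstdev(est):.3f}"
            )
```

Running the script prints, for each strategy and sample size, the median and the spread of the approximated feature over 20 independent point sets, which is the kind of comparison the abstract refers to when it discusses robustness of the estimates.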
