Paper Title
Showing Your Work Doesn't Always Work
Paper Authors
Paper Abstract
In natural language processing, a recently popular line of work explores how to best report the experimental results of neural networks. One exemplar publication, titled "Show Your Work: Improved Reporting of Experimental Results," advocates for reporting the expected validation effectiveness of the best-tuned model, with respect to the computational budget. In the present work, we critically examine this paper. As far as statistical generalizability is concerned, we find unspoken pitfalls and caveats with this approach. We analytically show that their estimator is biased and uses error-prone assumptions. We find that the estimator favors negative errors and yields poor bootstrapped confidence intervals. We derive an unbiased alternative and bolster our claims with empirical evidence from statistical simulation. Our codebase is at http://github.com/castorini/meanmax.
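
To make the abstract's central claim concrete, below is a minimal statistical-simulation sketch, not the authors' released code (see the repository above for that); the function names meanmax_plugin and meanmax_unbiased are ours. It contrasts the ECDF plug-in estimate of the expected maximum validation score, in the style the critiqued paper advocates, with an unbiased alternative in the usual order-statistic form: the average of the maximum over all size-n subsets of the B observed scores. Scores are drawn from Uniform(0, 1), where the true expected maximum of n draws is exactly n / (n + 1), so the bias of each estimator can be read off directly.

    import math
    import random

    def meanmax_plugin(scores, n):
        """ECDF plug-in estimate of E[max of n scores]: the max of n draws
        with replacement from the sample, which is biased downward."""
        xs = sorted(scores)
        B = len(xs)
        return sum(x * ((i / B) ** n - ((i - 1) / B) ** n)
                   for i, x in enumerate(xs, start=1))

    def meanmax_unbiased(scores, n):
        """Unbiased alternative: the average of max(S) over all size-n
        subsets S, in closed form via order statistics (the i-th smallest
        score is the subset max in C(i-1, n-1) of the C(B, n) subsets)."""
        xs = sorted(scores)
        B = len(xs)
        total = math.comb(B, n)
        return sum(x * math.comb(i - 1, n - 1) / total
                   for i, x in enumerate(xs, start=1) if i >= n)

    # Toy simulation: Uniform(0, 1) scores, so E[max of n draws] = n / (n + 1).
    random.seed(0)
    n, B, trials = 10, 50, 5000
    truth = n / (n + 1)
    plugin_err = unbiased_err = 0.0
    for _ in range(trials):
        sample = [random.random() for _ in range(B)]
        plugin_err += meanmax_plugin(sample, n) - truth
        unbiased_err += meanmax_unbiased(sample, n) - truth
    print(f"mean error, plug-in:  {plugin_err / trials:+.4f}")   # consistently negative
    print(f"mean error, unbiased: {unbiased_err / trials:+.4f}")  # near zero

Under these assumptions, the plug-in estimator's mean error settles at a clearly negative value while the subset-average estimator's mean error hovers near zero, illustrating the abstract's claim that the original estimator favors negative errors.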