Paper Title

Anti-Distillation: Improving reproducibility of deep networks

Paper Authors

Gil I. Shamir, Lorenzo Coviello

Paper Abstract

Deep networks have been revolutionary in improving performance of machine learning and artificial intelligence systems. Their high prediction accuracy, however, comes at the price of \emph{model irreproducibility\/} at very high levels that do not occur with classical linear models. Two models, even if supposedly identical, with identical architecture and identical trained parameter sets, trained on the same set of training examples, may predict very differently on individual, previously unseen examples, even while providing identical average prediction accuracies. \emph{Prediction differences\/} may be as large as the order of magnitude of the predictions themselves. Ensembles have been shown to somewhat mitigate this behavior, but without an extra push may not be utilizing their full potential. In this work, a novel approach, \emph{Anti-Distillation\/}, is proposed to address irreproducibility in deep networks, where ensemble models are used to generate predictions. Anti-Distillation forces ensemble components away from one another by techniques like de-correlating their outputs over mini-batches of examples, forcing them to become even more different and more diverse. Doing so enhances the benefit of ensembles, making the final predictions more reproducible. Empirical results demonstrate substantial prediction difference reductions achieved by Anti-Distillation on benchmark and real datasets.
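The abstract describes the core mechanism only at a high level: ensemble components are pushed apart by de-correlating their outputs over mini-batches. The sketch below is a minimal, hypothetical PyTorch illustration of one way such a de-correlation penalty could look; the architecture, the squared-correlation penalty form, and the weight 0.1 are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AntiDistillationEnsemble(nn.Module):
    """Ensemble of small regression heads whose outputs will be de-correlated.
    The architecture here is an illustrative assumption."""

    def __init__(self, in_dim: int, num_components: int = 4, hidden: int = 64):
        super().__init__()
        self.components = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_components)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack component outputs into a (batch, num_components) tensor.
        return torch.cat([c(x) for c in self.components], dim=1)


def decorrelation_penalty(outputs: torch.Tensor) -> torch.Tensor:
    """Sum of squared pairwise correlations between component outputs,
    estimated over the current mini-batch; minimizing it pushes the
    ensemble components away from one another."""
    z = outputs - outputs.mean(dim=0, keepdim=True)    # center each component
    z = z / (z.std(dim=0, keepdim=True) + 1e-8)        # unit variance per component
    corr = (z.T @ z) / outputs.shape[0]                # (K, K) correlation estimate
    off_diag = corr - torch.diag(torch.diag(corr))     # zero out the diagonal
    return (off_diag ** 2).sum()


# One illustrative training step: task loss on the ensemble average,
# plus de-correlation pressure on the individual components.
model = AntiDistillationEnsemble(in_dim=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 10), torch.randn(32, 1)

outputs = model(x)                                     # (32, 4)
task_loss = F.mse_loss(outputs.mean(dim=1, keepdim=True), y)
loss = task_loss + 0.1 * decorrelation_penalty(outputs)  # 0.1 is an assumed weight
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The task loss on the ensemble average keeps the components accurate, while the penalty on the off-diagonal correlations pressures them toward diverse predictions; the penalty weight controls that trade-off.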
