Paper Title

Sparse Structure Search for Parameter-Efficient Tuning

Paper Authors

Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

Paper Abstract

Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of parameter-efficient tuning (PET) find that optimizing only a small portion of parameters conditioned on PTMs can yield on-par performance compared to conventional fine-tuning. Generally, PET methods exquisitely design parameter-efficient modules (PET modules) that can be applied to arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions largely relies on sophisticated manual designation, thereby usually producing sub-optimal results. In contrast to manual designation, we explore constructing PET modules in an automatic manner. We automatically Search for the Sparse Structure of Parameter-Efficient Tuning (S³PET). Based on a unified framework of various PET methods, S³PET conducts a differentiable PET structure search through bi-level optimization and proposes a shifted global sigmoid method to explicitly control the number of trainable parameters. Extensive experiments show that S³PET surpasses manual and random structures with fewer trainable parameters. The searched structures preserve more than 99% of fine-tuning performance with 0.01% trainable parameters. Moreover, the advantage of S³PET is amplified under extremely low trainable-parameter budgets (0.0009%~0.01%). The searched structures are transferable and explainable, providing suggestions and guidance for the future design of PET methods.
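The abstract names two ingredients: a bi-level search in the DARTS style (structural parameters updated on validation loss while PET module weights are updated on training loss) and a shifted global sigmoid that pins the searched structure to an explicit parameter budget. The paper's exact formulation is not reproduced here, so the sketch below is only one plausible PyTorch reading; `shifted_global_sigmoid`, `alpha`, `module_sizes`, and `budget` are illustrative names, and solving for the shift by bisection is an assumption.

```python
import torch

def shifted_global_sigmoid(alpha, module_sizes, budget):
    # Hypothetical reconstruction, not the paper's exact method.
    # Gate each candidate PET position with sigmoid(alpha_i - s), where a
    # single global shift s is found by bisection so that the expected
    # number of trainable parameters lands on the budget.
    lo, hi = -50.0, 50.0
    for _ in range(60):
        s = (lo + hi) / 2.0
        expected = (torch.sigmoid(alpha - s) * module_sizes).sum()
        if expected > budget:
            lo = s  # too many expected parameters -> push gates down
        else:
            hi = s  # under budget -> allow gates to open
    # s is a plain float here, so the returned gates stay differentiable
    # with respect to the structural parameters alpha.
    return torch.sigmoid(alpha - (lo + hi) / 2.0)

# Toy usage: 8 candidate positions, each PET module adding 1,000 parameters,
# with a global budget of 2,000 trainable parameters.
alpha = torch.zeros(8, requires_grad=True)  # structural parameters
sizes = torch.full((8,), 1000.0)            # parameter count per module
gates = shifted_global_sigmoid(alpha, sizes, budget=2000.0)
print((gates * sizes).sum())                # ~= 2000, i.e. on budget
```

Because the shift is global across all candidate positions, raising one gate forces the others down, which is one way to read the abstract's claim that the number of trainable parameters is controlled explicitly rather than merely regularized.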
