Paper Title

Sparse Structure Search for Parameter-Efficient Tuning

Paper Authors

Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

Paper Abstract

Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of parameter-efficient tuning (PET) find that optimizing only a small portion of parameters conditioned on PTMs can yield on-par performance compared to conventional fine-tuning. Generally, PET methods exquisitely design parameter-efficient modules (PET modules) that can be applied to arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions largely relies on sophisticated manual designation, thereby usually producing sub-optimal results. In contrast to manual designation, we explore constructing PET modules in an automatic manner. We automatically Search for the Sparse Structure of Parameter-Efficient Tuning (S³PET). Based on a unified framework of various PET methods, S³PET conducts a differentiable PET structure search through bi-level optimization and proposes a shifted global sigmoid method to explicitly control the number of trainable parameters. Extensive experiments show that S³PET surpasses manual and random structures with fewer trainable parameters. The searched structures preserve more than 99% of fine-tuning performance with 0.01% trainable parameters. Moreover, the advantage of S³PET is amplified under extremely low trainable-parameter budgets (0.0009%~0.01%). The searched structures are transferable and explainable, providing suggestions and guidance for the future design of PET methods.
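The abstract names two ingredients: a bi-level search in the DARTS style (structural parameters updated on validation loss while PET module weights are updated on training loss) and a shifted global sigmoid that pins the searched structure to an explicit parameter budget. The paper's exact formulation is not reproduced here, so the sketch below is only one plausible PyTorch reading; `shifted_global_sigmoid`, `alpha`, `module_sizes`, and `budget` are illustrative names, and solving for the shift by bisection is an assumption.

```python
import torch

def shifted_global_sigmoid(alpha, module_sizes, budget):
    # Hypothetical reconstruction, not the paper's exact method.
    # Gate each candidate PET position with sigmoid(alpha_i - s), where a
    # single global shift s is found by bisection so that the expected
    # number of trainable parameters lands on the budget.
    lo, hi = -50.0, 50.0
    for _ in range(60):
        s = (lo + hi) / 2.0
        expected = (torch.sigmoid(alpha - s) * module_sizes).sum()
        if expected > budget:
            lo = s  # too many expected parameters -> push gates down
        else:
            hi = s  # under budget -> allow gates to open
    # s is a plain float here, so the returned gates stay differentiable
    # with respect to the structural parameters alpha.
    return torch.sigmoid(alpha - (lo + hi) / 2.0)

# Toy usage: 8 candidate positions, each PET module adding 1,000 parameters,
# with a global budget of 2,000 trainable parameters.
alpha = torch.zeros(8, requires_grad=True)  # structural parameters
sizes = torch.full((8,), 1000.0)            # parameter count per module
gates = shifted_global_sigmoid(alpha, sizes, budget=2000.0)
print((gates * sizes).sum())                # ~= 2000, i.e. on budget
```

Because the shift is global across all candidate positions, raising one gate forces the others down, which is one way to read the abstract's claim that the number of trainable parameters is controlled explicitly rather than merely regularized.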
