Paper Title

Extracting Latent Steering Vectors from Pretrained Language Models

Authors

Nishant Subramani, Nivedita Suresh, Matthew E. Peters

Abstract

Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to generate a target sentence is already encoded within the model. Accordingly, we explore a different approach altogether: extracting latent vectors directly from pretrained language model decoders without fine-tuning. Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly (> 99 BLEU) for English sentences from a variety of domains. We show that vector arithmetic can be used for unsupervised sentiment transfer on the Yelp sentiment benchmark, with performance comparable to models tailored to this task. We find that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark (STS-B), outperforming pooled hidden states of models. Finally, we present an analysis of the intrinsic properties of the steering vectors. Taken together, our results suggest that frozen LMs can be effectively controlled through their latent steering space.
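
The abstract's central claim is that a frozen language model can be steered toward a target sentence by adding a single vector to its hidden states. The following is a minimal sketch of that idea, assuming a Hugging Face GPT-2 checkpoint; the injection layer (block 6), the broadcast of the vector over all positions, and the optimizer settings are illustrative assumptions, since the abstract does not specify the paper's exact recipe.

```python
# Hedged sketch: learn a steering vector z that, when added to a frozen
# GPT-2's hidden states, makes the model reproduce a target sentence.
# Injection site and hyperparameters are assumptions, not the paper's recipe.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
for p in model.parameters():  # the LM stays frozen; only z is trained
    p.requires_grad_(False)

target = "The food at this restaurant was wonderful."
ids = tok(target, return_tensors="pt").input_ids.to(device)

# One steering vector, the width of the model's hidden states
z = torch.zeros(model.config.n_embd, device=device, requires_grad=True)

def add_steering(module, inputs, output):
    # Add z to the hidden states leaving one transformer block,
    # broadcast over all positions (block 6 is an arbitrary choice here).
    return (output[0] + z,) + output[1:]

hook = model.transformer.h[6].register_forward_hook(add_steering)

# Minimize the frozen model's teacher-forced NLL of the target sentence
opt = torch.optim.Adam([z], lr=0.1)
for step in range(500):
    opt.zero_grad()
    loss = model(ids, labels=ids).loss
    loss.backward()
    opt.step()

# With the learned z injected, greedy decoding from the first token
# should approximately recover the target if the optimization converged.
with torch.no_grad():
    gen = model.generate(ids[:, :1], max_length=ids.shape[1],
                         do_sample=False, pad_token_id=tok.eos_token_id)
print(tok.decode(gen[0]))
hook.remove()
```

The abstract's sentiment-transfer result builds on the same latent space: given vectors z extracted this way for positive and negative Yelp sentences, a transfer direction can be formed by vector arithmetic (e.g., the mean positive z minus the mean negative z) and added to a new sentence's steering vector before decoding.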
