Paper Title

Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model

Authors

Yeskendir Koishekenov, Alexandre Berard, Vassilina Nikoulina

Abstract

The recently released NLLB-200 is a set of multilingual Neural Machine Translation models that cover 202 languages. The largest model is based on a Mixture of Experts architecture and achieves SoTA results across many language pairs. It contains 54.5B parameters and requires at least four 32GB GPUs just for inference. In this work, we propose a pruning method that enables the removal of up to 80% of experts without further finetuning and with a negligible loss in translation quality, which makes it feasible to run the model on a single 32GB GPU. Further analysis suggests that our pruning metrics can identify language-specific experts.
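To make the pruning idea concrete, below is a minimal sketch of how experts in one Mixture-of-Experts layer could be scored and selected for a single language pair, assuming per-token router (gate) probabilities have already been collected while decoding a small sample of that pair. The importance score used here (mean routing probability per expert), as well as the function and variable names, are illustrative assumptions for this sketch, not necessarily the exact pruning metric proposed in the paper.

# Minimal sketch of language-specific expert pruning for an MoE layer.
# Assumes router (gate) probabilities were recorded while decoding a sample
# from one language pair; the mean routing probability is used as a stand-in
# importance score (the paper's exact metric is not given in the abstract).

import numpy as np

def select_experts_to_keep(gate_probs: np.ndarray, keep_fraction: float = 0.2) -> np.ndarray:
    """Rank experts of one MoE layer by how much the router uses them.

    gate_probs: array of shape (num_tokens, num_experts) holding the softmax
                router probabilities recorded for one language pair.
    keep_fraction: fraction of experts to keep (0.2 corresponds to pruning
                   80% of the experts, as in the abstract).

    Returns the sorted indices of the experts to keep.
    """
    importance = gate_probs.mean(axis=0)                  # per-expert usage score
    num_keep = max(1, int(round(keep_fraction * gate_probs.shape[1])))
    keep = np.argsort(importance)[-num_keep:]             # highest-scoring experts
    return np.sort(keep)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake router statistics: 1,000 tokens routed over 128 experts in one layer.
    fake_gate_probs = rng.dirichlet(np.ones(128), size=1000)
    kept = select_experts_to_keep(fake_gate_probs, keep_fraction=0.2)
    print(f"Keeping {len(kept)} of 128 experts:", kept[:10], "...")

In practice such scores would be computed per layer and per language pair, and only the kept experts' weights would be loaded, which is what makes single-GPU inference feasible.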
