Paper Title

Formal Algorithms for Transformers

Authors

Mary Phuong, Marcus Hutter

Abstract

This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.
