Reformer: The Efficient Transformer

Authors

Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

U.C. Berkeley; Google Research

Abstract

Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. First, we replace dot-product attention with one that uses locality-sensitive hashing, changing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the length of the sequence. Second, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
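The two techniques named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The helper names (`random_projection`, `lsh_bucket`, `lsh_attention`, `rev_forward`, `rev_inverse`) are hypothetical. The hashing uses random-projection (angular) LSH, $h(x) = \arg\max([xR; -xR])$, with queries and keys assumed to be the same vectors (shared-QK). The sketch omits details such as multiple hashing rounds, sorting and chunking, and causal masking. The reversible block computes $y_1 = x_1 + F(x_2)$ and $y_2 = x_2 + G(y_1)$, so a layer's inputs can be recomputed from its outputs during backpropagation instead of being stored. In the Reformer, $F$ plays the role of attention and $G$ the feed-forward layer; here they are arbitrary toy functions.

```python
import math
import random


def random_projection(dim, n_buckets, rng):
    """Hypothetical helper: draw n_buckets // 2 random Gaussian vectors (the matrix R)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_buckets // 2)]


def lsh_bucket(vec, proj):
    """Angular LSH: bucket = argmax of [xR, -xR] (n_buckets must be even).

    Vectors pointing in similar directions tend to land in the same bucket.
    """
    xr = [sum(v * r for v, r in zip(vec, row)) for row in proj]
    scores = xr + [-s for s in xr]
    return max(range(len(scores)), key=scores.__getitem__)


def lsh_attention(qk, values, n_buckets, seed=0):
    """Shared-QK softmax attention restricted to positions in the same bucket.

    Returns (outputs, number of query-key scores computed). Full attention
    would compute len(qk) ** 2 scores.
    """
    proj = random_projection(len(qk[0]), n_buckets, random.Random(seed))
    buckets = [lsh_bucket(x, proj) for x in qk]
    groups = {}  # bucket id -> list of positions hashed into it
    for pos, bucket in enumerate(buckets):
        groups.setdefault(bucket, []).append(pos)

    scale = 1.0 / math.sqrt(len(qk[0]))
    outputs, n_scores = [], 0
    for i, q in enumerate(qk):
        keys = groups[buckets[i]]  # same-bucket positions only, i itself included
        scores = [scale * sum(qi * kj for qi, kj in zip(q, qk[j])) for j in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        outputs.append([sum(w * values[j][k] for w, j in zip(weights, keys)) / total
                        for k in range(len(values[0]))])
        n_scores += len(keys)
    return outputs, n_scores


def rev_forward(x1, x2, f, g):
    """Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def rev_inverse(y1, y2, f, g):
    """Recompute the block's inputs from its outputs: x2 = y2 - G(y1), x1 = y1 - F(x2)."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


if __name__ == "__main__":
    rng = random.Random(1)
    L, d = 64, 8
    qk = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(L)]
    vals = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(L)]
    out, n = lsh_attention(qk, vals, n_buckets=8)
    print(n, "bucketed scores vs", L * L, "for full attention")

    f = lambda v: [math.tanh(t) for t in v]  # toy stand-in for attention
    g = lambda v: [0.5 * t for t in v]       # toy stand-in for feed-forward
    r1, r2 = rev_inverse(*rev_forward(qk[0], qk[1], f, g), f, g)
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(qk[0] + qk[1], r1 + r2)))
```

Running it prints how many query-key scores the bucketed attention computed next to the $L^2$ that full attention would need, and whether the reversible block reconstructed its inputs exactly (up to floating-point tolerance).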

arXiv:2001.04451

