
Attention Is All You Need — The Transformer

Google researchers published the Transformer architecture, replacing sequential processing with attention mechanisms and enabling the AI revolution that followed.

In June 2017, a team of Google researchers — Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, and Illia Polosukhin — published "Attention Is All You Need," introducing the Transformer architecture. Unlike the recurrent models that preceded it, which processed text sequentially (word by word), the Transformer used "self-attention" to process all words at once, weighing how each word in a sequence relates to every other word. Because no step has to wait for the previous one, training parallelizes across entire sequences on modern hardware, and that parallelism enabled dramatic scaling. The Transformer became the foundation for virtually every major AI model that followed: BERT, GPT-2, GPT-3, GPT-4, Claude, Gemini, LLaMA, and the entire large language model ecosystem. It is arguably the most consequential machine learning paper ever published.
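The mechanism itself is remarkably compact. The paper defines scaled dot-product attention as Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, and the sketch below implements just that core formula in NumPy; the function name, toy dimensions, and random inputs are illustrative choices, not taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, the core
    # operation from the paper. Q, K: (seq_len, d_k); V: (seq_len, d_v).
    d_k = Q.shape[-1]
    # Score every position against every other position in a single
    # matrix multiply; no sequential recurrence is needed.
    scores = Q @ K.T / np.sqrt(d_k)
    # A numerically stable softmax over each row turns scores
    # into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of all value vectors.
    return weights @ V

# Toy self-attention: 4 tokens with 8-dimensional embeddings,
# using the same input as queries, keys, and values (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```

In the full architecture this operation runs in several parallel "heads," each with its own learned projections of Q, K, and V; the sketch strips that away to show the single idea underneath: every token attends to every other token in one parallel step.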
