Module 5 · Lesson 2

Transformer architectures

In this lesson we build a small transformer from scratch in PyTorch: embeddings, multi-head attention, and the residual stack, followed by a training loop on a tiny Swahili-English parallel corpus.
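As a warm-up for the attention component, here is a minimal sketch of single-head scaled dot-product attention, softmax(QKᵀ / √d_k)·V, as defined in the Vaswani et al. reading. It is deliberately written in plain Python rather than PyTorch so the arithmetic is visible. The function names and the toy vectors are assumptions made for illustration and are not taken from the lesson's reference notebook.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    Hypothetical helper for illustration: no masking, no batching,
    and a single head only.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # One score per key: dot product scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(d_v)]
        )
    return outputs, weights


# Toy self-attention: three "tokens" with 2-dimensional embeddings (made up).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `attn` is a probability distribution over the input positions, and each row of `out` is the corresponding weighted average of the value vectors. A multi-head layer would run several such heads on learned linear projections of the inputs and concatenate the results; in PyTorch the same computation would typically be expressed with batched tensor operations.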

Resources

  • Lecture slides (PDF)
  • Reference notebook (.ipynb)
  • Reading: Vaswani et al. (2017), "Attention Is All You Need"