Talk at Stanford NLP Seminar
NLP Seminar, Stanford University, Stanford, CA
NLP Seminar, Duke University, Durham, NC
In this talk, we explore how algorithmic insights and tools from optimization theory and Fourier analysis can shed light on the mechanisms underlying Transformers’ ability to solve fundamental computational tasks, including linear regression and addition. We will examine the interplay between architectural design and pre-training data in enabling Transformers to learn these mechanisms effectively. Lastly, we will discuss recent advances in mapping numbers directly to their Fourier representations, bypassing the tokenization step entirely for improved efficiency and accuracy.