Talk at Duke NLP Seminar
Date:
In this talk, we explore how algorithmic insights and tools from optimization theory and Fourier transforms can shed light on the mechanisms underlying Transformers’ ability to solve fundamental computational tasks, including linear regression and addition. We will examine the interplay between architectural design and pre-training data in enabling Transformers to learn these mechanisms effectively. Lastly, we will discuss recent advancements in directly mapping numbers to their Fourier representations, eliminating the tokenization step entirely for improved efficiency and accuracy.
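To make the Fourier-representation idea concrete, here is a minimal sketch (an illustration under assumed parameters, not the exact embedding from the papers below) of mapping a number to sinusoidal features at several periods. Components with small periods encode the number modulo those bases, which is the kind of structure the Fourier-feature analyses identify; the period set `(2, 5, 10, 100)` is a hypothetical choice for illustration.

```python
import numpy as np

def fourier_number_embedding(x, periods=(2, 5, 10, 100)):
    """Map a scalar x to cos/sin features at several periods.

    Components with period T depend only on x mod T, so small-period
    features capture modular (digit-like) information about x.
    """
    feats = []
    for T in periods:
        feats.append(np.cos(2 * np.pi * x / T))
        feats.append(np.sin(2 * np.pi * x / T))
    return np.array(feats)

# Numbers that agree modulo 10 share all features with periods dividing 10.
emb_3 = fourier_number_embedding(3)
emb_13 = fourier_number_embedding(13)
```

In this sketch, `emb_3` and `emb_13` match on the period-2, period-5, and period-10 components (since 3 ≡ 13 mod 10) and differ only on the period-100 components, illustrating how such features can support digit-wise addition.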
Related Papers:
- Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression - Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan, NeurIPS 2024
  - Previously titled: Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
  - SoCalNLP Symposium 2023 Best Paper Award
- Pre-trained Large Language Models Use Fourier Features to Compute Addition - Tianyi Zhou, Deqing Fu, Vatsal Sharan, Robin Jia, NeurIPS 2024
- FNE: Precise Single-Token Number Embeddings via Fourier Features - Tianyi Zhou, Deqing Fu, Mahdi Soltanolkotabi, Robin Jia, Vatsal Sharan