Richard Sutton: Father of RL Thinks LLMs Are a Dead End


  • Podcast: Dwarkesh Podcast
  • Host: Dwarkesh Patel
  • Guest: Richard Sutton, co-recipient of the 2024 Turing Award (with Andrew Barto), pioneer of temporal difference (TD) learning and policy gradient methods, widely regarded as a founding father of reinforcement learning
  • Duration: ~1 hour 7 minutes
  • Listen: Apple Podcasts | YouTube

Richard Sutton is one of the most prominent critics of the LLM-centric approach to AI, and he has the credentials to back it up. He shared the 2024 Turing Award with Andrew Barto for developing the conceptual and algorithmic foundations of reinforcement learning, work that includes temporal difference learning and policy gradient methods. In this conversation he explains why he thinks LLMs are a dead end.


The Core Argument

Sutton’s position is simple: LLMs mimic what humans would say. They do not figure out what to do. Without a definition of what constitutes a correct action, they cannot develop genuine understanding or goals.

Reinforcement learning, by contrast, starts from a clear definition of right and wrong: the action that leads to reward. The agent interacts with the world, receives feedback, and updates its behavior. That is learning. LLMs do not learn in this sense. They pattern-match.
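The interact, receive feedback, update loop described above can be made concrete with a very small example. The sketch below is not from the podcast: it is a two-armed bandit with an epsilon-greedy agent, and the function name run_bandit, the reward probabilities, and the step count are all assumptions chosen for illustration. It only shows that the agent's notion of "right" comes from reward it experiences, not from examples of what someone else would do.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Hypothetical toy setup: each arm pays 1.0 with a fixed probability."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)   # learned value of each action
    counts = [0] * len(reward_probs)        # how often each action was tried

    for _ in range(steps):
        # Act: mostly exploit the current best estimate, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))
        else:
            action = max(range(len(reward_probs)), key=lambda a: estimates[a])

        # Receive feedback from the environment.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Update: move the estimate toward the observed reward (running mean).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

estimates, counts = run_bandit([0.2, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("preferred action:", best)
print("estimates:", [round(e, 2) for e in estimates])
```

Nothing in the loop tells the agent which arm is correct; the preference emerges from the rewards it collects.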

“Large language models are about mimicking people, doing what people say you should do. They’re not about figuring out what to do.”


How Humans Actually Learn

Sutton challenges the assumption that children learn primarily through imitation. Even infants learn through trial and error, prediction, and interaction with their environment. Imitation plays a role, but the foundation is experiential learning.


The Four Components of Intelligence

Sutton outlines what a true continual learning agent requires:

  1. Policy — what to do in a given situation
  2. Value function — how well things are going
  3. Perception — understanding the current state
  4. Transition model — predicting consequences of actions

In Sutton's view, LLMs address perception and some aspects of prediction, but they lack a goal-directed policy (they have no goals of their own) and a value function (they have no sense of better or worse).
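To see how the four components fit together, here is a minimal sketch of a tabular agent in a five-position corridor. It is an illustration under stated assumptions, not Sutton's design: the corridor environment, the Agent class, and every method name are hypothetical, perception is reduced to an identity mapping, the transition model is a simple lookup table, and the value function is updated with a basic TD(0) rule chosen only because TD learning is mentioned above.

```python
import random

N_STATES = 5            # corridor positions 0..4; position 4 is the goal
ACTIONS = (-1, +1)      # step left or step right

def env_step(position, action):
    """Hypothetical toy environment: reward 1.0 on reaching the goal."""
    nxt = min(max(position + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

class Agent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.value = [0.0] * N_STATES   # value function: how good each state is
        self.model = {}                 # transition model: (state, action) -> (next, reward)

    def perceive(self, observation):
        # Perception: map a raw observation to a state (trivial in this toy case).
        return int(observation)

    def _lookahead(self, state, action):
        # Predict the consequence of an action with the learned model.
        if (state, action) not in self.model:
            return 0.0                  # outcome not yet experienced
        nxt, reward = self.model[(state, action)]
        return reward + self.gamma * self.value[nxt]

    def policy(self, state, explore=True):
        # Policy: pick the action whose predicted outcome looks best.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        best = max(self._lookahead(state, a) for a in ACTIONS)
        return self.rng.choice([a for a in ACTIONS if self._lookahead(state, a) == best])

    def learn(self, state, action, reward, nxt, done):
        self.model[(state, action)] = (nxt, reward)            # update the model
        target = reward + (0.0 if done else self.gamma * self.value[nxt])
        self.value[state] += self.alpha * (target - self.value[state])  # TD(0) update

def train(episodes=300, max_steps=1000):
    agent = Agent()
    for _ in range(episodes):
        state = agent.perceive(0)
        for _ in range(max_steps):
            action = agent.policy(state)
            observation, reward, done = env_step(state, action)
            nxt = agent.perceive(observation)
            agent.learn(state, action, reward, nxt, done)
            state = nxt
            if done:
                break
    return agent

agent = train()
print("values:", [round(v, 2) for v in agent.value])
print("greedy actions:", [agent.policy(s, explore=False) for s in range(N_STATES - 1)])
```

Each component is learned or exercised only through the agent's own experience in the environment, which is the property Sutton argues a pretrained language model lacks.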


The Bitter Lesson

Sutton’s famous 2019 essay argues that general methods which scale with computation (search and learning) eventually outperform methods that build in human knowledge. He traces this pattern across decades of AI research: in chess, in Go, in speech recognition, and in computer vision.

His provocative suggestion: LLMs may end up on the losing side of this pattern. Systems that ingest massive amounts of human text look impressive now, but because they lean on human knowledge, he expects them to be superseded by systems that learn from experience and computation.


The Digital Intelligence Succession

Sutton sees the transition to digital intelligence as inevitable and historically significant — one of the major stages in the universe’s evolution, from biological replication to designed intelligence. His advice: focus on what you can control locally, the way parents raise children with good values without dictating their life paths.

Crepi il lupo! 🐺