Dwarkesh Podcast: Terence Tao – How the World's Top Mathematician Uses AI

PODCAST INFORMATION

  • Title: 🎙️ Terence Tao – “Kepler, Newton, and the True Nature of Scientific Progress”
  • Show: Dwarkesh Podcast
  • Host: Dwarkesh Patel (Independent)
  • Guest: Terence Tao (Fields Medalist, UCLA Professor of Mathematics)
  • Duration: ~2h 15m
  • Publication Date: March 2026
  • Original Episode: Apple Podcasts | YouTube


⚖️ VERDICT

Overall Rating: 9/10

A masterclass in how to think about AI’s role in science, told through the lens of one of history’s greatest scientific stories. Tao uses the Kepler-Brahe-Newton chain as a surprisingly precise analogy for LLMs, and his take—that idea generation is no longer the bottleneck, verification is—reframes the entire AI-for-science debate. The conversation ranges from the history of heliocentrism to Lean formalization to whether a Riemann hypothesis proof could be meaningful if no human understands it. Tao’s clarity of thought and ability to explain the deepest ideas in mathematics in plain language make this essential listening for anyone thinking about the future of research.

🎯 ONE-SENTENCE ASSESSMENT

AI has driven the cost of idea generation to near zero—like Kepler’s frantic hypothesis search—but the real constraint on scientific progress is verification, validation, and the “test of time” that cannot be reinforcement-learned.

📊 EVALUATION CRITERIA

| Criterion | Score (/10) | Key Observation |
| --- | --- | --- |
| Content Depth | 10 | Extraordinary intellectual range: from Tycho Brahe’s naked-eye observations to the twin prime conjecture to the philosophical limits of formal verification. Tao moves between history, philosophy, and cutting-edge AI research with total fluency. |
| Narrative Structure | 9 | Kepler as a framing device gives the entire conversation a narrative spine. The flow from historical astronomy → AI for math → Lean formalization → career advice is elegant and natural. Dwarkesh’s prompts are well-calibrated. |
| Audio Quality | 9 | Clean production, consistent levels. Tao’s measured delivery contrasts well with Dwarkesh’s more rapid question style, creating good conversational rhythm. |
| Evidence & Sources | 9 | Tao cites specific examples: 50 Erdős problems solved by AI, the Bode numerical fluke, Copernicus’s model being less accurate than Ptolemy’s. Claims are grounded in historical fact and verifiable data. |
| Originality | 10 | The Kepler-as-high-temperature-LLM framing, the “verification bottleneck” thesis, the distinction between artificial cleverness and artificial intelligence, and the idea that math needs an experimental side are all genuinely novel contributions. |

📝 REVIEW SUMMARY

What the Episode Covers

The conversation opens with the story of how Kepler discovered the laws of planetary motion—a narrative that Tao uses as a jumping-off point for the entire rest of the episode. We learn that Kepler began with a beautiful but wrong theory (Platonic solids inscribed between planetary orbits), had to steal data from Tycho Brahe’s descendants, spent twenty years trying random relationships, and only through painstaking data analysis arrived at his three laws of planetary motion. Newton then provided the explanation a century later.

This story becomes the central metaphor: Kepler is an LLM running at high temperature, trying random hypotheses against a verified dataset (Brahe’s observations). The episode’s core thesis emerges from this: AI has driven the cost of idea generation to near zero, but the bottleneck in science has shifted to verification, validation, and assessing which ideas actually constitute progress.

The conversation then moves through a series of interconnected themes:

  • The Copernican revolution as ongoing: Copernicus’s heliocentric model was actually less accurate than Ptolemy’s geocentric model. The correct theory survived this “epistemic hell” through judgment and heuristics we still can’t fully articulate—a challenge for any RL-based scientific progress system.

  • AI for math—current state: 50 Erdős problems have been solved with AI assistance, but this was a one-off burst that has since plateaued. The success rate on any given problem is 1-2%, but at scale, that produces impressive-looking wins.

  • Breadth vs. depth: Humans excel at depth, AIs at breadth. Science needs to be restructured to exploit AI’s ability to try all standard techniques across thousands of problems simultaneously.

  • Artificial cleverness vs. artificial intelligence: Tao draws a sharp distinction. AI can jump and fail repeatedly (brute force trial-and-error), but cannot accumulate partial progress—reaching a handhold, staying there, and jumping from there.

  • The Lean question: Could a fully AI-generated proof of the Riemann hypothesis be meaningful if no human understands it? Tao argues that once you have the artifact of a proof, a great deal of analysis becomes possible. He envisions new professions of mathematicians who ablate and refactor giant Lean proofs.

  • Formalizing scientific plausibility: Tao wishes for a semi-formal language for scientific strategies (not just proofs), using the prime number theorem and the random model of the primes as examples of the kind of heuristic reasoning that currently resists formalization.

  • The need for serendipity: Tao argues that modern optimization destroys the serendipitous interactions (hallway conversations, library browsing) that drive unexpected breakthroughs.

  • Career advice: Embrace change, stay adaptable, and don’t assume traditional paths will remain the only way to contribute.

Who Created It & Why It Matters

Dwarkesh Patel demonstrates his signature preparation here, asking questions that reveal genuine depth of engagement with Tao’s work. His framing of Kepler as “a high-temperature LLM” is the kind of provocation that elicits the best responses, and his follow-ups on whether the verification loop can be RL’d, whether Lean proofs could be incomprehensible but correct, and when AI will replace mathematicians show he’s thought carefully about these questions before the conversation.

Terence Tao is uniquely positioned for this conversation. He’s simultaneously one of the greatest mathematicians alive, an early and thoughtful adopter of AI tools in his own work, and an unusually gifted communicator who can explain the history of astronomy, the twin prime conjecture, and Lean formalization in plain English. His observation that he’s now “5x” more productive on auxiliary tasks (literature search, numerics, reformatting) but the core of solving the hardest problems hasn’t changed much provides a grounded, first-person account of AI’s actual impact on frontier research.

Core Argument & Evidence

The central thesis is that AI has inverted the bottleneck in science:

  1. Idea generation is solved: AI can generate hypotheses at massive scale, trying random relationships, applying standard techniques, and finding obscure connections in the literature. The 50 Erdős problems solved demonstrate this capability.

  2. Verification is the new constraint: When you can generate a thousand theories per day, human peer review becomes the bottleneck. We don’t have scalable systems for assessing which ideas constitute progress and which are dead ends.

  3. The verification loop can be decades long: Kepler’s correct theory (elliptical orbits) was less accurate than wrong theories for years. Copernicus’s model was less accurate than Ptolemy’s. The “right” theory survives through a mixture of judgment and heuristics we can’t articulate—let alone codify into RL.

  4. Breadth and depth are complementary: AIs can map out entire fields (breadth), identifying all the easy observations, while human experts work on the “islands of difficulty” (depth). But we don’t yet have paradigms for this complementary science.

  5. Formal verification changes everything but doesn’t solve everything: Lean enables automated proof checking and refactoring, but we lack a formal framework for assessing plausibility, strategy, and the kind of heuristic reasoning that drives scientific progress.

🧠 INSIGHTS

Strengths

  • The Kepler-LLM analogy: This framing is precise and illuminating. Kepler trying random hypotheses against Brahe’s data is structurally similar to an LLM generating candidate solutions and checking them against a verified dataset. The analogy extends naturally: Newton’s explanation came a century later, just as deeper understanding may follow AI-generated empirical regularities.

  • Historical depth: The episode is rich with specific historical examples—the Bode numerical fluke, Copernicus vs. Ptolemy accuracy, Darwin vs. Newton in terms of conceptual simplicity vs. time to discovery, Gauss’s data-driven prime number conjecture. Each example illuminates a specific point about AI and science.

  • Honest about AI’s limits: Tao doesn’t oversell. He’s clear that AI currently solves problems at a 1-2% success rate, that the core of mathematical insight hasn’t changed, and that the tools are “complementary, not replacement.” His 2023 prediction about AI being a “trustworthy co-author by 2026” looking good in retrospect adds credibility.

  • Philosophical depth: The discussion of whether a Lean proof of the Riemann hypothesis could be “gobbledygook,” whether we can formalize plausibility, and how to recognize a Descartes-level insight in Lean code goes beyond the typical AI discourse into genuinely open philosophical questions.

  • Personal candor: Tao’s admission that he uses AI agents to fix parenthesis sizes, that writing blog posts is what he does when he doesn’t want to do referee reports, and that he had to “wean himself off computer games” adds warmth and authenticity.

Limitations & Gaps

  • Optimistic on timelines: Tao’s prediction that “within a decade, a lot of things math students do can be done by AI” may prove aggressive, especially for the deeper problems. His own description of AI’s current 1-2% success rate on individual Erdős problems suggests the gap to “replace Terry Tao” is still enormous.

  • Verification alternatives underexplored: The episode identifies verification as the bottleneck but doesn’t deeply explore potential solutions—automated theorem proving as verification, multi-agent debate, or computational checking of empirical claims. What would a scalable verification system actually look like?

  • Software vs. research uplift unclear: Tao’s observation that in math the finished proof is instrumental, and the intermediate work (not the artifact itself) is the real goal, is interesting but could have been pushed further. If AI can do the intermediate work, does that undermine the value of solving problems at all?

  • Serendipity argument could go further: Tao’s point about destroying serendipity is important but feels underdeveloped. How do you design systems that preserve serendipity while gaining AI’s efficiency? Is this a fundamental tradeoff or an engineering problem?

  • Economics of AI math unexplored: The episode doesn’t discuss who will fund the million-AI-scientist future, how the economics of compute-intensive math research will work, or whether academic institutions can adapt.

How This Connects to Broader Trends

  • The verification crisis: Across AI safety, scientific publishing, and information ecosystems, the cost of generating plausible content has dropped to zero while the cost of verification hasn’t changed. Tao’s diagnosis of this as the core bottleneck in science mirrors concerns in AI alignment (evaluating AI outputs), journalism (fact-checking), and law (reviewing AI-generated documents).

  • Formal verification as a foundation: Lean and formal proof assistants represent a rare case where AI can both generate and verify outputs within a trusted framework. The success of AI on Erdős problems (where the Lean verification is the ground truth) may preview what AI-accelerated science looks like when verification is built in.

  • The changing nature of expertise: Tao’s description of how his own work is changing—more code, more pictures, richer papers, but the core insight generation unchanged—mirrors reports from programmers, writers, and designers. The auxiliary tasks are automated; the essential judgment remains human.

  • From depth to breadth: The idea that science needs to be restructured to exploit AI’s breadth capability—exploring entire fields rather than solving individual deep problems—represents a potential paradigm shift comparable to the introduction of simulation or big data.

🏗️ KEY FRAMEWORKS PRESENTED

The Kepler-LLM Analogy

The structural parallel between Kepler’s discovery process and how LLMs generate solutions.

  • Components:

    • High-temperature hypothesis generation: Kepler tried random relationships (musical notes, Platonic solids, geometric patterns)
    • Verified dataset: Brahe’s observations, ten times more precise than any previous data
    • Long verification loop: It took twenty years for Kepler to work from wrong theories to correct laws
    • Separation of discovery and explanation: Kepler found the laws; Newton explained them a century later
  • Application: When deploying AI for scientific discovery, ensure you have a high-quality verified dataset and accept that the verification loop may be long. Don’t expect AI to generate both empirical regularities and deep explanations simultaneously.

  • Significance: Reframes LLMs not as reasoning engines but as hypothesis generators that need a separate verification system—a useful corrective to both overclaiming and underclaiming about AI capabilities.

  • Evidence: The 50 Erdős problems solved by AI—each involved the AI finding an obscure technique in the literature and applying it to an unsolved problem, then Lean verifying the result.
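
The generate-and-verify loop the analogy describes can be made concrete with a toy sketch (not from the episode): propose random candidate power laws relating orbital period T to semi-major axis a, score each against a small “verified dataset” playing Brahe’s role, and keep the best fit. Kepler’s third law (T² ∝ a³) wins. The planetary values are approximate published figures; all names and the scoring function are illustrative assumptions.

```python
import random

# Verified dataset (the "Brahe" role): semi-major axis in AU, orbital period in years.
DATA = [(0.39, 0.24), (0.72, 0.62), (1.00, 1.00),
        (1.52, 1.88), (5.20, 11.86), (9.58, 29.45)]

def fit_error(p, q):
    """Spread of T**p / a**q across the dataset; near zero means T**p is proportional to a**q."""
    ratios = [t ** p / a ** q for a, t in DATA]
    mean = sum(ratios) / len(ratios)
    return sum((r / mean - 1) ** 2 for r in ratios)

# "High-temperature" hypothesis generation: sample random exponent pairs,
# and let the verifier (fit_error) pick the winner.
random.seed(1)
candidates = [(random.randint(1, 5), random.randint(1, 5)) for _ in range(1000)]
best = min(candidates, key=lambda pq: fit_error(*pq))
print(best)  # → (2, 3): Kepler's third law
```

The division of labor mirrors the framework: generation is cheap and undirected, and all of the discriminating power lives in the verifier and the quality of the dataset.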

The Verification Bottleneck Thesis

The argument that idea generation is no longer the constraint on scientific progress; verification is.

  • Components:

    • Idea generation cost: Dropped to near zero with AI (analogous to communication cost dropping with the internet)
    • Verification cost: Remains high, requires human expertise, peer review, and “test of time”
    • Scale mismatch: AI can generate thousands of candidate theories; human reviewers can evaluate a few per year
    • Assessment problem: We lack formal frameworks for assessing plausibility and for distinguishing “constitutes progress” from “is correct”
  • Application: Invest in verification infrastructure (formal methods, automated checking, multi-agent debate) rather than just idea generation. Redesign scientific institutions to handle volume.

  • Significance: Predicts that the bottleneck will shift from “who can come up with ideas” to “who can efficiently sort good ideas from bad”—a fundamentally different challenge requiring different skills and institutions.

  • Evidence: Journals flooded with AI submissions; the Erdős problem success stories where Lean verification was the ground truth; the historical examples of correct theories surviving despite being worse than incorrect ones (Copernicus vs. Ptolemy).

Breadth vs. Depth in Scientific Research

The complementary strengths of AI (breadth) and human experts (depth).

  • Components:

    • AI breadth: Can try all standard techniques across thousands of problems simultaneously; excels at mapping fields and finding obscure connections
    • Human depth: Excels at the 20% of a problem that resists standard techniques; can build cumulative understanding; can invent new methods
    • Complementary design: AI maps the field, makes easy observations, identifies “islands of difficulty” for human experts to tackle
    • Missing paradigm: We don’t yet have frameworks for this complementary science; current institutions are built around depth
  • Application: Create broad classes of problems for AI to work on; use AI to systematically try standard techniques; reserve human expertise for the resistant 20%.

  • Significance: Suggests a future of science that looks fundamentally different from today—one where the “experiments” are large-scale AI sweeps and the “theory” is human insight applied to the resistant cases.

  • Evidence: The Erdős problems burst (breadth success); the plateau after low-hanging fruit was picked (depth remains human); Tao’s own papers becoming richer in auxiliary content but unchanged in core insight generation.

Artificial Cleverness vs. Artificial Intelligence

Tao’s distinction between brute-force trial-and-error (cleverness) and cumulative, adaptive problem-solving (intelligence).

  • Components:

    • Artificial cleverness: Jump and fail, jump and fail; scales well; can reach low-hanging fruit; no memory between sessions
    • Artificial intelligence: Jump, reach a handhold, stay there, pull others up, jump from there; builds cumulative understanding; adapts strategy based on partial progress
    • Current state: AIs exhibit cleverness, not intelligence; they cannot accumulate partial progress across attempts
    • Training absorption: Each interaction is 0.001% of future training data; learning is extremely slow and indirect
  • Application: Use AI for tasks that benefit from cleverness (trying standard techniques, finding obscure connections, generating candidate solutions) and reserve human judgment for tasks requiring cumulative progress.

  • Significance: Provides a precise vocabulary for what’s missing in current AI systems—not raw capability, but the ability to build on partial results in a structured, cumulative way.

  • Evidence: Tao’s observation that when an AI works on a problem without solving it, “its own understanding of math has not progressed”; the contrast between the Erdős problems burst (cleverness sufficient) and the plateau on harder problems (intelligence required).

💬 NOTABLE QUOTES

  1. “AI has driven the cost of idea generation down to almost zero, in a very similar way to how the internet drove the cost of communication down to almost zero. It’s an amazing thing, but it doesn’t create abundance by itself. Now the bottleneck is different.” Significance: The episode’s central thesis, stated with maximum clarity. Idea generation is solved; verification is the new constraint.

  2. “The Copernicus theory of the planets was less accurate than Ptolemy’s theory. Geocentrism had been developed for a millennium by that point, and they had made many tweaks and increasingly complicated ad hoc fixes to make it more and more accurate. Copernicus’s theory was a lot simpler but much less accurate.” Significance: A devastating example of why “correctness” is not the right metric for early-stage scientific theories, and why verification loops can be decades or millennia long.

  3. “They excel at breadth, and humans excel at depth. I think they’re very complementary. But our current way of doing math and science is focused on depth because that’s where human expertise is, because humans can’t do breadth. We have to redesign the way we do science to take full advantage of this breadth capability that we now have.” Significance: Points toward a paradigm shift in how science is organized—not replacing humans but restructuring to exploit AI’s different strengths.

  4. “What they can’t do is jump a little bit, reach some handhold, stay there, pull other people up, and then try to jump from there. There isn’t this cumulative process which is built up interactively.” Significance: The most precise articulation of what current AI lacks—not capability, but the ability to accumulate and build on partial progress.

  5. “We live in a particularly unpredictable era. Things that we’ve taken for granted for centuries may not hold anymore. The way we do everything, and not just mathematics, will change.” Significance: Tao’s closing reflection—grounded, honest, and appropriately uncertain. Acknowledges both the excitement and the fear of genuine paradigm change.

  6. “I remember when Google’s web search came out 20 years ago. It just blew all the other searches out of the water… It was amazing, and then after a few years, you just took for granted that you could Google anything. 2026-level AI would be stunning in 2021.” Significance: A useful reminder of how quickly we normalize transformative technology, and how 2026 AI capabilities may seem unremarkable by 2031.

📋 APPLICATIONS & HABITS

Practical Guidance from the Episode

  • For AI Researchers: Focus on verification systems, not just generation. The bottleneck is evaluating candidate solutions at scale, not producing them. Lean and formal methods are a rare case where generation and verification can be coupled.

  • For Mathematicians: Use AI to enrich your papers (more code, more pictures, deeper literature search) but don’t expect it to solve the hardest problems yet. The 1-2% success rate on individual problems means AI is a tool for breadth, not a replacement for depth.

  • For Scientists Across Fields: Consider whether your field has the equivalent of Brahe’s dataset—a verified, high-precision data source against which AI can test hypotheses. Fields without such a dataset may not benefit from AI hypothesis generation.

  • For Academic Institutions: Prepare for AI to flood scientific submissions. Build verification infrastructure and redesign peer review to handle volume. Consider how to preserve serendipity in an increasingly optimized world.

  • For Students: Embrace both traditional education and new tools. AI makes it possible to contribute to frontier research earlier (even at high school level), but foundational knowledge remains essential for the hard problems.

  • For Everyone: Write down what you learn. Tao’s blog came from the frustration of understanding something and then losing it. This habit of recording insights is more valuable than ever as the pace of change accelerates.

Common Pitfalls Mentioned

  • The 1-2% illusion: Looking only at AI success stories on social media creates a wildly inflated impression. On any given problem, current AI has a 1-2% success rate. It’s the scale that produces impressive-looking wins, not individual capability.

  • Confusing cleverness with intelligence: AI can try random things and find low-hanging fruit (cleverness), but cannot build cumulative understanding (intelligence). Don’t mistake one for the other.

  • Destroying serendipity: Over-optimizing your information intake (Google search instead of library browsing, scheduled meetings instead of hallway conversations) removes the randomness that drives unexpected breakthroughs.

  • Ignoring the verification problem: Generating ideas is now cheap; verifying them isn’t. Don’t invest in hypothesis generation infrastructure without investing equally in verification infrastructure.

  • Assuming the core will be automated: Tao’s experience is that AI enriches his papers (auxiliary tasks) but hasn’t changed the core of solving the hardest problems. Don’t assume the hardest parts of your work will be the first to be automated.

📚 REFERENCES & SOURCES CITED

  • Kepler’s Laws of Planetary Motion: Historical account of Kepler’s discovery process, from Platonic solids to elliptical orbits, built on Brahe’s dataset. Primary source for the episode’s central analogy.

  • Tycho Brahe’s Observations: Decades of naked-eye planetary observations, ten times more precise than any previous dataset. The “verified data bank” that made Kepler’s work possible.

  • Copernicus vs. Ptolemy: The well-documented fact that Copernicus’s heliocentric model was initially less accurate than the geocentric model it replaced. Illustrates the long verification loop for correct theories.

  • The Erdős Problems: Over 1,000 open problems posed by Paul Erdős. As of the episode, ~50 have been solved with AI assistance. Demonstrates both the power (breadth) and limits (1-2% success rate) of AI for mathematics.

  • Bode’s Law (Titius-Bode Law): A numerical relationship predicting planetary distances that initially seemed confirmed (Uranus, Ceres) but failed for Neptune. An example of a “numerical fluke” that illustrates the danger of fitting curves to too few data points.
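
Since the Titius-Bode relation is pure arithmetic, its hit-and-miss record is easy to reproduce. A minimal sketch (the semi-major-axis figures are approximate published values, not data from the episode): the rule predicts a = 0.4 + 0.3k AU with k = 0, 1, 2, 4, 8, …

```python
def bode(n):
    """Titius-Bode prediction (AU) for the n-th body, n = 0 being Mercury."""
    k = 0 if n == 0 else 2 ** (n - 1)
    return 0.4 + 0.3 * k

# Approximate actual semi-major axes in AU.
actual = {
    "Mercury": 0.39, "Venus": 0.72, "Earth": 1.00, "Mars": 1.52,
    "Ceres": 2.77, "Jupiter": 5.20, "Saturn": 9.58,
    "Uranus": 19.19, "Neptune": 30.07,
}

for n, (name, a) in enumerate(actual.items()):
    pred = bode(n)
    err = 100 * abs(pred - a) / a
    print(f"{name:8s} predicted {pred:5.1f} AU  actual {a:5.2f} AU  error {err:4.1f}%")
# Every body through Uranus lands within a few percent; Neptune misses by ~29%.
```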

  • The Riemann Hypothesis: One of the Millennium Prize Problems, concerning the distribution of prime numbers. Used as a test case for whether AI-generated proofs could be meaningful.

  • Lean Theorem Prover: A formal proof verification system. Discussed as both a tool for verifying AI-generated proofs and a potential medium for incomprehensible but correct proofs.

  • Gauss’s Prime Number Theorem Conjecture: Gauss computed prime distributions and conjectured the prime number theorem based on statistical patterns—the first “data-driven” conjecture in mathematics.
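
Gauss’s data-driven experiment is easy to reproduce today: sieve the primes and compare the count π(x) against x/ln x, the form the prime number theorem later confirmed. A rough sketch (not code from the episode; note the ratio approaches 1 only slowly):

```python
import math

def prime_count(x):
    """Count primes up to x with a simple Sieve of Eratosthenes."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return sum(sieve)

# Gauss conjectured pi(x) ~ x / ln(x) from tables like this.
for x in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    print(x, prime_count(x), round(prime_count(x) / (x / math.log(x)), 3))
```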

  • The Random Model of the Primes: A heuristic framework for understanding prime number distribution. Used as an example of the kind of semi-formal reasoning that resists formalization.
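
The random model can itself be simulated, which illustrates the heuristic the episode gestures at: include each integer n ≥ 3 independently with probability 1/ln n (a Cramér-style model) and the count of “pseudo-primes” below 10^5 lands close to the true prime count π(10^5) = 9592. A hedged sketch with illustrative names, not code from the episode:

```python
import math
import random

random.seed(0)

def cramer_count(x):
    """Cramér-style random model: keep each n >= 3 with probability 1 / ln(n)."""
    return sum(1 for n in range(3, x + 1) if random.random() < 1 / math.log(n))

x = 10 ** 5
count = cramer_count(x)
print(count)  # close to the true prime count pi(10**5) = 9592
```

The model makes sharp statistical predictions about the primes without proving anything about them, which is exactly the kind of semi-formal reasoning Tao wishes could be formalized.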

  • Darwin’s Origin of Species vs. Newton’s Principia: Darwin’s theory was conceptually simpler but published two centuries later. Used to illustrate how different types of evidence (cumulative/retrospective vs. immediate/predictive) affect the speed of scientific adoption.

🎯 AUDIENCE & RECOMMENDATION

Who Should Listen:

  • Mathematicians & Scientists: Essential. Tao provides a first-person account of how AI is changing the practice of mathematics and offers a framework for thinking about what’s coming next.
  • AI Researchers: The Kepler-LLM analogy and verification bottleneck thesis provide a useful lens for thinking about what AI systems should be optimized for.
  • Science Policymakers: The discussion of verification infrastructure, serendipity, and institutional redesign addresses the systemic challenges of AI-accelerated science.
  • Philosophers of Science: The episode raises deep questions about the nature of scientific progress, the role of explanation vs. prediction, and whether plausibility can be formalized.
  • Educators: Tao’s career advice and observations about how the nature of mathematical work is changing are directly relevant to how we train the next generation.
  • Anyone interested in the history of science: The Kepler-Brahe-Newton narrative is told with unusual clarity and depth.

Who Should Skip:

  • Casual AI users: If you’re not engaged with the questions of how AI will change science, the philosophical depth may feel abstract.
  • People seeking specific AI product recommendations: This is a conversation about the nature of science, not a tools tutorial.

Optimal Listening Strategy:

  • Speed: 1.25x is comfortable; Tao speaks clearly and at a measured pace.
  • Note-taking: Yes. Specifically track: the Kepler-LLM analogy, the verification bottleneck argument, the breadth vs. depth distinction, and Tao’s career advice.
  • Sections to pause on: The Copernicus vs. Ptolemy discussion (counterintuitive and important), the distinction between artificial cleverness and intelligence (precise vocabulary for what’s missing), the discussion of whether Lean proofs could be incomprehensible but correct (genuinely open question).
  • Follow-up: Watch Tao’s series with 3Blue1Brown on the cosmic distance ladder for more of his thinking on how to extract knowledge from limited data.

Meta Notes: Episode reviewed from transcript and audio. Rating reflects the episode’s intellectual density and originality—the Kepler-LLM analogy alone justifies the listen. Tao’s dual perspective as both a world-class mathematician and an active AI user gives the conversation unusual credibility. The transcript is one of the richest available on the intersection of AI and scientific progress.