No Priors: Andrej Karpathy on Code Agents, AutoResearch, and the AI Psychosis
PODCAST INFORMATION
- Title: 🎙️ Andrej Karpathy on Code Agents, AutoResearch, and the AI Psychosis
- Show: No Priors: AI, Machine Learning, Tech, & Startups
- Host: Sarah Guo (Conviction)
- Guest: Andrej Karpathy (Founder, Eureka Labs; former Tesla AI, OpenAI)
- Duration: 1h 7m
- Publication Date: March 20, 2026
- Original Episode: Apple Podcasts | YouTube
⚖️ VERDICT
Overall Rating: 9/10
This is one of the most intellectually dense and candid conversations about the current state of AI in 2026. Karpathy speaks from direct operational experience as both a researcher and heavy user of coding agents, offering a rare first-person account of how dramatically the engineering workflow has changed in just a few months. The episode covers an extraordinary breadth of topics, from the micro-level (how to maximize “token throughput”) to the macro-level (swarms of agents running AI research autonomously), all grounded in specific, falsifiable claims. The Dobby the Elf smart home anecdote alone makes the episode worth hearing.
🎯 ONE-SENTENCE ASSESSMENT
Coding agents have fundamentally flipped the software engineering workflow from 80% manual to 20% manual (or less) since December 2025, and the logical endpoint is removing humans entirely from iterative research loops where metrics are verifiable.
📝 REVIEW SUMMARY
What the Episode Covers
The conversation opens with Karpathy describing what he calls “AI psychosis”: a perpetual state of amazement and urgency at how dramatically coding agents have changed his workflow. He reports that in December 2025, something flipped: his ratio of writing code by hand versus delegating to agents went from 80/20 to 20/80, and he believes it has shifted even further since. He hasn’t typed a line of code since December.
The discussion moves to the concept of “token throughput” replacing GPU flops as the binding constraint on engineering productivity. Karpathy draws a parallel to his PhD days, when he felt nervous if GPUs weren’t fully utilized; now the equivalent anxiety is about maximizing agent utilization. When he has subscription capacity left over on a coding agent, he feels he hasn’t maximized his throughput, so he switches between Claude, Codex, and other tools.
The concept of a “Claude” (a persistent agent entity) receives significant attention. Karpathy praises Peter Steinberger’s OpenClaw project for simultaneously innovating on personality design, memory systems, and the single WhatsApp interface. He describes how his Claude gives him calibrated praise: not too enthusiastic for bad ideas, genuinely rewarding for good ones, creating a dynamic where he finds himself trying to “earn its praise.”
The most colorful segment covers Karpathy’s “Dobby the Elf” Claude, a home automation agent that controls his Sonos speakers, lights, HVAC, shades, pool, spa, and security system. What’s remarkable is how quickly it came together: he asked the agent to find his Sonos on the local network; it ran an IP scan, reverse-engineered the API, and was playing music within three prompts. The security system uses a Qwen model for video change detection, sending him WhatsApp notifications when a FedEx truck arrives.
AutoResearch is presented as the logical extension of removing humans from the loop. Karpathy’s motivation is clear: “I don’t want to be the researcher in the loop looking at results. I’m holding the system back.” He describes how AutoResearch, running overnight, found hyperparameter tunings he’d missed: weight decay on value embeddings and insufficiently tuned Adam betas, the two interacting jointly. This surprised him given that the repository was already “fairly well tuned” by his two decades of research experience.
The conversation explores “program.md,” a markdown file describing how an auto-researcher should work, as a specification for a research organization. Karpathy and Guo discuss the idea of a contest in which people submit different program.md files and the one that achieves the most improvement on the same hardware wins. The meta-layer: take that data and train a model to write a better program.md.
On model “jaggedness,” Karpathy delivers one of the episode’s sharpest observations: “I simultaneously feel like I’m talking to an extremely brilliant PhD student who’s been a systems programmer for their entire life and a 10-year-old.” He illustrates this with the joke test: every state-of-the-art model still gives the same joke (“Why don’t scientists trust atoms? Because they make everything up”) that it gave four years ago, despite massive improvements in agentic capabilities. This reveals that improvements in verifiable domains (code, math) don’t automatically transfer to softer domains.
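As an illustration only, here is a minimal sketch of the kind of local-network discovery the episode describes, reconstructed under the assumption that the agent looked for hosts answering on port 1400 (the port Sonos players typically use for their HTTP interface). Nothing is known from the episode about the agent’s actual code.

```python
# Hypothetical reconstruction of Dobby's first step: scan a home /24 subnet for
# hosts answering on the port Sonos players typically listen on (1400).
import socket
from concurrent.futures import ThreadPoolExecutor

SUBNET = "192.168.1."      # assumption: a typical home network
SONOS_PORT = 1400

def answers_on_sonos_port(host: str, timeout: float = 0.3) -> bool:
    try:
        with socket.create_connection((host, SONOS_PORT), timeout=timeout):
            return True
    except OSError:
        return False

hosts = [f"{SUBNET}{i}" for i in range(1, 255)]
with ThreadPoolExecutor(max_workers=64) as pool:
    candidates = [h for h, up in zip(hosts, pool.map(answers_on_sonos_port, hosts)) if up]

print("possible Sonos devices:", candidates)
```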
On model speciation, Karpathy argues the industry will move away from “monoculture” general-purpose models toward specialized intelligences, similar to animal kingdom diversity. Some models will overdevelop certain capabilities for specific niches.
The jobs discussion centers on Bureau of Labor Statistics data Karpathy visualized. He identifies a key distinction: digital professions (remote-capable work manipulating information) will change dramatically, while physical-world professions will lag behind. His cautiously optimistic view on software engineering demand invokes the Jevons paradox: cheaper software creates more demand, just as ATMs led to more bank branches.
On open vs. closed source, Karpathy notes open source is currently about 6-8 months behind frontier models and draws a parallel to Linux: industry demand for a common open platform will keep it viable. He expresses concern about the centralization of intelligence, citing his Eastern European background as a reminder of the dangers of centralization.
On robotics, his self-driving experience informs a skeptical but structured view: digital space changes first (speed of light for bits), then physical-digital interfaces (sensors, actuators), then physical manipulation (atoms are a million times harder). He predicts the physical world opportunity is ultimately larger but will lag significantly.
The episode closes with MicroGPT, a 200-line Python implementation of GPT training. Karpathy argues this represents a shift in education: he’s no longer explaining to humans; he’s explaining to agents. If agents understand the code, they become the router to human understanding. Documentation should be written in markdown for agents, not HTML for humans.
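For readers who want a feel for what “bare essence” means here, a minimal sketch of the training loop MicroGPT distills: predict the next token, score with cross-entropy, step the optimizer. This is my own toy illustration (a bigram table standing in for the transformer), not Karpathy’s MicroGPT code.

```python
# Toy next-token training loop: the skeleton that remains once efficiency
# optimizations are stripped away. Not MicroGPT itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "hello agents hello world"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

table = nn.Embedding(len(chars), len(chars))   # P(next token | current token)
opt = torch.optim.Adam(table.parameters(), lr=1e-2)

for step in range(200):
    x, y = data[:-1], data[1:]                 # each token predicts its successor
    loss = F.cross_entropy(table(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```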
Who Created It & Why It Matters
Sarah Guo (Conviction) demonstrates skilled interviewing throughout, drawing out specific examples and pushing Karpathy toward concrete predictions. Her question about whether people “really want all the software we have today” connects Karpathy’s Dobby anecdote to a broader thesis about app consolidation.
Andrej Karpathy brings rare credibility: a decade-plus researcher who has trained models “thousands of times,” built Tesla’s AI team, contributed to OpenAI, and now operates as an independent researcher and heavy coding agent user. He’s simultaneously inside the frontier (building cutting-edge tools) and outside it (not employed by a lab), giving him a unique perspective on what’s coming.
Core Argument & Evidence
The central thesis operates on multiple levels:
The engineering workflow has permanently changed: Coding agents crossed a capability threshold around December 2025. The constraint is no longer typing speed or even domain knowledge; it’s the ability to effectively instruct, parallelize, and review agent work. This is a skill issue, not a capability issue.
The goal is to remove yourself as the bottleneck: Maximum leverage means maximum token throughput with minimum human involvement. AutoResearch is the purest expression: a verifiable metric, a search space, and an agent loop. The human arranges once and hits go.
Jagged intelligence is real and persistent: Models are simultaneously superhuman and subhuman depending on the domain. Improvements in verifiable domains don’t automatically generalize. This has implications for trust, safety, and capability forecasting.
The industry will restructure around agents: Apps become API endpoints. Agents become the glue. Documentation shifts from humans to agents. The “customer” of software is no longer the human; it’s the agent acting on behalf of the human.
Compute (flops) may become more important than capital: Karpathy muses about a future where flops, not dollars, are the dominant resource, with distributed compute swarms potentially outcompeting centralized frontier labs.
Practical Applications
For engineers: Start maximizing your token throughput today. Use multiple agents in parallel. Develop the skill of macro-action delegation: giving agents large tasks rather than line-by-line instructions. Review agent work strategically, based on how much you care about the code.
For researchers: If your work involves verifiable metrics, AutoResearch-style loops will likely outperform manual experimentation. Consider what parts of your workflow can be fully automated and what still requires human judgment.
For founders: The “agent-first” software paradigm is emerging. Build APIs, not apps. Think about what your product looks like when the customer is an agent, not a human. The middleware layer between agents and services is an open design space.
For educators: Write documentation for agents, not humans. If an agent understands the material, it can explain it to any human in their language with infinite patience. The teacher’s new job is the “few bits” of insight that agents can’t generate.
🧠 INSIGHTS
Strengths
First-person operational experience: Karpathy isn’t theorizing about coding agents; he’s been using them 16 hours a day for months. Every claim is grounded in direct experience.
Specific, falsifiable claims: “Haven’t typed a line since December,” “AutoResearch found weight decay on value embeddings I missed,” the joke test. These are claims that can be checked.
Multi-scale thinking: The conversation moves fluidly between individual productivity hacks, organizational design (program.md), and civilizational implications (flops replacing dollars). This connects the micro to the macro.
Honest about limitations: Karpathy openly describes agent frustrations (“I get so frustrated with the agents all the time”), the “jaggedness” problem, and his own nervousness about not being at the frontier.
Calibrated optimism: The Jevons paradox argument for software engineering jobs is genuinely hopeful without being naive. The open source analysis is balanced: neither alarmist nor dismissive.
Limitations & Gaps
Selection bias in experience: Karpathy is an elite researcher with two decades of ML experience. His ability to instruct, review, and parallelize agents may not generalize to junior engineers or non-technical users. The “skill issue” framing could be dismissive of genuine capability gaps.
Security concerns acknowledged but unresolved: Karpathy admits he hasn’t given his Claude access to email or calendar due to security/privacy concerns, and mentions the “dodginess” of running arbitrary code from untrusted workers. These are real blockers that the episode raises but doesn’t resolve.
AutoResearch scope is narrow: The demonstration is on hyperparameter tuning of GPT-2 models, which is well suited to automated search (clear metrics, fast iteration). The extension to harder scientific research, where metrics are ambiguous or experiments are expensive, remains speculative.
The joke test is anecdotal: While compelling, the “atoms” joke example is a single data point. It would be stronger with a systematic evaluation of joke diversity across model generations.
How This Connects to Broader Trends
The end of typing: If Karpathy’s experience generalizes, the keyboard becomes less important than voice/communication skills. The “whispering to agents” workplace described by Guo’s portfolio company may become standard.
Software overproduction: The thesis that “apps shouldn’t exist” and should be replaced by API endpoints + agents suggests massive disruption to the app economy and SaaS business models.
Recursive self-improvement: AutoResearch as a stepping stone to models improving models is the same trajectory frontier labs are pursuing. The question of whether distributed swarms can compete with centralized compute is an open and consequential one.
Education’s identity crisis: If agents explain better than humans, what remains of teaching? Karpathy’s answer (the “few bits” of creative insight) is both reassuring and limiting.
🏗️ KEY FRAMEWORKS PRESENTED
Token Throughput as the Binding Constraint
Karpathy’s reframing of engineering productivity from GPU flops to tokens per second.
Components:
- Old constraint: Typing speed, domain knowledge, individual capability
- New constraint: How many tokens you can command across multiple agents
- Metric: Token throughput (analogous to GPU utilization in his PhD days)
- Optimization: Run multiple agents in parallel, switch between providers to maximize subscription utilization
Application: Measure and maximize your daily token consumption. If you have capacity left on any agent platform, you’re leaving leverage on the table.
Significance: Reframes individual productivity from “what you know” to “how well you orchestrate.” Makes the engineering bottleneck explicitly a management/coordination problem rather than a knowledge problem.
Evidence: Karpathy’s shift from 80/20 to 20/80 (or more) manual-to-agent coding ratio since December 2025.
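A toy sketch of what “maximizing token throughput” might look like in practice: one backlog fanned out across several providers so no subscription sits idle. run_agent is a hypothetical stand-in for whatever CLI or API each provider exposes; the provider names and tasks are illustrative, not from the episode.

```python
# Hypothetical orchestration sketch: dispatch tasks across multiple agent
# providers concurrently instead of working through them one at a time.
import asyncio

TASKS = [
    "refactor the config loader",
    "write integration tests for the billing path",
    "profile the tokenizer and report hotspots",
]
PROVIDERS = ["claude", "codex", "other"]

async def run_agent(provider: str, task: str) -> str:
    # Placeholder: a real version would shell out to the provider's agent CLI/API.
    await asyncio.sleep(0.1)
    return f"[{provider}] done: {task}"

async def main() -> None:
    jobs = [run_agent(PROVIDERS[i % len(PROVIDERS)], task) for i, task in enumerate(TASKS)]
    for result in await asyncio.gather(*jobs):
        print(result)

asyncio.run(main())
```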
AutoResearch: The Autonomous Research Loop
Removing humans from iterative research where metrics are verifiable.
Components:
- Verifiable objective: A metric that can be computed (e.g., validation loss)
- Search space: Code changes, hyperparameters, architecture choices
- Agent loop: Propose change, evaluate, keep or discard, repeat
- Human role: Arrange once, set boundaries, hit go
Application: Any optimization problem with a clear metric is a candidate for AutoResearch: kernel optimization, hyperparameter tuning, compiler optimization, materials science simulations.
Significance: Demonstrates that AI can improve AI without a human bottleneck. The frontier labs’ recursive self-improvement strategy, operationalized.
Evidence: AutoResearch found weight decay on value embeddings and jointly interacting Adam beta tunings that Karpathy missed despite two decades of experience.
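A minimal sketch of the loop structure described above, under the assumption that evaluate stands in for “train a small model and return validation loss.” The search strategy (random local perturbation) and the hyperparameter names are illustrative, not Karpathy’s actual AutoResearch implementation.

```python
# Sketch of the propose / evaluate / keep-or-discard loop with no human in it.
import random

def evaluate(config: dict) -> float:
    # Placeholder objective; in the real setting this is a full training run
    # returning validation loss.
    return (config["weight_decay"] - 0.1) ** 2 + (config["adam_beta2"] - 0.95) ** 2

best = {"weight_decay": 0.0, "adam_beta2": 0.999}
best_loss = evaluate(best)

for step in range(500):
    candidate = dict(best)
    key = random.choice(list(candidate))          # propose a change
    candidate[key] += random.gauss(0, 0.02)
    loss = evaluate(candidate)                    # verify against the metric
    if loss < best_loss:                          # keep or discard
        best, best_loss = candidate, loss

print(best, best_loss)
```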
Program.md as Organization Specification
A markdown file describing how an autonomous research entity should operate.
Components:
- Roles defined in markdown (what the auto-researcher does)
- Process flow (what order to try things)
- Idea queue (researchers contribute ideas, workers pull from the queue)
- Feature branch model (successful experiments merge to main)
Application: Describe your research organization as a set of markdown files. Different program.md files produce different research organizations with different risk profiles and efficiencies.
Significance: Makes research organization design a programmable, optimizable artifact. The meta-layer: a program.md that writes better program.md files.
Evidence: Karpathy describes how his “crappy attempt” at program.md still produced useful results, and the contest idea (different program.md files competing on same hardware) would generate data for meta-optimization.
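A hypothetical sketch of the worker side of this pattern: pull the next unclaimed idea out of a program.md-style queue and try it on a branch. The markdown layout (a checkbox list as the idea queue) is my assumption, not a format described in the episode.

```python
# Worker loop sketch: read the spec, take the first unclaimed idea, run it.
PROGRAM_MD = """\
## Idea queue
- [x] sweep learning rate warmup
- [ ] add weight decay on value embeddings
- [ ] retune Adam betas jointly with weight decay
"""

def next_idea(program: str) -> str | None:
    for line in program.splitlines():
        if line.strip().startswith("- [ ]"):      # first idea nobody has claimed
            return line.strip()[5:].strip()
    return None

idea = next_idea(PROGRAM_MD)
if idea:
    print(f"spinning up a feature branch to try: {idea}")
    # run the experiment; merge to main only if the metric improves
```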
The Jagged Intelligence Model
Models are simultaneously superhuman and subhuman depending on domain.
Components:
- On-rails: Verifiable domains (code, math) where RL training produces superhuman performance
- Off-rails: Softer domains (jokes, nuance, intent) where models stagnate
- The joke test: Same joke from 4 years ago despite massive capability improvements elsewhere
- Implication: Intelligence doesn’t transfer uniformly across domains
Application: Don’t extrapolate from one domain’s improvement to another. A model that can run for hours on agentic tasks may still fail at simple creative or empathetic tasks.
Significance: Challenges the narrative that “general intelligence” emerges from scaling. Suggests that RL training creates pockets of excellence rather than uniform capability.
Evidence: The “Why don’t scientists trust atoms? Because they make everything up” joke persisting across model generations despite agentic capability jumps.
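As an illustration of what a more systematic version of the joke test might look like (addressing the single-anecdote limitation noted earlier): sample many jokes per model and measure how many are distinct. generate_joke is a hypothetical stand-in for an actual model call.

```python
# Sketch of a joke-diversity metric across model generations.
from collections import Counter

def generate_joke(model: str) -> str:
    # Placeholder: imagine this prompts the model with "tell me a joke".
    return "Why don't scientists trust atoms? Because they make everything up."

def joke_diversity(model: str, n: int = 100) -> float:
    jokes = Counter(generate_joke(model).strip().lower() for _ in range(n))
    return len(jokes) / n          # 1.0 = all distinct, 1/n = the same joke every time

print(joke_diversity("frontier-model-2026"))
```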
The Digital-to-Physical Hierarchy
A three-stage model for where AI transformation will occur first.
Stage 1 - Digital: Information processing, software, bits. Speed of light. Happening now.
Stage 2 - Interfaces: Sensors (cameras, lab equipment) and actuators bridging physical to digital. Next.
Stage 3 - Physical: Robotics, atoms, manufacturing. Biggest market but hardest. Last.
Application: Prioritize opportunities in digital space now. Invest in physical-digital interfaces as the next wave. Robotics is a bigger opportunity but will lag by years.
Significance: Provides a sequencing framework for AI investment and career planning. Digital-first, then interfaces, then physical.
Evidence: Self-driving experience (10 years of capital-intensive work), the fundamental physics of bits vs. atoms (copy/paste vs. moving matter).
💬 NOTABLE QUOTES
“Code’s not even the right verb anymore. I have to express my will to my agents for 16 hours a day. Manifest.” [Audio context: Said with a mix of exhaustion and exhilaration, describing the December 2025 shift] Significance: Captures the fundamental workflow change in engineering. The job shifts from “writing code” to “directing agents.”
“I simultaneously feel like I’m talking to an extremely brilliant PhD student who’s been like a systems programmer for their entire life and a 10-year-old.” [Audio context: Said with genuine bewilderment, describing the jaggedness of model capabilities] Significance: The most vivid description of the jaggedness problem. Models are not uniformly intelligent they have extreme peaks and valleys.
“I don’t want to be the researcher in the loop looking at results. I’m holding the system back.” [Audio context: Said with conviction, the philosophical core of AutoResearch] Significance: The inversion of the traditional researcher’s self-image from indispensable to bottleneck.
“Is flop the thing that actually everyone cares about in the future? Is there going to be a flipping almost of what the thing that you care about?” [Audio context: Speculative, musing tone, connecting resource economics to AI scaling] Significance: Proposes that compute access may become more important than financial capital, with implications for economic structure.
“I’m not explaining to people anymore. I’m explaining it to agents. If you can explain it to agents, then agents can be the router and they can actually target it to the human in their language.” [Audio context: Said with clarity, describing the education paradigm shift around MicroGPT] Significance: Reframes the purpose of documentation and education from human-facing to agent-facing.
“This is like infinite and everything is skill issue.” [Audio context: Said during the “psychosis” discussion, with a mix of excitement and overwhelm] Significance: Captures both the boundless opportunity and the frustrating subjectivity of agent orchestration skill.
“The industry just has to reconfigure in so many ways that it’s like the customer is not the human anymore. It’s like agents who are acting on behalf of humans.” [Audio context: Matter-of-fact, strategic observation about software industry restructuring] Significance: Frames the most important business model shift in software: from human-facing UX to agent-facing APIs.
“Centralization has a very poor track record. I’m Eastern European.” [Audio context: Said with dry humor and genuine conviction about open source’s importance] Significance: Grounds the open-source AI argument in historical/political experience rather than pure technology preference.
📋 APPLICATIONS & HABITS
Practical Guidance from the Episode
Maximize token throughput: Run multiple coding agents in parallel. Switch between Claude, Codex, and other providers to avoid idle capacity. Treat unused subscription as wasted potential.
Delegate in macro-actions: Don’t give agents line-by-line instructions. Assign whole features, research tasks, or system components. Review strategically based on importance.
Write for agents: Shift documentation from HTML for humans to markdown for agents. If agents understand your codebase, they can explain it to anyone.
Automate verifiable loops: If your work has objective metrics, build AutoResearch-style loops. The human sets the boundaries and the metric; the agent optimizes.
Develop the orchestration muscle: The bottleneck is no longer knowledge or typing speed; it’s the ability to effectively manage multiple agents. This is a learnable skill with clear returns.
Design APIs, not apps: If you’re building software, think about the agent-first future. Expose APIs. The middleware layer between human intent and services is the opportunity.
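A hypothetical sketch of the “agent-first” endpoint idea: the product is an API an agent can call on a user’s behalf, not a screen a human taps. FastAPI and the endpoint shape are my own choices for illustration; the episode doesn’t prescribe any stack.

```python
# Agent-first service sketch: expose intent-level actions as API endpoints.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Home control for agents")

class PlayRequest(BaseModel):
    room: str
    query: str        # the agent supplies intent ("rain sounds"), not UI clicks

@app.post("/play")
def play(req: PlayRequest) -> dict:
    # A real service would resolve the query and call the speaker's API here.
    return {"status": "ok", "room": req.room, "playing": req.query}
```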
Common Pitfalls Mentioned
Not giving good enough instructions: When agents fail, it’s often a “skill issue” in how you specified the task, not a capability gap. Iterate on your agents.md file, memory tools, and instruction clarity.
Reviewing everything equally: Don’t spend the same review effort on a throwaway script as on production-critical code. Calibrate your oversight to how much you care about the output.
Extrapolating across domains: A model that’s superhuman at code may be terrible at jokes or nuanced communication. Don’t assume uniform intelligence.
Being the bottleneck: If you’re in the loop for every iteration, you’re limiting throughput. Arrange systems to run autonomously wherever possible.
📚 REFERENCES & SOURCES CITED
OpenClaw (Peter Steinberger): Karpathy praises the project for simultaneous innovation in personality design, memory systems, and WhatsApp interface. Demonstrates the “Claude” concept of persistent agent entities.
Dobby the Elf: Karpathy’s home automation Claude controlling Sonos, lights, HVAC, shades, pool, spa, and security system. Uses IP scanning, API reverse engineering, and a Qwen model for video analysis.
AutoResearch: Karpathy’s project for autonomous LLM optimization. Found hyperparameter tunings (weight decay on value embeddings, Adam betas) that manual tuning missed.
MicroGPT: A 200-line Python implementation of GPT training, representing the “bare essence” of LLM training. All complexity in modern training code is from efficiency optimizations, not the algorithm itself.
Bureau of Labor Statistics (BLS): Job market data Karpathy visualized, including profession-level growth projections and the distinction between digital and physical roles.
Jevons Paradox / ATM Analogy: The classic economic concept that efficiency gains increase total demand, applied to software engineering. Bank branches and tellers increased after ATMs reduced branch operating costs.
Linux as Open Source Analogy: Karpathy draws the parallel between Linux (running on ~60% of computers) and open-source AI models serving most use cases while closed models handle frontier intelligence.
“Daemon” by Daniel Suarez: Referenced as an inspiring book in which an AI puppeteers humanity, with humans serving as both sensors and actuators for the intelligence.
SETI@Home / Folding@Home: Karpathy compares the distributed AutoResearch concept to these projects, where work is expensive to find but cheap to verify.
🎯 AUDIENCE & RECOMMENDATION
Who Should Listen:
- Software Engineers: Essential. The workflow shift Karpathy describes is happening now. Understanding how to orchestrate agents is the most relevant skill for 2026.
- AI Researchers: The AutoResearch concept and program.md framework provide a concrete vision for autonomous research. The hyperparameter findings are motivating.
- Tech Founders/Executives: The “customer is the agent” thesis has massive implications for product strategy, business models, and organizational design.
- Educators: The “explain to agents, not humans” paradigm shift directly impacts how educational content should be created.
- Anyone in AI: The jaggedness model, speciation prediction, and open vs. closed source analysis provide a nuanced, grounded view of where the field is heading.
Who Should Skip:
- Non-technical listeners: The conversation assumes familiarity with coding agents, hyperparameter tuning, and ML concepts. Without this context, much of the substance will be inaccessible.
- People seeking AI hype/takes: This isn’t a predictions-for-clout conversation. It’s dense, specific, and operational. If you want hot takes, look elsewhere.
Optimal Listening Strategy:
- Speed: 1.5x is comfortable. Karpathy speaks clearly and the technical density rewards focused listening.
- Note-taking: Track the specific tools and techniques mentioned (OpenClaw, agent parallelization, program.md concept). Note the AutoResearch findings for your own experimentation.
- Sections to pause on: The Dobby anecdote (5:00-10:00 approximately) for pure entertainment; the AutoResearch discussion (15:51-22:45) for technical depth; the jobs market analysis (37:28-48:25) for career implications.
- Follow-up: Try running multiple coding agents in parallel on your own projects. The experience of “token throughput” is more convincing than any description.
Meta Notes: Episode reviewed from audio/video and provided transcript. Timestamp references verified against show notes. Quotes are verbatim from transcript. Rating reflects episode density, candor, and practical applicability. Karpathy’s experience as an elite researcher may not fully generalize, but the directional insights about agent orchestration are broadly relevant.
Crepi il lupo! 🐺