Stanford CS153 Frontier Systems | Jensen Huang from NVIDIA on the Compute Behind Intelligence

Stanford CS153: Frontier Systems, Spring 2026
Guest Lecturer: Jensen Huang, CEO of NVIDIA
Instructors: Anjney Midha (AMP PBC) & Mike Abbott (ex-Apple/Twitter/KP)


I’m not sure there’s a better window into where NVIDIA is heading than this 65-minute lecture. CS153 has become the most oversubscribed class at Stanford, with 500 students, a waitlist, and a speaker roster that reads like a Davos dinner table. Jensen showed up and basically gave a masterclass on why computing as we know it is ending.

His opening claim stopped me: computing is being reinvented for the first time in 64 years. Since the IBM System/360, we’ve been running prerecorded instructions. Software was a frozen set of commands. Jensen’s argument is that we’ve flipped to real-time generation: models don’t execute code so much as generate answers on the fly, and the entire hardware stack has to be rebuilt around that reality.

The numbers are obscene. NVIDIA’s co-design across chips, compilers, switches, and networking delivered a million-fold speedup over the last decade. Moore’s Law over that same period would’ve given you 100x. Jensen’s point isn’t that Moore’s Law is dead; it’s that it was never enough for this problem.
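To put those two figures on the same footing, here is a quick back-of-the-envelope sketch that converts each ten-year speedup into an implied per-year factor and doubling time. The function names are mine, the only inputs are the two numbers quoted above, and it assumes steady exponential growth, which real hardware generations don't follow exactly.

```python
import math

def annual_factor(total_speedup: float, years: float) -> float:
    """Constant per-year multiplier that compounds to total_speedup over `years`."""
    return total_speedup ** (1.0 / years)

def doubling_time_years(total_speedup: float, years: float) -> float:
    """Years needed to double, assuming steady exponential growth."""
    return years * math.log(2) / math.log(total_speedup)

YEARS = 10
for label, speedup in [("co-design", 1_000_000), ("Moore's Law", 100)]:
    print(f"{label}: {annual_factor(speedup, YEARS):.2f}x per year, "
          f"doubling every {doubling_time_years(speedup, YEARS):.2f} years")
```

That works out to roughly 4x per year (doubling about every six months) for the co-design figure, versus about 1.6x per year (doubling about every 18 months) for the Moore's Law figure.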


The Architecture Story

Jensen walked through NVIDIA’s silicon roadmap in a way I haven’t heard him do before. Each generation maps to a specific phase of the AI lifecycle:

Hopper: built for pre-training. Get the weights right.

Grace Blackwell NVLink72: designed for inference and decode. The insight here is that token generation is memory-bandwidth bound. One chip can’t feed itself fast enough, so they ganged up 72 of them with a custom interconnect. Result: 50x improvement over Hopper in two years. Moore’s Law would have done 2x.

Vera Rubin: built for agents. Jensen’s framing: “The goal isn’t just to think. The goal is to do work.” Agents have a different compute pattern than models. They need persistent context, tool use, and the ability to iterate. Vera Rubin is the first NVIDIA architecture designed from scratch for that workload.

Feynman (in development): designed for swarms of agents and sub-agents. Jensen mentioned this almost in passing, but it’s the most telling part of the talk. NVIDIA is already thinking about what comes after agents: systems where agents spawn other agents, where the compute pattern is less like a single brain and more like a colony.
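The memory-bandwidth point in the Grace Blackwell entry can be made concrete with a rough roofline-style bound. At batch size 1, generating each token means reading roughly every weight once, so the decode rate can't exceed memory bandwidth divided by model size in bytes. This is my own sketch, not anything shown in the lecture: it ignores KV-cache traffic, interconnect overhead, batching, and whether the model even fits in one chip's memory, and the model size and bandwidth figures are illustrative placeholders, not NVIDIA specs.

```python
def max_decode_tokens_per_sec(params: float, bytes_per_param: float,
                              mem_bandwidth_bytes_per_sec: float,
                              num_chips: int = 1) -> float:
    """Upper bound on batch-1 decode rate when weight reads dominate.

    Assumes weights are sharded evenly across `num_chips`, so aggregate
    bandwidth scales linearly (an idealization; real interconnects add cost).
    """
    model_bytes = params * bytes_per_param
    aggregate_bandwidth = mem_bandwidth_bytes_per_sec * num_chips
    return aggregate_bandwidth / model_bytes

# Hypothetical numbers for illustration only.
PARAMS = 1e12            # 1-trillion-parameter model
BYTES_PER_PARAM = 1      # 8-bit weights
BANDWIDTH = 4e12         # 4 TB/s of memory bandwidth per chip

one_chip = max_decode_tokens_per_sec(PARAMS, BYTES_PER_PARAM, BANDWIDTH)
rack = max_decode_tokens_per_sec(PARAMS, BYTES_PER_PARAM, BANDWIDTH, num_chips=72)
print(f"1 chip:   at most {one_chip:.0f} tokens/s")
print(f"72 chips: at most {rack:.0f} tokens/s")
```

Under these made-up numbers a single chip tops out at 4 tokens/s and 72 chips at 288. Nothing about the FLOPs changed; only the pooled bandwidth did, which is the intuition behind ganging chips together for decode.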


MFU Is a Distraction

Jensen pushed back on Model FLOP Utilization as a metric. His argument: MFU measures how hard you’re using the silicon, not whether you’re solving the right problem. He prefers tokens-per-watt and real-world evals. This matters because the industry has been obsessing over utilization numbers while missing the actual goal: useful work per unit of energy.

I think he’s right. MFU has become a vanity metric. Everyone wants to say they’re hitting 60% utilization. But if your architecture is wrong, 60% of the wrong thing is still wrong.
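Here is a small sketch of how the two metrics can disagree. It uses the common approximation that a forward pass costs about 2 FLOPs per parameter per token (attention FLOPs ignored), and treats "tokens per watt" as tokens per second per watt, which is the same as tokens per joule. Both systems and every number in them are invented for illustration; none of this comes from the lecture.

```python
def mfu(tokens_per_sec: float, params: float, peak_flops: float,
        flops_per_param_per_token: float = 2.0) -> float:
    """Model FLOP Utilization: useful model FLOPs/s over peak hardware FLOPs/s.

    The default of 2 FLOPs per parameter per token approximates a
    forward-only pass; training is usually estimated at about 6.
    """
    return tokens_per_sec * params * flops_per_param_per_token / peak_flops

def tokens_per_joule(tokens_per_sec: float, power_watts: float) -> float:
    """Tokens/s per watt, which is the same thing as tokens per joule."""
    return tokens_per_sec / power_watts

# Two hypothetical systems serving the same 70B-parameter model at 10 kW.
systems = {
    "A": dict(tokens_per_sec=3.0e4, params=7e10, peak_flops=7e15, power_watts=1.0e4),
    "B": dict(tokens_per_sec=4.5e4, params=7e10, peak_flops=2.1e16, power_watts=1.0e4),
}
for name, s in systems.items():
    u = mfu(s["tokens_per_sec"], s["params"], s["peak_flops"])
    e = tokens_per_joule(s["tokens_per_sec"], s["power_watts"])
    print(f"system {name}: MFU {u:.0%}, {e:.2f} tokens/J")
```

System A posts the bragging-rights 60% MFU, while system B sits at 30% yet produces 1.5x as many tokens per joule. Ranking by utilization picks the wrong machine.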


Open Models and the Democratization Argument

He defended NVIDIA’s open model strategy (Nemotron, BioNeMo, Alpamayo) on three grounds:

Safety. Closed models create a monoculture. If one architecture has a vulnerability, everyone running that model is exposed. Open models let researchers find and fix problems independently.

Transparency. You can’t audit what you can’t see. For high-stakes applications (medicine, law, critical infrastructure), black-box models are a liability.

Democratization. Most languages don’t have GPT-level support. Most scientific domains don’t have specialized models. Open models let the long tail fill itself in.

This is self-serving, of course. NVIDIA sells chips whether the model is open or closed. But it’s also true.


The Energy Question

Jensen’s energy projection: roughly 1000x more compute power needed than we currently have.

He framed this as the strongest market signal in history for sustainable energy investment. Not a prediction. A necessity. The grid has to grow, nuclear has to scale, and the efficiency curve has to bend faster than token demand grows.

I keep coming back to this number. A thousand times. We’re not optimizing our way out of that. We’re building our way out.
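The split between optimizing and building is simple division, and it's worth seeing how much is left over. This sketch assumes energy equals work divided by efficiency, takes the roughly 1000x figure as the growth in demand, and plugs in efficiency gains that are purely hypothetical.

```python
def energy_growth(demand_growth: float, efficiency_gain: float) -> float:
    """Factor by which energy supply must grow.

    Assumes energy = work / efficiency, so the two factors simply divide.
    """
    return demand_growth / efficiency_gain

DEMAND_GROWTH = 1000  # the roughly 1000x figure from the talk
for gain in (1, 10, 100):  # hypothetical efficiency improvements
    needed = energy_growth(DEMAND_GROWTH, gain)
    print(f"{gain:>3}x more efficient -> {needed:.0f}x more energy")
```

Even a 100x efficiency gain would leave a 10x gap to fill with new generation.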


What Stuck With Me

Jensen talks about NVIDIA’s organizational structure the same way he talks about their chips, as a system design problem. Extreme co-design applies to the company too. Flat structure, direct communication, no middle management filtering information. He doesn’t present this as a management philosophy. He presents it as an engineering requirement. If information moves slowly, the chips will be wrong.


Crepi il lupo! 🐺