Stop Wasting Tokens: How Senior AI Engineers Cut Their Coding Bills

Credit: This guide is based on a thread by @DeRonin_ on X, citing Andrej Karpathy’s observation that 90% of your AI coding bill pays for context you did not need to send.

Karpathy’s number sounds right. Most people using AI coding tools have no idea where their tokens go. The thread lays out exactly which habits burn money and what to replace them with.

Here is the breakdown.

What to Stop Doing

Loading fifty files for a three-line fix. Agents that slurp whole directories spend roughly $1.20 per turn on context that never gets read, which works out to about 80% waste on every request. Grep first. Find the relevant file, then hand only that file to the model.
Running your most expensive model on cleanup work. Linting, formatting, renaming, small refactors. Paying Opus rates for these means spending $0.60 for work a cheap model does for $0.02. Keep the expensive model for architecture and reasoning. Let the cheap one sweep the floor.
Letting tool loops resend everything on every retry. Each retry in an agentic cycle reloads the full context window. Five retries means five times the cost before you see a result. The thread estimates that fixing retry handling alone cuts most bills by 30-50%.
Defaulting to Sonnet for everything. According to the thread, Kimi 2.6 matches Sonnet on most coding tasks at one-sixth the cost. If your default model has not changed since 2025, you are paying 60-70% more than you need to. The best default has changed. Update yours.
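Skipping prompt caching on repeatable workflows. When the first 90% of your prompt never changes, resending it uncached on every call can cost ten times what the same work should. The thread pins this on streaming, but on the major APIs what breaks caching is a prefix that changes between calls, not streaming itself. Put the stable part first, keep it byte-identical, and cache it as a block.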
Including files just in case. A prompt that should be 3,000 tokens balloons to 80,000 because someone threw in every vaguely related file. Context bloat adds up silently. Be precise about what you include.
Rebuilding knowledge every session. Ten minutes writing a SKILL.md once turns a $4 session into a $0.30 one. The model stops rediscovering your conventions and preferences on every run. The savings accumulate with every session that reuses the file.
Sticking to one model for everything. Running a premium model on every task is the single most expensive mistake in AI coding right now. Route cheap models to simple tasks and expensive ones to hard problems. That split is where the savings live.
Asking ten small questions one at a time. Every call is billed for the shared input prefix again. Batching ten small questions into one prompt saves 70-90% on routine workflows. The model handles multiple requests just fine.
Keeping every subscription active. Claude Pro, ChatGPT Plus, Cursor Pro, Copilot. Most people actively use one of these. The rest are recurring payments for habits that formed during a trial period. Audit your list. Kill the ghosts.
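
To make the retry and batching numbers concrete, here is a minimal arithmetic sketch. The $3.00 per million input tokens price, the 80,000-token context, and the 3,000-token prefix with 100-token questions are all assumptions chosen for illustration, not figures from the thread. Output tokens are ignored, so real savings will differ.

```python
# Assumed input price in dollars per million tokens. Check your provider's rates.
PRICE_PER_MTOK = 3.00


def input_cost(tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Dollar cost of sending `tokens` input tokens."""
    return tokens * price_per_mtok / 1_000_000


def loop_tokens(context_tokens: int, sends: int) -> int:
    """Input tokens billed when a tool loop sends the full context `sends` times."""
    return context_tokens * sends


def separate_tokens(prefix: int, question: int, n: int) -> int:
    """Input tokens when each of n questions is its own call with the same prefix."""
    return n * (prefix + question)


def batched_tokens(prefix: int, question: int, n: int) -> int:
    """Input tokens when all n questions share one call and one prefix."""
    return prefix + n * question


def savings(before: int, after: int) -> float:
    """Fraction of input tokens saved by going from `before` to `after`."""
    return 1 - after / before


if __name__ == "__main__":
    # A bloated 80,000-token context sent five times by a retry loop.
    looped = loop_tokens(80_000, 5)
    print(f"retry loop: {looped} tokens, ${input_cost(looped):.2f}")

    # Ten small questions: one call each versus one batched call.
    one_by_one = separate_tokens(3_000, 100, 10)
    batched = batched_tokens(3_000, 100, 10)
    print(f"batching saves {savings(one_by_one, batched):.0%} of input tokens")
```

Under these assumptions the ten separate calls send 31,000 input tokens and the batched call sends 4,000, a saving of about 87%, which lands inside the 70-90% range quoted above. The smaller the prefix relative to each question, the less batching helps.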

What Actually Compounds

The people spending least while getting most do these instead.

Context discipline. Grep before fetch. Always. Know which files you need before asking the model to look at anything.
Prompt caching on every stable prefix. Structure repeated workflows so the first 90% of the prompt is identical every time. Cache hits turn those tokens into cents.
Multi-model routing. Kimi 2.6 as the daily driver. Opus for the 10% of tasks that genuinely need it. A mid-tier model for whatever falls in between. Match the model to the difficulty.
Graduated skills via SKILL.md files. Write the skill once. Refine it through use. Stop paying the context tax for re-explaining your workflow on every run.
Profiling tool calls before optimizing prompts. Most people tweak wording when the real waste is in how many times a tool re-reads the same files. Profile the retries first. Fix the loops. Then touch the prompt.
The routing mindset. Right model for the right job. Right context for the right scope. Right infrastructure for the right workflow. Senior engineers think in tiers, not defaults.
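
The routing and caching habits above can be sketched in a few lines. This is an illustration under stated assumptions, not the thread's implementation: the tier labels, the task categories, and the `route` and `build_prompt` helpers are all hypothetical, and no provider API is called. The point is the shape: pick a tier per task, and assemble every prompt with the stable part first so the prefix stays byte-identical between calls.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical tier labels. Map them to whichever models you actually use.
CHEAP, MID, PREMIUM = "cheap", "mid", "premium"

# Assumed task categories, loosely following the examples in this guide.
MID_TASKS = {"feature", "multi_file_refactor"}
PREMIUM_TASKS = {"architecture", "hard_reasoning"}


def route(task_type: str) -> str:
    """Pick a model tier for a task. The cheap daily driver is the default."""
    if task_type in PREMIUM_TASKS:
        return PREMIUM
    if task_type in MID_TASKS:
        return MID
    return CHEAP


@dataclass(frozen=True)
class Prompt:
    stable_prefix: str    # conventions, SKILL.md content: identical on every call
    variable_suffix: str  # the part that changes: the actual task

    def text(self) -> str:
        """Full prompt with the stable part first, so a prefix cache can match it."""
        return self.stable_prefix + "\n\n" + self.variable_suffix

    def prefix_key(self) -> str:
        """Hash of the stable prefix, useful for spotting accidental prefix drift."""
        return hashlib.sha256(self.stable_prefix.encode("utf-8")).hexdigest()


def build_prompt(skill_md: str, task: str) -> Prompt:
    """Assemble a prompt from a SKILL.md body and a task description."""
    return Prompt(stable_prefix=skill_md.strip(), variable_suffix=task.strip())


if __name__ == "__main__":
    skill = "Use tabs. Prefer small functions. Run tests before committing."
    first = build_prompt(skill, "Rename fetch_user to load_user.")
    second = build_prompt(skill, "Design the storage layer for audit logs.")
    print(route("rename"), route("architecture"))
    print(first.prefix_key() == second.prefix_key())
```

If the prefix hash changes between two runs of the same workflow, something variable (a timestamp, a reordered file list) has leaked into the stable part and cache hits will stop.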
