Tests Are Productivity: Starting Every Session with pytest
📝 ARTICLE INFORMATION
- Article: Tests Are Productivity: Starting Every Session with pytest
- Date: March 22, 2026
- URL: [Local workflow reflection]
🎯 HOOK
Most developers think of tests as quality assurance—a necessary overhead that pays off in bug prevention. But when you’re working with AI agents, tests become something entirely different: they’re the fastest feedback loop for productivity. Running uv run pytest at the start of every session is about anchoring the agent in reality and getting unstuck in seconds.
💡 ONE-SENTENCE TAKEAWAY
Tests are a productivity tool, and the agents and workflows that treat them as such will outpace those that treat them as a compliance checkbox.
📖 SUMMARY
There’s a simple ritual that has transformed how I work with AI coding agents: at the start of every session, before any implementation, before any refactoring, I run `uv run pytest`. Every single time, no exceptions.
Running tests is about anchoring the agent in a known state.
When an agent starts a session without running tests first, it operates on assumptions. It assumes the code still looks the way it did last session. It builds on a mental model that may have drifted. It makes changes that seem reasonable but are actually incompatible with what’s already there. And then it spends 20 minutes debugging failures that existed before it started working.
Running uv run pytest first is a reset. It tells the agent: “This is the current state of reality. These tests pass. Now you can build on top of this foundation with confidence.”
But there’s a deeper shift in thinking that’s harder to implement: tests aren’t just for quality. They’re for productivity. And this distinction changes everything about how you design tests, how you run them, and when you trust them.
🔍 INSIGHTS
Core Insights:
Tests are the agent’s ground truth. When an agent runs tests first, it gets a shared reality with the human. No more “I thought it would work this way.” No more building on incorrect assumptions. The test results are the single source of truth for what the code actually does, not what the documentation says it does.
The first 30 seconds of a session determine the next 30 minutes. Running tests first is a cheap investment with high returns. Thirty seconds of test execution saves thirty minutes of debugging on broken assumptions. This is the fundamental productivity trade-off of TDD in agentic workflows.
Productivity tests and quality tests are different artifacts. A productivity test is designed to run fast, fail fast, and tell you exactly what’s broken. A quality test (integration, property-based, fuzzing) is designed to catch edge cases. Most agent sessions need productivity tests first—fast feedback that keeps the agent moving.
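The split can be made concrete with pytest markers. A minimal sketch, assuming a hypothetical `parse_price` helper; the `slow` marker here is a project convention, not a pytest built-in:

```python
import pytest

# Hypothetical helper used only for illustration.
def parse_price(raw: str) -> float:
    return float(raw.strip().lstrip("$").replace(",", ""))

# Productivity test: pure logic, no I/O, runs in microseconds.
def test_parse_price_fast():
    assert parse_price("$1,234.50") == 1234.50

# Quality test: broader sweep, tagged so the session-start run
# can exclude it with `uv run pytest -m "not slow"`.
@pytest.mark.slow
def test_parse_price_many_values():
    for n in range(10_000):
        assert parse_price(f"${n:,}") == float(n)
```

Run `uv run pytest -m "not slow"` at session start for fast feedback, and the full suite before commit.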
The Enterprise Context Layer needs test anchoring. In the ECL workflow described by Andy Chen, agents synthesize organizational knowledge and maintain it over time. But synthesis without verification drifts. Tests are the verification layer that keeps the ECL codebase honest. When an ECL agent runs tests before synthesizing, it ensures new content doesn’t break existing functionality.
Superpowers enforces this discipline. Jesse Vincent’s Superpowers framework includes a `test-driven-development` skill that loads before any implementation. The skill doesn’t just prompt “write tests”; it makes TDD a mandatory workflow step. Before the agent writes a single line of implementation, it loads the test pattern, writes a failing test, runs it to confirm failure, then implements. This is discipline as speed.
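As a sketch of that loop (the `slugify` function and its test are illustrative, not part of Superpowers):

```python
# Step 1: write the failing test first. Running `uv run pytest` at this
# point fails with a NameError, which confirms the test really exercises
# code that doesn't exist yet.
def test_slugify_joins_lowercased_words_with_hyphens():
    assert slugify("Tests Are Productivity") == "tests-are-productivity"

# Step 2: write the minimal implementation that makes the test pass.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# Step 3: run `uv run pytest` again and confirm the pass before moving on.
```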
Broader Connections:
The future of Python is agent-first, test-first. As AI agents become the primary interface for coding, the workflows that work best are the ones designed for agents, not humans. Humans can read code and infer intent. Agents need explicit contracts. Tests are those contracts. The developer who designs their codebase to be agent-navigable via tests will outpace the one who relies on natural language explanations.
Computer-using agents need test feedback loops. The a16z thesis on computer-using agents describes systems that navigate browsers and GUIs. But the most effective computer-using agent isn’t the one that can click fastest; it’s the one that can verify its work fastest. Tests are that verification layer. An agent that runs tests after every change can iterate faster than one that relies on manual inspection.
The ECL + Superpowers + pytest stack. The three-layer context hierarchy Chen describes (enterprise, team, personal) is powerful. But there’s a fourth layer that’s often missing: the test layer. Tests encode the behavioral contract of your codebase in a way that any agent can read, execute, and act on. Combined with Superpowers for discipline and the ECL for organizational context, test-anchored workflows become the fourth pillar of agentic development.
Tests as the interface between human intent and agent action. In traditional software development, tests verify that code works. In agentic development, tests verify that the agent understood what you wanted. The test is the specification. Running it first confirms what “done” looks like. This reframes tests from “did I break anything?” to “did I build what I meant to build?”
🛠️ FRAMEWORKS & MODES
1. The Test-First Session Pattern:
| Phase | Command | Purpose |
|---|---|---|
| Anchor | uv run pytest | Establish current ground truth |
| Specify | Write failing test | Define what “done” means |
| Implement | Write code | Make test pass |
| Verify | uv run pytest | Confirm success |
| Commit | Git with context | Document the change |
2. Two Types of Tests for Agentic Development:
| Type | Speed | Purpose | When to Run |
|---|---|---|---|
| Productivity Tests | < 1 second | Fast feedback on core functionality | Every session start, every change |
| Quality Tests | > 1 second | Edge cases, integration, property-based | Before commit, CI/CD |
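One way to wire this split into pytest itself is the collection-hook pattern from the pytest documentation: quality tests marked `slow` are skipped unless explicitly requested, so the default `uv run pytest` stays a fast productivity run. A sketch of a project-level conftest.py:

```python
# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--runslow", action="store_true", default=False,
        help="also run slow quality tests",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # quality run: keep everything
    skip_slow = pytest.mark.skip(reason="quality test: needs --runslow")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

Register the marker (e.g. `markers = slow: quality test` under `[pytest]` in pytest.ini) to avoid unknown-marker warnings.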
3. The ECL + Superpowers + TDD Integration:
| Layer | Role | Test Integration |
|---|---|---|
| Enterprise Context Layer | Organizational knowledge | Tests verify synthesis doesn’t break existing knowledge |
| Superpowers Skills | Workflow discipline | test-driven-development skill enforces TDD |
| pytest | Execution harness | uv run pytest as session anchor |
| Agent | Action engine | Runs tests before and after every change |
⚡ APPLICATIONS
Practical Guidance:
Run `uv run pytest` at the start of every session, no exceptions. Make it a ritual. Before you write any code, before you refactor anything, before you even read the files you’re about to modify: run the tests. This anchors you (and the agent) in reality.
Write fast tests that fail fast. Productivity tests should execute in under a second. If your test suite takes 30 seconds to run, you’ve lost the agent’s attention. Slice tests to give fast feedback on the most critical paths.
Use the Superpowers `test-driven-development` skill. Before any implementation, load the skill. It enforces the discipline: test first, implement, verify. This isn’t bureaucracy; it’s acceleration.
Design tests as contracts, not coverage. The question isn’t “are these tests comprehensive?” It’s “can an agent read these tests and know exactly what to build?” Write tests that specify behavior, not tests that exercise code paths.
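A contract-style test reads as a statement of behavior rather than a tour of code paths. A sketch, assuming a hypothetical `normalize_username` function:

```python
# Hypothetical function used only to illustrate a contract-style test.
def normalize_username(name: str) -> str:
    return name.strip().lower()

# The test name and body state the behavioral contract directly:
# two spellings of the same user must normalize identically.
def test_usernames_are_case_and_whitespace_insensitive():
    assert normalize_username("  Alice ") == normalize_username("alice")
    assert normalize_username("BOB") == "bob"
```

An agent reading this test knows exactly what invariant any future change must preserve, without consulting documentation.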
Anchor your ECL with tests. If you’re maintaining an Enterprise Context Layer, run tests before every synthesis. Verify that new organizational knowledge doesn’t break existing functionality. Tests are the guardrail that keeps your ECL honest.
For Founders and Builders:
Build test-native agent workflows. The agentic coding tools that win won’t be the ones with the best natural language understanding; they’ll be the ones with the best test integration. Fast feedback loops are the competitive advantage.
Invest in test infrastructure as a first-class concern. If your agent can’t run tests, it can’t verify its work. Build test execution into your agent’s core loop, not as an afterthought.
Treat tests as the primary interface. When designing a codebase for agentic development, ask: “can an agent figure out what this does by running tests?” If not, your interface is too human-centric.
Pitfalls to Avoid:
Don’t run tests only when you think something broke. This defeats the purpose. Tests are for productivity, not debugging. Run them when things are working to ensure they stay working.
Don’t treat tests as a quality gate. Quality gates are about stopping bad code from shipping. Productivity tests are about keeping the agent moving fast. Different mindset, different design.
Don’t skip tests for speed. The temptation is real: “I’ll just write the code and test manually.” But manual testing is slower than running tests, every single time. The agent that runs tests is the agent that ships faster.
📚 REFERENCES
Related TMFNK Articles:
- The Enterprise Context Layer — Andy Chen’s synthesis-over-retrieval pattern; tests anchor the ECL by verifying synthesis doesn’t break existing knowledge
- The Rise of Computer Use and Agentic Coworkers — a16z thesis on computer-using agents; tests are the fastest verification layer for agentic workflows
Superpowers Framework:
- Vincent, J. (obra). Superpowers — Agentic skills framework with a mandatory `test-driven-development` skill
Core Tooling:
- pytest — The Python test harness that anchors every session
- uv — Fast Python package manager; `uv run pytest` as the universal session-start command
Traction Indicators:
- Agentic coding frameworks (Cursor, Claude Code, Roo) that prioritize test integration
- Superpowers adoption for enforcing TDD discipline in agent workflows
- ECL implementations that anchor synthesis with test verification
Crepi il lupo! 🐺