Tests Are Productivity: Starting Every Session with pytest
📝 ARTICLE INFORMATION
- Article: Tests Are Productivity: Starting Every Session with pytest
- Date: March 22, 2026
- URL: [Local workflow reflection]
🎯 HOOK
Most developers think of tests as quality assurance—a necessary overhead that pays off in bug prevention. But when you’re working with AI agents, tests become something entirely different: they’re the fastest feedback loop for productivity. Running uv run pytest at the start of every session is about anchoring the agent in reality and getting unstuck in seconds.
💡 ONE-SENTENCE TAKEAWAY
Tests are a productivity tool, and the agents and workflows that treat them as such will outpace those that treat them as a compliance checkbox.
📖 SUMMARY
There’s a simple ritual that has transformed how I work with AI coding agents: at the start of every session, before any implementation, before any refactoring, I run `uv run pytest`. Every single time, no exceptions.
Running tests is about anchoring the agent in a known state.
When an agent starts a session without running tests first, it operates on assumptions. It assumes the code still looks the way it did last session. It builds on a mental model that may have drifted. It makes changes that seem reasonable but are actually incompatible with what’s already there. And then it spends 20 minutes debugging failures that existed before it started working.
Running uv run pytest first is a reset. It tells the agent: “This is the current state of reality. These tests pass. Now you can build on top of this foundation with confidence.”
But there’s a deeper shift in thinking that’s harder to implement: tests aren’t just for quality. They’re for productivity. And this distinction changes everything about how you design tests, how you run them, and when you trust them.
🔍 INSIGHTS
Core Insights:
Tests are the agent’s ground truth. When an agent runs tests first, it gets a shared reality with the human. No more “I thought it would work this way.” No more building on incorrect assumptions. The test results are the single source of truth for what the code actually does, not what the documentation says it does.
The first 30 seconds of a session determine the next 30 minutes. Running tests first is a cheap investment with high returns. Thirty seconds of test execution saves thirty minutes of debugging on broken assumptions. This is the fundamental productivity trade-off of TDD in agentic workflows.
Productivity tests and quality tests are different artifacts. A productivity test is designed to run fast, fail fast, and tell you exactly what’s broken. A quality test (integration, property-based, fuzzing) is designed to catch edge cases. Most agent sessions need productivity tests first—fast feedback that keeps the agent moving.
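The split can be made concrete with pytest markers. A minimal sketch, assuming a hypothetical `parse_price` helper; the `slow` marker here is a project convention, not a pytest built-in:

```python
import pytest

# Hypothetical helper used only for illustration.
def parse_price(raw: str) -> float:
    return float(raw.strip().lstrip("$").replace(",", ""))

# Productivity test: pure logic, no I/O, runs in microseconds.
def test_parse_price_fast():
    assert parse_price("$1,234.50") == 1234.50

# Quality test: broader sweep, tagged so the session-start run
# can exclude it with `uv run pytest -m "not slow"`.
@pytest.mark.slow
def test_parse_price_many_values():
    for n in range(10_000):
        assert parse_price(f"${n:,}") == float(n)
```

Run `uv run pytest -m "not slow"` at session start for fast feedback, and the full suite before commit.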
The Enterprise Context Layer needs test anchoring. In the ECL workflow described by Andy Chen, agents synthesize organizational knowledge and maintain it over time. But synthesis without verification drifts. Tests are the verification layer that keeps the ECL codebase honest. When an ECL agent runs tests before synthesizing, it ensures new content doesn’t break existing functionality.
Superpowers enforces this discipline. Jesse Vincent’s Superpowers framework includes a `test-driven-development` skill that loads before any implementation. The skill doesn’t just prompt “write tests”; it makes TDD a mandatory workflow step. Before the agent writes a single line of implementation, it loads the test pattern, writes a failing test, runs it to confirm failure, then implements. This is discipline as speed.
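As a sketch of that loop (the `slugify` function and its test are illustrative, not part of Superpowers):

```python
# Step 1: write the failing test first. Running `uv run pytest` at this
# point fails with a NameError, which confirms the test really exercises
# code that doesn't exist yet.
def test_slugify_joins_lowercased_words_with_hyphens():
    assert slugify("Tests Are Productivity") == "tests-are-productivity"

# Step 2: write the minimal implementation that makes the test pass.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# Step 3: run `uv run pytest` again and confirm the pass before moving on.
```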
Broader Connections:
The future of Python is agent-first, test-first. As AI agents become the primary interface for coding, the workflows that work best are the ones designed for agents, not humans. Humans can read code and infer intent. Agents need explicit contracts. Tests are those contracts. The developer who designs their codebase to be agent-navigable via tests will outpace the one who relies on natural language explanations.
Computer-using agents need test feedback loops. The a16z thesis on computer-using agents describes systems that navigate browsers and GUIs. But the most effective computer-using agent isn’t the one that can click fastest; it’s the one that can verify its work fastest. Tests are that verification layer. An agent that runs tests after every change can iterate faster than one that relies on manual inspection.
The ECL + Superpowers + pytest stack. The three-layer context hierarchy Chen describes (enterprise, team, personal) is powerful. But there’s a fourth layer that’s often missing: the test layer. Tests encode the behavioral contract of your codebase in a way that any agent can read, execute, and act on. Combined with Superpowers for discipline and the ECL for organizational context, test-anchored workflows become the fourth pillar of agentic development.
Tests as the interface between human intent and agent action. In traditional software development, tests verify that code works. In agentic development, tests verify that the agent understood what you wanted. The test is the specification. Running it first confirms what “done” looks like. This reframes tests from “did I break anything?” to “did I build what I meant to build?”
🛠️ FRAMEWORKS & MODES
1. The Test-First Session Pattern:
| Phase | Command | Purpose |
|---|---|---|
| Anchor | uv run pytest | Establish current ground truth |
| Specify | Write failing test | Define what “done” means |
| Implement | Write code | Make test pass |
| Verify | uv run pytest | Confirm success |
| Commit | Git with context | Document the change |
2. Two Types of Tests for Agentic Development:
| Type | Speed | Purpose | When to Run |
|---|---|---|---|
| Productivity Tests | < 1 second | Fast feedback on core functionality | Every session start, every change |
| Quality Tests | > 1 second | Edge cases, integration, property-based | Before commit, CI/CD |
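One way to wire this split into pytest itself is the collection-hook pattern from the pytest documentation: quality tests marked `slow` are skipped unless explicitly requested, so the default `uv run pytest` stays a fast productivity run. A sketch of a project-level conftest.py:

```python
# conftest.py
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--runslow", action="store_true", default=False,
        help="also run slow quality tests",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # quality run: keep everything
    skip_slow = pytest.mark.skip(reason="quality test: needs --runslow")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

Register the marker (e.g. `markers = slow: quality test` under `[pytest]` in pytest.ini) to avoid unknown-marker warnings.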
3. The ECL + Superpowers + TDD Integration:
| Layer | Role | Test Integration |
|---|---|---|
| Enterprise Context Layer | Organizational knowledge | Tests verify synthesis doesn’t break existing knowledge |
| Superpowers Skills | Workflow discipline | test-driven-development skill enforces TDD |
| pytest | Execution harness | uv run pytest as session anchor |
| Agent | Action engine | Runs tests before and after every change |
⚡ APPLICATIONS
Practical Guidance:
Run `uv run pytest` at the start of every session, no exceptions. Make it a ritual. Before you write any code, before you refactor anything, before you even read the files you’re about to modify: run the tests. This anchors you (and the agent) in reality.
Write fast tests that fail fast. Productivity tests should execute in under a second. If your test suite takes 30 seconds to run, you’ve lost the agent’s attention. Slice tests to give fast feedback on the most critical paths.
Use the Superpowers `test-driven-development` skill. Before any implementation, load the skill. It enforces the discipline: test first, implement, verify. This isn’t bureaucracy; it’s acceleration.
Design tests as contracts, not coverage. The question isn’t “are these tests comprehensive?” It’s “can an agent read these tests and know exactly what to build?” Write tests that specify behavior, not tests that exercise code paths.
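A contract-style test reads as a statement of behavior rather than a tour of code paths. A sketch, assuming a hypothetical `normalize_username` function:

```python
# Hypothetical function used only to illustrate a contract-style test.
def normalize_username(name: str) -> str:
    return name.strip().lower()

# The test name and body state the behavioral contract directly:
# two spellings of the same user must normalize identically.
def test_usernames_are_case_and_whitespace_insensitive():
    assert normalize_username("  Alice ") == normalize_username("alice")
    assert normalize_username("BOB") == "bob"
```

An agent reading this test knows exactly what invariant any future change must preserve, without consulting documentation.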
Anchor your ECL with tests. If you’re maintaining an Enterprise Context Layer, run tests before every synthesis. Verify that new organizational knowledge doesn’t break existing functionality. Tests are the guardrail that keeps your ECL honest.
For Founders and Builders:
Build test-native agent workflows. The agentic coding tools that win won’t be the ones with the best natural language understanding; they’ll be the ones with the best test integration. Fast feedback loops are the competitive advantage.
Invest in test infrastructure as a first-class concern. If your agent can’t run tests, it can’t verify its work. Build test execution into your agent’s core loop, not as an afterthought.
Treat tests as the primary interface. When designing a codebase for agentic development, ask: “can an agent figure out what this does by running tests?” If not, your interface is too human-centric.
Pitfalls to Avoid:
Don’t run tests only when you think something broke. This defeats the purpose. Tests are for productivity, not debugging. Run them when things are working to ensure they stay working.
Don’t treat tests as a quality gate. Quality gates are about stopping bad code from shipping. Productivity tests are about keeping the agent moving fast. Different mindset, different design.
Don’t skip tests for speed. The temptation is real: “I’ll just write the code and test manually.” But manual testing is slower than running tests, every single time. The agent that runs tests is the agent that ships faster.
📚 REFERENCES
Related TMFNK Articles:
- The Enterprise Context Layer — Andy Chen’s synthesis-over-retrieval pattern; tests anchor the ECL by verifying synthesis doesn’t break existing knowledge
- The Rise of Computer Use and Agentic Coworkers — a16z thesis on computer-using agents; tests are the fastest verification layer for agentic workflows
Superpowers Framework:
- Vincent, J. (obra). Superpowers — Agentic skills framework with a mandatory `test-driven-development` skill
Core Tooling:
- pytest — The Python test harness that anchors every session
- uv — Fast Python package manager; `uv run pytest` as the universal session-start command
Traction Indicators:
- Agentic coding frameworks (Cursor, Claude Code, Roo) that prioritize test integration
- Superpowers adoption for enforcing TDD discipline in agent workflows
- ECL implementations that anchor synthesis with test verification
Crepi il lupo! 🐺