Feedback Loops Are Everything: Lessons from 4 Months of Building with AI Agents

I've spent the last four months building securegpt.ru — a full-stack product with multiple microservices and LLM integrations. Almost all of the code was written by Claude Code.

When I started, I didn't have much experience working with AI coding agents. I'd used them here and there, but never as the primary way of building something. The scope of the project didn't really leave me a choice — there was more work than I could realistically handle by writing everything myself, so I leaned into it out of necessity. That turned out to be a good thing, because it forced me to figure out what actually works pretty quickly.

Along the way I've developed a set of personal systems and habits — from how I write specs before touching any code, to the feedback loops I build so agents can verify their own work. In this article I want to share the lessons I've picked up and the toolkit that's made this way of working practical for me.

Start with the spec

The single biggest improvement to my workflow had nothing to do with prompting or tooling — it was the decision to always write a proper specification before any code gets written.

Early on I'd just describe what I wanted and let the agent go. It worked for small things, but for anything non-trivial I'd end up correcting course constantly — re-explaining context, catching misunderstandings late, spending more time on fixes than the original task would have taken by hand. The problem wasn't the agent. The problem was that I hadn't thought things through well enough myself.

So I started writing specs. Not rough notes — actual structured documents. A product requirements doc describing what the feature should do and why, followed by a technical spec covering the implementation approach. I eventually standardized these into templates that I reuse across the project, which made the habit easier to maintain.

The process of writing the spec is itself valuable. I usually start by discussing the idea with Claude in a regular conversation — poking at edge cases, asking what I might be missing, iterating on the approach. This stage is where you can't cut corners. Go back and forth, challenge your own assumptions, keep pushing until you're genuinely satisfied — not just "good enough" satisfied. It feels slow, but this is where you're actually locking in your own understanding of what needs to be built. If you're fuzzy on the spec, the agent will be fuzzy on the implementation, and you'll pay for it later in corrections that take longer than the spec work would have.

I'd also recommend developing your own spec template. Mine has a PRD section and a technical spec section with a consistent structure I reuse across features. Having a template makes the habit sustainable — you're not reinventing the format every time, you're just filling in the details. What works for your project will be different from mine, but the point is to have something standardized that you actually stick to.

Feedback loops are everything

Here's the core idea behind all of this: when an agent can produce code in seconds, any time you invest in automated verification pays for itself many times over.

Think about it in terms of where your time goes. Without good feedback loops, the cycle looks like this: you give the agent a task, it writes code, you manually check the result, you find a problem, you explain the problem, the agent fixes it, you check again. You're the bottleneck at every step. With good feedback loops, the agent can verify its own output and fix issues without you being involved — and your job shifts from checking the agent's work to building the infrastructure that lets it check its own.

The tricky part is that there's no universal recipe for what feedback loop you need. It depends entirely on what you're building. A backend API needs different verification than a UI component, which needs different verification than a data pipeline. Each time, you have to stop and think: what's the most convenient way for the agent to interact with the thing it's building? How can it tell whether what it did actually works?

This is where the real craft is. Everything is fair game — the agent can read logs, take screenshots, query databases, hit APIs, run Playwright tests, inspect traces in LangSmith. The next section is a catalog of the tools I've found useful, roughly ordered from simple to advanced. Not all of them will apply to your project, but the underlying question is always the same: how do I close the loop so the agent can verify its own output?

The feedback loop toolkit

Build, tests, and own instances

The most basic feedback loops: can the code compile, and do the tests pass? This sounds trivial, but the important thing is making it frictionless. Prepare one-liner commands for building and running tests. The agent shouldn't have to figure out your build system or remember which test runner you use. The general principle applies everywhere — eliminate any friction that isn't the actual task.
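
As a minimal sketch of what "frictionless" means here — the command bodies are placeholders, not any real toolchain — a pair of one-word entry points can hide the build system entirely:

```shell
#!/usr/bin/env sh
# Sketch: one-liner entry points for the agent.
# The bodies are placeholders (assumptions) -- swap in your real commands.
build() { echo "build ok"; }   # e.g. npm run build, cargo build, go build ./...
tests() { echo "tests ok"; }   # e.g. pytest -q, npm test, go test ./...
```

In practice you'd wrap these in a small `dev.sh build` / `dev.sh test` dispatcher so the agent has exactly one command to learn and never has to discover your build system on its own.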

On the same note — give agents their own running instances of the application. Prepare simple commands so the agent can launch the app as a background task. Don't let it rely on your running instance. If the app is running on your side, the agent has no access to the logs and is essentially flying blind. Even better, set up your codebase so that multiple agents can run separate instances on different ports without stepping on each other. Full autonomy, no dependency on you or on each other.
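
A sketch of what per-agent instances can look like, assuming a hypothetical `AGENT_PORT` convention — the server command itself is a placeholder for your real one:

```shell
# Each agent exports its own AGENT_PORT, so instances never collide.
PORT="${AGENT_PORT:-8001}"
LOG="/tmp/app-$PORT.log"
PIDFILE="/tmp/app-$PORT.pid"

start_app() {
  # Placeholder long-running process; substitute your real server,
  # e.g. uvicorn app:app --port "$PORT" or npm start -- --port "$PORT"
  ( while :; do sleep 60; done ) >"$LOG" 2>&1 &
  echo "$!" > "$PIDFILE"
  echo "instance up on port $PORT, logs at $LOG"
}

stop_app() { kill "$(cat "$PIDFILE")" 2>/dev/null || true; rm -f "$PIDFILE"; }
```

Because the agent launched the instance itself, it knows exactly where the logs are and can tail them directly instead of flying blind.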

The deeper point here is about test discipline. If you plan your test scenarios at the spec stage and make passing tests a hard requirement for completing a feature, your test suite becomes an incredibly powerful anchor for the agent. It writes code, runs the tests, sees what fails, fixes it — all without you. But this only works if the tests actually exist and cover what matters. If you let test coverage slide, you lose one of the best feedback loops you have.

Hitting API endpoints and visual verification

Once the code builds and the tests pass, the next question is: does it actually work when you use it? This is where runtime verification comes in.

For backend work, making your API endpoints easily accessible to the agent is a big deal. Set up convenient auth so the agent doesn't need hacks or workarounds. Sometimes it's worth building a small dedicated interface specifically for the agent to hit. The easier it is for the agent to poke at the running service, the more it can verify on its own.
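
One hedged sketch of that idea: a tiny helper with a dev-only bearer token, so every call the agent makes looks the same. The URL, header, and token name here are invented examples:

```shell
# Assumed conventions: a local dev instance and a static dev-only token.
API="${API:-http://localhost:8001}"
TOKEN="${AGENT_API_TOKEN:-dev-agent-token}"

# Build the full URL for an endpoint path.
api_url() { printf '%s%s\n' "$API" "$1"; }

# Authenticated GET the agent can reuse for any endpoint.
api_get() { curl -sf -H "Authorization: Bearer $TOKEN" "$(api_url "$1")"; }
```

The agent then verifies a feature with `api_get /health` or `api_get /v1/whatever` instead of reinventing auth every time.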

For anything with a UI, Playwright MCP gives the agent direct control over a real browser — it can navigate, click, fill forms, and read page content. It's the most expensive tool in terms of tokens, but it has the highest verification value. The key is to tell the agent to run through the entire feature flow from start to finish without cutting corners. Don't let it just check one happy path — make it go through the full scenario the way a real user would. On Claude's Max plan you can use it practically without limits. On the Pro plan, the token budget is tighter, but it's still worth it for critical flows.
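
If you're working in Claude Code, registering the server is a one-liner — hedged: this is the invocation as documented in the `@playwright/mcp` README, so check it for the current form:

```shell
# Register Playwright MCP with Claude Code (per the @playwright/mcp README).
claude mcp add playwright npx @playwright/mcp@latest
```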

Screenshots are a separate thing and shouldn't be overlooked. Playwright sees the DOM tree, but the actual visual rendering can be something else entirely. A page can have the right elements in the right places and still look completely broken. Modern agents understand images well, so a screenshot is genuinely useful information — it catches an entire category of problems that DOM inspection misses.

Database and external services

Some things you can only verify by looking at the actual state. Did the data get written correctly? Are the relationships intact? Is the cache populated the way you expect?

For database inspection, I'd recommend two things: a skill that describes your database — the key tables, what they mean, the naming conventions, the relationships — and a CLI tool or MCP server that lets the agent actually run queries. The skill gives it understanding, the tool gives it access. Don't rely on the agent figuring out your data model from the schema alone. When it knows the semantics, it can query the database and actually make sense of what it's looking at.
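
A sketch of the first half of that pair — a skill file describing the database. The path follows Claude Code's skill layout; the tables, columns, and conventions are invented examples:

```shell
# Write a minimal database skill the agent can load on demand.
mkdir -p .claude/skills/database
cat > .claude/skills/database/SKILL.md <<'EOF'
---
name: database
description: Schema, naming conventions, and how to query the app database
---
Connect with: psql "$DATABASE_URL"

Key tables (illustrative):
- users: one row per account; rows are soft-deleted via deleted_at
- chats: conversations, linked to users via user_id
- messages: belongs to chats via chat_id; role is 'user' or 'assistant'
EOF
```

Pair this with actual read access — a `psql` one-liner or a database MCP server — and the agent can both run the queries and interpret what it sees.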

The same applies to any other external services your application depends on — caches, message queues, other microservices. Write a skill that explains how the service fits into your system, and give the agent a CLI tool or MCP server to interact with it — whichever is most convenient. The more of your system the agent can observe, the more it can verify on its own.

Verification that actually sticks

Once you have good feedback loops, there are a couple of habits that make them much more effective.

The first is using sub-agents to verify work against the spec. I have a hard rule: no task is considered done until a separate agent has checked that the result actually matches the specification. It has to be a different agent — not the one that wrote the code. Why? Fresh context. Even the best agents lose focus over long sessions and gradually drift from what was specified. A separate agent with a clean context catches an enormous number of issues that the original agent has become blind to.
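
In Claude Code, this rule can be encoded as a dedicated sub-agent. The file layout follows Claude Code's agent convention; the wording of the prompt is illustrative, not a canonical recipe:

```shell
# Define a spec-verifier sub-agent that always starts with a clean context.
mkdir -p .claude/agents
cat > .claude/agents/spec-verifier.md <<'EOF'
---
name: spec-verifier
description: Verifies finished work against the written spec. Use after every task.
---
You did not write this code; approach it fresh. Read the spec for the task,
then check the implementation against it point by point. Report every
mismatch and omission. Do not fix anything yourself.
EOF
```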

The second is the screenshot rule: no task is done until there's a screenshot of the working scenario. This sounds like a small thing, but it forces a moment of actual verification. The agent can't just say "done" — it has to show the result.

The shift

Four months in, the way I think about my role has changed. I spend very little time writing code. Most of my work is building the infrastructure that lets agents work autonomously — specs, feedback loops, skills, verification rules. It's a different kind of effort, but it compounds. Every feedback loop you build makes the next feature faster and more reliable.

If there's one takeaway from all of this, it's that the time you spend making the agent's job easier is never wasted. The agent can write code faster than you. Your job is to make sure it's writing the right code — and to build the systems that let it figure that out on its own.