Fast Checks for Code Generation
I’ve been thinking a lot about code generation with AI agents lately, and I want to start writing about how I’m approaching the problem. I’ve been using AI-assisted workflows since ChatGPT came out, but I care a lot about code quality and about maintaining a good mental model of the work I’m doing. In the early days I mostly used these tools for scaffolding, but I think we’re at an inflection point where they can handle more production-ready coding tasks. This is the first article exploring that.
The technique in this article, built around the principle “Automate Feedback”, has been a powerful way to keep generated code up to my standards. In my project Indie Web, which powers FloppyDisk.link and BrowserChords.com, I use a Taskfile as my script runner. I added a check task, run as task check, that automates all the fast iterative checks I care about.
I found myself frustrated by constantly correcting the agent and by slow loops. It failed to apply automatic code formatting pretty much every time, and my atomic commits became messy with after-the-fact formatting fixes. Even on side projects, I still prefer a clean commit style. I also kept having to correct annoying coding patterns and lint violations. Suddenly, my side project needed the classic linting checks you would set up for untrusted contributors. That problem has been solved for quite a while with linting and automated feedback.
For agentic loops, the requirements change. I want a single unambiguous check command that is as fast as possible, so the agent stays on task without a human in the loop until I am ready to review the work. To trust plausible-looking agent code, I need validation that is quick and reliable. I got the check down to a few seconds of parallelized runs. AI codegen is really good at producing plausible code that is fundamentally flawed, and task check is my way of keeping the quality higher.
How the command works – human in the loop
task check
This is all that is needed to run the check.
➤ task check
Once it finishes in an interactive terminal the output is quick and simple:
~/dev/indie-web on indie-web/main 🍏
On failure, only the failing task’s output is shown.
➤ task check
This way it is quick to diagnose errors and understand failures whenever I want to run a check.
How the command works – non-TTY
If I pipe the command into cat, it runs in non-interactive mode. This is the same way an agent or standard CI would interact with the command. Here I wanted the output in a log-friendly format, without interactive check marks updating over time. I wanted to maximize signal for the agent, but retain signal for me as well.
➤ task check | cat
For the success case I got:
➤ task check | cat
Here it informs the agent that everything is passing as intended, and how long the check took. 5.5s for the longest task means that everything works quickly in an agentic loop.
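The non-TTY behaviour described in this section could be sketched roughly as follows. This is a minimal illustration, not the actual Indie Web implementation: the `CheckResult` shape, the `renderLog` and `chooseMode` names, and the exact PASS/FAIL wording are all assumptions made for the example.

```typescript
// Hypothetical result shape; the real task check output format is not reproduced here.
type CheckResult = { name: string; ok: boolean; seconds: number; output: string };

// Log-friendly rendering for non-TTY consumers (agents, CI, `| cat`):
// one line per check, captured output only for failures, then a summary line.
function renderLog(results: CheckResult[]): string {
  const lines: string[] = [];
  for (const r of results) {
    lines.push(`${r.ok ? "PASS" : "FAIL"} ${r.name} (${r.seconds.toFixed(1)}s)`);
    if (!r.ok) lines.push(r.output.trimEnd());
  }
  const failed = results.filter((r) => !r.ok).length;
  const longest = results.reduce((max, r) => Math.max(max, r.seconds), 0);
  lines.push(
    failed === 0
      ? `All ${results.length} checks passed; longest took ${longest.toFixed(1)}s`
      : `${failed} of ${results.length} checks failed`,
  );
  return lines.join("\n");
}

// Pick the renderer based on whether stdout is a terminal.
function chooseMode(isTty: boolean): "interactive" | "log" {
  return isTty ? "interactive" : "log";
}
```

In Node, `process.stdout.isTTY` is `true` when stdout is a terminal and undefined when it is piped, so `chooseMode(Boolean(process.stdout.isTTY))` would select the log branch for `task check | cat`. Keeping the renderer a pure function of the results also makes the output easy to test.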
How this affects the agentic loop
With this technique, things just started working. I no longer had to spend my time correcting the agent. Instead, it would notice each time that there was a Prettier error, and run task lint-fix only once it had solved the bigger problems. The results became more reliable.
I found this AI summary of my work pretty accurate:
- The command should be:
  - Fast enough to run constantly.
  - Narrow enough that failures are relevant.
  - Real enough that passing means something.
  - Quiet enough that an agent can use the output.
  - Pleasant enough that I will still run it.
To accomplish “fast enough” in my project, I parallelized everything. I care a lot about performant code, and my tests were already pretty fast to begin with. For this project, task check actually runs all of the CI checks. In larger projects, I can see that you would need to make task check smart enough to run only the relevant checks. For my use case, the tests were fast enough to just run everything in parallel on my beefy dev machine.
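For a larger project, running only the relevant checks might look something like the sketch below. It is purely illustrative: the `Rule` type, `selectChecks`, the check names, and the path conventions are hypothetical, and nothing like this is needed in Indie Web, where everything simply runs in parallel.

```typescript
// Hypothetical rule: a check name plus a predicate saying which changed paths make it relevant.
type Rule = { check: string; matches: (path: string) => boolean };

// Returns the names of checks that have at least one matching changed file,
// preserving the order in which the rules were declared.
function selectChecks(changedFiles: string[], rules: Rule[]): string[] {
  return rules
    .filter((rule) => changedFiles.some((file) => rule.matches(file)))
    .map((rule) => rule.check);
}

// Example rules; the names and paths are assumptions for illustration only.
const exampleRules: Rule[] = [
  { check: "format", matches: () => true }, // formatting applies to every changed file
  { check: "typecheck", matches: (p) => p.endsWith(".ts") || p.endsWith(".tsx") },
  { check: "unit-tests", matches: (p) => p.startsWith("src/") },
];
```

The list of changed files would have to come from somewhere such as the version control system; that part is left out here because it depends on the project's workflow.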
To make failures even faster in agent flows, I kill all the remaining tasks as soon as one fails. Agents can fix one failure at a time, so you can order the checks by importance.
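The idea of running everything in parallel and cancelling the rest on the first failure can be sketched as below. This is an assumption-laden illustration rather than the real orchestrator: `Check`, `Outcome`, `runFailFast`, and `fakeCheck` are hypothetical names, and the fake check uses a timer in place of a real subprocess.

```typescript
// Hypothetical shape of a check: resolves true on success and stops early
// when the AbortSignal fires.
type Check = { name: string; run: (signal: AbortSignal) => Promise<boolean> };
type Outcome = { name: string; ok: boolean };

// Starts every check at once. The first failure aborts the shared signal, and
// checks that settle after that point are dropped from the report.
async function runFailFast(checks: Check[]): Promise<Outcome[]> {
  const controller = new AbortController();
  const outcomes: Outcome[] = [];
  await Promise.all(
    checks.map(async (check) => {
      let ok = false;
      try {
        ok = await check.run(controller.signal);
      } catch {
        ok = false; // a thrown error counts as a failure
      }
      if (controller.signal.aborted) return; // cancelled by an earlier failure
      outcomes.push({ name: check.name, ok });
      if (!ok) controller.abort();
    }),
  );
  return outcomes;
}

// Stand-in for real work such as spawning a linter: waits `ms`, then reports
// `ok`, or rejects immediately if the signal is aborted first.
function fakeCheck(name: string, ms: number, ok: boolean): Check {
  return {
    name,
    run: (signal) =>
      new Promise<boolean>((resolve, reject) => {
        const timer = setTimeout(() => resolve(ok), ms);
        signal.addEventListener(
          "abort",
          () => {
            clearTimeout(timer);
            reject(new Error("aborted"));
          },
          { once: true },
        );
      }),
  };
}
```

With real subprocesses, Node's `child_process.spawn` accepts a `signal` option, so the same `AbortSignal` could be used to terminate the still-running commands. Note that this sketch reports whichever failure lands first; how the actual orchestrator combines that with ordering by importance is not shown in the article.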
What this all looked like for me
An example Taskfile.yml shape:
check:
And then the check orchestrator lists the tasks in order.
const checks = [
Finally, I keep my AGENTS.md pretty short, because of the principle of “Patterns Are Stronger Than Prompts” for codegen. The agent will figure it out contextually and interactively.
Read all of README.md. I optimize for human usage over robot.
This feedback loop makes me feel productive without feeling sloppy: the mechanical issues are caught automatically, so I can spend my attention reading the generated code and making sure it is high quality for the task at hand.
