What Agentic Workflows Changed
For the last three years, we used AI as a tool: we wrote prompts, got responses, and integrated the results. AI was faster than typing from scratch, but we were still driving.
In 2025, we started delegating entire tasks to agents. Not "write a React component" (a prompt), but "refactor this codebase to support multi-tenancy, run the tests, push a PR" (a multi-step task). The agent plans the work, implements it, detects failures, iterates, and commits.
We've now run agentic systems on:
- Code refactoring across 3 projects
- Adding new features (building on existing architecture)
- Debugging production issues and proposing fixes
- Test coverage expansion
- Documentation generation from code
The speed improvement is real. A refactor that would take 1-2 days of focused engineering time takes the agent 2-4 hours. We get the benefits of continuous improvement without the time cost. For a business, that's transformative.
But there's a catch.
The Speed Benefit Is Real
Let's be concrete. We assigned an agent to refactor a legacy Node.js service: extract shared utilities into a package, update imports across 15 files, ensure all tests still pass.
A human engineer: 6-8 hours. Careful, systematic, checks each file. Gets it right the first time.
The agent: 1.5 hours. It planned the refactor, executed it across all files, ran tests, found 3 breaking changes, fixed them, ran tests again, committed.
The output was equivalent to what a careful human would have produced. The time difference comes down to the fact that the agent doesn't get tired, doesn't second-guess itself, and doesn't take 15-minute breaks. It just works.
For work that's well-defined and testable, agents are genuinely faster.
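The loop the agent ran (plan, execute, run tests, fix failures, commit) can be sketched roughly as follows. This is an illustrative shape, not a real agent framework; the interface and method names are hypothetical:

```typescript
// Hypothetical sketch of the plan -> execute -> test -> fix -> commit loop.
// AgentTask and its methods are stand-ins, not a real agent API.

type TestResult = { passed: boolean; failures: string[] };

interface AgentTask {
  plan(): string[];             // ordered refactor steps
  execute(step: string): void;  // apply one step to the codebase
  runTests(): TestResult;       // run the full test suite
  fix(failure: string): void;   // targeted repair for one failure
  commit(): void;               // commit once everything is green
}

function runAgentLoop(task: AgentTask, maxIterations = 5): boolean {
  // Phase 1: execute the whole plan.
  for (const step of task.plan()) {
    task.execute(step);
  }
  // Phase 2: iterate on test failures until green or out of budget.
  for (let i = 0; i < maxIterations; i++) {
    const result = task.runTests();
    if (result.passed) {
      task.commit();
      return true; // done, all tests pass
    }
    for (const failure of result.failures) {
      task.fix(failure); // e.g. the 3 breaking changes in our refactor
    }
  }
  return false; // budget exhausted: escalate to a human
}
```

The key property is the bounded retry budget: the agent gets a few iterations to self-correct, and anything that doesn't converge gets escalated rather than force-committed.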
The Problem: Confident Incorrectness
But agents have a flaw: they don't know what they don't know. When the agent is wrong, it's often confidently wrong.
Example: We asked an agent to add a new API endpoint to a service and integrate it with the existing auth system. The agent wrote the endpoint, added it to the router, even wrote tests. All tests passed. It looked great.
But it had never seen the compliance requirement: this endpoint handles customer data, which requires an audit trail. The agent didn't know to add logging. How would it? The requirement wasn't in the code comments. It wasn't in the spec file. It was implicit organizational knowledge.
The agent didn't know it didn't know. So it confidently shipped something that passed all visible tests but violated a business constraint.
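One mitigation is to turn implicit rules into explicit, checkable ones. Here's a minimal sketch of what that could look like; the `Route` shape and the rule itself are hypothetical illustrations, not our actual compliance tooling:

```typescript
// Hypothetical sketch: encode the implicit rule "customer-data endpoints
// need an audit trail" as data the agent (and the test suite) can see.

type Route = {
  path: string;
  touchesCustomerData: boolean; // flagged by a human who knows the business
  hasAuditLog: boolean;         // set when audit middleware is attached
};

// Returns the paths that violate the rule. Wired into a test, this turns
// an invisible organizational requirement into a visible, failing check.
function missingAuditTrail(routes: Route[]): string[] {
  return routes
    .filter((r) => r.touchesCustomerData && !r.hasAuditLog)
    .map((r) => r.path);
}
```

The point isn't this particular check; it's that an agent can only react to what's visible. Every implicit requirement you encode as a failing test is one less unknown unknown.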
This is the real cost of agentic workflows: not the failures (which are detectable), but the unknown unknowns (which are not).
Patterns We've Seen
After running agents on 12+ tasks, we've identified what they do well and what they miss:
Agents are good at:
- Tasks with clear success criteria (all tests pass)
- Well-scoped work (refactor X, don't touch Y)
- Systematic tasks (update all files in this directory)
- Code that follows existing patterns (add a controller following the pattern of existing controllers)
- Problems with clear constraints (don't break backward compatibility, don't exceed X MB)
Agents are bad at:
- Implicit requirements ("add audit logging because compliance")
- Business context ("this endpoint is high-risk because of customer sensitivity")
- Trade-offs ("shipping this might cannibalize another feature")
- Judgment calls ("should we use cache or database query?")
- Anything requiring domain knowledge not in the code
The pattern: agents execute brilliantly on what's explicit. They fail on what's implicit. And they can't tell the difference.
When You Need to Override
After a few false starts, we've developed a pattern for when to actually trust an agent vs. when to review and override:
Review carefully if:
- The work touches business logic (not just infrastructure)
- The work involves compliance, security, or customer data
- The work affects multiple systems (agent might miss integration points)
- The task requires knowing "why" something is the way it is (agent sees the what, not the why)
Trust the agent if:
- The work is purely technical (refactor, test expansion, docs)
- Tests are comprehensive and they pass
- The scope is narrow and well-defined
- The work follows existing patterns in the codebase
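The checklist above can be made mechanical. A sketch of the heuristic as code, with entirely hypothetical flag names; a human still supplies the judgments, the function just encodes the policy:

```typescript
// Hypothetical encoding of the review heuristic. The flags are judgments
// a human makes when scoping the task; none of this is a real API.

interface AgentTaskProfile {
  touchesBusinessLogic: boolean;
  involvesComplianceOrCustomerData: boolean;
  spansMultipleSystems: boolean;
  requiresDomainContext: boolean; // needs the "why", not just the "what"
  testsComprehensiveAndPassing: boolean;
  narrowWellDefinedScope: boolean;
  followsExistingPatterns: boolean;
}

function reviewLevel(t: AgentTaskProfile): "careful-review" | "light-review" {
  // Any one risk factor forces a careful review.
  const needsCarefulReview =
    t.touchesBusinessLogic ||
    t.involvesComplianceOrCustomerData ||
    t.spansMultipleSystems ||
    t.requiresDomainContext;
  // Trusting the agent requires ALL of the safety conditions.
  const safeToTrust =
    t.testsComprehensiveAndPassing &&
    t.narrowWellDefinedScope &&
    t.followsExistingPatterns;
  return needsCarefulReview || !safeToTrust ? "careful-review" : "light-review";
}
```

Note the asymmetry: risk factors are OR-ed, safety conditions are AND-ed. The default is review; trust has to be earned on every axis.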
Our review process: a senior engineer reads the agent's output, not to check syntax (tests do that), but to verify that the implicit requirements were met. Does this follow our security model? Did it miss a logging requirement? Did it understand the tradeoff? That review takes 20 minutes. Without the agent, the same work would have cost an engineer 6 hours. Net win: roughly 5.5 hours of engineer time, with lower risk than a single engineer working unreviewed.
What This Means for Engineering Teams
The common fear: "Will AI replace engineers?" The answer, based on 6 months of agent use, is no. But it does change what engineering means.
You need fewer people writing code. You need more people who can:
- Define problems clearly. The better your spec, the better the agent executes. Vague requirements don't slow an agent down; they produce confident, wrong output.
- Review and judge. Someone has to verify the agent didn't miss implicit requirements.
- Understand tradeoffs. "Should we use caching or a database query?" — the agent can't decide. A human architect can.
- Know the business. What are the compliance requirements? What's high-risk? What's commodity? The agent doesn't know.
The teams that will thrive are teams with senior engineers who can guide agents and junior engineers who can learn the business context (not just how to code). The teams that will struggle are teams of mid-level engineers who do the work the agent is now better at.
In short: agent-driven development rewards judgment and punishes commodity execution.
Ready to Ship Faster with Agents?
We know when to delegate to agents and when human judgment is essential. Let's talk about how to build faster.
Discuss AI-accelerated development