Designing Software for AI Code Generation

A reflective essay on vibe coding, the limits of AI coding agents, why foundational design still needs human ownership, and why compact screen structure works well in AI-assisted development.

Previous article: Working Safely with AI Coding Agents

Vibe Coding and Real Systems

Lately, I keep seeing people argue that software can be built with vibe coding alone. I do not reject that idea completely. For very small tools, especially things that are short-lived and not especially risky, AI can often get you somewhere useful surprisingly quickly, and I also use it that way myself from time to time.

What I do not believe is that this extends very far. Once the target becomes a real business system, the conditions change. Security, long-term maintenance, and data integrity all start to matter in a different way, and repeated modification matters far more than the first successful demo.

The place where that difference feels most obvious to me is the earliest design stage. In foundational design, there is rarely one obviously correct answer waiting to be discovered. The right structure depends on how the system will actually be operated, what the users are comfortable with, what shape the existing system already has, and even how much capacity the internal information systems team has to support it later.

Those are not only technical questions. They are questions about work style, trade-offs, constraints, and ownership. That is why I do not think it is very realistic to ask AI to decide the broad design by itself. It can help once a direction already exists. But deciding that direction still feels like a human responsibility to me.

How Much of My Own Code Comes From AI

If I had to describe it as a rough feeling, I think around eighty percent of the code output in my recent work now comes from AI or AI agents in one form or another. I could probably push that number even higher if I wanted to.

Even so, I still tend to write the core parts myself. That includes foundational design decisions and important entity design. Part of the reason is simple. Those parts shape the meaning of the system, and if I stop understanding them directly, the convenience I gain later starts to feel fragile.

There is another reason as well, and in practical terms it may matter just as much. If I stopped consciously writing the important parts of the program myself, I think my coding ability would decline fairly quickly. And once it declines, it becomes much harder to judge whether the code produced by AI is actually good, merely superficially plausible, or quietly dangerous.

That matters even more when something breaks. AI may help investigate a problem, but it will not always be able to do that. And even when it can help, someone still has to decide what information matters, which logs are relevant, which files are probably involved, and what should be treated as signal rather than noise. To do that well, I think it is important to keep myself in a state where I can still build the system directly.

To borrow a comparison from people operating at a far higher level than I am, I have heard that even though aircraft can land automatically now, pilots still often land manually. I mention that partly as a joke, but only partly. Delegating work and retaining skill are not the same thing.

What Keeps Feeling Wrong

What keeps bothering me about AI coding is not simply that AI makes mistakes. Human engineers make mistakes all the time as well. The deeper issue, as I currently see it, is that AI does not stably hold broad design intent. It works from a limited local range and then produces code that is often locally convincing.

When I look at online discussions around AI coding or vibe coding, I also keep noticing a difference in tone that is hard to ignore. This is only my own impression, so I do not present it as a measured fact. Still, the people who say that AI can now build everything, or that it has already replaced the need for real engineering judgment, often do not sound like people who have spent much time dealing with long-term maintenance, production operation, or the consequences of software decisions after release. By contrast, the more cautious people, the ones who actively use AI but do not feel comfortable handing everything over to it, often sound like people who have spent more time living with exactly those consequences.

That may simply reflect the difference between building something once and having to live with it afterward. If you have spent enough time maintaining software, it becomes much easier to feel how dangerous locally plausible code can become over time. That is where this limitation starts to matter.

Repeated maintenance is where the real risk begins to appear. A change can look reasonable inside the visible area while still damaging the larger structure. Human engineers often feel a vague discomfort before they can fully explain it. They think something feels wrong here, or this may become dangerous later. That kind of unease comes partly from long experience.

AI generally does not have that kind of feeling. It can describe patterns. It can imitate caution. But that is not the same thing as actually sensing structural danger in the way an experienced owner often does.

There is another part of this that I find hard to ignore. AI will sometimes ignore direct instructions, ignore AGENTS-style personalization, ignore existing implementation that should obviously be respected, and sometimes even ignore checks that are sitting right next to the area being changed. I cannot say exactly what triggers that shift. But I do think there are moments when the whole output suddenly becomes less stable, less careful, and more half-finished than it was just before.

Context Is Narrower Than It Looks

One thing becomes obvious very quickly when using coding agents such as Codex. The amount of history and project context they actually work from is much smaller than people sometimes imagine.

I do not say that as a complaint. The tools are still useful enough that I use them heavily. But they do not continuously understand the whole project the way a human maintainer with long ownership gradually comes to. They work from a moving slice, and sometimes that slice is enough, while sometimes it clearly is not.

Part of me also suspects that this is not merely a temporary weakness of the tools. If left completely unconstrained, AI systems would probably keep consuming more and more resources. And if these tools are going to exist as commercial products that ordinary people can actually use, then their working range has to be limited somehow. So my guess is that they are built quite intentionally around a balance between the amount of context and computation they are allowed to consume and the practical value they are expected to return.

That point matters because fluency can easily create the illusion of broad understanding. The answer sounds coherent, so it is easy to imagine that the model is holding the whole system together in its head. I do not think that is usually what is happening.

What makes this more irritating is that the answer will sometimes sound as if the model checked the implementation or read the relevant documents carefully when in fact it did nothing of the kind. It simply fills the gap with a plausible-sounding guess. That happens far more often than I would like.

Why I Prepare Instruction Documents

This is one reason I often use ChatGPT to prepare instruction documents for Codex or Copilot. Longer conversations help me accumulate and refine design intent before the coding agent starts editing files.

I cannot prove this in any strict documented way, and I do not want to overstate it. Still, ChatGPT often feels as if the effect of accumulated conversation history is larger than what the immediate prompt alone would suggest. At least in actual use, there are times when a response seems influenced by a broader accumulated interaction rather than only the text directly in front of it. I remember at least one case, probably around the GPT-4 period, where it brought up information that had not appeared anywhere in that chat, and the response gave me the impression that something from earlier interaction had carried over into the answer.

There is another practical point as well. With the VS Code connection and similar tooling, I can to some extent control what information is brought closer to the working context and what is not. So my impression is that the result is influenced not only by the immediate prompt but also by a larger body of accumulated information that I can partially steer into view.

The document is not there because I enjoy paperwork. It is there because I need some way to compress design intent into a form the agent can actually follow. Once I started working this way, the amount of implementation that ignored the intended design dropped significantly.

That does not mean documents solve everything. They do not remove the need for review. But they do reduce the chance that implementation starts from an underspecified idea and then drifts toward a convenient but wrong structure.

What That Means for Design

Once I started thinking about it that way, the design side of the problem stopped feeling especially abstract. If AI only sees a relatively narrow area each time it works, then I think the software also has to be shaped so that each change can stay relatively narrow. At least right now, that still seems to me like the most practical response.

Put a little more plainly, compact feature boundaries matter more. A system does not become safe merely because prompts improve. It becomes safer when the structure itself limits how far one local mistake can spread.

I do not mean that structure alone solves everything. Of course it does not. But if one feature can be understood and edited in a compact unit, the agent is less likely to damage distant parts of the system without anyone noticing. That matters a great deal in AI-assisted development.

Why Razor Pages Plus Cotomy Fits This Well

In my own system, a screen is commonly built around files such as:

Page.cshtml
Page.cshtml.cs
Page.cshtml.css
Page.cshtml.ts

That structure brings the elements needed for one screen into the same area. Markup, server-side page handling, screen-specific styling, and client behavior stay close together, so the feature boundary is naturally compact.
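To make the idea of a compact screen unit concrete, here is a minimal sketch of what the client-side file of one screen, the Page.cshtml.ts above, might contain. The names here (OrderListPage, OrderRow, and so on) are purely illustrative assumptions, not Cotomy's actual API; the point is only that everything one screen needs can live inside a single, reviewable boundary.

```typescript
// Hypothetical client-side unit for one screen (Page.cshtml.ts).
// The names below are illustrative; they are not Cotomy APIs.

// Everything this screen needs is declared in this one file, so a
// change to the screen stays inside a compact, reviewable boundary.
interface OrderRow {
  id: number;
  customer: string;
  total: number;
}

// Screen-scoped state and behavior, with no references to other screens.
class OrderListPage {
  private rows: OrderRow[] = [];

  load(rows: OrderRow[]): void {
    // In the real page this data would come from the server-rendered
    // markup or a fetch call scoped to this screen.
    this.rows = rows;
  }

  // A small piece of client behavior that belongs only to this screen.
  filterByCustomer(name: string): OrderRow[] {
    return this.rows.filter(r => r.customer === name);
  }

  totalAmount(): number {
    return this.rows.reduce((sum, r) => sum + r.total, 0);
  }
}

// Usage: the page script instantiates its own page class and nothing else.
const page = new OrderListPage();
page.load([
  { id: 1, customer: "Acme", total: 120 },
  { id: 2, customer: "Globex", total: 80 },
  { id: 3, customer: "Acme", total: 50 },
]);
console.log(page.filterByCustomer("Acme").length); // 2
console.log(page.totalAmount()); // 250
```

Because the class references nothing outside the screen, an agent editing this file has very little room to damage distant parts of the system, which is exactly the property the surrounding structure is meant to provide.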

I should also explain why I keep talking about screens here. Whether a system is written in a strongly object-oriented style or not, many of the tickets that appear during integration testing or after release still begin from a screen. That is where users notice the problem, that is where operators report it, and that is usually where investigation starts. So even if the underlying cause is deeper, the practical entry point for debugging and correction is often the screen boundary.

The model still matters, of course. If the model is weak, the whole system becomes unstable. But if the model has been examined with reasonable care, broad model-level revisions do not tend to keep happening afterward. And when they do happen, the situation is usually serious enough that the team has to respond with full force anyway. At that point, the question is no longer whether AI happened to produce especially elegant local code. The problem is that the system itself now requires a large structural correction.

I should be clear here. I did not originally choose that structure for AI. I ended up there because I needed to write a large amount of CRUD, while also needing some parts of the system to remain server-rendered so they could be indexed by search engines. I also needed customer-facing features and partner-facing features to coexist safely without turning into a mess, and I wanted the work to stay organized in a way that made progress easier to measure and schedules easier to plan.

Razor Pages was important in that process. I did not first invent the full structure in my head and then go looking for a framework that matched it. Rather, I was trying to solve those practical constraints, arrived at Razor Pages because its page boundary fit that kind of work, and then kept developing the client-side structure from there. So the current shape was not designed from nothing by me alone. It emerged through that development path.

Looking back, I think Razor Pages itself also helped push the work toward a cleaner classification of CRUD behavior. List, detail, create, edit, and related screen transitions could be treated more explicitly as page-level units, and that made it easier to organize both the server side and the client-side behavior around the same operational boundary. Cotomy then grew on top of that reality, including the TypeScript side, rather than replacing it with a completely unrelated model.

Another part that mattered was the .NET solution and project structure itself. Managing multiple projects inside one solution was not merely an IDE convenience for me. For a large business system, it provided a practical way to split responsibilities into physical boundaries and keep the whole system from collapsing into one oversized unit.

That kind of decomposition is not unique to .NET. Other ecosystems also have their own monorepo and workspace models. Still, in my own experience, the solution and project structure in .NET made that separation especially explicit and operationally useful when the system became large and CRUD-heavy.

That was the real motivation. It was not designed as some AI-era pattern from the beginning. I was simply trying to build a large system without getting lost while building it. But in practice it seems to work quite well with AI coding agents. The agent can usually inspect a smaller and more meaningful set of files, and the working area for one change tends to stay relatively compact. Because of that, the impact of a single modification is often easier to reason about.

That matters more than it may sound. When the files for one screen are scattered too widely, the agent has more room to miss something important. When they stay close together, the local working set becomes easier for both the agent and the human reviewer to hold. And because so much real maintenance work begins from a screen-level ticket, that compactness is useful in ordinary operations, not only in AI-assisted implementation.

A Similarity to Japanese SI Structure

Thinking about this also brings back memories of older Japanese SI work. I do not mean that in any nostalgic way. There were many parts of that world that I disliked very strongly, and even now I still think much of that structure was unhealthy.

Still, one part of it looks a little different to me now. What I remember is not an elegant theory but the feel of the work itself. Design and coordination sat on one side, implementation sat on the other, and the implementation work was usually divided into pieces small enough to hand out screen by screen.

At the time, I mostly saw the bad side of that. It made the whole system harder to see. It encouraged local optimization. And it often produced a rigid way of working that was unpleasant to live with. I still think all of that is true.

But I also understand something now that I did not appreciate as much back then. That shape was also a way to keep development under control. If the work is divided into units small enough to assign, review, and estimate, then progress becomes easier to read and schedules become easier to plan. The final result also depends a little less on the strengths or weaknesses of one particular implementer.

Part of what I wanted in my own system was exactly that kind of stability. Even though I mostly work alone, there are still cases where I bring in help, and I do not want the result to depend too much on the personal style or skill level of whoever happens to touch one part of it. To be honest, that remains true even when the worker is an AI agent rather than another person.

That is one reason AI agents feel oddly familiar to me. They do not feel like broad-ownership designers. They feel closer to very fast implementers, or maybe coders would be the more precise word. They can produce a large amount of code quickly, they usually have no malicious intent, and yet they can still do damage that a human would not normally produce in quite the same way.

So when I think about the engineer directing the work and the AI agent carrying out a large share of the implementation, I cannot help seeing some resemblance to that older structure. The scale is different. The risks are different. But the relationship between the side that controls the work and the side that executes it does not feel entirely new to me.

Design Matters More, Not Less

So after thinking about all of this for a while, I do not end up feeling that design matters less because AI writes more code. What I feel is almost the opposite.

If the structure is weak, AI will only help that weakness spread faster. And if the structure is strong, AI can move quickly without letting one local change turn into a wider mess. That is why I keep coming back to the same point. The tool mostly accelerates whatever structure is already there.

So at this point, I no longer find myself wondering whether AI can write code at all. What stays with me instead is a different question: what kind of software structure lets AI move quickly without turning every local change into a wider risk?

Conclusion

AI is still developing, and I assume the tools will change a great deal from here. The limits we see today may shift. The workflows may shift with them.

So I do not think this article points to any final answer. The situation is still moving too quickly for that. Something much better, or much more efficient, may appear and change the development environment again.

Even so, I do not really expect that part to go away. Whatever tools appear, engineers will still end up having to think about the design and operating model that best fits the environment they are actually working in. If anything, the AI era seems more likely to force that question forward than to remove it.

Next article: Use Cases and Alignment in Solo Development

Learn Cotomy

Cotomy is a DOM-first UI runtime for long-lived business applications.