AI in Real Development Work

A grounded account of moving from early ChatGPT and Copilot usage to AI coding agents, with a focus on practical gains, structural risks, and design discipline.

Previous article: Early Architecture Attempts

Opening

When ChatGPT suddenly became a global topic from late 2022 into 2023, I was extremely busy with company work. I did not have the spare attention to follow every news cycle in detail, but I still remember very clearly that it had taken over the conversation almost everywhere.

There was also a small side detail that was strangely memorable in Japan: a surprising number of people said ChatGTP when they meant ChatGPT. It was a minor joke at the time, but it also showed how quickly the name had spread beyond people who normally follow software closely.

Even before I had fully organized my thoughts about it, I had already concluded that it was something I needed to try in real work.

ChatGPT Changed My Work Before It Changed My Code

I started using ChatGPT soon after it became impossible to ignore. At least in my memory, it felt as if only a few days passed before the paid tier arrived and serious usage started to separate itself from casual curiosity.

What I remember first is the GPT-3.5 period. It was already useful, but it still felt unstable, uneven, and sometimes oddly shallow. Then GPT-4 arrived, and I remember being genuinely surprised by how much the answer quality improved. It was not perfection, but it crossed a line where the tool started to feel materially different from ordinary search and ordinary code suggestion.

It was also not especially stable in those days. There were mornings when I would wake up, sit down to work, open ChatGPT, and find that it was down. On such days, my motivation for the entire morning dropped more than I would like to admit. That is half a joke, but only half.

What the Early ChatGPT Phase Was Actually Good For

At the beginning, my use was centered almost entirely on the browser UI. I would ask questions, request code, copy useful fragments, and then adapt them by hand.

But if I look back honestly, the main value for me was not dumping coding tasks into the chat and waiting for finished answers. The deeper value was design assistance. I could present an idea that existed only vaguely in my head, ask questions about structure, trade-offs, and alternatives, and use the conversation to turn rough intuition into something more concrete.

Code generation mattered, but design clarification mattered more. In that phase, AI was already affecting my engineering work, even when it was not directly writing large amounts of code.

Search Work Started Moving Toward AI

My work has never been limited to system development alone. I work as an internal systems engineer inside a non-IT company, but a meaningful part of my actual work is not directly about systems at all. Depending on the situation, that can include sales-related work, product and production management, procurement, quality control for consumer household goods, document preparation, and other ordinary business responsibilities that have little to do with software itself.

Because of that, a large portion of my day has always involved finding out how to do something. That behavior gradually shifted away from ordinary search engines and toward AI. Once GPT-4-level answers became normal, and later once web-assisted answers improved, that shift became much more visible.

I also used generative AI in areas that had little to do with programming. One practical example was writing an SDS (safety data sheet) for one of the household products our company handles when there was no one inside the company who was already used to that work. AI helped me understand the general format, the kind of writing expected, and how to investigate the necessary information. In the end, some of that research still had to be done through more traditional means, including going to the library, but without AI the path to getting that document done would have been much less clear.

Another example was preparing reskilling training material in a field outside my own specialty. Expert support was available only in limited amounts, so I used AI to reduce how much expert time I needed while still building training material at a reasonably high quality. In both cases, AI did not remove the need for judgment. It reduced the amount of blind searching and let me use scarce human support more efficiently.

I still use normal search when I need source verification, precise documents, or conflicting viewpoints. And depending on the task, I may also ask the same question to multiple AI tools and compare the answers. But for a very large part of practical knowledge work, asking AI became the faster first move. In that sense, ChatGPT changed how I worked before it changed how I coded.

Copilot and the Era of Narrow but Powerful Assistance

At that time, I was still using Visual Studio for Mac. When its end of support became clear, I moved my daily work more seriously to VS Code. That migration also made it natural to start using the early GitHub Copilot more actively.

Compared with what people now call AI agents, the early Copilot experience was extremely narrow. Even so, it reduced development cost quite a lot.

It was especially strong in a very specific kind of task: cases where the method contract was already clear, but the implementation itself was tedious. If the input and output were well defined, Copilot could often help meaningfully at the method level. For repetitive CRUD API work and similar routine implementation, it made development noticeably easier.
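That kind of task can be made concrete with a small sketch. The example below is illustrative, not from a real project: the point is that once the signature and a one-line comment fully specify the contract, only the tedious body remains, and that body is exactly what the early Copilot was good at filling in.

```typescript
/**
 * Groups order lines by product code and sums their quantities.
 * A hypothetical routine of the "clear contract, tedious body" kind.
 */
function sumByProduct(lines: { product: string; qty: number }[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const line of lines) {
    // Accumulate per-product totals; the logic is mechanical once
    // the input and output shapes are fixed.
    totals.set(line.product, (totals.get(line.product) ?? 0) + line.qty);
  }
  return totals;
}
```

Nothing in a method like this requires judgment about the wider system, which is why delegating it cost so little and saved so much.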

Still, I think it is important to describe that period correctly. Copilot was not replacing software development. It was removing some of the most tedious and mistake-prone parts inside software development, and that was already a very large benefit.

When ChatGPT Started Looking at My Editor

I think this was around 2024, when ChatGPT on macOS entered a more useful phase for coding work through the Work with Apps feature and later expansion of supported coding applications. That mattered because the interaction stopped being only a detached browser conversation.

Once AI could look at files while I was working in VS Code, the practical pattern changed. I started selecting which files to expose, which context to emphasize, and how to create a situation in which the model could move with less ambiguity.

That was still not full agentic coding in the sense we now use the term, but it was already a major shift. The AI was no longer responding only to manually pasted excerpts. It was beginning to work while seeing more of the real local context.

Around Claude Code and Codex

Sometime around the middle to later part of 2025, I also tried Claude Code through the kinds of editor and terminal integrations that were available around that period. My impression at the time was mixed. To be clear, this was not because I thought Claude Code was unusable or weak. By then it was already clearly capable enough for real work, and many engineers were using it seriously. The issue was simpler and more personal: it did not fit my preferences. In particular, the version I tried occupied one of the editor panes, and that alone was enough to make the experience feel wrong for me. Even though it felt fast, and in some cases faster than the alternatives, the overall fit in terms of working style, UI friction, and cost balance was not comfortable enough to keep using it continuously, so I stopped after a relatively short trial.

I say that carefully because these tools were changing quickly. Later forms of native editor integration became more polished, and I do not want to pretend that my earlier trial represented the final state of the product.

The tool that fit me more naturally in practice was Codex. Sometime in late 2025, when using it from the VS Code sidebar started feeling familiar, it matched my habits better. It felt closer to the flow I already knew from Copilot Chat, so the barrier to using it seriously was lower. In practical terms, Codex became the first tool I used seriously for day-to-day agentic coding.

At this point, I mainly use Codex, ChatGPT, and GitHub Copilot together. There are cases where one tool is enough. But there is also real value in using different models for review, asking similar questions in parallel, or comparing how each tool approaches the same change.

Before Agents, the Risk Was Relatively Small

Before the current generation of coding agents, the range of work AI could really take over was limited. Because of that, the risk was also relatively limited.

The developer still had to understand the whole system, define the details, make the structural decisions, and translate those decisions into code. AI helped with one important part of that process: it could take over some of the most annoying and error-prone local work.

That alone was enough to raise my effective development speed substantially. If I had to describe it as a feeling rather than a measured number, it often felt like more than a fifty percent improvement. And because the instructions were still fairly simple, when the answer was wrong I could usually correct it with a small follow-up.

Agentic Coding Changed the Scale of Both Speed and Damage

When AI tools became agents, the situation changed completely.

Now they could move across the project, inspect multiple files together, make coordinated edits, summarize the changes, and present the result in a form that was much closer to delegated work than to suggestion-based assistance. That gave me productivity from a different dimension.

At the same time, it also gave me code contamination from a different dimension.

That stronger wording is intentional. The gain is real, but the failure mode is also real. The same tool that can compress hours of mechanical work can also spread bad structure across multiple files faster than an ordinary human mistake usually does.

What makes this worse, in my view, is not only the agent itself but also a weakness on the human side. When an enormous batch of changes arrives all at once, it becomes surprisingly difficult to maintain enough energy to verify every part of it properly. People tell themselves they will review it carefully, but in practice some part of the change is often waved through because the total volume is already exhausting. I cannot prove that this is the central reason damage spreads, but I do think it is one of the real reasons agentic coding can become more dangerous than it first appears.

Vibe Coding Can Produce Software, but Not Automatically Designed Software

What is sometimes called “vibe coding” is a relatively recent phrase. I do not reject it outright. If the goal is to produce something that works, especially at small scale, then yes, AI can often get you surprisingly far. I also use that style myself for disposable shell scripts and very small programs. In that kind of case, what matters is often the result rather than the internal elegance, and sometimes I do not need to inspect the inside very much as long as the output is correct. For that kind of short-lived work, especially when there is little chance of causing a security incident or some other serious operational problem, vibe coding can be extremely well suited.

But working software is not the same thing as designed software.

AI can generate a moving system. It does not automatically generate a system whose boundaries, responsibilities, extension points, and failure handling were shaped with clear intent. That difference matters more as the software becomes larger and more long-lived.

For a small tool, a rough shape may be enough. For business systems that will be modified repeatedly over time, it usually is not.

Why Design Remains Necessary

This is the center of the issue as I currently see it.

If a system is not designed with clear intent, then repeated modification will eventually break it, regardless of whether the system is large or small. AI does not remove that rule. If anything, it can accelerate the path toward that outcome.

Take a typical order management system. An AI agent can often produce the minimum working behavior without much trouble. But without design discipline, the processing starts concentrating into one oversized function. Similar logic gets duplicated across screens and endpoints, and awkward shared helpers appear that solve one local problem while damaging another part of the system.

After that, every change makes the structure a little more complicated. No one sees the whole shape clearly anymore. Eventually the system turns into debt, even if it still technically runs. In other words, the real problem is that software complexity keeps increasing unless something deliberately holds it in check.

That is why I do not think the core problem is whether AI can generate code. The core problem is whether the software is being shaped by explicit design decisions or only by local convenience.

Can AI Build Everything by Itself

There are people who believe AI can build software entirely on its own. I do not want to dismiss that possibility in absolute terms.

In fact, over roughly the last year, a fairly large share of the code I produce has come from AI in one form or another. So this is not an argument from distance.

My current view is simply that such output becomes practically valuable only when someone is still holding the whole picture, understanding the details, making design decisions, and directing the work. Even if model quality keeps improving, I suspect the basic issue does not disappear. The scale at which things break may move upward, but the tendency for software complexity to keep increasing does not vanish.

That is my present view, not a final law. I am open to being proven too pessimistic. But I have not yet seen enough in practical work to conclude otherwise. And even if enough experience eventually makes these problems far more manageable, I think that would still mean we are only shifting toward a different kind of required skill. In that case as well, the people using AI would still need to build techniques, discipline, and working methods of their own.

Finite Resources and the Problem of Experience

Another reason I remain cautious is more operational than philosophical. As long as these services are provided at prices ordinary users can actually pay, the computational resources available to each request are finite.

That means an AI system cannot always hold or process all of the information that might matter beyond the immediately relevant slice. It can make very good proposals based on learned knowledge. What I am less convinced about is whether it can reuse accumulated experience in the same practical way a human engineer does across long periods of ownership and maintenance.

My guess is that this is not only a model-quality issue. At a more fundamental level, current AI still seems to me like a way of converting computation into results. Because of that, I suspect the problems I am talking about may become more manageable and more resistant to failure, but not truly disappear.

That is also why I separate this from speculation about a genuinely different technological basis. Maybe a completely different computing paradigm could change the picture. For example, if something like quantum computing became practical and general enough, perhaps the situation could change. Whether that would actually solve the problem is far from clear. But as long as we are talking about the broad family of systems we use now, I do not think resource limits are a minor detail.

What AI Agents Keep Getting Wrong in My Own Framework

This became clearer after I built Cotomy and then used Cotomy to build several systems. Working that way gave me a concrete environment in which I could watch AI agents help productively while also creating recurring structural problems.

One thing I learned very quickly is that instructions are not the same as compliance. I can write rules in AGENTS files. I can repeat those rules in prompts. And still, they are sometimes ignored.

Cotomy is a good example. CotomyElement does expose its underlying DOM element through the element property. But the framework is not designed around treating raw HTMLElement access as the normal primary path for ordinary screen work. Even so, AI will casually reach for direct HTMLElement handling if that seems locally convenient. Because of that, I sometimes need to audit for DOM policy violations explicitly.

The same thing happens with coding style. I sometimes prefer very local definitions using anonymous classes because they keep certain kinds of UI logic close to the place where it is used. AI agents have a tendency to normalize that into a different style even when I did not ask for such a change.
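The local style in question can be sketched as follows; the guard logic here is a made-up example, not real Cotomy code, but the shape is the point.

```typescript
// A one-off piece of UI logic defined as an anonymous class at its
// single point of use, so the behavior stays next to the code that
// needs it instead of becoming a named, exported class elsewhere.
const submitGuard = new (class {
  private submitted = false;

  // Returns true only on the first call, blocking double submits.
  tryConsume(): boolean {
    if (this.submitted) return false;
    this.submitted = true;
    return true;
  }
})();
```

An agent will often "clean this up" into a named top-level class even when the logic has exactly one consumer, which moves the definition away from its use site and changes the style without being asked to.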

I have also seen type-level misalignment. For example, where a generic flow is clearly intended to work with a CotomyElement subclass, an agent may still try to put an HTMLElement-oriented type into the design because that looks acceptable from a narrower local reading.
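Roughly, the misalignment looks like this. The types below are illustrative stand-ins for the real Cotomy hierarchy; only the CotomyElement name comes from the article.

```typescript
// Illustrative-only types; the real Cotomy hierarchy is richer.
class CotomyElement {
  shown = false;
  show(): void {
    this.shown = true;
  }
}

class CotomyInput extends CotomyElement {
  value = "";
}

// Intended contract: the generic flow stays inside the framework's
// type family, and callers get their specific subclass back.
function showAndReturn<T extends CotomyElement>(el: T): T {
  el.show();
  return el;
}

// The misalignment would look roughly like
//   function showAndReturn<T extends HTMLElement>(el: T): T
// which reads as acceptable from a narrow local view but pushes raw
// DOM types into a flow that was designed around CotomyElement.

const input = showAndReturn(new CotomyInput());
input.value = "typed"; // still a CotomyInput, not a widened base type
```

The constraint is doing real design work here: it encodes the decision that this flow belongs to the framework's type family, which is precisely the kind of global intent an agent drops when it reads only the local code.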

None of these mistakes are especially mysterious. They look to me like the result of incomplete retention of project-wide rules under finite context and finite resources, combined with a tendency to choose local optimization when global intent is not held firmly enough.

An Interim Conclusion

At this point, I think the problem is wider than hallucination. AI also reproduces many of the ordinary design mistakes and convention violations that humans make.

To let coding agents do large amounts of work safely, you need high-quality instructions and a software structure that gives the agent a rational path to follow without constant ambiguity. In other words, the age of AI does not make design less important. It makes design more important.

Closing

This article is only an introduction to the problem as I see it. In the next few articles, I want to look more directly at why AI agents break software, how to let them work more safely, and what software design might need to look like in an era where code generation is normal.

I am not interested in denying the value of AI. AI has already delivered productivity and even quality gains that would have been very difficult to obtain in the past. I expect that benefit to continue growing.

But everything depends on how it is used. For engineers, thinking seriously about how to use AI well is no longer optional. It is becoming part of the job itself.

Next article: Real Problems I Encountered When Developing With AI Agents

Learn Cotomy

Cotomy is a DOM-first UI runtime for long-lived business applications.