Entity Identity and Surrogate Key Design

A practical reflection on natural keys, surrogate keys, DocumentDB, Azure Search, and why stable record identity became a better fit for my business systems.

This note continues from Object-Oriented Thinking for Entity Design.

Introduction

I started thinking seriously about database design only after I had already begun working as a professional engineer.

For a long time, I designed keys differently for each table. At the time, that felt completely ordinary. In many teams, I suspect it still does.

Only much later did I learn that this style is usually called natural key design.

I do not think that approach is strange or irrational. In fact, it is probably the most human way to think about identity at first. What makes a record unique is usually decided by business requirements, so it feels natural to use that business meaning directly as the key.

An order record is a simple example. In many systems, the order number is the obvious identifier. A daily work report can be different. If several sites each submit one report per day, then site code plus date may feel like the most natural identifier. That way of thinking is easy to understand because it follows the shape of the work itself.

For a long time, I had no strong reason to question it.

Why I Moved Away from Natural Keys

I did not move to surrogate keys because I had studied the theory deeply, or because natural keys had already caused some dramatic failure.

The actual trigger was much more practical.

At one point, I planned and built a system for a related line of business that handled many kinds of operational reports and requests. One can think of it as something in the general area of building maintenance or security operations, where many different people, contractors, and sometimes customers submit information, and managers need to review that information to make decisions.

This was not just a support task where I inherited someone else’s structure. I was driving the design myself, so I had room to rethink the whole shape of the data flow.

An important part of the context was that the source systems could also be adjusted. More than half of them were systems I had built myself, and for the others I still knew the developers or the responsible departments well enough that coordination was easy. So I was not trapped by a fixed external contract. If the cross-system design needed a small adjustment on the registration side, that was realistic.

The goal was cross-domain search. I wanted to gather reports, requests, and other operational records from multiple systems into one place and let managers search across them as one body of information.

That was where a normal relational approach started to feel wrong. If the data stayed split into separate tables by source or by business type, then it stopped being truly flat from a search perspective. I could still search each table, of course, but I could not treat the incoming data as one unified searchable surface without building a more artificial structure on top of it.

This system was not intended for heavy clerical processing, and it did not need aggregation. It was mainly for collecting and searching operational information. Because of that, I started considering a key-value or document-style database, even though that was not the mainstream choice then and probably still is not the mainstream choice now for this kind of internal system.

That is why I decided to use what was then called Azure DocumentDB together with Azure Search. Those products have since been renamed Azure Cosmos DB and Azure Cognitive Search (the latter now marketed as Azure AI Search), but at the time they still had their earlier names.

The key question became difficult at exactly that point. Once I tried to treat all of those records as one flat searchable body of data, it stopped making sense to let every incoming record carry a completely different key shape derived from its original business meaning.

Looking back, that experience clarified one design point very sharply. Once record shapes stop being uniform, natural-key assumptions also stop being uniform. If the schema is heterogeneous but the system still needs to store, update, index, and search records through one common flow, then identity has to be separated from business structure.

A Different Kind of Data Model Changed the Key Question

Using a key-value or document-style database changed the design pressure immediately.

The incoming data was roughly divided into three groups: daily reports, monthly reports, and records that arrived without any date-driven cycle at all. I could have forced dates into the key shape for consistency, but that started to feel wrong very quickly. If some data had no real relationship to a date, putting a date into its identifier just to preserve one naming pattern felt artificial.

That was the point where I stopped trying to derive identity directly from the business meaning of each record.

Instead, I standardized record identity on GUID values.
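The shape of that change can be sketched as follows. This is an illustrative model, not the actual schema of the system described here; the field names (`recordType`, `attributes`, `siteCode`, and so on) are hypothetical.

```typescript
import { randomUUID } from "crypto";

// Heterogeneous records from different source systems. The original
// business identifiers survive as plain attributes; they no longer
// define record identity.
interface StoredRecord {
  id: string;                          // surrogate identity: one GUID per record
  recordType: string;                  // daily report, monthly report, ad hoc, ...
  attributes: Record<string, string>;  // business identifiers live here
}

function ingest(recordType: string, attributes: Record<string, string>): StoredRecord {
  return { id: randomUUID(), recordType, attributes };
}

// Three records with completely different business key shapes all
// receive the same kind of identity and can share one flat store.
const daily   = ingest("daily-report",   { siteCode: "A01", date: "2024-05-01" });
const monthly = ingest("monthly-report", { siteCode: "A01", month: "2024-05" });
const adHoc   = ingest("incident",       { reportedBy: "contractor-7" });
```

The point of the sketch is that the key shape no longer varies with the source system: a record with no date simply has no date in its attributes, and its identity is unaffected.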

That choice fit the system better. In that system, the original business identifiers were not the primary meaning of the stored record anymore. They were attributes of the record. The record itself needed one stable identity so it could be stored, updated, indexed, and searched in one flat structure. From that perspective, letting the key shape change every time the source system changed would only have added implementation cost without adding real value.

It also fit the tools I was using at the time well enough that the implementation cost was low.

I was aware that people sometimes debate whether GUID collisions are truly impossible. I did not need philosophical certainty there. In that system, if a collision had somehow surfaced as an exception, retrying from the client side would have been enough. Most registrations were triggered automatically from other systems rather than typed manually by end users, so even a defensive retry path for that unlikely case would have been easy to add. The system’s primary purpose was search, not financial settlement or inventory control, so the operational risk was acceptable.
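The defensive retry path mentioned above is small enough to sketch. The `save` callback and its conflict exception here are hypothetical stand-ins for whatever a real storage client would raise on a key conflict; the system described in this article never actually needed this path to fire.

```typescript
import { randomUUID } from "crypto";

// If a GUID collision ever surfaced as a key-conflict exception,
// regenerating the id and retrying would be enough for a
// search-oriented system like the one described above.
function saveWithRetry(save: (id: string) => void, maxAttempts = 3): string {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const id = randomUUID();
    try {
      save(id);       // assumed to throw on a key conflict
      return id;      // normal case: succeeds on the first attempt
    } catch (e) {
      lastError = e;  // regenerate the GUID and try again
    }
  }
  throw lastError;
}
```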

The broader context mattered too. Data was entered through several other systems, and those systems still used relational databases. Writing the generated GUID back to those source systems made it possible to update DocumentDB and Azure Search in the same transactional flow from the application side. That was enough to avoid the kind of inconsistency that would actually have been dangerous for the product.

Why That Design Felt More Rational Than I Expected

That system was not large in volume. It handled roughly one thousand records per day at most.

At that scale, almost any reasonable database technology could have worked. I was not solving a scale problem that only Cosmos DB and Azure Search could solve.

But the architecture still made sense because it matched the actual goal. Reports from multiple systems could be treated as flat searchable data. Full-text search became straightforward. Record-level sharing with other users also became easier because each record had one stable identifier that did not need to carry business meaning inside itself.

What mattered was not that the system was globally distributed or massively scalable. What mattered was that identity had become simpler.

That experience stayed with me.

Returning to Relational Databases

Later I returned to relational database design for internal business systems.

By then, I already knew that I liked working with records whose primary identity did not depend on a specific business number, date pair, or composite requirement. When I came back to MySQL-based design, it became obvious that relational databases did not force me to abandon that comfort.

A table can have one primary key and still define separate unique indexes for the business-level rules that actually matter.

That was the moment the idea clicked for me in a much more stable way. The system could use one surrogate key for record identity, while business uniqueness could still be enforced where needed.

Only after I had already settled into that style did I look up the terminology and learn that surrogate key was the established name for it.

Why Surrogate Keys Feel Better in Application Code

The most obvious benefit for me is that record identity becomes structurally uniform.

In my own systems, I usually manage entities through a shared base class. When every entity has the same kind of Id, I can treat record identity as one stable concern across the whole application. I do not need a different equality rule for each table just because one entity is identified by order number, another by site code plus month, and another by some external reference number.

That improves the code in a very practical way. Identity becomes simpler to reason about. Base classes remain cleaner. Cross-cutting code does not need to know the business rule for every individual table just to answer the question of whether two records refer to the same stored entity.

I still check the type as part of identity reasoning, of course. A customer with a certain Id and an order with the same Id are not the same thing. But once the entity type is known, the record key itself can stay simple.
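A minimal sketch of that idea, with hypothetical names (`EntityBase`, `sameRecordAs`); this is not the actual base class from my systems, only the shape of the reasoning.

```typescript
// One identity concern for every entity: a single string id plus the
// entity type. No per-table equality rules are needed.
abstract class EntityBase {
  constructor(public readonly id: string) {}

  // Two references point at the same stored record only when both the
  // entity type and the surrogate id match.
  sameRecordAs(other: EntityBase): boolean {
    return this.constructor === other.constructor && this.id === other.id;
  }
}

class Customer extends EntityBase {}
class Order extends EntityBase {}

// Same id, different entity type: not the same stored record.
const c = new Customer("7f3c-...");
const o = new Order("7f3c-...");
```

Cross-cutting code can call `sameRecordAs` without knowing any table's business rule, which is exactly the uniformity argued for above.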

Business Uniqueness Still Matters

Using surrogate keys does not mean business uniqueness disappears.

Daily and monthly data still need date-related constraints. Some records still need a master code, customer-controlled number, or another domain-specific identifier. But in a relational database, that is exactly what unique indexes are for.

I do not see that as a compromise. I see it as clearer separation of responsibility.

The primary key answers one question: which stored record is this.

A unique index answers another question: which business condition must never be duplicated.

Those are related questions, but they are not always the same question.
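The separation of those two questions can be illustrated with an in-memory sketch. The store below plays the role of a table plus its unique index; the names and record shape are illustrative, not a real schema from the systems discussed here.

```typescript
import { randomUUID } from "crypto";

interface WorkReport {
  id: string;        // surrogate primary key: which stored record is this
  siteCode: string;  // business attributes; together they form the
  date: string;      // condition that must never be duplicated
}

class WorkReportStore {
  private byId = new Map<string, WorkReport>();  // plays the primary key
  private businessKeys = new Set<string>();      // plays the unique index

  insert(siteCode: string, date: string): WorkReport {
    const businessKey = `${siteCode}/${date}`;
    if (this.businessKeys.has(businessKey)) {
      // The business rule is enforced here, without ever becoming
      // part of record identity.
      throw new Error(`duplicate report for ${businessKey}`);
    }
    const report = { id: randomUUID(), siteCode, date };
    this.byId.set(report.id, report);
    this.businessKeys.add(businessKey);
    return report;
  }

  find(id: string): WorkReport | undefined {
    return this.byId.get(id);  // identity lookup never touches business keys
  }
}
```

If the uniqueness rule later turns out to be incomplete, only the `businessKey` construction changes; every `id` and every reference to it stays valid.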

An order record is a good example. In one of my systems, the order itself is uniquely identified by its order number. But some customers also have their own inquiry number or management number and contact us using that identifier instead. Ignoring that number because it is not the official primary identifier would be an application defect. If it matters operationally, it should be modeled and, where necessary, protected with uniqueness rules.

Surrogate keys do not prevent that. They make it easier to express it without overloading record identity itself.

That distinction matters because I do not think of business keys as meaningless. I think of them as constraints and lookup handles rather than as the one universal identity that every layer of the system must carry. A screen, an API endpoint, and an entity instance usually need one concrete way to point to one stored record. Business rules often need more than that, but that is a different responsibility.

Looking at it now, I think this is part of a broader design principle. Business rules are often specific, changeable, and different across domains. System structure usually benefits from being more stable than that. For me, surrogate keys became valuable not only because they simplify record identity, but because they help separate changeable business uniqueness from the parts of the application that benefit from staying structurally uniform.

Why Relationships Become Easier to Change

The biggest advantage is not elegance. It is lower coupling.

When related tables reference a surrogate key, the link field itself carries almost no domain meaning. It is only an identity bridge. That makes relationships less fragile.

If I later discover that a business-level uniqueness rule was incomplete, and I need to add another field to a unique index, that change usually does not force a redesign of every foreign key relationship in the system. The record identity can stay the same while the business constraint becomes more precise.

That is a very practical form of safety during development. Design mistakes still happen, but the blast radius is smaller.

A Small but Real Benefit in URLs

Another benefit is smaller, but I still value it.

When natural keys are composite, URLs often drift toward query strings such as /orders?orderNo=… or /work-reports?siteCode=…&month=….

There is nothing inherently wrong with that. But for ordinary CRUD screens, I find it clearer when one record is addressed by one path segment, such as /orders/{id}.

That is also closer to how I prefer to structure APIs and screens in practice. Search conditions belong in query strings. A request for one concrete record feels cleaner when it uses a stable path identity instead.

This is partly a matter of taste, but it is not only taste. A uniform URL shape removes unnecessary variation from both frontend and backend code.

For me, this is not only a database design preference. It is a boundary design decision for the whole application. If one record is always addressed through one stable identifier, then the database key, the API path, the screen URL, and the entity form can all follow the same shape.

This is the principle I would state more directly now. Record identity should be single, stable, and independent of business meaning. Business identifiers should be treated as constraints and lookup keys, not as the primary identity of a stored record. Once that separation is made, the UI, API, and persistence layers can all rely on the same identity shape. Without it, the layers do not line up cleanly, and composite identity begins to distort the boundary between them.

Why This Also Fits Cotomy’s Entity Form Design

This preference also lines up with the current Cotomy entity-form flow.

CotomyEntityApiForm reads one entity key from data-cotomy-entity-key, appends that single key as the final path segment of the action URL, and switches from POST to PUT when the key exists. On a 201 Created response, it reads the Location header and stores the generated key back onto the form.

That flow assumes one stable record identifier. It is a good fit for the way I now design business systems, because the UI does not need to understand a composite business identifier just to edit one entity. Business-level uniqueness can stay in validation rules and database constraints where it belongs.

If composite natural keys were pushed directly into this kind of UI flow, the complexity would spread immediately. The URL shape stops being uniform. Hidden fields increase because the UI has to carry multiple identifying values. POST to PUT transitions stop being a simple one-key decision. Reload behavior also becomes heavier because re-fetching one record now depends on reconstructing multiple business conditions instead of reusing one stable identity. That is exactly the kind of cross-layer distortion I prefer to avoid.

Is This Unusual?

From conversations and projects around me, I still get the impression that this style is not especially common in Japan. I often see designs where business identifiers remain close to the center of table identity, especially in internal business systems.

At the same time, when I look at major frameworks used internationally, surrogate-key-centered design does not look unusual at all. Many popular frameworks and ORMs assume one primary identifier such as Id by default, and then let developers define additional uniqueness constraints separately where business rules require them.

So I do not think of this as an eccentric design choice. It feels less like a special preference and more like one reasonable way to separate stored record identity from business-level uniqueness.

Where This Fits Best

This approach fits best in CRUD-oriented business systems where screens, APIs, and persistence all operate on one entity at a time. It is especially comfortable when the UI, URL structure, and API routes all benefit from one stable identifier shape.

I would be more careful when the same identifier must remain primary across external system boundaries, when the business key is legally or contractually authoritative, or when the natural key is guaranteed to be immutable and is already the true operational identity. In cases like those, keeping the natural key at the center may still be the more honest design. Outside them, I would not put a natural key at the center by default.

Conclusion

I did not arrive at surrogate keys by starting from database theory.

I arrived there because one search-oriented system forced me to stop embedding business meaning into every record identifier, and once I experienced that simpler identity model, I did not want to go back.

Natural keys still make sense as business constraints. In some cases, they are exactly the rules that matter most.

But today I treat record identity and business uniqueness as different concerns. A surrogate key identifies the stored entity. Unique indexes protect the business rules. That separation is not only a preference. It is what allows the database, API, URL structure, and entity form to keep the same shape, remove unnecessary branching, and stay structurally aligned across the application.

Design Series

This article is part of the Cotomy Design Series.

Series articles: CotomyElement Boundary, Page Lifecycle Coordination, Form AJAX Standardization, Inheritance and Composition in Business Application Design, API Exception Mapping and Validation Strategy, Why Modern Developers Avoid Inheritance, Inheritance, Composition, and Meaningful Types, Designing Meaningful Types, Object-Oriented Thinking for Entity Design, and Entity Identity and Surrogate Key Design.

Previous article: Object-Oriented Thinking for Entity Design

Learn Cotomy

Cotomy is a DOM-first UI runtime for long-lived business applications.