Identify the Identity (Part I: Technical Identifiers)

This is a post from my old blog which I have edited, expanded, repurposed and I’m putting out again. This time it’s in multiple parts so that I can focus on each important aspect, one at a time.

I am finding that despite the enormous leaps and bounds we have taken in UX and developing more intuitive software systems, this still seems to be a fundamental issue, and not just a legacy one either. I’m still seeing the symptoms in in-house implementations and in proprietary software products that we have paid good money to use. And there really isn’t an excuse for it anymore. So for this first post, I’m going to get straight to the point:

PLEASE DON’T USE RELATIONAL DATABASE KEYS AS APPLICATION IDENTIFIERS!

 

So What’s Wrong With Using Relational Database Keys As Identifiers?

When you’re identifying application identifiers, you aren’t just looking at uniqueness. You are committing to using these identifiers. Forever. (Well at least for the lifetime of your application, which I’m guessing you’ll hope will be closer to forever than say, next week given the effort you’ve put into it.)

Users or other integrated systems (more and more frequent in this connected world) will depend on the identifier being immutable. The irony is a lot of developers still think that using relational database sequences is the de-facto way of guaranteeing immutability.

“Using database keys for application identifiers is like lunching at McDonalds.”

Sure, for ensuring uniqueness within your database, database keys are quick, cheap (in development terms) and easy. But to use as application identifiers they are not a healthy choice. Using database keys for application identifiers is like lunching at McDonalds. And soon enough you’ll be hungry for an alternative solution.

Let’s look at this junk-food choice in more detail. Again, with the assumption that you want your application to stand the test of time, then it has to withstand the possibility of database change. The longer your application lives, the more inevitable such a change becomes.

The change could be:

  1. a database schema change due to a feature request or change request
  2. an application rebuild as a result of a migration or disaster recovery, possibly requiring a database regeneration.
  3. you need to scale out your application’s database and you decide to partition your data across multiple servers

This first 2 scenarios can definitely be overcome if you are thorough and vigilant with your full set of database objects (i.e. you have a DBA on your team), but why add to the maintenance headache?

The 3rd scenario however could cause you problems as your sequences are actually only unique per database server.

Even if you are not convinced that any of these 3 scenarios are likely, by exposing the database key of the record you are unnecessarily exposing your technical implementation to your users. This may not be a big deal to all developers and all situations, but to customers who want their applications to be as secure as possible, can you truthfully say that you have mitigated every possible security risk when you are advertising database IDs?

Not only have you unnecessarily introduced a potential security risk, you may be unintentionally misleading your customers. Listening to an old DotNetRocks podcast another reason why this is not a good idea was highlighted – your customers can be misled as to the meaning of the identifier. The example given in the podcast (thanks Richard & Carl) was that the customer was upset that they had their own customers identified in their system using the database primary key. When their biggest customer was ID 372 it was a disaster and insisted it had to be changed. No matter how arbitrary that ID is to a developer, perception is everything in business.

In essence you are coupling your database infrastructure to your domain, and giving yourself unnecessary problems as a result.

OK, so now you’ve decided you want the healthy option. What do you go for?

To ensure uniqueness of identifiers across machine instances, the logical evolution is to look at using UUIDs (or if you live in a Microsoft world, GUIDs).

You can generate a UUID or GUID in one line of application code (C# example):

or at database level (T-SQL example):

While you have both options, and it may be tempting to script this at database level for all tables, I’d recommend putting this logic into your application code for 3 reasons:

  1. It is easier to maintain application code than it is to maintain database objects (although admittedly it is getting easier with the Continuous Database Deployment movement).
  2. It is easier to test – always a good reason
  3. You are generating an identifier for your object (conceptually, not technically) and therefore belongs with the application domain rather than the application infrastructure. (More on this in a follow-up post)

So there you have it. If you want a longer lasting application with less maintenance headaches don’t choose the Mc(I)Ds option.

Leave a Reply

Your email address will not be published. Required fields are marked *