Saturday, November 9, 2013

Benefits of an "Autonumber" Synthetic Key

I've come to creating all my tables using an auto-number, integer primary key called "ID".

In other words:

CREATE TABLE Person
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    ...
)



In another post, I talked about the good characteristics of a synthentic, non-meaningful key. In summary, meaningful keys are likely to need to change at some point, which can cause a problem because:
  • all foreign keys will need updated
  • some foreign keys may not be easily updatable, for instance older records that are archived, or off-line
So if you are going to synthesize, or completely invent, a field to be the primary key, what do you want out of that field?

In order to guarantee Entity Integrity, and satisfy the Guaranteed Access Rule (GAR), all you really need is that a key field be UNIQUE and NOT NULL. (The GAR requires that every value in a database be uniquely addressable. This is accomplished by tablename.fieldname WHERE primarykey=x.)

Another necessity is to have enough unique values for the table in question. Remember that data can live a long time (years, decades). Ask this question: How many records might be inserted in this table over the next ten years? That might seem hard, but you can start with how many per day, and do the math. Also consider that your traffic may increase over time, especially if your system is useful.

What else would be nice?

First, the field value for a record should not change. As pointed out above, a changing value in a primary key field will require cascading updates to all foreign keys. This can be difficult to accomplish, especially as data becomes older, and perhaps not online.

Second, the field should be small. In general, using the number of bytes necessary, but no more, keeps your records concise, and improves the number of records that fit on a page. In other words, when the database engine has to do an expensive disk IO to get a page, you'd like to get as many records on that page as possible. You are paying a disk IO, you'd like to get as many records as possible. Would you rather get 10 pencils for a dollar or 12?

Third, it might be nice to arrange it so that new records enter in an orderly fashion. This is where an "AutoNumber" field is helpful (IDENTITY in MS SQL Server).

Here's a layman's example. Think of a line of cars at a tollbooth. Usually, cars arrive at the tollbooth, and go to the end of the line. There's no question about where they go, and they don't affect any other cars in the line. But Suppose that as cars arrive at the tollbooth, that they must be arranged in license plate order. First, it must be decided where they should be in the line. Then which other cars have to move, then how to move them, etc. That's lot of work, and will take a lot of time.

In the database setting it is also convenient to have new records get placed "at the end". If your table uses an ascending "AutoNumber" field as the primary key, new records will get placed at the end. Furthermore, the database won't even have to ask the quesion "where does this record belong?", it just puts it at the end. No previously inserted records need to be relocated.

And the toll booth analogy goes further. Cars go through the tollbooth, leaving the line. The "oldest" cars go through the tollbooth first. Suppose we were making cars line up by licence plate number. Then we said, "The car that has been here longest gets to go through the tollbooth first!" We'd have to figure out which one was oldest, then make other cars move, etc. That's crazy!

In a database, records can leave the transactional table as they are archived. These are typically, the "oldest" records. If you are using an AutoNumber primary key, all the records that need to get archived are together, at the beginning of the list.

Finally, using an AutoNumber tends to keep "hot" records in RAM, rather on disk, improving performance. In a transactional database the newest records tend to accessed more often. The older records tend to get accessed less often. As an example, suppose I place an order at an e-tailer. I'm much more likely to change my mind and modify the order within 10 minutes of placing the order than after a week.

Using an AutoNumber means that the newest records are all together, on the same data pages. The database engine is smart enough to keep frequently accessed data pages in RAM to avoid the expensive disk access charges. As an example, I've noticed that my dentist office pulls all the files for the current day, and keeps them at the check in desk. So they are convenient and quick to access. Yesterday's files are filed in the back, because we won't need them for awhile.

To summarize, Autonumber fields are helpful because:

  • they get inserted in order, without affecting other records
  • they get archived in order, without affecting other records
  • they help the database keep current records in RAM
Of course, there are exceptions to every rule. The advice above would apply to transactional systems with a variety of operations (INSERT, UPDATE, DELETE, SELECT).


No comments:

Post a Comment

Followers