Wednesday, January 21, 2015

Always define a unique ordering

Your SELECT statements should always have a unique ordering.

This ensures that "downstream" systems/code that consumes the data doesn't assume (wrongly) that the data is always in the same order.

Let me describe this using the Products table from the MS Northwind database.


Consider this SQL:

SELECT   ProductName
FROM     Products
ORDER BY ProductName

This is syntactically correct SQL, and we'll assume that it returns the correct records.

If all the products have a different ProductName, this will provide a unique ordering of the records. In other words, each time the SELECT statement is executed, the records come back in the same order.

However, in a dynamic environment, with a large number of records, it is likely that two products could have the same ProductName, e.g., "Socks", "Lipstick", "Hammer". (Unless uniqueness is enforced on ProductName.)

The danger here is that downstream systems or consumers of the data from this SELECT statement might actually depend on the records always being in the exact same order. And when they are not, things can go wrong.

So why would a programmer downstream assume that the records are in order? Well, why not? At first glance, they look like they are in order. But with a million records that are changing, how could the programmer be sure? And why sort a million records that look like they are already in order?

A simple safe SQL practice is to always specify a unique ordering. In this case, adding ProductID to the ORDER BY statement would be easy:

SELECT   ProductName
FROM     Products
ORDER BY ProductName, ProductID

This is not likely to be noticed by anyone, and may never make a difference. But it can prevent future errors.

By the way, I encountered a very similar situation on a large software project. The database people said the data was sorted, the programmers wrote their code to depend on the sort. And visual inspection of the data made us think it was sorted. We ended up noticing the changing order of records as we were stepping through code in a debugger. In other words, we were convinced it was a logic error in the code.

No comments:

Post a Comment

Followers