This ensures that "downstream" systems or code consuming the data don't wrongly assume that the data always arrives in the same order.
SELECT ProductName
FROM Products
ORDER BY ProductName
This is syntactically correct SQL, and we'll assume that it returns the correct records.
If every product has a distinct ProductName, this provides a unique ordering of the records. In other words, each time the SELECT statement is executed, the records come back in the same order.
However, in a dynamic environment with a large number of records, it is likely that two products will eventually share the same ProductName, e.g., "Socks", "Lipstick", "Hammer" (unless uniqueness is enforced on ProductName).
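For completeness, here is a minimal sketch of how that parenthetical could be addressed at the schema level; the constraint name is invented for illustration, and the exact syntax varies slightly between databases:

-- Hypothetical constraint: reject duplicate ProductNames at the schema level
ALTER TABLE Products
ADD CONSTRAINT UQ_Products_ProductName UNIQUE (ProductName);

If such a constraint is not appropriate, or simply not present, the rest of the discussion applies.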
The danger here is that downstream systems or consumers of the data from this SELECT statement might actually depend on the records always being in the exact same order. And when they are not, things can go wrong.
So why would a programmer downstream assume that the records are in order? Well, why not? At first glance, they look like they are in order. But with a million records that are changing, how could the programmer be sure? And why sort a million records that look like they are already in order?
A simple safe SQL practice is to always specify a unique ordering. In this case, adding ProductID to the ORDER BY clause would be easy:
SELECT ProductName
FROM Products
ORDER BY ProductName, ProductID
The extra column is unlikely to be noticed by anyone, and may never make a difference. But it can prevent future errors.
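To make the tie-breaking concrete, here is a small self-contained sketch; the table definition and sample rows are invented for illustration:

-- Hypothetical sample data for illustration only
CREATE TABLE Products (
    ProductID   INT PRIMARY KEY,
    ProductName VARCHAR(100)
);

INSERT INTO Products (ProductID, ProductName) VALUES (1, 'Socks');
INSERT INTO Products (ProductID, ProductName) VALUES (2, 'Hammer');
INSERT INTO Products (ProductID, ProductName) VALUES (3, 'Socks');

-- Without a tie-breaker, the two 'Socks' rows may come back in either order
SELECT ProductName FROM Products ORDER BY ProductName;

-- With ProductID as a tie-breaker, the order is fully determined:
-- 'Hammer' (2), 'Socks' (1), 'Socks' (3)
SELECT ProductName FROM Products ORDER BY ProductName, ProductID;

The first query may return the same rows in different relative orders from one execution to the next; the second always returns them in the same order.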
By the way, I encountered a very similar situation on a large software project. The database people said the data was sorted, so the programmers wrote their code to depend on that order, and visual inspection of the data made us think it was sorted. At first we were convinced the bug was a logic error in the code; we only noticed the changing order of records while stepping through the code in a debugger.