It's incomplete, but covers basics, and gets a beginner started.
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
~Martin Fowler
Programs must be written for people to read, and only incidentally for machines to execute.-- formatting SQL
~Harold Abelson
-- version 2, with fixed formatting
-- Any fool can write code that a computer can understand. -- Good programmers write code that humans can understand. -- ~Martin Fowler -- Programs must be written for people to read, and only incidentally for machines to execute. -- ~Harold Abelson -- here's a slightly long rant/explanation -- of my personal coding style -- So here is something a company might want to know -- how much of each product did we sell in 1997 -- plus which category and supplier -- fairly simple and small relative to real-world queries -- this is produced by this query, directly from the SSMS query designer SELECT Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, SUM(([Order Details].Quantity * [Order Details].UnitPrice) * (1.0 - [Order Details].Discount)) AS totalRevenue FROM [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID LEFT OUTER JOIN Categories ON Products.CategoryID = Categories.CategoryID LEFT OUTER JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID GROUP BY Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, YEAR(Orders.OrderDate) HAVING (YEAR(Orders.OrderDate) = 1997) ORDER BY Products.ProductID -- here are a few questions, -- can you answer them with a quick scan of the code? ---- which tables are involved? ---- what are the foreign keys for the joins? ---- how many columns will come back? ---- which columns? ---- any outer joins? ---- how many joins? -- all of the above questions should be *easily* readable from the code -- but they aren't -- so here's some formatting modifications to help ----------------------- -- vertically align a single item from each list on its own line: -- --lists are anything with commas. They are typically in the SELECT, -- --GROUP BY, and ORDER BY -- --this makes the elements in the list clear. -- --I don't have to depend on spotting commas for list item separation SELECT Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, SUM(([Order Details].Quantity * [Order Details].UnitPrice) * (1.0 - [Order Details].Discount)) AS totalRevenue FROM [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID LEFT OUTER JOIN Categories ON Products.CategoryID = Categories.CategoryID LEFT OUTER JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID GROUP BY Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, YEAR(Orders.OrderDate) HAVING (YEAR(Orders.OrderDate) = 1997) ORDER BY Products.ProductID -- vertically align the tables in the FROM clause, and lead each with its JOIN ---- for me, this brings the really critical information about the type of joins to the front ---- I won't miss an outer join by accident SELECT Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, SUM(([Order Details].Quantity * [Order Details].UnitPrice) * (1.0 - [Order Details].Discount)) AS totalRevenue FROM [Order Details] INNER JOIN Products ON [Order Details].ProductID = Products.ProductID INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID LEFT OUTER JOIN Categories ON Products.CategoryID = Categories.CategoryID LEFT OUTER JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID GROUP BY Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, YEAR(Orders.OrderDate) HAVING (YEAR(Orders.OrderDate) = 1997) ORDER BY Products.ProductID -- remove verbose JOIN syntax ---- In general, extra words don't help readability ---- all important information about the type of join is retained ---- also vertically align components of each clause ---- with this, the overall structure/components ---- of the query is/are very apparent SELECT Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, SUM(([Order Details].Quantity * [Order Details].UnitPrice) * (1.0 - [Order Details].Discount)) AS totalRevenue FROM [Order Details] JOIN Products ON [Order Details].ProductID = Products.ProductID JOIN Orders ON [Order Details].OrderID = Orders.OrderID LEFT JOIN Categories ON Products.CategoryID = Categories.CategoryID LEFT JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID GROUP BY Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, YEAR(Orders.OrderDate) HAVING (YEAR(Orders.OrderDate) = 1997) ORDER BY Products.ProductID -- align ON clause elements, and space to the right ---- the tables involved become more clear ---- the "details" of the join are less important at first glance, ---- but are clearly presented if needed SELECT Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, SUM(([Order Details].Quantity * [Order Details].UnitPrice) * (1.0 - [Order Details].Discount)) AS totalRevenue FROM [Order Details] JOIN Products ON [Order Details].ProductID = Products.ProductID JOIN Orders ON [Order Details].OrderID = Orders.OrderID LEFT JOIN Categories ON Products.CategoryID = Categories.CategoryID LEFT JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID GROUP BY Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, YEAR(Orders.OrderDate) HAVING (YEAR(Orders.OrderDate) = 1997) ORDER BY Products.ProductID -- break long formulas by the operators ---- note also the separation of parentheses with a space ---- and alignment of the outer beginning and ending parentheses ---- this is judgement call material - every formula is different ---- you may choose to do this differently ---- but clearly, formatting of longer calculations helps readability SELECT Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, SUM( ( [Order Details].Quantity * [Order Details].UnitPrice) * (1.0 - [Order Details].Discount) ) AS totalRevenue FROM [Order Details] JOIN Products ON [Order Details].ProductID = Products.ProductID JOIN Orders ON [Order Details].OrderID = Orders.OrderID LEFT JOIN Categories ON Products.CategoryID = Categories.CategoryID LEFT JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID GROUP BY Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, YEAR(Orders.OrderDate) HAVING (YEAR(Orders.OrderDate) = 1997) ORDER BY Products.ProductID -- move the condition from the HAVING to the WHERE clause -- and remove verbose/unnecessary parentheses ---- condition in HAVING or WHERE is logically equivalent, but should absolutely be changed ---- this is subtly, but immensely stylistically better ---- because it conveys the fact that YEAR(orderdate) is ---- a property of each Order record, not of each product group ---- WHERE is for conditions on records ---- HAVING is for conditions on groups ---- It is also potentially performance-improving, because ---- records will be eliminated prior to grouping, not after ---- hopefully the query optimizer catches this, ---- but why not make sure instead of hope? SELECT Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, SUM( ( [Order Details].Quantity * [Order Details].UnitPrice) * (1.0 - [Order Details].Discount) ) AS totalRevenue FROM [Order Details] JOIN Products ON [Order Details].ProductID = Products.ProductID JOIN Orders ON [Order Details].OrderID = Orders.OrderID LEFT JOIN Categories ON Products.CategoryID = Categories.CategoryID LEFT JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID WHERE YEAR(Orders.OrderDate) = 1997 GROUP BY Products.ProductID, Products.ProductName, Categories.CategoryName, Suppliers.CompanyName, YEAR(Orders.OrderDate) ORDER BY Products.ProductID -- the above I would say is far superior to the original -- I can, with a quick scan, answer all the important questions I had above -- small/typo errors become apparent -- people new to the data model and codebase can read it -- I think the above is sufficient, but some would go further -- these changes I think improve the readability, but may not be worth the time/effort -- essentially, this is further "pretty-printing" of the code -- I think it helps in some places, but hurts in others -- to me, the code looks "choppy", and certain expression don't appear cohesive -- so because I can't *consistently* do this, I choose not to SELECT Products .ProductID, Products .ProductName, Categories .CategoryName, Suppliers .CompanyName, SUM( ( [Order Details].Quantity * [Order Details].UnitPrice) * (1.0 - [Order Details].Discount) ) AS totalRevenue FROM [Order Details] JOIN Products ON [Order Details].ProductID = Products .ProductID JOIN Orders ON [Order Details].OrderID = Orders .OrderID LEFT JOIN Categories ON Products .CategoryID = Categories .CategoryID LEFT JOIN Suppliers ON Products .SupplierID = Suppliers .SupplierID WHERE YEAR(Orders.OrderDate) = 1997 GROUP BY Products .ProductID, Products .ProductName, Categories .CategoryName, Suppliers .CompanyName, YEAR(Orders.OrderDate) ORDER BY Products .ProductID -- some people will also use table aliases, -- effectively creating a short "variable name" for each table -- notice in the example below that I've still been careful with alignment -- all table aliases are aligned vertially, so I -- have an easy reference for each "variable name" SELECT p.ProductID, p.ProductName, c.CategoryName, s.CompanyName, SUM( ( od.Quantity * od.UnitPrice) * (1.0 - od.Discount) ) AS totalRevenue FROM [Order Details] od JOIN Products p ON od.ProductID = p.ProductID JOIN Orders o ON od.OrderID = o.OrderID LEFT JOIN Categories c ON p.CategoryID = c.CategoryID LEFT JOIN Suppliers s ON p.SupplierID = s.SupplierID WHERE YEAR(o.OrderDate) = 1997 GROUP BY p.ProductID, p.ProductName, c.CategoryName, s.CompanyName, YEAR(o.OrderDate) ORDER BY p.ProductID -- the main benefits in readability, to me, -- is fewer characters -- and some vertical alignments are easier, -- since each "variable name" is a single letter -- it also has benefits in coding, by making "auto-complete" quicker -- however, -- there is additional cognitive overhead in translating -- the "variable name" to the table -- this gets worse the more tables are involved -- and if you have 6 tables that all begin with p, you have a problem -- and if you are not intimately familiar with the database/codebase, -- your cognitive overhead is very high -- so, can other people read the code? (see quotes at top) -- -- personally, I'm not a fan of this -- even with many years (*gasp* decades) of coding -- when I drop into a new environment/datamodel/codebase -- it takes me too long to adapt to their "variable names"
No comments:
Post a Comment