A short video covering the GROUP BY clause in SQL.
Part 3: uniqueness and multi-column groupingFor beginners.
Here is the SQL that goes with the demo.
-- Database by Doug -- Douglas Kline -- 2/4/2016 -- GROUP BY - Part 3 uniqueness and multi-column Grouping USE Northwind -- here's a basic query joining Categories with Products SELECT Categories.CategoryName, Products.ProductName, Products.Unitprice FROM Products JOIN Categories ON Products.CategoryID = Categories.CategoryID ORDER BY Categories.CategoryName -- now let's aggregate -- show the average unit price, by Category SELECT Categories.CategoryName, AVG(Products.unitprice) AS [average unit price] FROM Products JOIN Categories ON Products.CategoryID = Categories.CategoryID GROUP BY Categories.CategoryName ORDER BY Categories.CategoryName -- awesome! runs fine, looks reasonable... -- or is it -- let me add one more item SELECT Categories.CategoryName, AVG(Products.unitprice) AS [average unit price] FROM Products JOIN Categories ON Products.CategoryID = Categories.CategoryID GROUP BY Categories.CategoryName, Categories.CategoryID -- this is the added piece ORDER BY Categories.CategoryName -- notice that now I have an additional group -- let's add a column to see what's going on SELECT Categories.CategoryID, Categories.CategoryName, AVG(Products.unitprice) AS [average unit price] FROM Products JOIN Categories ON Products.CategoryID = Categories.CategoryID GROUP BY Categories.CategoryName, Categories.CategoryID -- this is the added piece ORDER BY Categories.CategoryName -- aha! there are two categories with the same name -- is that OK? -- well, CategoryName is not the primary key -- and apparently the schema allows duplicate values in the CategoryName -- so, the lesson here is to always specify a *unique* grouping -- in general, this means adding a primary key -- to the GROUP BY statement, even if it is not displayed -- for example: SELECT Categories.CategoryName, AVG(Products.unitprice) AS [average unit price] FROM Products JOIN Categories ON Products.CategoryID = Categories.CategoryID GROUP BY Categories.CategoryName, Categories.CategoryID ORDER BY Categories.CategoryName, Categories.CategoryID -- or another example SELECT Suppliers.CompanyName, AVG(Products.unitprice) AS [average unit price] FROM Products JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID GROUP BY Suppliers.CompanyName, Suppliers.SupplierID ORDER BY Suppliers.CompanyName, Suppliers.SupplierID -- notice in the second example that there are no duplicate -- company names -- so should we still add the SupplierID to the group by? -- absolutely -- your query needs to work on any data that might be in the database -- not just what's there now -- now for multi-level grouping -- consider this query SELECT Suppliers.CompanyName, Categories.CategoryName, Products.ProductName, Products.Unitprice FROM Suppliers JOIN Products ON Suppliers.SupplierID =Products.SupplierID JOIN Categories ON Categories.CategoryID=Products.CategoryID ORDER BY Suppliers.CompanyName, Categories.CategoryName -- notice that Exotic Liquids supplies us products in two categories -- same with Forets... -- so, let's calculate average unitprice by Supplier AND Category SELECT Suppliers.CompanyName, Categories.CategoryName, AVG(Products.Unitprice) AS [avg unit price] FROM Suppliers JOIN Products ON Suppliers.SupplierID =Products.SupplierID JOIN Categories ON Categories.CategoryID=Products.CategoryID GROUP BY Suppliers.CompanyName, Categories.CategoryName ORDER BY Suppliers.CompanyName, Categories.CategoryName -- notice that now Exotic liquids has two groups -- one for the Beverages category -- and one for the Spreads category -- so now, the grouping is based on the *combination* of the two fields -- note that the order of the fields in the GROUP BY doesn't matter -- but for clarity it is best to be consistent with the ORDER BY -- (the order of the fields in the ORDER BY *does* matter) -- and of course, to make sure our groupings are *unique* we should add -- a unique field to the GROUP BY SELECT Suppliers.CompanyName, Categories.CategoryName, AVG(Products.Unitprice) AS [avg unit price] FROM Suppliers JOIN Products ON Suppliers.SupplierID =Products.SupplierID JOIN Categories ON Categories.CategoryID=Products.CategoryID GROUP BY Suppliers.CompanyName, Categories.CategoryName, Suppliers.SupplierID, Categories.CategoryID ORDER BY Suppliers.CompanyName, Categories.CategoryName, Suppliers.SupplierID, Categories.CategoryID -- on a side note -- it's also important to have a unique ordering -- so that downstream consumers of your data aren't surprised -- when the ordering changes -- note that very different information that is provided if -- the column-ordering, and the row-ordering is reversed SELECT Categories.CategoryName, Suppliers.CompanyName, AVG(Products.Unitprice) AS [avg unit price] FROM Suppliers JOIN Products ON Suppliers.SupplierID =Products.SupplierID JOIN Categories ON Categories.CategoryID=Products.CategoryID GROUP BY Categories.CategoryName, Suppliers.CompanyName, Suppliers.SupplierID, Categories.CategoryID ORDER BY Categories.CategoryName, Suppliers.CompanyName, Suppliers.SupplierID, Categories.CategoryID -- all the data in the result set is the same as the previous example -- in the first example, the focus was on the Supplier -- and secondarily, on the categories -- in the second example, the focus is on the Category -- and secondarily, on the Suppliers -- in summary, make sure that your GROUP BY statement -- specifies a unique grouping -- this is generally done by adding the primary key -- you can specify multiple columns in the GROUP BY -- then groups are determined by *unique combinations* -- of the values in the columns -- finally, merely changing the column ordering -- and the row ordering can produce quite different -- information for human use -- Database by Doug -- Douglas Kline -- 2/4/2016 -- GROUP BY - Part 3 uniqueness & multi-column Grouping
No comments:
Post a Comment