A short video covering the GROUP BY clause in SQL.
Part 3: uniqueness and multi-column groupingFor beginners.
Here is the SQL that goes with the demo.
-- Database by Doug
-- Douglas Kline
-- 2/4/2016
-- GROUP BY - Part 3 uniqueness and multi-column Grouping
USE Northwind
-- here's a basic query joining Categories with Products
SELECT Categories.CategoryName,
Products.ProductName,
Products.Unitprice
FROM Products
JOIN Categories ON Products.CategoryID = Categories.CategoryID
ORDER BY Categories.CategoryName
-- now let's aggregate
-- show the average unit price, by Category
SELECT Categories.CategoryName,
AVG(Products.unitprice) AS [average unit price]
FROM Products
JOIN Categories ON Products.CategoryID = Categories.CategoryID
GROUP BY Categories.CategoryName
ORDER BY Categories.CategoryName
-- awesome! runs fine, looks reasonable...
-- or is it
-- let me add one more item
SELECT Categories.CategoryName,
AVG(Products.unitprice) AS [average unit price]
FROM Products
JOIN Categories ON Products.CategoryID = Categories.CategoryID
GROUP BY Categories.CategoryName,
Categories.CategoryID -- this is the added piece
ORDER BY Categories.CategoryName
-- notice that now I have an additional group
-- let's add a column to see what's going on
SELECT Categories.CategoryID,
Categories.CategoryName,
AVG(Products.unitprice) AS [average unit price]
FROM Products
JOIN Categories ON Products.CategoryID = Categories.CategoryID
GROUP BY Categories.CategoryName,
Categories.CategoryID -- this is the added piece
ORDER BY Categories.CategoryName
-- aha! there are two categories with the same name
-- is that OK?
-- well, CategoryName is not the primary key
-- and apparently the schema allows duplicate values in the CategoryName
-- so, the lesson here is to always specify a *unique* grouping
-- in general, this means adding a primary key
-- to the GROUP BY statement, even if it is not displayed
-- for example:
SELECT Categories.CategoryName,
AVG(Products.unitprice) AS [average unit price]
FROM Products
JOIN Categories ON Products.CategoryID = Categories.CategoryID
GROUP BY Categories.CategoryName,
Categories.CategoryID
ORDER BY Categories.CategoryName,
Categories.CategoryID
-- or another example
SELECT Suppliers.CompanyName,
AVG(Products.unitprice) AS [average unit price]
FROM Products
JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID
GROUP BY Suppliers.CompanyName,
Suppliers.SupplierID
ORDER BY Suppliers.CompanyName,
Suppliers.SupplierID
-- notice in the second example that there are no duplicate
-- company names
-- so should we still add the SupplierID to the group by?
-- absolutely
-- your query needs to work on any data that might be in the database
-- not just what's there now
-- now for multi-level grouping
-- consider this query
SELECT Suppliers.CompanyName,
Categories.CategoryName,
Products.ProductName,
Products.Unitprice
FROM Suppliers
JOIN Products ON Suppliers.SupplierID =Products.SupplierID
JOIN Categories ON Categories.CategoryID=Products.CategoryID
ORDER BY Suppliers.CompanyName,
Categories.CategoryName
-- notice that Exotic Liquids supplies us products in two categories
-- same with Forets...
-- so, let's calculate average unitprice by Supplier AND Category
SELECT Suppliers.CompanyName,
Categories.CategoryName,
AVG(Products.Unitprice) AS [avg unit price]
FROM Suppliers
JOIN Products ON Suppliers.SupplierID =Products.SupplierID
JOIN Categories ON Categories.CategoryID=Products.CategoryID
GROUP BY Suppliers.CompanyName,
Categories.CategoryName
ORDER BY Suppliers.CompanyName,
Categories.CategoryName
-- notice that now Exotic liquids has two groups
-- one for the Beverages category
-- and one for the Spreads category
-- so now, the grouping is based on the *combination* of the two fields
-- note that the order of the fields in the GROUP BY doesn't matter
-- but for clarity it is best to be consistent with the ORDER BY
-- (the order of the fields in the ORDER BY *does* matter)
-- and of course, to make sure our groupings are *unique* we should add
-- a unique field to the GROUP BY
SELECT Suppliers.CompanyName,
Categories.CategoryName,
AVG(Products.Unitprice) AS [avg unit price]
FROM Suppliers
JOIN Products ON Suppliers.SupplierID =Products.SupplierID
JOIN Categories ON Categories.CategoryID=Products.CategoryID
GROUP BY Suppliers.CompanyName,
Categories.CategoryName,
Suppliers.SupplierID,
Categories.CategoryID
ORDER BY Suppliers.CompanyName,
Categories.CategoryName,
Suppliers.SupplierID,
Categories.CategoryID
-- on a side note
-- it's also important to have a unique ordering
-- so that downstream consumers of your data aren't surprised
-- when the ordering changes
-- note that very different information that is provided if
-- the column-ordering, and the row-ordering is reversed
SELECT Categories.CategoryName,
Suppliers.CompanyName,
AVG(Products.Unitprice) AS [avg unit price]
FROM Suppliers
JOIN Products ON Suppliers.SupplierID =Products.SupplierID
JOIN Categories ON Categories.CategoryID=Products.CategoryID
GROUP BY Categories.CategoryName,
Suppliers.CompanyName,
Suppliers.SupplierID,
Categories.CategoryID
ORDER BY Categories.CategoryName,
Suppliers.CompanyName,
Suppliers.SupplierID,
Categories.CategoryID
-- all the data in the result set is the same as the previous example
-- in the first example, the focus was on the Supplier
-- and secondarily, on the categories
-- in the second example, the focus is on the Category
-- and secondarily, on the Suppliers
-- in summary, make sure that your GROUP BY statement
-- specifies a unique grouping
-- this is generally done by adding the primary key
-- you can specify multiple columns in the GROUP BY
-- then groups are determined by *unique combinations*
-- of the values in the columns
-- finally, merely changing the column ordering
-- and the row ordering can produce quite different
-- information for human use
-- Database by Doug
-- Douglas Kline
-- 2/4/2016
-- GROUP BY - Part 3 uniqueness & multi-column Grouping
No comments:
Post a Comment