A short video covering the GROUP BY clause in SQL.
For beginners.Here is the SQL that goes with the demo.
-- Database by Doug
-- Douglas Kline
-- 2/4/2016
-- GROUP BY - Part 1 Introduction
USE Northwind
-- here's a simple SELECT statement from the Products table
SELECT ProductID,
productname,
unitprice,
UnitsInStock
FROM Products
ORDER BY ProductID
-- notice that I get all the rows, because there is not a WHERE clause
-- if there were 10 million rows, I would see all of them
-- but sometimes I'd like to see a *summary* of the products
-- consider this:
SELECT AVG(unitprice) AS [average unit price]
FROM Products
-- in this case, we have summarized all the prices of all products
-- into a single number - the average
-- this summarization is known as "aggregating" or "aggregation"
-- we aggregate all the product records into a single group,
-- then we can ask questions about the group
-- what else can we find out about the group?
SELECT AVG(unitprice) AS [average unit price],
MIN(unitprice) AS [minimum unit price],
MAX(reorderlevel) AS [maximum reorder level],
COUNT(productID) AS [number of products],
SUM(unitsinStock) AS [total number of units in stock],
STDEV(unitprice) AS [standard deviation of unit price],
VAR(unitprice) AS [variance of unit price]
FROM Products
-- we can also include expressions in an aggregate function
SELECT SUM(unitprice * unitsInStock) AS [total value of inventory]
FROM Products
-- in the above examples we were aggregating over the whole table
-- but you can limit the aggregation
SELECT AVG(UnitPrice) AS [average unit price]
FROM Products
WHERE CategoryID IN (2, 4, 6)
-- in this example, the AVG is calculated over
-- products in the stated categories
-- think of the above example as happening in this order
-- 1 - the WHERE clause - records are eliminated first
-- 2 - the averaging - aggregate calculated over remaining records
-- now consider these two queries:
SELECT AVG(unitprice) AS [Avg price of category 1 products]
FROM Products
WHERE categoryID = 1
SELECT AVG(unitprice) AS [Avg price of category 2 products]
FROM Products
WHERE categoryID = 2
-- wouldn't it be nice to have these in the same table?
-- or a table that shows a list of categories, and the average for each category?
-- like this:
--**********************
--* CategoryID * Avg *
--**********************
--* 1 * 31.8 *
--* 2 * 23.1 *
--* 3 * 15.2 *
--* ... * ... *
--**********************
-- here it is:
SELECT CategoryID,
AVG(unitprice) AS [Avg unitprice]
FROM Products
GROUP BY CategoryID
ORDER BY CategoryID
-- the new part is the GROUP BY clause
-- it has to be before the ORDER BY, and after the WHERE
-- the logic is this:
-- arrange the products into groups, using the CategoryID
-- all CategoryID =1 records go into a group
-- all CategoryID =2 records go into a group
-- etc.
-- now, answer a question about each group - what is the average unit price?
SELECT CategoryID,
AVG(unitprice) AS [Avg unitprice]
FROM Products
GROUP BY CategoryID
ORDER BY CategoryID
-- and once the grouping is established, I can ask other aggregate questions:
SELECT CategoryID,
AVG(unitprice) AS [average unit price],
MIN(unitprice) AS [minimum unit price],
MAX(reorderlevel) AS [maximum reorder level],
COUNT(productID) AS [number of products],
SUM(unitsinStock) AS [total number of units in stock],
STDEV(unitprice) AS [standard deviation of unit price],
VAR(unitprice) AS [variance of unit price]
FROM Products
GROUP BY CategoryID
ORDER BY CategoryID
--
-- so how can we visualize what's going on?
SELECT CategoryID
FROM Products
ORDER BY CategoryID
-- in the above, notice the groupings of
-- the records, by categoryID
-- first two records are one group, with a NULL categoryID
-- next twelve records are the next group, where categoryID=1
-- etc.
-- now group them
SELECT CategoryID
FROM Products
GROUP BY CategoryID
ORDER BY CategoryID
-- each of the 9 resulting records represents
-- a group of products that have a common categoryID
-- now there are groups of products
-- one group for each different categoryID
-- (you can also do this with DISTINCT,
-- but that's for another video)
-- now let's add some attributes that we're interested in
SELECT CategoryID,
ProductID,
ProductName,
Unitprice
FROM Products
ORDER BY CategoryID,
ProductID
-- looking at the left-most column,
-- we can see values that will become groups
-- the two NULL categoryIDs will become a group
-- the next twelve records with categoryID=1 will become the next group
-- we're looking to get the average unit price by category
-- so let's pare down to just the columns we need
SELECT CategoryID,
Unitprice
FROM Products
ORDER BY CategoryID
-- so let's "convert" the above into an aggregate
-- and I'm expecting that there will be two products for categoryID NULL
-- and twelve products for categoryID=1
SELECT CategoryID,
AVG(unitprice) AS [average unit price] -- the AVG function is new
FROM Products
GROUP BY CategoryID -- the GROUP BY is new
ORDER BY CategoryID
-- the changes above are:
-- * add the GROUP BY clause with the CategoryID
-- * add the AVG function to the unitprice
-- in the english language, a common idiom is to ask for
-- "a count of products by category" or
-- "the average unit price by supplier"
-- when someone asks "... by category"
-- they usually mean that the category should be in the leftmost column
-- and that the records should be GROUP BY'ed category
-- and that the records should be ORDER BY'ed category
-- a couple of examples
-- Manager Bob: "show me the total value of inventory by category"
-- here's what Manager Bob most likely wants (but avoiding JOINs for now):
SELECT CategoryID,
SUM(unitprice * unitsinstock) AS [value of inventory]
FROM Products
GROUP BY CategoryID
ORDER BY CategoryID
-- Manager Barb: "show me the lowest unitprice of products by supplier"
-- here's what Manager Barb most likely wants (but avoiding JOINs again...):
SELECT SupplierID,
MIN(unitprice) AS [lowest unitprice]
FROM Products
GROUP BY SupplierID
ORDER BY SupplierID
-- more clearly
-- "by Supplier"
SELECT SupplierID, -- supplier in first column
MIN(unitprice) AS [lowest unitprice]
FROM Products
GROUP BY SupplierID -- supplier is the basis for grouping
ORDER BY SupplierID -- records ordered by supplier
-- throwing in a JOINs for clearer suppliers:
SELECT Suppliers.CompanyName,
MIN(Products.unitprice) AS [lowest unitprice]
FROM Products
JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID
GROUP BY Suppliers.CompanyName,
Suppliers.SupplierID -- this is important, but not covered here
ORDER BY Suppliers.CompanyName
-- Database by Doug
-- Douglas Kline
-- 2/4/2016
-- GROUP BY introduction
No comments:
Post a Comment