A short video covering the GROUP BY clause in SQL.
For beginners.Here is the SQL that goes with the demo.
-- Database by Doug -- Douglas Kline -- 2/4/2016 -- GROUP BY - Part 1 Introduction USE Northwind -- here's a simple SELECT statement from the Products table SELECT ProductID, productname, unitprice, UnitsInStock FROM Products ORDER BY ProductID -- notice that I get all the rows, because there is not a WHERE clause -- if there were 10 million rows, I would see all of them -- but sometimes I'd like to see a *summary* of the products -- consider this: SELECT AVG(unitprice) AS [average unit price] FROM Products -- in this case, we have summarized all the prices of all products -- into a single number - the average -- this summarization is known as "aggregating" or "aggregation" -- we aggregate all the product records into a single group, -- then we can ask questions about the group -- what else can we find out about the group? SELECT AVG(unitprice) AS [average unit price], MIN(unitprice) AS [minimum unit price], MAX(reorderlevel) AS [maximum reorder level], COUNT(productID) AS [number of products], SUM(unitsinStock) AS [total number of units in stock], STDEV(unitprice) AS [standard deviation of unit price], VAR(unitprice) AS [variance of unit price] FROM Products -- we can also include expressions in an aggregate function SELECT SUM(unitprice * unitsInStock) AS [total value of inventory] FROM Products -- in the above examples we were aggregating over the whole table -- but you can limit the aggregation SELECT AVG(UnitPrice) AS [average unit price] FROM Products WHERE CategoryID IN (2, 4, 6) -- in this example, the AVG is calculated over -- products in the stated categories -- think of the above example as happening in this order -- 1 - the WHERE clause - records are eliminated first -- 2 - the averaging - aggregate calculated over remaining records -- now consider these two queries: SELECT AVG(unitprice) AS [Avg price of category 1 products] FROM Products WHERE categoryID = 1 SELECT AVG(unitprice) AS [Avg price of category 2 products] FROM Products WHERE categoryID = 2 -- wouldn't it be nice to have these in the same table? -- or a table that shows a list of categories, and the average for each category? -- like this: --********************** --* CategoryID * Avg * --********************** --* 1 * 31.8 * --* 2 * 23.1 * --* 3 * 15.2 * --* ... * ... * --********************** -- here it is: SELECT CategoryID, AVG(unitprice) AS [Avg unitprice] FROM Products GROUP BY CategoryID ORDER BY CategoryID -- the new part is the GROUP BY clause -- it has to be before the ORDER BY, and after the WHERE -- the logic is this: -- arrange the products into groups, using the CategoryID -- all CategoryID =1 records go into a group -- all CategoryID =2 records go into a group -- etc. -- now, answer a question about each group - what is the average unit price? SELECT CategoryID, AVG(unitprice) AS [Avg unitprice] FROM Products GROUP BY CategoryID ORDER BY CategoryID -- and once the grouping is established, I can ask other aggregate questions: SELECT CategoryID, AVG(unitprice) AS [average unit price], MIN(unitprice) AS [minimum unit price], MAX(reorderlevel) AS [maximum reorder level], COUNT(productID) AS [number of products], SUM(unitsinStock) AS [total number of units in stock], STDEV(unitprice) AS [standard deviation of unit price], VAR(unitprice) AS [variance of unit price] FROM Products GROUP BY CategoryID ORDER BY CategoryID -- -- so how can we visualize what's going on? SELECT CategoryID FROM Products ORDER BY CategoryID -- in the above, notice the groupings of -- the records, by categoryID -- first two records are one group, with a NULL categoryID -- next twelve records are the next group, where categoryID=1 -- etc. -- now group them SELECT CategoryID FROM Products GROUP BY CategoryID ORDER BY CategoryID -- each of the 9 resulting records represents -- a group of products that have a common categoryID -- now there are groups of products -- one group for each different categoryID -- (you can also do this with DISTINCT, -- but that's for another video) -- now let's add some attributes that we're interested in SELECT CategoryID, ProductID, ProductName, Unitprice FROM Products ORDER BY CategoryID, ProductID -- looking at the left-most column, -- we can see values that will become groups -- the two NULL categoryIDs will become a group -- the next twelve records with categoryID=1 will become the next group -- we're looking to get the average unit price by category -- so let's pare down to just the columns we need SELECT CategoryID, Unitprice FROM Products ORDER BY CategoryID -- so let's "convert" the above into an aggregate -- and I'm expecting that there will be two products for categoryID NULL -- and twelve products for categoryID=1 SELECT CategoryID, AVG(unitprice) AS [average unit price] -- the AVG function is new FROM Products GROUP BY CategoryID -- the GROUP BY is new ORDER BY CategoryID -- the changes above are: -- * add the GROUP BY clause with the CategoryID -- * add the AVG function to the unitprice -- in the english language, a common idiom is to ask for -- "a count of products by category" or -- "the average unit price by supplier" -- when someone asks "... by category" -- they usually mean that the category should be in the leftmost column -- and that the records should be GROUP BY'ed category -- and that the records should be ORDER BY'ed category -- a couple of examples -- Manager Bob: "show me the total value of inventory by category" -- here's what Manager Bob most likely wants (but avoiding JOINs for now): SELECT CategoryID, SUM(unitprice * unitsinstock) AS [value of inventory] FROM Products GROUP BY CategoryID ORDER BY CategoryID -- Manager Barb: "show me the lowest unitprice of products by supplier" -- here's what Manager Barb most likely wants (but avoiding JOINs again...): SELECT SupplierID, MIN(unitprice) AS [lowest unitprice] FROM Products GROUP BY SupplierID ORDER BY SupplierID -- more clearly -- "by Supplier" SELECT SupplierID, -- supplier in first column MIN(unitprice) AS [lowest unitprice] FROM Products GROUP BY SupplierID -- supplier is the basis for grouping ORDER BY SupplierID -- records ordered by supplier -- throwing in a JOINs for clearer suppliers: SELECT Suppliers.CompanyName, MIN(Products.unitprice) AS [lowest unitprice] FROM Products JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID GROUP BY Suppliers.CompanyName, Suppliers.SupplierID -- this is important, but not covered here ORDER BY Suppliers.CompanyName -- Database by Doug -- Douglas Kline -- 2/4/2016 -- GROUP BY introduction
No comments:
Post a Comment