Wednesday, February 10, 2016

Group By Part 2: Null Handling and Count

A short video covering the GROUP BY clause in SQL.

Part 2: covering how NULL values are handled and Counts For beginners.


Here is the SQL that goes with the demo.

-- Database by Doug
-- Douglas Kline
-- 2/4/2016
-- GROUP BY - Part 2 NULL Handling & Count
USE Northwind

-- see GROUP BY - Part 1 Introduction
-- what about NULL values? How are they handled?

SELECT SUM(unitprice)/COUNT(ProductID)                AS [manual average unitprice],
       AVG(unitprice)                                 AS [correct average unitprice],
       SUM(unitprice)/COUNT(ProductID)-AVG(unitprice) AS [difference]
FROM   products

-- normally, you would expect these two averages to agree
-- but I've added a product with a NULL unitprice:

SELECT *
FROM   Products
WHERE  unitprice IS NULL

-- so let's look at all the pieces to see what's happening

SELECT SUM(unitprice)                  AS [total of all unitprices],
       COUNT(ProductID)                AS [number of records],
       SUM(unitprice)/COUNT(ProductID) AS [manual average unitprice],
       AVG(unitprice)                  AS [correct average unitprice]
FROM   products

-- the problem here is that there are 79 total records
-- but only 78 non-NULL unitprices
-- the AVG function handles the NULL appropriately
-- and only divides by 78

-- in general, only non-NULL values are included in MIN, MAX, SUM, STDEV, VAR, AVG, etc.

-- SO, depend on the functions, rather than writing your own
-- they will appropriately handle NULL values
SELECT      AVG(unitprice),
            SUM(unitprice),
            STDEV(unitprice),
            VAR(unitprice)
FROM        Products

-- COUNT needs more explanation
-- note the difference in the two number returned below:

SELECT   COUNT(ProductID)   AS [number of non-NULL productIDs],
         COUNT(unitprice)   AS [number of non-NULL unitPrices]
FROM     Products

-- COUNT requires a field to be specified
-- it will count records that have a non-NULL value in that field

-- but what about this?

SELECT   COUNT(*)          AS [count of products]
FROM     Products

-- recall that an asterisk in the SELECT clause is shorthand for 'all fields'
-- so the above can be interpreted as 
-- "check all fields for non-NULL values, and count if any field has a non-NULL value"

-- of course, if a table has a primary key, then the table can be guaranteed
-- to have at least one field that is non-NULL (the primary key field)

-- for code clarity, the asterisk should be avoided
-- it is much clearer to count the primary key of the entity that you intend to count

-- as an example:
SELECT         COUNT(*)    AS [what is being counted here?]
FROM           Products  
   INNER JOIN  Categories  ON Products.CategoryID = Categories.CategoryID

-- this is much clearer
-- we're counting products, because we are counting the primary key
-- of the products table
SELECT         COUNT(Products.ProductID) AS [product count]
FROM           Products  
   INNER JOIN  Categories  ON Products.CategoryID = Categories.CategoryID

-- but why are their only 77 records, and not 79?
-- because only 77 products satisfy the JOIN with Categories:

SELECT      ProductID,
            CategoryID
FROM        Products
ORDER BY    CategoryID

-- Database by Doug
-- Douglas Kline
-- 2/4/2016
-- GROUP BY - Part 2 NULL Handling & Count

No comments:

Post a Comment

Followers