Saturday, October 20, 2018

Using ANY and ALL in SQL

-- Database by Doug
-- Douglas Kline
-- 10/19/2018
-- ANY and ALL

-- comparing a value to a column of values
-- (aka acomparing a scalar to a vector of scalars)

-- consider this setup for testing
-- if the WHERE condition is true or false

SELECT 1       AS [isTrue?]
WHERE  10 = 10 -- testing if this is true or false

SELECT 1       AS [isTrue?]
WHERE  10 <> 10 -- testing if this is true or false

-- we get a record if it is true, otherwise no record

-- also, see how I can create a literal table

SELECT   tempField
FROM     (VALUES(11),(12),(7)) tempTable(tempField)

-- note that this creates a single column of values
-- which could be used in something like IN
-- for example
SELECT   1
WHERE    12 IN    (  SELECT   tempField
                     FROM     (VALUES(11),(12),(7)) tempTable(tempField))

-- I could rephrase this as:
SELECT   1
WHERE    12 = ANY (  SELECT   tempField
                     FROM     (VALUES(11),(12),(7)) tempTable(tempField))

-- back to the first example:
SELECT 1          AS [isTrue?]
WHERE  10 > 11    

-- I'm comparing a single scalar value, 10, with another
-- single scalar value, 11

-- now consider if I want to compare the value 10
-- to multiple other values
SELECT 1 
WHERE  10 > 11
  OR   10 > 12
  OR   10 > 7

-- this is three logical expressions
-- combined with OR into the whole logical expression
-- so if ANY of them are true
-- the whole logical expression is true 

-- now let's say that the values 11, 12, 7 are in a column
-- since we're in an RDBMS, they are *likely* to be in a column

-- I can rephrase this with ANY like this:

SELECT 1
WHERE  10 > ANY (SELECT tempfield 
                 FROM (VALUES
                           (11),
                           (12),
                           (7)
                       ) AS tempTable(tempfield))

-- this is exactly equivalent to:

SELECT 1 
WHERE  10 > 11
  OR   10 > 12
  OR   10 > 7

-- back to the first example again...
-- back to the first example:
SELECT 1          AS [isTrue?]
WHERE  10 > 11    

-- let's say I want to do multiple comparisons again
-- but AND them together like this:

SELECT 1 
WHERE  10 > 11    -- logical expression 1
  AND  10 > 12    -- logical expression 2
  AND  10 > 7     -- logical expression 3

-- this is three logical expressions
-- combined with ANDs into the whole WHERE
-- logical expression
-- so all three expressions must be true
-- for the WHERE clause to be true

-- I can make it true with different values:
SELECT 1 
WHERE  10 > 9    -- logical expression 1
  AND  10 > 8    -- logical expression 2
  AND  10 > 7    -- logical expression 3

-- I can rephrase this with ALL
-- with the values in a column

SELECT 1
WHERE  10 > ALL (SELECT tempfield 
                 FROM (VALUES
                           (11),
                           (12),
                           (7)
                       ) AS tempTable(tempfield))
 
 -- and get it to be true with a change to the values
 SELECT 1
WHERE  10 > ALL (SELECT tempfield 
                 FROM (VALUES
                           (9),
                           (8),
                           (7)
                       ) AS tempTable(tempfield))

-- so you can think of ANY and ALL 
-- as comparison operator modifiers

-- comparison operators usually take a scalar value on each side

-- scalar  scalar

--  3 < 7
--  3 > 7
--  3 = 7
--  3 != 7
--  3 <> 7
--  3 <= 7
--  3 >= 7

-- ANY and ALL take a scalar on the left, and a column on the right

-- scalar  ANY column
-- scalar  ALL column
-- 3 < ANY ((1),(2),(3))
-- 3 >= ALL ((1),(2),(3))

-- and the column is generally created with a SELECT statement

-- scalar  ANY (SELECT  FROM...)
-- scalar  ALL (SELECT  FROM...)

-- so here's a more concrete example using the Northwind database Orders table

-- suppose we want to know a list of customers who paid more than
-- $200 on freight on an order in 1996?
-- in other words $200 < ANY(orders in 1996)

SELECT   CompanyName
FROM     Customers
WHERE    $200 < ANY (SELECT freight
                     FROM    Orders
                     WHERE   Orders.CustomerID = Customers.CustomerID
                        AND  YEAR(Orderdate) = 1996)
ORDER BY CompanyName

-- notice that this is a correlated subquery
-- it refers to the outer query (Customers.customerID)

-- also notice that this could be rephrased
-- with a JOIN DISTINCT

SELECT   DISTINCT Customers.CompanyName
FROM     Customers
   JOIN  Orders      ON Customers.CustomerID = Orders.CustomerID
WHERE    Orders.Freight > $200
   AND   YEAR(Orders.OrderDate) = 1996
ORDER BY CompanyName

-- is there a difference? why one and not the other
-- it depends

-- depending on your situation, one SQL phrasing might 
-- be clearer than another

-- depending on your data model, volume, statistics, indexes, etc.
-- one might be faster than the other
-- in other words, you might get different query plans

-- in general, I recommend writing your SQL in the clearest
-- manner possible
-- then carefully rephrasing to a better performing, but
-- perhaps less understandable, form if there is a performance issue

-- in summary,
-- ANY and ALL modify comparison operators
-- they succinctly compare a single scalar to
-- a column of scalars

-- thanks for watching!

-- Database by Doug
-- Douglas Kline
-- 10/19/2018
-- ANY and ALL


Monday, October 15, 2018

Using the CAST() function in SQL

-- Database by Doug
-- Douglas Kline
-- 10/10/2018
-- CAST - converting to a new data type

-- sometimes you need to change one data type to another data type

-- consider this:

SELECT '4.0'

-- I might want to deal with this as a number 
-- for example

SELECT '4.0' + 2.0

-- this works
-- even though they '4.0' is a varchar 
-- and 2.0 is a floating point number

-- the db does an *implicit* conversion
-- of the '4.0' to a floating point
-- then does the addition
-- and returns a floating point

-- we hope the db "knows what I mean"
-- and are *assuming* it ends up doing the right thing

-- for simple things, this mostly works
-- the db is pretty smart

-- however, high quality code doesn't normally
-- make assumptions, so let's be *explicit*

SELECT CAST('4.0' AS float) + 2.0

-- here, I'm using the CAST function to
-- *explicitly* change a varchar to a float
-- I'm not relying on the db "knowing what I mean"

-- you can generally CAST between data types
-- fairly freely

-- see the full matrix of allowable conversions
-- for SQL Server 17
-- here: https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine?view=sql-server-2017

-- note that some to/from conversions are implicit / automatic
-- some are not allowed at all 
-- and some require explicit CASTs

-- here are a few common conversions you might want to do


-- converting numeric and dates to varchars
-- especially when needing to concatenat

SELECT 'Doug' + 1 AS [trying for Doug1]
SELECT 1 + 'Doug' AS [trying for 1Doug]
-- and the fixes
SELECT 'Doug' + CAST(1 AS VARCHAR)
SELECT CAST(1 AS VARCHAR) + 'Doug'

SELECT GETDATE() + 'Doug' -- error
SELECT CAST(GETDATE() AS VARCHAR) + 'Doug'

-- forcing specific types of operations

-- consider this

SELECT 3 / 2

-- notice that I get *integer* division
-- because both operands are integers

-- but what if I want to see 1.5 as the result?
-- I can fix literals easy enough
SELECT 3.0 / 2.0

-- but what about this:
SELECT UnitsInStock / unitsonorder
FROM   Products
WHERE  unitsOnorder <> 0

-- I'm getting integer division

-- here's how to get floating point division:

SELECT CAST(UnitsInStock AS FLOAT) / CAST(unitsonorder AS FLOAT)
FROM   Products
WHERE  unitsOnorder <> 0

-- another common conversion is from varchar to date/time

-- see the differences here:
SELECT '4/8/2018'
SELECT CAST('4/8/2018' AS DATE)
SELECT CAST('4/8/2018' AS DATETIME2)

-- and here:
SELECT '20180408 11:00'
SELECT CAST('20180408 11:00' AS DATE)
SELECT CAST('20180408 11:00' AS DATETIME2)

-- summary
-- for 'throw-away' code, the db will implicitly convert
-- for high quality code in a system
-- you should not rely on implicit conversions
-- instead, use CAST

-- Database by Doug
-- Douglas Kline
-- 10/10/2018
-- CAST - converting to a new data type

Wednesday, October 10, 2018

Using DISTINCT in SQL

-- Database by Doug
-- Douglas Kline
-- 10/9/2018
-- DISTINCT - removing duplicates, but in a "dumb" way

-- consider this

SELECT ProductID,
       ProductName,
       SupplierID
FROM   Products

-- this is a list of all products
-- note the repeats in the SupplierID

-- suppose I want top know the list of suppliers
-- in the Products table

SELECT   SupplierID
FROM     Products

-- again, notice the repeats
-- to remove the repeats, I can do this:

SELECT   DISTINCT SupplierID
FROM     Products

-- you might be saying, why not just do this:
SELECT   SupplierID
FROM     Suppliers

-- my response is: that's a different list
-- it's the list of all suppliers in the supplier table

-- What I'm looking for is the list of all suppliers
-- in the products table
-- in other words, all suppliers which we actually *use*

-- Let's look at another example, with a table I've created

SELECT   firstName
FROM     Person
ORDER BY firstName 

-- this is a list of all people's first names
-- as we scroll down, we'll start to see repeats
-- in other words, multiple people have the same first name

-- now let's say we want a list of all first distinct firstnames

SELECT   DISTINCT firstName
FROM     Person
ORDER BY firstName 

-- so now there isn't a record returned for every Person record
-- there's a record returned for every unique firstname
-- also note that NULL is considered to be a unique firstname

-- you might say, why not use Group By to do this?
-- like this:

SELECT   firstName
FROM     Person
GROUP BY firstName
ORDER BY firstName

-- logically, it returns the exact same records, and always will
-- however, GROUP BY does a lot more work
-- it actually sets up groups of records in preparation to 
-- calculate aggregates like SUM, COUNT, AVG, etc.

-- DISTINCT is much faster 
-- if it sees a value it has seen before, it just throws it out
-- in other words, it doesn't group the records
-- it just makes a list of unique values

-- so, don't use GROUP BY when what you really need is DISTINCT

-- alright, what if you want to count stuff?
SELECT   COUNT(ID)                  AS [# of people],
         COUNT(DISTINCT ID)         AS [# of distinct primary key values],
         COUNT(firstName)           AS [# of people with non-NULL firstnames],
         COUNT(DISTINCT firstName)  AS [# of distinct firstNames]
FROM     Person


-- note that the first two values are always the same, 
-- since primary key values are distinct aka unique

-- and also notice that there are 599 distinct first names, but recall

SELECT   DISTINCT firstName
FROM     Person
ORDER BY firstName 

-- so why is the COUNT(DISTINCT firstname) = 599
-- but DISTINCT firstname gives 600 records?

-- remember that COUNT counts non-NULL values

-- finally, DISTINCT is sort of "dumb", in that it doesn't
-- know anything about primary keys
-- or anything about the underlying table(s)
-- it only considers values from the fields you provide

-- consider this:

SELECT   DISTINCT firstname
FROM     Person

-- it doesn't give distinct Person records, just distinct firstnames
-- now this:

SELECT   DISTINCT lastname
FROM     Person

-- and the distinct applies to the *combination* 
-- of all the fields in the SELECT clause
-- in this example,
-- all distinct *combinations* of gender and firstname are shown

SELECT   DISTINCT gender,
         firstname
FROM     Person
ORDER BY gender,
         firstname

-- in summary,
-- distinct removes duplicates
-- it removes duplicates based on all fields in the SELECT list
-- when used with COUNT, it will not count duplicate values

-- thanks for watching!

-- Database by Doug
-- Douglas Kline
-- 10/9/2018
-- DISTINCT - removing duplicates, but in a "dumb" way



Thursday, October 4, 2018

Numeric Expressions in SQL


-- Database by Doug
-- Douglas Kline
-- 8/28/2018
-- Numeric Expressions - expressions that return a numeric data type

-- beginner level
-- simple numeric expression examples
-- and things to keep in mind

-- I might say 'db'
-- which is shorthand for 'database engine'
-- this is the server software that interprets
--   your SQL and performs actions

-- you can ask your database engine (db) to be your calculator:
SELECT   1 + 2 

-- if you are familar with spreadsheets
-- and how to type formulas
-- it is very similar

SELECT   1 + 2 

-- an expression is something that the db
-- will read, translate, and perform
-- so in the expression above
-- it reads a 1, then a plus sign, then a 2
-- interprets it as addition
-- performs the addition
-- and returns the results
--------------------------------------------------
SELECT   1 + 2

-- there are certain items in the expression

-- the plus sign (+) is an *operator*
-- more specifically, it is a *binary* operator
-- it defines an operation on two items (thus binary)
-- more specifically, the items on its left and right

-- the plus sign says to the db:
-- take the items on my left and right and add them
-- it is an instruction to the db

SELECT   1 + 2

-- the 1 and 2 are the *operands* of the operator
-- they are the items that the operator applies to

-- the 1 and 2 are also examples of *literals*
-- in other words 1 literally means the number 1
-- in coding, it means: not code, a value

-- here is an example of a unary operator (one operand)
SELECT  -5

-- here, the literal value 5
-- is modified to become negative
-- by the minus sign operator

------------------------------------------------------------
-- how does the db know what *is* an expression, and what *isn't*
-- in this example
SELECT   1 + 2

-- the db is *expecting* an expression to be in this position
-- (after the word SELECT)
-- so expressions are mainly known by their
-- location/position in your SELECT statement

---------------------------------------------------------
-- notice that usual order of operations is followed
-- * and / come before + and -
-- otherwise it is left-to-right,
-- unless you add parentheses
-- then sub-expressions in parentheses are evaluated first
SELECT 1 + 2 * 6
SELECT (1 + 2) * 6
SELECT (3 - 1) / 2

------------------------------------------------------
-- notice that operators sometimes change meaning
-- depending on their context

SELECT 1 / 3      -- does integer division
SELECT 1.0 / 3.0  -- floating point division
SELECT 1 / 3.0    -- floating point division
SELECT 1.0 / 3    -- floating point division

-- if both operands (items on either side of the operator)
-- are whole numbers, do integer division
-- otherwise do floating point division

-----------------------------------------------------------------------------
-- here's an operator that might be new to you: %
SELECT 5 % 3 -- modulus operator - gives the remainder after integer division
SELECT 8 % 3 -- same answer

SELECT (8 / 3) * 3 + 8 % 3 -- equals 8

-----------------------------------------------------------------------------
-- what if your data is not a literal?
-- in other words, it comes out of the db

SELECT   unitsInstock -- this is an expression, just very simple
FROM     Products

SELECT   unitsInStock + unitsOnOrder -- effective inventory
FROM     Products

-- plus is still the operator
-- the operands are UnitsInStock and unitsOnOrder
-- the expression is evaluated on each record, separately

-----------------------------------------------------
-- but what if the data type is not right for
--   the operations I want to do?
-- specifically, the data is whole numbers
-- but I want to do floating point division
SELECT   unitsInStock,
         reOrderLevel,
         unitsinstock / reOrderLevel
FROM     Products
WHERE    reOrderLevel <> 0  -- to avoid division by zero

-- but I'd really like to see the results as a floating point/decimal

-- you have to convert *prior* to dividing

SELECT   unitsInStock,
         reOrderLevel,
         CONVERT(real,unitsinstock) / reOrderLevel
FROM     Products
WHERE    reOrderLevel <> 0  -- to avoid division by zero

-- the unitsInStock is converted to a real (aka floating point number)
-- before it gets divided
-- since one of the items operands
-- is a floating point number, it does floating point division

-------------------------------------------------------------
-- perhaps a more realistic and useful expression
SELECT      ProductID,
            ProductName,
            unitsInStock * unitprice AS [Dollar Value in Inventory]
FROM        Products
ORDER BY    ProductID

-- note that unitprice is a money data type,
-- and the unitsInstock is a whole number
-- so the result is shown as a money data type
-- (note the 2 digits to the right of the decimal)

-- we can show this explicitly:
SELECT      SQL_VARIANT_PROPERTY(unitPrice, 'BaseType'),
            SQL_VARIANT_PROPERTY(unitsInStock, 'BaseType'),
            SQL_VARIANT_PROPERTY(unitsInStock * unitPrice, 'BaseType')
FROM        Products

-----------------------------------------------------------------------
-- in summary
-- expressions are instructions for the db to create new values
-- expressions are known by their location in the statement
-- expressions have operators and operands

-- Database by Doug
-- Douglas Kline
-- 8/28/2018
-- Numeric Expressions

Followers