Monday, November 12, 2018

Understanding Relational Division in SQL



Here is the script to create the JobSkills database, with data, used in this example.

-- Database by Doug
-- Douglas Kline
-- 11/10/2018
-- Understanding Relational Division - a (relatively) simple example

USE JobSkills

-- show the diagram of the data model
-- explain

-- here are our applicants (note - 4 of them)

SELECT   *
FROM     Applicant

-- here are our skills (note - 4 of them)

SELECT   *
FROM     Skill

-- now what if every applicant had every skill
-- that would mean that there would be 16 (4 x 4)
-- records in the ApplicantSkill table
-- it would look something like this:

SELECT         *
FROM           Applicant
   CROSS JOIN  Skill
ORDER BY       Applicant.applicantID,
               Skill.skillID

-- if you haven't seen this before,
-- this is a cross join 
-- a cross join of two tables creates all possible combinations
-- of the records in the two tables
-- with no condition - no requirement on lining up FKs and PKs

-- this is known as the *cartesian product* of all records in both tables
-- and you might recall that *product* is another way of saying 
-- 'the result of multiplication'

-- in other words, CROSS JOIN is kind of *relational multiplication*
-- sort of (Applicant records) x (Skill records)

-- think of this in math terms a * b = c
-- where a is Applicants
-- b is Skills
-- and c is the result

-- now if you remember your algebra,
-- if a * b = c
-- that means that c / b = a

-- in our context 4 records times 4 records equals 16
-- so 16 divided by 4 should equal 4

-- let's try to do that!
-- I'm going to use a common table expression (CTE),
-- so you can tell there's nothing up my sleeve!
-- note that this is exactly the CROSS JOIN from above

; WITH [c] (applicantID, lastName, skillID, skillName)
AS
(
   SELECT         *
   FROM           Applicant
      CROSS JOIN  Skill
)
SELECT   applicantID,
         lastName
FROM     [c]
GROUP BY applicantID,
         lastName  
HAVING   COUNT(skillID) = (SELECT COUNT(*) FROM Skill)

-- this could also be rephrased as a subquery like this:

 SELECT   applicantID,
         lastName
FROM     (
            SELECT         *
            FROM           Applicant
               CROSS JOIN  Skill
          ) [c]
GROUP BY applicantID,
         lastName  
HAVING   COUNT(skillID) = (SELECT COUNT(*) FROM Skill)

-- the logic is that I'm showing all applicants
-- that have the same number of skills
-- as are in the skill table

-- so if CROSS JOIN is the "relational multiplication" operator
-- what is the "relational division" operator?

-- unfortunately, there is no relational division operator
-- that's why we are using HAVING and COUNT to get the logical equivalent

-- summary so far
-- 3 applicants CROSS JOIN 4 skills = 12 results, e.g., everyone has every skill
-- then 12 results divided by 4 skills = 3 applicants, the original applicants

-- but this is maybe an unrealistically simple example
-- not all applicants will have all skills

-- what we'll have in reality is a job that requires a certain set of skills
-- then we'll have applicants that each have a certain set of skills
-- and what we want is a set of applicants that whose skills *match* the requirements

-- here is the data model

-- let's start simply, with a single job that requires two skills
-- and a set of applicants that each have a set of skills

-- I've cooked some data for that...
-- examining jobskill...

SELECT   Job.jobID,
         Job.jobName,
         Skill.SkillID,
         Skill.SkillName
FROM     Job
   JOIN  JobSkill ON Job.jobID = JobSkill.jobID
   JOIN  Skill    ON JobSkill.skillID = Skill.skillID
WHERE    Job.JobID = 10
ORDER BY Job.jobID, 
         Skill.SkillID

-- this says that the software developer job 
-- requires the sql and javascript skills

-- now let's cook up some applicants with some skills

SELECT   Applicant.applicantID,
         Applicant.lastName,
         Skill.skillID,
         Skill.skillName
FROM     Applicant
   JOIN  ApplicantSkill ON Applicant.applicantID = ApplicantSkill.applicantID
   JOIN  Skill          ON ApplicantSkill.skillID = Skill.skillid
ORDER BY Applicant.applicantID,
         Skill.skillID

-- so it looks like viable candidates are Jones and Brown
-- Jones has exactly the right skills and no more
-- Brown has the right skills, plus one

-- so to find a candidate that satisfies our job...
-- a * b = c
-- candidateSkills = jobskills
-- candidates * skills = jobskills
-- candidates = jobskills / skills

-- this analogy is not perfect, in that * is not really a cross join
-- and candidateskills and jobskills are actual records, 
-- for which we are trying to find alignment via joins
--
-- we'll solve this with the COUNT method we used before

SELECT   Applicant.applicantID,
         Applicant.lastName,
         COUNT(jobSkill.SkillID) AS [MatchingSkillCount]
FROM     ApplicantSkill
   JOIN  JobSkill ON ApplicantSkill.skillID = JobSkill.SkillID
   JOIN  Applicant ON Applicant.ApplicantID = ApplicantSkill.ApplicantID
WHERE    JobSkill.JobID = 10
GROUP BY Applicant.applicantID,
         Applicant.lastName
HAVING   COUNT(jobSkill.SkillID) = (SELECT COUNT(*) FROM JobSkill WHERE JobSkill.jobID = 10)
ORDER BY Applicant.applicantID

-- note that the outer query limits the applicant skill count to only
-- skills that match exactly the skills for this job
-- and the subquery does the same

-- let's look at this in a more general sense:

SELECT   Job.jobID,
         Job.jobName,
         Skill.SkillID,
         Skill.SkillName
FROM     Job
   JOIN  JobSkill ON Job.jobID = JobSkill.jobID
   JOIN  Skill    ON JobSkill.skillID = Skill.skillID
ORDER BY Job.jobID, 
         Skill.SkillID

-- two jobs, each of which require two skills
-- disjoint skill set

-- same set of applicants
SELECT   Applicant.applicantID,
         Applicant.lastName,
         Skill.skillID,
         Skill.skillName
FROM     Applicant
   JOIN  ApplicantSkill ON Applicant.applicantID = ApplicantSkill.applicantID
   JOIN  Skill          ON ApplicantSkill.skillID = Skill.skillid
ORDER BY Applicant.applicantID,
         Skill.skillID

-- and the number of matching applicant skills for each job

SELECT   Applicant.applicantID,
         Applicant.lastName,
         Job.JobID,
         Job.jobName,
         COUNT(ApplicantSkill.SkillID) AS [MatchingJobSkillCount]
FROM     ApplicantSkill
   JOIN  JobSkill ON ApplicantSkill.skillID = JobSkill.SkillID -- this is *key*
   JOIN  Job      ON Job.jobID = JobSkill.JobID
   JOIN  Applicant ON Applicant.ApplicantID = ApplicantSkill.ApplicantID
GROUP BY Applicant.applicantID,
         Applicant.lastName,
         Job.JobID,
         Job.jobName
ORDER BY Applicant.applicantID,
         Job.JobID

-- now we'd like to limit this to the applicants with jobskill count
-- equal to the jobskill count of each job
-- the trick here is that the the HAVING subquery must be correlated

SELECT   Applicant.applicantID,
         Applicant.lastName,
         Job.JobID,
         Job.jobName,
         COUNT(ApplicantSkill.SkillID) AS [MatchingJobSkillCount]
FROM     ApplicantSkill
   JOIN  JobSkill ON ApplicantSkill.skillID = JobSkill.SkillID -- this is *key*
   JOIN  Job      ON Job.jobID = JobSkill.JobID
   JOIN  Applicant ON Applicant.ApplicantID = ApplicantSkill.ApplicantID
GROUP BY Applicant.applicantID,
         Applicant.lastName,
         Job.JobID,
         Job.jobName
HAVING   COUNT(ApplicantSkill.SkillID) = (SELECT COUNT(js.skillID)
                                          FROM   JobSkill js
                                          WHERE  js.jobID = job.jobID)
ORDER BY Applicant.applicantID,
         Job.JobID

-- perhaps this is clearer as a CTE:
; WITH [JobWithSkillCount] (jobID, skillCount)
AS
(
   SELECT   jobID,
            COUNT(skillID)
   FROM     JobSkill
   GROUP BY jobID 
)
SELECT   Applicant.applicantID,
         Applicant.lastName,
         Job.JobID,
         Job.jobName,
         COUNT(ApplicantSkill.SkillID) AS [MatchingJobSkillCount]
FROM     ApplicantSkill
   JOIN  JobSkill ON ApplicantSkill.skillID = JobSkill.SkillID -- this is *key*
   JOIN  Job      ON Job.jobID = JobSkill.JobID
   JOIN  Applicant ON Applicant.ApplicantID = ApplicantSkill.ApplicantID
   JOIN  [JobWithSkillCount] ON [JobWithSkillCount].jobID = job.jobID
GROUP BY Applicant.applicantID,
         Applicant.lastName,
         Job.JobID,
         Job.jobName,
         JobWithSkillCount.skillCount
HAVING   COUNT(ApplicantSkill.SkillID) = JobWithSkillCount.skillCount
ORDER BY Applicant.applicantID,
         Job.JobID

-- hopefully, you find this a helpful example for understanding relational division

-- A couple last comments

-- I've presented this from the employer point of view:
   -- applicants who match job requirements
-- but it is exactly equivalent form an applicant point of view:
-- jobs that match skillsets

-- I'm not showing the "remainder" of the division
-- for example, Brown actually has *extra* skills!
-- this example shows applicants with the exact matching skillsets
-- and some may have extra skills

-- this is not necessarily the *most efficient* way to
-- do relational division
-- my goal was to create an example for understanding relational division

-- an performant solution would of course depend on many things:
-- the data volume, its statistical distribution, available indexes,
-- the schema, etc.

-- I hope that you found this helpful!

-- Database by Doug
-- Douglas Kline
-- 11/10/2018
-- Understanding Relational Division - a (relatively) simple example



JobSkills database schema and data for Understanding Relational Division

The script below creates a database called JobSkills with the following schema:


The script also creates records in each table that are consistent with the following demonstration:
Understanding Relational Division in SQL


USE [JobSkills]
GO
/****** Object:  Table [dbo].[Applicant]    Script Date: 11/12/2018 9:10:48 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Applicant](
	[applicantID] [int] NOT NULL,
	[lastName] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Applicant] PRIMARY KEY CLUSTERED 
(
	[applicantID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [dbo].[ApplicantSkill]    Script Date: 11/12/2018 9:10:49 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ApplicantSkill](
	[employeeSkillID] [int] NOT NULL,
	[applicantID] [int] NOT NULL,
	[skillID] [int] NOT NULL,
 CONSTRAINT [PK_ApplicantSkill] PRIMARY KEY CLUSTERED 
(
	[employeeSkillID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [dbo].[Job]    Script Date: 11/12/2018 9:10:49 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Job](
	[jobID] [int] NOT NULL,
	[jobName] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Job] PRIMARY KEY CLUSTERED 
(
	[jobID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [dbo].[JobSkill]    Script Date: 11/12/2018 9:10:49 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[JobSkill](
	[jobSkillID] [int] NOT NULL,
	[jobID] [int] NOT NULL,
	[skillID] [int] NOT NULL,
 CONSTRAINT [PK_JobSkill] PRIMARY KEY CLUSTERED 
(
	[jobSkillID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [dbo].[Skill]    Script Date: 11/12/2018 9:10:49 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Skill](
	[skillID] [int] NOT NULL,
	[skillName] [varchar](50) NOT NULL,
 CONSTRAINT [PK_Skill] PRIMARY KEY CLUSTERED 
(
	[skillID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT [dbo].[Applicant] ([applicantID], [lastName]) VALUES (100, N'Kline')
GO
INSERT [dbo].[Applicant] ([applicantID], [lastName]) VALUES (101, N'Smith')
GO
INSERT [dbo].[Applicant] ([applicantID], [lastName]) VALUES (102, N'Jones')
GO
INSERT [dbo].[Applicant] ([applicantID], [lastName]) VALUES (103, N'Brown')
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1000, 100, 1)
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1001, 100, 2)
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1002, 101, 3)
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1003, 102, 1)
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1004, 102, 4)
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1005, 103, 1)
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1006, 103, 4)
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1007, 103, 3)
GO
INSERT [dbo].[ApplicantSkill] ([employeeSkillID], [applicantID], [skillID]) VALUES (1008, 103, 2)
GO
INSERT [dbo].[Job] ([jobID], [jobName]) VALUES (10, N'Software Developer')
GO
INSERT [dbo].[Job] ([jobID], [jobName]) VALUES (11, N'Business Analyst')
GO
INSERT [dbo].[Job] ([jobID], [jobName]) VALUES (12, N'Data Analyst')
GO
INSERT [dbo].[JobSkill] ([jobSkillID], [jobID], [skillID]) VALUES (100, 10, 1)
GO
INSERT [dbo].[JobSkill] ([jobSkillID], [jobID], [skillID]) VALUES (101, 10, 4)
GO
INSERT [dbo].[JobSkill] ([jobSkillID], [jobID], [skillID]) VALUES (102, 11, 2)
GO
INSERT [dbo].[JobSkill] ([jobSkillID], [jobID], [skillID]) VALUES (103, 11, 3)
GO
INSERT [dbo].[Skill] ([skillID], [skillName]) VALUES (1, N'SQL')
GO
INSERT [dbo].[Skill] ([skillID], [skillName]) VALUES (2, N'Public Speaking')
GO
INSERT [dbo].[Skill] ([skillID], [skillName]) VALUES (3, N'Project Management')
GO
INSERT [dbo].[Skill] ([skillID], [skillName]) VALUES (4, N'Javascript')
GO
ALTER TABLE [dbo].[ApplicantSkill]  WITH CHECK ADD  CONSTRAINT [FK_ApplicantSkill_Applicant] FOREIGN KEY([applicantID])
REFERENCES [dbo].[Applicant] ([applicantID])
GO
ALTER TABLE [dbo].[ApplicantSkill] CHECK CONSTRAINT [FK_ApplicantSkill_Applicant]
GO
ALTER TABLE [dbo].[ApplicantSkill]  WITH CHECK ADD  CONSTRAINT [FK_ApplicantSkill_Skill] FOREIGN KEY([skillID])
REFERENCES [dbo].[Skill] ([skillID])
GO
ALTER TABLE [dbo].[ApplicantSkill] CHECK CONSTRAINT [FK_ApplicantSkill_Skill]
GO
ALTER TABLE [dbo].[JobSkill]  WITH CHECK ADD  CONSTRAINT [FK_JobSkill_Job] FOREIGN KEY([jobID])
REFERENCES [dbo].[Job] ([jobID])
GO
ALTER TABLE [dbo].[JobSkill] CHECK CONSTRAINT [FK_JobSkill_Job]
GO
ALTER TABLE [dbo].[JobSkill]  WITH CHECK ADD  CONSTRAINT [FK_JobSkill_Skill] FOREIGN KEY([skillID])
REFERENCES [dbo].[Skill] ([skillID])
GO
ALTER TABLE [dbo].[JobSkill] CHECK CONSTRAINT [FK_JobSkill_Skill]
GO

Saturday, October 20, 2018

Finding Unmatched Records in SQL

-- Database by Doug
-- Douglas Kline
-- 10/19/2018
--
-- Finding unmatched records
-- LEFT JOIN... WHERE IS NULL

-- suppose you want to find records in one table 
-- that do not match records in another table

-- some practical examples:

-- products that are not in a category
--     (in Products table, but not in Categories)

-- Categories that have no products
--     (in Categories table, but not in Products table)

-- two main cases,
-- first case: look for the NULL

-- products that are not in a category
SELECT   ProductID
FROM     Products
WHERE    CategoryID IS NULL

-- this is fairly simple, because
-- everything we need is in a single table
-- we don't need to inspect another table
-- this is when the Foreign Key is
-- in the table of interest

-- Categories that have no products
-- this is tougher
-- since which Categories to display
-- depends on what's in the Products table

-- here's the pattern, then we'll build up 
-- to why it works

SELECT         Categories.CategoryName
FROM           Categories
   LEFT JOIN   Products ON Categories.CategoryID = Products.CategoryID
WHERE          Products.ProductID IS NULL

-- the pattern is
-- <table of interest> LEFT JOIN <unmatched table&gt
-- WHERE <unmatched table&gt.<primary key&gt IS NULL

-- consider this, normal join
SELECT         Categories.CategoryName,
               Categories.CategoryID,
               Products.CategoryID,
               Products.ProductID,
               Products.ProductName
FROM           Categories
        JOIN   Products ON Categories.CategoryID = Products.CategoryID
ORDER BY       Categories.CategoryName

-- note 77 records
-- now the LEFT JOIN

SELECT         Categories.CategoryName,
               Categories.CategoryID,
               Products.CategoryID,
               Products.ProductID,
               Products.ProductName
FROM           Categories
   LEFT JOIN   Products ON Categories.CategoryID = Products.CategoryID
ORDER BY       Categories.CategoryName

-- now 78 records
-- and note the Office Supplies category with no Product
-- the record has been replaced with a *NULL record*

-- the LEFT JOIN includes Categories without matching Products
-- Categories LEFT JOIN Products -- Categories is on the LEFT

-- how do I know that it's really a *NULL record*
-- and not just NULLs in the fields? which might be ok
-- because Products.ProductID is NULL, which can't happen for a PK

-- so, let's limit to just that record
SELECT         Categories.CategoryName,
               Categories.CategoryID,
               Products.CategoryID,
               Products.ProductID,
               Products.ProductName
FROM           Categories
   LEFT JOIN   Products ON Categories.CategoryID = Products.CategoryID
WHERE          Products.ProductID IS NULL
ORDER BY       Categories.CategoryName

-- and paring it back to the simple form
SELECT         Categories.CategoryName
FROM           Categories
   LEFT JOIN   Products ON Categories.CategoryID = Products.CategoryID
WHERE          Products.ProductID IS NULL
ORDER BY       Categories.CategoryName


-- Database by Doug
-- Douglas Kline
-- 10/19/2018
--
-- Finding unmatched records
-- LEFT JOIN... WHERE IS NULL



Time zones and the DATETIMEOFFSET data type in SQL

-- Database by Doug
-- Douglas Kline
-- 10/18/2018
-- Time Zones and DateTimeOffset

-- how to use time zones and convert from one to another

-- first, how to think about date/time/datetime in SQL
-- suppose I do this:

SELECT GETDATE() AS [now, somewhere]

-- the data type that is returned is a datetime
-- I can show this:
SELECT SQL_VARIANT_PROPERTY(GETDATE(), 'BaseType')

-- think of the datetime data type as a 
-- multi-part object that holds each 
-- part as a separate value

-- for example:
SELECT   GETDATE(),
         DATEPART(YEAR, GETDATE())        AS [year],
         DATEPART(MONTH, GETDATE())       AS [month],
         DATEPART(DAY, GETDATE())         AS [day],
         DATEPART(HOUR, GETDATE())        AS [hour],
         DATEPART(SECOND, GETDATE())      AS [second],
         DATEPART(MILLISECOND, GETDATE()) AS [millisecond]

-- note that all the parts are whole numbers
-- DATEPART() always returns whole numbers - no decimals

-- and when we manipulate datetimes, we use whole numbers:
SELECT   GETDATE()                     AS [now],
         DATEADD(DAY, 2, GETDATE())    AS [2 days from now],
         DATEADD(DAY, 2.5, GETDATE())  AS [2.5 days from now?],
         DATEADD(HOUR, 12, 
            DATEADD(DAY, 2, GETDATE()))AS [2.5 days from now]

-- look at that last entry again
            
-- notice that 2.5 was truncated to an integer
-- there is no such thing as a fractional part of a datetime

-- regardless of how SQL Server stores the date/time/datetime
-- under the covers
-- it treats each part as a separate integer value

-- so what about time zones?
-- since datetime (and datetime2) doesn't have a time zone part
-- we need a new data type: datetimeoffset

-- you can the server datetime with time zone like this:
SELECT SYSDATETIMEOFFSET(), 
       SQL_VARIANT_PROPERTY(SYSDATETIMEOFFSET(), 'BaseType')

-- my current database server happens to be
-- in the eastern time zone of the US, which is -04:00 UTC

-- note that the datetimeoffset data type
-- has everything that datetime does
-- but additionally, it has another part, the time zone

-- so how would I show the actual east coast time?

-- you might think that the function TODATETIMEOFFSET
-- would do that for you, but all it really does is
-- add the time zone to a datetime

SELECT TODATETIMEOFFSET(GETDATE(),'-04:00') AS [time on east coast?],
       SYSDATETIMEOFFSET()                  AS [time on east coast],
       CAST(GETDATE() AS DATETIMEOFFSET)    AS [time on east coast?] 

-- note that the hours are the same
-- it didn't really move time zones for you
-- it just added the time zone information you gave it

-- to actually adjust the hours, you need to
-- actually add/subtract the hours

SELECT   GETDATE()                        AS [implied time zone],
         TODATETIMEOFFSET
         (
            GETDATE(), 
            '-00:00'
         )                                AS [explicit UTC +00:00 time zone], 
         DATEADD(hour, -4, GETDATE())     AS [east coast time, implied time zone],
         TODATETIMEOFFSET
         (
            DATEADD(hour, -4, GETDATE()),
            '-04:00'
         )                                AS [east coast with time zone]

SELECT GETDATE(), SYSDATETIMEOFFSET()

-- in summary
-- DATETIMEOFFSET is a DATETIME with additional information: the time zone
-- TODATETIMEOFFSET converts from DATETIME to DATETIMEOFFSET
-- but doesn't add or subtract time
-- to take a datetime from one time zone, and show it in another
-- timezone, you have to add/subtract the hours yourself
-- and set the correct time zone in a DATETIMEOFFSET

-- Database by Doug
-- Douglas Kline
-- 10/18/2018
-- Time Zones and DateTimeOffset

Using ANY and ALL in SQL

-- Database by Doug
-- Douglas Kline
-- 10/19/2018
-- ANY and ALL

-- comparing a value to a column of values
-- (aka acomparing a scalar to a vector of scalars)

-- consider this setup for testing
-- if the WHERE condition is true or false

SELECT 1       AS [isTrue?]
WHERE  10 = 10 -- testing if this is true or false

SELECT 1       AS [isTrue?]
WHERE  10 <> 10 -- testing if this is true or false

-- we get a record if it is true, otherwise no record

-- also, see how I can create a literal table

SELECT   tempField
FROM     (VALUES(11),(12),(7)) tempTable(tempField)

-- note that this creates a single column of values
-- which could be used in something like IN
-- for example
SELECT   1
WHERE    12 IN    (  SELECT   tempField
                     FROM     (VALUES(11),(12),(7)) tempTable(tempField))

-- I could rephrase this as:
SELECT   1
WHERE    12 = ANY (  SELECT   tempField
                     FROM     (VALUES(11),(12),(7)) tempTable(tempField))

-- back to the first example:
SELECT 1          AS [isTrue?]
WHERE  10 > 11    

-- I'm comparing a single scalar value, 10, with another
-- single scalar value, 11

-- now consider if I want to compare the value 10
-- to multiple other values
SELECT 1 
WHERE  10 > 11
  OR   10 > 12
  OR   10 > 7

-- this is three logical expressions
-- combined with OR into the whole logical expression
-- so if ANY of them are true
-- the whole logical expression is true 

-- now let's say that the values 11, 12, 7 are in a column
-- since we're in an RDBMS, they are *likely* to be in a column

-- I can rephrase this with ANY like this:

SELECT 1
WHERE  10 > ANY (SELECT tempfield 
                 FROM (VALUES
                           (11),
                           (12),
                           (7)
                       ) AS tempTable(tempfield))

-- this is exactly equivalent to:

SELECT 1 
WHERE  10 > 11
  OR   10 > 12
  OR   10 > 7

-- back to the first example again...
-- back to the first example:
SELECT 1          AS [isTrue?]
WHERE  10 > 11    

-- let's say I want to do multiple comparisons again
-- but AND them together like this:

SELECT 1 
WHERE  10 > 11    -- logical expression 1
  AND  10 > 12    -- logical expression 2
  AND  10 > 7     -- logical expression 3

-- this is three logical expressions
-- combined with ANDs into the whole WHERE
-- logical expression
-- so all three expressions must be true
-- for the WHERE clause to be true

-- I can make it true with different values:
SELECT 1 
WHERE  10 > 9    -- logical expression 1
  AND  10 > 8    -- logical expression 2
  AND  10 > 7    -- logical expression 3

-- I can rephrase this with ALL
-- with the values in a column

SELECT 1
WHERE  10 > ALL (SELECT tempfield 
                 FROM (VALUES
                           (11),
                           (12),
                           (7)
                       ) AS tempTable(tempfield))
 
 -- and get it to be true with a change to the values
 SELECT 1
WHERE  10 > ALL (SELECT tempfield 
                 FROM (VALUES
                           (9),
                           (8),
                           (7)
                       ) AS tempTable(tempfield))

-- so you can think of ANY and ALL 
-- as comparison operator modifiers

-- comparison operators usually take a scalar value on each side

-- scalar  scalar

--  3 < 7
--  3 > 7
--  3 = 7
--  3 != 7
--  3 <> 7
--  3 <= 7
--  3 >= 7

-- ANY and ALL take a scalar on the left, and a column on the right

-- scalar  ANY column
-- scalar  ALL column
-- 3 < ANY ((1),(2),(3))
-- 3 >= ALL ((1),(2),(3))

-- and the column is generally created with a SELECT statement

-- scalar  ANY (SELECT  FROM...)
-- scalar  ALL (SELECT  FROM...)

-- so here's a more concrete example using the Northwind database Orders table

-- suppose we want to know a list of customers who paid more than
-- $200 on freight on an order in 1996?
-- in other words $200 < ANY(orders in 1996)

SELECT   CompanyName
FROM     Customers
WHERE    $200 < ANY (SELECT freight
                     FROM    Orders
                     WHERE   Orders.CustomerID = Customers.CustomerID
                        AND  YEAR(Orderdate) = 1996)
ORDER BY CompanyName

-- notice that this is a correlated subquery
-- it refers to the outer query (Customers.customerID)

-- also notice that this could be rephrased
-- with a JOIN DISTINCT

SELECT   DISTINCT Customers.CompanyName
FROM     Customers
   JOIN  Orders      ON Customers.CustomerID = Orders.CustomerID
WHERE    Orders.Freight > $200
   AND   YEAR(Orders.OrderDate) = 1996
ORDER BY CompanyName

-- is there a difference? why one and not the other
-- it depends

-- depending on your situation, one SQL phrasing might 
-- be clearer than another

-- depending on your data model, volume, statistics, indexes, etc.
-- one might be faster than the other
-- in other words, you might get different query plans

-- in general, I recommend writing your SQL in the clearest
-- manner possible
-- then carefully rephrasing to a better performing, but
-- perhaps less understandable, form if there is a performance issue

-- in summary,
-- ANY and ALL modify comparison operators
-- they succinctly compare a single scalar to
-- a column of scalars

-- thanks for watching!

-- Database by Doug
-- Douglas Kline
-- 10/19/2018
-- ANY and ALL


Monday, October 15, 2018

Using the CAST() function in SQL

-- Database by Doug
-- Douglas Kline
-- 10/10/2018
-- CAST - converting to a new data type

-- sometimes you need to change one data type to another data type

-- consider this:

SELECT '4.0'

-- I might want to deal with this as a number 
-- for example

SELECT '4.0' + 2.0

-- this works
-- even though they '4.0' is a varchar 
-- and 2.0 is a floating point number

-- the db does an *implicit* conversion
-- of the '4.0' to a floating point
-- then does the addition
-- and returns a floating point

-- we hope the db "knows what I mean"
-- and are *assuming* it ends up doing the right thing

-- for simple things, this mostly works
-- the db is pretty smart

-- however, high quality code doesn't normally
-- make assumptions, so let's be *explicit*

SELECT CAST('4.0' AS float) + 2.0

-- here, I'm using the CAST function to
-- *explicitly* change a varchar to a float
-- I'm not relying on the db "knowing what I mean"

-- you can generally CAST between data types
-- fairly freely

-- see the full matrix of allowable conversions
-- for SQL Server 17
-- here: https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-conversion-database-engine?view=sql-server-2017

-- note that some to/from conversions are implicit / automatic
-- some are not allowed at all 
-- and some require explicit CASTs

-- here are a few common conversions you might want to do


-- converting numeric and dates to varchars
-- especially when needing to concatenat

SELECT 'Doug' + 1 AS [trying for Doug1]
SELECT 1 + 'Doug' AS [trying for 1Doug]
-- and the fixes
SELECT 'Doug' + CAST(1 AS VARCHAR)
SELECT CAST(1 AS VARCHAR) + 'Doug'

SELECT GETDATE() + 'Doug' -- error
SELECT CAST(GETDATE() AS VARCHAR) + 'Doug'

-- forcing specific types of operations

-- consider this

SELECT 3 / 2

-- notice that I get *integer* division
-- because both operands are integers

-- but what if I want to see 1.5 as the result?
-- I can fix literals easy enough
SELECT 3.0 / 2.0

-- but what about this:
SELECT UnitsInStock / unitsonorder
FROM   Products
WHERE  unitsOnorder <> 0

-- I'm getting integer division

-- here's how to get floating point division:

SELECT CAST(UnitsInStock AS FLOAT) / CAST(unitsonorder AS FLOAT)
FROM   Products
WHERE  unitsOnorder <> 0

-- another common conversion is from varchar to date/time

-- see the differences here:
SELECT '4/8/2018'
SELECT CAST('4/8/2018' AS DATE)
SELECT CAST('4/8/2018' AS DATETIME2)

-- and here:
SELECT '20180408 11:00'
SELECT CAST('20180408 11:00' AS DATE)
SELECT CAST('20180408 11:00' AS DATETIME2)

-- summary
-- for 'throw-away' code, the db will implicitly convert
-- for high quality code in a system
-- you should not rely on implicit conversions
-- instead, use CAST

-- Database by Doug
-- Douglas Kline
-- 10/10/2018
-- CAST - converting to a new data type

Wednesday, October 10, 2018

Using DISTINCT in SQL

-- Database by Doug
-- Douglas Kline
-- 10/9/2018
-- DISTINCT - removing duplicates, but in a "dumb" way

-- consider this

SELECT ProductID,
       ProductName,
       SupplierID
FROM   Products

-- this is a list of all products
-- note the repeats in the SupplierID

-- suppose I want top know the list of suppliers
-- in the Products table

SELECT   SupplierID
FROM     Products

-- again, notice the repeats
-- to remove the repeats, I can do this:

SELECT   DISTINCT SupplierID
FROM     Products

-- you might be saying, why not just do this:
SELECT   SupplierID
FROM     Suppliers

-- my response is: that's a different list
-- it's the list of all suppliers in the supplier table

-- What I'm looking for is the list of all suppliers
-- in the products table
-- in other words, all suppliers which we actually *use*

-- Let's look at another example, with a table I've created

SELECT   firstName
FROM     Person
ORDER BY firstName 

-- this is a list of all people's first names
-- as we scroll down, we'll start to see repeats
-- in other words, multiple people have the same first name

-- now let's say we want a list of all first distinct firstnames

SELECT   DISTINCT firstName
FROM     Person
ORDER BY firstName 

-- so now there isn't a record returned for every Person record
-- there's a record returned for every unique firstname
-- also note that NULL is considered to be a unique firstname

-- you might say, why not use Group By to do this?
-- like this:

SELECT   firstName
FROM     Person
GROUP BY firstName
ORDER BY firstName

-- logically, it returns the exact same records, and always will
-- however, GROUP BY does a lot more work
-- it actually sets up groups of records in preparation to 
-- calculate aggregates like SUM, COUNT, AVG, etc.

-- DISTINCT is much faster 
-- if it sees a value it has seen before, it just throws it out
-- in other words, it doesn't group the records
-- it just makes a list of unique values

-- so, don't use GROUP BY when what you really need is DISTINCT

-- alright, what if you want to count stuff?
SELECT   COUNT(ID)                  AS [# of people],
         COUNT(DISTINCT ID)         AS [# of distinct primary key values],
         COUNT(firstName)           AS [# of people with non-NULL firstnames],
         COUNT(DISTINCT firstName)  AS [# of distinct firstNames]
FROM     Person


-- note that the first two values are always the same, 
-- since primary key values are distinct aka unique

-- and also notice that there are 599 distinct first names, but recall

SELECT   DISTINCT firstName
FROM     Person
ORDER BY firstName 

-- so why is the COUNT(DISTINCT firstname) = 599
-- but DISTINCT firstname gives 600 records?

-- remember that COUNT counts non-NULL values

-- finally, DISTINCT is sort of "dumb", in that it doesn't
-- know anything about primary keys
-- or anything about the underlying table(s)
-- it only considers values from the fields you provide

-- consider this:

SELECT   DISTINCT firstname
FROM     Person

-- it doesn't give distinct Person records, just distinct firstnames
-- now this:

SELECT   DISTINCT lastname
FROM     Person

-- and the distinct applies to the *combination* 
-- of all the fields in the SELECT clause
-- in this example,
-- all distinct *combinations* of gender and firstname are shown

SELECT   DISTINCT gender,
         firstname
FROM     Person
ORDER BY gender,
         firstname

-- in summary,
-- distinct removes duplicates
-- it removes duplicates based on all fields in the SELECT list
-- when used with COUNT, it will not count duplicate values

-- thanks for watching!

-- Database by Doug
-- Douglas Kline
-- 10/9/2018
-- DISTINCT - removing duplicates, but in a "dumb" way



Thursday, October 4, 2018

Numeric Expressions in SQL


-- Database by Doug
-- Douglas Kline
-- 8/28/2018
-- Numeric Expressions - expressions that return a numeric data type

-- beginner level
-- simple numeric expression examples
-- and things to keep in mind

-- I might say 'db'
-- which is shorthand for 'database engine'
-- this is the server software that interprets
--   your SQL and performs actions

-- you can ask your database engine (db) to be your calculator:
SELECT   1 + 2 

-- if you are familar with spreadsheets
-- and how to type formulas
-- it is very similar

SELECT   1 + 2 

-- an expression is something that the db
-- will read, translate, and perform
-- so in the expression above
-- it reads a 1, then a plus sign, then a 2
-- interprets it as addition
-- performs the addition
-- and returns the results
--------------------------------------------------
SELECT   1 + 2

-- there are certain items in the expression

-- the plus sign (+) is an *operator*
-- more specifically, it is a *binary* operator
-- it defines an operation on two items (thus binary)
-- more specifically, the items on its left and right

-- the plus sign says to the db:
-- take the items on my left and right and add them
-- it is an instruction to the db

SELECT   1 + 2

-- the 1 and 2 are the *operands* of the operator
-- they are the items that the operator applies to

-- the 1 and 2 are also examples of *literals*
-- in other words 1 literally means the number 1
-- in coding, it means: not code, a value

-- here is an example of a unary operator (one operand)
SELECT  -5

-- here, the literal value 5
-- is modified to become negative
-- by the minus sign operator

------------------------------------------------------------
-- how does the db know what *is* an expression, and what *isn't*
-- in this example
SELECT   1 + 2

-- the db is *expecting* an expression to be in this position
-- (after the word SELECT)
-- so expressions are mainly known by their
-- location/position in your SELECT statement

---------------------------------------------------------
-- notice that usual order of operations is followed
-- * and / come before + and -
-- otherwise it is left-to-right,
-- unless you add parentheses
-- then sub-expressions in parentheses are evaluated first
SELECT 1 + 2 * 6
SELECT (1 + 2) * 6
SELECT (3 - 1) / 2

------------------------------------------------------
-- notice that operators sometimes change meaning
-- depending on their context

SELECT 1 / 3      -- does integer division
SELECT 1.0 / 3.0  -- floating point division
SELECT 1 / 3.0    -- floating point division
SELECT 1.0 / 3    -- floating point division

-- if both operands (items on either side of the operator)
-- are whole numbers, do integer division
-- otherwise do floating point division

-----------------------------------------------------------------------------
-- here's an operator that might be new to you: %
SELECT 5 % 3 -- modulus operator - gives the remainder after integer division
SELECT 8 % 3 -- same answer

SELECT (8 / 3) * 3 + 8 % 3 -- equals 8

-----------------------------------------------------------------------------
-- what if your data is not a literal?
-- in other words, it comes out of the db

SELECT   unitsInstock -- this is an expression, just very simple
FROM     Products

SELECT   unitsInStock + unitsOnOrder -- effective inventory
FROM     Products

-- plus is still the operator
-- the operands are UnitsInStock and unitsOnOrder
-- the expression is evaluated on each record, separately

-----------------------------------------------------
-- but what if the data type is not right for
--   the operations I want to do?
-- specifically, the data is whole numbers
-- but I want to do floating point division
SELECT   unitsInStock,
         reOrderLevel,
         unitsinstock / reOrderLevel
FROM     Products
WHERE    reOrderLevel <> 0  -- to avoid division by zero

-- but I'd really like to see the results as a floating point/decimal

-- you have to convert *prior* to dividing

SELECT   unitsInStock,
         reOrderLevel,
         CONVERT(real,unitsinstock) / reOrderLevel
FROM     Products
WHERE    reOrderLevel <> 0  -- to avoid division by zero

-- the unitsInStock is converted to a real (aka floating point number)
-- before it gets divided
-- since one of the items operands
-- is a floating point number, it does floating point division

-------------------------------------------------------------
-- perhaps a more realistic and useful expression
SELECT      ProductID,
            ProductName,
            unitsInStock * unitprice AS [Dollar Value in Inventory]
FROM        Products
ORDER BY    ProductID

-- note that unitprice is a money data type,
-- and the unitsInstock is a whole number
-- so the result is shown as a money data type
-- (note the 2 digits to the right of the decimal)

-- we can show this explicitly:
SELECT      SQL_VARIANT_PROPERTY(unitPrice, 'BaseType'),
            SQL_VARIANT_PROPERTY(unitsInStock, 'BaseType'),
            SQL_VARIANT_PROPERTY(unitsInStock * unitPrice, 'BaseType')
FROM        Products

-----------------------------------------------------------------------
-- in summary
-- expressions are instructions for the db to create new values
-- expressions are known by their location in the statement
-- expressions have operators and operands

-- Database by Doug
-- Douglas Kline
-- 8/28/2018
-- Numeric Expressions

String Expressions in SQL

-- Database by Doug
-- Douglas Kline
-- 8/28/2018
-- Character Expressions - expressions that return character data

-- beginner level
-- might want to check out numeric expressions
------------------------------------------------------------------
-- consider this statement

SELECT   'Fred',
         'Flintstone'

-- note that the single quotes are not shown in
-- the table results
-- this is because that are not *part of* the data
-- they are used to mark the beginning and end of the data
-- and are called *delimiters*

-- so the statement has two pieces of data
-- delimited by single quotes

SELECT   'Fred',
         'Flintstone'

-- the *type* of data is *character* data
-- more specifically, a *sequence of characters*

-- in programming, they would be called *strings*

-- another term might be *string literals*

-- now look at this:
SELECT      123   AS [numeric data type],
            '123' AS [character data type]

-- the first column is a number
-- the second column is a sequence of characters
--    first the character '1', then the character '2', ...

-- now consider this

SELECT       1   + 5  AS [numeric],
            '1' + '5' AS [character]
             
-- see how the plus sign means something different
-- depending on what's on either side?

-- with character data, the plus sign means *concatenate*
-- or 'put together'

-- so the basic operator for character expressions
-- is the plus sign

-- we can prove this:
SELECT      SQL_VARIANT_PROPERTY(1 + 5, 'BaseType'),
            SQL_VARIANT_PROPERTY('1' + '5', 'BaseType')

-- here'a another example:
SELECT   'Fred' + ' ' + 'Flintstone' AS [Full Name]

-------------------------------------------------------------
-- there's lots more we want to do with
-- character data
-- but we need functions....
-- here are a few simple ones:

SELECT LEN  (' Fred'),     -- find how many chars - note the spaces
       UPPER(' Fred'),     -- convert all chars to upper case
       LOWER(' Fred'),     -- convert all chars to lower case
       LEFT (' Fred', 2),  -- return just the left-most 2 chars
       RIGHT(' Fred', 2),  -- return just the right-most 2 chars
       LTRIM(' Fred')      -- get rid of any space chars on left

-- you might also want to find the location of a certain character

SELECT   CHARINDEX(' ', 'Fred Flintstone') AS [location of space character] 

-- and also, you can combine these functions into a more complex expression

SELECT   'Fred Flintstone'       AS [original full name],
         RIGHT('Fred Flintstone',                    -- right-most 10 characters
                 LEN('Fred Flintstone')              -- the number 15
                 - CHARINDEX(' ', 'Fred Flintstone') -- the number 5
              )                  AS [last name only]

---------------------------------------------------------------------------------
-- what about data from a table - not literals

SELECT   ProductName,
         LEN(ProductName) AS [length of name],
         UPPER(ProductName) AS [uppercase],
         CHARINDEX(' ', ProductName),
         RIGHT(ProductName, LEN(ProductName) - CHARINDEX(' ', ProductName))
FROM     Products

---------------------------------------------------------------------------------
-- in summary
-- string literals are delimited by single quotes
-- plus sign means concatenate
-- functions help with other tasks

-- Database by Doug
-- Douglas Kline
-- 8/28/2018
-- Character Expressions


Primary Keys in Relational Databases

-- Database by Doug
-- Douglas Kline
-- 9/28/2018
-- Primary Keys

USE Northwind
-- note that I'm starting out with a 
-- modified version of the Northwind.Categories table
-- called DBBDCategories
-- for demo purposes

-- Primary Keys
-- a primary key has two purposes: 
---- to guarantee access to every record in a table
---- to uniquely distinguish a record from all other records in the table

-- this is done by requiring a primary key to be:
---- NOT NULL
---- UNIQUE across all records in that table

-- the NOT NULL makes sure that *at least one* column is NOT NULL
-- and that there is a WHERE clause that will match on every record
-- for example:
--  WHERE categoryID = 7

-- guarantees one of:
-- categories with an categoryID of 7
-- there is no record with an *unknown* ID that *might be* 7

-- the UNIQUE makes sure that you can make a request of records
-- that guarantees at most one record
-- for example:
--  WHERE categoryID = 7

-- guarantees one of:
-- a *single* record with the category with an categoryID of 7
-- no category - there is definitively no record with an categoryID of 7

--***************************************************
-- tables don't absolutely require a primary key
-- however, guaranteed, unique access is extremely helpful in many situations
-- (particularly Online Transaction Processing Systems /OLTPs)
-- AND, there can be significant performance benefits to having a PK 
-- (for certain operations)

-- Consider my somewhat modified DBBDCategories table from Northwind
SELECT *
FROM    DBBDCategories

-- I happen to know that CategoryID is the primary key of this table

-- but how would you know otherwise?

-- you could look in the object inspector on the left
-- expand the database, the tables, the Categories table, the Columns
-- note the key icon on the left of the CategoryID, and the 'PK' in parentheses
-- (demo)

-- another way would be to put the table into a diagram
-- note the yellow key icon next to the CategoryID
-- (demo)

-- Finally, a really definitive way would be to script out the table
-- and see the full definition of the table in SQL
-- (demo)

-- here is a slightly abbreviated/edited version of 
-- the SQL CREATE statment for the DBBDCategories table:

--CREATE TABLE [dbo].[DBBDCategories]
--(
-- [CategoryID] [int] NOT NULL,
-- [CategoryName] [nvarchar](15) NOT NULL,
-- [Description] [ntext] NULL,
-- [Picture] [image] NULL,
-- CONSTRAINT [PK_DBBDCategories] PRIMARY KEY CLUSTERED 
--   (
--    [CategoryID] ASC
--   )
--) 

-- note the NOT NULL for CategoryID 
-- this means that every insert and update will be checked to make sure 
-- that the CategoryID does not become NULL
-- let's try it
-- demo insert new NULL PK
-- demo update PK to a NULL

-- note the PRIMARY KEY specified as a CONSTRAINT
-- this means that every insert and update will be checked to make sure
-- that no two CategoryIDs, in the WHOLE TABLE, get duplicated
-- let's try it
-- demo insert duplicate PK
-- demo update PK to a duplicate value


-- let's review
-- we are guaranteed that no record in the categories table has an unknown CategoryID
-- we are guaranteed that no two records in the categories table have the same CategoryID

-- in other words
-- WHERE CategoryID = 8
-- guarantees to return either no record, or one record
-- and we are guaranteed that there is no record
-- with an *unknown* CategoryID which is actually 8

-- so what does a categoryID of 8 actually mean?
--....?
--....?





-- NOTHING!!

-- primary keys don't *need* to mean anything
-- they just have to be UNIQUE and NOT NULL

-- now let's look at the definition of 
-- the *actual* Categories table from the Northwind DB

--CREATE TABLE [dbo].[Categories]
--(
-- [CategoryID] [int] IDENTITY(1,1) NOT NULL,  -- ** note IDENTITY(1,1)
-- [CategoryName] [nvarchar](15) NOT NULL,
-- [Description] [ntext] NULL,
-- [Picture] [image] NULL,
-- CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED -- ** note PRIMARY KEY
-- (
--  [CategoryID] ASC
-- )
--) ON [PRIMARY] 

-- note the PRIMARY KEY in place of UNIQUE
-- these are interchangeable, they mean the same thing

-- however, also note the IDENTITY(1,1)

-- this is an example of defining a *synthetic* primary key
-- we are relinquishing to the DB engine,
-- the task of providing  PK values for this column

-- specifically, the first PK value should be 1
-- the second PK value should be 1 + 1
-- the third PK value should be  2 + 1
-- the fourth PK value should be 3 + 1
-- in other words, begin with 1, and increment by 1 each time

-- this also means that NO ONE ELSE, besides the DB engine
-- can change the value of Categories.CategoryID

-- (demo) try to change a PK value
-- (demo) try to insert a PK value

-- also, IDENTITY values are never re-used
-- this guarantees uniqueness across the live table and all archives
-- for example, if we archive CategoryID = 7 (move to another place)
-- then end up reusing CategoryID = 7 for another record
-- those two records can be confused for each other

 -- (demo) create a new record, delete it, show next PK value, skipped

 -- why is this good?
 -- HUMANS might accidentally re-use archived PK values
 -- also, columns that *mean* something may need to be changed
 -- it does exactly what is needed, and no more
 -- internal to the db engine, this can significantly improve performance

-- in summary,
-- primary keys need to be UNIQUE and NOT NULL, and nothing more
-- this guarantees unique access to each record
-- *synthetic* keys can be helpful in keeping all key values UNIQUE

-- this is especially helpful in OLTP type systems

-- foreshadowing ...
-- primary keys also are referenced by FOREIGN KEYS
-- see the video on FOREIGN KEYS!

-- thanks for watching!

-- Database by Doug
-- Douglas Kline
-- 9/28/2018
-- Primary Keys

Foreign Keys in Relational Databases


-- Database by Doug
-- Douglas Kline
-- 10/1/2018
-- Foreign Keys

USE Northwind
-- Foreign Keys
-- a field in one table (table A)
-- that references a primary key value in another table (table B)

-- for example, the Categories and Products table from Northwind
-- (show diagram)

-- in this case, the foreign key is
-- Products.CategoryID, i.e., the CategoryID field in Products

-- from a practical usage standpoint
-- the foreign key is a "lookup"
-- consider this:

SELECT   ProductID,
         ProductName,
         CategoryID
FROM     Products

SELECT   CategoryID,
         CategoryName
FROM     Categories

-- note how I can use the Products.CategoryID
-- value to find the CategoryName of the right Category

-- note that Primary Keys are UNIQUE and NOT NULL
-- e.g., Products.ProductID, Categories.CategoryID
-- but Foreign Keys can be NOT UNIQUE and NULL
-- e.g., Products.CategoryID

-- in this situation, many products can be
-- in the same category
-- and a product could have an unknown category
-- (products can exist without a category)

-- in a sense, FKs define a "relationship"
-- between two records
-- in this case, each product record is related
-- to a category record
-- (or not, in the case of NULL FK)

-- to be able to "lookup" from one table to another
-- you just need two fields of similar data types

-- however, what if you want to make sure 
-- that the lookups values are always *good*
-- in other words, that values in
-- Products.CategoryID are actually over in
-- the Categories table?

-- this requirement, that FK values are valid
-- is called *referential integrity*

-- note that this is a constraint on
-- the Products.CategoryID field
-- it must be a valid value that
-- exists over in the Categories.CategoryID field

-- let's see it in the definition of Products

CREATE TABLE [dbo].[Products]
(
   [ProductID]    [int] IDENTITY(1,1)  NOT NULL,
 [ProductName]  [nvarchar](40)       NOT NULL,
 [CategoryID]   [int]                NULL
   -- other fields omitted for clarity
 CONSTRAINT [PK_Products] 
   PRIMARY KEY CLUSTERED 
   (
    [ProductID] ASC
   )
 CONSTRAINT [FK_Products_Categories]
   FOREIGN KEY
   (
      [CategoryID]
   )
   REFERENCES [dbo].[Categories] ([CategoryID])
) 

-- note the REFERENCES section
-- note the allowed NULL for CategoryID

-- this means that on any UPDATE or INSERT to Products
-- a Product record must always have a valid CategoryID
-- or a NULL value (unknown)

-- (demo this)

-- it also means that on any UPDATE or DELETE to Categories
-- all Product records must always have a valid CategoryID
-- or a NULL value (unknown)

-- (demo this)

-- If we have referential integrity for all Product records
-- then we can be confident that our JOINs between the two tables
-- provide valid results

SELECT   Products.ProductID,
         Products.ProductName,
         Products.CategoryID,
         Categories.CategoryID,
         Categories.CategoryName
FROM     Products 
   JOIN  Categories ON Products.CategoryID = Categories.CategoryID
ORDER BY Products.ProductID

-- JOINs are a way of doing multiple "lookups"
-- of FK values to PK values
-- but that's another video...

-- in summary,
-- foreign keys provide a "lookup" mechanism
-- for records in two different tables
-- this also defines how the two tables are "related"
-- *referential integrity* enforces "good" foreign keys
-- in other words, a FK should always have a valid value
-- FK values can repeat - be non-UNIQUE
-- FK values can be NULL - unknown

-- thanks for watching!

-- Database by Doug
-- Douglas Kline
-- 10/1/2018
-- Foreign Keys

Tuesday, August 21, 2018

The FROM clause in SQL





-- Database by Doug
-- Douglas Kline
-- 8/9/2018
-- FROM - setting the source of data

USE Northwind

-- might want to review the SELECT statement

/*************************************/

-- we can do SELECT statements like this to create a tabular result

SELECT      'Doug' AS [firstName],
            22     AS [age]

-- the data in this table is not stored anywhere
-- it was created in the SELECT statement
-- and placed in the tabular result
-- in the way we specified

-- but there's data in the database already
-- it's in a table

-- how do we get at it and display it?

-- we can do this

SELECT      ProductID,
            ProductName
FROM        Products      -- the important part!

-- the FROM statement sets the source of the data
-- Products is the name of a table
--   in the Northwind database 
--   (note the USE statement at the top of this code)

-- I also happen to know that there are fields
-- called ProductID and ProductName
-- in the Products table

-- notice what happens when I try to use
-- a column that doesn't exist in the table

SELECT      ProductID,
            ProductName,
            NonsenseField
FROM        Products  

-- intellisense underlines it in red
-- and when I run it, I get an "invalid column name" error

-- what do I do if I don't know what the columns are?

SELECT      *
FROM        Products

-- so what does the asterisk mean?

-- normally you put column names in the SELECT clause
-- when you put a * instead, it means "all columns"

-- **Important: SELECT * means ALL COLUMNS
-- ** it doesn't say anything about rows
-- ** which rows is defined by the WHERE clause
-- ** see the WHERE clause video

---------
-- so, let's type a new SELECT statement
/*
SELECT      ProductID,
            ProductName
FROM        Products  
*/
 -- writing SELECT first:
 -------------------------------
SELECT      ProductID,
            ProductName
FROM        Products
 ---------------------------------
 -- notice how the intellisense isn't very helpful

 -- now writing FROM first:
 -----------------------------------
 SELECT     ProductID,
            ProductName
 FROM       Products
 -----------------------------------
 -- notice how intellisense is helpful
 -- once I put the FROM statement in

-- remember how you can rename a column in the SELECT?
SELECT      ProductID   AS [ID],
            ProductName AS [Name]
FROM        Products 

-- you can also do this for the table names
SELECT      ProductID   AS [ID],
            ProductName AS [Name]
FROM        Products    AS [p]        -- important part

-- this is called an alias 
-- p is an alias for Products
-- you can use the AS statement
-- but the convention among professionals is normally to leave it out
-- like this:
SELECT      ProductID   AS [ID],
            ProductName AS [Name]
FROM        Products       [p]            -- Note no AS

-- note also that the square brackets are not necessary
-- you can do this:
SELECT      ProductID   AS ID,
            ProductName AS Name
FROM        Products       p

-- I like to use them to highlight the aliasing
-- and point out that [p] is a name that was made up
-- in the SELECT statement - it doesn't exist in the table

-- there are times that you MUST alias a table or column name
-- to achieve your objective 
-- (for example: same table in the FROM multiple times - more advanced)

-- it also has implications to "qualifying" a column name
-- here is the same query fully qualified

SELECT      Products.ProductID,
            Products.ProductName
FROM        Products  

-- the *not* fully qualified works
SELECT      ProductID,
            ProductName,
            Products.SupplierID
FROM        Products, Suppliers 

-- but relies on the db engine
-- to look in the Products table for the column names
-- if there are no ambiguities, everything is fine...
-- if the column names are not reserved words, everything is fine...

-- when you have multiple tables in the FROM (see JOIN video),
-- you'll NEED to fully qualify to resolve ambiguities
-- AND to write safe, high quality code

-- sometimes you'll see this:
SELECT      [p].ProductID,
            [p].ProductName
FROM        Products  [p]

-- which works fine
-- the products table is aliased to p
-- which forces me to use [p] to qualify the fields

-- this doesn't work:
SELECT      Products.ProductID,
            Products.ProductName
FROM        Products  [p]

-- once it is renamed, it doesn't exist anymore
-- in the context of this statement

-- also, note that many would write the above statement
-- without the square brackets:
SELECT      p.ProductID,
            p.ProductName
FROM        Products  p

-- this works fine
-- and it will sometimes save quite a bit of typing

-- however, it doesn't scale to many tables well
-- it becomes *very* difficult to understand

-- as your statements get bigger with more tables,
--  I think this is clearer
SELECT      Products.ProductID,
            Products.ProductName
FROM        Products

-- for me, code clarity trumps almost everything
-- I will type more so my code is very readable to others
-- (and myself, a month later!)

-- therefore,
-- I prefer to *avoid* aliasing tables
-- unless it is really necessary: 
-- for example, when there are multiple of the same table in a single

-- in summary,
-- FROM is to state the source table(s)
-- you can alias table names in the FROM
-- you can, and sometimes HAVE TO,
--   *fully qualify* your column names
--   by using this format: 
--   tableName.columnName 

-- thanks for watching!

-- Database by Doug
-- Douglas Kline
-- 8/9/2018
-- FROM - setting the source of data

Monday, July 30, 2018

Tables and their parts





-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Terminology : Tables, Columns, Rows

use Northwind

-- no pre-requisites

-- a very good way to organize data is in a table

-- why? 

-- mainly because humans find tables fairly easy to understand
-- what they should look like, how they should behave, how to look things up, etc.

-- for example, lets say I show somebody this table:

SELECT *
FROM ( VALUES
   ('Doug','2'),
   ('Dan', '3'),
   ('Debbie', '1'),
   ('2','Joe')) AS tbl([Name], [Number])

-- Most people will think that is weird
-- why is there a number in the Name column?

-- we know this innately, without knowing any special theory

-- we like it this way, because it keeps things organized

-- organized data is easy to get value out of

-- so let's go over some terminology, looking at some real table data

SELECT  ProductID,
   ProductName,
   Unitprice,
   UnitsInstock,
   QuantityPerUnit,
   Discontinued
FROM  Products

-- **highlighting entire table
-- the entire table is also sometimes called:
-- an entity, a table, a file, an object

-- entity - usually used by data modelers/architects
-- table - usually used by database analysts
-- file  - usually used by database administrators (especially of old databases, where each table was stored as a file) 
-- object - usually used by software developers (especially object-oriented developers, where an object maps to a table)

-- **highlighting a row
-- one row in a table also has several names:
-- row, record, entity instance, object instance

-- row  - used by most people, in other words its just a part of a table that's horizontal
-- record - usually used by database analysts, this is special terminology for db people
-- entity instance - usually used by data modelers/architects
-- object instance - usually used by software developers

-- **highlighting a column
-- one column in a table has several names:
-- column, attribute, field, property

-- column - used by most people, in other words its just a part of a table that's vertical
-- attribute- or entity attribute, used by data modelers, some value of interest about an entity
-- field - used by database analysts, special terminology for db people
-- property - or object property, used by OO software developers, object properties usually map to table fields

-- ** highlighting a cell
-- one cell, an intersection of a row and column has several names:
-- cell, value, field, field value, attribute, attribute value, property, property value, etc.

-- careful...

-- sometimes attribute means an entire column, e.g., Person table has an attribute named firstName
-- sometimes attribute means a single cell, e.g., Doug's firstName attribute is Doug

-- thanks for watching

-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Terminology : Tables, Columns, Rows

Structuring Data in Tables




-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data

-- here's an odd example
SELECT *
FROM ( VALUES
   ('Doug','2'),
   ('Dan', '3'),
   ('Debbie', '1'),
   ('2','Joe'))  AS tbl([Name], [Number])

-- this probably makes you a little uncomfortable 
-- it might bother some more than others

-- in Relational Databases, like SQL Server (or DB2 or Oracle or PostgreSQL or mySQL or...)
-- we can easily enforce organization on table

-- in this example, you shouldn't be able to put a number in the name column

SELECT  ProductID,
  ProductName,
  UnitPrice,
  UnitsInStock,
  QuantityPerUnit,
  Discontinued
FROM    Products

-- here are some fundamental organization mechanisms built-in to an RDBMS

-- every table must have a name
-- every column must have a name
-- every table is a rectangle 
-- -- every column has the same number of rows
-- -- every row has the same number of columns
-- -- there are never an "ragged" rows or columns
-- every table has at least one column
-- (but tables can have zero rows - an empty table)

-- just those simple things immediately keep the data organized

-- you don't have to do anything else
-- you don't HAVE to organize your data any further
-- the database won't AUTOMATICALLY organize your data any further

-- as an example,
-- you don't HAVE TO name your columns in a meaningful way
-- for example, we could do this:
SELECT  ProductID  AS [A],
  ProductName  AS [B],
  UnitPrice  AS [C],
  UnitsInStock AS [D],
  QuantityPerUnit AS [E],
  Discontinued AS [F]
FROM    Products

-- of course it goes without saying, but I need to say it
-- a clear, descriptive, concise name is critical
-- for your own sanity and for the sanity of those that come after...

-- there are very good benefits to keeping your data MORE organized than the bar minimum
SELECT  ProductID,
  ProductName,
  UnitPrice,
  UnitsInStock,
  QuantityPerUnit,
  Discontinued
FROM    Products
-- if we want to go further, here are the type of things you CAN enforce

-- a value in a cell should be filled in (NOT NULL), or can be left unknown
-- a value in a cell should be a specified data type (numeric, alphabetic, date, etc.)
-- a value in a cell should be in a certain range (more than zero, not include special characters, etc.)
-- values in a column must be unique (no duplicates across rows)
-- a value in a cell should be from a list somewhere (predefined or dynamic)
-- other things

-- adding these constraints keeps your data MORE organized, and easier/quicker/cheaper to turn into valuable information

-- Here are some more subtle ways to organize your data
-- these are critically important
-- and sometimes more theoretically presented as "normalizing" your data

-- ** Every record should have a unique identifier
-- this is called "entity integrity" in database theory
-- it enables us to uniquely find each record individually
-- for example, "the record with productID=7" 
-- at most, there is one record with 7 as the productID
-- if two records have that productID, I can't distinguish them

-- ** the meaning of a column should not change across rows
-- in other words, values in the ProductName column are always ProductNames
-- there are never any "dual-purpose" or "multi-purpose" columns
-- never "sometimes it means this, other times it means something else"

-- ** values in a row always pertain to that row
-- in other words, every value in the ProductID=1 record is about that product
-- another way, we never store anything in a record that is not specifically about that record
-- an example violation: storing the name of a supplier in a product record
-- the name of a supplier is an attribute of a Supplier, 
-- and should be stored in a Supplier table in a column named SupplierName

-- ** only one value in a cell
-- don't have a list of values with commas between them
-- don't have "Douglas M Kline" in a single column, 
-- rather have "Douglas", "M", and "Kline" in separate columns

-- In Summary

-- well-structured data has
-- good names
-- data types chosen well
-- valid values enforced
-- a unique, non-NULL identifier
-- each column has a single meaning
-- each row represents a single thing
-- each cell has a single value

-- thanks for watching

-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data


Followers