Thursday, October 8, 2015

What's slow on my database server?

What's slow on my database server?

A video investigating the relative performance of network, disk IO, and CPU on SQL Server.


Here is the SQL that goes with the demo.

-- Database by Doug
-- 10/8/2015
-- What's Slow Demo?

-- for database performance
-- There are 3 basic items to worry about:
-- * IO
-- * Network
-- * CPU

-- I see many people focus on them in this order:
-- 1 - CPU
-- 2 - IO
-- 3 - network

-- this demo should show that you should focus on them in this order:
-- 1 - network
-- 2 - IO
-- 3 - CPU


-- part 1: network vs IO
use referencedb

SET STATISTICS IO ON

SELECT *
FROM   Person

-- about 53 seconds

-- while this runs, here's some basics about the setup:

-- SQL Instance running as a virtual machine, no idea about the storage, general purpose
-- synthetic data with reasonable data distributions
-- 1.3M records
-- average record size is about 456 bytes
-- zero fragmentation
-- total table size about 587MB
-- some bogus fields for experimentation
-- no non-clustered indexes

--CREATE TABLE [dbo].[Person](
-- [ID] [int] IDENTITY(1,1) NOT NULL,
-- [salutory] [nvarchar](15) NULL,
-- [firstName] [nvarchar](25) NULL,
-- [middleInitial] [nchar](1) NULL,
-- [lastName] [nvarchar](25) NOT NULL,
-- [nameSuffix] [nvarchar](8) NULL,
-- [email] [nvarchar](255) NULL,
-- [entryDate] [datetime] NOT NULL CONSTRAINT [DF_Person_entryDate]  DEFAULT (getdate()),
-- [lastUpdateDate] [datetime] NOT NULL CONSTRAINT [DF_Person_lastUpdateDate]  DEFAULT (getdate()),
-- [weight] [float] NULL,
-- [gender] [char](1) NULL,
-- [dateOfBirth] [date] NULL,
-- [bogusGUID] [uniqueidentifier] ROWGUIDCOL  NULL CONSTRAINT [DF_Person_bogusGUID]  DEFAULT (newid()),
-- [bogusGUID2] [uniqueidentifier] NULL CONSTRAINT [DF_Person_bogusGUID2]  DEFAULT (newid()),
-- [bogusChar] [char](100) NOT NULL CONSTRAINT [DF_Person_bogusChar]  DEFAULT ('fred'),
-- [bogusNChar] [nchar](100) NOT NULL CONSTRAINT [DF_Person_bogusNChar]  DEFAULT (N'george'),
-- CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED ( [ID] ASC)
--) ON [PRIMARY]


-- ok, so what about this?
SELECT TOP 1 ID
FROM   Person

-- one record, one value, 4 bytes
-- 0 seconds - the time for a round trip

-- so what is slow?

SELECT ID
FROM   Person
-- 1.3M records, 4 bytes each
-- about 4 seconds

SELECT ID, ID
FROM   Person
-- 1.3M records, 8 bytes each
-- about 5 seconds

SELECT bogusGuid
FROM   Person
-- 1.3M records, 16 bytes each
-- about 5 seconds


SELECT bogusGUID,
       bogusGUID2
FROM   Person
-- 1.3M records, 32 bytes each
-- about 6 seconds

SELECT 75804 * 8/1024

SELECT bogusGUID,
       bogusGUID2,
       '1234567890'
FROM   Person
-- 1.3M records, 42 bytes each
-- about  seconds

SELECT bogusGUID,
       bogusGUID2,
       '1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'
FROM   Person
-- 1.3M records, 132 bytes each
-- about 16 seconds

SELECT bogusGUID,
       bogusGUID2,
       '1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890',
       '1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890'
FROM   Person
-- 1.3M records, 232 bytes each
-- about 27 seconds


-- keep going, and you get a graph that looks like ...

-- but you are cheating - you have all your data in memory...

-- now starve the db server for memory
  /**
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'max server memory', 256;
GO
RECONFIGURE;
GO
**/
SELECT ID
FROM   Person
-- 1.3M records, 4 bytes each
-- about 4 seconds



-- network is slower than IO
-- even with  physical IO

-- **************************************************


 /**
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'max server memory', 2147483647;
GO
RECONFIGURE;
GO
**/

-- part 2: IO vs CPU


SELECT AVG(weight)
FROM   Person
-- 1.3M records, but only 8 bytes back
-- about 0 seconds

-- can we make it worse?
-- can we make the CPU worse?

SELECT weight
FROM   Person
-- 1.3M records, 8 bytes back per record
-- 4 seconds

SELECT MAX(weight)
FROM   Person
-- about 0 seconds

SELECT AVG(LOG(weight/1.7))
FROM     Person
-- 1.3M records, 8 bytes back
-- about 0 seconds

SELECT AVG(SQRT(LOG(weight/1.7)))
FROM     Person
-- about 0 seconds

SELECT AVG(SIN(SQRT(LOG(weight/1.7))))
FROM     Person
-- about 0 seconds

SELECT AVG(SIN(SQRT(10+3.3*LOG(weight/1.7))))
FROM     Person
-- about 1 seconds

SELECT AVG(SIN(SQRT(10+3.3*LOG(weight/1.7)))*PI())
FROM   Person
-- about 1 seconds

-- taking just the average out
SELECT (SIN(SQRT(10+3.3*LOG(weight/1.7)))*PI())
FROM   Person
-- 4 seconds

-- you can do tons of calculations, and not impact speed

-- summary: focus on items in this order
-- 1 - network
-- 2 - IO
-- 3 - CPU

-- caveats:
-- your mileage may vary
-- use your own judgement
-- this is the order in which I do investigations on a new database

-- end of what's slow

Thursday, October 1, 2015

Introduction to JOINs

A short video covering the JOINs within a SQL SELECT statement.

For beginners. How to show columns from more than one table in a single SELECT statement. Covers joining with the WHERE clause and the JOIN statement, and some best practices.


Here is the SQL that goes with the demo.

-- Database by Doug
-- Douglas Kline
-- 10/1/2015
-- Introduction to JOINs

USE Northwind

-- suppose we want to look at our products
-- and what categories they are in

SELECT   *
FROM     Products

-- focus more on just a few columns

SELECT   ProductID,
         ProductName,
         CategoryID
FROM     Products
ORDER BY ProductID

-- so what category is CategoryID 1? 4?

SELECT   *
FROM     Categories

-- focus in on the categoryName

SELECT   CategoryID,
         CategoryName
FROM     Categories

-- great, categoryID=1 is 'Beverages'
-- categoryID=4 is 'Dairy Products'

-- we used the CategoryID field in the Products table
-- to "look up" the name of the Category

-- but we don't want to do that for lots of records

SELECT   ProductID,
         ProductName,
         CategoryID
FROM     Products
ORDER BY ProductID

-- why not just add the CategoryName field to the SELECT clause?

SELECT   ProductID,
         ProductName,
         CategoryID,
         CategoryName
FROM     Products
ORDER BY ProductID

-- CategoryName is not a field in the Products table

-- OK, let's add the Categories table to the FROM statement

SELECT   ProductID,
         ProductName,
         CategoryID,
         CategoryName
FROM     Products,
         Categories
ORDER BY ProductID

-- that got rid of the syntax error on CategoryName
-- but now CategoryID is ambiguous??

-- look at the columns list for each table
-- there's a CategoryID in both the Products table 
-- and the Products table

-- so, we need to be more specific - which one do we mean

-- here's how

SELECT   ProductID,
         ProductName,
         Products.CategoryID,
         CategoryName
FROM     Products,
         Categories
ORDER BY ProductID

-- great, now we don't have any syntax errors

-- but we have 702 rows?
-- where is that coming from?

SELECT COUNT(CategoryID) AS [CategoryCount]
FROM   Categories

-- 9 categories

SELECT COUNT(ProductID) AS [ProductCount]
FROM   Products

-- 78 products

SELECT 9 * 78

-- 702 - just like the number of records

SELECT   ProductID,
         ProductName,
         Products.CategoryID,
         CategoryName
FROM     Products,
         Categories
ORDER BY ProductID

-- let's look closer
-- adding the categoryID from the Categories table

SELECT   ProductID,
         ProductName,
         Products.CategoryID,
         Categories.CategoryID,
         CategoryName
FROM     Products,
         Categories
ORDER BY ProductID

-- that's odd, why is chai being shown next to the wrong category?
-- in fact, it's being shown next to *every* category

-- all products are shown next to every category

-- because we haven't stated how the products and categories should be related
-- the database engine returns all possible combinations of products and categories
-- this is called a CROSS JOIN
-- this is sometime what we want, but not usually

-- so , how do we limit the returned records below 
-- so that only the ones with matching categoryIDs are kept?

-- using the WHERE clause
--

SELECT   ProductID,
         ProductName,
         Products.CategoryID,
         Categories.CategoryID,
         CategoryName
FROM     Products,
         Categories
WHERE    Products.CategoryID = Categories.CategoryID -- ******** 
ORDER BY ProductID

-- this looks a lot better
-- chai has a categoryID = 1, and it is shown next to the category with categoryID = 1

-- but the record count is 77?
-- aren't there 78 products?

-- looking at the products again...

SELECT   ProductID,
         ProductName,
         CategoryID
FROM     Products
ORDER BY ProductID

-- scrolling down, we can see that Dougs Mustard has a NULL categoryID
-- and looking back our combined query

SELECT   ProductID,
         ProductName,
         Products.CategoryID,
         Categories.CategoryID,
         CategoryName
FROM     Products,
         Categories
WHERE    Products.CategoryID = Categories.CategoryID -- ******** 
ORDER BY ProductID

-- scrolling down, note that Dougs Mustard doesn't appear
-- because this Products.CategoryID = Categories.CategoryID
-- is essentially this: NULL = Categories.CategoryID
-- which is false (NULLs in another video)

-- ok, looks like we have the logic right with this:
SELECT   ProductID,
         ProductName,
         Products.CategoryID,
         Categories.CategoryID,
         CategoryName
FROM     Products,
         Categories
WHERE    Products.CategoryID = Categories.CategoryID 
ORDER BY ProductID

-- but humans don't really want to see the CAtegoryID 
-- so we can remove them

SELECT   ProductID,
         ProductName,
         CategoryName
FROM     Products,
         Categories
WHERE    Products.CategoryID = Categories.CategoryID 
ORDER BY ProductID

-- the above is called a JOIN of the Products and Categories table
-- the above is stated as a constraint in the WHERE clause
-- but it is a special kind of a constraint
-- between the Primary Key of one table, and the Foreign Key in another table

-- so one way to look at a JOIN is as a constraint
-- you make all possible combinations of the records from two tables
-- then remove the ones that don't have equal PK--FK combinations

-- another way to view a JOIN is as a specific combination of tables
-- combine the tables in a specific way to create a specific set of records
-- then show fields from the combined set of records

-- this is done in the FROM clause

SELECT   ProductID,
         ProductName,
         CategoryName
FROM     Products JOIN Categories ON Products.CategoryID = Categories.CategoryID 
ORDER BY ProductID

-- this is logically equivalent
-- it will always return the same records as the WHERE clause version

-- I like to format it like this:

SELECT   ProductID,
         ProductName,
         CategoryName
FROM     Products 
   JOIN  Categories     ON Products.CategoryID = Categories.CategoryID 
ORDER BY ProductID

-- I suggest you learn both ways
-- so you can read either way when you encounter it
-- also, sometimes one way is clearer than another in a certain context

-- the JOIN syntax signals your *intention* better than the WHERE clause
-- and generally organizes your statement better 

-- conceptually, it is an binary operator between two tables, 
-- with an ON clause to specify details that the operator will use
-- it is *not* a list of tables (separated by commas),
-- but an expression that states how to combine the tables

-- one last thing
-- safe SQL coding practice
-- right now the statement below works fine
-- it is syntactically correct

SELECT   ProductID,
         ProductName,
         CategoryName
FROM     Products 
   JOIN  Categories     ON Products.CategoryID = Categories.CategoryID 
ORDER BY ProductID

-- but what would happen if in the future
-- someone put a CategoryName field in the Products table?
-- then there would be a CategoryName field in both Products and Categories

-- they *should* be able to do that without breaking code
-- even if it doesn't make sense

-- then categoryName would become ambiguous and the code would not execute

-- so how do we prevent that possible future bug?

-- by *qualifying* every field like this

SELECT   Products.ProductID,
         Products.ProductName,
         Categories.CategoryName
FROM     Products 
   JOIN  Categories     ON Products.CategoryID = Categories.CategoryID 
ORDER BY Products.ProductID

-- safe coding rule:
-- whenever there is more than one table 
-- fully qualify every field in every clause: SELECT, FROM, WHERE, ORDER BY, etc.


-- intro to JOIN...

Followers