Friday, July 8, 2016

Simple indexing and what to look at for database performance

Simple Indexing and what to look at for database performance


Here is the SQL that goes with the demo.

-- Database by Doug
-- Douglas Kline
-- 7/8/2016
-- simple indexing - an introduction

-- the example that's in every book
-- but more about why it works, and what to look at

USE ReferenceDB

SELECT *
FROM   Person


/****** Object:  Table [dbo].[Person]    Script Date: 7/8/2016 10:55:36 AM ******/
--CREATE TABLE [dbo].[Person](
-- [ID] [int] IDENTITY(1,1) NOT NULL,
-- [salutory] [nvarchar](15) NULL,
-- [firstName] [nvarchar](25) NULL,
-- [middleInitial] [nchar](1) NULL,
-- [lastName] [nvarchar](25) NOT NULL,
-- [nameSuffix] [nvarchar](8) NULL,
-- [email] [nvarchar](255) NULL,
-- [entryDate] [datetime] NOT NULL CONSTRAINT [DF_Person_entryDate]  DEFAULT (getdate()),
-- [lastUpdateDate] [datetime] NOT NULL CONSTRAINT [DF_Person_lastUpdateDate]  DEFAULT (getdate()),
-- [weight] [float] NULL,
-- [gender] [char](1) NULL,
-- [dateOfBirth] [date] NULL,
-- [bogusGUID] [uniqueidentifier] ROWGUIDCOL  NULL CONSTRAINT [DF_Person_bogusGUID]  DEFAULT (newid()),
-- [bogusGUID2] [uniqueidentifier] NULL CONSTRAINT [DF_Person_bogusGUID2]  DEFAULT (newid()),
-- [bogusChar] [char](100) NOT NULL CONSTRAINT [DF_Person_bogusChar]  DEFAULT ('fred'),
-- [bogusNChar] [nchar](100) NOT NULL CONSTRAINT [DF_Person_bogusNChar]  DEFAULT (N'george'),
-- CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED 
--(
-- [ID] ASC
--)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 60) ON [PRIMARY]
--) ON [PRIMARY]

-- typical hard disk seek time:           6.43 ms
-- typical RAM seek time (CAS latency):  14.00 ns

-- note the ms and ns difference
-- in the same units
-- typical hard disk seek time:        6430.00 ms
-- typical RAM seek time (CAS latency):  14.00 ns

-- so RAM is roughly 500 times faster than a spinning hard drive
-- but it is also a lot more expensive, and volatile

-- Databases are built to leverage this difference
-- use both RAM and disk, each for what its best at

SET STATISTICS IO ON -- see the IO involved in each query

SELECT *
FROM   Person

-- logical reads: 138533
-- physical reads: 0

-- these are "page reads"
-- each page is 8Kb
-- they are the amount of data moved between RAM and disk

-- this means the entire table is in memory

-- but it might not be

-- in a "real" database, you might have 10 TB of data, but 16 GB of RAM
-- you can't keep all data in RAM

-- so think of logical page reads as "gifts"
-- you got lucky, the data you needed happened to be in RAM
-- and therefore came back 500 times faster than it might have

-- the best thing is to reduce the overall number of 
-- page reads, logical or physical

SELECT  firstname
FROM    Person

-- lots faster, but same number of reads
-- this is due to less data coming back over the network

SELECT *
FROM   Person
WHERE  ID = 2417634

-- almost instant
-- page IOs: 3

-- what about his?
SELECT *
FROM   Person
WHERE  lastname = 'kline'

-- page IOs: 138534

-- what is special about ID?

-- there is an index on ID

-- what if I want to make the lastname lookup faster?

-- we could create an index that would make that faster

USE [ReferenceDB]
GO

/****** a really simple create index statement
CREATE NONCLUSTERED INDEX [IX_LastName] ON [dbo].[Person]
(
 [lastName] ASC
)   
GO
 ******/

 -- try again?
SELECT *
FROM   Person
WHERE  lastname = 'kline'

-- page IOs: 393, vs. 138534 without the index

-- that's the perfect scenario for an index
-- it's what is covered in all the basic DB books

-- this is all upside
-- there are significant downsides to indexes
-- particularly regarding INSERTS, UPDATES, and DELETES
-- in future videos

-- books also tend to focus on the data structure performance
-- in other words, balanced trees and the *computational* efficiency of them

-- the real performance benefits are all in reduced disk IO

-- in the above, I'm not measuring performance by CPU
-- I'm measuring performance by disk IOs

-- Why? 

-- because CPU is 1000(?) times faster than RAM

-- for DB, generally worry about performance in this order:

-- network, which is 1000(?) slower than
-- disk,    which is 1000(?) slower than
-- RAM,     which is 1000(?) slower than
-- CPU


-- Database by Doug
-- Douglas Kline
-- 7/8/2016
-- simple indexing

Friday, July 1, 2016

SQL Server's bit data type

SQL Server's bit data type


Here is the SQL that goes with the demo.

-- Database by Doug
-- Douglas Kline
-- 7/1/2016
-- bit data type

-- odd implicit conversion
-- bit data type 
-- character data type

-- saw something odd today, and thought I would cover it

USE Northwind

-- consider this from the Products database

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
ORDER BY ProductID

-- discontinued is a bit data type
-- of course productID is an integer data type
-- which is generally considered a very small integer in its treatment
-- https://msdn.microsoft.com/en-us/library/ms187752.aspx 
-- bits are stored differently internally, but ostensibly behave like integers

-- this fails
UPDATE Products
SET    discontinued = NULL
WHERE  ProductID = 1
-- this is clearly not integer-like

-- so consider these WHERE clauses using the productID field
SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  productID = '1' -- implicit conversion to integer  
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  productID = 'a' -- implicit conversion fails  
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  productID = '12' -- implicit conversion succeeds  
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  productID = '1222222222' -- implicit conversion succeeds 
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  productID = '1.2' -- implicit conversion fails 
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  productID = '-1' -- implicit conversion works, no records 
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  productID = '' -- implicit conversion succeeds, no records 
ORDER BY ProductID

-- but what does it convert to?

SELECT CAST('' AS integer) -- it converts to integer value of zero

-- so what if we are using a bit field
-- and trying to use it like a boolean value
-- but its really treated as a very small integer (2 values)

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = '1' -- implicit conversion to bit  
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = '0' -- implicit conversion  
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = 'a' -- implicit conversion fails 
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = '12' -- implicit conversion with truncation? 
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = '122222222' -- implicit conversion with truncation
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = '022222222' -- implicit conversion with truncation? not really?
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = '0.2' -- implicit conversion fails
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = '-1' -- implicit conversion fails - not like an integer
ORDER BY ProductID

SELECT ProductID,
       ProductName,
       discontinued
FROM   Products
WHERE  discontinued = '' -- implicit conversion works
ORDER BY ProductID

-- so what does '' convert to?

SELECT CAST('' as bit) -- a zero

-- so, integer and bit seems inconsistent with how they treat negatives
-- what about tinyint?

SELECT CAST('-1' AS tinyint) -- gives overflow
SELECT CAST('-1' AS bit) -- gives conversion failure
SELECT CAST(-1 AS tinyint) -- gives overflow
SELECT CAST(-1 AS bit) -- succeeds, gives one

-- ok, that's strange

-- so, it appears that bit is neither a boolean nor an "exact numeric" 
-- in the way it is treated by SQL Server

-- just need to know how it behaves

-- in my experience, it is always *used* like it's a boolean/logical
-- personally, I would prefer it to behave consistently as a boolean/logical

-- Database by Doug
-- Douglas Kline
-- 7/1/2016
-- bit data type

Followers