Friday, July 8, 2016

Simple indexing and what to look at for database performance

Simple Indexing and what to look at for database performance


Here is the SQL that goes with the demo.

-- Database by Doug
-- Douglas Kline
-- 7/8/2016
-- simple indexing - an introduction

-- the example that's in every book
-- but more about why it works, and what to look at

USE ReferenceDB

SELECT *
FROM   Person


/****** Object:  Table [dbo].[Person]    Script Date: 7/8/2016 10:55:36 AM ******/
--CREATE TABLE [dbo].[Person](
-- [ID] [int] IDENTITY(1,1) NOT NULL,
-- [salutory] [nvarchar](15) NULL,
-- [firstName] [nvarchar](25) NULL,
-- [middleInitial] [nchar](1) NULL,
-- [lastName] [nvarchar](25) NOT NULL,
-- [nameSuffix] [nvarchar](8) NULL,
-- [email] [nvarchar](255) NULL,
-- [entryDate] [datetime] NOT NULL CONSTRAINT [DF_Person_entryDate]  DEFAULT (getdate()),
-- [lastUpdateDate] [datetime] NOT NULL CONSTRAINT [DF_Person_lastUpdateDate]  DEFAULT (getdate()),
-- [weight] [float] NULL,
-- [gender] [char](1) NULL,
-- [dateOfBirth] [date] NULL,
-- [bogusGUID] [uniqueidentifier] ROWGUIDCOL  NULL CONSTRAINT [DF_Person_bogusGUID]  DEFAULT (newid()),
-- [bogusGUID2] [uniqueidentifier] NULL CONSTRAINT [DF_Person_bogusGUID2]  DEFAULT (newid()),
-- [bogusChar] [char](100) NOT NULL CONSTRAINT [DF_Person_bogusChar]  DEFAULT ('fred'),
-- [bogusNChar] [nchar](100) NOT NULL CONSTRAINT [DF_Person_bogusNChar]  DEFAULT (N'george'),
-- CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED 
--(
-- [ID] ASC
--)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 60) ON [PRIMARY]
--) ON [PRIMARY]

-- typical hard disk seek time:           6.43 ms
-- typical RAM seek time (CAS latency):  14.00 ns

-- note the ms and ns difference
-- in the same units
-- typical hard disk seek time:        6430.00 ms
-- typical RAM seek time (CAS latency):  14.00 ns

-- so RAM is roughly 500 times faster than a spinning hard drive
-- but it is also a lot more expensive, and volatile

-- Databases are built to leverage this difference
-- use both RAM and disk, each for what its best at

SET STATISTICS IO ON -- see the IO involved in each query

SELECT *
FROM   Person

-- logical reads: 138533
-- physical reads: 0

-- these are "page reads"
-- each page is 8Kb
-- they are the amount of data moved between RAM and disk

-- this means the entire table is in memory

-- but it might not be

-- in a "real" database, you might have 10 TB of data, but 16 GB of RAM
-- you can't keep all data in RAM

-- so think of logical page reads as "gifts"
-- you got lucky, the data you needed happened to be in RAM
-- and therefore came back 500 times faster than it might have

-- the best thing is to reduce the overall number of 
-- page reads, logical or physical

SELECT  firstname
FROM    Person

-- lots faster, but same number of reads
-- this is due to less data coming back over the network

SELECT *
FROM   Person
WHERE  ID = 2417634

-- almost instant
-- page IOs: 3

-- what about his?
SELECT *
FROM   Person
WHERE  lastname = 'kline'

-- page IOs: 138534

-- what is special about ID?

-- there is an index on ID

-- what if I want to make the lastname lookup faster?

-- we could create an index that would make that faster

USE [ReferenceDB]
GO

/****** a really simple create index statement
CREATE NONCLUSTERED INDEX [IX_LastName] ON [dbo].[Person]
(
 [lastName] ASC
)   
GO
 ******/

 -- try again?
SELECT *
FROM   Person
WHERE  lastname = 'kline'

-- page IOs: 393, vs. 138534 without the index

-- that's the perfect scenario for an index
-- it's what is covered in all the basic DB books

-- this is all upside
-- there are significant downsides to indexes
-- particularly regarding INSERTS, UPDATES, and DELETES
-- in future videos

-- books also tend to focus on the data structure performance
-- in other words, balanced trees and the *computational* efficiency of them

-- the real performance benefits are all in reduced disk IO

-- in the above, I'm not measuring performance by CPU
-- I'm measuring performance by disk IOs

-- Why? 

-- because CPU is 1000(?) times faster than RAM

-- for DB, generally worry about performance in this order:

-- network, which is 1000(?) slower than
-- disk,    which is 1000(?) slower than
-- RAM,     which is 1000(?) slower than
-- CPU


-- Database by Doug
-- Douglas Kline
-- 7/8/2016
-- simple indexing

No comments:

Post a Comment

Followers