Database by Doug: February 2015

Tuesday, February 10, 2015

Fix for Zoomit.exe Error launching 64-bit version...

Had a small problem trying to get Zoomit running on a machine. It stopped working after some changes to our Group Policy in a response to a nasty virus that hit our network.

I use Zoomit frequently, and the Windows Magnifier is really not a satisfactory substitute.

Here's the error I was getting when I tried to install it:

Our tech support person found this post by Andrew Potts. It worked, but it lacked a few details that I had to work out. So I thought I would expand on it.

So it seems that running ZoomIt.exe attempts to write something into the Temp folder, which is blocked by the group policy. Apparently, this is only a problem on 64-bit machines.

The Temp folder is defined as an environment variable, which you can change in the Control Panel under System:

Or, if you are comfortable at the command line, you can change this with the SET command:

SET TEMP=C:\temp1

So here's the exact steps that I used to fix this on my machine:

get to a command prompt by running cmd under Start|Run
run the command "SET" to see the current environment variables including TEMP (no command arguments, just the command SET)
record the current TEMP variable value - you'll probably want to set this back to the original when you are done
create a folder C:\temp1
copy the ZoomIt.exe to the C:\temp1 folder
run the command "SET TEMP=C:\temp1" - sets the TEMP variable to the new folder
run the command "cd C:\temp1" - changes the working directory to the new folder
run the command "ZoomIt.exe" - runs the installer
you should now see a file "C:\temp1\ZoomIt64.exe"
run the "SET TEMP=..." command to set the TEMP variable back to its original value

You can copy this file wherever you want, or simply leave it where it is and create a shortcut to it.

Wednesday, February 4, 2015

One Way to Think About SQL Joins: It's just a WHERE Clause

When I teach relational database, I have several ways that I present the concept of joins.

One way to think about a join is that it is simply a WHERE clause.

Take the following data model from the MS Northwinds reference database.

I've modified the data so that we can focus on a limited number of records, and see what's going on more clearly.

SELECT SupplierID,

CompanyName

FROM Suppliers

gives:

SupplierID CompanyName
----------- ----------------------------------------
3 Grandma Kelly's Homestead

9 PB Knäckebröd AB

and

SELECT ProductID,

SupplierID,

ProductName

FROM Products

gives:

ProductID SupplierID ProductName

----------- ----------- --------------------------------

7 3 Grandma's Boysenberry Spread

    8           3           Uncle Bob's Organic Dried Pears

    9           3           Northwoods Cranberry Sauce

    23          9           Gustaf's Knäckebröd

    24          9           Tunnbröd

Given those tables and data, consider this statement:

SELECT   Suppliers.SupplierID,

         Suppliers.CompanyName,

         Products.ProductID,

         Products.SupplierID,

         Products.ProductName

FROM     Suppliers,

         Products

ORDER BY Suppliers.SupplierID,

         Products.ProductID

Notice that there is no explicit JOIN statement. (So this is a CROSS JOIN) This SELECT statement returns:

SupplierID  CompanyName                 ProductID   SupplierID  ProductName

----------- --------------------------- ----------- ----------- -------------------------------

          Grandma Kelly's Homestead   7           3           Grandma's Boysenberry Spread

          Grandma Kelly's Homestead   8           3           Uncle Bob's Organic Dried Pears

          Grandma Kelly's Homestead   9           3           Northwoods Cranberry Sauce

          Grandma Kelly's Homestead   23          9           Gustaf's Knäckebröd

          Grandma Kelly's Homestead   24          9           Tunnbröd

          PB Knäckebröd AB            7           3           Grandma's Boysenberry Spread

          PB Knäckebröd AB            8           3           Uncle Bob's Organic Dried Pears

          PB Knäckebröd AB            9           3           Northwoods Cranberry Sauce

          PB Knäckebröd AB            23          9           Gustaf's Knäckebröd

          PB Knäckebröd AB            24          9           Tunnbröd

A CROSS JOIN gives all possible combinations of the records in Suppliers and the records in Products. Think of the database engine as saying "You haven't specified how to match up the records, so I'm assuming any record can match with any other record."

But clearly, there in the resulting combinations, some suppliers are shown next to products that they do not supply, i.e., the supplierID from the supplier record does not match the supplierID from the product record (highlighted in red below):

SupplierID  CompanyName                 ProductID   SupplierID  ProductName

----------- --------------------------- ----------- ----------- -------------------------------

3           Grandma Kelly's Homestead   7           3           Grandma's Boysenberry Spread

3           Grandma Kelly's Homestead   8           3           Uncle Bob's Organic Dried Pears

3           Grandma Kelly's Homestead   9           3           Northwoods Cranberry Sauce

3           Grandma Kelly's Homestead   23          9           Gustaf's Knäckebröd

3           Grandma Kelly's Homestead   24          9           Tunnbröd

9           PB Knäckebröd AB            7           3           Grandma's Boysenberry Spread

9           PB Knäckebröd AB            8           3           Uncle Bob's Organic Dried Pears

9           PB Knäckebröd AB            9           3           Northwoods Cranberry Sauce

9           PB Knäckebröd AB            23          9           Gustaf's Knäckebröd

9           PB Knäckebröd AB            24          9           Tunnbröd

But some suppliers are shown next to products that they do supplier, i.e., the supplied ID from the supplier record does match the supplierID from the product record (highlighted in green below):

SupplierID  CompanyName                 ProductID   SupplierID  ProductName

----------- --------------------------- ----------- ----------- -------------------------------

3           Grandma Kelly's Homestead   7           3           Grandma's Boysenberry Spread

3           Grandma Kelly's Homestead   8           3           Uncle Bob's Organic Dried Pears

3           Grandma Kelly's Homestead   9           3           Northwoods Cranberry Sauce

3           Grandma Kelly's Homestead   23          9           Gustaf's Knäckebröd

3           Grandma Kelly's Homestead   24          9           Tunnbröd

9           PB Knäckebröd AB            7           3           Grandma's Boysenberry Spread

9           PB Knäckebröd AB            8           3           Uncle Bob's Organic Dried Pears

9           PB Knäckebröd AB            9           3           Northwoods Cranberry Sauce

9           PB Knäckebröd AB            23          9           Gustaf's Knäckebröd

9           PB Knäckebröd AB            24          9           Tunnbröd

So, if we were to add a condition in the WHERE clause that enforced that only the records in green above should be shown:

SELECT   Suppliers.SupplierID,

         Suppliers.CompanyName,

         Products.ProductID,

         Products.SupplierID,

         Products.ProductName

FROM     Suppliers,

         Products

WHERE    Suppliers.SupplierID = Products.SupplierID

ORDER BY Suppliers.SupplierID,

         Products.ProductID 

Then we would get the suppliers matched up with the correct products:

SupplierID  CompanyName                 ProductID   SupplierID  ProductName

----------- --------------------------- ----------- ----------- -------------------------------

3           Grandma Kelly's Homestead   7           3           Grandma's Boysenberry Spread

3           Grandma Kelly's Homestead   8           3           Uncle Bob's Organic Dried Pears

3           Grandma Kelly's Homestead   9           3           Northwoods Cranberry Sauce

9           PB Knäckebröd AB            23          9           Gustaf's Knäckebröd

9           PB Knäckebröd AB            24          9           Tunnbröd

A Single-Record Merge Statement

The Single-Record Merge Statement

I was looking for a very simple, pared-down example of a MERGE statement to use in teaching/training. Couldn't find one, so I made this one up.

In the process, I found that getting the @@IDENTITY value out of an inserted record in a MERGE statement is not supported. You have to set up a table variable to capture the IDENTITY-created value, then move it to a scalar variable.

This makes MERGE fairly clunky, in terms of code clarity, for this use case. However, there are certainly cases where the reduced seeks might make this worthwhile.

For this example, we'll use the Northwinds reference database Products table:

So here's the desired task, written as a stored procedure without using the MERGE statement:

CREATE PROC changeProductPrice
@productID int = NULL,
@newPrice money,
@newProductName nvarchar(40),
@newProductID int = NULL OUTPUT
AS
IF EXISTS(SELECT * FROM Products WHERE productID = @productID)
BEGIN
UPDATE Products
SET unitprice = @newPrice
WHERE productID = @productID
END
ELSE
BEGIN
INSERT INTO Products
(unitprice, ProductName)
VALUES (@newPrice, @newProductName)

SET @newProductID = @@IDENTITY
END
GO

The situation is:

there is a Product that may or may not exist as a record in the Products table
if the Product already exists, we'd like to update the unitPrice
if the Product record doesn't exist, we'd like to add the record
we also want to return the productID of the product if it is a new record

This is a fairly classic issue - UPDATE if it's already there, INSERT if it isn't.

Using the Stored Procedure requires 2 seeks on the table:

see if the record exists
locate the record to either UPDATE or find the location to INSERT (this might be an append if ProductID is an IDENTITY field)

See the resulting query plan:

So here's how to do it as a MERGE statement:

CREATE PROC changeProductPriceMerge

@productID int = NULL,

@newPrice money,

@newProductName nvarchar(40),

@newProductID int =NULL OUTPUT

DECLARE @newProductIDTable table(productID int);

MERGE Products

USING (SELECT @productID,

@newPrice,

@newProductName)

AS source (

productID,

newPrice,

newProductName)

ON (Products.productID = source.productID)

WHEN MATCHED THEN

UPDATE

SET unitprice = source.newPrice

WHEN NOT MATCHED THEN

INSERT ( unitPrice,

productName)

VALUES ( source.newPrice,

source.newProductName)

OUTPUT inserted.productID INTO @newProductIDTable;

SELECT TOP 1 @newProductID = productID FROM @newProductIDTable;

Frankly, I was hoping that the MERGE solution would be more elegant than using an IF statement to choose between INSERT and UPDATE. Don't think it ended up that way.

But this is still a very pared-down example of MERGE.

Let's go through the main parts of the MERGE statement:

MERGE - this is where we state which table will possibly be modified. In this case it is the Products table. This is sometimes renamed as [target] to be clear which table will possibly change.
USING - this is where we define the data that will be used to:

determine what will happen to records in the [target] (INSERT, UPDATE, DELETE, nothing)
used as values to make something happen in the [target]

ON - this must be a logical expression (evaluates to true or false), and is usually in the form of a JOIN-like expression that relates [target] and [source] records
WHEN MATCHED - defines the operation to occur when the expression in the ON clause evaluates to true
WHEN NOT MATCHED - defines the operation to occur when the expression in the ON clause evaluates to false
OUTPUT - used to get the data that was affected by the MERGE statement. In this case, we are using it to get the database-generated IDENTITY value

The clunky portion to all this is that the OUTPUT clause can only accept a Table (or Table variable) in the INTO portion. As a result, it's necessary to create a Table variable, @newProductTable, to hold the new IDENTITY scalar value.

However, this is more of a declarative way to state this operation, and results in only a single seek on the Products table.

Here's the query plan for the MERGE version of the stored procedure:

There are still two separate operations, due to the clunky handling of the new IDENTITY value in a table variable.

If the Products table were very large, and ProductID were not set as an IDENTITY field, the MERGE version might be faster. However, the MERGE version requires the instantiation of a table variable.

If we wrote this without providing the productID back as an OUTPUT, the MERGE version would certainly use only one operation.

All of this is not to say that MERGE is not helpful, but I think in this single-record situation, I'd choose the non-MERGE version for clarity.

Logical and Physical IOs: A chili-cooking analogy

I was teaching a class today, covering the topic of indexes with some hands-on activities.

A group of students get some card-stock pages with data on them, each one representing an 8K data or index page. With these, we go through a number of database read operations and look at the IOs that are generated.

This is a great exercise, because it focuses attention on IO, rather than computation.

However, I found that the students had some trouble differentiating between a physical IO and a logical IO. Or realizing that an IO was either logical or physical, and wasn't be both.

So I came up with a chili-cooking analogy.

Suppose I am cooking chili and need a can of kidney beans.

If I go to my cupboard, and there is no can of beans, then I'll need to get in the car and go to the store.

Dang.

That would be like a physical IO. The can of beans is not in my cupboard, and it's a pain (in time and effort) to go to the store for the can of beans.

If I go to the cupboard, and the can of beans is already there, then I don't need to go to the store.

Cool.

That would be like a logical IO. The can of beans is right there handy, and I don't need to spend the time and effort to go to the store.

This is not a perfect analogy, but it has some good parallels.

I need a can of beans. This is like the SQL Engine needing a data page. It would be considered an IO request. But we don't yet know whether it will be a logical or physical IO.
The can is in my cupboard or it is not.

It's in my cupboard. This is like a logical IO. No reason to go to disk (store).
It's not in my cupboard. This is like a physical IO. Have to go to the disk (store).

So I'm either going to the store (physical IO) or I'm not (logical IO).

So why count logical IOs?

Well, logical IOs are kind of bonus; you can't count on them. (Sometimes that can of beans is in the cupboard, and sometimes its not.) If you have an operation that takes 1000 IOs, it might be all physical IOs the first time, and all logical the next time. So, you should be looking at total IOs as a measure of potential IOs for the operation.

Database by Doug