Wednesday, February 17, 2016

The EXISTS clause in SQL

A short video on how to use the EXISTS clause in SQL.
For beginners. How and when to use the EXISTS clause to find records in one table that match records in another table. How it compares to JOINing and the IN clause.


Here is the SQL that goes with the demo.

-- Database by Doug
-- Douglas Kline
-- 10/27/2015
-- EXISTS

use northwind

-- the EXISTS clause is a logical operator that evaluates a subquery for existence of records
-- if the subquery has one or more records, EXISTS returns true
-- if the subquery has no records, EXISTS returns false

-- since EXISTS returns a logical value,
-- it is common to use EXISTS in the WHERE clause, which expects logical expressions

-- it can make some queries faster
-- and sometimes makes your query clearer

-- here is a simple example

SELECT *
FROM   Products
WHERE  EXISTS ( SELECT 17 )

-- which returns all records in the Products table
-- the logic is that EXISTS is evaluated for each record in the Products table
-- for each record, EXISTS (SELECT 17) is evaluated,
-- and since (SELECT 17) always returns a 1-row, 1-column table, 
-- EXISTS returns true for all records

-- that's not a very practical example

-- here's a very common use of the EXISTS 

SELECT CategoryName
FROM   Categories
WHERE  EXISTS 
         (SELECT *
          FROM   Products
          WHERE  Products.CategoryID = Categories.CategoryID)

-- this is "Categories that have products"
-- 
-- the subquery is a "correlated subquery", 
-- which means it references a table outside the subquery
-- in this case, the correlation is on Categories.CategoryID
-- note that Categories is not in the FROM clause of the subquery

-- you can do this same thing like this:

SELECT   DISTINCT CategoryName
FROM     Categories
   JOIN  Products    ON Products.CategoryID = Categories.CategoryID

-- note that we need to use DISTINCT to remove duplicates

-- so why not always use the JOIN version?

-- well, notice we don't really need to get any 
-- fields from the Products table
-- it's only there to figure out which categories to show

-- consider a case where there are 2 categories
-- and 20 billion products

-- the EXISTS version does this:
---- for category 1
---- look through products until it finds a product in category 1
---- stop
---- for category 2
---- look through products until it finds a product in category 2
---- stop

-- the JOIN w/DISTINCT version does this:
---- join the 2 categories with all 20 billion products
---- remove the 19,999,999,998 duplicates
   
-- so you can see that the EXISTS version can be much faster
-- depending on the situation

-- EXISTS is logically very similar to IN

-- here's another way to do the query above
SELECT CategoryName
FROM   Categories
WHERE  CategoryID IN 
         (SELECT CategoryID
          FROM   Products)

-- reminder of EXISTS
SELECT CategoryName
FROM   Categories
WHERE  EXISTS 
         (SELECT *
          FROM   Products
          WHERE  Products.CategoryID = Categories.CategoryID)

-- the work that the database engine has to do
-- is very similar here
-- the "IN" can stop as soon as it finds the first instance
-- it doesn't need to search through all the products

-- however, NOT EXISTS (and NOT IN) does not usually have the same performance benefit:
SELECT CategoryName
FROM   Categories
WHERE  NOT EXISTS 
         (SELECT *
          FROM   Products
          WHERE  Products.CategoryID = Categories.CategoryID)

-- logically, this will give you the correct records
-- however, proving that something DOES NOT exist means checking all of them
-- in our example above with 2 categories, and 20 billion products
-- assume that category 2 doesn't have any products
-- to prove that category 2 doesn't exist in Products,
-- all 20 billion products need to be checked

-- in summary, EXISTS can sometimes improve the performance 
-- of a query like this, over a JOIN
SELECT CategoryName
FROM   Categories
WHERE  EXISTS 
         (SELECT *
          FROM   Products
          WHERE  Products.CategoryID = Categories.CategoryID)

-- also, EXISTS can sometimes to be used to more clearly
-- phrase a query:
SELECT CategoryName
FROM   Categories
WHERE  EXISTS 
         (SELECT 
          WHERE  Products.CategoryID = Categories.CategoryID)

-- thanks for watching

-- Database by Doug
-- Douglas Kline
-- 10/27/2015
-- EXISTS

No comments:

Post a Comment

Followers