For beginners. How and when to use the EXISTS clause to find records in one table that match records in another table. How it compares to JOINing and the IN clause.
Here is the SQL that goes with the demo.
-- Database by Doug -- Douglas Kline -- 10/27/2015 -- EXISTS use northwind -- the EXISTS clause is a logical operator that evaluates a subquery for existence of records -- if the subquery has one or more records, EXISTS returns true -- if the subquery has no records, EXISTS returns false -- since EXISTS returns a logical value, -- it is common to use EXISTS in the WHERE clause, which expects logical expressions -- it can make some queries faster -- and sometimes makes your query clearer -- here is a simple example SELECT * FROM Products WHERE EXISTS ( SELECT 17 ) -- which returns all records in the Products table -- the logic is that EXISTS is evaluated for each record in the Products table -- for each record, EXISTS (SELECT 17) is evaluated, -- and since (SELECT 17) always returns a 1-row, 1-column table, -- EXISTS returns true for all records -- that's not a very practical example -- here's a very common use of the EXISTS SELECT CategoryName FROM Categories WHERE EXISTS (SELECT * FROM Products WHERE Products.CategoryID = Categories.CategoryID) -- this is "Categories that have products" -- -- the subquery is a "correlated subquery", -- which means it references a table outside the subquery -- in this case, the correlation is on Categories.CategoryID -- note that Categories is not in the FROM clause of the subquery -- you can do this same thing like this: SELECT DISTINCT CategoryName FROM Categories JOIN Products ON Products.CategoryID = Categories.CategoryID -- note that we need to use DISTINCT to remove duplicates -- so why not always use the JOIN version? -- well, notice we don't really need to get any -- fields from the Products table -- it's only there to figure out which categories to show -- consider a case where there are 2 categories -- and 20 billion products -- the EXISTS version does this: ---- for category 1 ---- look through products until it finds a product in category 1 ---- stop ---- for category 2 ---- look through products until it finds a product in category 2 ---- stop -- the JOIN w/DISTINCT version does this: ---- join the 2 categories with all 20 billion products ---- remove the 19,999,999,998 duplicates -- so you can see that the EXISTS version can be much faster -- depending on the situation -- EXISTS is logically very similar to IN -- here's another way to do the query above SELECT CategoryName FROM Categories WHERE CategoryID IN (SELECT CategoryID FROM Products) -- reminder of EXISTS SELECT CategoryName FROM Categories WHERE EXISTS (SELECT * FROM Products WHERE Products.CategoryID = Categories.CategoryID) -- the work that the database engine has to do -- is very similar here -- the "IN" can stop as soon as it finds the first instance -- it doesn't need to search through all the products -- however, NOT EXISTS (and NOT IN) does not usually have the same performance benefit: SELECT CategoryName FROM Categories WHERE NOT EXISTS (SELECT * FROM Products WHERE Products.CategoryID = Categories.CategoryID) -- logically, this will give you the correct records -- however, proving that something DOES NOT exist means checking all of them -- in our example above with 2 categories, and 20 billion products -- assume that category 2 doesn't have any products -- to prove that category 2 doesn't exist in Products, -- all 20 billion products need to be checked -- in summary, EXISTS can sometimes improve the performance -- of a query like this, over a JOIN SELECT CategoryName FROM Categories WHERE EXISTS (SELECT * FROM Products WHERE Products.CategoryID = Categories.CategoryID) -- also, EXISTS can sometimes to be used to more clearly -- phrase a query: SELECT CategoryName FROM Categories WHERE EXISTS (SELECTWHERE Products.CategoryID = Categories.CategoryID) -- thanks for watching -- Database by Doug -- Douglas Kline -- 10/27/2015 -- EXISTS
No comments:
Post a Comment