A short video covering the JOINs within a SQL SELECT statement.
For beginners. How to show columns from more than one table in a single SELECT statement. Covers joining with the WHERE clause and the JOIN statement, and some best practices.Here is the SQL that goes with the demo.
-- Database by Doug
-- Douglas Kline
-- 10/1/2015
-- Introduction to JOINs
USE Northwind
-- suppose we want to look at our products
-- and what categories they are in
SELECT *
FROM Products
-- focus more on just a few columns
SELECT ProductID,
ProductName,
CategoryID
FROM Products
ORDER BY ProductID
-- so what category is CategoryID 1? 4?
SELECT *
FROM Categories
-- focus in on the categoryName
SELECT CategoryID,
CategoryName
FROM Categories
-- great, categoryID=1 is 'Beverages'
-- categoryID=4 is 'Dairy Products'
-- we used the CategoryID field in the Products table
-- to "look up" the name of the Category
-- but we don't want to do that for lots of records
SELECT ProductID,
ProductName,
CategoryID
FROM Products
ORDER BY ProductID
-- why not just add the CategoryName field to the SELECT clause?
SELECT ProductID,
ProductName,
CategoryID,
CategoryName
FROM Products
ORDER BY ProductID
-- CategoryName is not a field in the Products table
-- OK, let's add the Categories table to the FROM statement
SELECT ProductID,
ProductName,
CategoryID,
CategoryName
FROM Products,
Categories
ORDER BY ProductID
-- that got rid of the syntax error on CategoryName
-- but now CategoryID is ambiguous??
-- look at the columns list for each table
-- there's a CategoryID in both the Products table
-- and the Products table
-- so, we need to be more specific - which one do we mean
-- here's how
SELECT ProductID,
ProductName,
Products.CategoryID,
CategoryName
FROM Products,
Categories
ORDER BY ProductID
-- great, now we don't have any syntax errors
-- but we have 702 rows?
-- where is that coming from?
SELECT COUNT(CategoryID) AS [CategoryCount]
FROM Categories
-- 9 categories
SELECT COUNT(ProductID) AS [ProductCount]
FROM Products
-- 78 products
SELECT 9 * 78
-- 702 - just like the number of records
SELECT ProductID,
ProductName,
Products.CategoryID,
CategoryName
FROM Products,
Categories
ORDER BY ProductID
-- let's look closer
-- adding the categoryID from the Categories table
SELECT ProductID,
ProductName,
Products.CategoryID,
Categories.CategoryID,
CategoryName
FROM Products,
Categories
ORDER BY ProductID
-- that's odd, why is chai being shown next to the wrong category?
-- in fact, it's being shown next to *every* category
-- all products are shown next to every category
-- because we haven't stated how the products and categories should be related
-- the database engine returns all possible combinations of products and categories
-- this is called a CROSS JOIN
-- this is sometime what we want, but not usually
-- so , how do we limit the returned records below
-- so that only the ones with matching categoryIDs are kept?
-- using the WHERE clause
--
SELECT ProductID,
ProductName,
Products.CategoryID,
Categories.CategoryID,
CategoryName
FROM Products,
Categories
WHERE Products.CategoryID = Categories.CategoryID -- ********
ORDER BY ProductID
-- this looks a lot better
-- chai has a categoryID = 1, and it is shown next to the category with categoryID = 1
-- but the record count is 77?
-- aren't there 78 products?
-- looking at the products again...
SELECT ProductID,
ProductName,
CategoryID
FROM Products
ORDER BY ProductID
-- scrolling down, we can see that Dougs Mustard has a NULL categoryID
-- and looking back our combined query
SELECT ProductID,
ProductName,
Products.CategoryID,
Categories.CategoryID,
CategoryName
FROM Products,
Categories
WHERE Products.CategoryID = Categories.CategoryID -- ********
ORDER BY ProductID
-- scrolling down, note that Dougs Mustard doesn't appear
-- because this Products.CategoryID = Categories.CategoryID
-- is essentially this: NULL = Categories.CategoryID
-- which is false (NULLs in another video)
-- ok, looks like we have the logic right with this:
SELECT ProductID,
ProductName,
Products.CategoryID,
Categories.CategoryID,
CategoryName
FROM Products,
Categories
WHERE Products.CategoryID = Categories.CategoryID
ORDER BY ProductID
-- but humans don't really want to see the CAtegoryID
-- so we can remove them
SELECT ProductID,
ProductName,
CategoryName
FROM Products,
Categories
WHERE Products.CategoryID = Categories.CategoryID
ORDER BY ProductID
-- the above is called a JOIN of the Products and Categories table
-- the above is stated as a constraint in the WHERE clause
-- but it is a special kind of a constraint
-- between the Primary Key of one table, and the Foreign Key in another table
-- so one way to look at a JOIN is as a constraint
-- you make all possible combinations of the records from two tables
-- then remove the ones that don't have equal PK--FK combinations
-- another way to view a JOIN is as a specific combination of tables
-- combine the tables in a specific way to create a specific set of records
-- then show fields from the combined set of records
-- this is done in the FROM clause
SELECT ProductID,
ProductName,
CategoryName
FROM Products JOIN Categories ON Products.CategoryID = Categories.CategoryID
ORDER BY ProductID
-- this is logically equivalent
-- it will always return the same records as the WHERE clause version
-- I like to format it like this:
SELECT ProductID,
ProductName,
CategoryName
FROM Products
JOIN Categories ON Products.CategoryID = Categories.CategoryID
ORDER BY ProductID
-- I suggest you learn both ways
-- so you can read either way when you encounter it
-- also, sometimes one way is clearer than another in a certain context
-- the JOIN syntax signals your *intention* better than the WHERE clause
-- and generally organizes your statement better
-- conceptually, it is an binary operator between two tables,
-- with an ON clause to specify details that the operator will use
-- it is *not* a list of tables (separated by commas),
-- but an expression that states how to combine the tables
-- one last thing
-- safe SQL coding practice
-- right now the statement below works fine
-- it is syntactically correct
SELECT ProductID,
ProductName,
CategoryName
FROM Products
JOIN Categories ON Products.CategoryID = Categories.CategoryID
ORDER BY ProductID
-- but what would happen if in the future
-- someone put a CategoryName field in the Products table?
-- then there would be a CategoryName field in both Products and Categories
-- they *should* be able to do that without breaking code
-- even if it doesn't make sense
-- then categoryName would become ambiguous and the code would not execute
-- so how do we prevent that possible future bug?
-- by *qualifying* every field like this
SELECT Products.ProductID,
Products.ProductName,
Categories.CategoryName
FROM Products
JOIN Categories ON Products.CategoryID = Categories.CategoryID
ORDER BY Products.ProductID
-- safe coding rule:
-- whenever there is more than one table
-- fully qualify every field in every clause: SELECT, FROM, WHERE, ORDER BY, etc.
-- intro to JOIN...
No comments:
Post a Comment