-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Terminology : Tables, Columns, Rows
use Northwind
-- no pre-requisites
-- a very good way to organize data is in a table
-- why?
-- mainly because humans find tables fairly easy to understand
-- what they should look like, how they should behave, how to look things up, etc.
-- for example, lets say I show somebody this table:
SELECT *
FROM ( VALUES
('Doug','2'),
('Dan', '3'),
('Debbie', '1'),
('2','Joe')) AS tbl([Name], [Number])
-- Most people will think that is weird
-- why is there a number in the Name column?
-- we know this innately, without knowing any special theory
-- we like it this way, because it keeps things organized
-- organized data is easy to get value out of
-- so let's go over some terminology, looking at some real table data
SELECT ProductID,
ProductName,
Unitprice,
UnitsInstock,
QuantityPerUnit,
Discontinued
FROM Products
-- **highlighting entire table
-- the entire table is also sometimes called:
-- an entity, a table, a file, an object
-- entity - usually used by data modelers/architects
-- table - usually used by database analysts
-- file - usually used by database administrators (especially of old databases, where each table was stored as a file)
-- object - usually used by software developers (especially object-oriented developers, where an object maps to a table)
-- **highlighting a row
-- one row in a table also has several names:
-- row, record, entity instance, object instance
-- row - used by most people, in other words its just a part of a table that's horizontal
-- record - usually used by database analysts, this is special terminology for db people
-- entity instance - usually used by data modelers/architects
-- object instance - usually used by software developers
-- **highlighting a column
-- one column in a table has several names:
-- column, attribute, field, property
-- column - used by most people, in other words its just a part of a table that's vertical
-- attribute- or entity attribute, used by data modelers, some value of interest about an entity
-- field - used by database analysts, special terminology for db people
-- property - or object property, used by OO software developers, object properties usually map to table fields
-- ** highlighting a cell
-- one cell, an intersection of a row and column has several names:
-- cell, value, field, field value, attribute, attribute value, property, property value, etc.
-- careful...
-- sometimes attribute means an entire column, e.g., Person table has an attribute named firstName
-- sometimes attribute means a single cell, e.g., Doug's firstName attribute is Doug
-- thanks for watching
-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Terminology : Tables, Columns, Rows
Monday, July 30, 2018
Tables and their parts
Structuring Data in Tables
-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data
-- here's an odd example
SELECT *
FROM ( VALUES
('Doug','2'),
('Dan', '3'),
('Debbie', '1'),
('2','Joe')) AS tbl([Name], [Number])
-- this probably makes you a little uncomfortable
-- it might bother some more than others
-- in Relational Databases, like SQL Server (or DB2 or Oracle or PostgreSQL or mySQL or...)
-- we can easily enforce organization on table
-- in this example, you shouldn't be able to put a number in the name column
SELECT ProductID,
ProductName,
UnitPrice,
UnitsInStock,
QuantityPerUnit,
Discontinued
FROM Products
-- here are some fundamental organization mechanisms built-in to an RDBMS
-- every table must have a name
-- every column must have a name
-- every table is a rectangle
-- -- every column has the same number of rows
-- -- every row has the same number of columns
-- -- there are never an "ragged" rows or columns
-- every table has at least one column
-- (but tables can have zero rows - an empty table)
-- just those simple things immediately keep the data organized
-- you don't have to do anything else
-- you don't HAVE to organize your data any further
-- the database won't AUTOMATICALLY organize your data any further
-- as an example,
-- you don't HAVE TO name your columns in a meaningful way
-- for example, we could do this:
SELECT ProductID AS [A],
ProductName AS [B],
UnitPrice AS [C],
UnitsInStock AS [D],
QuantityPerUnit AS [E],
Discontinued AS [F]
FROM Products
-- of course it goes without saying, but I need to say it
-- a clear, descriptive, concise name is critical
-- for your own sanity and for the sanity of those that come after...
-- there are very good benefits to keeping your data MORE organized than the bar minimum
SELECT ProductID,
ProductName,
UnitPrice,
UnitsInStock,
QuantityPerUnit,
Discontinued
FROM Products
-- if we want to go further, here are the type of things you CAN enforce
-- a value in a cell should be filled in (NOT NULL), or can be left unknown
-- a value in a cell should be a specified data type (numeric, alphabetic, date, etc.)
-- a value in a cell should be in a certain range (more than zero, not include special characters, etc.)
-- values in a column must be unique (no duplicates across rows)
-- a value in a cell should be from a list somewhere (predefined or dynamic)
-- other things
-- adding these constraints keeps your data MORE organized, and easier/quicker/cheaper to turn into valuable information
-- Here are some more subtle ways to organize your data
-- these are critically important
-- and sometimes more theoretically presented as "normalizing" your data
-- ** Every record should have a unique identifier
-- this is called "entity integrity" in database theory
-- it enables us to uniquely find each record individually
-- for example, "the record with productID=7"
-- at most, there is one record with 7 as the productID
-- if two records have that productID, I can't distinguish them
-- ** the meaning of a column should not change across rows
-- in other words, values in the ProductName column are always ProductNames
-- there are never any "dual-purpose" or "multi-purpose" columns
-- never "sometimes it means this, other times it means something else"
-- ** values in a row always pertain to that row
-- in other words, every value in the ProductID=1 record is about that product
-- another way, we never store anything in a record that is not specifically about that record
-- an example violation: storing the name of a supplier in a product record
-- the name of a supplier is an attribute of a Supplier,
-- and should be stored in a Supplier table in a column named SupplierName
-- ** only one value in a cell
-- don't have a list of values with commas between them
-- don't have "Douglas M Kline" in a single column,
-- rather have "Douglas", "M", and "Kline" in separate columns
-- In Summary
-- well-structured data has
-- good names
-- data types chosen well
-- valid values enforced
-- a unique, non-NULL identifier
-- each column has a single meaning
-- each row represents a single thing
-- each cell has a single value
-- thanks for watching
-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data
Subscribe to:
Comments (Atom)