Monday, July 30, 2018

Tables and their parts





-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Terminology : Tables, Columns, Rows

use Northwind

-- no pre-requisites

-- a very good way to organize data is in a table

-- why? 

-- mainly because humans find tables fairly easy to understand
-- what they should look like, how they should behave, how to look things up, etc.

-- for example, lets say I show somebody this table:

SELECT *
FROM ( VALUES
   ('Doug','2'),
   ('Dan', '3'),
   ('Debbie', '1'),
   ('2','Joe')) AS tbl([Name], [Number])

-- Most people will think that is weird
-- why is there a number in the Name column?

-- we know this innately, without knowing any special theory

-- we like it this way, because it keeps things organized

-- organized data is easy to get value out of

-- so let's go over some terminology, looking at some real table data

SELECT  ProductID,
   ProductName,
   Unitprice,
   UnitsInstock,
   QuantityPerUnit,
   Discontinued
FROM  Products

-- **highlighting entire table
-- the entire table is also sometimes called:
-- an entity, a table, a file, an object

-- entity - usually used by data modelers/architects
-- table - usually used by database analysts
-- file  - usually used by database administrators (especially of old databases, where each table was stored as a file) 
-- object - usually used by software developers (especially object-oriented developers, where an object maps to a table)

-- **highlighting a row
-- one row in a table also has several names:
-- row, record, entity instance, object instance

-- row  - used by most people, in other words its just a part of a table that's horizontal
-- record - usually used by database analysts, this is special terminology for db people
-- entity instance - usually used by data modelers/architects
-- object instance - usually used by software developers

-- **highlighting a column
-- one column in a table has several names:
-- column, attribute, field, property

-- column - used by most people, in other words its just a part of a table that's vertical
-- attribute- or entity attribute, used by data modelers, some value of interest about an entity
-- field - used by database analysts, special terminology for db people
-- property - or object property, used by OO software developers, object properties usually map to table fields

-- ** highlighting a cell
-- one cell, an intersection of a row and column has several names:
-- cell, value, field, field value, attribute, attribute value, property, property value, etc.

-- careful...

-- sometimes attribute means an entire column, e.g., Person table has an attribute named firstName
-- sometimes attribute means a single cell, e.g., Doug's firstName attribute is Doug

-- thanks for watching

-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Terminology : Tables, Columns, Rows

Structuring Data in Tables




-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data

-- here's an odd example
SELECT *
FROM ( VALUES
   ('Doug','2'),
   ('Dan', '3'),
   ('Debbie', '1'),
   ('2','Joe'))  AS tbl([Name], [Number])

-- this probably makes you a little uncomfortable 
-- it might bother some more than others

-- in Relational Databases, like SQL Server (or DB2 or Oracle or PostgreSQL or mySQL or...)
-- we can easily enforce organization on table

-- in this example, you shouldn't be able to put a number in the name column

SELECT  ProductID,
  ProductName,
  UnitPrice,
  UnitsInStock,
  QuantityPerUnit,
  Discontinued
FROM    Products

-- here are some fundamental organization mechanisms built-in to an RDBMS

-- every table must have a name
-- every column must have a name
-- every table is a rectangle 
-- -- every column has the same number of rows
-- -- every row has the same number of columns
-- -- there are never an "ragged" rows or columns
-- every table has at least one column
-- (but tables can have zero rows - an empty table)

-- just those simple things immediately keep the data organized

-- you don't have to do anything else
-- you don't HAVE to organize your data any further
-- the database won't AUTOMATICALLY organize your data any further

-- as an example,
-- you don't HAVE TO name your columns in a meaningful way
-- for example, we could do this:
SELECT  ProductID  AS [A],
  ProductName  AS [B],
  UnitPrice  AS [C],
  UnitsInStock AS [D],
  QuantityPerUnit AS [E],
  Discontinued AS [F]
FROM    Products

-- of course it goes without saying, but I need to say it
-- a clear, descriptive, concise name is critical
-- for your own sanity and for the sanity of those that come after...

-- there are very good benefits to keeping your data MORE organized than the bar minimum
SELECT  ProductID,
  ProductName,
  UnitPrice,
  UnitsInStock,
  QuantityPerUnit,
  Discontinued
FROM    Products
-- if we want to go further, here are the type of things you CAN enforce

-- a value in a cell should be filled in (NOT NULL), or can be left unknown
-- a value in a cell should be a specified data type (numeric, alphabetic, date, etc.)
-- a value in a cell should be in a certain range (more than zero, not include special characters, etc.)
-- values in a column must be unique (no duplicates across rows)
-- a value in a cell should be from a list somewhere (predefined or dynamic)
-- other things

-- adding these constraints keeps your data MORE organized, and easier/quicker/cheaper to turn into valuable information

-- Here are some more subtle ways to organize your data
-- these are critically important
-- and sometimes more theoretically presented as "normalizing" your data

-- ** Every record should have a unique identifier
-- this is called "entity integrity" in database theory
-- it enables us to uniquely find each record individually
-- for example, "the record with productID=7" 
-- at most, there is one record with 7 as the productID
-- if two records have that productID, I can't distinguish them

-- ** the meaning of a column should not change across rows
-- in other words, values in the ProductName column are always ProductNames
-- there are never any "dual-purpose" or "multi-purpose" columns
-- never "sometimes it means this, other times it means something else"

-- ** values in a row always pertain to that row
-- in other words, every value in the ProductID=1 record is about that product
-- another way, we never store anything in a record that is not specifically about that record
-- an example violation: storing the name of a supplier in a product record
-- the name of a supplier is an attribute of a Supplier, 
-- and should be stored in a Supplier table in a column named SupplierName

-- ** only one value in a cell
-- don't have a list of values with commas between them
-- don't have "Douglas M Kline" in a single column, 
-- rather have "Douglas", "M", and "Kline" in separate columns

-- In Summary

-- well-structured data has
-- good names
-- data types chosen well
-- valid values enforced
-- a unique, non-NULL identifier
-- each column has a single meaning
-- each row represents a single thing
-- each cell has a single value

-- thanks for watching

-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data


Followers