Monday, July 30, 2018

Structuring Data in Tables




-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data

-- here's an odd example
SELECT *
FROM ( VALUES
   ('Doug','2'),
   ('Dan', '3'),
   ('Debbie', '1'),
   ('2','Joe'))  AS tbl([Name], [Number])

-- this probably makes you a little uncomfortable 
-- it might bother some more than others

-- in Relational Databases, like SQL Server (or DB2 or Oracle or PostgreSQL or mySQL or...)
-- we can easily enforce organization on table

-- in this example, you shouldn't be able to put a number in the name column

SELECT  ProductID,
  ProductName,
  UnitPrice,
  UnitsInStock,
  QuantityPerUnit,
  Discontinued
FROM    Products

-- here are some fundamental organization mechanisms built-in to an RDBMS

-- every table must have a name
-- every column must have a name
-- every table is a rectangle 
-- -- every column has the same number of rows
-- -- every row has the same number of columns
-- -- there are never an "ragged" rows or columns
-- every table has at least one column
-- (but tables can have zero rows - an empty table)

-- just those simple things immediately keep the data organized

-- you don't have to do anything else
-- you don't HAVE to organize your data any further
-- the database won't AUTOMATICALLY organize your data any further

-- as an example,
-- you don't HAVE TO name your columns in a meaningful way
-- for example, we could do this:
SELECT  ProductID  AS [A],
  ProductName  AS [B],
  UnitPrice  AS [C],
  UnitsInStock AS [D],
  QuantityPerUnit AS [E],
  Discontinued AS [F]
FROM    Products

-- of course it goes without saying, but I need to say it
-- a clear, descriptive, concise name is critical
-- for your own sanity and for the sanity of those that come after...

-- there are very good benefits to keeping your data MORE organized than the bar minimum
SELECT  ProductID,
  ProductName,
  UnitPrice,
  UnitsInStock,
  QuantityPerUnit,
  Discontinued
FROM    Products
-- if we want to go further, here are the type of things you CAN enforce

-- a value in a cell should be filled in (NOT NULL), or can be left unknown
-- a value in a cell should be a specified data type (numeric, alphabetic, date, etc.)
-- a value in a cell should be in a certain range (more than zero, not include special characters, etc.)
-- values in a column must be unique (no duplicates across rows)
-- a value in a cell should be from a list somewhere (predefined or dynamic)
-- other things

-- adding these constraints keeps your data MORE organized, and easier/quicker/cheaper to turn into valuable information

-- Here are some more subtle ways to organize your data
-- these are critically important
-- and sometimes more theoretically presented as "normalizing" your data

-- ** Every record should have a unique identifier
-- this is called "entity integrity" in database theory
-- it enables us to uniquely find each record individually
-- for example, "the record with productID=7" 
-- at most, there is one record with 7 as the productID
-- if two records have that productID, I can't distinguish them

-- ** the meaning of a column should not change across rows
-- in other words, values in the ProductName column are always ProductNames
-- there are never any "dual-purpose" or "multi-purpose" columns
-- never "sometimes it means this, other times it means something else"

-- ** values in a row always pertain to that row
-- in other words, every value in the ProductID=1 record is about that product
-- another way, we never store anything in a record that is not specifically about that record
-- an example violation: storing the name of a supplier in a product record
-- the name of a supplier is an attribute of a Supplier, 
-- and should be stored in a Supplier table in a column named SupplierName

-- ** only one value in a cell
-- don't have a list of values with commas between them
-- don't have "Douglas M Kline" in a single column, 
-- rather have "Douglas", "M", and "Kline" in separate columns

-- In Summary

-- well-structured data has
-- good names
-- data types chosen well
-- valid values enforced
-- a unique, non-NULL identifier
-- each column has a single meaning
-- each row represents a single thing
-- each cell has a single value

-- thanks for watching

-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data


No comments:

Post a Comment

Followers