-- Database by Doug
-- Douglas Kline
-- 7/30/2018
-- Good Tables : Structuring your Data

-- here's an odd example

SELECT *
FROM (
    VALUES ('Doug',   '2'),
           ('Dan',    '3'),
           ('Debbie', '1'),
           ('2',      'Joe')) AS tbl([Name], [Number])

-- this probably makes you a little uncomfortable
-- it might bother some more than others

-- in Relational Databases, like SQL Server (or DB2 or Oracle or PostgreSQL or MySQL or...)
-- we can easily enforce organization on tables
-- in this example, you shouldn't be able to put a number in the name column

SELECT ProductID,
       ProductName,
       UnitPrice,
       UnitsInStock,
       QuantityPerUnit,
       Discontinued
FROM   Products

-- here are some fundamental organization mechanisms built into an RDBMS
-- every table must have a name
-- every column must have a name
-- every table is a rectangle
-- -- every column has the same number of rows
-- -- every row has the same number of columns
-- -- there are never any "ragged" rows or columns
-- every table has at least one column
-- (but tables can have zero rows - an empty table)

-- just those simple things immediately keep the data organized
-- you don't have to do anything else

-- you don't HAVE to organize your data any further
-- the database won't AUTOMATICALLY organize your data any further

-- as an example,
-- you don't HAVE TO name your columns in a meaningful way
-- for example, we could do this:

SELECT ProductID       AS [A],
       ProductName     AS [B],
       UnitPrice       AS [C],
       UnitsInStock    AS [D],
       QuantityPerUnit AS [E],
       Discontinued    AS [F]
FROM   Products

-- of course it goes without saying, but I need to say it
-- a clear, descriptive, concise name is critical
-- for your own sanity and for the sanity of those that come after...

-- there are very good benefits to keeping your data MORE organized than the bare minimum

SELECT ProductID,
       ProductName,
       UnitPrice,
       UnitsInStock,
       QuantityPerUnit,
       Discontinued
FROM   Products

-- if we want to go further, here are the types of things you CAN enforce
-- a value in a cell should be filled in (NOT NULL), or can be left unknown
-- a value in a cell should be a specified data type (numeric, alphabetic, date, etc.)
-- a value in a cell should be in a certain range (more than zero, not include special characters, etc.)
-- values in a column must be unique (no duplicates across rows)
-- a value in a cell should be from a list somewhere (predefined or dynamic)
-- other things

-- adding these constraints keeps your data MORE organized, and easier/quicker/cheaper to turn into valuable information

-- Here are some more subtle ways to organize your data
-- these are critically important
-- and sometimes more theoretically presented as "normalizing" your data

-- ** Every record should have a unique identifier
-- this is called "entity integrity" in database theory
-- it enables us to uniquely find each record individually
-- for example, "the record with productID=7"
-- at most, there is one record with 7 as the productID
-- if two records have that productID, I can't distinguish them

-- ** the meaning of a column should not change across rows
-- in other words, values in the ProductName column are always ProductNames
-- there are never any "dual-purpose" or "multi-purpose" columns
-- never "sometimes it means this, other times it means something else"

-- ** values in a row always pertain to that row
-- in other words, every value in the ProductID=1 record is about that product
-- put another way, we never store anything in a record that is not specifically about that record
-- an example violation: storing the name of a supplier in a product record
-- the name of a supplier is an attribute of a Supplier,
-- and should be stored in a Supplier table in a column named SupplierName

-- ** only one value in a cell
-- don't have a list of values with commas between them
-- don't have "Douglas M Kline" in a single column,
-- rather have "Douglas", "M", and "Kline" in separate columns

-- In Summary
-- well-structured data has
--    good names
--    data types chosen well
--    valid values enforced
--    a unique, non-NULL identifier
--    each column has a single meaning
--    each row represents a single thing
--    each cell has a single value

-- thanks for watching
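
-- Addendum: a minimal sketch of how the constraints described above
-- might be declared in SQL Server
-- this is an illustration only, not the actual definition of the Products table
-- the table names DemoSupplier and DemoProduct, the constraint names,
-- the data types, and the sample values are all hypothetical,
-- chosen so the sketch does not collide with an existing Products table

CREATE TABLE DemoSupplier
(
    SupplierID   INT          NOT NULL
        CONSTRAINT PK_DemoSupplier PRIMARY KEY,             -- unique, non-NULL identifier
    SupplierName NVARCHAR(40) NOT NULL                      -- must be filled in
);

CREATE TABLE DemoProduct
(
    ProductID    INT          NOT NULL
        CONSTRAINT PK_DemoProduct PRIMARY KEY,              -- entity integrity
    ProductName  NVARCHAR(40) NOT NULL
        CONSTRAINT UQ_DemoProduct_ProductName UNIQUE,       -- no duplicates across rows
    SupplierID   INT          NOT NULL
        CONSTRAINT FK_DemoProduct_DemoSupplier
        REFERENCES DemoSupplier (SupplierID),               -- value must come from a list
    UnitPrice    MONEY        NULL
        CONSTRAINT CK_DemoProduct_UnitPrice
        CHECK (UnitPrice >= 0),                             -- range; NULL (unknown) is allowed
    UnitsInStock SMALLINT     NULL
        CONSTRAINT CK_DemoProduct_UnitsInStock
        CHECK (UnitsInStock >= 0),                          -- range; NULL (unknown) is allowed
    Discontinued BIT          NOT NULL
        CONSTRAINT DF_DemoProduct_Discontinued DEFAULT (0)  -- filled in even when omitted
);

-- these rows satisfy every constraint

INSERT INTO DemoSupplier (SupplierID, SupplierName)
VALUES (1, N'Example Supplier');

INSERT INTO DemoProduct (ProductID, ProductName, SupplierID, UnitPrice, UnitsInStock)
VALUES (1, N'Example Product', 1, 18.00, 39);

-- each statement below should be rejected by the database
-- they are left commented out so the script runs cleanly
-- uncomment one at a time to see the constraint at work

-- duplicate ProductID (violates the primary key)
-- INSERT INTO DemoProduct (ProductID, ProductName, SupplierID, UnitPrice, UnitsInStock)
-- VALUES (1, N'Another Product', 1, 19.00, 17);

-- negative price (violates the range check)
-- INSERT INTO DemoProduct (ProductID, ProductName, SupplierID, UnitPrice, UnitsInStock)
-- VALUES (2, N'Another Product', 1, -5.00, 17);

-- SupplierID 99 is not in DemoSupplier (violates the foreign key)
-- INSERT INTO DemoProduct (ProductID, ProductName, SupplierID, UnitPrice, UnitsInStock)
-- VALUES (3, N'Another Product', 99, 19.00, 17);

-- note that the supplier's name lives only in DemoSupplier
-- DemoProduct stores just the SupplierID, so every value in a product row is about that product

-- clean up, dropping the referencing table first

DROP TABLE DemoProduct;
DROP TABLE DemoSupplier;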
Monday, July 30, 2018
Structuring Data in Tables