-- Database by Doug -- Douglas Kline -- 7/30/2018 -- Terminology : Tables, Columns, Rows use Northwind -- no pre-requisites -- a very good way to organize data is in a table -- why? -- mainly because humans find tables fairly easy to understand -- what they should look like, how they should behave, how to look things up, etc. -- for example, lets say I show somebody this table: SELECT * FROM ( VALUES ('Doug','2'), ('Dan', '3'), ('Debbie', '1'), ('2','Joe')) AS tbl([Name], [Number]) -- Most people will think that is weird -- why is there a number in the Name column? -- we know this innately, without knowing any special theory -- we like it this way, because it keeps things organized -- organized data is easy to get value out of -- so let's go over some terminology, looking at some real table data SELECT ProductID, ProductName, Unitprice, UnitsInstock, QuantityPerUnit, Discontinued FROM Products -- **highlighting entire table -- the entire table is also sometimes called: -- an entity, a table, a file, an object -- entity - usually used by data modelers/architects -- table - usually used by database analysts -- file - usually used by database administrators (especially of old databases, where each table was stored as a file) -- object - usually used by software developers (especially object-oriented developers, where an object maps to a table) -- **highlighting a row -- one row in a table also has several names: -- row, record, entity instance, object instance -- row - used by most people, in other words its just a part of a table that's horizontal -- record - usually used by database analysts, this is special terminology for db people -- entity instance - usually used by data modelers/architects -- object instance - usually used by software developers -- **highlighting a column -- one column in a table has several names: -- column, attribute, field, property -- column - used by most people, in other words its just a part of a table that's vertical -- attribute- or entity attribute, used by data modelers, some value of interest about an entity -- field - used by database analysts, special terminology for db people -- property - or object property, used by OO software developers, object properties usually map to table fields -- ** highlighting a cell -- one cell, an intersection of a row and column has several names: -- cell, value, field, field value, attribute, attribute value, property, property value, etc. -- careful... -- sometimes attribute means an entire column, e.g., Person table has an attribute named firstName -- sometimes attribute means a single cell, e.g., Doug's firstName attribute is Doug -- thanks for watching -- Database by Doug -- Douglas Kline -- 7/30/2018 -- Terminology : Tables, Columns, Rows
Monday, July 30, 2018
Tables and their parts
Structuring Data in Tables
-- Database by Doug -- Douglas Kline -- 7/30/2018 -- Good Tables : Structuring your Data -- here's an odd example SELECT * FROM ( VALUES ('Doug','2'), ('Dan', '3'), ('Debbie', '1'), ('2','Joe')) AS tbl([Name], [Number]) -- this probably makes you a little uncomfortable -- it might bother some more than others -- in Relational Databases, like SQL Server (or DB2 or Oracle or PostgreSQL or mySQL or...) -- we can easily enforce organization on table -- in this example, you shouldn't be able to put a number in the name column SELECT ProductID, ProductName, UnitPrice, UnitsInStock, QuantityPerUnit, Discontinued FROM Products -- here are some fundamental organization mechanisms built-in to an RDBMS -- every table must have a name -- every column must have a name -- every table is a rectangle -- -- every column has the same number of rows -- -- every row has the same number of columns -- -- there are never an "ragged" rows or columns -- every table has at least one column -- (but tables can have zero rows - an empty table) -- just those simple things immediately keep the data organized -- you don't have to do anything else -- you don't HAVE to organize your data any further -- the database won't AUTOMATICALLY organize your data any further -- as an example, -- you don't HAVE TO name your columns in a meaningful way -- for example, we could do this: SELECT ProductID AS [A], ProductName AS [B], UnitPrice AS [C], UnitsInStock AS [D], QuantityPerUnit AS [E], Discontinued AS [F] FROM Products -- of course it goes without saying, but I need to say it -- a clear, descriptive, concise name is critical -- for your own sanity and for the sanity of those that come after... -- there are very good benefits to keeping your data MORE organized than the bar minimum SELECT ProductID, ProductName, UnitPrice, UnitsInStock, QuantityPerUnit, Discontinued FROM Products -- if we want to go further, here are the type of things you CAN enforce -- a value in a cell should be filled in (NOT NULL), or can be left unknown -- a value in a cell should be a specified data type (numeric, alphabetic, date, etc.) -- a value in a cell should be in a certain range (more than zero, not include special characters, etc.) -- values in a column must be unique (no duplicates across rows) -- a value in a cell should be from a list somewhere (predefined or dynamic) -- other things -- adding these constraints keeps your data MORE organized, and easier/quicker/cheaper to turn into valuable information -- Here are some more subtle ways to organize your data -- these are critically important -- and sometimes more theoretically presented as "normalizing" your data -- ** Every record should have a unique identifier -- this is called "entity integrity" in database theory -- it enables us to uniquely find each record individually -- for example, "the record with productID=7" -- at most, there is one record with 7 as the productID -- if two records have that productID, I can't distinguish them -- ** the meaning of a column should not change across rows -- in other words, values in the ProductName column are always ProductNames -- there are never any "dual-purpose" or "multi-purpose" columns -- never "sometimes it means this, other times it means something else" -- ** values in a row always pertain to that row -- in other words, every value in the ProductID=1 record is about that product -- another way, we never store anything in a record that is not specifically about that record -- an example violation: storing the name of a supplier in a product record -- the name of a supplier is an attribute of a Supplier, -- and should be stored in a Supplier table in a column named SupplierName -- ** only one value in a cell -- don't have a list of values with commas between them -- don't have "Douglas M Kline" in a single column, -- rather have "Douglas", "M", and "Kline" in separate columns -- In Summary -- well-structured data has -- good names -- data types chosen well -- valid values enforced -- a unique, non-NULL identifier -- each column has a single meaning -- each row represents a single thing -- each cell has a single value -- thanks for watching -- Database by Doug -- Douglas Kline -- 7/30/2018 -- Good Tables : Structuring your Data
Subscribe to:
Posts (Atom)