an accumulation of data that is too large and complex for processing by traditional database management tools
obese data (noun)
an accumulation of data that is much larger and more complex than necessary
I made the second one up. After seeing lots of examples of obese databases.
In short, obese databases' effective size is much smaller than their actual size, e.g., a database that consumes 1TB of space, but has only 1GB of data. Further, the schema / architecture / model / framework / product / system / process is more complex than necessary.
Databases that have these problems:
they are slow
they could be a fraction of the size that they are
their growth rate is higher than it should be (GBs/day rather than Kbs/day)
their operations (backup, batch load, maintenance) take much longer than necessary
they are inflexible, e.g., not easily adaptable to solve a new problem, write a new report, add a new feature, etc.
All these problems above are leading people to think that they have big data, when they really have obese data.
Obese data might be viewed as a "database anti-pattern".
Things that contribute to obese data in the context of databases:
data types that are larger than they need to be
bigint that could be int
nchar that could be char
guid that could be int
datetime2 that could be date
char that could be varchar
char(50) that could be char(2)
unused columns
duplicate indexes
indexes that are never used
full-factorial indexing :-)
an index for every single column in the table
an index for every two-column combination in the table
etc.
data that should be archived
leftover staging data
redundant data
large numbers of NULL values
inappropriate schema
not "normalized" enough
too "normalized"
missing or inappropriate maintenance operations
hourly dbcc dbreindex to fillfactor 100% on a write-heavy table
A short video covering the IN operator in the WHERE clause within a SQL SELECT statement.
For beginners. Simple WHERE clauses using IN.
Here is the SQL that goes with the demo.
-- Database by Doug
-- Douglas Kline
-- 1/20/2016
-- Using the IN operator
USE northwind
-- refresher
-- here's a simple SELECT statement with a simple WHERE clause
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE UnitPrice > 10
ORDER BY ProductID
-- recall that the portion in the WHERE clause is a logical expression
-- the logical expression is evaluated for each record
-- records that evaluate (Unitprice > 10) as True (1) are included
-- records that evaluate (Unitprice > 10) as False (0) are not included
-- here's a simple IN example
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE CategoryID IN (2, 4)
ORDER BY ProductID
-- all records that have a CategoryID of either 2 or 4 are included in the results
-- so the above could be equivalently rephrased as
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE CategoryID = 2
OR CategoryID = 4
ORDER BY ProductID
-- but this can be inconvenient and verbose if there are lots of desired values, like this:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE CategoryID = 2
OR CategoryID = 4
OR CategoryID = 5
OR CategoryID = 8
OR CategoryID = 67
OR CategoryID = 101
ORDER BY ProductID
-- which would be more succinctly phrased as
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE CategoryID IN (2, 4, 5, 8, 67, 101)
ORDER BY ProductID
-- here's another example that can lead to bugs:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE CategoryID = 2
OR CategoryID = 4
OR CategoryID = 5
OR CategoryID = 8
OR CategoryID = 67
OR CategoryID = 101
AND UnitPrice > 10
ORDER BY ProductID
-- note that the record with ProductID 13, 'Konbu' has a unitprice < 6
-- since the ORs and ANDs are evaluated left-to-right, the Unitprice > 10 has no effect
-- so parentheses are needed to control the order of operations, like this:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE (
CategoryID = 2
OR CategoryID = 4
OR CategoryID = 5
OR CategoryID = 8
OR CategoryID = 67
OR CategoryID = 101
)
AND UnitPrice > 10
ORDER BY ProductID
-- this is MUCH clearer, and less conducive to bugs, phrased with IN
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE CategoryID IN (2, 4, 5, 8, 67, 101)
AND UnitPrice > 10
ORDER BY ProductID
-- here's another way that IN is very convenient
-- back to the long example of ORs
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE CategoryID = 2
OR CategoryID = 4
OR CategoryID = 5
OR CategoryID = 8
OR CategoryID = 67
OR CategoryID = 101
ORDER BY ProductID
-- what if we want CategoryIDs NOT equal to 2, 4, 5, etc
-- this doesn't work:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE NOT
CategoryID = 2
OR CategoryID = 4
OR CategoryID = 5
OR CategoryID = 8
OR CategoryID = 67
OR CategoryID = 101
ORDER BY ProductID
-- note the categoryID = 8 for productID=10
-- so we would need parentheses like this:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE NOT
(
CategoryID = 2
OR CategoryID = 4
OR CategoryID = 5
OR CategoryID = 8
OR CategoryID = 67
OR CategoryID = 101
)
ORDER BY ProductID
-- or more verbosely, but without parentheses
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE NOT CategoryID = 2
AND NOT CategoryID = 4
AND NOT CategoryID = 5
AND NOT CategoryID = 8
AND NOT CategoryID = 67
AND NOT CategoryID = 101
ORDER BY ProductID
-- so, again, this is much clearer and less prone to bugs phrase with IN:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE CategoryID NOT IN (2, 4, 5, 8, 67, 101)
ORDER BY ProductID
-- or equivalently
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE NOT CategoryID IN (2, 4, 5, 8, 67, 101)
ORDER BY ProductID
-- technically, 'IN' is comparison operator, like '=' or '<'
-- a phrase like '7 > 8' compares the value on the left with the value on the right
-- and returns a true or false
-- a phrase like '7 IN (4, 5, 8) compares the value on the left with
-- the list of values on the right, and returns a true or false
-- IN can be used with other data types:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE ProductName IN ('Chai', 'Tofu', 'Alice Mutton')
ORDER BY ProductID
-- for character based items, the above works,
-- but you can't combine LIKE and IN, by including LIKE wildcards:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE ProductName IN ('Chai%', 'Tofu', 'Alice Mutton')
ORDER BY ProductID
-- the above doesn't work - % is interpreted as a regular character, not a wildcard for matching
-- here's one using dates
SELECT *
FROM Employees
WHERE BirthDate IN ('19481208', '19520219')
-- another very cool feature of the IN operator,
-- is the ability to dynamically create the list of values
-- with a SELECT statement
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE ProductName IN ( SELECT ProductName
FROM Products
WHERE unitprice > 10)
ORDER BY ProductID
-- of course, this is a very simple example,
-- and it would be much more clearer to phrase it like this:
SELECT ProductID,
CategoryID,
SupplierID,
ProductName,
UnitPrice
FROM Products
WHERE unitprice > 10
ORDER BY ProductID
-- but the idea is very powerful
-- and allows for greater flexibility for phrasing your SQL
-- in some cases, a sub query with IN is the clearest way to state what you intend
-- in summary
-- IN is a logical operator - it returns True/False (1/0)
-- IN is sometimes used to replace a series of logical expressions connected with ORs
-- IN operates on a reference value and a list of values
-- IN returns True(1) if the reference value is *in* the list of values
-- IN - the list of values can be the result of a SELECT statement
-- Database by Doug
-- Douglas Kline
-- 1/20/2016
-- Using the IN operator