Thursday, January 28, 2016

Obese Data

big data (noun) (Merriam-Webster)
an accumulation of data that is too large and complex for processing by traditional database management tools
obese data (noun)
an accumulation of data that is much larger and more complex than necessary
I made the second one up. After seeing lots of examples of obese databases.

In short, obese databases' effective size is much smaller than their actual size, e.g., a database that consumes 1TB of space, but has only 1GB of data. Further, the schema / architecture / model / framework / product / system / process is more complex than necessary.

Databases that have these problems:
  • they are slow
  • they could be a fraction of the size that they are 
  • their growth rate is higher than it should be (GBs/day rather than Kbs/day)
  • their operations (backup, batch load, maintenance) take much longer than necessary
  • they are inflexible, e.g., not easily adaptable to solve a new problem, write a new report, add a new feature, etc.
All these problems above are leading people to think that they have big data, when they really have obese data. 

Obese data might be viewed as a "database anti-pattern".

Things that contribute to obese data in the context of databases:
  • data types that are larger than they need to be
    • bigint that could be int
    • nchar that could be char
    • guid that could be int
    • datetime2 that could be date
    • char that could be varchar
    • char(50) that could be char(2)
  • unused columns
  • duplicate indexes
  • indexes that are never used
  • full-factorial indexing :-)
    • an index for every single column in the table
    • an index for every two-column combination in the table
    • etc.
  • data that should be archived
  • leftover staging data
  • redundant data
  • large numbers of NULL values
  • inappropriate schema
    • not "normalized" enough
    • too "normalized"
  • missing or inappropriate maintenance operations
    • hourly dbcc dbreindex to fillfactor 100% on a write-heavy table
    • fillfactor 10% on read-heavy table

No comments:

Post a Comment

Followers