It’s said many different ways, by many different people.
- Data Analytics
- Data Science
- Business Intelligence
To many people, these are just new names for old, broad techniques such as:
- Data Analysis
- Statistics
- Science
- Management Science (business)
- Operations Research (business)
- Artificial Intelligence (computer science)
- Machine Learning (computer science)
- Technical Analysis (finance)
- Healthcare Analytics (healthcare)
- Web Analytics (marketing)
- Statistics (mathematics)
- Econometrics (economics)
So I’ve been thinking lately: how is data analytics different from all these older things? Is it a superset of them? Is it a mash-up of them? Does it mean different things to different people?
I have a specific perspective on this, so my thinking may not fit everyone. I’m an Information Systems professor. My undergraduate degree was in math and computer science, I have an MBA, and my doctorate is in business, with a major in Operations Research and a minor in Information Systems. My corporate experience is in software development. Now I’m a professor in a business school, but I continue to consult on database performance and storage.
I have what I’d call traditional training in statistics. I’ve taken the math courses behind statistical models, calculated eigenvalues, and derived maximum likelihood estimators. Then I took the management science and operations research courses in optimization, and applied techniques to things like quality control, queuing, facility location, scheduling, etc.
For several years, I’ve been thinking that analytics is just a re-branding of all this stuff I already know, dreamed up by marketing geniuses at a company like SAS or EMC.
I couldn’t find anything that really disputed that hypothesis. But I had this sneaking suspicion that there was something else there.
I think differently now. I’ve dug into it. I took a course taught by a professional data scientist who consults regularly for global corporations. I saw how he applied analytics and how he approached problems. I worked with the modern software tools: R, SQL Server, SQL Server Analysis Services, Azure Machine Learning. I came back and applied the tools in my own academic and consulting work.
Here’s my conclusion: modern analytics is different. And for good reason. Traditional methods were right for their era. Analytics is right for this era.
However, analytics is not different in trivial, obvious, or superficial ways.
Here’s what’s different:
• Cost – data and computation are cheaper now
• Timing – we want continuous answers
• Context – we want analytics integrated into systems
• Paradigm – we want to explore data rather than confirm hypotheses
All of these differences change the approach, the methodology, and the practices. They change which techniques are relevant. They take widely applicable, tried-and-true, bread-and-butter techniques and turn them into arcane techniques for special cases.
I think the hardest part for traditionalists approaching analytics is understanding why some of the practices that were drilled into us are no longer appropriate. Mathematically, everything I learned is still true and correct; it is just much less relevant.
Take, for example, hypothesis testing, estimation, and confidence intervals. These topics were foundational material in my training and heavily used techniques in my toolbox. But the main motivation behind their development and use was a lack of data. We were “controlling” error with these methods, making sure we didn’t generalize too much from small samples. We had rules of thumb like "at least 30 data points for ...".
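To make the dependence on sample size concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: a two-sided, one-sample z-test with a known standard deviation, and a hypothetical observed difference of 0.01 standard deviations. The function name and the numbers are made up for illustration and do not come from any real data set.

```python
import math


def two_sided_p_value(effect, sigma, n):
    """Two-sided p-value of a one-sample z-test.

    Assumes the sample mean differs from the hypothesized mean by exactly
    `effect`, with known standard deviation `sigma` and sample size `n`.
    """
    z = effect * math.sqrt(n) / sigma
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


effect = 0.01  # hypothetical: a difference of 0.01 standard deviations
sigma = 1.0    # hypothetical: known standard deviation

# Same tiny difference, evaluated at three very different sample sizes.
for n in (30, 10_000, 1_000_000):
    p = two_sided_p_value(effect, sigma, n)
    print(f"n = {n:>9,}  p-value = {p:.3g}")
```

With only 30 points, this tiny difference is nowhere near significant. With a million points, the very same difference produces a p-value far below any conventional threshold.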
"Not enough data" is typically not a problem today. In recent years, I’ve found that my work involves so much data that p-values are incredibly small by traditional standards. I use techniques to reduce the amount of data. I'm not worried about getting at least 30 points; I'm worried about getting the data file from here to there because it's so big. In other words, I have so much data that I don’t worry about “controlling error” anymore, so hypothesis testing is not as relevant.
The next few articles will explore these differences. I hope that this will help older, more traditionally trained analysts understand modern analytics better. I also hope that this will help younger, more modern analysts understand the traditional methods better.
Next articles:
- Why Data Analytics is Different, Part 1: Introduction
- Why Data Analytics is Different, Part 2: Cost
- Why Data Analytics is Different, Part 3: Timing
- Why Data Analytics is Different, Part 4: Context
- Why Data Analytics is Different, Part 5: Paradigm