Like many professionals, I don’t need expertise in data or analytics to do my job. I’m a writer and editor, so I deal with words, not numbers. Still, nearly every knowledge worker today needs to be a regular consumer of data analysis. For example, I need to understand whether, and why, articles on having a mid-career crisis outperformed ones on receiving feedback, or why pieces with particular headlines get more traffic than others.
I also need to read research on the topics I cover, judge whether the findings in those studies are valid and generalizable, and explain those findings, along with their limitations, to you, our readers.
To do all of this, I need a more-than-basic understanding of data analytics. And while the statistics course I took in graduate school was helpful, it didn’t fully equip me to grasp the key concepts or to have the conversations about data analysis that my work requires.
Fortunately, I had the opportunity to talk with some of the best experts in the field (Tom Redman, author of Data Driven: Profiting from Your Most Important Business Asset, and Kaiser Fung, who founded the applied analytics program at Columbia University) about several critical topics in data analysis. Here are four refreshers from our archives on data analytics concepts that every manager should understand.
Randomized controlled experiments
One of the first steps in any analysis is gathering data. Companies often do this through a spectrum of experiments, from quick, informal surveys to pilot studies, field experiments, and lab research. One of the more structured types is the randomized controlled experiment. When many people hear this term, they immediately think of costly clinical trials, but randomized controlled experiments don’t have to be expensive or time-consuming. They can be used to gather data on questions like whether a particular customer service intervention improved customer retention, or whether a new, more expensive piece of equipment is more effective than a cheaper one. In this refresher, Tom Redman helps me understand what it means for a test to be “controlled” and how to make sure it includes an element of “randomization.” The article also addresses questions like: What are dependent and independent variables? And what are the steps to designing and conducting one of these experiments?
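To make “randomization” concrete, here is a minimal Python sketch of how customers might be split at random into a treatment group and a control group. The customer IDs and the helper name `randomly_assign` are hypothetical, chosen only for illustration. The point is that chance, not anyone’s judgment, decides who receives the intervention, so the two groups should differ systematically only in whether they got it.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups.

    Hypothetical helper for illustration: with an odd count, the extra
    participant lands in the control group.
    """
    rng = random.Random(seed)  # a seeded generator makes the split reproducible
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment = shuffled[:half]  # e.g., customers who get the new service intervention
    control = shuffled[half:]  # customers who get the usual service
    return treatment, control


# Hypothetical customer IDs
customers = [f"customer-{i}" for i in range(10)]
treatment, control = randomly_assign(customers, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```

Retention would then be compared between the two groups: whether a customer received the intervention is the independent variable, and retention is the dependent variable.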
A/B testing
One of the more common experiments companies use these days is the A/B test, a type of randomized controlled experiment. At their most basic, these tests compare two versions of something to figure out which performs better. Companies use them to answer questions like, “What is most likely to make people click? Or buy our product? Or register with our site?” A/B testing is used to evaluate everything from website design to online offers to headlines to product descriptions. It’s critical to understand how to interpret the results and to avoid common mistakes, such as stopping the experiment before you have valid results or tracking a whole dashboard of metrics when you should be focusing on a few. You can learn more in our refresher on A/B testing.
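As a rough illustration of how an A/B result might be checked, the Python sketch below runs a two-proportion z-test, one common (but not the only) way to compare click-through rates between two versions. The visitor and click counts are made-up numbers, and the function name is hypothetical. The test assumes the sample size was fixed before the experiment started; peeking at the result repeatedly and stopping as soon as it looks good is the early-stopping mistake described above.

```python
import math


def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in two click rates.

    Uses the pooled-proportion normal approximation, which is reasonable
    when each group has plenty of clicks and non-clicks.
    """
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    # Pooled rate under the null hypothesis that both versions perform the same
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution: P(|Z| >= |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: headline A vs. headline B, 5,000 visitors each
z, p = two_proportion_z_test(clicks_a=400, visitors_a=5000, clicks_b=460, visitors_b=5000)
print(f"Version A: {400 / 5000:.1%}  Version B: {460 / 5000:.1%}")
print(f"z = {z:.2f}, p = {p:.3f}")
```

Running a test like this separately on every metric in a dashboard multiplies the chances that at least one difference looks significant purely by chance, which is one reason to focus on a few metrics chosen in advance.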
Regression analysis
Once you have the data, regression analysis helps you make sense of it. There are many ways to analyze data, but linear regression is one of the most important. It is a way of mathematically sorting out whether there’s a relationship between two or more variables. For example, if you sell umbrellas, you might want to know how many more you sell on rainy days. Regression analysis can help you determine whether, and how, inches of rain affect sales. It helps answer questions such as: Which factors matter most? Which can we ignore? How do those factors interact with each other? And, perhaps most important, how certain are we about all of these factors?
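The umbrella example can be made concrete with simple linear regression (ordinary least squares with a single predictor). The Python sketch below computes the slope and intercept by hand so the arithmetic is visible; the rainfall and sales figures are invented, and the helper name `fit_line` is hypothetical. R², the share of the variation in sales that the line accounts for, is included as one rough measure of fit. Judging how certain the estimates are would also require standard errors, which statistics programs report.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared). Hypothetical helper for illustration.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variation in y that the fitted line accounts for
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Invented data: inches of rain on five days and umbrellas sold each day
rain_inches = [0.0, 0.5, 1.0, 1.5, 2.0]
umbrellas_sold = [20, 28, 35, 43, 50]

slope, intercept, r2 = fit_line(rain_inches, umbrellas_sold)
print(f"Each extra inch of rain is associated with about {slope:.1f} more umbrellas sold")
print(f"Baseline on a dry day: about {intercept:.1f} umbrellas; R^2 = {r2:.3f}")
```

The slope describes an association in these invented numbers; it does not, on its own, show that rain causes the extra sales.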
Fortunately, you typically don’t have to run a regression by hand; statistics programs do that for you. But it’s still important to understand the math behind it and the kinds of mistakes to avoid. In this refresher, I explain how regression works and share a common but often misunderstood warning: don’t confuse correlation with causation.
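A tiny constructed example shows why correlation alone can mislead. In the Python sketch below (a hypothetical scenario with invented numbers), rain drives both umbrella sales and sales of hot soup, and the soup formula never refers to umbrellas at all. The two series are still essentially perfectly correlated because they share a common cause, so a regression of soup sales on umbrella sales would show a strong relationship that is not causal.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scenario: rain (the common cause) drives both series
rain_inches = [0.0, 0.5, 1.0, 1.5, 2.0]
umbrellas_sold = [20 + 15 * r for r in rain_inches]
soup_bowls_sold = [100 + 40 * r for r in rain_inches]  # no umbrella term at all

r = pearson_r(umbrellas_sold, soup_bowls_sold)
print(f"Correlation between umbrella and soup sales: {r:.3f}")
# A value near 1 says nothing about umbrellas causing soup sales (or vice versa);
# both simply move with the rain.
```

Because the numbers are built from exact formulas, the correlation comes out at essentially 1; real data would be noisier, but the logic is the same.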
Statistical significance
Once you’ve done the analysis, you need to figure out what your results mean, if anything. This is where statistical significance comes in. It is another concept that is often misunderstood and misused. Yet because more and more companies rely on data to make critical business decisions, it’s essential to understand. Statistical significance helps you gauge how likely it is that a result at least as large as the one you observed would arise from chance alone, rather than from the factors you were measuring.
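One way to see what “due to chance” means is a permutation test, sketched below in Python as an illustration (a technique chosen for this example, not necessarily the one used in the refresher). It asks: if the group labels were meaningless, how often would reshuffling them produce a difference at least as large as the one observed? That share is the p-value. The two small groups of scores are invented, and the helper name is hypothetical; with only ten values, every possible relabeling can be enumerated exactly.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Enumerates every way to split the pooled data into groups of the
    original sizes and returns the share of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


# Invented scores for two small groups
control = [1, 2, 3, 4, 5]
treatment = [6, 7, 8, 9, 10]
p = exact_permutation_p_value(treatment, control)
print(f"p-value: {p:.4f}")
```

With ten values there are 252 possible splits, and only the observed split and its mirror image produce a gap this large, so p = 2/252, or about 0.008. Larger datasets have too many splits to enumerate, so software typically samples random shuffles instead. A small p-value says the gap would be surprising if chance alone were at work; it does not say how large or important the effect is.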
I have sometimes struggled to fully understand this concept myself, but fortunately, the average professional doesn’t need to understand it deeply. According to Tom Redman, who helped with this refresher, it’s more important to understand how to avoid misusing it.
While you’re boning up on these four concepts, it would also be helpful to read this overview of quantitative analysis from my colleague Walt Frick. It is a nice primer on why data matters, how to pick the right metrics, and how to ask the right questions of your data. It also includes a great chart on correlation vs. causation to help you decide when to act on an analysis and when not to.
Lastly, if you’re interested in analytics because you need to consume social science research, I highly recommend this piece from Eva Vivalt, a research fellow and lecturer at the Australian National University. She gives several tips for determining whether the evidence from a study should be trusted.
Data analytics is ultimately about making good decisions. No matter what business you are in or what your role is at your company, we all want (and really need) to make smart, informed, evidence-based decisions.
Amy Gallo, Harvard Business Review