Let the data talk … but not too much.
We live in a world of data. We also live in a world of inexpensive computer processing. The combination of these two resources means we also live in a world of poor-quality statistical analysis.
As an economist, it is tempting to believe that there are fundamental truths to be uncovered through econometric modeling and that we can help businesses work smarter, faster, and more profitably through deep and thorough regression modeling. Undoubtedly, econometric modeling can yield incredibly valuable insights, particularly in areas such as price elasticity. But we should also realize that some questions are not well suited to econometric modeling. Sometimes, other tools and tactics are better at identifying relationships and predicting consumer behavior.
The ability to run hundreds, thousands or millions of regressions using enormous data sets makes it easy to find the model that says what you want to hear. When many of us were learning statistics, we had to estimate regressions using precious mainframe time and walk across campus to get the results. In that environment, we thought a long time about each regression. When the marginal cost of running another regression is zero, more spurious regressions are run and the average quality goes down.
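The multiple-testing trap above is easy to demonstrate. The simulation below is an illustrative sketch (the data and counts are made up, not from any real study): it regresses pure noise on pure noise 1,000 times, and roughly 5% of the runs look "significant" at the 5% level by chance alone.

```python
# Illustrative simulation: regress random noise on unrelated random noise
# many times and count how often the slope looks "significant."
import math
import random

random.seed(7)
n_regressions, n = 1000, 50
t_crit = 2.01  # approx. two-sided 5% critical value for df = n - 2 = 48
false_positives = 0

for _ in range(n_regressions):
    x = [random.gauss(0, 1) for _ in range(n)]  # random "predictor"
    y = [random.gauss(0, 1) for _ in range(n)]  # unrelated "outcome"
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)             # sample correlation
    t = r * math.sqrt((n - 2) / (1 - r * r))   # t-statistic for the slope
    if abs(t) > t_crit:
        false_positives += 1

print(f"{false_positives} of {n_regressions} noise regressions look significant")
```

When the marginal cost of a regression is zero, running enough of them guarantees that some will "say what you want to hear" even when no relationship exists.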
The Best Measure of Model Performance Is Predictive Power
Typically, a model is fit to a subset of the data, often called the training data, and then applied to data outside that set. Models tuned to optimize fit statistics on the training data may predict poorly because they are too "trained" on the noise in the data and thus miss the "signal." Applying such models outside the range of the data used to develop them produces especially poor predictions.
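A minimal sketch with simulated data (the numbers are illustrative, not from the article): the true relationship is linear, but a flexible degree-9 polynomial fit to a small training sample "learns" the noise and generalizes worse than its in-sample fit statistics suggest.

```python
# Overfitting demo: compare a simple and an over-flexible model on
# training data versus held-out test data.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, n)
    y = 2.0 * x + rng.normal(0, 0.3, n)  # linear signal plus noise
    return x, y

x_train, y_train = make_data(15)   # small training set
x_test, y_test = make_data(200)    # held-out data

def mse(coef, x, y):
    """Mean squared error of a polynomial fit on (x, y)."""
    return float(np.mean((np.polyval(coef, x) - y) ** 2))

linear = np.polyfit(x_train, y_train, 1)    # simple model
flexible = np.polyfit(x_train, y_train, 9)  # over-flexible model

print("train MSE  linear:", round(mse(linear, x_train, y_train), 4),
      " flexible:", round(mse(flexible, x_train, y_train), 4))
print("test  MSE  linear:", round(mse(linear, x_test, y_test), 4),
      " flexible:", round(mse(flexible, x_test, y_test), 4))
```

On the training data the flexible model always fits at least as well; on the held-out data it typically does worse, which is why predictive power on unseen data, not in-sample fit, is the measure that matters.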
In applied economics, we most often use data collected from operating systems, and these data are usually not ideal for analytics. They may have missing observations, be censored or truncated, or be at a level of granularity that is not ideal for the model. We may have a lot of data, but it may not be the "right" data. This is where the temptation to try alternative models until the desired answer appears ensnares many analysts. They do not want to accept that they cannot measure what they want from the data at hand.
This Is Where Less Analytics Is More
Other tools are available to analysts, such as A/B testing, that are particularly valuable in a digital context in which thousands of customer transactions occur daily. Analysts can form hypotheses about which factors are important for predicting customer behavior and test them. We can also test the effect of seemingly insignificant factors, such as the order in which different product options are presented or the color of the "buy now" button, on purchase decisions.
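As a sketch of how such a test is evaluated, the snippet below runs a standard two-proportion z-test on a hypothetical button-color experiment. The conversion counts are invented for illustration; they are not real experimental results.

```python
# Two-proportion z-test for an A/B experiment (hypothetical data).
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for H0: the two conversion rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Variant A: current button; variant B: new color (made-up counts)
z = two_proportion_z(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}")  # prints z = 2.55; |z| > 1.96 => significant at 5%
```

Unlike mining historical transaction data, the randomized assignment behind a test like this is what lets the analyst interpret the difference as a causal effect of the change.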
Books have been written about the surprising, but predictable, effect that changing the language, the options or the sequence of choices has on customer decisions. Behavioral economics is the field that explores the effect of cognitive, social, psychological and emotional factors on decision-making. These factors are often hard to quantify in historical transaction data, but their effects can be identified and measured through careful testing, often in real time. These “soft” factors also can be more important than the price of the product in predicting sales, particularly digital sales.
Recognizing the limits of statistical regression analysis does not mean dismissing a powerful toolset. Rather, knowing how to combine econometrics with A/B testing and other types of analysis is the best way to improve business performance. When these individual tools are used together, the whole, as Aristotle said, is greater than the sum of its parts.