Multiverse Analysis Can Increase Research Integrity

[Image: photograph of a galaxy]

A couple of months ago I posted about questionable research practices on LinkedIn, and Lorne Hartman asked me what I thought of multiverse analysis. Of course, I'd never heard of it, so I had to look it up before commenting that it looked interesting and potentially useful. Lorne shared some materials, and it wasn't long before I realized that multiverse analysis can increase research integrity. It was something I was eager to try.

What Is Multiverse Analysis?

When we conduct data analysis, even on a fairly simple dataset, there are several choice points with two or more reasonable options: Should you remove outliers? Should you impute missing values when someone doesn't answer a question? Should you add a control variable? Researchers generally pick one option at each point and give those choices little thought after that. With a multiverse analysis, the researcher instead identifies all of the reasonable options and runs the analysis on every possible combination of them. This results in dozens of trials in which the same data are analyzed in somewhat different ways.

You can then see whether results are consistent across the different options, or whether they vary. This gives us a sense of how reliable a phenomenon is: if you try 50 different combinations, the number of times you get similar results indicates how confident you should be in them. It is a way to apply an internal replication process to examine the consistency of results. If only a few cases are statistically significant, you would have low confidence that your results are real and not just a statistical fluke.
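To make the mechanics concrete, here is a minimal sketch in Python (using pandas and statsmodels) of what a multiverse loop can look like. The dataset, the three choice points, and all variable names are hypothetical, invented purely for illustration; a real multiverse would use whatever choice points fit the study at hand.

```python
# A minimal multiverse sketch (hypothetical data and choices): enumerate every
# combination of analytic decisions, fit the same model under each one, and
# tally how often the effect of interest reaches significance.
from itertools import product

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration only.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "covariate": rng.normal(size=n),
})
df["y"] = 0.3 * df["x"] + rng.normal(size=n)
df.loc[rng.choice(n, 15, replace=False), "x"] = np.nan  # some missing data

# Each choice point has two or more defensible options.
choices = {
    "outliers": ["keep", "trim_3sd"],
    "missing": ["listwise", "mean_impute"],
    "control": ["none", "covariate"],
}

results = []
for outliers, missing, control in product(*choices.values()):
    d = df.copy()
    # Choice 1: how to handle missing responses.
    if missing == "listwise":
        d = d.dropna(subset=["x"])
    else:
        d["x"] = d["x"].fillna(d["x"].mean())
    # Choice 2: whether to trim outliers beyond 3 standard deviations.
    if outliers == "trim_3sd":
        z = (d["x"] - d["x"].mean()) / d["x"].std()
        d = d[z.abs() <= 3]
    # Choice 3: whether to add a control variable to the model.
    formula = "y ~ x" if control == "none" else "y ~ x + covariate"
    fit = smf.ols(formula, data=d).fit()
    results.append({
        "outliers": outliers, "missing": missing, "control": control,
        "b_x": fit.params["x"], "p_x": fit.pvalues["x"],
    })

multiverse = pd.DataFrame(results)
print(multiverse)
print(f"Significant in {(multiverse['p_x'] < .05).mean():.0%} of specifications")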

Multiverse Analysis Can Increase Research Integrity

The integrity of research in industrial-organizational psychology, management, and likely other disciplines has been undermined by questionable research practices like p-hacking. P-hacking means reanalyzing data in different ways until you achieve the statistical significance that you want. If you try enough things, it is very likely that one of them will give you what you want, although all you have found is a statistical fluke that cannot be replicated. P-hacking is like multiverse analysis in that you reanalyze the data multiple ways. Where it differs is that with p-hacking you report only the options that gave the best results and ignore the rest. In fact, p-hackers do not even mention the other things they tried.

Multiverse analysis can increase research integrity because the researcher reports all of the things that were tried: there is total transparency. If you need to try 40 things to find statistical significance, all 40 are mentioned, and the results across all 40 are summarized.

My First Attempt

Intrigued by multiverse analysis, I discussed using it with my colleague Mukhunth Raghavan. We planned some features to vary in an ongoing survey project, and we were off and running. The features we tried included:

  • Data quality check. Did respondents produce response patterns suggesting carelessness?
  • Dealing with missing data. There are a variety of ways to handle cases where someone leaves an item blank.
  • Group membership. There are different ways to put people into groups based on their responses to a question.
  • Data dependency. There are different ways to handle responses from people in the same department.

In all we had nearly 50 combinations. We were lucky in that the results were consistent: some things were statistically significant in all cases, and others were nonsignificant in all (or nearly all) cases. This makes interpretation easy. Of course, some of the features we varied affect the strength of relationships, and those are interesting in their own right.
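For a sense of how quickly these choices multiply, here is a small sketch showing how four choice points like the ones above expand into a grid of specifications. The option labels and counts are hypothetical stand-ins; the actual options in our project differed.

```python
# Hypothetical option counts for four choice points, to illustrate how
# the specification grid multiplies (2 * 3 * 2 * 2 = 24 combinations;
# a few more options per point quickly approaches 50).
from itertools import product

features = {
    "quality_check": ["none", "drop_careless"],
    "missing_data": ["listwise", "mean_impute", "multiple_impute"],
    "grouping": ["median_split", "tertiles"],
    "dependency": ["ignore", "cluster_robust"],
}

specs = list(product(*features.values()))
print(len(specs), "specifications")
for spec in specs[:3]:  # peek at the first few universes
    print(dict(zip(features, spec)))
```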

Consider Using Multiverse Analysis

Researchers should be free to use whatever reasonable approaches they wish as long as they use them properly, and p-hacking is not using statistical analysis properly. I would hate for journals to decide that every paper now has to use multiverse analysis. There should be room for papers that use a variety of methods, as different methods converging on the same conclusion gives us confidence that what we believe is sound and not limited to a single flawed method (and all methods are flawed in some way).

On the other hand, p-hacking has become so widespread that it would be good if at least some of us used multiverse analysis, at least some of the time, to give an indication of how reliable and robust findings might or might not be. As multiverse analyses become more widespread, we will gain a better understanding of how some common choice points change or do not change results. We will know which to pay attention to and which to ignore. We will know which phenomena are sensitive to these choices and which are robust. It will enable us to put p-hacked results in the literature into context. Multiverse analysis can increase research integrity if used properly, and that's a good reason for it to be one of many tools in the researcher's tool kit.

Image from Pexels.com
