Modern research in the organizational and many other social sciences has become dominated by model testing. Models specify an order of events that unfold over time. In my field, for example, models describe how working conditions combine to affect employee emotions that lead to burnout. The models that fill our academic journals are tested with complex and esoteric structural equation modeling (SEM) analyses. The methodologies used for model testing seem sophisticated and more advanced than the simple comparison of conditions that we see in intervention studies. However, the weak inference of model testing limits its usefulness, especially for practitioners.

## The Strong Inference of Experimentation

The basic experimental design is a simple device that allows strong inference. In its best form, it involves randomly assigning a group of subjects, be they humans or some other entity, to one of two conditions. In organizations an intervention study represents this approach. We might randomly assign 50 employees to an intervention condition in which they are trained or a control condition (or placebo condition) in which they are not. If our training, for example, is designed to improve sales skill, we would expect that the trained employees would sell more product after training than the untrained employees. To determine that our training was effective, we would compare sales volume between the two groups, using a t-test to determine if observed differences were statistically significant. If we find statistical significance in the direction hoped (more sales after training) we would make an inductive inference that our training is effective, and expect that if we train other employees, their sales will also increase. Of course, there are no guarantees, as the results we achieved might have been a fluke (Type 1 error), which can be ruled out by replicating the study (train the control group to see if sales increase). It also might have been due to something other than our intervention (the study was contaminated in some way). Nevertheless, positive results would lead to a reasonable inference that training is effective, an insight that is highly useful to academics and practitioners alike.
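The intervention study described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales figures are simulated, the 15-unit training effect is invented, and the function names `welch_t` and `permutation_p` are hypothetical. Because the standard library has no t distribution, the sketch computes the t statistic directly and swaps in a permutation test for the p-value instead of using a t table.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means (a minus b)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def permutation_p(a, b, n_perm=2000, seed=0):
    """One-sided p-value: share of random relabelings with t >= observed t."""
    rng = random.Random(seed)
    observed = welch_t(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if welch_t(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Randomly assign 50 hypothetical employees to training or control.
rng = random.Random(42)
employees = list(range(50))
rng.shuffle(employees)
trained_ids, control_ids = employees[:25], employees[25:]

# Simulated sales (invented numbers): baseline near 100 units,
# and we ASSUME training adds 15 units on average.
sales = {e: rng.gauss(100, 20) for e in employees}
for e in trained_ids:
    sales[e] += 15

trained = [sales[e] for e in trained_ids]
control = [sales[e] for e in control_ids]

t_stat = welch_t(trained, control)
p_value = permutation_p(trained, control)
print(f"t = {t_stat:.2f}, one-sided permutation p = {p_value:.4f}")
```

Because assignment to conditions is random, a reliable difference between the two groups is hard to attribute to anything other than the training, which is what makes the inference strong.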

## The Weak Inference of Model Testing

A structural model test involves collecting data on a group of individuals and seeing if the result is consistent with a predetermined pattern of inter-relationships. This approach of comparing a set of inter-relationships to an expected pattern is based on the logic of deductive inference. This means that one reaches or deduces a logical conclusion based on assumptions that are not tested. For example, here are two assumptions leading to a deductive inference.

- **Assumption 1**: All men own hats
- **Assumption 2**: Bob is a man
- **Conclusion**: Bob owns a hat

Note that if Assumptions 1 and 2 are true, the conclusion must logically be true. If every man in existence owns a hat and Bob is a man, then Bob must own a hat. However, the inference cannot work in reverse. Finding that Bob owns a hat does not tell you that the assumptions are correct. It is possible that Bob owns a hat even if one or both assumptions are false. If Bob is a man and owns a hat, it doesn’t mean all men own hats. If someone of unknown gender owns a hat, you could not logically conclude that the person is a man.
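The asymmetry in the hat example can be checked mechanically with a small truth table. The sketch below is only an illustration: `implies` and `is_valid` are hypothetical helper names, `p` stands for "Bob is a man," and `q` stands for "Bob owns a hat." An argument form counts as valid when no assignment of true and false makes every premise true and the conclusion false.

```python
from itertools import product


def implies(p, q):
    """Material implication: 'if p then q'."""
    return (not p) or q


def is_valid(premises, conclusion):
    """Valid if the conclusion holds in every case where all premises hold."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises)
    )


# The rule "all men own hats," applied to Bob: if p (man) then q (hat).
rule = lambda p, q: implies(p, q)

# Forward: the rule plus "Bob is a man" lets us conclude "Bob owns a hat."
forward = is_valid([rule, lambda p, q: p], lambda p, q: q)

# Reverse: the rule plus "Bob owns a hat" does NOT establish "Bob is a man."
reverse = is_valid([rule, lambda p, q: q], lambda p, q: p)

print(forward, reverse)  # prints: True False
```

The reverse direction fails because of the case in which the person is not a man but owns a hat anyway: both premises hold and the conclusion is false.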

Model testing is based on the same deductive logic. We start with a model which serves as our initial assumption. We then collect data to see if the data behave as the model would predict. That is, we test a proposition of the form

- If my model is correct, the data will fit the expected pattern

Finding the expected pattern counts as support for the model. However, that support rests on the assumption that the model is correct in the first place. If the only evidence that the model is correct is whether the data fit the expected pattern, the logic is circular. It would be analogous to the hat example in which we assume all men have hats and then conclude that we are correct because Bob is a man who owns a hat. That one man owns a hat is hardly strong evidence that our assumption about all men is correct. Bob might own a hat even though few other men do. We would need far more evidence than Bob’s hat ownership to reach a definitive conclusion.

Returning to modeling, finding that our data fit a particular model is like finding that Bob owns a hat. The results of our study are consistent with the model, but that is hardly convincing evidence for it. There are many potential models that can produce the same study results (the model equivalence problem), and in the typical modeling study, these other models cannot be ruled out. Merely knowing that our findings are consistent with one model does not tell us anything about all the other possible models. Similarly, knowing that Bob owns a hat tells us nothing about other men.
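A small simulation can make the model equivalence problem concrete. The sketch below is an illustration under invented assumptions: the variable names, the 0.6 path coefficients, and the helper functions are all hypothetical. It generates data from two different causal structures, a chain (X causes M, M causes Y) and a fork (M causes both X and Y). Both structures imply the same testable pattern, namely that the X-Y correlation equals the product of the X-M and M-Y correlations, so a fit check on that pattern cannot tell them apart.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def simulate_chain(n, seed):
    """Chain structure: X -> M -> Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [0.6 * xi + rng.gauss(0, 1) for xi in x]
    y = [0.6 * mi + rng.gauss(0, 1) for mi in m]
    return x, m, y


def simulate_fork(n, seed):
    """Fork structure: X <- M -> Y (M is a common cause)."""
    rng = random.Random(seed)
    m = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.6 * mi + rng.gauss(0, 1) for mi in m]
    y = [0.6 * mi + rng.gauss(0, 1) for mi in m]
    return x, m, y


def fit_gap(x, m, y):
    """Gap between observed r_xy and the value both structures imply."""
    return pearson(x, y) - pearson(x, m) * pearson(m, y)


gap_chain = fit_gap(*simulate_chain(20000, seed=1))
gap_fork = fit_gap(*simulate_fork(20000, seed=2))
print(f"chain gap = {gap_chain:.3f}, fork gap = {gap_fork:.3f}")
```

Both gaps should be close to zero apart from sampling error, even though X influences Y in the chain and has no influence on Y at all in the fork. Telling the two apart would take evidence beyond the fit itself, such as an experiment that manipulates X.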

## What Can We Conclude from a Model Test?

If a study finds that the results are consistent with a proposed model, what we can conclude is that the model is possibly correct, but we have no way of knowing how likely without additional information. To be able to make a stronger case for our model requires that we have more evidence than just the statistical model test. The best evidence would come from a simple experiment that enables us to make a much stronger inference. We might design an intervention study, for example, to see if our model can predict the outcome.

Unfortunately, it is common for models to be developed without sufficient evidence for their assumptions beyond the model test itself. Thus, at best we can conclude that the data were consistent with the model, but we cannot be sure why. This can be of some value for academic research as the basis for future studies to test underlying assumptions about what leads to what over time. For practitioners, the weak inference of model testing does not yield conclusions that are certain enough to be useful. Results are too preliminary to serve as the basis for actions in field settings. The mismatch between the dominance of model testing in the academic literature and the need for stronger inference by practitioners has contributed to the academic-practice divide in business, IO psychology, and other fields.



I’ve been a practitioner of model building. Your comments are right on the mark. Although very different, your comments about weak inference provoked me to think about Platt’s classic paper on strong inference, a paper published many years ago.

J. R. Platt, Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science 146, 347–353 (1964). doi: 10.1126/science.146.3642.347; pmid: 17739513

Thanks Irvin:

I’ll check out Platt’s paper.

Hi Paul. Your post reminded me of this 2018 article by Klaus Fiedler and co-authors:

Fiedler, K., Harris, C., & Schott, M. (2018). Unwarranted inferences from statistical mediation tests–An analysis of articles published in 2015. Journal of Experimental Social Psychology, 75, 95-102. https://www.sciencedirect.com/science/article/abs/pii/S0022103117300628

It is required reading in a master's-level stats course I teach for my local psychology department. As you know, mediation analysis has become very common in some areas of psychological research. But unfortunately, many users of mediation analysis seem to believe that observation of a statistically significant indirect effect of X on Y through M *proves* that X causes M and M causes Y. Ironically, if I showed these same individuals a scatter-plot including only 2 variables (X and Y, or M and Y), most of them would shout me down if I claimed that correlation implies causation. It’s a strange world! 😉

Hi Bruce:

I couldn’t agree more.