Of the increasing number of books I have read in relation to statistics (yes, this is worrying me!), few have changed the way I think about and report studies as much as ‘Understanding The New Statistics’. Within this highly recommended read, authored by Geoff Cumming (2012), I was particularly drawn to his argument for an estimation way of thinking about statistics.
Estimation thinking within statistics focuses on the size of a relationship between X and Y, or the size of a difference between groups. This school of thought asks questions such as ‘to what extent?’ or ‘how much?’, whereas the popular, dichotomous way of thinking embodied in null-hypothesis significance testing (NHST) merely tells us whether an effect exists or not, and thus asks only one simple question, namely, ‘is there an effect?’.
NHST is the staple of undergraduate statistics classes… even though it is apparent that many lecturers aren’t able to correctly describe a p value as ‘the probability of obtaining data at least as extreme as those observed if the null hypothesis were true’ (Haller & Krauss, 2002). But NHST suffers from a number of limitations…
- NHST doesn’t tell us whether an effect is important or not. For example, a small effect can be significant in a large sample (although I’m not saying small effects can have no practical importance); a short simulation sketch after this list illustrates the point. For the magnitude of effects we should look to the effect sizes advocated within estimation thinking.
- Just because a result is not significant, it doesn’t mean we can accept the null hypothesis, i.e. conclude that there is no effect in our study or, more precisely, in the population that our study’s sample represents. Equally, a significant result (i.e. p < .05 by conventional standards) doesn’t tell us that the null is false; it only tells us that data like ours would be unlikely if the null were true.
- The conventions we see when reporting NHST (e.g. a cut-off of p < .05) have little evidential basis. Fisher’s original approach was to present a range of p values, from values that suggested strong evidence for an effect (p < .01) to evidence that could be considered weak (p > .20). In comparison, Neyman and Pearson’s approach to significance testing was to set a pre-specified alpha level and decide whether or not to reject the null hypothesis in favour of the alternative hypothesis. Realistically, the NHST we often see in textbooks and taught in classrooms is an ‘ugly’ mish-mash of these two approaches to significance testing.
- NHST is also severely limited in the inferences it allows about our population as a whole, i.e. what our study sample is trying to represent. In contrast, the confidence intervals that go hand in hand with estimation thinking offer a range of plausible values for the population effect size, as well as indicating the precision of our estimate of what one could expect to find in our study’s population.
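To make the first and last of these points concrete, here is a minimal simulation sketch in Python. The numbers are entirely made up for illustration (two groups differing by only 0.02 standard deviations, with 100,000 people per group), and none of it comes from Cumming’s book. It shows that with a very large sample even a trivially small difference will usually come out ‘significant’, while a confidence interval for the difference reveals both how small the effect is and how precisely it has been estimated.

```python
# Sketch: "significant" is not the same as "important".
# With a huge sample, a tiny true difference (0.02 SD) typically gives p < .05,
# while the confidence interval shows just how small that difference is.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000                                          # per-group sample size (illustrative)
group_a = rng.normal(loc=0.00, scale=1.0, size=n)
group_b = rng.normal(loc=0.02, scale=1.0, size=n)    # true difference of only 0.02 SD

t, p = stats.ttest_ind(group_a, group_b)
print(f"p = {p:.4f}")                                # usually < .05 despite a trivial effect

# 95% CI for the mean difference: a range of plausible population values
diff = group_b.mean() - group_a.mean()
se = np.sqrt(group_a.var(ddof=1) / n + group_b.var(ddof=1) / n)
print(f"difference = {diff:.3f}, 95% CI = ({diff - 1.96*se:.3f}, {diff + 1.96*se:.3f})")
```

The p value alone simply says ‘an effect exists’; the interval makes it obvious that the effect, although reliably detected, is tiny.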
So why the popularity of NHST? Well, it may be that we simply crave the reassurance of all-or-nothing thinking (cf. Dawkins, 2004); indeed, one sees evidence of dichotomous thinking in the media every day… people are described as introverted or extroverted, racist or not racist, sexist or not sexist… and so on. Yet psychologists will tell you that behaviours generally occur on a continuum; likewise, statisticians such as Cumming (2012) suggest that the best way for us to think about and explain our research findings is to account for the continuum of what the results actually tell us.
In his book, Cumming (2012) argues that we should move towards an estimation approach to statistical inference, broadening the way that we, and just as importantly our readers, think about statistics. Estimation thinking focuses on the best point estimate of the parameter we are interested in, expressed as an effect size, which gives some indication of the magnitude of our results and allows us to compare easily across studies. For example, by reporting Cohen’s d along with means and standard deviations, we make it possible for other researchers to compare studies and to bring studies together to better inform our scientific theories (in the form of meta-analysis). As previously mentioned, the confidence interval around this effect size can also tell us more about what we would expect to see in replications of our study, and thus inform us about the parameter measured in our population as a whole.
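As a rough sketch of what this looks like in practice, the snippet below computes Cohen’s d from group means and standard deviations and puts an approximate 95% confidence interval around it. The group sizes and scores are hypothetical, and the interval uses a common large-sample approximation for the standard error of d rather than any method specific to Cumming’s book.

```python
# Sketch: Cohen's d from summary statistics, plus an approximate 95% CI.
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for d using a large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

# Hypothetical study: intervention group vs control group
d = cohens_d(mean1=105.0, sd1=15.0, n1=40, mean2=100.0, sd2=15.0, n2=40)
low, high = d_confidence_interval(d, 40, 40)
print(f"d = {d:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Reporting d with its interval (rather than a bare p value) is exactly the kind of summary that lets later researchers compare results across studies or pool them in a meta-analysis.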
I hope you can see that by promoting estimation thinking through the use of effect sizes and confidence intervals, researchers not only gain stronger inferences from their own data but also help others to interpret and build on that data, together building stronger theories, as opposed to merely proposing that X differs from Y or that X predicts Y. Surely this move away from mere ‘whether or not’ NHST thinking can only benefit researchers and, in turn, end users alike.
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.
Dawkins, R. (2004). A devil’s chaplain: Reflections on hope, lies, science, and love. New York, NY: Houghton Mifflin Harcourt.
Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers. Methods of Psychological Research, 7(1), 1-20.