Why always p? Moving from dichotomous thinking to estimation thinking in statistics

Of the increasing number of books I have read in relation to statistics (yes this is worrying me!), few have changed the way I think about and report studies in the way that ‘Understanding The New Statistics’ has.  Within this highly recommended read, authored by Geoff Cumming (2012), I was particularly drawn to his argument for an estimation manner of thinking about statistics.

Estimation thinking within statistics focuses on the size of a relationship between X and Y or the size of a difference between groups.  This school of thought asks such questions as ‘to what extent?’ or ‘how much?’; whereas the popular, dichotomous way of thinking employed in null-hypothesis significance testing (NHST) merely tells us whether an effect exists or not, and thus asks only one simple question, namely, ‘is there an effect?’.

NHST is the staple of undergrad statistics classes, even if it is apparent that many lecturers aren’t able to correctly describe the p value it produces as ‘the probability of finding an effect at least as large as the one observed if the null hypothesis were true’ (Haller & Krauss, 2002).  But NHST suffers from a number of limitations…

  1. NHST doesn’t tell us whether an effect is important or not.  For example a small effect can be significant in a large sample (although I’m not saying small effects can have no practical importance).  For magnitude of effects we should look to the effect sizes advocated within estimation thinking.
  2. Just because a result is not significant, it doesn’t mean we can accept the null hypothesis, i.e. that there is no effect in our study or, more precisely, in the population that our study’s sample represents.  Likewise, a significant value (i.e. p < .05 by conventional standards) doesn’t tell us that the null is false, just that data like ours would be highly unlikely if it were true.
  3. The actual conventions we see when reporting NHST (e.g. a cut-off of p < .05) have little evidential basis.  Fisher’s original approach was to present a range of p values, from values that suggested strong evidence for an effect (p < .01) to values that could be considered weak evidence (p > .20).  In comparison, Neyman and Pearson’s approach to significance testing was to set a pre-specified alpha level and choose whether or not to reject the null hypothesis and accept the alternative hypothesis.  Realistically, the NHST we often see in textbooks and taught in classrooms is an ‘ugly’ mish-mash of these two approaches toward significance testing.
  4. NHST is also severely limited in the inferences it gives us about our population as a whole i.e. what our study sample is trying to represent.  In contrast, the confidence intervals commensurate with estimation thinking offer a range of plausible values that the population effect size could be, as well as indicating the precision of our estimate in relation to what one could expect to find in our study’s population.
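
To make point 1 concrete, here is a minimal sketch (my own illustration, not taken from Cumming's book) of how the same small standardised difference goes from 'non-significant' to 'significant' purely because the sample grows. It assumes two equal-sized groups, a fixed observed difference of d = 0.1 and a large-sample z approximation; the function name is made up for this example.

```python
import math

def two_sided_p_for_d(d, n_per_group):
    """Approximate two-sided p value for a standardised mean difference d
    between two groups of n_per_group each (large-sample z approximation)."""
    # With equal group sizes, z = d / sqrt(2 / n) = d * sqrt(n / 2).
    z = d * math.sqrt(n_per_group / 2)
    # Standard normal CDF evaluated at |z|, via the error function.
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# Same effect size every time; only the sample size changes.
for n in (20, 200, 2000):
    print(n, round(two_sided_p_for_d(0.1, n), 4))
```

The effect size never changes across the three runs, yet only the largest sample dips below p = .05, which is why the p value alone says little about magnitude.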

So why the popularity of NHST?  Well, it may be that we simply crave the reassurance of all-or-nothing thinking (cf. Dawkins, 2004); indeed one sees evidence of dichotomous thinking in the media every day…people are described as introverted or extroverted, racist or not racist, sexist or not sexist…and so on.  Yet psychologists will tell you that behaviours generally occur on a continuum; likewise statisticians such as Cumming (2012) suggest that the best way for us to think about and explain our research findings is to account for the continuum of what the results actually tell us.

In Cumming’s (2012) book he argues that we should move towards an estimation approach to statistical inference, broadening the way that we, and just as importantly, our readers, think about statistics.  Estimation thinking focuses on the best point estimate of the parameter we are interested in.  This is represented by effect sizes, which give some indication of the magnitude of our results and allow us to compare easily across studies.  For example, by calculating Cohen’s d, along with means and standard deviations, we can collaborate with other researchers to compare studies and also bring studies together to better inform our scientific theories (in the form of meta-analysis).  As previously mentioned, the confidence intervals around this effect size can also tell us more about what we would expect to see in replications of our study, and thus inform us about the parameter measured in our population as a whole.
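
As a rough illustration of what reporting an effect size with its interval might look like, the sketch below computes Cohen's d for two independent groups along with an approximate 95% confidence interval. Note the assumptions: the scores are made up, the function name is my own, and the interval uses a common large-sample standard error formula rather than the exact noncentral-t approach, so treat it as an approximation only.

```python
import math
import statistics

def cohens_d_with_ci(group_a, group_b, z_crit=1.96):
    """Cohen's d (pooled SD) with an approximate large-sample CI."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    # Large-sample approximation to the standard error of d.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, d - z_crit * se, d + z_crit * se

# Hypothetical scores, purely for illustration.
scores_a = [5, 6, 7, 8, 9]
scores_b = [3, 4, 5, 6, 7]
d, lower, upper = cohens_d_with_ci(scores_a, scores_b)
print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With only five people per group the interval is very wide, which is exactly the kind of information about precision that a lone p value hides.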

I hope you can see that by promoting estimation thinking through the use of effect sizes and confidence intervals not only can one gain greater inference from their own data, but they can help other researchers to interpret and build on their data and together build stronger theories, as opposed to merely proposing that X differs from Y, or X predicts Y.  Surely this move away from mere ‘whether or not’ NHST thinking can only benefit researchers and, in turn, end users alike.

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.

Dawkins, R. (2004). A devil’s chaplain: Reflections on hope, lies, science, and love. New York, NY: Houghton Mifflin Harcourt.

Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers. Methods of Psychological Research, 7(1), 1-20.

Exam Time…..Revision / Essay Writing Tips (with a bit of science!) for Social Science Students

It’s that time of year again when exam fever has struck and some people are feeling a bit lost as to how to go about their revision. Maybe I am out on my own here, but I actually enjoyed accumulating all the new knowledge that comes through a revision session…but after all I have actively chosen to do a PhD, so draw your own conclusion from that! Anyway, whilst my exam days might be behind me for now, I have taken the experiences of myself and others to offer some suggestions on how to get it right, which are hopefully reflected in this post.


Searching for material 

For reasons unbeknown to me, it seems it is assumed that students will know how to conduct an effective literature search.  Even at Masters level, most of us certainly didn’t, and were very grateful to one of my former lecturers, who acknowledged this lack of direction and offered the guidance that I will share here.  As opposed to going through Universities’ own electronic resources, I have found it much more useful to use either Web of Knowledge http://www.webofknowledge.com or Scopus http://www.scopus.com/ for accessing papers.  These programs enable you to view papers by citations, year of publication, journal impact factor etc.  I guess the only caveat is you need to be connected to the University network to use them, but all Universities I have been at offered a VPN service, so you can log onto the network from your personal computer at home.

I had three main ways of using these search tools to obtain appropriate materials for my exam answers:-

1) Conducting a literature search and ranking papers by the most cited.  For example, I could type in a fairly broad term such as ‘Personality AND Job Performance’ and then sort articles by those that had been cited the most…you must take into account that older papers have had more opportunity to be cited, but this will give you an overview as to the influential papers within the area of your exam question.


2) Conducting a literature search and ranking papers by the most recent.  A decent reading list will probably give you most of the papers you uncover in tip 1; however, it is likely that more recent papers will not be included.  Thus you can impress your exam marker by showing you have engaged with recent developments in the literature.


3) Search for papers that have cited those on your reading list.  Most reading lists tend to cover general overviews of the literature such as meta-analyses.  Therefore you can build on the information contained within the reading list.  For example, I can click on the ‘cited by’ link for Barrick and colleagues’ meta-analysis on personality and job performance and it will give me a list of papers that have cited this one.  These papers are often more nuanced and feature critical views and support that will help your essay answer; again these papers can be ranked by recency and number of citations to give you a guide to their relevance.


Top tips for revising and writing answers

  • Plan – Have a detailed, flexible and progressive revision timetable, making sure you give appropriate focus to all topics (not just the ones you like).  If you have a fair idea what question the exam will give you, I would recommend planning an exam answer beforehand.  I wish I had done this more!
  • Give detailed answers – Obviously it is the norm in the UK to reference within your exam essay so you should definitely do this, but where you can you should give sufficient detail (e.g. correlations, effect sizes) to show you have actually read the paper.  For example, instead of just saying “Barrick et al. (2001) found Conscientiousness to predict job performance”, go deeper than this and remark “In Barrick et al.’s (2001) meta-analysis, Conscientiousness emerged as the strongest predictor of job performance (r = .20), dwarfing the other five-factor traits, such as Neuroticism (r = .09), which suggests that those who are goal driven and dutiful may perform better at work”.
  • Be critical – It is not good enough to merely show you have learnt the literature by rote; you must argue in both directions and come down on one side.  No academic paper is perfect and the authors will acknowledge the limitations themselves (which you can use) and will make recommendations for future research, which, if featured in more recent papers, can help shape your own conclusions and suggestions for future research.  Likewise, tip 3 above will often bring up papers and developments that contradict or critique the papers on your reading list.  Use these criticisms to your advantage!
  • Have a logical structure – Whilst university exam marking schemes value innovation, your marker still has certain areas they have to stick to when marking your paper.  Thus, your essay answer must have an intro, main body and a conclusion and follow some sort of coherent structure (this is where the planning helps!).  If you missed out a point, rather than writing it out of context, put an asterisk (or mark) in the gap and write that paragraph elsewhere with the same mark next to it, thus showing where it fits in.  If you are running out of time, make sure you round off with a quick summary, in order to ‘tick those boxes’.

And what the science says about revision

Professor Dunlosky’s (2013) study, published in ‘Psychological Science in the Public Interest’ and summarised in this BBC article http://www.bbc.co.uk/news/health-22565912 , reviewed over a thousand different articles on 10 common revision techniques.  He found only two techniques to work consistently: (1) practice testing, where you constantly test yourself by writing down your knowledge, covering up answers etc., and (2) distributed practice, i.e. spreading out practice over time and mixing up the topics revised.  Some techniques, such as using highlighters, were found to hinder performance; and, contrary to popular opinion, Dunlosky believes that his findings generalise across the majority of individuals.  From a personal perspective I can attest to these findings and have also found mixing up my revision location and methods of practice testing a good means of relieving boredom and maintaining self-motivation (and my sanity!).

I hope you have found this brief(ish) overview useful (and not just a mere distraction from your own study!), please feel free to comment and share with others.  And finally, to those with exams coming up…good luck!!



To the next step…moving from Baron and Kenny to bootstrapping in mediation analysis

Over the past month, I have been enjoying the excellent new books on statistical mediation from both Paul Jose and Andrew Hayes.  Mediation analysis seeks to explain the mechanism through which one variable influences another and is arguably one of the most important skills for a researcher in the social sciences.

As someone who was taught the Baron and Kenny ‘causal steps’ method (and has subsequently taught this to others) reading about a more modern approach, advocated by Jose and also Hayes, has thrown into doubt much of what I had learnt about mediation.  In this blog I seek to summarise the critiques of Baron and Kenny’s work made by both authors and point the reader in the direction of a new approach to mediation.

As you may recall, the Baron and Kenny method assumes that certain steps must be met for mediation to occur, the latter three of which are outlined in the diagram below.  First, they propose that X must predict Y in the absence of the mediator (M) for there to be an effect to mediate (path c).  Secondly, X must predict M (path a).  The crucial third and fourth steps are that M must predict Y (path b) and that, once M is included, the direct relationship between X and Y (path c’) is either reduced in size (partial mediation) or reduced to non-significance (complete mediation).  If any of these four steps is not met, one effectively writes off any prospect that an indirect effect (i.e. a x b), and therefore mediation, could occur.  Hopefully this blog will make it clear why such an approach is both unnecessary and illogical.
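
For readers who prefer to see the paths as regressions, here is a minimal sketch of how the four coefficients relate to one another. Everything in it is an assumption for illustration: the data are simulated, the true coefficients (0.5, 0.4, 0.1) are arbitrary, and the helper names are my own. The total effect is labelled c, the direct effect c_prime, and the indirect effect is a x b.

```python
import random

def cross(u, v):
    """Sum of centred cross-products of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def mediation_paths(x, m, y):
    """Ordinary least squares estimates of paths c, a, b and c'."""
    sxx, smm, sxm = cross(x, x), cross(m, m), cross(x, m)
    sxy, smy = cross(x, y), cross(m, y)
    det = sxx * smm - sxm ** 2
    return {
        "c": sxy / sxx,                              # Y on X alone (total effect)
        "a": sxm / sxx,                              # M on X
        "b": (smy * sxx - sxm * sxy) / det,          # Y on M, controlling for X
        "c_prime": (sxy * smm - sxm * smy) / det,    # Y on X, controlling for M
    }

# Simulated data with arbitrary, hypothetical path values.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

paths = mediation_paths(x, m, y)
print(paths)
print("c - c' =", paths["c"] - paths["c_prime"], " a*b =", paths["a"] * paths["b"])
```

In ordinary least squares the total effect decomposes exactly as c = c’ + a x b, so the drop from c to c’ that the causal steps inspect is the indirect effect itself; the steps simply never test it directly.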


Baron and Kenny’s 1986 paper outlining this approach is arguably one of the most influential works in psychology to date, having been cited over 35,000 times (Field, 2013).  Further, the relative simplicity of this approach has led to its establishment as the staple method of teaching mediation within the classroom.

However, considering the advances made since 1986 in terms of both methodological knowledge and computing power, it is no longer thought to be the optimal method of conducting mediation analysis.  Here the limitations of this causal steps method are discussed and recommendations are made for future practice, which will become imperative if one wishes to produce publishable work in the near future (Hayes, 2013).

Firstly, it should be noted that Baron and Kenny’s method produces no test that the indirect effect ab (and therefore mediation) has occurred; instead it merely provides supposed antecedents, i.e. the causal steps outlined previously, that they propose must be met to enable mediation to occur.  I highly recommend downloading Hayes’ PROCESS plug-in for SPSS (see http://afhayes.com/introduction-to-mediation-moderation-and-conditional-process-analysis.html), which allows one to produce output for the indirect effect (a x b in the above figure), including confidence intervals and effect sizes.  It should be noted that Baron and Kenny did advocate the use of the Sobel test to test this indirect effect, yet the fact that it wasn’t explicitly outlined as one of their ‘steps’ means it has been commonly overlooked by researchers.  Further, given the lack of power of the Sobel test and its reliance on a normal sampling distribution, Hayes (2013) recommends the use of a bootstrapping method to calculate a confidence interval for the indirect effect (available in PROCESS), which doesn’t suffer from such limitations.
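
To give a feel for what bootstrapping the indirect effect involves, here is a minimal sketch of a plain percentile bootstrap. This is not PROCESS's code or output; the data are simulated, the path values (0.5 and 0.5) are arbitrary, and the function names are hypothetical. The idea is simply to resample cases with replacement, re-estimate a x b each time, and read the interval off the resulting distribution.

```python
import random

def cross(u, v):
    """Sum of centred cross-products of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def indirect_effect(x, m, y):
    """a*b, where a is M on X and b is Y on M controlling for X (OLS)."""
    sxx, smm, sxm = cross(x, x), cross(m, m), cross(x, m)
    sxy, smy = cross(x, y), cross(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample whole cases (rows) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lo_idx = int(round((1 - level) / 2 * n_boot))
    hi_idx = int(round((1 + level) / 2 * n_boot)) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Simulated data: X affects Y only through M (hypothetical values).
rng = random.Random(2013)
x = [rng.gauss(0, 1) for _ in range(300)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + rng.gauss(0, 1) for mi in m]

point = indirect_effect(x, m, y)
lower, upper = bootstrap_ci(x, m, y)
print(f"indirect effect = {point:.3f}, 95% bootstrap CI [{lower:.3f}, {upper:.3f}]")
```

Because the interval comes from the resampled estimates themselves, it makes no assumption that a x b is normally distributed, which is the main weakness of the Sobel test.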

On a similar note, a further criticism of Baron and Kenny’s method is that some of the steps they propose are unnecessary, given that it is only really the indirect effect that matters in a mediation analysis (although the direct effect can aid interpretation).  Baron and Kenny suggest that X must significantly predict Y in the absence of the mediator (i.e. the total effect) for there to be an effect to mediate; although this point seems logical, it is certainly not the case.  For example, one may have a situation where the total effect is clouded by the fact that two sets of people, e.g. males and females, differ in their relationship between X and Y.  If these individuals are represented in similar numbers and the relationships are of a similar magnitude (albeit in opposite directions), they will cancel one another out.  Similarly, if a subset of individuals who show non-significant relationships between X and Y is overrepresented within a sample, then this may explain a non-significant total effect.  Further still, although Baron and Kenny propose the mediator should always predict the dependent variable, one should acknowledge that a strong relationship between X and M could inflate the standard error for the mediator’s coefficient and negatively impact upon this causal step.
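
The cancelling-out scenario is easy to reproduce with simulated data. In the sketch below the two groups, their equal sizes and the slopes of +0.5 and -0.5 are all hypothetical values chosen to mirror the example above; the helper name is my own.

```python
import random

def slope(x, y):
    """Simple regression slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(7)

# Group 1: X relates positively to Y (hypothetical slope of +0.5).
x_male = [rng.gauss(0, 1) for _ in range(1000)]
y_male = [0.5 * xi + rng.gauss(0, 1) for xi in x_male]

# Group 2: X relates negatively to Y (hypothetical slope of -0.5).
x_female = [rng.gauss(0, 1) for _ in range(1000)]
y_female = [-0.5 * xi + rng.gauss(0, 1) for xi in x_female]

male_slope = slope(x_male, y_male)
female_slope = slope(x_female, y_female)
total_slope = slope(x_male + x_female, y_male + y_female)

print(f"males: {male_slope:.2f}, females: {female_slope:.2f}, pooled: {total_slope:.2f}")
```

The pooled slope sits close to zero even though X clearly matters in each group, so insisting on a significant total effect before looking any further would miss what is going on.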

Perhaps the biggest problem levelled at Baron and Kenny’s method is their use of the terms ‘complete’ mediation and ‘partial’ mediation.  The term complete mediation suggests that one has accounted for all of the total effect of the relationship between X and Y and may guide a researcher’s discussion and future study in such a direction, yet in reality one could have multiple ‘complete’ mediators of a relationship.  Further, in cases where there is actually partial mediation, a finding of complete mediation may be a mere reflection of an inability to detect the direct effect through a lack of statistical power, yet these results are often championed more than if a larger sample had been able to detect the direct effect and thus ‘only’ find partial mediation.  The celebration of partial mediation is also somewhat illogical, in that all psychological variables are essentially mediated by something, so the occurrence of a significant direct effect is merely a reflection of model misspecification.

I hope this blog helps outline some of the limitations of Baron and Kenny’s method and encourages readers to explore the books by Paul Jose and Andrew Hayes (listed below), which provide worked examples using the PROCESS plug-in for SPSS; both are very reasonably priced online.  For those who wish to borrow these books but are unable to access them in their library as yet, the newest edition of Andy Field’s ‘Discovering Statistics using SPSS’ book features an excellent overview of this modern approach to mediation, including modelling and interpreting the indirect effect, as well as suggestions for the write-up of results.

Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). London: Sage.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis. New York, NY: Guilford.

Jose, P. E. (2013). Doing statistical mediation and moderation. New York, NY: Guilford.