By Don Lincoln, Ph.D., University of Notre Dame
We see statistics used all the time in the media, for example in election polls and other surveys. But what about another place where you hear about statistics in the media: the world of medical claims?
What Is a ‘Significant’ Result?
Scientists often say that a result is ‘significant’, which means that the tested hypothesis is supported by the data. What exactly does the term ‘significant’ mean? Start with two groups: a test group and a control group. Take, as an example, the claim that eating waffles causes baldness. The test group is fed waffles and the control group isn’t. Then the two groups are compared for baldness.
The core trick is to start by assuming that waffles have no effect on baldness (statisticians call this the null hypothesis). You then ask how likely it is that the amount of baldness you see among the waffle-chomping group could be explained by simple random fluctuations.
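The ‘assume no effect, then check against random fluctuation’ logic can be made concrete with a permutation test, which is one common way of doing this (the article doesn’t say which test it has in mind). The sketch below is a minimal illustration under stated assumptions: the function name `permutation_p_value` and the group data are hypothetical, each person is recorded as 1 (bald) or 0 (not bald), and the check is one-sided. If waffles really do nothing, the ‘waffle’ and ‘no waffle’ labels are arbitrary, so shuffling them shows how big a gap random chance alone tends to produce.

```python
import random


def permutation_p_value(test_group, control_group, n_shuffles=10_000, seed=0):
    """Hypothetical helper: estimate how often random relabeling alone produces
    a gap in baldness rates at least as large as the observed one
    (one-sided: test group balder than control group).

    Each group is a list of 0/1 values: 1 = bald, 0 = not bald.
    """
    rng = random.Random(seed)
    n_test = len(test_group)
    observed_gap = sum(test_group) / n_test - sum(control_group) / len(control_group)

    pooled = list(test_group) + list(control_group)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Null hypothesis: waffles do nothing, so group labels are arbitrary.
        rng.shuffle(pooled)
        fake_test, fake_control = pooled[:n_test], pooled[n_test:]
        gap = sum(fake_test) / len(fake_test) - sum(fake_control) / len(fake_control)
        if gap >= observed_gap:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Made-up data: 18 of 50 waffle-eaters are bald vs. 10 of 50 in the control group.
waffle_group = [1] * 18 + [0] * 32
control_group = [1] * 10 + [0] * 40
print(permutation_p_value(waffle_group, control_group))
```

The returned number is the fraction of shuffles that produced a gap at least as large as the real one. A small value means the observed difference would rarely arise by chance if waffles had no effect; it does not mean waffles are 95 percent likely to cause baldness.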
Now, there are many statistical tests for doing this that you might have heard of. They all have pros and cons, and they differ in technical ways. But they do have one thing in common: they tell you whether the thing you’re testing seems to matter or whether the result could easily be a fluke. And the yardstick most commonly used to make that call is called the ‘95 percent confidence limit’.
This is a transcript from the video series Understanding the Misconceptions of Science.
Misunderstanding the 95 Percent Confidence Limit
So what is a 95 percent confidence limit? Most people hear the term and think they know what it means. And almost everybody is wrong about that.
Most people will say that a result is real if it meets the 95 percent confidence limit. They assume that this means it is 95 percent likely that the thing you’re testing (in the example above, eating waffles) is the cause of the outcome (in this case, increasing baldness). That is simply not what it means.
The 95 percent confidence limit does not tell you the probability that your hypothesis is correct. Instead, it tells you how likely it is that a connection as strong as the one you observed could arise by random chance alone.
What the statistical test actually does is assume that waffles don’t cause baldness. Then you look at what you measured and ask how likely such a measurement would be if that assumption were true. For most reports you will encounter in the media, on topics like medicine, sociology, psychology, and political science, scientists use the 95 percent confidence threshold (the confidence limit) to claim that something matters.
Less than a One in Twenty Chance
Assume that the thing you are testing has no effect on the outcome you are measuring. What range of outcomes would you expect to observe if that were true, and how probable is each? If the outcome you actually observed would happen purely by chance less than 5 percent of the time, then you have reason to think that the thing you’re testing might matter. This is what the 95 percent confidence limit means.
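Here is a hedged sketch of that recipe for a simpler setup than the two-group comparison: suppose, hypothetically, that the baldness rate among people who don’t eat waffles is already known to be 30 percent. Assuming waffles have no effect, the number of bald people among n waffle-eaters then follows a binomial distribution, so the chance of seeing k or more bald people by luck alone can be computed exactly. The names `tail_probability` and `passes_95_percent_limit`, and all the numbers, are illustrative assumptions.

```python
from math import comb

ALPHA = 0.05  # the "1 in 20" threshold behind the 95 percent confidence limit


def tail_probability(n, k, p_null):
    """P(X >= k) for X ~ Binomial(n, p_null): the chance of seeing k or more
    bald people out of n purely by chance if waffles have no effect."""
    return sum(comb(n, i) * p_null**i * (1 - p_null) ** (n - i) for i in range(k, n + 1))


def passes_95_percent_limit(n, k, p_null, alpha=ALPHA):
    """True if an outcome this extreme would occur by chance less than
    alpha (5 percent) of the time under the no-effect assumption."""
    return tail_probability(n, k, p_null) < alpha


# Hypothetical baseline: 30 percent of non-waffle-eaters are bald.
print(passes_95_percent_limit(n=100, k=40, p_null=0.30))
print(passes_95_percent_limit(n=100, k=33, p_null=0.30))
```

With these made-up numbers, 40 bald people out of 100 comes out under the 5 percent cutoff, while 33 does not. The check is one-sided: it only asks whether waffle-eaters are balder than expected.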
But there is a consequence of this that you should be aware of. Remember that 5 percent simply means 1 time in 20. That is not nearly as rare as it sounds: if waffles truly had no effect and you ran the same waffle study 20 times, you would expect about one of those studies to pass the 95 percent confidence limit by chance alone. And it’s even less rare if you run lots of tests.
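How quickly ‘less rare’ adds up can be checked with one line of arithmetic, assuming the tests are independent and that none of the tested effects is real. Each test then has a 95 percent chance of correctly coming up empty, so the chance that all m tests come up empty is 0.95 raised to the power m. The helper name below is hypothetical.

```python
def chance_of_at_least_one_fluke(num_tests, alpha=0.05):
    """Probability that at least one of `num_tests` independent tests passes
    the threshold by chance alone, assuming no tested effect is real."""
    return 1 - (1 - alpha) ** num_tests


for m in (1, 5, 20, 100):
    chance = chance_of_at_least_one_fluke(m)
    print(f"{m:>3} independent tests -> {chance:.1%} chance of at least one fluke")
```

This prints about 5.0%, 22.6%, 64.2%, and 99.4%: by 20 tests a fluke is more likely than not, and by 100 tests it is close to certain. Real tests are often correlated, which changes the numbers, so treat this as a rough guide.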
Testing a Hypothesis
Suppose you have a hypothesis that painting your room certain colors will help you sleep better. If you think about it, that even sounds reasonable. A light beige or gray just sounds more soothing than, for example, hunter’s orange.
So, to test that, you go to your local hardware store and get one of those cards with lots of colors. Say the card has 100 colors. You paint 100 bedrooms, one in each color, and run your experiment. You find that in certain bedrooms people are indeed sleeping better. Is this enough to publish your findings in a paint-science journal?
Remember that your threshold is a 95 percent confidence limit, which means 1 time in 20. But you tested 100 colors. That means you should fully expect about 5 of those colors to pass the 95 percent confidence limit even if color has no actual effect on sleep. In other words, people in about 5 of the 100 rooms could appear to sleep better purely by chance, regardless of the color of the walls. Concluding that those colors help would therefore be hasty.
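The paint-color arithmetic can be checked with a small simulation. This is a sketch under stated assumptions, not a model of a real sleep study: each of the 100 colors gets 30 hypothetical sleepers, each sleeper’s change in sleep quality is pure random noise (mean 0, standard deviation 1) no matter the color, and a color ‘passes’ a one-sided z-test if its average change clears the 95 percent cutoff. The study size, the noise model, and the function name `colors_that_pass` are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean

NUM_COLORS = 100
SLEEPERS_PER_COLOR = 30  # hypothetical study size
ALPHA = 0.05
# One-sided cutoff: a z-score above this occurs only 5% of the time by chance.
Z_CUTOFF = NormalDist().inv_cdf(1 - ALPHA)  # about 1.645


def colors_that_pass(rng):
    """Run one fake study in which paint color has NO effect on sleep and
    return how many of the 100 colors still clear the 95 percent cutoff."""
    passing = 0
    for _ in range(NUM_COLORS):
        # Each sleeper's change is pure noise: mean 0, standard deviation 1.
        changes = [rng.gauss(0.0, 1.0) for _ in range(SLEEPERS_PER_COLOR)]
        # Known standard deviation of 1, so z = mean * sqrt(n).
        z = fmean(changes) * SLEEPERS_PER_COLOR ** 0.5
        if z > Z_CUTOFF:
            passing += 1
    return passing


rng = random.Random(42)
# Repeat the whole 100-color study many times to see the typical count.
trials = [colors_that_pass(rng) for _ in range(500)]
print("average number of 'winning' colors per study:", fmean(trials))
```

The printed average should land close to 5, even though, by construction, no color does anything at all.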
If you were testing just one color, the 95 percent confidence limit would be a reasonable standard. But if you test dozens, hundreds, or thousands of different configurations, you have to keep that in mind and make your requirements stricter accordingly. With a big set of measurements, it is easy for something that happens only 5 percent of the time to turn up out of pure random chance.
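The article doesn’t say which adjustment to use. One standard, deliberately strict choice is the Bonferroni correction: divide the 5 percent threshold by the number of tests. The sketch below applies it to a list of p-values, where a p-value is the chance of seeing a result at least as extreme as the observed one if the thing being tested had no effect. The p-values and the function name `significant_results` are made up for illustration.

```python
def significant_results(p_values, alpha=0.05, correct_for_multiple_tests=True):
    """Return the indices of tests whose p-value is below the threshold.

    With correction on, this applies the Bonferroni correction: the threshold
    is alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values) if correct_for_multiple_tests else alpha
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values for four paint colors (invented for illustration).
p_values = [0.01, 0.03, 0.0001, 0.2]
print(significant_results(p_values, correct_for_multiple_tests=False))  # [0, 1, 2]
print(significant_results(p_values))  # [0, 2] (threshold 0.05 / 4 = 0.0125)
```

With 100 paint colors, each color would need p < 0.0005 to count, which keeps the chance of even one fluke across the whole card at or below 5 percent. The trade-off is that such a strict threshold can also miss real effects, which is why less conservative procedures, such as the Holm method, also exist.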
This particular statistical sin, assuming that the 95 percent confidence limit describes the probability that the result itself is true rather than the chance that the link between cause and effect could be a random fluke, is very common in the medical reports you see in the media. Media reports on medical trials of all kinds often use the concept of the 95 percent confidence limit inaccurately.
This doesn’t mean that you shouldn’t trust statistics. Statistical methods work very well. But you should be suspicious of how people use statistics if they have an agenda.
Common Questions about the 95 Percent Confidence Limit
When scientists talk about a significant result, it means that they think that the tested hypothesis is supported by data.
For most reports you will encounter in the media, on topics like medicine, sociology, psychology, and political science, scientists use the 95 percent confidence threshold—the confidence limit—to claim that something matters.
Many people assume that meeting the 95 percent confidence limit means it is 95 percent likely that the thing being tested is the cause of the outcome being measured.
However, what it actually means is that, if the thing being tested had no effect, an outcome like the one observed would happen purely by chance less than 5 percent of the time.