
Statistical Evidence


I forgot to warn readers (since I didn’t realize it myself) that my schedule this summer would be variable, given some personal and professional commitments that mean posting will be less frequent in July and August.


Speaking of variability, that is one of the primary clues that a question you are trying to answer is statistical, rather than factual.


When Mark Twain said (or, more precisely, popularized the saying) that there are three kinds of lies: lies, damned lies, and statistics, he was likely pointing to the truism that people use statistical evidence to buttress untruths, just as false conclusions are propped up with facts all the time.

In cases where people present statistical information to support a questionable or even false conclusion, that evidence is just one more example of true premises not providing adequate support for the conclusion you are being asked to accept (in other words, statistics can be used as a premise in an invalid argument).


Given the human tendency to treat numerical information as superior to other types of evidence, drawing on statistics provides those pushing false (or true) conclusions a powerful means of persuasion. But statistical evidence, while effective if used responsibly, also runs into another problem: our brains are just not wired to think statistically.


For example, if a screening test for illegal drugs has a false positive rate of 3% (i.e., in 3% of cases, the test will show a positive result for people with no illegal drugs in their system), what is the likelihood that someone who tests positive actually took illegal substances? If you said 97%, would it surprise you to learn that the chances of that person having drugs in their system can actually be less than 10%?


An explanation of that calculation can be found here, and while it may be eye-opening (if you can follow the math), it is in no way intuitive. Our tendencies to overestimate or underestimate risk, or to mistake correlation for causation, are further examples of how our inability to think statistically can lead us astray.
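For those who want a rough sense of how that works, here is a minimal sketch of the underlying Bayes' theorem arithmetic in Python. The 3% false positive rate comes from the example above; the 0.3% prevalence and the perfect detection rate are illustrative assumptions on my part, not figures from the linked explanation:

```python
# A minimal sketch of the base-rate calculation behind the drug test example.
# The 3% false positive rate is from the example above; the prevalence and
# sensitivity below are illustrative assumptions.

prevalence = 0.003          # assumed: 3 in 1,000 people tested actually use drugs
sensitivity = 1.0           # assumed: the test catches every true user
false_positive_rate = 0.03  # from the example: 3% of clean subjects test positive

# Bayes' theorem: P(user | positive) = P(positive | user) * P(user) / P(positive)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_user_given_positive = (sensitivity * prevalence) / p_positive

print(f"P(drugs in system | positive test) = {p_user_given_positive:.1%}")
# With these assumptions the answer is about 9%, not 97%.
```

The counterintuitive result comes from the denominator: when very few people actually use drugs, the false positives generated by the huge clean majority swamp the handful of true positives.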


These kinds of problems are best illustrated using the three major news stories of 2020: COVID, the Presidential election, and police shootings.


In the first case, COVID, the public was flooded with statistical data meant to help us navigate risks and choices throughout the last eighteen months, as reports of rising and falling infection rates were broadcast over the airwaves, printed in newspapers, and pushed out through the Internet on a daily (sometimes hourly) basis. This evidence became an important component for tracking various approaches to slowing or stopping the disease (such as masking and social distancing), as well as evaluating how well vaccines were working. Those numbers continue to be used to help us gauge the risks posed by variants of the original virus that might threaten our recent return to “normalcy.”


Problems regarding public acceptance of statistics arose not only due to human bias (such as people’s tendency to embrace or ignore statistical information for partisan reasons) but also due to the dynamic nature of the phenomena those statistics were meant to track.


For example, an infection rate is, as the name says, a rate: it can be expressed as a fraction with the number of positive test results as the numerator and the total number of tests given during a time period as the denominator. Putting aside the issues of false positives and negatives mentioned earlier (which add uncertainty to the numerator of that fraction), lack of stability in the denominator can introduce even more variability.


How? Well, early on in the pandemic, testing kits and facilities were scarce, which meant tests were primarily given to those who were symptomatic. Since people with COVID symptoms were more likely to have the disease than those without symptoms, rates were bound to change once testing became widespread enough to be used regularly on the non-symptomatic, meaning our denominator changed its nature dramatically over the course of 2020 and into 2021.
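A toy calculation (with entirely made-up numbers) shows how far the measured rate can move when the tested population changes, even if the underlying disease does not:

```python
# A toy illustration (all numbers invented) of how the denominator shifts
# the measured positivity rate, even if the disease itself is unchanged.

def positivity_rate(positives, total_tests):
    """Positive results divided by total tests given in the period."""
    return positives / total_tests

# Early pandemic: scarce tests, given mostly to symptomatic people,
# who are far more likely to actually be infected.
early = positivity_rate(positives=300, total_tests=1_000)

# Later: widespread testing that includes many asymptomatic people,
# so the denominator balloons while positives grow much more slowly.
later = positivity_rate(positives=900, total_tests=20_000)

print(f"Early (symptomatic-only) positivity: {early:.1%}")  # 30.0%
print(f"Later (widespread) positivity:       {later:.1%}")  # 4.5%
# The measured rate fell from 30% to 4.5% even though raw positives tripled:
# the population in the denominator changed, not just the disease.
```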


Now, responsible parties in the healthcare and media communities did their best to control for and explain the dynamic nature of the statistics they were communicating. But that did not change the fact that the fractions underlying those statistics were a moving target.


Last year’s election presents other challenges related to variability, in this case the variability of people. Like most of you, I spent last Fall assuming that a Biden win was not just inevitable, but that his predicted landslide would be accompanied by a “Blue Wave” that would sweep Democrats into huge House and Senate majorities.


Yet on Election Night, the presidential race was not just much closer than expected, but Democrats actually lost seats in the House and the Senate ended up a messy tie.


So what happened? Well, in this case our assumptions were based on polls, which assumed that a subset of the voting public could be used to predict the behavior of the entire electorate. The fact that methodologies for performing such measurements had failed to deliver accurate predictions before (most notably in the 2016 presidential race) did not seem to blunt our appetite for mathematical certainty.


The uncertainty we ended up with arose from the fact that human beings are not gas particles: we have free will. We can choose not to talk to pollsters, or not to tell them what we really think when we do answer their questions. We change our means of communication (from listed landlines to unlisted cell phones) and hide our real beliefs until we enter a voting booth.


More importantly, circumstances change before, during, and after polls are conducted, meaning events (like a COVID surge or civil unrest) can cause us to change our minds. All of these factors – not to mention the faltering track record of pollsters – should make us treat statistical information involving human behavior more skeptically than we do.
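To make the sampling problem concrete, here is a toy simulation (every number in it is invented) of how differential nonresponse alone can tilt a poll, no matter how many people answer:

```python
# A toy simulation (all numbers invented) of how differential nonresponse
# can skew a poll even when the raw sample is large.
import random

random.seed(0)

# Assumed true split in a simulated electorate of 100,000 voters:
# 48% support candidate A, 52% support candidate B.
electorate = ["A"] * 48_000 + ["B"] * 52_000

# Assumption: B's supporters are half as likely to answer a pollster.
response_prob = {"A": 0.10, "B": 0.05}

respondents = [v for v in electorate if random.random() < response_prob[v]]
poll_share_a = respondents.count("A") / len(respondents)

print(f"Respondents: {len(respondents):,}")
print(f"Polled support for A: {poll_share_a:.1%} (true support: 48.0%)")
# With these assumptions A polls near 65% despite trailing in reality:
# thousands of responses cannot fix a sample that leans one way.
```

Real pollsters weight their samples to correct for exactly this kind of skew, but weighting only works if you know who is refusing to answer and why, which is precisely what the polling misses of 2016 and 2020 called into question.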


The final story that presented problems with statistical evidence was the police shootings that led to civil unrest in 2020. In this case, the problem was not that statistics were not available, but that they weren’t satisfying.


Again, this was not simply a case of bias (such as the biases that led partisans to choose their preferred statistics, or fights over whether confounding factors – like crime or poverty – should be included in the analysis). Rather, the trouble with stats of any provenance or accuracy is that they fail to capture the human cost of each individual death, especially one as grisly as the killing of George Floyd, caught on camera.


Our appropriate outrage over what we saw could not be ameliorated by the fact that such violence might be statistically less frequent than we believe, or mitigated by “confounders,” highlighting the benefits and challenges of the next topic I’d like to talk about: anecdotal evidence.


