Significance level vs. the type I error chance, or how to interpret conditionals

One of the 12 p-value misconceptions states:

  1. With a \(P = .05\) threshold for significance, the chance of a type I error will be 5%.

I find this one particularly counter-intuitive. Recall that a type I error occurs when we reject the null hypothesis even though the null hypothesis is in fact true. It would seem, then, that the chance of committing a type I error is \(p(\text{reject } H_0 | H_0\text{ is true})\), which is exactly the definition of the significance level!

The confusion here stems from the ill-defined mapping from the English language to the language of probability theory. Consider these semantically equivalent ways to state when a type I error occurs:

  1. Type I error occurs when we reject \(H_0\), given that \(H_0\) is true.
  2. Type I error occurs when \(H_0\) is true, given that we have rejected it.
  3. Type I error occurs when \(H_0\) is true and we reject it.

If we try to map these descriptions to probabilities by replacing “given” with the conditional probability bar “|” and “and” with a conjunction, we get three distinct probabilities:

  1. \(p(\text{reject } H_0 | H_0\text{ is true})\)
  2. \(p(H_0\text{ is true} | \text{reject } H_0)\)
  3. \(p(H_0\text{ is true} \wedge \text{reject } H_0)\), where \(\wedge\) means the logical “and”.
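To see that these really are three different numbers, here is a minimal sketch in Python. It assumes we are willing to assign a prior probability to \(H_0\) being true and that we know the power of the test; the function name and the input values (0.8, 0.05 and 0.6) are made up purely for illustration.

```python
from fractions import Fraction


def type_one_error_readings(prior_h0, alpha, power):
    """Return the three candidate probabilities for the given inputs.

    prior_h0: assumed probability that H0 is true (hypothetical input)
    alpha:    p(reject H0 | H0 is true), the significance level
    power:    p(reject H0 | H0 is false)
    """
    joint = prior_h0 * alpha                    # p(H0 is true AND reject H0)
    p_reject = joint + (1 - prior_h0) * power   # law of total probability
    return {
        "reject_given_h0": alpha,
        "h0_given_reject": joint / p_reject,    # Bayes' rule
        "h0_and_reject": joint,
    }


# Hypothetical inputs, chosen only for illustration.
readings = type_one_error_readings(
    prior_h0=Fraction(4, 5), alpha=Fraction(1, 20), power=Fraction(3, 5)
)
for name, value in readings.items():
    print(f"{name}: {float(value):.2f}")
```

With these inputs the three readings come out as 0.05, 0.25 and 0.04, so which one we mean clearly matters.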

However, in this particular case there is no ambiguity. This is because a “type I error” is an event (whose probability is the subject of the misconception), but \(\text{reject } H_0 | H_0\text{ is true}\) and \(H_0\text{ is true} | \text{reject } H_0\) are not events. (You can find a proof that “|” cannot be an operator constructing new propositions/events in Richard Jeffrey’s “Subjective Probability: The Real Thing”, section 1.5.)

Therefore, the only possible interpretation is the third one.
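
Spelled out with the product rule, and assuming we are willing to assign a probability \(p(H_0\text{ is true})\) to the null hypothesis (a quantity that the statement of the misconception never mentions), the third reading relates to the significance level as follows:

\[p(H_0\text{ is true} \wedge \text{reject } H_0) = p(\text{reject } H_0 | H_0\text{ is true}) \cdot p(H_0\text{ is true}) \le p(\text{reject } H_0 | H_0\text{ is true}).\]

So, under this reading, the chance of a type I error equals the significance level only if the null hypothesis is certain to be true, and is smaller otherwise.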

Here’s another scenario where interpreting conditionals may be ambiguous. You say “I bet you that if Bernie Sanders becomes the Democratic candidate in the 2020 election, he’ll defeat his opponent”. I take the bet, and we each put $10 on the respective outcomes. Now imagine Sanders does not win the Democratic primaries. Who wins the $10?

One way to resolve this is to say that our whole bet was conditional on Sanders becoming the Democratic nominee in 2020. Since he didn’t, the bet is called off and no one pays. Under this interpretation, the probability of you winning the bet is

\[p(\text{Sanders becomes president } | \text{ Sanders is nominated for president}).\]

The fact that we had to call off the bet reminds us that this is a conditional probability, not the unconditional probability of the “event”

\[\text{Sanders becomes president } | \text{ Sanders is nominated for president}.\]

Alternatively, you could say you were proven correct. Your statement essentially was

\[\text{Sanders is nominated for president } \Rightarrow \text{ Sanders becomes president}.\]

Since the antecedent is false (Sanders was not nominated), the rules of logic say that the whole implication is true. And since I took the other side of the bet, essentially betting on you being wrong, I lost.

The probability of you winning the bet under such an interpretation is

\[p(\neg\text{(Sanders is nominated for president) } \vee \text{ Sanders becomes president}),\]

(where \(\neg\) and \(\vee\) mean the logical “not” and “or”, respectively) because only in the case when Sanders is nominated but loses the election can I claim that your statement was wrong.
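
The gap between the two interpretations can be large. Here is a minimal sketch in Python that computes both; the function name and the inputs (a 30% chance of nomination and a 50% chance of winning once nominated) are hypothetical numbers picked for illustration, not estimates of anything.

```python
from fractions import Fraction


def bet_win_probabilities(p_nominated, p_wins_given_nominated):
    """Return your chance of winning the bet under both readings."""
    # Reading 1: the bet is called off unless the nomination happens,
    # so only the conditional probability matters.
    conditional = p_wins_given_nominated
    # Reading 2: the statement is a material implication, which is false
    # only when Sanders is nominated and then loses.
    p_nominated_and_loses = p_nominated * (1 - p_wins_given_nominated)
    material = 1 - p_nominated_and_loses
    return conditional, material


# Hypothetical inputs, chosen only for illustration.
conditional, material = bet_win_probabilities(Fraction(3, 10), Fraction(1, 2))
print(conditional, material)  # 1/2 17/20
```

Under the second reading, most of your winning probability comes from the cases where Sanders is not nominated at all.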