Wow, #epibookclub, we’re officially halfway through the #bookofwhy!
Chapter 6 is a doozy... we’re going to learn all about paradoxes!! Get ready to have your mind blown!! 🤯🤯🤯
Given last week’s tragic fail on paradox gifs, I made you our very own #epibookclub gif! It’s paradox time!!
(Please be kind this is my very first homemade gif 😂😂)
I like the chapter set-up this week: @yudapearl explains the importance of spending time working out these paradoxes... they can give us insight into how people process causal information. He says they're paradoxes b/c they straddle the rungs of the #ladderofcausation
So, up we go!
We begin with the Monty Hall problem. I found the data table (Table 6.1) to be really helpful.
For simplicity, whichever door you picked is Door 1.
Here’s Pearl’s table
If you don’t like to read, here’s an emoji version.
It shows the outcome (⁉️) for the 3 possible door (🚪), car (🚗), goat (🐐) combos.
Monty always picks a 🚪 with a 🐐, so we can easily tell from the table if you win (🎉) or lose (😫) by switching (🔄) or staying (⏹)
You’ve only got a 1 in 3 chance of winning if you stick with your original pick, but a 2 in 3 chance of winning if you switch!!
It seems like magic but it’s really because Monty gave you new information!
Monty’s choice is determined by your choice & the location of the car. That makes it a collider!
As we’ve learned, when you include the collider in your analysis, you induce a spurious association!
Here, Monty’s choice opens a path between your choice & the car!
As a second attempt to explain, Pearl introduces a simpler game: what if Monty is *also* guessing randomly?
Now, the location of the car doesn’t affect Monty’s choice, there’s no collider, and you don’t do any better whether you switch or stay!!
So, Monty’s choice only helps if we know how Monty is deciding!
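Don't take my word for it! Here's a quick simulation sketch in R (my own toy code, not from the book or my repo) that plays both versions of Monty's game:

```r
# Toy Monty Hall simulation: compare a knowing Monty vs a guessing Monty
set.seed(2019)
play <- function(n, monty_knows = TRUE) {
  car  <- sample(1:3, n, replace = TRUE)   # where the car really is
  pick <- sample(1:3, n, replace = TRUE)   # your first guess
  open <- sapply(seq_len(n), function(i) {
    ok <- if (monty_knows) setdiff(1:3, c(pick[i], car[i])) else setdiff(1:3, pick[i])
    if (length(ok) == 1) ok else sample(ok, 1)
  })
  keep <- open != car                      # a guessing Monty sometimes reveals the car; drop those rounds
  c(stay   = mean(pick[keep] == car[keep]),
    switch = mean((6 - pick[keep] - open[keep]) == car[keep]))  # 6 - pick - open = the other closed door
}
play(100000, monty_knows = TRUE)   # stay ~1/3, switch ~2/3
play(100000, monty_knows = FALSE)  # stay ~1/2, switch ~1/2
```

With a Monty who knows, switching wins about 2/3 of the time; with a guessing Monty, it's about 1/2 either way.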
The lesson here? How we obtain information is just as important as the information itself!
Good lesson for #epi, and good lesson for #life
Pearl finishes this section with a third attempt at explaining, this time using a Bayesian framework: a hypothesis that survives testing gets an updated probability of being true.
When Monty knows where the car is and picks Door 2, then you learn something about Door 2 & 3, but not Door 1
Door 1 wasn’t an option for Monty to pick (you already claimed it!), so you learn nothing about Door 1.
Result: the probability that the car is behind Door 1 doesn't change, but the probabilities for Doors 2 & 3 do!
When Monty doesn't know where the car is, his picking Door 2 (and luckily showing a goat) only rules out Door 2. Doors 1 & 3 stay equally likely (each now 1/2), so it doesn't matter if you switch or not!
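If you like seeing the Bayes math in actual numbers, here's a tiny sketch (my notation, nothing official): you picked Door 1 and Monty opened Door 2 to show a goat.

```r
# Bayes' rule for "Monty opens Door 2 and shows a goat", given you picked Door 1
prior <- c(1, 1, 1) / 3                  # car equally likely behind Doors 1, 2, 3

lik_knows <- c(1/2, 0, 1)                # P(opens 2 & goat | car behind 1, 2, 3) when Monty KNOWS
lik_guess <- c(1/2, 0, 1/2)              # same, but when Monty guesses between Doors 2 & 3

posterior <- function(lik) prior * lik / sum(prior * lik)
posterior(lik_knows)   # 1/3, 0, 2/3 -> switch!
posterior(lik_guess)   # 1/2, 0, 1/2 -> no advantage to switching
```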
That’s all for Monty Hall.
Next up, Berkson's bias: even if two diseases aren't related in the general population, if you look only at people who are hospitalized you can find a (spurious) association between them!
Now that we know DAGs, this is simple! Hospitalization is a collider, and restricting to hospitalized people conditions on it and opens the path
Pearl gives a couple other examples of collider bias caused by your selection process.
For ex, maybe u only date people who are cute or funny (or both), but never anyone who is neither cute nor funny. Then, among the people you date, the cute ones are more likely to be boring & vice versa!
Another example: flip 2 coins 100 times each & record only the times when at least 1 coin has heads.
Hypothesis: You’ll get about 75 records, within which the two coins are associated, even tho they are actually independent!!
Explanation: selection is a collider!!
I loved this example, and I think it would make a great Intro Epi lab exercise. So I got myself 2 coins and I did it!
•Among all 100 tosses, the 2 coins were independent (p=0.56).
•Among 73 tosses with 1 or 2 heads, the 2 coins were associated (p=0.000001).
Collider bias‼️
Wanna check the results? Here’s my data (all 100 tosses are included but the code also subsets to 1 or 2 heads), and some very basic R code to compare the two coins.
github.com/eleanormurray/…
A fun and easy way to show how selection can create bias! Another thing I learned? Tossing coins is actually kinda hard!
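If you don't have coins handy (or your tossing is as bad as mine), here's a minimal simulated version of the same experiment. Not my actual repo code, just a sketch of the idea:

```r
# Simulate 100 tosses of two fair coins and test them before & after selection
set.seed(100)
coin1 <- rbinom(100, 1, 0.5)   # 1 = heads, 0 = tails
coin2 <- rbinom(100, 1, 0.5)

fisher.test(table(coin1, coin2))               # all 100 tosses: independent

keep <- coin1 == 1 | coin2 == 1                # record only tosses with at least 1 head
fisher.test(table(coin1[keep], coin2[keep]))   # selected tosses: now they look associated!
```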
Finally we get to Simpson’s paradox. It boils down to this: if u have confounding u should look at adjusted results & if u have mediation u should look at unadjusted.
When u don’t know which you have, it can be confusing b/c adjusted & unadjusted could give opposite conclusions
Example 1: a drug fails to prevent heart attacks for men, fails for women, but seems to succeed overall.
Here, gender is most likely a confounder: it comes before the drug in time & can affect both who takes the drug and heart attack risk. So, we should adjust. The results for men and women separately are right: drug A doesn't work!
Example 2: a drug given to lower blood pressure & prevent heart attacks seems not to work within each blood pressure group but does overall.
Here, blood pressure is a mediator and adjusting blocks the effect. The unadjusted results are right: drug B does work!
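To see the arithmetic of the flip, here are some made-up counts (mine, not Pearl's) for the Example 1 set-up:

```r
# Hypothetical counts just to show the reversal: c(heart attacks, people)
women <- list(drug = c(48, 800), no_drug = c(10, 200))
men   <- list(drug = c(40, 200), no_drug = c(144, 800))

risk <- function(x) x[1] / x[2]
c(risk(women$drug), risk(women$no_drug))   # 6.0% vs 5.0%: drug looks worse in women
c(risk(men$drug),   risk(men$no_drug))     # 20.0% vs 18.0%: drug looks worse in men

# Pool everyone: women (low risk) mostly took the drug, men (high risk) mostly didn't
c(risk(women$drug + men$drug), risk(women$no_drug + men$no_drug))   # 8.8% vs 15.4%: drug "works" overall!
```

Same numbers, opposite stories: only the causal story (confounder vs mediator) tells you which one to believe.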
Finally, we get a version of Simpson’s paradox called Lord’s paradox.
Lord wants to know if children in a school gain weight over the year when a new diet is put in place. He looks at boys and girls separately, comparing average weight change between the genders, & sees no difference.
But then he looks within levels of September (baseline) weight, and suddenly there’s a difference!
What’s going on?
The causal q here is (weirdly): does gender cause weight gain? The first analysis is the right one to do because there is no confounding (nothing influences gender, so there's nothing to adjust for)!
If Lord had changed the example slightly to compare two diets instead of genders, now the situation is reversed!
Baseline weight *might* be a confounder for diet choice & if so, the adjusted analysis is correct!
(But if diet were randomized, unadjusted is still the way to go!)
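If you want to poke at Lord's paradox yourself, here's a toy regression-to-the-mean simulation (my own invented numbers, not Lord's data):

```r
# Both genders gain nothing on average, but adjusting for September weight "creates" a gender effect
set.seed(1)
n <- 5000
girl <- rep(c(0, 1), each = n)                 # 0 = boy, 1 = girl
group_mean <- ifelse(girl == 1, 55, 65)        # girls lighter on average (kg, invented)
sept <- rnorm(2 * n, group_mean, 5)            # September weight
june <- group_mean + 0.5 * (sept - group_mean) + rnorm(2 * n, 0, 5)   # no average gain in either group
gain <- june - sept

tapply(gain, girl, mean)                       # ~0 for both genders: Lord's 1st analysis, no difference
summary(lm(gain ~ girl + sept))                # adjust for Sept weight: a "gender effect" appears!
```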
This was a pretty intense chapter. I’m very curious to hear whether you understood it, especially if this is your first real foray into #causalinference.
Share your comments, questions, confusions, etc!
Which bias is the most 🤯🤯?