What does a P value mean?

A microtweetorial because I have been watching @anupampom and @JohnTuckerPhD chase each other like this for the last hour, which is distracting me from doing work.

Answer this and save them from circular argument.
#foamed #meded
Francisomycin, my new natural anti-atherosclerotic antibiotic, has now been published in NEJM to reduce atherosclerosis by 173 arbitrary units (p=0.03).

I am really, really pleased with the result and the fact that I have the patent on this magical stuff.
What does p=0.03 mean?

A. 3% of pts had reduced atherosclerosis?
B. 97% of pts had reduced atherosclerosis?
C. If a drug is ineffective, 3% chance that it produces an effect this extreme.
D. 3% probability that THIS drug is ineffective, i.e. that these results occurred by chance
I am breaching all rules of Twittiquette by revealing the voting pattern during the vote itself. This is as disastrous as I felt it would be, and indeed more elegantly perfectly disastrous than I could dream of.

At 30 votes, here is the distribution.
At 60 votes, it is 40:20 with the WRONG answer in the lead.

People who've answered already, don't give the game away please. Let me take everyone through this as it is difficult but fundamental to interpreting clinical trial data, or indeed any hypothesis testing (p values).
Francis Academy of Kale Endotherapeutics, one of my less successful ventures, produces a range of natural products.

They are organic, renewable and biodegradable. In short, they are lawnmower cuttings diluted in water.

F.A.K.E. has run a series of trials, FAKE-1, FAKE-2 etc.
Alas, despite enthusiastic YouTube videos from leading Lawnmower men, the sliced grass and water mixture has no actual health benefit.

However, FAKE has deep pockets and ran many trials.

In fact, 200 of them.

How many of them would be expected to have p<0.05?
When you run a trial you will find that FAKE-1 recipients do slightly better or slightly worse than Controls, through chance.

Individual patients do differently.
The group MEAN should "wobble" less than an individual, by about SQUARE ROOT OF N-fold, if N is the number of patients.
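If you want to see the square-root-of-N wobble for yourself, here is a minimal simulation sketch. It assumes each patient's outcome is a draw from a normal distribution with SD 1; the function name `sd_of_group_means` and the trial counts are made up for illustration.

```python
import random
import statistics

def sd_of_group_means(n_patients, n_trials=2000, seed=1):
    """Spread (SD) of the group mean across many simulated trial arms."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        # One arm: n_patients individuals, each with SD 1 (assumption).
        arm = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
        means.append(statistics.fmean(arm))
    return statistics.stdev(means)

for n in (4, 25, 100):
    observed = sd_of_group_means(n)
    expected = 1.0 / n ** 0.5  # individual SD divided by sqrt(N)
    print(f"N={n:3d}  wobble of mean={observed:.3f}  1/sqrt(N)={expected:.3f}")
```

The observed wobble of the mean should land close to the individual wobble divided by the square root of N.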
The statistical tests help you by looking at the variability between individuals, automatically calculating how much the GROUP MEANS would be EXPECTED to vary, and then seeing if the ACTUAL difference between the active and placebo arms can be EASILY explained by random variation.
If the between-arm difference is here ("A"), within the expected range of wobble, the statistics tell you this by giving you a high P value.
If, on the other hand, the between-arm difference is large (such as "B") - so large that it would only RARELY be expected to be this far away by chance, the statistics tell you this by giving you a SMALL P value.
In fact, you can think of the statistics as "scoring" the actual result of the trial on a standardised scoring scale, for AN IMAGINARY DRUG THAT DOES NOT WORK.

On this scale, p=1.00 means "absolutely exactly average of expectations, for a drug that does not work."
For example, if I run a 2000 patient trial and have 132/1000 events in the active arm, and 132/1000 events in the control arm, the statistics (if based on this alone) will return:
Hint - if answer not obvious, read the tweet immediately above. It is written there...
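Here is a rough sketch of how the statistics might score that trial. It assumes a pooled two-proportion z-test with a normal approximation, which is only one of several tests a real trial might use; the function name `two_proportion_p_value` and the second set of event counts are hypothetical.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided P value from a pooled two-proportion z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    # Expected wobble of the between-arm difference if the drug does nothing.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Area in both tails of the standard normal beyond |z|.
    return math.erfc(abs(z) / math.sqrt(2))

# Identical arms: exactly the average expectation for a useless drug.
print(two_proportion_p_value(132, 1000, 132, 1000))  # 1.0

# Hypothetical trial with fewer events in the active arm.
print(two_proportion_p_value(100, 1000, 132, 1000))
```

When the two arms are identical, z is zero and the P value is exactly 1. The further apart the arms, the smaller the P value.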

In my mind I think of a P value as colour coding the result of a trial.

The statistics calculate where the trial result lies on that spectrum.

Near the middle, and the stats return "P=0.9..something"
Near the outer extremes is the exciting area. The stats are saying:

"Wow! A truly useless drug would be unlikely to be this lucky or unlucky."
Side request to nerds
I sketched the above in PPT between caths just now. It's not very good.
I would be grateful if someone can draw it correctly, in R or Excel or something.
Don't use rectangles, use trapezia, so it doesn't look so zany at the top.
The x-coordinates should be
NORMINV(0.025, 0, 1)
NORMINV(0.075, 0, 1)
etc, to
NORMINV(0.975, 0, 1)
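For anyone taking up the request in Python rather than Excel, the x-coordinates could be generated like this. This is a sketch that assumes the "etc" means steps of 0.05 (so 20 points from 0.025 to 0.975); `NormalDist.inv_cdf` from the standard library plays the role of NORMINV.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Assumed spacing: 0.025, 0.075, ..., 0.975 (steps of 0.05).
probabilities = [0.025 + 0.05 * i for i in range(20)]
x_coordinates = [std_normal.inv_cdf(p) for p in probabilities]

for p, x in zip(probabilities, x_coordinates):
    print(f"NORMINV({p:.3f}, 0, 1) = {x:+.3f}")
```

The first and last values come out near -1.96 and +1.96, the familiar cut-offs for a two-sided 5% threshold.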

The y-coordinate for each x should be

Tweet or DM it to me, and I will credit when using in future.
The secret fact about P values

P values are automatically generated by statistics, in such a way that INEFFECTIVE interventions produce P values UNIFORMLY distributed between 0 and 1.

0.32 is just as likely as 0.78 and 0.02.
Ineffective drugs produce values beginning 0.00..., 0.01..., 0.02..., 0.03... and 0.04... just as often as they produce values beginning with 0.54... or 0.91...
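You do not have to take the uniform distribution on trust. Below is a small simulation sketch under stated assumptions: both arms are drawn from the same normal distribution (so the drug is useless by construction), each patient has a known SD of 1, and a simple z-test produces the P value. The names `null_trial_p_value`, the arm size and the number of trials are invented for the demonstration.

```python
import math
import random
import statistics

def null_trial_p_value(rng, n_per_arm=50):
    """P value for one trial in which both arms come from the SAME distribution."""
    active = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    placebo = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    diff = statistics.fmean(active) - statistics.fmean(placebo)
    se = math.sqrt(2.0 / n_per_arm)  # known per-patient SD of 1 (assumption)
    return math.erfc(abs(diff / se) / math.sqrt(2))

rng = random.Random(2018)
p_values = [null_trial_p_value(rng) for _ in range(4000)]

# Count how many P values land in each tenth of the range 0 to 1.
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1

for i, count in enumerate(bins):
    print(f"{i / 10:.1f} to {(i + 1) / 10:.1f}: {count / len(p_values):.3f}")
```

Each tenth of the range should collect roughly a tenth of the trials, with no pile-up anywhere.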

So, IF THE DRUG IS INEFFECTIVE, what is the probability of getting a P value:

BETWEEN 0.2500 and 0.3500?
Likewise, IF A DRUG IS INEFFECTIVE, what is the probability of getting a P value

between 0.000 and 0.0499...
Therefore the 200 trials of rubbish drugs, by the indefatigable F.A.K.E. Ltd, would be expected to produce about how many trials with P<0.05?

In amongst those trials (5, 10, 15, or 20, however many you said above) where the P value is <0.05,

In WHAT PROPORTION OF _THOSE_ TRIALS is the drug actually genuinely any good?

(Remember it is just lawnmower shreddings and water.)
What is the relationship between the 5% probability inherent in the definition of "P<0.05" and the percentage you chose in the above answer?
That last result (~98% right at the 50-vote stage) is the central challenge of P values.

P is the probability that IF YOU START WITH A USELESS DRUG, you get a result like this.

It is NOT the probability that IF YOU GET A RESULT LIKE THIS, the drug is useless.
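The two probabilities can be put side by side with the F.A.K.E. portfolio. This is a toy sketch: it takes the thread's statement that useless drugs give uniformly distributed P values and draws each trial's P value directly from a uniform random number; the dictionary keys are made-up names.

```python
import random

rng = random.Random(42)
n_trials = 200

# Every F.A.K.E. product is useless, so each trial's P value is
# modelled as a uniform draw between 0 and 1 (assumption from the thread).
trials = [{"drug_works": False, "p": rng.random()} for _ in range(n_trials)]

useless = [t for t in trials if not t["drug_works"]]
significant = [t for t in trials if t["p"] < 0.05]

# P(result like this | drug is useless): what the P value threshold controls.
p_sig_given_useless = len(significant) / len(useless)

# P(drug is useless | result like this): what people wish P values told them.
if significant:
    p_useless_given_sig = sum(not t["drug_works"] for t in significant) / len(significant)
else:
    p_useless_given_sig = float("nan")

print(f"P(P<0.05 | useless) = {p_sig_given_useless:.3f}")
print(f"P(useless | P<0.05) = {p_useless_given_sig:.3f}")
```

The first number hovers around 0.05. The second is 1, because every drug in this portfolio is grass cuttings and water.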
In everyday life we are very sloppy indeed in our use of probabilistic language, and most people are not aware that there is even a distinction between the two.

Evidence of this lack of awareness is the current state of the vote on the first question:

At 300 votes:
The great statistician Stephen Senn has written a whole article on the non-reversibility of probabilities.

For example, imagine you choose a person at random from the world.

They may be a WOMAN (or not)
They may be PREGNANT (or not)

*If* they are a woman, what is the probability they are pregnant?
*if* they are PREGNANT, what is the probability they are a woman?
It matters, a lot, a REALLY lot, which way round the conditions are.

So much so that mathematicians, who hate getting confused, have a special notation for it.

They write it

P (Thing that MIGHT be true | Thing that is ALREADY KNOWN TO BE TRUE)
What is P(Woman | Pregnant)?

You can read this as:

"What is the probability of a person being a woman, GIVEN THAT you already know they are pregnant?"
What is P(Pregnant | Woman)?

You can read this as:

"What is the probability of a person being pregnant, GIVEN THAT you already know they are a woman?"
Read the vertical line "|" as the words "GIVEN THAT".
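A worked example with counts may help. The numbers below are entirely made up for illustration (they are NOT real demographics); the point is only that the same table gives two very different answers depending on which way round the condition goes.

```python
# Hypothetical population of 1000 people (illustrative counts only).
women_pregnant = 20
women_not_pregnant = 480
others_pregnant = 0
others_not_pregnant = 500

women = women_pregnant + women_not_pregnant
pregnant = women_pregnant + others_pregnant

# P(Pregnant | Woman): restrict to women, then ask how many are pregnant.
p_pregnant_given_woman = women_pregnant / women

# P(Woman | Pregnant): restrict to pregnant people, then ask how many are women.
p_woman_given_pregnant = women_pregnant / pregnant

print(f"P(Pregnant | Woman) = {p_pregnant_given_woman:.2f}")
print(f"P(Woman | Pregnant) = {p_woman_given_pregnant:.2f}")
```

With these invented counts the first probability is 0.04 and the second is 1.00.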

The P value of a trial is:

P (A result as exciting as this or more so | Drug is, in fact, CRAP)

It is not the probability that anyone actually wants. It is all backwards.
We can't even convert it into the probability we want.
The utility of P values is that if you try out rubbish drugs, 95% of the time the result has P>0.05.

However 5% of the time, the result will have P<0.05.

If you try out ONLY RUBBISH DRUGS, what proportion of the P<0.05 results you get will be genuinely beneficial drugs?
In real life, we DO NOT KNOW whether we are trying out

- mainly GOOD drugs
- mainly RUBBISH drugs
- 50:50 mix
- some other mix

We have absolutely no idea.
So there is NO POINT expecting our standard stats to tell us P(Drug is rubbish | Got the result we got).
It can't.
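To see why the mix matters so much, here is a sketch of what the answer WOULD be if we somehow knew it. It assumes a P<0.05 threshold, and that trials of genuinely good drugs reach P<0.05 80% of the time; that 80% figure and the function name `prob_rubbish_given_significant` are assumptions for illustration, not anything the trial statistics provide.

```python
def prob_rubbish_given_significant(fraction_rubbish, alpha=0.05, power=0.8):
    """P(drug is rubbish | P < alpha), for a KNOWN mix of rubbish and good drugs."""
    # Rubbish drugs that get lucky.
    false_positives = alpha * fraction_rubbish
    # Good drugs whose trials succeed (power is an assumed value).
    true_positives = power * (1.0 - fraction_rubbish)
    return false_positives / (false_positives + true_positives)

for fraction_rubbish in (1.0, 0.9, 0.5, 0.1):
    answer = prob_rubbish_given_significant(fraction_rubbish)
    print(f"{fraction_rubbish:.0%} rubbish drugs tested -> "
          f"P(rubbish | P<0.05) = {answer:.3f}")
```

The same P<0.05 result corresponds to anything from near 0 to exactly 1, depending on a mix that in real life we do not know.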
What I have described in somewhat childlike terms is our standard "frequentist" stats.

There is another approach to stats called "Bayesian", which (if we could do it perfectly) would give us the probability we actually wanted, namely "how likely is this drug to be rubbish?"
I understand the principles of Bayesian Stats.

But I also understand the principles of doing a Heart-Lung transplant.

Doesn't mean I would actually try to do either, myself!

This tweetorial only covers Frequentist, which is the great majority of published medical statistics.
Revision Question:

What does p=0.03 mean?

A. 3% of pts had reduced athero
B. 97% of pts had reduced athero
C. If a drug is ineffective, 3% chance that it produces an effect this extreme or more
D. 3% probability that THIS drug is ineffective, i.e. that these results are chance
I am a little unhappy with the result of the above poll.

P values tell you the following.

IF THIS DRUG IS INEFFECTIVE, the chance of a result like this (or more extreme) is 0.###.

Can we try again please?
Fingers crossed...
Third time lucky...
Francisomycin has been studied in only 1 trial. It was done honestly.

Got p=0.03.

What does this mean?

C. An INEFFECTIVE drug would be expected to produce a result as extreme as this (or more so) in only 3% of trials.

D. There is 3% probability that THIS drug is ineffective.
Hint. Think very carefully.

Remember that the statistics generate your P value as a handy rating of how RARELY a trial of an INEFFECTIVE drug would have a result as dramatic as this (or more so).

Remember the Uncle Darrel red-o-gram?
We are getting closer to 100% correct.

Remember when we first did this, the majority were wrong. So this is a big improvement in the hardest thing to understand in the whole of medical stats.
Francisomycin has been studied in only 1 trial. It was done honestly.

Got p=0.03.

What does this mean?

C. An INEFFECTIVE drug would be expected to produce a result as extreme as this (or more so) in only 3% of trials.

D. There is 3% probability that THIS drug is ineffective.
The reason the Titanic sank was not because people were stupid.

It was because of inertia. It is very difficult to change the velocity (speed and direction) of a very heavy object.

A huge amount of impulse is needed.
We are like that in our misperception of what the P value is calculating for us.

It is ONLY showing us what an imaginary INEFFECTIVE drug would be expected to do. How rarely a trial result of an INEFFECTIVE drug would be as extreme as this or more so.
Please reflect on the fact that the MAJORITY of people misunderstood the meaning of P values, the most common statistic seen in medical research.

And more importantly how SLOWLY we shifted the Titanic of Twitteropinion as I repeated the same question through this tweetorial.
Here is an interesting PhD project for someone.

Develop an efficient and memorable way to explain to medical colleagues in general what a P value actually means.

Test it like in this tweetorial, but also by retest at a later date.
People will laugh at you that you are not doing gels or genes, nor comparing zigogrel with zagogrel and counting 'events'.

But you will be contributing an enormous amount to medical care worldwide and everyone will be genuinely interested in your PhD.
Not just your mum, pretending to be interested.
The @thebyrdlab have developed this improved version of the P value obviousifier graph.
Here is an explanation of a problem with interpreting P values. Statistical experts (I am NOT one of them, by training or by belief values) argue that the probability that the drug is crap is 0 or 1. We just don't know what it is. It can't be 0.03.
"Probabilities are only for things in the future. For things in the past probabilities are always 0 and 1."

Zeroth Law of Statistics,
J Articles Rej By J Quantum Theor.
Thank you to Lauren for highlighting a beautiful illustration by the astonishing polymath that is XKCD.

Keep Current with Prof Darrel Francis ☺ Mk CardioFellows Great Again
