Pierre-Yves Oudeyer
Jul 6, 2018
How many random seeds are needed to compare #DeepRL algorithms?

Our new tutorial addresses this key issue of #reproducibility in #reinforcementlearning.

PDF: arxiv.org/pdf/1806.08295…

Code: github.com/flowersteam/rl…

Blog: openlab-flowers.inria.fr/t/how-many-ran…

#machinelearning #neuralnetworks
Algo1 and Algo2 are two famous #DeepRL algorithms, tested here
on the Half-Cheetah #opengym benchmark.

Many papers in the literature compare algorithms using only 4-5 random seeds,
as in this graph, which suggests that Algo1 is best.

Is this really the case?
However, more robust statistical tests show that there is no significant difference.

For a very good reason: Algo1 and Algo2 are the same @openAI Baselines
implementation of DDPG, with the same parameters!

This is what is called a "Type I error" in statistics: concluding that there is a difference when there is none.
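To make the Type I error concrete, here is a minimal Python sketch that compares two identical "algorithms" 500 times with 5 seeds each. The final returns are invented for illustration (Gaussian, mean 5000, std 1000), and an exact permutation test stands in for whichever test you prefer; it is a dependency-free illustration, not the method from the tutorial.

```python
import random
from itertools import combinations
from statistics import fmean


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation test on the difference of mean returns.

    Enumerates every split of the pooled seeds into groups of len(a) and
    len(b), so it is only practical for a handful of seeds per algorithm.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(fmean(a) - fmean(b))
    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        s_a = sum(pooled[i] for i in idx)
        diff = abs(s_a / n_a - (total - s_a) / n_b)
        # Small tolerance so the observed split itself always counts.
        if diff >= observed - 1e-9:
            extreme += 1
        splits += 1
    return extreme / splits


def simulate_type_i(n_experiments=500, n_seeds=5, alpha=0.05, rng_seed=0):
    """Compare two identical 'algorithms' many times (hypothetical returns)."""
    rng = random.Random(rng_seed)
    false_positives = 0
    algo1_looks_better = 0
    for _ in range(n_experiments):
        # Both groups come from the SAME distribution: any "difference" is noise.
        algo1 = [rng.gauss(5000.0, 1000.0) for _ in range(n_seeds)]
        algo2 = [rng.gauss(5000.0, 1000.0) for _ in range(n_seeds)]
        if fmean(algo1) > fmean(algo2):
            algo1_looks_better += 1  # what eyeballing a plot would report
        if exact_permutation_pvalue(algo1, algo2) < alpha:
            false_positives += 1  # Type I error of the test
    return false_positives / n_experiments, algo1_looks_better / n_experiments


fp_rate, naive_rate = simulate_type_i()
print(f"test false-positive rate: {fp_rate:.3f}")
print(f"'Algo1 looks best' rate:  {naive_rate:.3f}")
```

Because both groups come from the same distribution, the test should flag a difference in only about 5% of the experiments (the chosen alpha), whereas simply reading off the higher mean crowns Algo1 in roughly half of them.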
Conversely, using only a few random seeds can show no sign of one algorithm being
better or worse than another.

Here: DDPG with action perturbation vs. DDPG with parameter perturbation, with 5 seeds.
This apparent absence of difference is a "Type II error": missing a difference that actually exists. With more random seeds and refined statistical tests, DDPG with parameter perturbation turns out to be robustly better than DDPG with action perturbation.
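A complementary way to see the Type II error is a confidence interval on the difference in mean final return. The sketch below uses a percentile bootstrap (one common choice, not necessarily what the tutorial uses) on invented data in which a hypothetical "parameter perturbation" group is truly better by one standard deviation. With 5 seeds the interval is wide and can easily include 0; with 200 seeds it is much narrower.

```python
import random
from statistics import fmean


def bootstrap_ci_diff(a, b, n_boot=2000, level=0.95, rng_seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(rng_seed)
    diffs = []
    for _ in range(n_boot):
        # Resample seeds with replacement, separately within each group.
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(fmean(ra) - fmean(rb))
    diffs.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]


def hypothetical_experiment(n_seeds, true_gap=1000.0, std=1000.0, rng_seed=1):
    """Invented final returns where parameter perturbation is truly better."""
    rng = random.Random(rng_seed)
    param = [rng.gauss(5000.0 + true_gap, std) for _ in range(n_seeds)]
    action = [rng.gauss(5000.0, std) for _ in range(n_seeds)]
    return bootstrap_ci_diff(param, action)


for n in (5, 200):
    lo, hi = hypothetical_experiment(n)
    verdict = "excludes 0" if lo > 0 or hi < 0 else "includes 0 (no detected difference)"
    print(f"{n:>3} seeds: 95% CI [{lo:.0f}, {hi:.0f}] {verdict}")
```

The percentile bootstrap itself tends to give overly narrow intervals with very few seeds, which is one more reason not to rely on 4-5 runs.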
The tutorial discusses how many random seeds are needed to compare algorithms, and which statistical methods to use to assess the reliability of results.
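The "how many seeds" question can be approached with a power analysis. As a rough sketch (not necessarily the procedure used in the tutorial), the textbook normal-approximation formula below estimates the seeds per algorithm needed to detect a given effect size with a two-sided two-sample test. The effect size (difference in mean final return divided by its standard deviation, for example estimated from pilot runs) is an assumption you must supply.

```python
import math
from statistics import NormalDist


def seeds_needed(effect_size, alpha=0.05, power=0.8):
    """Approximate seeds per algorithm for a two-sided two-sample test.

    effect_size is Cohen's d: (mean1 - mean2) / common std of final returns.
    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    which slightly underestimates n for very small samples.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def approx_power(n_seeds, effect_size, alpha=0.05):
    """Approximate power (1 - Type II error rate) with n_seeds per algorithm.

    Same normal approximation; the negligible opposite tail is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n_seeds / 2) - z_alpha)


print(seeds_needed(1.0))
print(round(approx_power(5, 1.0), 2))
```

Under these assumptions, seeds_needed(1.0) returns 16: detecting a one-standard-deviation gap with 80% power takes about 16 seeds per algorithm, while with 5 seeds the power is only about 0.35, so a real difference of that size is missed most of the time.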
Nothing in this tutorial is new: these statistical methods are widely used in biology and physics. But we hope it will be useful!

What is surprising is how rarely they are used in #machinelearning... which is, after all, about statistical learning!
If you see anything to improve or update, all comments are welcome!
You can post questions and comments on the blog:
openlab-flowers.inria.fr/t/how-many-ran…
Last but not least, congrats to Cedric Colas for the outstanding work on this project!
