Alec Muffett
Jul 7, 2018 · 23 tweets · 9 min read
Regards #Article13, I wrote up a little command-line false-positive emulator; it tests 10 million events with a test (for copyrighted material, abusive material, whatever) that is 99.5% accurate, with a rate of 1-in-10,000 items actually being bad.
For that scenario - all of which inputs are tuneable - you can see that we'd typically be making about 50,000 people very upset, by miscategorising them as copyright thieves or perpetrators of abuse:
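As a rough cross-check of those figures, here is a minimal Python sketch of the expected-value arithmetic. It is not the emulator mentioned above; the function name `expected_buckets` and its parameters are made up for illustration, and it assumes the stated accuracy applies equally to good and bad uploads.

```python
def expected_buckets(uploads, accuracy, one_in_n):
    """Expected counts for the four outcomes of a filter.

    Assumption: `accuracy` is the chance of a correct verdict for
    both bad and good uploads alike.
    """
    bad = uploads / one_in_n
    good = uploads - bad
    return {
        "bad_blocked": bad * accuracy,          # true positives
        "bad_missed": bad * (1 - accuracy),     # false negatives
        "good_blocked": good * (1 - accuracy),  # false positives
        "good_passed": good * accuracy,         # true negatives
    }


# The opening scenario: 10 million uploads, 99.5% accuracy, 1-in-10,000 bad.
buckets = expected_buckets(10_000_000, 0.995, 10_000)
for name, count in buckets.items():
    print(f"{name:>12}: {count:,.0f}")
```

Under these assumptions the `good_blocked` bucket comes out at 49,995, which is the "about 50,000" above; the other inputs in this thread can be tried by changing the three arguments.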
But let's vary the stats: @neilturkewitz is pushing a 2017 post by very respected fellow geek and expert @paulvixie in which Paul speaks encouragingly about a 1-to-2% error rate; let's split the difference, use 1.5% errors, ie: 98.5% accuracy:…
Leaving everything else the same, we have now tripled the number of innocent people that we annoy with our filtering, raising it to ~150,000 daily; in exchange we stop about 990 badnesses per day.
Let's be blunt: We make victims of, or annoy, about 150-thousand people each day, in order to prevent fewer than 1,000 infringements, if we use these numbers. The only other number that we can mess with is the "rate of badness", because the number of uploads is what defines "scale".
So let's do that: let's assume that the problem (eg: copyright infringement) is very much WORSE than 1-upload-in-10,000; instead let's make it 1 in 500. What happens? This is what happens: you still upset 150,000 people, but you catch nearly 20,000 infringements.
If 1 in every 500 uploads is badness (infringing, whatever) then you annoy 150,000 people for every 20,000 uploads you prevent; that's still 7.5 people you annoy for every copyright infringement that you prevent. BUT what if the problem is LESS BAD than 1 in 10,000?
I was at a museum yesterday, and I uploaded more than 50 pictures which (as a private individual) I'm free to share; the vast majority of uploads to Facebook by its 2 billion users will be "original content" of varying forms, stuff that only the account-holder really cares about.
So let's go with an entirely arbitrary guess of 1-in-33,333 rate of badness - that amongst every 33,333 pictures of hipsters vomiting, of "look at this sandwich" and of "here's my cute cat", there's only 1 copyrighted work. What then?
What happens is that you still piss-off 150,000 people, but your returns are really low - you prevent only about 300 badnesses in exchange; at which point you really have to start asking about the cost/benefit ratios.
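To put a number on that cost/benefit ratio, here is a small Python sketch that works out how many innocent uploads get flagged for each bad upload blocked, at the three badness rates used so far. The helper name `annoyed_per_block` is hypothetical, and it assumes the same accuracy for good and bad uploads.

```python
def annoyed_per_block(accuracy, one_in_n):
    """False positives per true positive; the upload count cancels out."""
    p = 1 / one_in_n                            # fraction of uploads that are bad
    false_positive = (1 - p) * (1 - accuracy)   # good uploads wrongly flagged
    true_positive = p * accuracy                # bad uploads correctly flagged
    return false_positive / true_positive


for one_in_n in (500, 10_000, 33_333):
    ratio = annoyed_per_block(0.985, one_in_n)
    print(f"1 in {one_in_n:,}: about {ratio:,.1f} innocent uploads flagged "
          f"per bad upload blocked")
```

The 1-in-500 result lands a little above 7.5 because this version divides by the bad uploads actually caught rather than by all 20,000 of them.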

If you want to play with the code:…
This sort of math might be useful to @Senficon, I suppose, especially in relation to the thread at
I would _REALLY_ love to have some little javascript toy with a slider for test accuracy, some input box for 1-in-N badness rate, and then have the four buckets broken out for visualisation; but I am a backend coder and my JS-fu is weak.
And in case it's not obvious, I take issue with @paulvixie's somewhat glib assertion that "Simple procedures can be readily adopted to address the relatively small number of false positives" - for the reasons that I demonstrate above, & also in this essay:…
One last little addendum: let's go back to a badness rate of 1-in-10,000, but drop the test accuracy to a more plausible 90%. What happens? Answer: you piss off nearly 1 million people per day, to prevent about 900 infringements.

Have a nice Saturday!
This thread has been nicely unrolled at… for easier reading in web browsers.
This is a cute way to phrase some of the results:
Revisiting the above: let's assume the test has a Vixie-like accuracy of 98.5% & that BADNESS IS REALLY PREVALENT: 1 in every 500 uploads is bad.

What happens each day?
- you annoy 150,000 innocents
- & stop 19,700 badnesses
- 300 badnesses SURVIVE THE FILTER

Is this good?
HYPOTHETICAL: How much badness do you need, with a 98.5%-accurate test, for the False-Positive-Rate (loss) to EQUAL the Block-Rate (gain)?

Answer: about 1 in 67 postings have to be "bad" in order to break even (ignoring costs of overhead, power, CPU, etc)
…at that point you are making as many bad guys unhappy, as good guys.

Probably, nobody is happy.
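The break-even point falls out of a line of algebra, if you assume the same accuracy a for good and bad uploads and call the bad fraction p: false positives equal blocks when (1 - p)(1 - a) = p·a, the p·a terms cancel, and you are left with p = 1 - a. A short Python sketch of that check (the function name is made up for illustration):

```python
def break_even_one_in_n(accuracy):
    """N such that a 1-in-N badness rate gives as many false positives as blocks.

    Follows from (1 - p) * (1 - a) == p * a, which simplifies to p == 1 - a.
    """
    return 1 / (1 - accuracy)


accuracy = 0.985
n = break_even_one_in_n(accuracy)
p = 1 / n

# Confirm both sides of the break-even equation agree per upload.
false_positive = (1 - p) * (1 - accuracy)
true_positive = p * accuracy
print(f"break-even at about 1 in {n:.1f}")
print(f"per upload: false positives {false_positive:.6f}, blocks {true_positive:.6f}")
```

Put another way: under these assumptions the filter only breaks even when the badness rate is as large as the filter's own error rate.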
HYPOTHETICAL #2 — "…but Alec, what if you are a smaller content-sharing platform and only get 10,000 new pieces of content per day?"

Answer: at 98.5% accuracy you will piss-off about 140 to 180 users per day; results shown with badness rates of 1:500 and 1:10000 for comparison
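The day-to-day spread on a small platform is easy to see with a toy Monte Carlo run. This is a sketch under assumptions, not the emulator used for the results above: `simulate_day` is a hypothetical helper, each upload is treated as independently bad with probability 1-in-N, and the filter's verdict is independently correct with the stated accuracy.

```python
import random


def simulate_day(uploads, accuracy, one_in_n, rng):
    """Simulate one day of uploads; return (false_positives, bad_blocked)."""
    false_positives = 0
    bad_blocked = 0
    for _ in range(uploads):
        is_bad = rng.random() < 1 / one_in_n
        verdict_correct = rng.random() < accuracy
        # A correct verdict flags bad uploads only; a wrong one flips that.
        flagged = is_bad if verdict_correct else not is_bad
        if flagged and is_bad:
            bad_blocked += 1
        elif flagged:
            false_positives += 1
    return false_positives, bad_blocked


rng = random.Random(2018)  # fixed seed so that runs are repeatable
for one_in_n in (500, 10_000):
    days = [simulate_day(10_000, 0.985, one_in_n, rng) for _ in range(5)]
    print(f"1 in {one_in_n:,} (false positives, blocked) per day: {days}")
```

The expected number of false positives is about 150 per day at either badness rate; the run-to-run wobble around that figure is where a range like the one above comes from.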
HYPOTHETICAL #3— If you're in business, you've probably heard of "Six Sigma", a metric of quality; a friend suggested that I model it, & at SixSigma the costs of Filtering & False Positives are CLEARLY BEARABLE: about 34 FALSE POSITIVES IN 10 MILLION UPLOADS. There's one problem:
Six Sigma MEANS 99.99966% ACCURACY - ie: that the copyright-identification engine can only make 3.4 mistakes per million tests; which nicely multiplies-out in the example below. It's never going to happen in real life.
With these same numbers (1:10,000 badness, 10M uploads) and a test which is merely 99.99% accurate (a little over 5-Sigma), you have approximately as many false-positives as badness, a 50-50 ratio:
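Both of those cases can be checked with a few lines of Python. This is a sketch only: `false_positives_vs_blocks` is a made-up name, and it assumes the "mistakes per million tests" figure applies to good and bad uploads alike.

```python
def false_positives_vs_blocks(uploads, one_in_n, errors_per_million):
    """Expected (false positives, bad uploads blocked) for a given error rate."""
    error_rate = errors_per_million / 1_000_000
    bad = uploads / one_in_n
    good = uploads - bad
    return good * error_rate, bad * (1 - error_rate)


cases = (
    ("Six Sigma, 3.4 errors per million", 3.4),
    ("99.99% accurate, 100 errors per million", 100),
)
for label, errors_per_million in cases:
    fp, blocked = false_positives_vs_blocks(10_000_000, 10_000, errors_per_million)
    print(f"{label}: {fp:,.0f} false positives vs {blocked:,.0f} blocked")
```

At 99.99% the error rate (1 in 10,000) is the same as the badness rate, which is why the two counts come out level.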
