Regarding #Article13: I wrote up a little command-line false-positive emulator; it tests 10 million events with a test (for copyrighted material, abusive material, whatever) that is 99.5% accurate, with a rate of 1-in-10,000 items actually being bad.
For that scenario - all of whose inputs are tuneable - you can see that we'd typically be making about 50,000 people very upset, by miscategorising them as copyright thieves or perpetrators of abuse:
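The emulator itself isn't pasted in this thread, but here's a minimal Python sketch of the same arithmetic - deterministic expected values rather than 10 million random trials, and (like the thread) treating the test's accuracy as the same for good and bad uploads:

```python
# Minimal sketch of the false-positive arithmetic (expected values,
# not a random simulation; accuracy assumed equal for good & bad items).

def filter_buckets(uploads, accuracy, badness_rate):
    """Split `uploads` into the four classification buckets for a
    filter that is `accuracy` correct, given that `badness_rate`
    of uploads are actually bad."""
    error = 1.0 - accuracy
    bad = uploads * badness_rate
    good = uploads - bad
    return {
        "true_positives": bad * accuracy,   # badness correctly blocked
        "false_negatives": bad * error,     # badness that slips through
        "false_positives": good * error,    # innocents wrongly flagged
        "true_negatives": good * accuracy,  # innocents correctly passed
    }

# The opening scenario: 10M uploads, 99.5% accuracy, 1-in-10,000 bad:
print(filter_buckets(10_000_000, accuracy=0.995, badness_rate=1 / 10_000))
# false_positives ≈ 49,995 - the ~50,000 people upset per day
```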
But let's vary the stats: @neilturkewitz is pushing a 2017 post by very respected fellow geek and expert @paulvixie in which Paul speaks encouragingly about a 1-to-2% error rate; let's split the difference, use 1.5% errors, ie: 98.5% accuracy: circleid.com/posts/20170420…
Leaving everything else the same, we have now tripled the number of innocent people that we annoy with our filtering, raising it to ~150,000 daily; in exchange we stop about 990 badnesses per day.
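Same sketch, Vixie-ish numbers:

```python
b = filter_buckets(10_000_000, accuracy=0.985, badness_rate=1 / 10_000)
# b["false_positives"] ≈ 149,985  (~150,000 innocents flagged daily)
# b["true_positives"]  ≈ 985      (~990 badnesses stopped)
```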
Let's be blunt: We make victims of, or annoy, about 150-thousand people each day, in order to prevent less than 1000 infringements, if we use these numbers. The only other number that we can mess with is "the rate of badness", because the number of uploads is what defines "scale".
So let's do that: let's assume that the problem (eg: copyright infringement) is very much WORSE than 1-upload-in-10,000; instead let's make it 1 in 500. What happens? This is what happens: you still upset 150,000 people, but you catch nearly 20,000 infringements.
If 1 in every 500 uploads are badness (infringing, whatever) then you annoy 150,000 people for every 20,000 uploads you prevent; that's still 7.5 people you annoy for every copyright infringement that you prevent. BUT what if the problem is LESS BAD than 1 in 10,000?
I was at a museum yesterday, and I uploaded more than 50 pictures which (as a private individual) I'm free to share; the vast majority of uploads to Facebook by its 2 billion users will be "original content" of varying forms, stuff that only the account-holder really cares about.
So let's go with an entirely arbitrary guess of 1-in-33,333 rate of badness - that amongst every 33,333 pictures of hipsters vomiting, of "look at this sandwich" and of "here's my cute cat", there's only 1 copyrighted work. What then?
What happens is that you still piss off 150,000 people, but your returns are really low - you prevent only about 300 badnesses in exchange; at which point you really have to start asking about the cost/benefit ratios.
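All three badness rates side by side, using filter_buckets from the sketch above:

```python
for label, rate in [("1 in 500", 1 / 500),
                    ("1 in 10,000", 1 / 10_000),
                    ("1 in 33,333", 1 / 33_333)]:
    b = filter_buckets(10_000_000, accuracy=0.985, badness_rate=rate)
    print(f"{label:>11}: annoyed ~{b['false_positives']:,.0f}, "
          f"stopped ~{b['true_positives']:,.0f}")
#    1 in 500: annoyed ~149,700, stopped ~19,700
# 1 in 10,000: annoyed ~149,985, stopped ~985
# 1 in 33,333: annoyed ~149,995, stopped ~296
```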
I would _REALLY_ love to have some little javascript toy with a slider for test accuracy, some input box for 1-in-N badness rate, and then have the four buckets broken out for visualisation; but I am a backend coder and my JS-fu is weak.
And in case it's not obvious, I take issue with @paulvixie's somewhat glib assertion that "Simple procedures can be readily adopted to address the relatively small number of false positives" - for the reasons that I demonstrate above, & also in this essay: medium.com/@alecmuffett/a…
One last little addendum: let's go back to a badness rate of 1-in-10,000, but let's drop the test accuracy to a more plausible 90%. What happens? Answer: you piss off nearly 1 million people per day, to prevent about 900 infringements.
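Or, in sketch form:

```python
b = filter_buckets(10_000_000, accuracy=0.90, badness_rate=1 / 10_000)
# b["false_positives"] ≈ 999,900  (nearly 1 million people per day)
# b["true_positives"]  ≈ 900
```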
Revisiting the above: let's assume the test has a Vixie-like accuracy of 98.5% & that BADNESS IS REALLY PREVALENT: 1 in every 500 uploads are bad.
What happens each day?
- you annoy 150,000 innocents
- & stop ~19,700 badnesses
- 300 badnesses SURVIVE THE FILTER
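The same run through the sketch, with the survivors (false negatives) broken out:

```python
b = filter_buckets(10_000_000, accuracy=0.985, badness_rate=1 / 500)
# b["false_positives"] ≈ 149,700  (innocents annoyed)
# b["true_positives"]  ≈  19,700  (badnesses stopped)
# b["false_negatives"] ≈     300  (badnesses that survive the filter)
```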
Is this good?
HYPOTHETICAL: How much badness do you need, with a 98.5%-accurate test, for the False-Positive-Rate (loss) to EQUAL the Block-Rate (gain)?
Answer: about 1 in 67 postings have to be "bad" in order to break even (ignoring costs of overhead, power, CPU, etc)
…at that point you are making as many bad guys unhappy, as good guys.
Probably, nobody is happy.
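The break-even point above falls out of one line of algebra: false positives equal blocked badness when (1 - b) × error = b × accuracy, i.e. b = error / (error + accuracy). Checked numerically:

```python
accuracy = 0.985
error = 1.0 - accuracy
# (1 - b) * error == b * accuracy  =>  b = error / (error + accuracy)
b = error / (error + accuracy)
print(f"break-even badness rate: {b:.4f}, i.e. about 1 in {1 / b:.0f}")
# break-even badness rate: 0.0150, i.e. about 1 in 67
```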
HYPOTHETICAL #2 — "…but Alec, what if you are a smaller content-sharing platform and only get 10,000 new pieces of content per day?"
Answer: at 98.5% accuracy you will piss off about 140 to 180 users per day; results shown with badness rates of 1:500 and 1:10,000 for comparison
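The expected values from the sketch; the spread in the quoted range presumably comes from the emulator randomising its events:

```python
for label, rate in [("1 in 500", 1 / 500), ("1 in 10,000", 1 / 10_000)]:
    b = filter_buckets(10_000, accuracy=0.985, badness_rate=rate)
    print(f"{label}: false positives ~{b['false_positives']:.0f}")
# 1 in 500: false positives ~150
# 1 in 10,000: false positives ~150
# Random trials scatter around that mean, hence "about 140 to 180".
```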
HYPOTHETICAL #3 — If you're in business, you've probably heard of "Six Sigma", a metric of quality; a friend suggested that I model it, & at Six Sigma the costs of Filtering & False Positives are CLEARLY BEARABLE: about 34 FALSE POSITIVES IN 10 MILLION UPLOADS. There's one problem:
Six Sigma MEANS 99.99966% ACCURACY - ie: that the copyright-identification engine can only make 3.4 mistakes per million tests, which nicely multiplies out in the example below. It's never going to happen in real life. en.wikipedia.org/wiki/Six_Sigma
With these same numbers (1:10,000 badness, 10M uploads), a test which is merely 99.99% accurate (a little over 5-Sigma) leaves you with approximately as many false positives as actual badnesses, a 50-50 ratio:
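Both sigma levels through the sketch:

```python
for label, acc in [("Six Sigma, 99.99966%", 0.9999966),
                   ("~5-sigma, 99.99%", 0.9999)]:
    b = filter_buckets(10_000_000, accuracy=acc, badness_rate=1 / 10_000)
    print(f"{label}: false positives ~{b['false_positives']:,.0f}, "
          f"badness stopped ~{b['true_positives']:,.0f}")
# Six Sigma, 99.99966%: false positives ~34, badness stopped ~1,000
# ~5-sigma, 99.99%: false positives ~1,000, badness stopped ~1,000
```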
More interesting than Facebook: I used to work on TheMine!Project*, a highly influential, much-plagiarised & ultimately unsuccessful stab at personal information stores, from 2006-2011.
If you want to know my opinion of how @timberners_lee's #Solid will impact "tech giants", watch this video (actually, x3) from 2010; the bulletpoints are:
- facebook killers, aren't
- there's plenty of room for alternatives
- first it must grow
The media loves zero-sum, David/Goliath stories, and thereby often causes doom ("ello") & even tragically suicidal levels of stress ("diaspora*") to people who are foolish enough to pitch themselves/their platforms as the antithesis of "social media giantism"; so do please beware.
Australia: "The Assistance and Access Bill 2018" - the people of Australia have SIX DAYS in which to register their feelings on encryption back doors: homeaffairs.gov.au/about/consulta…#straya#endtoend
A Bill for an Act to amend the law relating to telecommunications, computer access warrants and search warrants, and for other purposes #otherPurposes
A technical capability notice may require the provider to do acts or things by way of giving help to ASIO or an interception agency in relation to…
<pops open bonnet of car>
Mark: "There you go, there's the engine. 4 cylinder petrol engine" @CommonsCMS: "Where are the horses?"
Mark: "Horses?"
CMS: "We heard it's a 100 Horsepower engine."
Mark: "That's just a metaphor…?"
.@CommonsCMS: "No, we know there are horses. That engine is a black box. You're not being transparent about where the horses are."
Mark: "But that's not how cars really work…"
CMS: "Everyone knows that cars are driven by horsepower. We want to see the horses." #algorithms
Author's Note: this may sound like whimsy, but it's only a few years since I had the following conversation with a member of a London-based "civil society" campaigning organisation:
HEREWITH: a _different_ argument about why it's easier to put a man on the moon than to have backdoorable cryptography at scale. This fine article got posted by Techdirt a couple days ago…
While we're on the topic of scale: every so often I have the misfortune of having to listen to some politician or former civil servant* demanding that people "NEED TO LEARN THE VALUE OF THEIR PERSONAL DATA, GODDAMNIT!".
*eg: ex-GCHQ
This one can be quite quick:
- Facebook
- About 2 Billion users
- Annual revenue 2017: $40.653 Billion
Here's simple division as a rough guide: your data is worth about $20
About $20 per annum per user.
Let's implausibly assume that you're a heavy user, and are worth double that, so that you're actually worth $40; that means your value to Facebook would be (40/12) = $3.33/month.
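The arithmetic, for the record:

```python
revenue_2017 = 40.653e9  # Facebook's 2017 annual revenue, USD
users = 2e9              # roughly 2 billion users
per_user_year = revenue_2017 / users          # ≈ $20.33 per user per year
heavy_user_month = (per_user_year * 2) / 12   # ≈ $3.39/month
# (rounding the doubled figure to $40/year gives the $3.33/month above)
```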