First up at #VelocityConf this morning: @tammybutow on chaos engineering to increase resilience!
She's been doing chaos engineering since 2009, and she wants to encourage people to run Chaos Days the same way they run Hack Days: to build resilience into teams and products. #VelocityConf
Chaos isn't about going rogue and setting everything on fire; instead, it's much more scientific and collaborative.

You need to run thoughtful, planned experiments to reveal weaknesses. #VelocityConf
Make sure you're running them safely, and only run experiments you think have a chance of success.

Start small, using test users and staging environments. Be thoughtful about the blast radius. #VelocityConf
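One way to keep the blast radius small is to target only a tiny, stable slice of test users. Here is a minimal sketch, assuming you pick that slice by hashing user IDs; `in_blast_radius`, the experiment name and the test-user IDs are hypothetical.

```python
import hashlib


def in_blast_radius(user_id: str, experiment: str, percent: float) -> bool:
    """Return True if this user falls inside the experiment's blast radius.

    Hashing (experiment, user_id) gives a stable bucket in [0, 10000), so the
    same user stays in (or out of) the cohort for the whole experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100  # percent=1.0 -> buckets 0..99, i.e. 1%


# Hypothetical test-user IDs in a staging environment; target about 1% of them.
test_users = [f"test-user-{i}" for i in range(1000)]
cohort = [u for u in test_users if in_blast_radius(u, "cpu-hog-staging", 1.0)]
print(f"{len(cohort)} of {len(test_users)} test users targeted")
```

Widening the experiment is then a matter of raising `percent` once the smaller run has gone well.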
What chaos experiments can you run?
(1) Infrastructure failures: test host- and container-level failures. Simulate CPU/IO/disk bottlenecks and failures. Also test your dependencies, including your cloud provider. #VelocityConf
(1.5) Network failures: simulate them, and invite your network engineers to help you out.
(2) Application failures: inject faults into the application code and see whether they cause user-visible failures.
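A minimal sketch of application-level fault injection, assuming a Python service whose calls you can wrap in a decorator. `inject_faults`, `InjectedFault` and `fetch_profile` are hypothetical names; the optional added latency is a crude stand-in for a degraded network link, not a replacement for real network-level tooling.

```python
import random
import time
from functools import wraps


class InjectedFault(Exception):
    """Raised in place of a real failure so injected errors are easy to spot."""


def inject_faults(error_rate=0.0, latency_rate=0.0, latency_s=0.0, rng=None):
    """Decorator that makes a call slow down or fail with some probability."""
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Simulate a slow dependency, e.g. a degraded network link.
            if rng.random() < latency_rate:
                time.sleep(latency_s)
            # Simulate the call failing outright.
            if rng.random() < error_rate:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_faults(error_rate=0.2, rng=random.Random(42))
def fetch_profile(user_id):
    # Hypothetical application call standing in for real code.
    return {"id": user_id}


failures = 0
for i in range(100):
    try:
        fetch_profile(i)
    except InjectedFault:
        failures += 1
print(f"{failures} of 100 calls failed")
```

The question to answer is then whether those failures surface to users or are absorbed by retries and fallbacks. In practice you would only enable the decorator inside the experiment's blast radius.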

Full-stack chaos: inject faults at every layer of the stack. #VelocityConf
Think about your business continuity planning (BCP). But chaos engineering is about testing at a smaller scale and more often, rather than exercising BCP once a year. #VelocityConf
Think as well about Game Days (e.g. simulating or actually pulling cables from machines in datacenters), Capture the Flag, and Hack Days/Hack Weeks.

Why should we care? Think about the big outages that make the press. Chaos engineering reduces how often big outages happen. #VelocityConf
She set out to reduce outage frequency by 20%, but instead saw a 10x reduction in outage frequency in the first 3 months, by simulating and practicing until she got the big wins.

Zero Sev0 incidents for the 12 months after that 3-month period. #VelocityConf
How do you do this? Actually think through your dependencies, and work with the teams that own them to generate action items for improving your systems.
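One way to start thinking through dependencies is to write them down as a graph and ask which services a single failure would reach. This is a minimal sketch with a made-up service graph; the service names and `affected_by` are hypothetical, and a real map would come from your own tracing or service catalog.

```python
from collections import deque

# Hypothetical service graph: each service maps to the services it calls.
DEPENDS_ON = {
    "web": ["auth", "catalog"],
    "catalog": ["db", "cache"],
    "auth": ["db"],
    "billing": ["db", "payments-api"],
    "db": [],
    "cache": [],
    "payments-api": [],
}


def affected_by(failed: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency up to its callers.
    callers: dict[str, list[str]] = {svc: [] for svc in graph}
    for svc, deps in graph.items():
        for dep in deps:
            callers.setdefault(dep, []).append(svc)

    # Breadth-first search over callers; `seen` also guards against cycles.
    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        for caller in callers.get(queue.popleft(), []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(affected_by("db", DEPENDS_ON)))  # ['auth', 'billing', 'catalog', 'web']
```

Each service in that set is a candidate for a chaos experiment, and for an action item with the team that owns it.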

We should think about durability and data inconsistency (cf. @kilobitten's talk). #VelocityConf
Can you actually detect corruption? How do you know?
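To answer "how do you know?", one common approach (not necessarily the one from the talk) is to store a checksum with each record, verify it on read, and then deliberately corrupt data during an experiment to check that the verification actually fires. A minimal sketch; `with_checksum` and `is_intact` are hypothetical helpers.

```python
import hashlib


def with_checksum(payload: bytes) -> dict:
    """Store data alongside a SHA-256 digest computed at write time."""
    return {"data": payload.hex(), "sha256": hashlib.sha256(payload).hexdigest()}


def is_intact(record: dict) -> bool:
    """Recompute the digest on read; a mismatch means the bytes changed."""
    payload = bytes.fromhex(record["data"])
    return hashlib.sha256(payload).hexdigest() == record["sha256"]


record = with_checksum(b"balance=100")
print(is_intact(record))  # True

# Chaos step: corrupt one byte behind the application's back.
corrupted = bytearray(bytes.fromhex(record["data"]))
corrupted[0] ^= 0xFF
record["data"] = corrupted.hex()
print(is_intact(record))  # False
```

If the corrupted read goes unnoticed in a real system, you have found a detection gap before your users did.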

And if your on-call load is very low, how are you keeping your on-call engineers' skills fresh? #VelocityConf
Gain understanding of your system's weaknesses, and strengthen products pre-launch.

And involve every facet of your business in sourcing ideas for testing resiliency. #VelocityConf
You can test out new technology (e.g. k8s) and make sure that it behaves in the way you anticipate. If not, you have bugs to fix!

You can also learn how your failures impact customers. For instance, Netflix discovered that if they degraded gracefully by not showing the top banner image during a failure, rather than showing a blank image, the UI looked much better. #VelocityConf
And this can generate feature requests for cloud providers to support better failovers between regions.
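The banner example boils down to a fallback that drops a broken page element instead of rendering an empty one. A minimal sketch; the image service, URL and function names are all hypothetical, and this is not Netflix's actual code.

```python
class ImageServiceError(Exception):
    """Stands in for a timeout or 5xx from a hypothetical image service."""


def fetch_banner_url(healthy: bool) -> str:
    # Hypothetical dependency; `healthy` lets an experiment simulate failure.
    if not healthy:
        raise ImageServiceError("banner service unavailable")
    return "https://img.example.com/banner.jpg"


def render_home(image_service_healthy: bool = True) -> list[str]:
    """Build the page's sections, omitting the banner if it can't be fetched."""
    sections = []
    try:
        sections.append(f"banner:{fetch_banner_url(image_service_healthy)}")
    except ImageServiceError:
        # Degrade gracefully: leave the banner out entirely rather than
        # rendering an empty placeholder where the image should be.
        pass
    sections.append("rows:recommendations")
    return sections


print(render_home())       # ['banner:https://img.example.com/banner.jpg', 'rows:recommendations']
print(render_home(False))  # ['rows:recommendations']
```

A chaos experiment that fails the image service is what tells you which of these two pages your users actually see.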

Make sure your team gains the skills they need through Chaos Days. It'll take 90 days to plan your first Chaos Day. #VelocityConf
What preparation do you need to do?
(1) Know your top 5 critical systems.
(2) Have monitoring and alerting [ed: including SLOs!]
(3) Know the cost of downtime.
(4) Know what new things you want to battle-test. #VelocityConf
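Items (2) and (3) can be made concrete with a little arithmetic: an availability SLO implies an error budget of allowed downtime, and multiplying downtime by what a minute of it costs gives a rough price for an outage. A minimal sketch; the 99.9% SLO, 30-day window and $1,000-per-minute figure are hypothetical inputs, and real downtime cost is rarely linear.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    return (1 - slo) * window_days * 24 * 60


def downtime_cost(minutes: float, revenue_per_minute: float) -> float:
    """Rough cost of an outage, assuming revenue is lost linearly with time."""
    return minutes * revenue_per_minute


budget = error_budget_minutes(0.999)  # 99.9% over 30 days
print(f"error budget: {budget:.1f} min")  # error budget: 43.2 min
# Hypothetical figure: $1,000 of revenue lost per minute of downtime.
print(f"cost if fully spent: ${downtime_cost(budget, 1_000):,.0f}")  # cost if fully spent: $43,200
```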
Make sure you have the right access to perform your experiments, and that the set of people who can do it (and watch it!) is the right size.
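A minimal sketch of two such safeguards, assuming experiments are launched through a small runner: an allowlist limits who can start one, and a health check aborts and rolls back if the error rate climbs. `APPROVED_OPERATORS`, `run_experiment` and the 1% threshold are hypothetical; a real runner would wait between checks and read error rates from your monitoring.

```python
from typing import Callable

# Hypothetical allowlist: the small set of people permitted to run experiments.
APPROVED_OPERATORS = {"alice", "bob"}


class NotAuthorized(Exception):
    """Raised when someone outside the allowlist tries to start an experiment."""


def run_experiment(
    operator: str,
    inject: Callable[[], None],
    rollback: Callable[[], None],
    error_rate: Callable[[], float],
    abort_above: float = 0.01,
    checks: int = 5,
) -> bool:
    """Run a fault only for approved operators; roll back early if errors spike.

    Returns True if the experiment ran to completion, False if it was aborted.
    """
    if operator not in APPROVED_OPERATORS:
        raise NotAuthorized(f"{operator} may not run chaos experiments")
    inject()
    try:
        for _ in range(checks):
            if error_rate() > abort_above:
                return False  # abort; the finally block still rolls back
        return True
    finally:
        # Always undo the fault, whether we completed, aborted, or crashed.
        rollback()


log = []
readings = iter([0.001, 0.002, 0.05, 0.001, 0.001])
ok = run_experiment(
    "alice",
    inject=lambda: log.append("inject"),
    rollback=lambda: log.append("rollback"),
    error_rate=lambda: next(readings),
)
print(ok, log)  # False ['inject', 'rollback']
```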

Where are you coordinating your team? What happens if you cause a real problem? Test your physical contingencies too. #VelocityConf
It takes a team to pull off a Chaos Day: have your VP of Engineering or CTO around, along with an administrative business partner, engineering managers/directors, engineers, and new engineers and interns!

If the newest person can do it, so can everyone else. #VelocityConf
Go forth and run your Chaos Day, and let @tammybutow know how your survival story went! Let's share and learn as an industry! [fin] #VelocityConf

Thread by Liz Fong-Jones (方禮真), @lizthegrey