Going to tipsy-tweet #SREcon lightning talks, probably not at my usual level of detail. Expect more editorializing and heckling!
p.s. see me up front if you want cider, the hotel doesn't seem to be serving it, and @ingridavendano and @Ana_M_Medina hooked me up with a 6-pack last night on a Safeway run. My win is your win too. 2 is my limit, so 4 are up for grabs. #SREcon
Also, obligatory note that it is 100% okay to not drink. Fuck alcohol culture etc. #SREcon
Ooh, do I hear word that there might be Ignite Karaoke? This is gonna be fuuun. #SREcon
This talk is about shoelaces, and teaching you how to tie shoelaces. Apparently. Feel free to play along so that we can meet the 15 second auto-advance? #SREcon
Apparently the surgeon's knot is a much better way of tying shoes. You'll get an hour back out of teachers' (or parents') days, every single day, for their careers. #SREcon
next: @stormlash from Mailchimp is teaching us how to deliver technical presentations? like how to make big gestures and modulate one's voice. Or have your slides malfunction. #SREcon
Haha, it was a scripted animation. :D :D :D we got trolled HARD #SREcon
This was a super funny talk but loses the essence in a livetweet because it has to be seen live :) bravo. #SREcon
Next: @craigknott92. Telling an SRE story about JIRA/Confluence Cloud. Lots of toil. #SREcon
Autofix ALL the things instead of receiving ALL the alerts! Use agents colocated with containers. Simple plugins. #SREcon
But collect the data so you can apply a long-term fix rather than letting the autofixes band-aid over the same problem again and again. #SREcon
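[ed: a minimal Python sketch of the autofix-but-keep-the-data idea, not the speaker's actual tooling. The names `FIXERS`, `register`, `handle`, and `repeat_offenders`, plus the `disk_full` alert, are all made up for illustration.]

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical registry mapping an alert name to a remediation plugin.
FIXERS: Dict[str, Callable[[str], bool]] = {}
# How often each (alert, host) pair was autofixed; this is the data to review later.
fix_counts: Counter = Counter()


def register(alert_name: str):
    """Decorator that registers a plugin as the autofix for one alert."""
    def wrap(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        FIXERS[alert_name] = fn
        return fn
    return wrap


@register("disk_full")
def clear_tmp(host: str) -> bool:
    # Placeholder remediation; a real plugin would act on the host.
    return True


def handle(alert_name: str, host: str) -> str:
    """Try the autofix; page a human if there is no plugin or the fix fails."""
    fixer = FIXERS.get(alert_name)
    if fixer is None:
        return "page"
    fix_counts[(alert_name, host)] += 1
    return "autofixed" if fixer(host) else "page"


def repeat_offenders(threshold: int = 3) -> List[Tuple[str, str]]:
    """(alert, host) pairs autofixed often enough to deserve a real fix."""
    return [key for key, n in fix_counts.items() if n >= threshold]
```

[ed: the point is the counter. Anything that shows up in `repeat_offenders()` is a band-aid that should become a project.]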
Next up Andrey Falko of Salesforce, who has an annoying legal forward-looking-statement slide required by a lawyer. #srecon
Changing base OS bits is hard. Dockerfile image updates let us do incremental builds and layering. #SREcon
Applause for the next speaker who is a first-time speaker!!! :D #SREcon
Alerting tips from @therendeye!!! Alerts are critical for operational success. #SREcon
Should a human or machine react to this alert? If you're not responding, it's not important. #SREcon
Make your alerts understandable. If they're critical, you should spend time when writing the alert to explain. #SREcon
.@dannygnc next on "shoulder tap" interruptions after some awful speaker feedback. #SREcon
[ed: did you see what I did there?] Unplanned activity, interruptions, also known as toil. #SREcon
[ed: I'm going to snark a bit here. do NOT make your junior engineers do your entirely manual work. unless tipsy-me is misinterpreting the speaker. "Quality Reliability Engineering" as junior SRE seems off to me] #SREcon
Next up: some folks from Wayfair - Hemant and someone else whose handle isn't on the slides, rawwwwwr. stop making @bridgetkromhout sad. #SREcon
Orchestration problems. Cleaning up snowflakes and technical debt/dead code. #SREcon
Capacity planning is needed in public cloud too. [ed: yes, this. they're physical servers under the hood, and also your CFO wants to know how much it'll cost]. #SREcon
Next: David and James from Atlassian. [ed: and now the speakers have all figured out how to fill the too-long gaps between presentations with product pitches!] #SREcon
Stride: slackalike from Atlassian, who have decided to build a culture of reliability into the product and team producing it. Thus SRE. Joint oncall with first-time-oncallers from product dev SWE. "TechOps" #SREcon
Real-time chat needs to be realtime. Must be fast, measure using throughput. Must be reliable, so measure reliability. Reduce alerting, so track whether items are actionable [ed: and urgent, and non-repeat, and investigated each time] #SREcon
[ed: p.s. the Atlassian art team is KICKING ASS HERE. I wish that SRE at Google had dedicated artists :)] #SREcon
Record when you miss your objectives and how you will stop them from happening again. Otherwise it's just a status report. #SREcon
Identify measurable goals, and measure them on a recurring basis. review as a team and take action [ed: Google SRE teams do this with weekly production meetings joint with product dev SWE] #SREcon
Next up is Janardh from LinkedIn about how to audit your data replication. #SREcon
If you run active-active, you might have writes accepted in multiple places, with realtime replication. [ed: this talk is going to force me to rehash trauma about Bigtable replication watermarks, isn't it?] #SREcon
Get an ETL flow of data from your sources. Audit processor needs to keep track of last modified timestamps e.g. for conflict resolution. #SREcon
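[ed: a rough Python sketch of auditing replicas by last-modified timestamp. It assumes last-write-wins conflict resolution, which is my assumption and not necessarily what the speaker's system does. `Record`, `resolve`, and `audit` are hypothetical names.]

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    last_modified: float  # epoch seconds, as written by the source


def resolve(a: Record, b: Record) -> Record:
    """Last-write-wins: the newer record wins; on a tie, keep `a`."""
    return b if b.last_modified > a.last_modified else a


def audit(primary: Dict[str, Record], replica: Dict[str, Record]) -> List[str]:
    """Return keys where the replica is missing a record or holds a losing copy."""
    stale = []
    for key, rec in primary.items():
        other = replica.get(key)
        # The replica is fine only if its copy is what conflict resolution would pick.
        if other is None or resolve(rec, other) != other:
            stale.append(key)
    return sorted(stale)
```

[ed: this only checks one direction, primary against replica. In an active-active setup you would run it both ways.]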
[ed: too tipsy and tired to actually process this talk. deeply technical talks and tired/tipsy me do not mix]
"We want to be able to make or break the market." 🤔🤔🤔🤔 #SREcon
[ed: please don't break my market, the only way I can pay for my habit of helping fund the LGBT+ movement is through selling equity for good prices]
Need to balance toil/ops and project work, and keep boredom away [ed: but some people *do* like to concentrate and go heads-down. but not fucking me, that's for sure. until I want to write a talk 🤔🤔🤔] #SREcon
Embrace failure, let people make mistakes, psych safety, oncall and time off. #SREcon
Oncall onramp: reverse engineering the service, sandbox, shadow, then oncall for real. #SREcon
You tend to find the skeletons in the closet when you start at a new company. She likes pickles [ed: I wonder why 🤔🤔🤔 🏳️‍🌈🏳️‍🌈🏳️‍🌈] #SREcon
"one last pickle" consultancy. config analysis, consulting on tuning cassandra clusters. vNodes. #SREcon
Fix was released in 4.0 but needed backporting. In theory you could patch, fix merge conflicts, ???, profit. Not reality. What's a better way? #SREcon
Pickled cassandra: started with a patched cassandra, then had it backported upstream. and now it's deployable. #SREcon
Work with consultancies that contribute upstream rather than forking. Pay your partners, tip your wait staff. #SREcon
Next up, @aspyker, who is apparently worried I will heckle him here. #SREcon
He's explaining why line managers of SRE teams should be oncall. #SREcon
Titus as Netflix's container orchestrator. Team grew from 3 to 4 individual contributors, @aspyker became the product mgr and then eventually the mgr. Was necessary to stay oncall initially due to small team. But once up to 6 eng, more flexibility. #SREcon
Okay, now I'm going to troll you @aspyker. Why is your team in the photo all men? #SREcon
Manager learns every 7 weeks how burnt out or not you are after being oncall. And as a user and product manager you learn what the right stability balance is. [ed: why not use SLOs and toil metrics for that?] #SREcon
Helps improve recruiting (confidence to talk to recruits about it) and tooling. During the day, manager shouldn't be oncall because of meetings and looking after careers [ed: YES YES YES] #SREcon
[ed: the thing that I think is under-discussed as a risk is the chance that you might under-invest in the one job you can't delegate, management] #SREcon
"Oh shit that's moving fast." And the speaker instantaneously snaps to not mumbling. Nothing like a little shock to get people more excited and less tired! #SREcon
Simplicity is bliss. [ed: and our monorepo and static linking agrees SOOO much. We know not everyone can do it, but it's great when it works.] #SREcon
Read about PEP 441 for Python executables in the form of a zip file. [but you still have to package the Python runtime] #SREcon
Also possible in Ruby and NodeJS. Shoutout to NPMJS. #SREcon
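[ed: for the Python case, the standard-library `zipapp` module builds these zip executables. A minimal sketch follows. The `build_and_run` helper and the file names are mine, not from the talk, and as noted it still needs a Python interpreter on the target machine.]

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run() -> str:
    """Package a one-file app as a .pyz archive, run it, and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "app"
        src.mkdir()
        # zipapp runs __main__.py as the entry point of the archive.
        (src / "__main__.py").write_text('print("hello from a zip")\n')

        target = pathlib.Path(tmp) / "app.pyz"
        zipapp.create_archive(src, target)

        # The archive is executed by an existing interpreter; none is bundled.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
```

[ed: the same thing is available from the command line as `python -m zipapp app -o app.pyz`.]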
Build systems: pants, bazel, and buck can solve these problems for you. Or just ship a container? Nope, you can't do that because of fuzzy boundaries. #SREcon
And now the final lightning talk by @jschnip. 35=>60 people at Choozle. Reached gender parity. #SREcon
[ed: very interested to hear whether they've had success with racial equity and retaining women of color] #SREcon
How do we know job descriptions are bringing us the right candidates? Risk-averseness and single paths to the top are problems. #SREcon
We don't work in factories. We have creative, non-predictable jobs. #SREcon
[ed: removing job titles can cause problems with being able to assess problems with pay equality, compensation setting, etc.] #SREcon
Self-managing teams. [ed: curious to hear what the HR story is here, as someone currently trying to fix bloody HR.] #SREcon
Ensure people can learn from new perspectives. Recruit people with different stories, rather than more of the same mirrortocracy. #SREcon
[ed: I have many unanswered questions, but will follow up offline.]
Final talk I'll be getting to at #VelocityConf before I dash to Toronto: @IanColdwater on improving container security on k8s.
@IanColdwater She focuses on hardening her employer's cloud container infrastructure, including doing work on k8s.
She also was an ethical hacker before she went into DevOps and DevSecOps. #VelocityConf
She travels around doing competitive hacking with CTFs. It's important to think like an attacker rather than assuming good intents and nice user personas that use our features in the way the devs intended things to be used. #VelocityConf
My colleague @sethvargo on microservice security at #VelocityConf: traditionally we've thought of security as all-or-nothing -- that you put the biggest possible padlock on your perimeter, and you have a secure zone and untrusted zone.
@sethvargo We know that monoliths don't actually work, so we're moving towards microservices. But how does this change your security model?
You might have a loadbalancer that has software-defined rules. And you have a variety of compartmentalized networks. #VelocityConf
You might also be communicating with managed services such as Cloud SQL that are outside of your security perimeter.
You no longer have one resource, firewall, loadbalancer, and security team. You have many. Including "Chris." #VelocityConf
The problems we're solving: (1) why are monoliths harder to migrate? (2) Should you? (3) How do I start? (4) Best practices #VelocityConf
.@krisnova is a Gaypher (gay gopher), is a k8s maintainer, and is involved in two k8s SIGs (cluster lifecycle & aws, but she likes all the clouds. depending upon the day). And she did SRE before becoming a Dev Advocate! #VelocityConf
"just collect data and figure out later how you'll use it" doesn't work any more. #VelocityConf
We used to be optimistic before we ruined everything.
Mozilla also used to not collect data, and only had data on number of downloads, but its market share went down because they weren't measuring user satisfaction and actual usage. #VelocityConf