I unfortunately have some @GCPcloud NEXT prep to do, so I need to duck out halfway through the next block of talks :/ livetweets will cut out halfway but I want to signal-boost as much of @eanakashima and @rachelmyers's talk on accidental distributed systems as I can. #QConNYC
They used to work together at GitHub and ModCloth. "So a cloud vendor and tool vendor walk onto a stage..." but they're not selling anything, they're talking about their worst work problems. #QConNYC
Turn the clocks back to 2012 at ModCloth. They were running a rails app with a mysql backend. More engineers writing more code, but deploying only once a week on Friday... more code going out every week and deployers less familiar with the code. #QConNYC
Always stepping on someone else's code and no sense of ownership. Wanted to isolate the highest reliability pages, but had no mechanism to do it aside from sharding. #QConNYC
Canonical definition of distributed system: multiple nodes not sharing memory, communicating over RPCs that take a non-trivial amount of time to cross the network. #QConNYC
To build these systems, we need to define the nouns, the patterns for accessing data, and whether different parts have different requirements. Then we draw boxes on a chart (a v1 intentionally distributed system) #QConNYC
So the first thing was breaking out a mobile app and attaching it to the webapp; then we added voting and authentication services that had their own databases but also needed to talk to the main database... and then assets, etc. Uh oh. #QConNYC
There was no robust model for code-sharing. Giant web of different services and tangled dependencies of data and RPCs. #QConNYC
This is obviously a 🗑️🔥 [ed: literally those two icons superimposed on the slide]. Increased the cognitive load, and made it possible to break in so many new ways. #QConNYC
This was an intentionally but poorly-built distributed system. But let's talk about accidental distributed systems. #QConNYC
A YCombinator company had mostly non-technical leadership and was focusing on plugging multiple different kinds of SaaS together (subscriptions, ecommerce, and fulfillment). #QConNYC
Information was scattered across three different places (four, once metrics were added). #QConNYC
This was a subscription box service, and they eventually needed some business logic too... all while working through SaaS vendors. They realized they needed a new app and reached out to @rachelmyers for help. #QConNYC
This was on one hand a smart thing to do in that they were focusing on their core competency and not building all of their own software. But sometimes this leads to new problems of accidental distributed systems. #QConNYC
Traditional wisdom on buy vs. build: buy everything except your core functions. But that's not quite right given Flex's experience. You sometimes outgrow your SaaS products. "Understandability is critical." You need to be able to change it when you need to. #QConNYC
Questions to ask: will it improve availability? recovery/integrity? Visibility/control? Customizability? Know when changes take effect? Audit? #QConNYC
This applies to *anything* as a service and all of our cloud infrastructure. The question is not only whether it helps our business directly, but whether it improves reliability. #QConNYC
You can mitigate all of these things if you build it all yourself on IaaS. With platforms and backends as a service, you give up control. Transparency is important. #QConNYC
Functions as a service are a great option for varying/irregular load. Before you adopt them, make sure they improve your reliability rather than making your system more fragile. How will you know whether a function ran correctly? #QConNYC
.@rachelmyers is asking all the right questions about cloud functions from a reliability point of view. You have to figure out your observability/control story. #QConNYC
Functions are a distributed system that you don't fully control; you need to plan for failure and work around their limitations. #QConNYC
What most reduces cognitive load for operators and developers? They should have APIs that are easy to understand and use for development. However, the system interactions become tricky the more technologies you combine. #QConNYC
So many different systems, oh my.
Surprising distributed systems: browser based systems (the HTML/CSS/JS) especially single page apps wind up being a part of your distributed system too! #QConNYC
They're not just dumb clients, they can DDoS your system. Had to look at the devtools to understand why API requests for a particular endpoint were surging. #QConNYC
It turns out that the logic for infinite scrolling was broken: without any throttling, it kept requesting the same page in a loop until you scrolled to the bottom of the page. #QConNYC
"We weren't ready to think of the browser as part of our system, and didn't invest in browser debugging tools because we thought it was a simple client we pushed HTML to." --@eanakashima #QConNYC
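[ed: a minimal sketch of the kind of guard that was missing, not the speakers' actual code. `PageLoader`, `fetch_page`, and `min_interval` are hypothetical names; the talk did not show an implementation. It throttles scroll-triggered requests and advances the page counter so the same page is not requested twice.]

```python
import time
from typing import Callable


class PageLoader:
    """Throttle scroll-triggered page requests (hypothetical sketch)."""

    def __init__(
        self,
        fetch_page: Callable[[int], None],
        min_interval: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch_page = fetch_page      # assumed callback that hits the API
        self.min_interval = min_interval  # minimum seconds between requests
        self.clock = clock                # injectable so it can be tested
        self.next_page = 1
        self.last_request_at = None

    def on_scroll_near_bottom(self) -> bool:
        """Request the next page unless we requested one too recently."""
        now = self.clock()
        if (
            self.last_request_at is not None
            and now - self.last_request_at < self.min_interval
        ):
            return False  # throttled: drop this scroll event
        self.last_request_at = now
        self.fetch_page(self.next_page)
        self.next_page += 1  # never ask for the same page again
        return True
```

A burst of scroll events inside one interval then produces a single API request instead of a flood.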
Where you have service workers, frontend complexity increases even further. We need the same observability toolchain we have serverside to work clientside in the browser too. The edge is expanding further back, even beyond the CDN or loadbalancer. #QConNYC
Other secret distributed systems: vendors like LaunchDarkly. But they've made sure their APIs have sensible defaults so that if they fail, the defaults kick in, etc. #QConNYC
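[ed: a rough sketch of the "defaults kick in on failure" pattern, under my own assumptions. This is not LaunchDarkly's SDK or API; `fetch_flag` stands in for whatever vendor call you make, and `flag_enabled` is a hypothetical helper.]

```python
from typing import Callable


def flag_enabled(
    fetch_flag: Callable[[str], bool], name: str, default: bool
) -> bool:
    """Return the vendor's value for a flag, or `default` if the call fails."""
    try:
        return fetch_flag(name)
    except Exception:
        # Vendor unreachable or erroring: degrade to the caller's safe default
        # instead of letting the failure propagate into our own request path.
        return default
```

The caller picks the default per flag, so a vendor outage turns into known, boring behavior rather than an error page.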
Instrumentation rule number 2: wrap third-party calls, so that we can instrument them. #QConNYC
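[ed: one way this rule could look in practice, as a sketch rather than anything shown in the talk. The `instrumented` decorator and the in-memory `CALL_LOG` sink are hypothetical; a real setup would send these events to your metrics or tracing system.]

```python
import time
from functools import wraps

CALL_LOG = []  # hypothetical in-memory sink for illustration only


def instrumented(vendor: str):
    """Wrap a third-party call so every invocation is timed and recorded."""

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = fn(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs on success and on failure, so errors are recorded too.
                CALL_LOG.append(
                    {
                        "vendor": vendor,
                        "call": fn.__name__,
                        "ok": ok,
                        "seconds": time.perf_counter() - start,
                    }
                )

        return wrapper

    return decorator
```

Decorating each vendor client function (for example `@instrumented("payments")`) gives one place to see latency and error rates per vendor; exceptions still propagate to the caller unchanged.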
.@rachelmyers is describing Github's support tool, which I'm intensely interested in as the person who wrote the @puzzlepirates support tool, but have to dash now :( -- will have to catch the video. [to be continued] #QConNYC
And you can follow @micheletitolo for the rest of the talk livetweets!!
Final talk I'll be getting to at #VelocityConf before I dash to Toronto: @IanColdwater on improving container security on k8s.
She focuses on hardening her employer's cloud container infrastructure, including doing work on k8s.
She also was an ethical hacker before she went into DevOps and DevSecOps. #VelocityConf
She travels around doing competitive hacking with CTFs. It's important to think like an attacker rather than assuming good intents and nice user personas that use our features in the way the devs intended things to be used. #VelocityConf
My colleague @sethvargo on microservice security at #VelocityConf: traditionally we've thought of security as all-or-nothing -- that you put the biggest possible padlock on your perimeter, and you have a secure zone and an untrusted zone.
We know that monoliths don't actually work, so we're moving towards microservices. But how does this change your security model?
You might have a loadbalancer that has software-defined rules. And you have a variety of compartmentalized networks. #VelocityConf
You might also be communicating with managed services such as Cloud SQL that are outside of your security perimeter.
You no longer have one resource, firewall, loadbalancer, and security team. You have many. Including "Chris." #VelocityConf
The problems we're solving: (1) why are monoliths harder to migrate? (2) Should you? (3) How do I start? (4) Best practices #VelocityConf
.@krisnova is a Gaypher (gay gopher), is a k8s maintainer, and is involved in two k8s SIGs (cluster lifecycle & aws, but she likes all the clouds, depending upon the day). And she did SRE before becoming a Dev Advocate! #VelocityConf
"just collect data and figure out later how you'll use it" doesn't work any more. #VelocityConf
We used to be optimistic before we ruined everything.
Mozilla also used to not collect data, and only had data on number of downloads, but its market share went down because they weren't measuring user satisfaction and actual usage. #VelocityConf