Next is my colleague @rakyll on distributed tracing!! #VelocityConf
@rakyll In a big city, you learn to deal with large scale and navigating your way around, in an environment that has a lot of chaos and data.

And sometimes things don't go according to plan if you encounter construction, etc. along the way. #VelocityConf
It doesn't matter where the error happened; from a user's perspective, it's a failure.

Maybe we're doing better at @GCPcloud, but there are lots of opportunities to improve still, says @rakyll. #VelocityConf
Our systems tend to grow over time. You can initially fit them into your brain and explain to someone else in a single whiteboard session.

But as time elapses, teams shard and diverge. #VelocityConf
And then comes the moment when the monolith doesn't scale. So you move to microservices, and a myriad of different storage systems.

These are the problems:
(1) one problem becomes many,
(2) failures are distributed,
(3) it's not clear who's responsible. #VelocityConf
"Who has time to understand everything in such a large company?" --@rakyll

Maybe @JeffDean understands everything? haha, actually no. We don't depend upon one person to understand everything. #VelocityConf
So can we solve this problem by deleting all the code? No.

Our VPs and SVPs aren't very technically involved, people leave over time as they burn out...

And documentation won't save us. Engineers really don't like to document. #VelocityConf
cs.corp/ is an amazing code search tool, but only works if you have a starting entry point like a symbol to look up.

But if you don't have that starting point, good luck.

You can also see our code dependencies in BUILD/bazel files. #VelocityConf
Static analysis can tell us about code dependencies but not runtime dependencies.

None of these tools are good at pointing out which critical paths + dependencies require the most attention; there's too much noise. Can you tell the difference between optional & mandatory? #VelocityConf
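
[ed: a minimal sketch of how runtime data could help with the optional-vs-mandatory question. It is not from the talk. It assumes each trace has already been reduced to a list of hypothetical (caller, callee) service pairs, and it only measures how often an edge shows up. An edge seen in every trace hints at a mandatory dependency but does not prove it.]

```python
from collections import Counter


def dependency_frequency(traces):
    """Return the fraction of traces in which each (caller, callee) edge appears.

    `traces` is a list of traces; each trace is a list of (caller, callee)
    service-name pairs observed at runtime. This input shape is an assumption.
    """
    seen = Counter()
    for calls in traces:
        # Count each edge once per trace, however many times it was called.
        for edge in set(calls):
            seen[edge] += 1
    total = len(traces)
    if total == 0:
        return {}
    return {edge: count / total for edge, count in seen.items()}


# Hypothetical service names, purely for illustration.
observed = [
    [("frontend", "auth"), ("frontend", "cart")],
    [("frontend", "auth"), ("frontend", "cart"), ("frontend", "recs")],
    [("frontend", "auth"), ("frontend", "cart")],
    [("frontend", "auth"), ("frontend", "cart"), ("frontend", "cart")],
]
freq = dependency_frequency(observed)
for edge, share in sorted(freq.items()):
    print(edge, share)
```

Edges that appear in only a fraction of traces are candidates for "optional"; someone who knows the system still has to confirm it.
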
.@rakyll is introducing the idea of "Critical Path Driven Development". We need to focus on the availability of our critical paths that users are engaging with. [ed: we'd call this a critical user journey used to create an SLI in SRE land]. #VelocityConf
We need to discover our critical paths, make them reliable and fast, and make them debuggable. Can we figure things out even if we're not familiar with systems when we get paged? #VelocityConf
There are two ways of making things debuggable: either event collection or tracing. But maybe they're synonyms.

We can repeatedly ask why. [ed: or repeatedly ask how?]

Traces and events are a tool for letting us go deeper into our stack. #VelocityConf
We can learn about the life of a request through traces, and use data from production to debug issues affecting our users.

We no longer need to know the implementation details of everything in the stack; instead, we can pinpoint the problem. #VelocityConf
and we can figure out with high confidence whether we need to escalate to another team if an issue is in a dependency or framework rather than in our own code. [ed: and give that team enough data to debug, itself!] #VelocityConf
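
[ed: a minimal sketch of "pinpointing the problem" from a single trace. The Span fields and the service names are hypothetical, not from the talk, and it assumes a span's children run one after another without overlapping.]

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Time spent in each span itself, excluding its direct children."""
    result = {s.span_id: s.duration_ms for s in spans}
    for s in spans:
        if s.parent_id in result:
            result[s.parent_id] -= s.duration_ms
    return result


def slowest_span(spans: List[Span]) -> Span:
    """The span with the largest self time: a first place to look."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


# One made-up request: the frontend calls cart and payments.
trace = [
    Span("a", None, "frontend", "GET /checkout", 0, 500),
    Span("b", "a", "cart", "LoadCart", 10, 60),
    Span("c", "a", "payments", "Charge", 70, 480),
    Span("d", "c", "payments-db", "INSERT charge", 80, 120),
]
culprit = slowest_span(trace)
print(culprit.service, culprit.name)
```

Here most of the time sits in the payments service itself rather than in the frontend or the database, which is the kind of evidence you would attach when escalating to the team that owns it.
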
This is just the beginning though: Traces can unify context by pointing us at all the right places. e.g. being able to directly look at the source code for the failing or slow span, or find out who's responsible for each service we call, in the right context. #VelocityConf
And we can not only view information but dig deeper, e.g. turn up the trace rate on the fly if we see something we need more data on.

But this doesn't come for free and we need to invest in it. This is an organizational challenge. #VelocityConf
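
[ed: a minimal sketch of a sampler whose rate can be changed while the process is running. The class name and the idea of deciding from the low 64 bits of a hex trace ID are assumptions for illustration, not something the talk specified. Deciding from the trace ID means every service holding the same rate makes the same choice for the same request.]

```python
import threading


class DynamicSampler:
    """Trace sampler whose rate can be turned up or down at runtime."""

    def __init__(self, rate: float) -> None:
        self._lock = threading.Lock()
        self._threshold = 0
        self.set_rate(rate)

    def set_rate(self, rate: float) -> None:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        with self._lock:
            # Integer threshold avoids float rounding at rate == 1.0.
            self._threshold = int(rate * 2**64)

    def should_sample(self, trace_id: str) -> bool:
        # Use the low 64 bits of the hex trace ID as a stable pseudo-random value.
        value = int(trace_id[-16:], 16)
        with self._lock:
            return value < self._threshold


sampler = DynamicSampler(0.01)
low_id = "0000000000000000" + "0000000000000001"
high_id = "0000000000000000" + "ffffffffffffffff"
print(sampler.should_sample(low_id), sampler.should_sample(high_id))

# Something looks wrong: collect everything for a while.
sampler.set_rate(1.0)
print(sampler.should_sample(low_id), sampler.should_sample(high_id))
```

Raising the rate never drops a trace that was already being sampled, because the threshold only moves upward. How the new rate reaches every process (config push, admin endpoint, etc.) is the hard organizational part and is left out here.
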
(1) We need to propagate our identifiers and record data everywhere (including in our loadbalancers and CDNs). There's no industry standard yet for this.

W3C has a distributed tracing proposal, but the formats aren't fully standard yet. #VelocityConf
(2) You also need to know where to start. If you have a framework for RPC or HTTP you can start there.
(3) Infrastructure is still a black box, and doesn't fully support observability.
(4) Instrumentation is expensive, especially on high-traffic systems, so we have to sample. #VelocityConf
(5) Dynamic capabilities are underestimated. Right now it's out of the reach of most companies to dynamically tweak sampling rates, collection, etc.

These are everyday tools for some of us [ed: hiii!] but not for others; it's a differentiator when you can get it. #VelocityConf
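
[ed: a minimal sketch of the first challenge above, propagating identifiers across every hop. It uses the `traceparent` header layout from the W3C Trace Context work (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags), which was still a proposal at the time of the talk. The helper names are hypothetical, headers are assumed to be a dict with lowercase keys, and only version 00 is handled.]

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceContext]:
    """Parse a version-00 traceparent value; return None if it is invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Simplification: accept only version 00; all-zero IDs are invalid.
    if version != "00" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def outgoing_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Headers to attach to a downstream call made while handling `incoming`."""
    parent = parse_traceparent(incoming.get("traceparent", ""))
    if parent is None:
        # No usable context: start a new trace (sampled by default here,
        # which is an arbitrary choice for this sketch).
        trace_id, sampled = secrets.token_hex(16), True
    else:
        # Keep the caller's trace ID and sampling decision.
        trace_id, sampled = parent.trace_id, parent.sampled
    span_id = secrets.token_hex(8)  # this hop gets its own span ID
    flags = "01" if sampled else "00"
    return {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
print(outgoing_headers(incoming))
```

The trace ID stays the same across the whole request while each hop mints a fresh span ID. Any loadbalancer or CDN that drops the header breaks the chain, which is why they need to be part of the effort.
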
CPDD is a tool for closing knowledge gaps. If you're building or providing infrastructure, let's change the status quo and close the knowledge gaps, says @rakyll! [ed: who was brilliant!!! <3] [fin] #VelocityConf

Keep Current with Liz Fong-Jones (方禮真)

