Many incidents happen during or right after the release.
Here are some ideas to improve MTTR and on-call with that in mind.
[thread] #OnCall #MTTR #DevOps
If the services about to be updated are not covered by an on-call rotation, fail your delivery pipeline.
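A minimal sketch of such a pipeline gate, assuming you can get the list of services in the release and the set of services covered by a rotation from your own tooling. The function names and the input shapes here are hypothetical, not any particular on-call tool's API.

```python
from typing import Iterable, List, Set


def uncovered_services(to_update: Iterable[str], covered: Set[str]) -> List[str]:
    """Return the services in the release that no on-call rotation covers."""
    return sorted(set(to_update) - covered)


def gate_exit_code(to_update: Iterable[str], covered: Set[str]) -> int:
    """Return 0 when every service is covered, 1 otherwise.

    A pipeline step can pass this value to sys.exit() so that a
    non-zero result fails the delivery.
    """
    missing = uncovered_services(to_update, covered)
    for name in missing:
        print(f"No on-call rotation covers service: {name}")
    return 1 if missing else 0
```

In a pipeline step this could be called as `sys.exit(gate_exit_code(services_in_release, covered_services))`, where both arguments come from your release manifest and your on-call tool.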
Add a temporary 15-minute on-call rotation covering the developers who have issues in the release. They most likely have something to do with any alerts that follow.
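One way to model that temporary rotation, as a sketch only: build a short window starting at release time from the developers whose issues shipped. `TemporaryRotation`, `release_rotation` and `is_active` are hypothetical names, and pushing the result into a real on-call tool is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class TemporaryRotation:
    members: List[str]
    starts_at: datetime
    ends_at: datetime


def release_rotation(
    developers: Iterable[str], released_at: datetime, minutes: int = 15
) -> TemporaryRotation:
    """Build a short rotation for the developers whose issues are in the release."""
    members = sorted(set(developers))  # de-duplicate people with several issues
    return TemporaryRotation(
        members, released_at, released_at + timedelta(minutes=minutes)
    )


def is_active(rotation: TemporaryRotation, now: datetime) -> bool:
    """True while the temporary rotation should still receive alerts."""
    return rotation.starts_at <= now < rotation.ends_at
```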
Make it easy for the on-call to see the latest issues sent to production: enrich alerts. For example, trigger a webhook for new alerts and, if a new version was released in the last hour, attach the build and issue details to the alert details.
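The enrichment step inside such a webhook handler could look roughly like this. It is a sketch under assumptions: alerts and releases are plain dicts, and the field names (`service`, `deployed_at`, `build`, `issues`, `details`) are made up for illustration rather than taken from a specific alerting tool.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

Release = Dict[str, Any]
Alert = Dict[str, Any]


def latest_recent_release(
    releases: List[Release],
    service: str,
    now: datetime,
    window: timedelta = timedelta(hours=1),
) -> Optional[Release]:
    """Return the newest release of `service` deployed within `window`, if any."""
    recent = [
        r
        for r in releases
        if r["service"] == service
        and timedelta(0) <= now - r["deployed_at"] <= window
    ]
    return max(recent, key=lambda r: r["deployed_at"], default=None)


def enrich_alert(alert: Alert, releases: List[Release], now: datetime) -> Alert:
    """Return a copy of the alert with build and issue details attached."""
    enriched = dict(alert)
    release = latest_recent_release(releases, alert["service"], now)
    if release is not None:
        details = dict(enriched.get("details", {}))
        details["build"] = release["build"]
        details["issues"] = list(release["issues"])
        enriched["details"] = details
    return enriched
```

Alerts for services with no release in the last hour pass through unchanged, so the handler is safe to run on every new alert.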
If your releases require a final human approval, let the on-call do the release. In this scenario the on-call, together with the developers when necessary, does the release, then lets the developers know so they can do the after-release checks.
I'd love to hear comments and ideas on this, especially from the folks who advocate for developers being on-call @mipsytipsy @copyconstruct :)