Lead Time #18 - Improving Release Chaos Without Creating More

How do you modernize a stressful release process? Which questions help prioritize actions and balance quick wins with sustainable change?

May 22, 2025

Hi, this newsletter is a weekly challenge for engineers thinking about management. I'm Péter Szász, writing about Engineering Leadership and training aspiring and first-time managers on this path.

In this newsletter, I pose a weekly EM challenge and leave it as a puzzle for you to think about before the next issue, where I share my thoughts on it.

Last Week's Challenge

Last week, I described a challenge where a new Engineering Manager is tasked with fixing a web development team's stressful monthly releases, with developers burning out from branch merging struggles and frequent emergency rollbacks. Read all the details here if you missed the post.

Goals to Achieve

Show progress. The deployment process needs fixing - it's unsustainable. While I need to balance analysis with action, I need to show continuous improvement - this is explicitly what I was hired to deliver within the quarter, so of course it’s one of my main goals.

Gain the team’s trust. Without the team's trust, technical changes will be harder to implement. These experienced developers have seen improvement attempts fail before, so gaining their confidence is the first step to everything else.

Learn why things are this way. The current system evolved for reasons. Before changing it, I need to know those reasons. Understanding why things are this way helps me come up with better solutions that address root causes — and avoid pitfalls others fell into.

Establish clear metrics. I need to set up a baseline of metrics that matter to both the team (reducing stress) and the business (improving delivery). These metrics will guide prioritization, make progress visible, and serve as objective success criteria.

Address burnout. The current release process is probably the main contributor to their low engagement, but what other factors might be at play? Are there deeper issues around autonomy, mastery, or purpose that need addressing?

Risks to Avoid

Over-indexing on action. Rushing to implement changes without understanding context or building trust carries a high risk of failure. Trying to force textbook CI/CD processes on an organization I don't yet understand risks creating solutions that won’t work and further damaging team morale.

Under-indexing on action. Equally bad is analysis paralysis, spending too much time analysing without doing. This risks losing the director's confidence and gradually accepting the status quo. Making changes is easiest when you're new, and this advantage fades with time.

Ignoring team expertise. These are experienced developers with reasons for their current practices. Disregarding their knowledge means missing valuable insights. The IKEA effect matters too: people give more support to what they helped build.

Forcing industry best practices. Assuming by-the-book standards will work unchanged in this context is dangerous. Every organization and every product is different. There must be some legitimate reasons why their system evolved like this.

Wrong order of improvements. Some changes depend on others. We need to find steps that have a demonstrable impact while still building necessary foundations. Working for months on invisible infrastructure without showing regular progress will kill trust in the process.

Big bang migrations. Creating parallel systems that aren't used in production means building on assumptions without real feedback. Without testing changes in production and learning from results, we risk building the wrong solution. Plus, migrations are risky - incremental improvement is often safer and is a better way to demonstrate progress.

5 Questions

1. Why is the current system structured this way? How and why did previous improvement attempts fail?

Are monthly releases required by business cycles, regulatory needs, or technical limitations? When previous managers tried to improve things, what went wrong? Which constraints can I challenge, and which are non-negotiable? Getting this context helps me understand the status quo and avoid repeating past mistakes.

2. What's one small, concrete improvement I can implement quickly?

Finding a quick win builds momentum and trust. This could be automating release notes, writing down an undocumented process, simplifying rollbacks, or removing a manual step everyone hates. The ideal first change addresses a clear, universally shared pain point without requiring massive changes, and it’s small enough that it doesn't depend on deep organizational context or interconnected systems.

3. What metrics should we track to measure improvement?

I need to find and start recording the metrics that best represent the current problem, both technically and business-wise. DORA metrics can be a good starting point, but developer experience matters too, so maybe I’ll need something more holistic, like a version of the DX Core 4 framework. Either way, I need to involve stakeholders and developers in validating my choice to ensure these metrics connect to the business impact we want to make.
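To make the baseline idea concrete, here is a minimal sketch of how three DORA-style numbers could be computed from a simple release log. Everything in it is an assumption for illustration: the `Release` record, its field names, and the choice to approximate lead time by the oldest commit shipped in each release are all hypothetical, not something taken from the team's actual tooling.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Release:
    """One production release (hypothetical record, for illustration only)."""
    deployed_on: date
    rolled_back: bool        # True if the release needed an emergency rollback
    oldest_commit_on: date   # date of the oldest commit shipped in this release


def deployment_frequency(releases: list[Release], period_days: int) -> float:
    """Average number of releases per 30 days over the observed period."""
    return len(releases) * 30 / period_days


def change_failure_rate(releases: list[Release]) -> float:
    """Share of releases that ended in a rollback (0.0 if there were none)."""
    if not releases:
        return 0.0
    return sum(r.rolled_back for r in releases) / len(releases)


def median_lead_time_days(releases: list[Release]) -> float:
    """Median days from the oldest shipped commit to deployment.

    This is a worst-case proxy for DORA's lead time for changes,
    which is normally measured per commit rather than per release.
    """
    return median((r.deployed_on - r.oldest_commit_on).days for r in releases)


def baseline(releases: list[Release], period_days: int) -> dict[str, float]:
    """Collect the three numbers into one snapshot to compare against later."""
    return {
        "deploys_per_30_days": deployment_frequency(releases, period_days),
        "change_failure_rate": change_failure_rate(releases),
        "median_lead_time_days": median_lead_time_days(releases),
    }
```

A snapshot like this, taken before any change, gives the team and stakeholders a shared reference point. It would still need a developer-experience measure next to it, since none of these numbers capture stress directly.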

4. How does the rest of the engineering organization operate?

Do other teams face similar challenges? Have some solved problems we're still struggling with? Finding internal examples of better practices provides practical blueprints and credibility. Solutions that worked elsewhere in the company are easier to implement.

5. What are the team's specific fears about changing the process?

Is there a lack of trust in other developers' code that makes them hesitant about trunk-based development? Are they worried that CI/CD will remove important manual checks? Understanding specific fears helps us address concerns directly and design solutions that maintain what works.

Finding the Balance

Let me summarize: The core challenge isn't choosing between technical solutions; it's balancing action with observation and managing the change process itself.

Did I miss any important considerations? What approaches have you found effective when introducing deployment improvements to resistant teams? Let me know in the comments!

This Week's Challenge

Your team just completed a major product launch that went poorly. The new features had multiple critical bugs that weren't caught in testing, resulting in customer complaints and an emergency hotfix that kept several developers working through the weekend. The product team is frustrated about damage to user trust, executives are asking pointed questions about what went wrong, and your team is demoralized and defensive.

You’re under pressure from multiple directions: you need to quickly identify process failures, regain stakeholder trust, rebuild team confidence, and most importantly, prevent similar issues in the future. All this while the product roadmap still ambitiously calls for delivering the next major feature in six weeks.

What do you do?

Think about what your goals would be and what risks you'd like to avoid in this situation. I'll share my thoughts next week. If you don't want to miss it, sign up here to receive it and similar weekly brain-teasers as soon as they are published.

Until then, here's a small piece of inspiration slightly related to this week’s challenge:

"Process is a tool to make it easy to collaborate, and the process that the team enjoys is usually the right process. If your process is failing somehow, it’s worth really digging into how it’s failing before you start looking for another process to replace it." - Will Larson: An Elegant Puzzle

See you next week,

Péter

Lead Time - Engineering Management Challenges
