This article is part of the series Your Company Will Never Be Agile. It is a case study about a company that successfully introduced DevOps to a large team.
Some years ago, Dynatrace faced the “innovator's dilemma”: They had, and continue to have, a software product that sells very well and grows fast. But they faced the challenge of converging multiple products into one, and they decided to start on a green field: not just to converge, but to leapfrog with new approaches to monitoring, such as artificial intelligence.
“You always need pain for change.” (Bernd Greifeneder)
Their major pain was slow feedback: They had two major releases per year. But when they released the software, customers would not upgrade immediately. Some would wait for three months or more until they installed the new version. So it took a year or more until product management got feedback from real users. By that point, the developers were working on completely different things.
They needed faster feedback. In order to achieve this, they had to move away from software that you could install on your own server. They had to move to “Software as a Service”. And they needed a different organizational structure.
So they created a second product: a new generation that almost competed with their first product, but that was initially positioned towards the most modern cloud technologies. Not every management team is willing to take that risk. But it worked: Now they have two very successful products, which are both growing.
Kill the QA Team
Even before that, when the company was much smaller, Dynatrace faced growing pains. They started to become more agile, and they had a QA department that did mostly test automation but also some manual testing. And it could never catch up: it had to develop test tools, test, and automate tests only after a feature was done. As the company grew, this became worse.
So Bernd, together with his teams, decided that a QA department cannot work when you try to move fast. They eliminated the classic approach to QA entirely and instead created a new department, “TA” (“Test Automation”). This new department only works on test automation tools and helps teams automate their tests. Its members do not test features or automate tests themselves.
Developers have to automate all tests as they go. This means more responsibility for developers: They have to make sure that what they implemented works and that it will keep working in the future. And it allows you to move faster, since the test automation is done when the feature is done.
After Dynatrace was bought, Bernd also rolled out this approach to QA globally to other teams in the company, and it was always more successful than the old way.
This initiative was again triggered by pain: Slow feedback, which this time came from the slow test automation after a feature was complete.
Zero to DevOps
The pain Dynatrace felt was slow feedback, again: When they started with the SaaS product, their operations team sometimes needed weeks to react. Everyone was unhappy. So the CTO decided: “We need to be able to implement a bugfix and deploy it to production in less than one hour”. DevOps at Dynatrace was born.
At this point, the company had already undergone other major changes: They had already “killed” their QA department, and they had established a culture of continuous integration and continuous testing. Also, Dynatrace had a culture that embraced failure (fail fast, fix what’s wrong, learn from it) from the start. So everyone had the feeling: “We can accept this new challenge”.
To achieve this goal, they had to move operations closer to development. Developers would do all operations for the application / the cluster themselves, and automate everything. Since QA was already integrated in development, after integrating operations they had turned the organizational structure by 90 degrees: Instead of different silos for different functions, they had cross-functional groups that were responsible for a delivery from end to end.
They still had a 24h operations team (which exists in the parent company anyway), but step by step they automated all manual operations procedures. And now, their only remaining procedure is: “If the cluster cannot heal itself, call someone from development”. So now they have fully automated QA, deployment, and application operations, and don’t need the 24h operations team anymore.
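The remaining procedure ("if the cluster cannot heal itself, call someone from development") can be pictured as a small control loop. The following is only a minimal sketch, not Dynatrace's actual implementation. The function name `heal_or_escalate` and its parameters (a health check, a list of automated remediation steps, and an escalation callback) are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Iterable


def heal_or_escalate(
    is_healthy: Callable[[], bool],
    remediations: Iterable[Callable[[], None]],
    escalate: Callable[[str], None],
) -> str:
    """Try automated remediations in order; call a human only as a last resort.

    All three parameters are hypothetical hooks for this sketch.
    """
    if is_healthy():
        return "healthy"
    for remediate in remediations:
        remediate()
        # Re-check after every automated step and stop at the first success.
        if is_healthy():
            return "healed"
    # No automated step restored health: hand over to development.
    escalate("cluster could not heal itself")
    return "escalated"


# Usage with stand-in functions: one remediation step that fixes the problem.
state = {"healthy": False}
pages = []


def restart_node() -> None:
    state["healthy"] = True


result = heal_or_escalate(lambda: state["healthy"], [restart_node], pages.append)
```

In this usage example the single remediation step succeeds, so `result` is `"healed"` and nobody is paged. The point of the design is that a human is involved only after every automated step has been tried.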
There are extreme feedback devices all over the company: the Dynatrace UFOs. These are lamps that use a color code to show all kinds of pipeline problems.
After the CTO started this initiative, 150 developers in two countries were immediately affected. They faced big problems at the start, and it took them some time to iron out all of them, but now they are hugely successful with it, and with even more developers.
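To illustrate the idea of a color code for pipeline state, here is a minimal sketch. The article does not describe the real mapping, so the stage names, the status strings (`"passed"`, `"running"`, `"failed"`) and the colors below are assumptions made for this example only.

```python
from typing import Mapping


def lamp_color(stage_status: Mapping[str, str]) -> str:
    """Reduce the status of all pipeline stages to a single lamp color.

    Assumed convention (not taken from the article):
    any failed stage -> red, otherwise any running stage -> yellow,
    otherwise -> green.
    """
    statuses = set(stage_status.values())
    if "failed" in statuses:
        return "red"
    if "running" in statuses:
        return "yellow"
    return "green"


# Usage: a hypothetical pipeline with three stages.
color = lamp_color({"build": "passed", "smoke-tests": "failed", "deploy": "running"})
```

The worst status wins, so a single failed stage turns the lamp red no matter what the other stages report. That is what lets everyone in the room see a problem at a glance.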
Executive Support
I think one of the reasons for the success of those initiatives is that they have support on multiple levels of the hierarchy. They often start with the CTO wanting something, and he then finds supporters in product management, team leaders and developers to achieve his goals.
Bernd told me that, when you do a new project, you have to fully rely on automation from day one. This means that you’ll have very high initial costs, but they will pay for themselves at some point. But you have to get a budget for this, and he sees his task as a CTO to make sure this budget is available.
“The way of the engineer: Solve all problems in a sustainable way.” (Bernd Greifeneder)
The DevOps initiative started with Bernd putting a stake in the ground: “We need to be able to implement a bugfix and deploy it to production in less than one hour”. Most people told him that this was impossible, but he found enough people in product management, in development and among the team leaders who were willing to try. So they started with DevOps, with a team of 150 developers who were all immediately affected, half of them in another country. And even though it was “a mess” at first, after some time it was hugely successful.
At one point, Bernd decided that the feedback from their smoke tests was too slow: “One hour for smoke tests is unacceptable. Our new target is five minutes.” And then he gave the TA department a lot of time and money to change the testing technology stack in order to achieve this. Over a year later, they were at ~8 minutes.
Key Takeaways
You need long-term goals: Top management has to put a stake into the ground from time to time (“We need to be able to implement a bugfix and deploy it to production in less than one hour”). And then make sure everyone understands this goal.
You need support on all levels of the hierarchy: In order to achieve those goals, you need some first supporters on all levels of the hierarchy and in all departments.
Allow and embrace failures: Failing is OK as long as you fail fast, correct what’s wrong and learn from your failure.
You have to turn your company structure by 90 degrees: To move fast, you cannot have the traditional departments (silos) for Development, QA, Product Management, Operations. You need cross-functional groups that are composed of all those functions.
Everyone needs a basic understanding of “Agile”: I asked Bernd if it is an advantage when managers are engineers. He said “It is an advantage, but it is clearly not required. However, every manager has to understand some basic agile principles: Why we need to move fast, why we need very fast feedback, why it makes sense to invest money in solving problems in a sustainable way”.
You really have to invest time and money: You have to solve problems in a sustainable way. You have to automate everything. You have to work on faster feedback, constantly. This takes time and money. But it will pay for itself at some point.
Lead, follow, or get out of the way: Everyone has to know that this is the new way, and that it will stay like this for a long time. That they cannot wait and do nothing until the next way is announced. That they have to follow this way if they want to stay with the company. Top management has to put a stake into the ground.