The key values behind DevOps

Over the many years of my career I have come across all these tools and frameworks: Waterfall, V-Modell, SCRUM, Kanban, LeSS, SAFe, Lean and then this mystical DevOps thing (not to forget DevSecOps). I have seen managers pushing SCRUM trainings onto people, and I’ve seen teams building large walls of Kanban boards with hundreds of sticky notes… and some weeks later the shiny yellow had faded to gray and the board was abandoned. In addition, you see consultants rushing in, telling everyone that the way of working is outdated and things must change dramatically, combined with a stack of best practices that worked so well in so many other places. You just need to apply all this and you are good!

Sounds familiar? Guessed so.

Just recently I read an article that had some good arguments why SCRUM is a bad idea for some teams. You will also find articles, case studies and experience reports that perfectly explain why Kanban, Waterfall, V-Modell and Lean also do not work… Why is this? The entire SW engineering community is so keen on agile practices that they seem to be a must-have for success. And it is proven that a lot of really successful companies use agile methodologies and frameworks. But is the assumption right that these methodologies are the essence of that success, and that copying them will also cause success at home? I think that is a misconception, because the need for such processes and frameworks, and their (forceful?) introduction, is actually not the cure but a symptom. I’ll explain why I think that’s true:

Looking back to the beginning of my work as a “professional” SW engineer (meaning I spent 12 hours a day coding), the world looked like this for me: I worked in the solution business, in the logistics domain, on the automation of transportation and sortation. We had a platform that provided a good set of common functionality, but every customer project still needed a lot of custom development. Later in this business, I also had the rare chance to build a new (more lightweight) platform from scratch for the same business, in a slightly different scope. I cannot remember a single project in these 9 years where my team and I encountered significant delays or a massive loss in revenue. We had problems like all SW teams have, but the bottom line is that it all worked out and customers were in general happy (with ups and downs on the way, of course). This new platform development in particular, with a team of about 5 people, was great. Building from scratch, we delivered the first customer project for a UK-based facility after just 8 months, including building the platform (MVP) and the customization effort towards full contract coverage. And customer acceptance testing went through without any major issue.

Now guess which process model we used? Waterfall? Yes, the contract content was defined upfront and the end date was clear (with a penalty), so pretty much waterfall. But: as usual, the upfront contract was something like a 40-pager and of course not all details were in there. Like all projects, this one revealed problems and challenges on the way, and we addressed them in the most agile way possible. Funny here: we also found out that the value of Gantt charts ends the moment the contract is signed… By the way, SCRUM and its derivatives were completely unknown to me at this time. When I hear people talking about agility and what it all means, I tend to send them to the commissioning site of a large automation facility when integration and system testing is due. Our release cycle during commissioning was, on extreme days, 1 hour: testing, problem, code fix, build / automated test / manual test, deploy to production, testing again. In some cases it took a day or so, when the problem was harder. Working in a 2-week sprint seems awkward here, right? You may now say… hacker! No quality there, no way to build a complex product like this, not to mention a platform. Wrong! The systems needed to run 24/7, and outages led to service requests and loss of money. So you can assume that all code was tested to a level that allowed production use in a customer facility costing a triple-digit million euro sum. Of course, it is not like product development with massive test automation for all corner cases, but it was good enough. And trust me, in the platform parts we could not afford major problems.

Why could we succeed without even knowing what Lean, SCRUM and DevOps mean? Looking back, it is easy to say:

The team was rather small, so communication did not generate a lot of overhead
The senior team members were business domain experts with experience from several prior projects, combined with younger colleagues who were eager to do something cool
The entire team was responsible for the entire success or failure on the business side
Everybody was exposed to the consequences of getting something wrong – all the time.
Feedback was super direct and intensive (when you spend 2 nights awake because you programmed a race condition and your program just crashes in production in front of the customer, that is feedback you take seriously).
Upfront mistakes led to longer time on site – away from family and friends

As a result, people had a good sense of what is important at which point in time. There was a natural need to discuss interfaces with external suppliers or partners early, to give both parties enough lead time before the commissioning phase. There was a natural need for early integration and testing while the SW grew, so that you did not end up with a big-bang integration mess when you traveled to site and sat with your laptop in the dirt. There was a natural need for constant interaction with the customers about how the solution should actually be crafted so that it fits their needs and business processes (again along user interfaces and data interfaces). So it was a constant collaboration, where information was woven into the current state of the solution the moment it arrived. When things were unclear we pursued the issue until it was clarified, because everybody knew that it would be a big mess if you clarified too late.

So… if at this time someone had come by and said: “Hey, if you use SCRUM, you would be even better”, then we would have found that the fundamental thinking behind SCRUM was already there in the team, just naturally. Some of the process elements were not there explicitly, but that did not do any major harm.

OK, what now? It is clear that things are different when an organization grows and when you do product business instead of solution business. When you need 150 developers, 20 product managers, armies of support and service people and so on, you will see silos develop: Product Management here, R&D there, Operations in the next building, service and support a few thousand kilometers away. In such a situation it is clear that an organization needs to compensate for the communication problem with processes and rules. And there are processes that work for an organization and those that don’t. BUT: putting processes on top of such an organization, even when SCRUM is applied really well, often fails to address the fundamental success factors outlined above.

I herewith propose that BEFORE an organization discusses the next process to be rolled out to everyone, some thinking should go into the question of how to maximally expose all team members to the consequences of their work, and how to keep the area of responsibility of each individual as clean and small as possible. I truly believe that a person will only really want to collect early feedback when not getting it has a consequence for his or her daily life. Just telling people that early and constant feedback is important, and providing tooling around it, is not enough.

Let’s take DevOps as an example. In my case outlined above we were a small team. So we developed SW, used some middleware, and installed and configured OSes, databases, application servers and everything else. We managed our dev tools, build scripts, test code, packaging and delivery tools, all with the help of the corporate functions. In large organizations this is typically split into individual specialized groups, which often leads to these silos: Development here, Operations there. The argument is often that it is more efficient this way (skill development, hiring, …). Hand-overs and throwing things over the fence, however, demand heavy processes to allow a smooth end-to-end-everything-works-together story. Then people invented DevOps to tell the story that we should come to a situation where developers and operations people sit together as one team and do things hand in hand, because it reduces the loss and burden of hand-over. Hell yes, that is an idea!! For me, with my background, this just feels like consultants and other smart people invented Aspirin for people who bang their heads against a wall all day long. Of course they have a headache, but maybe Aspirin is not the actual solution.

To conclude on this, I propose the following: SCRUM, SAFe and all these frameworks have great tools. Most of this stuff really makes teams more efficient because it solves a lot of recurring problems. BUT it will only work in a sustainable manner when we also work on the fundamental world-views, responsibilities and consequences of action within the teams. Applying best practices mechanically will generate more pain than gain.

Here are some proposals for larger systems and organizations:

Every piece of SW should be delivered as a service, including support, SLA, milestoned roadmap, … and should be managed as a product: using it has a price, and its clients, internal or external, constantly assess the cost-to-value ratio it provides.
This applies to externally visible products and services (this is a no-brainer) but, MORE IMPORTANT, to ALL internal SW components: the iceberg under the tip, so to say. Example: a logging system into which all components log their events is a base component in most systems. Turn it into an internal product with a price tag, delivered as a managed service. Try this and see what happens to the team that delivers this logging system. In AWS, this service is called CloudWatch. Other examples may be authorization systems (AWS IAM), a managed messaging infrastructure (AWS SQS) or other more commodity things. Funny here: this also triggers people to think more about external solutions (i.e. buy more, develop less), because suddenly these internal services get a price and are in competition with e.g. a cloud service with similar functionality. The threat of being shut down because a 3rd-party service is cheaper must be visible to all team members, so that they are clear about their own USPs and what’s important.
Get away from large platforms with large teams and move towards a system-of-systems idea. Microservices are not quite that, because I want to extend the microservice idea into the commercial release part as well. Microservices do not help much when you need to deploy/release all of them at a given point in time to release your product/platform. So we should think of independent services that are developed, released and deployed in their own cycles.
A single internal service should not need more than 6 SCRUM teams to deliver (max 3 Product Owners organized by one commercial Product Manager). If you need more people, split along well-defined interfaces and focus on service integration.
Interfaces are first-class citizens and are discussed all day long between providers and consumers of services. Interface management and the integration architecture are a core skill and discipline. Your lead architect should focus specifically on the service integration architecture: how to build end-to-end use cases out of hundreds of small services.
A SCRUM team is cross-functional. It does everything from idea generation through development, testing and deployment to operation. It may get help from the organization or cloud provider (e.g. hosting Jira, TFS, Jenkins or build pipelines and so on, or providing environments as a service), but the team has end-to-end responsibility for its solution. If stuff fails in production, the developer feels that right away. No hand-overs to other departments.
Processes around releases and the system architecture must allow each service offering (not only the technical part) to be released individually at any given time. This requires a good system architecture around interface management, tailored processes and an intelligent approach to quality assurance and system testing, again along integration interfaces.
No developer hiding. If you hear statements like “we do not want every field problem to hit the developers, we need 3 layers of support…”, stop it. You may have a first line of defense like a hotline or ticket system filter, but developers need to be exposed to the consequences of their work, especially when it impacts customer satisfaction. Will this potentially kill your roadmap? Yes, and for a good reason: it is better to have fewer features that really work for users than a lot of poor features.
Make sure that people WANT early feedback and improvement in process and way of working, and then provide them with tools from SCRUM, SAFe, LeSS plus all the cool things around CI/CD & continuous X and so on. They will value the benefits much more and will be able to tailor them to their needs.
Foster the autonomy of these service/product teams, but take their commitments seriously and with consequences, like in any competitive scenario.
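
To make the proposal about individually releasable services and first-class interfaces a bit more tangible, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the names Service, provides, requires and broken_consumers are hypothetical, the service and interface names are made up, and "compatible" is simplified to "same major version number". A real organization would plug in its own compatibility rules.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical description of one independently released service."""
    name: str
    # interface name -> major version this service offers to others
    provides: dict = field(default_factory=dict)
    # interface name -> major version this service needs from others
    requires: dict = field(default_factory=dict)


def broken_consumers(candidate, deployed):
    """List (consumer, interface) pairs that releasing `candidate` would break.

    An interface counts as owned by the candidate if the currently deployed
    version of the same service, or the candidate itself, provides it.
    Assumption: a consumer is satisfied only by the exact major version.
    """
    current = next((s for s in deployed if s.name == candidate.name), None)
    owned = set(candidate.provides)
    if current is not None:
        owned |= set(current.provides)

    broken = []
    for svc in deployed:
        if svc.name == candidate.name:
            continue
        for iface, needed in svc.requires.items():
            if iface in owned and candidate.provides.get(iface) != needed:
                broken.append((svc.name, iface))
    return sorted(broken)


# Made-up landscape: an internal logging service with two consumers.
deployed = [
    Service("logging", provides={"log-events": 1}),
    Service("auth", provides={"auth": 2}),
    Service("billing", requires={"log-events": 1}),
    Service("sorter", requires={"log-events": 1, "auth": 2}),
]

# Adding a new interface keeps every consumer working: release any time.
additive = Service("logging", provides={"log-events": 1, "log-query": 1})

# Bumping the major version of an existing interface breaks consumers.
breaking = Service("logging", provides={"log-events": 2})

print(broken_consumers(additive, deployed))
print(broken_consumers(breaking, deployed))
```

The point of the sketch is not the code itself but where the conversation happens: an empty result means the logging team can release on its own cycle, a non-empty one names exactly the consumers the team has to talk to before it ships.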