Platform Engineering – Part 4 – Cost Drivers

Managing platform scope – Start small with a bold vision

It is essential to be clear about the scope of the target platform, or better: about its starting scope (we call this the initial scope). If the platform is used within one organization only, then the larger it becomes and the more organizational units and people it affects, the more it turns into a product line approach. Besides the purely technical challenges, the organizational and human-centric challenges become increasingly important to the success of such a platform development project.

The answer to the scope question can be derived from sound business planning, the business strategy and/or a bold vision. Business owners, product owners, R&D managers and project leads should agree on the business scope of the platform in order to build the foundation for the technical scoping. This step is also essential for identifying the involved stakeholders early, along with the monetary impact on different departments in an internal re-use scenario.

It gets easier, though, when you think in terms of a service business and start small. Sometimes it is easier to serve external customers than internal departments, due to a much clearer contractual relationship. Consider how AWS grew: in the beginning it was a platform with only a handful of services. In 2011, AWS released over 80 significant services and features, in 2012 nearly 160, and in 2017 AWS launched 1,430 new services and features. Think about how successful AWS would have been if all those features and services had been folded into one large concept on day one. It would have failed.

In favor of agility, and to avoid too much complexity in the beginning, you should consider starting small in your platform approach as well. Discussing all possible features and variations upfront leads to a fairly large scope which is hard to fund, hard to implement and hard to manage. Start small and grow over time, with a bold vision in mind: this should also apply to platform development. Starting small does not mean giving up the bold vision; you still need to be clear about the key architectural requirements that drive your decision making.

One of the main reasons platform projects fail is erosion of scope. As mentioned earlier, each feature will cost you up to 5 times more in the platform than when built directly into a product. In the search for more re-use, many organizations accept too many requirements into the platform scope and make promises just to convince other organizations to build on top of the platform. This may quickly lead to massive cost and investment needs, or to platforms that serve everything a little but deliver deep value to nobody. Such projects are the first to be cut when financial consolidation happens, and many of them fail to deliver enough value for the size of their development teams.
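To make the "up to 5 times" figure tangible, here is a minimal back-of-the-envelope sketch. It assumes a deliberately simple cost model that is not part of the original argument: the feature costs the same in every product, the platform version costs that amount times a multiplier, and each client pays a hypothetical `integration_share` of the per-product cost to adopt the platform feature. All function and parameter names are illustrative.

```python
from math import ceil


def cost_per_product(n_clients: int, product_cost: float) -> float:
    """Total cost if every client builds the feature into its own product."""
    return n_clients * product_cost


def cost_via_platform(n_clients: int, product_cost: float,
                      multiplier: float = 5.0,
                      integration_share: float = 0.0) -> float:
    """Total cost if the feature is built once in the platform.

    multiplier        -- how much more the platform version costs (assumed 5x)
    integration_share -- hypothetical fraction of product_cost each client
                         still spends on adopting the platform feature
    """
    build_once = multiplier * product_cost
    adoption = n_clients * integration_share * product_cost
    return build_once + adoption


def break_even_clients(multiplier: float = 5.0,
                       integration_share: float = 0.0) -> int:
    """Smallest number of clients at which the platform is not more expensive.

    Derived from n * c >= m * c + n * i * c, i.e. n >= m / (1 - i).
    """
    if not 0.0 <= integration_share < 1.0:
        raise ValueError("integration_share must be in [0, 1)")
    return ceil(multiplier / (1.0 - integration_share))


if __name__ == "__main__":
    for share in (0.0, 0.2):
        n = break_even_clients(5.0, share)
        print(f"integration share {share:.0%}: break-even at {n} clients")
```

Under these assumptions, a feature that is 5 times more expensive in the platform needs at least 5 clients that actually use it before it pays off, and more once adoption effort is counted. Features accepted into the scope for a single client rarely reach that threshold.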

Separation of Infrastructure vs. Business Features

Most systems implement a layered architecture. From the bottom (e.g. base operating systems) upward, the features become more and more domain specific and finally provide the actual business services that hold the customer value. The ultimate goal of any software project is to provide these high-value business features, which contain the differentiating assets relevant to the market segment.

Nevertheless, every system needs supporting functions and some base infrastructure to function properly. Beginning with the basic operating system, or compute capacity to be more generic, we need infrastructure for data persistence and data exchange, runtime containers, service integration, configuration and other tools to finally create our business relevant features.

The potential for re-use of a software component or feature in the lower part of that layer cake is quite high. The lower parts of the layer cake are general purpose, while features get more specialized the higher we move up. Highly specialized features are rarely re-used by other applications unless they can be configured for related and similar use cases.

As defined earlier, every layer in this schema could be designed as a platform for the next layer. And so every layer needs a dedicated scope and a set of features it provides.

When planning a platform that is distilled from existing solutions, it is crucial to separate infrastructure features from business features in the discussions. It may seem logical to move as many business features as possible into the platform in order to gain a maximum benefit across multiple applications. But this benefit is paid for with the flexibility that these features then require. As a general rule, it is much harder to re-use specialized features than generic ones; at the same time, generic features may be less valuable, because there are many options available to replace them and they do not solve unique customer problems.

It becomes more and more difficult to provide features in the upper layers of this layer cake that fit the requirements of multiple applications or solutions. This can manifest either in the diversity of functional requirements or, and this is the more difficult case, in the diversity of non-functional (or better: operational and developmental) requirements. This is natural to some degree; otherwise, it would not make sense to build different products or solutions at all. They differ in their business features; the only question is by how much. This is why successful platforms invest heavily in very high standards for operational and developmental requirements like security, resilience, performance, cost position, scalability, compatibility, ease of use and extensibility: doing so increases the user base, and thus the number of client solutions/products, in a sustainable way.

So, a clear decision is required on how many of the existing or future business features can be generalized to become part of the platform. Generalization demands configurability and flexibility as new qualities in the existing features, which typically increases the solution complexity and therefore the cost of providing the feature. In most cases it is advisable to start by harmonizing the infrastructure and then work up the layer cake step by step. In the simplest case, the platform provides pure infrastructure services and some general tools. At the other end, one can think of a fully configurable product base platform that covers 90%+ of the application features. The car manufacturing domain offers examples of highly configurable product platforms that allow a portfolio of products with little engineering overhead per new product, provided it is within the scope of the platform’s configurable feature set.

In a cloud-based migration effort, using e.g. AWS, the first step around base infrastructure and middleware can be done in a very short time. By standardizing your application portfolio on e.g. AWS infrastructure, databases, storage, messaging and more, your entry level in the layer cake can be substantially higher. This generates more value out of your engineering hours immediately and significantly reduces the amount of heavy lifting.

Major efforts apart from the obvious feature development

When the applications and solutions in an organization are transformed from individual solutions into solutions based on a shared platform, the work is not limited to the technical task of writing and changing software.

Besides the purely technical development of platform features, there is a set of aspects and issues that need to be taken care of. Establishing a platform that has multiple clients (internal and external users of the platform) comes with a new set of tasks, liabilities and responsibilities that all lead to increased effort in the platform project. The organization and the processes around the platform development will also need special attention and proper adjustments. Product line engineering (PLE) discusses these topics intensively and provides several best practices in this area. An example is the maturity an organization needs in order to drive global optimizations across platform and application / solution projects.

Here is a short summary of the non-technical aspects which will drive the cost of your platform project:

Ensuring that the platform development is business driven & strategy aligned
- Make sure that the key persons (e.g. architects, key developers) understand the clients’ domains, their cost drivers and the platform’s strategic rationale
- Do not spend effort on things that do not contribute to the business use cases (explicit exclusion)

Creating proper awareness of sponsors’ expectations and political forces
- Every sponsor or client may have different expectations; be aware that you cannot serve all of them at the same time. In particular, if you serve internal customers only, politics can be a significant issue.
- Expect that the clients will generate conflicting requirements for the platform
- Balance the different clients’ demands against the development capacity of the platform team; make sure that clients’ deadlines do not impact the platform roadmap (and the pressure on the team) too much
- An incremental approach with continuous delivery is advised; foster early feedback

Prioritizing platform feature quality over platform feature completeness
- Prioritize, prioritize, prioritize together with all clients (they all have different ideas of what is important)
- Embrace controlled change; priorities and scope will change due to internal or external forces
- Operational quality is more important than a broad feature set
- Be able to always (!) deliver a product-quality release

Enforcing real architectural governance
- Install a change control board which decides on the prioritization (roadmap) of all requested changes. In small independent teams, this board can be as small as the Product Owner and the Key Developer/Architect.
- Even under high pressure, do not allow quick changes before clarification of the impact to other clients is done – rollbacks are very expensive

Putting a focus on the usability of the platform at developer level (developer habitability)
- Provide developer-level information for effective and efficient work with the platform
- Provide tools and concepts, focus on clients’ productivity
- Once an interface is released and used, you will not be able to change it easily
- Most platforms are designed for building products/solutions, so the developers and builders are the most important stakeholders! Act accordingly.

Hardening the platform against malfunctioning plug-ins and clients
- When providing a framework to build and plug in additional functionality, ensure the stability of the platform
- Expect that plug-ins, apps, solutions and on-top products will degrade the platform quality (e.g. performance, stability) for other users and clients, and prevent this
- Implement measures to handle plug-ins and API usage securely, including a checking and release process for externally developed modules
- Focus on multi-tenancy and tenant isolation, not only from a security standpoint but also from a performance isolation standpoint, e.g. by limiting the blast radius of performance degradation caused by resource-intensive clients.
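One common way to limit the performance blast radius of a resource-intensive client is a request budget per tenant. The following is a minimal sketch of a per-tenant token bucket, offered as one possible technique rather than the approach of any particular platform. The class name, the parameters and the injectable `clock` are illustrative assumptions, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Per-tenant token bucket: each tenant gets its own request budget.

    rate     -- tokens refilled per second, per tenant
    capacity -- maximum burst size, per tenant
    clock    -- time source in seconds (injectable to allow deterministic tests)
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        # tenant_id -> (remaining tokens, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the tenant is within its budget."""
        now = self._clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self._capacity, now))
        # Refill according to elapsed time, never above the capacity.
        tokens = min(self._capacity, tokens + (now - last) * self._rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[tenant_id] = (tokens, now)
        return allowed


if __name__ == "__main__":
    limiter = TenantRateLimiter(rate=1.0, capacity=2.0)
    for tenant in ("noisy", "noisy", "noisy", "quiet"):
        print(tenant, limiter.allow(tenant))
```

Because every tenant has its own bucket, a client that exhausts its budget is throttled without reducing the budget of any other tenant. A production setup would additionally need shared state across platform instances and locking, which this sketch leaves out.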

Providing Training, Service and Support for developers and users
- Provide a variety of options to build platform knowledge on the client side and offer ongoing consulting
- Install a variety of options for reactive support
- Install architectural guidance during early client project phases, e.g. solution architects
- Include features in the platform that aid the support activities

Balancing the coupling of platform team and Product / Solution teams
- The platform team should keep constant contact with the clients to understand needs and problems and to provide updates on roadmaps and news
- At the same time the platform teams should be exposed to operational issues to build sustainable quality improvements
- However, the platform team must also be able to keep developing the platform, so dedicated support people are required to help customers with standard problems and architectural guidance
- Defend the platform goals and make sure that platform developers are not firefighting problems in the clients’ releases, unless these are caused by platform issues and flaws that prevent client releases.

Efforts due to more complex development processes and managing dependencies

The following applies when the platform is created for a set of known clients and the products/solutions are designed on top of the platform within the same organization.

PLE defines two different development processes: domain engineering (creating the platform) and application engineering (creating the solutions). With the introduction of the platform into existing and new solutions, the solutions’ development now depends on the releases from the platform team. This sounds logical and simple, but it should not be underestimated. It becomes a particular problem when a product/solution (or a set of them) is explicitly designed to make use of planned platform features, and both platform and product/solution are developed at the same time.

First, a development team with clear responsibilities for the platform needs to be created. The team needs to establish a platform development process which also takes into consideration what the solutions require in terms of release cycles, testing periods and roadmaps. The platform and solution processes need to be synchronized, because they will depend on each other. Explicit management of these dependencies is vital to the success of the platform and the solutions that depend on it.

One solution is to run the platform as a product. Ideally, your platform organization consists of a set of teams, each developing and operating platform services independently. They release when they release, and users of the platform just work with what is available after every release. No overpromising on roadmaps. However, every solution / product builder team will have to make decisions about all the features your platform does not provide (yet): do I implement this myself, or do I wait for the platform to solve this problem? This question raises the “when will the platform deliver” issue and can generate massive pressure on both sides when not handled right. Be prepared.

The idea of managing the platform as an internal product comes along with a few principles:

- When discussing platform features with users and clients, base the discussion mainly on what the platform can do today, not on what it might be able to do next year.
- When discussing the future, do not promise anything beyond a 3-month horizon, and only discuss features that are already being implemented and have reached a good level of maturity. Note that your priorities will most probably change often and quickly, so do not overpromise.
- Platform usage has a price, and your clients have full clarity about what they get for it. The judgment whether to use a feature from the platform or to implement it in the solution must be backed by the actual operational cost for the solution team.
- Product / solution builders should not plan their releases around features that are only planned to be delivered by the platform.

Of course, in an internal software platform, where the platform team shares the lunch table with the solution team, it is reasonable to discuss future platform features and to consider not implementing features in the solution that the platform will deliver shortly afterwards. Still, the number of parties influencing the platform team grows with the number of clients, so changes in priorities and direction may come quickly. As a solution team, you should always plan only with what the platform provides today and treat future platform features as a gift, which you may or may not accept and incorporate into your solution.

On the technical side this requires good support for product/solution creators to have the choice to adopt or ignore a new version release of the platform. This leads to the question of backward compatibility, API stability and release cadence of breaking changes. This is a topic for another article.
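Without anticipating that discussion, the adopt-or-ignore choice can be illustrated with a minimal sketch. It assumes that the platform numbers its releases according to semantic versioning (MAJOR.MINOR.PATCH, with breaking changes only in a new major version), which the text above does not state. The function names are hypothetical, and the special rules for 0.x versions and pre-release tags are ignored.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def can_adopt(pinned: str, released: str) -> bool:
    """True if a solution built against `pinned` can take `released` as-is.

    Under the semantic-versioning assumption, a release is a drop-in upgrade
    when it keeps the major version and is not older than the pinned one.
    """
    current, candidate = parse_version(pinned), parse_version(released)
    return candidate.major == current.major and candidate >= current


if __name__ == "__main__":
    pinned = "2.3.0"  # platform version the solution was built and tested on
    for release in ("2.4.1", "3.0.0"):
        verdict = "adopt" if can_adopt(pinned, release) else "ignore for now"
        print(f"platform {release}: {verdict}")
```

A solution team that pins the platform version it was tested against can ignore a release until it decides to adopt it, which matches the principle of planning only with what the platform provides today.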