The hidden costs of PaaS & microservice engineering innovation

[tl;dr The leap from monolithic application development into the world of PaaS and microservices highlights the need for consistent collaboration, disciplined development and a strong vision in order to ensure sustainable business value.]

The pace of innovation in the PaaS and microservice space is increasing rapidly. This, coupled with increasing pressure on ‘traditional’ organisations to deliver more value more quickly from IT investments, is causing a flurry of interest in PaaS enabling technologies such as Cloud Foundry (favoured by the likes of IBM and Pivotal), OpenShift (favoured by RedHat), Azure (Microsoft), Heroku (SalesForce), AWS, Google Application Engine, etc.

A key characteristic of all these PaaS solutions is that they are ‘devops’ enabled – i.e., it is possible to automate both code and infrastructure deployment, enabling the way to have highly automated operational processes for applications built on these platforms.

For large organisations, or organisations that prefer to control their infrastructure (because of, for example, regulatory constraints), PaaS solutions that can be run in a private datacenter rather than the public cloud are preferable, as this a future option to deploy to external clouds if needed/appropriate.

These PaaS environments are feature-rich and aim to provide a lot of the building blocks needed to build enterprise applications. But other framework initiatives, such as Spring Boot, DropWizard and Vert.X aim to make it easier to build PaaS-based applications.

Combined, all of these promise to provide a dramatic increase in developer productivity: the marginal cost of developing, deploying and operating a complete application will drop significantly.

Due to the low capital investment required to build new applications, it becomes ever more feasible to move from a heavy-weight, planning intensive approach to IT investment to a more agile approach where a complete application can be built, iterated and validated (or not) in the time it takes to create a traditional requirements document.

However, this also has massive implications, as – left unchecked – the drift towards entropy will increase over time, and organisations could be severely challenged to effectively manage and generate value from the sheer number of applications and services that can be created on such platforms. So an eye on managing complexity should be in place from the very beginning.

Many of the above platforms aim to make it as easy as possible for developers to get going quickly: this is a laudable goal, and if more of the complexity can be pushed into the PaaS, then that can only be good. The consequence of this approach is that developers have less control over the evolution of key aspects of the PaaS, and this could cause unexpected issues as PaaS upgrades conflict with application lifecycles, etc. In essence, it could be quite difficult to isolate applications from some PaaS changes. How these frameworks help developers cope with such changes is something to closely monitor, as these platforms are not yet mature enough to have gone through a major upgrade with a significant number of deployed applications.

The relative benefit/complexity trade-off between established microservice frameworks such as OSGi and easier to use solutions such as described above needs to be tested in practice. Specifically, OSGi’s more robust dependency model may prove more useful in enterprise environments than environments which have a ‘move fast and break things’ approach to application development, especially if OSGi-based PaaS solutions such as JBoss Fuse on OpenShift and Paremus ServiceFabric gain more popular use.

So: all well and good from the technology side. But even if the pros and cons of the different engineering approaches are evaluated and a perfect PaaS solution emerges, that doesn’t mean Microservice Nirvana can be achieved.

A recent article on the challenges of building successful micro-service applications, coupled with a presentation by Lisa van Gelder at a recent Agile meetup in New York City, has emphasised that even given the right enabling technologies, deploying microservices is a major challenge – but if done right, the rewards are well worth it.

Specifically, there are a number of factors that impact the success of a large scale or enterprise microservice based strategy, including but not limited to:

  • Shared ownership of services
  • Setting cross-team goals
  • Performing scrum of scrums
  • Identifying swim lanes – isolating content failure & eventually consistent data
  • Provision of Circuit breakers & Timeouts (anti-fragile)
  • Service discoverability & clear ownership
  • Testing against stubs; customer driven contracts
  • Running fake transactions in production
  • SLOs and priorities
  • Shared understanding of what happens when something goes wrong
  • Focus on Mean time to repair (recover) rather than mean-time-to-failure
  • Use of common interfaces: deployment, health check, logging, monitoring
  • Tracing a users journey through the application
  • Collecting logs
  • Providing monitoring dashboards
  • Standardising common metric names

Some of these can be technically provided by the chosen PaaS, but a lot is based around the best practices consistently applied within and across development teams. In fact, it is quite hard to capture these key success factors in traditional architectural views – something that needs to be considered when architecting large-scale microservice solutions.

In summary, the leap from monolithic application development into the world of PaaS and microservices highlights the need for consistent collaboration, disciplined development and a strong vision in order to ensure sustainable business value.
Advertisements
The hidden costs of PaaS & microservice engineering innovation

Scaled Agile needs Slack

[tl;dr In order to effectively scale agile, organisations need to ensure that a portion of team capacity is explicitly set aside for enterprise priorities. A re-imagined enterprise architecture capability is a key factor in enabling scaled agile success.]

What’s the problem?

From an architectural perspective, Agile methodologies are heavily dependent on business- (or function-) aligned product owners, which tend to be very focused on *their* priorities – and not the enterprise’s priorities (i.e., other functions or businesses that may benefit from the work the team is doing).

This results in very inward-focused development, and where dependencies on other parts of the organisation are identified, these (in the absence of formal architecture governance) tend to be minimised where possible, if necessary through duplicative development. And other teams requiring access to the team’s resources (e.g., databases, applications, etc) are served on a best-effort basis – often causing those teams to seek other solutions instead, or work without formal support from the team they depend on.

This, of course, leads to architectural complexity, leading to reduced agility all round.

The Solution?

If we accept the premise that, from an architectural perspective, teams are the main consideration (it is where domain and technical knowledge resides), then the question is how to get the right demand to the right teams, in as scalable, agile manner as possible?

In agile teams, the product backlog determines their work schedule. The backlog usually has a long list of items awaiting prioritisation, and part of the Agile processes is to be constantly prioritising this backlog to ensure high-value tasks are done first.

Well known management research such as The Mythical Man Month has influenced Agile’s goal to keep team sizes small (e.g., 5-9 people for scrum). So when new work comes, adding people is generally not a scalable option.

So, how to reconcile the enterprise’s needs with the Agile team’s needs?

One approach would be to ensure that every team pays an ‘enterprise’ tax – i.e., in prioritising backlog items, at least, say, 20% of work-in-progress items must be for the benefit of other teams. (Needless to say, such work should be done in such a way as to preserve product architectural integrity.)

20% may seem like a lot – especially when there is so much work to be done for immediate priorities – but it cuts both ways. If *every* team allows 20% of their backlog to be for other teams, then every team has the possibility of using capacity from other teams – in effect, increasing their capacity by much more than they could do on their own. And by doing so they are helping achieve enterprise goals, reducing overall complexity and maximising reuse – resulting in a reduction in project schedule over-runs, higher quality resulting architecture, and overall reduced cost of change.

Slack does not mean Under-utilisation

The concept of ‘Slack’ is well described in the book ‘Slack: Getting Past Burn-out, Busywork, and the Myth of Total Efficiency‘. In effect, in an Agile sense, we are talking about organisational slack, and not team slack. Teams, in fact, will continue to be 100% utilised, as long as their backlog consists of more high-value items then they can deliver. The backlog owner – e.g., scrum master – can obviously embed local team slack into how a particular team’s backlog is managed.

Implications for Project & Financial Management

Project managers are used to getting funding to deliver a project, and then to be able to bring all those resources to bear to deliver that project. The problem is, that is neither agile, nor does it scale – in an enterprise environment, it leads to increasingly complex architectures, resulting in projects getting increasingly more expensive, increasingly late, or delivering increasingly poor quality.

It is difficult for a project manager to accept that 20% of their budget may actually be supporting other (as yet unknown) projects. So perhaps the solution here is to have Enterprise Architecture account for the effective allocation of that spending in an agile way? (i.e., based on how teams are prioritising and delivering those enterprise items on their backlog). An idea worth considering..

Note that the situation is a little different for planned cross-business initiatives, where product owners must actively prioritise the needs of those initiatives alongside their local needs. Such planned work does not count in the 20% enterprise allowance, but rather counts as part of how the team’s cost to the enterprise is formally funded. It may result in a temporary increase in resources on the team, but in this case discipline around ‘staff liquidity’ is required to ensure the team can still function optimally after the temporary resource boost has gone.

The challenge regarding project-oriented financial planning is that, once a project’s goals have been achieved, what’s left is the team and underlying architecture – both of which need to be managed over time. So some dissociation between transitory project goals and longer term team and architecture goals is necessary to manage complexity.

For smaller, non-strategic projects – i.e., no incoming enterprise dependencies – the technology can be maintained on a lights-on basis.

Enterprise architecture can be seen as a means to asses the relevance of a team’s work to the enterprise – i.e., managing both incoming and outgoing team dependencies.  The higher the enterprise relevance of the team, the more critical the team must be managed well over time – i.e., team structure changes must be carefully managed, and not left entirely to the discretion of individual managers.

Conclusion

By ensuring that every project that purports to be Agile has a mandatory allowance for enterprise resource requirements, teams can have confidence that there is a route for them to get their dependencies addressed through other agile teams, in a manner that is independent of annual budget planning processes or short-term individual business priorities.

The effectiveness of this approach can be governed and evaluated by Enterprise Architecture, which would then allow enterprise complexity goals to be addressed without concentrating such spending within the central EA function.

In summary, to effectively scale agile, an effective (and possibly rethought) enterprise architecture capability is needed.

Scaled Agile needs Slack

Strategic Theme #1/5: The Lean Enterprise

[tl;dr Agile and lean concepts are core to any enterprise cultural transformational efforts, but need to extend beyond the realm of the IT function to ensure the desired strategic outcomes from a business perspective are achieved.]

Cultural and process transformation is high on the agenda for many large businesses, as this is (correctly) seen to be necessary to innovate and maintain agility.

However, such transformation is not limited to IT – it needs to extend to financial management, and risk, governance and compliance in order for IT to realise it’s full potential.

A lot of the thinking here is very well captured in the book ‘The Lean Enterprise: How High Performance Organisations Innovate at Scale’ by Humble, Moleskin & O’Reilly.

It covers key ideas such as:

  • extending agile/systems thinking and kaizen (continuous improvement) beyond the realms of IT into financial management, risk, governance and compliance;
  • differentiating exploration from exploitation from a project portfolio perspective (applying real-option theory to portfolio selection and project execution);
  • recognising and transforming the dominant corporate culture (pathological, bureaucratic or generative);
  • starting small and tolerating failure.

This material dovetails nicely with other thought-leading publications such as ‘The Phoenix Project’ (Kim et al) and ‘Continuous Delivery’ by Humble & Farley.

Other ‘lean’ best practices, such as Scaled Agile or LeSS (Large-scale scrum) also address these themes, although these do not attempt to bring the thinking into the non-IT domains of financial management and governance, etc, which (in my view) ignores a key driver for why IT processes in many large organisations are as broken as they are. Never-the-less, they all bring useful insights to the table and any one of them should be part of any transformational agenda.

Addressing complexity is something implicit in this topic: on its own, doing lots of experiments does nothing but create more complexity (even if they do temporarily add value). Systems thinking requires understanding what is no longer needed as much as what is needed (or, alternatively, what is strategic and what is commodity), and is key to preventing complexity from impeding innovation.

For now, I’m not convinced the topic of complexity management has been adequately addressed, but more on this as I dive deeper into this area.

The topic of ‘Lean Enterprise’ is more directly relevant to change management than it is to architecture. However, since architectures invariably follow Conway’s Law (which states that system designs follow the communication structures of organisations), how projects are run, and the level of collaboration established at the outset, can materially impact architectural outcomes, and this must be the starting point for major architectural change,

Strategic Theme #1/5: The Lean Enterprise