The future of modularity is..serverless

[tl;dr As platform solutions evolve and improve, the pressure for firms to reduce costs, increase agility and be resilient to failure will drive teams to adopt modern infrastructure platform solutions, and in the process decompose and simplify monoliths, adopt microservices and ultimately pave the way to building naturally modular systems on serverless platforms.]

“Modularity” – the (de)composition of complex systems into independently composable or replaceable components without sacrificing performance, security or usability – is an architectural holy grail.

Businesses may be modular (commonly expressed through capability maps), and IT systems can be modular. IT modularity can also be described as SOA (Service Oriented Architecture), although because of many aborted attempts at (commercializing) SOA in the past, the term is no longer in fashion. Ideally, the relationship between business ‘modules’ and IT application modules should be fully aligned (assuming the business itself has a coherent underlying business architecture).

Microservices are the latest manifestation of SOA, but this is born from a fundamentally different way of thinking about how applications are developed, tested, deployed and operated – without the need for proprietary vendor software.

Serverless takes the microservices concept one step further, by removing the need for developers (or, indeed, operators) to worry about looking after individual servers – whether virtual or physical.

A brief history of microservices

Commercial manifestations of microservices have been around for quite a while – for example Spring Boot, or OSGi for Java – but these have very commercial roots, and implement a framework tied to a particular language. Firms may successfully implement these technologies, but they will need to have already gone through much of the microservices stone soup journey. It is not possible to ‘buy’ a microservices culture from a technology vendor.

Because microservices are intended to be independently testable and deployable components, a microservices architecture inherently rejects the notion of a common framework for implementing/supporting the microservices nature of an application. This therefore puts the onus on the infrastructure platform to provide all the capabilities needed to build and run microservices.

So, capabilities like naming, discovery, orchestration, encryption, load balancing, retries, tracing, logging, monitoring, etc which used to be handled by language-specific frameworks are now increasingly the province of the ‘platform’. This greatly reduces the need for complex, hard-to-learn frameworks, but places a lot of responsibility on the platform, which must handle these requirements in a language-neutral way.

Currently, the most popular ‘platforms’ are the major cloud providers (Azure, Google, AWS, DigitalOcean, etc), IaaS vendors (e.g., VMware, HPE), core platform building blocks such as Kubernetes, and platform solutions such as Pivotal Cloud Foundry, OpenShift and Mesosphere. (IBM’s Bluemix/Cloud is likely to be superseded by Red Hat’s OpenShift.)

The latter solutions previously had their own underlying platform solutions (e.g., OSGi for Bluemix, BOSH for PKS), but most platform vendors have now shifted to use Kubernetes under the hood. These solutions are intended to work in multiple cloud environments or on-premise, and therefore in principle give developers an IaaS-neutral way to build applications without caring whether they are deployed on-premise or in the cloud.

Decomposing Monolithic Architectures

With the capabilities these platforms offer, developers will be incentivized to decompose their applications into logical, distributed functional components, because the marginal additional cost of maintaining/monitoring each new process is relatively low (albeit definitely not zero). This approach is naturally amenable to supporting event driven architectures, as well as more conventional RESTful and RPC architectures (such as gRPC), as running processes can be mapped naturally to APIs, services and messages.

But not all processes need to be running constantly – and indeed, many processes are ‘out-of-band’ processes, which serve as ‘glue’ to connect events that happen in one system to another system: if events are relatively infrequent (e.g., less than one every few seconds), then no resources need to be used in-between events. So provisioning long-running Docker containers etc may be overkill for many of these processes – especially if the ‘state’ required by those processes can be made available in a low-latency, highly available long-running infrastructure service such as a high-performance database or cache.
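To make the idea of stateless ‘glue’ concrete, here is a minimal sketch in Python. It is an illustration only: the StateStore interface, the InMemoryStore stand-in and the handle_order_event function are hypothetical names invented for this example, and the event shape is an assumption. The point is simply that the function holds nothing between invocations, so nothing needs to keep running between events.

```python
from typing import Dict, Optional, Protocol


class StateStore(Protocol):
    """Hypothetical interface to a long-running cache or database service."""

    def get(self, key: str) -> Optional[int]: ...

    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for the external store, for local experimentation only."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def handle_order_event(event: dict, store: StateStore) -> dict:
    """Glue function: runs once per event and keeps no state between calls."""
    customer = event["customer_id"]
    # All state lives in the external store, not in the process itself.
    total = (store.get(customer) or 0) + event["amount"]
    store.put(customer, total)
    # The returned message is what would be forwarded to the downstream system.
    return {"customer_id": customer, "running_total": total}
```

In a real deployment the store would be a managed cache or database, and the platform (rather than the caller) would invoke the function for each event.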

Functions on Demand

Enter ‘serverless’, which aims to specify the resources required to execute a single piece of code (basically a functional monolith) on-demand in a single package – roughly the equivalent of, for example, a declarative service in OSGi. The runtime in which the code runs is not the concern of the developer in a serverless architecture. There are no VMs, containers or side-cars – only functions communicating via APIs and events.

Currently, the serverless offerings by the major cloud providers are really only intended for ‘significant’ functions which justify the separate allocation of compute, storage and network resources needed to run them. A popular use case is ‘transformational’ functions which convert binary data from one form to another – e.g., create a thumbnail image from a full image – which may temporarily require a lot of CPU or memory. In contrast, an OSGi Declarative Service, for example, could be instantiated by the runtime inside the same process/memory space as the calling service – a handy technique for validating a modular architecture without worrying about the increased failure modes of a distributed system, while allowing the system to be readily scaled out in the future.
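The contrast between an in-process module and a remotely invoked function can be sketched in a few lines of Python. Everything here is hypothetical and simplified: the Thumbnailer contract only computes the target dimensions (no image library is involved), and fake_function_endpoint stands in for a deployed function that a real caller would reach over the network. The caller depends only on the contract, so the same module can be validated in-process first and distributed later.

```python
import json
from typing import Callable, Protocol, Tuple

Size = Tuple[int, int]


class Thumbnailer(Protocol):
    """The module contract that callers depend on."""

    def thumbnail_size(self, size: Size, max_side: int) -> Size: ...


class InProcessThumbnailer:
    """Runs in the caller's own process and memory space."""

    def thumbnail_size(self, size: Size, max_side: int) -> Size:
        width, height = size
        # Never scale up; shrink so the longest side fits within max_side.
        scale = min(1.0, max_side / max(width, height))
        return (max(1, round(width * scale)), max(1, round(height * scale)))


class RemoteThumbnailer:
    """Same contract, but every call is serialised and sent over a transport."""

    def __init__(self, transport: Callable[[str], str]) -> None:
        self._transport = transport

    def thumbnail_size(self, size: Size, max_side: int) -> Size:
        request = json.dumps({"size": list(size), "max_side": max_side})
        response = json.loads(self._transport(request))
        return (response["width"], response["height"])


def fake_function_endpoint(request_body: str) -> str:
    """Stand-in for a deployed function; a real call would cross the network."""
    request = json.loads(request_body)
    size = (request["size"][0], request["size"][1])
    width, height = InProcessThumbnailer().thumbnail_size(size, request["max_side"])
    return json.dumps({"width": width, "height": height})
```

Swapping InProcessThumbnailer() for RemoteThumbnailer(fake_function_endpoint) changes nothing for the caller, although in a genuinely distributed deployment the remote variant would also have to deal with timeouts and partial failure.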

Modular Architectures vs Distributed Architectures

Serverless functions can be viewed as ‘modules’ by another name – albeit modules that happen to require separately allocated memory, compute and storage to the calling component. While this is a natural fit for browser-based applications, it is not a great fit for monolithic applications that would benefit from modular architectures, but not necessarily benefit from distributed architectures. For legacy applications, the key architectural question is whether it is necessary or appropriate to modularize the application prior to distributing the application or migrating it to an orchestration platform such as Kubernetes, AWS ECS, etc.

As things currently stand, the most appropriate (lowest risk) migration route for complex monolithic applications is likely to be a migration of some form to one of the orchestrated platforms identified above. By allowing the platform to take care of ‘non-functional’ features (such as naming, resilience, etc), perhaps the monolith can be simplified. Over time, the monolith can then be decomposed into modular ‘microservices’ aligned by APIs or events, and perhaps eventually some functionality could decompose into true serverless functions.

Serverless and Process Ownership

Concurrently with decomposing the monolith, a (significant) subset of features – mainly those not built directly using the application code-base, or which straddle two applications – may be meaningfully moved to serverless solutions without depending on the functional decomposition of the monolith.

It’s interesting to note that such an architectural move may allow process owners to own these serverless functions, rather than relying on application owners, where often, in large enterprises, it isn’t even clear which application owner should own a piece of ‘glue’ code, or be accountable when such code breaks due to a change in a dependent system.

In particular, existing ‘glue’ code which relies on centralized enterprise service buses or equivalent would benefit massively from being migrated to a serverless architecture. This not only empowers teams that look after the processes the glue code supports, but also ensures optimal infrastructure resource allocation, as ESBs can often be heavy consumers of infrastructure resources. (Note that a centralized messaging system may still be needed, but this would be a ‘dumb pipe’, and should itself be offered as a service.)

Serverless First Architecture

Ultimately, nirvana for most application developers and businesses is a ‘serverless-first’ architecture, where delivery velocity is limited only by the capabilities of the development team, and solutions scale seamlessly in both function and usage without significant re-engineering. It is fair to say that serverless is a long way from achieving this nirvana (technologies like ‘AIOps’ have a long way to go), and most teams have yet to shift from monolithic to modular and distributed thinking, while knowing when a monolith remains the most appropriate solution for a given problem.

As platform solutions improve and mature, however, and the pressure mounts on businesses whose value proposition is not in the platform engineering space to reduce costs, increase agility and be increasingly resilient to failures of all kinds, the path from monolith to orchestrated microservices to serverless (and perhaps ‘low-code’) applications seems inevitable.

What I realized from studying AWS Services & APIs

[tl;dr The weakest link for firms wishing to achieve business agility is principally based around the financial and physical constraints imposed by managing datacenters and infrastructure. The business goals of agile, devops and enterprise architecture are fundamentally unachievable unless these constraints can be fully abstracted through software services.]

Background

Anybody who grew up with technology in the PC generation (1985-2005) will have developed software with a fairly deep understanding of how the software worked from an OS/CPU, network, and storage perspective. Much of that generation would have had some formal education in the basics of computer science.

Initially, the PC generation did not have to worry about servers and infrastructure: software ran on PCs. As PCs became more networked, dedicated PCs to run ‘server’ software needed to be connected to the desktop PCs. And folks tasked with building software to run on the servers would also have to buy higher-spec PCs for server-side, install (network) operating systems, connect them to desktop PCs via LAN cables, install disk drives and databases, etc. This would all form part of the ‘waterfall’ project plan to deliver working software, and would all be rather predictable in timeframes.

As organizations added more and more business-critical, network-based software to their portfolios, organization structures were created for datacenter management, networking, infrastructure/server management, storage and database provisioning and operation, middleware management, etc, etc. A bit like the mainframe structures that preceded the PC generation, in fact.

Introducing Agile

And so we come to Agile. While Agile was principally motivated by the flexibility in GUI design offered by HTML (vs traditional GUI design) – basically allowing development teams to iterate rapidly over, and improve on, different implementations of UI – ‘Agile’ quickly became more ‘enterprise’ oriented, as planning and coordinating demand across multiple teams, both infrastructure and application development, was rapidly becoming a massive bottleneck.

It was, and is, widely recognized that these challenges are largely cultural – i.e., that if only teams understood how to collaborate and communicate, everything would be much better for everyone – all the way from the top down. And so a thriving industry exists in coaching firms how to ‘improve’ their culture – aka the ‘agile industrial machine’.

Unfortunately, it turns out there is no silver bullet: the real goal – organizational or business agility – has been elusive. Big organizations still expend vast amounts of time and resources doing small incremental change, most activity is involved in maintaining/supporting existing operations, and truly transformational activities which bring an organization’s full capabilities together for the benefit of the customer still do not succeed.

The Reality of Agile

The basic tenet behind Agile is the idea of cross-functional teams. However, it is obvious that most teams in organizations are unable to align themselves perfectly according to the demand they are receiving (i.e., the equivalent of providing a customer account manager), and even if they did, the number of participants in a typical agile ‘scrum’ or ‘scrum of scrums’ meeting would quickly exceed the consensus maximum of about 9 participants needed for a scrum to be successful.

So most agile teams resort to the only agile they know – i.e., developers, QA and maybe product owner and/or scrum-master participating in daily scrums. Every other dependency is managed as part of an overall program of work (with communication handled by a project/program manager), or through on-demand ‘tickets’ whereby teams can request a service from other teams.

The basic impact of this is that pre-planned work (resources) gets prioritized ahead of on-demand ‘tickets’ (excluding tickets relating to urgent operational issues), and so agile teams are forced to compromise the quality of their work (if they can proceed at all).

DevOps – Managing Infrastructure Dependencies

DevOps is a response to the widening communications/collaboration chasm between application development teams and infrastructure/operations teams in organizations. It recognizes that operational and infrastructural concerns are inherent characteristics of software, and software should not be designed without these concerns being first-class requirements along with product features/business requirements.

On the other hand, infrastructure/operations providers, being primarily concerned with stability, seek to offer a small number of efficient standardized services that they know they can support. Historically, infrastructure providers could only innovate and adapt as fast as hardware infrastructure could be procured, installed, supported and amortized – which is to say, innovation cycles measured in years.

In the meantime, application development teams are constantly pushing the boundaries of infrastructure – principally because most business needs can be realized in software, with sufficiently talented engineers, and those tasked with building software often assume that infrastructure can adapt as quickly.

Microservices – Managing AppDev Team to AppDev Team Dependencies

While DevOps is a response to friction in application development and infrastructure/operations engagement, microservices can be usefully seen as a response to how application development teams can manage dependencies on each other.

In an ideal organization, an application development team can leverage/reuse capabilities provided by another team through their APIs, with minimum pre-planning and up-front communication. Teams would expose formal APIs with relevant documentation, and most engagement could be confined to service change requests from other teams and/or major business initiatives. Teams would not be required to test/deploy in lock-step with each other.

Such collaboration between teams would need to be formally recognized by business/product owners as part of the architecture of the platform – i.e., a degree of ‘mechanical sympathy’ is needed by those envisioning new business initiatives to know how best to leverage, and extend, software building blocks in the organization. This is best done by Product Management, who must steward the end-to-end business and data architecture of the organization or value-stream in partnership with business development and engineering.

Putting it all together

To date, most organizations have been fighting a losing battle. The desire to do agile and devops is strong, but the fundamental weakness in the chain is the ability of internal infrastructure providers and operators to move as fast as software development teams need them to – an issue related as much to financial management as to managing physical buildings, hardware, etc.

What cloud providers are doing is creating software-level abstractions of infrastructure services, allowing the potential of agile, devops and microservices to begin to be realized in practice.

Understanding these services and abstractions is like re-learning the basic principles of Computer Science and Engineering – but through a ‘service’ lens. The same issues need to be addressed, the same technical challenges exist. Except now some aspects of those challenges no longer need to be solved by organizations (e.g., how to efficiently abstract infrastructure services at scale), and businesses can focus on designing the infrastructure services that are matched with the needs of application developers (rather than a compromise).

Conclusion

The AWS service catalog and its APIs are an extraordinary achievement (as is similar work by other cloud providers, although they have yet to achieve the catalog breadth that AWS has). Architects need to know and understand these service abstractions and focus on matching application needs with business needs, and can worry less about the traditional constraints infrastructure organizations have had to work with.

In many respects, these abstractions will vary across providers only in syntax and features. Ultimately (probably at least 10 years from now) all commodity services will converge, or be available through efficient ‘cross-plane’ solutions which abstract providers. That is why I am choosing to ‘go deep’ on the AWS APIs. This is, in my opinion, the most concrete starting point for helping firms achieve ‘agile’ nirvana.

Transforming IT: From a solution-driven model to a capability-driven model

[tl;dr Moving from a solution-oriented to a capability-oriented model for software development is necessary to enable enterprises to achieve agility, but has substantial impacts on how enterprises organise themselves to support this transition.]

Most organisations which manage software change as part of their overall change portfolio take a project-oriented approach to delivery: the project goals are set up front, and a solution architecture and delivery plan are created in order to achieve the project goals.

Most organisations also fix project portfolios on a yearly basis, and deviating from this plan can often be very difficult for organisations to cope with – at least partly because such plans are intrinsically tied into financial planning and cost-saving techniques such as capitalisation of expenses, etc, which reduce the bottom-line cost to the firm of the investment (even if it says nothing about the value added).

As the portfolio of change projects grows every year, due to many extraneous factors (business opportunities, revenue protection, regulatory demand, maintenance, exploration, digital initiatives, etc), cross-project dependency management becomes increasingly difficult. It becomes even more complex to manage solution architecture dependencies within that overall dependency framework.

What results is a massive set of compromises that ends up with building solutions that are sub-optimal for pretty much every project, and an investment in technology that is so enterprise-specific, that no other organisation could possibly derive any significant value from it.

While it is possible that even that sub-optimal technology can yield significant value to the organisation as a whole, this benefit may be short lived, as the cost-effective ability to change the architecture must inevitably decrease over time, reducing agility and therefore the ability to compete.

So a balance needs to be struck, between delivering enterprise value (even at the expense of individual projects) while maintaining relative technical and business agility. By relative I mean relative to peers in the same competitive sector…sectors which are themselves being disrupted by innovative technology firms which are very specialist and agile within their domain.

The concept of ‘capabilities’ realised through technology ‘products’, in addition to the traditional project/program management approach, is key to this. In particular, it recognises the following key trends:

  • Infrastructure- and platform-as-a-service
  • Increasingly tech-savvy work-force
  • Increasing controls on IT by regulators, auditors, etc
  • Closer integration of business functions led by ‘digital’ initiatives
  • The replacement of the desktop by mobile & IoT (Internet of Things)
  • The tension between innovation and standards in large organisations

Enterprises are adapting to all the above by recognising that the IT function cannot be responsible for both technical delivery and ensuring that all technology-dependent initiatives realise the value they were intended to realise.

As a result, many aspects of IT project and programme management are no longer driven out of the ‘core’ IT function, but by domain-specific change management functions. IT itself must consolidate its activities to focus on those activities that can only be performed by highly qualified and expert technologists.

The inevitable consequence of this transformation is that IT becomes more product driven, where a given product may support many projects. As such, IT needs to be clear on how to govern change for that product, to lead it in a direction that is most appropriate for the enterprise as a whole, and not just for any particular project or business line.

A product must provide capabilities to the stakeholders or users of that product. In the past, those capabilities were entirely decided by whatever IT built and delivered: if IT delivered something that in practice wasn’t entirely fit for purpose, then business functions had no alternative but to find ways to work around the system deficiencies – usually creating more complexity (through end-user-developed applications in tools like Excel etc) and more expense (through having to hire more people).

By taking a capability-based approach to product development, however, IT can give business functions more options and ways to work around inevitable IT shortfalls without compromising controls or data integrity – e.g., through controlled APIs and services, etc.

So, while solutions may explode in number and complexity, the number of products can be controlled – with individual businesses being more directly accountable for the complexity they create, rather than ‘IT’.

This approach requires a step-change in how traditional IT organisations manage change. Techniques from enterprise architecture, scaled agile, and DevOps are all key enablers for this new model of structuring the IT organisation.

In particular, except for product-strategy (where IT must be the leader), IT must get out of the business of deciding the relative value/importance of individual product changes requested by projects, which historically IT has been required to do. By imposing a governance structure to control the ‘epics’ and ‘stories’ that drive product evolution, projects and stakeholders have some transparency into when the work they need will be done, and demand can be balanced fairly across stakeholders in accordance with their ability to pay.

If changes implemented by IT do not end up delivering value, it should not be because IT delivered the wrong thing, but rather the right thing was delivered for the wrong reason. As long as IT maintains its product roadmap and vision, such mis-steps can be tolerated. But they cannot be tolerated if every change weakens the ability of the product platform to change.

Firms which successfully balance between the project and product view of their technology landscape will find that productivity increases, complexity is reduced and agility increases massively. This model also lends itself nicely to bounded domain development, microservices, use of container technologies and automated build/deployment – all of which will likely feature strongly in the enterprise technology platform of the future.

The changes required to support this are significant, in terms of financial governance, delivery oversight, team collaboration, and the roles of senior managers and leaders. But organisations must be prepared to make this transition, as historical approaches to enterprise IT software development are clearly unsustainable.

The hidden costs of PaaS & microservice engineering innovation

[tl;dr The leap from monolithic application development into the world of PaaS and microservices highlights the need for consistent collaboration, disciplined development and a strong vision in order to ensure sustainable business value.]

The pace of innovation in the PaaS and microservice space is increasing rapidly. This, coupled with increasing pressure on ‘traditional’ organisations to deliver more value more quickly from IT investments, is causing a flurry of interest in PaaS enabling technologies such as Cloud Foundry (favoured by the likes of IBM and Pivotal), OpenShift (favoured by Red Hat), Azure (Microsoft), Heroku (Salesforce), AWS, Google App Engine, etc.

A key characteristic of all these PaaS solutions is that they are ‘devops’ enabled – i.e., it is possible to automate both code and infrastructure deployment, paving the way for highly automated operational processes for applications built on these platforms.

For large organisations, or organisations that prefer to control their infrastructure (because of, for example, regulatory constraints), PaaS solutions that can be run in a private datacenter rather than the public cloud are preferable, as this leaves open the future option to deploy to external clouds if needed/appropriate.

These PaaS environments are feature-rich and aim to provide a lot of the building blocks needed to build enterprise applications. In addition, framework initiatives such as Spring Boot, Dropwizard and Vert.x aim to make it easier to build PaaS-based applications.

Combined, all of these promise to provide a dramatic increase in developer productivity: the marginal cost of developing, deploying and operating a complete application will drop significantly.

Due to the low capital investment required to build new applications, it becomes ever more feasible to move from a heavy-weight, planning intensive approach to IT investment to a more agile approach where a complete application can be built, iterated and validated (or not) in the time it takes to create a traditional requirements document.

However, this also has massive implications, as – left unchecked – the drift towards entropy will increase over time, and organisations could be severely challenged to effectively manage and generate value from the sheer number of applications and services that can be created on such platforms. So an eye on managing complexity should be in place from the very beginning.

Many of the above platforms aim to make it as easy as possible for developers to get going quickly: this is a laudable goal, and if more of the complexity can be pushed into the PaaS, then that can only be good. The consequence of this approach is that developers have less control over the evolution of key aspects of the PaaS, and this could cause unexpected issues as PaaS upgrades conflict with application lifecycles, etc. In essence, it could be quite difficult to isolate applications from some PaaS changes. How these frameworks help developers cope with such changes is something to closely monitor, as these platforms are not yet mature enough to have gone through a major upgrade with a significant number of deployed applications.

The relative benefit/complexity trade-off between established microservice frameworks such as OSGi and easier-to-use solutions such as those described above needs to be tested in practice. Specifically, OSGi’s more robust dependency model may prove more useful in enterprise environments than in environments which have a ‘move fast and break things’ approach to application development, especially if OSGi-based PaaS solutions such as JBoss Fuse on OpenShift and Paremus ServiceFabric gain more popular use.

So: all well and good from the technology side. But even if the pros and cons of the different engineering approaches are evaluated and a perfect PaaS solution emerges, that doesn’t mean Microservice Nirvana can be achieved.

A recent article on the challenges of building successful micro-service applications, coupled with a presentation by Lisa van Gelder at a recent Agile meetup in New York City, has emphasised that even given the right enabling technologies, deploying microservices is a major challenge – but if done right, the rewards are well worth it.

Specifically, there are a number of factors that impact the success of a large scale or enterprise microservice based strategy, including but not limited to:

  • Shared ownership of services
  • Setting cross-team goals
  • Performing scrum of scrums
  • Identifying swim lanes – isolating content failure & eventually consistent data
  • Provision of Circuit breakers & Timeouts (anti-fragile)
  • Service discoverability & clear ownership
  • Testing against stubs; consumer-driven contracts
  • Running fake transactions in production
  • SLOs and priorities
  • Shared understanding of what happens when something goes wrong
  • Focus on Mean time to repair (recover) rather than mean-time-to-failure
  • Use of common interfaces: deployment, health check, logging, monitoring
  • Tracing a user’s journey through the application
  • Collecting logs
  • Providing monitoring dashboards
  • Standardising common metric names
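Of the factors above, the circuit breaker is one of the easiest to illustrate in code. The following is a minimal Python sketch under stated assumptions, not a production implementation: the CircuitBreaker class, its parameter names and the failure thresholds are all invented for this example, and it does not implement timeouts, which would need to be handled by the wrapped operation. After a number of consecutive failures the breaker ‘opens’ and rejects calls immediately, then allows a single trial call once a cool-down period has passed.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock  # injectable so the cool-down can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: allow one trial call. A single further
            # failure will re-open the circuit straight away.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Any success resets the consecutive-failure count.
        self._failures = 0
        return result
```

A caller would wrap each call to a dependent service, for example breaker.call(lambda: fetch_prices()), where fetch_prices is whatever (hypothetical) client function the team already has. Whether this logic lives in application code or is provided by the platform is exactly the kind of decision that needs to be made consistently across teams.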

Some of these can be technically provided by the chosen PaaS, but a lot is based around the best practices consistently applied within and across development teams. In fact, it is quite hard to capture these key success factors in traditional architectural views – something that needs to be considered when architecting large-scale microservice solutions.

In summary, the leap from monolithic application development into the world of PaaS and microservices highlights the need for consistent collaboration, disciplined development and a strong vision in order to ensure sustainable business value.

Know What You Got: Just what, exactly, should be inventoried?

[tl;dr Application inventories linked to team structure, coupled with increased use of meta-data in the software-development process that is linked to architectural standards such as functional, data and business ontologies, is key to achieving long term agility and complexity management.]

One of the biggest challenges with managing a complex technology platform is knowing when to safely retire or remove redundant components, or when there is a viable reusable component (instantiated or not) that can be used in a solution. (In this sense, a ‘component’ can be a script, a library, or a complete application – or a multitude of variations in-between.)

There are multiple ways of looking at the inventory of components that are deployed. In particular, components can be viewed from the technology perspective, or from the business or usage perspective.

The technology perspective is usually referred to as configuration management, as defined by ITIL. There are many tools (known as ‘CMDB’s) which, using ‘fingerprints’ of known software components, can automatically inventorise which components are deployed where, and their relationships to each other. This is a relatively well-known problem domain – although years of poor deployment practice means that random components can be found running on infrastructure months or even years after the people who created them have moved on. Although the ITIL approach is eminently sensible in principle, in practice it is always playing catch-up because deployment practices are only improving slowly in legacy environments.

Current cloud-aware deployment practices encapsulated in dev-ops are the antithesis of this approach: i.e., all aspects of deployment are encapsulated in script and treated as source code in and of itself. The full benefits of the ITIL approach to configuration management will therefore only be realised when the transition to this new approach to deployment is fully completed (alongside the virtualisation of data centres).

The business or usage perspective is much harder: typically this is approximated by establishing an application inventory, and linking various operational accountabilities to that inventory.

But such inventories suffer from key failings. Specifically:

  • The definition of an application is very subjective and generally determined by IT process needs rather than what makes sense from a business perspective.
  • Application inventories are usually closely tied to the allocation of physical hardware and/or software (such as operating systems or databases).
  • Applications tend to evolve and many components could be governed by the same ‘application’ – some of which may be supporting multiple distinct business functions or products.
  • Application inventories tend to be associated with running instances rather than (additionally) instantiable instances.
  • Application inventories may capture interface dependencies, but often do not capture component dependencies – especially when applications may consist of components that are also considered to be a part of other applications.
  • Application and component versioning are often not linked to release and deployment processes.

The consequence is that the application inventory is very difficult to align with technology investment and change, so it is not obvious from a business perspective which components could or should be involved in any given solution, and whether there are potential conflicts which could lead to excessive investment in redundant components, and/or consequent under-investment in potentially reusable components.

Related to this, businesses typically wish to control the changes to ‘their’ application: the thought of the same application being shared with other businesses is usually something only agreed as a last resort and under extreme pressure from central management: the default approach is for each business to be fully in control of the change management process for their applications.

So IT rationally interprets this bias as a license to create a new, duplicate application rather than put in place a structured way to share reusable components, such that technical dependencies can be managed without visibly compromising business-line change independence. This is rational because safe, scalable reuse of components is still a very immature capability that most organisations do not yet have.

So what makes sense to inventory at the application level? Can it be automated and made discoverable?

In practice, processes which rely on manual maintenance of inventory information that is isolated from the application development process are not likely to succeed – principally because the cost of ensuring data quality will make it prohibitive.

Packaging & build standards (such as Maven, Ant, Gradle, etc) and modularity standards (such as OSGi for Java, Gems for Ruby, ECMAScript modules for JS, etc) describe software components, their purpose, dependencies and relationships. In particular, disciplined use of software modules may allow applications to self-declare their composition at run-time.
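As a minimal sketch of what ‘self-declaring composition at run-time’ can look like, the snippet below uses Python’s standard `importlib.metadata` module to list the packaged components installed in the running environment, together with the dependencies each one declares. Python is used here purely for illustration (it is not one of the ecosystems named above), and the function name `declared_composition` is a hypothetical one chosen for this example.

```python
from importlib import metadata


def declared_composition() -> dict:
    """Map each installed distribution to its declared version and dependencies."""
    inventory = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:  # skip entries whose metadata does not declare a name
            continue
        inventory[name] = {
            "version": dist.version,
            # `requires` is None when a package declares no dependencies
            "requires": list(dist.requires or []),
        }
    return inventory


# Print the inventory of the current environment, one component per line.
for name, info in sorted(declared_composition().items()):
    print(name, info["version"], info["requires"])
```

The point is that nothing here is maintained by hand: the inventory is a by-product of packaging discipline that already exists in the development process.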

Non-reusable components, such as business-specific (or context-specific) applications, can in principle be packaged and deployed similarly to reusable modules, also with searchable meta-data.

Databases are a special case: these can generally be viewed as reusable, instantiated components – i.e., they may be a component of a number of business applications. The contents of databases should likely best be described through open standards such as RDF etc. However, other components (such as API components serving a defined business purpose) could also be described using these standards, linked to discoverable API inventories.

So, what precisely needs to be manually inventoried? If all technical components are inventoried by the software development process, the only components that remain to be inventoried must be outside the development process.

This article proposes that what exists outside the development process is team structure. Teams are usually formed and broken up in alignment with business needs and priorities. Some teams are in place for many years, some may only last a few months. Regardless, every application component must be associated with at least one team, and teams must be responsible for maintaining/updating the meta-data (in version control) for every component used by that team. Where teams share multiple components, a ‘principal’ team owner must be appointed for each component to ensure consistency of component meta-data, to handle pull-requests etc for shared components, and to oversee releases of those components. Teams also have relevance for operational support processes (e.g., L3 escalation, etc).
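The two ownership rules above (every component has at least one team; every shared component has a principal team) are simple enough to check automatically. The following is a minimal sketch, assuming component meta-data has already been loaded from version control into plain objects. The `Component` class, its fields and the component and team names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    name: str
    teams: set = field(default_factory=set)  # teams that use the component
    principal: Optional[str] = None          # owning team, needed when shared


def ownership_problems(components: list) -> list:
    """Return a human-readable message for each component that breaks a rule."""
    problems = []
    for c in components:
        if not c.teams:
            problems.append(f"{c.name}: no associated team")
        elif len(c.teams) > 1 and c.principal is None:
            problems.append(f"{c.name}: shared but no principal team")
    return problems


components = [
    Component("payments-api", {"payments", "treasury"}, principal="payments"),
    Component("fx-rates-lib", {"treasury", "risk"}),
    Component("legacy-batch"),
]
for problem in ownership_problems(components):
    print(problem)
```

A check like this could run as part of a build, so that gaps in ownership surface when a component changes rather than during an audit.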

The frequency of component updates will be a reflection of development activity: projects which propose to update infrequently changing components can expect to have higher risk than projects whose components frequently change, as infrequently changing components may indicate a lack of current knowledge/expertise in the component.
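As a rough illustration, the time since a component last changed can be used to rank components by this kind of risk. The sketch below assumes the last-changed dates have already been extracted from version control. The function name and the component names are hypothetical, and staleness is only a proxy for missing expertise, not a measure of it.

```python
from datetime import date


def rank_by_staleness(last_changed: dict, today: date) -> list:
    """Return (component, days since last change) pairs, stalest first."""
    ages = {name: (today - changed).days for name, changed in last_changed.items()}
    return sorted(ages.items(), key=lambda item: item[1], reverse=True)


last_changed = {
    "web-frontend": date(2024, 1, 1),
    "settlement-engine": date(2023, 1, 31),
}
for name, days in rank_by_staleness(last_changed, today=date(2024, 1, 31)):
    print(f"{name}: {days} days since last change")
```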

The ability to discover and reason about existing software (and infrastructure) components is key to managing complexity and maintaining agility. But relying on armies of people to capture data and maintain quality is impractical. Traditional roadmaps (while useful as a communication tool) can deviate from reality in practice, so keeping them current (except for communication of intent) may not be a productive use of resources.

In summary, application inventories linked to team structure, and increased use of meta-data in the software-development process that is linked to broader architectural standards (such as functional, data and business ontologies), are key to achieving agility and long term complexity management.

Achieving modularity: functional vs volatility decomposition

Enterprise architecture is all about managing complexity. Many EA initiatives tend to focus on managing IT complexity, but there is only so much that can be done there before it becomes obvious that IT complexity is, for the most part, a direct consequence of enterprise complexity. To recap, complexity needs to be managed in order to maintain agility – the ability for an organisation to respond (relatively) quickly and efficiently to changes in markets, regulations or innovation, and to continue to do this over time.

Enterprise complexity can be considered to be the activities performed and resources consumed by the organisation in order to deliver ‘value’, a metric usually measured through the ability to maintain (financial) income in excess of expenses over time.

Breaking down these activities and resources into appropriate partitions that allow holistic thinking and planning to occur is one of the key challenges of enterprise architecture, and there are various techniques to do this.

Top-Down Decomposition

The natural approach to decomposition is to first understand what an organisation does – i.e., what are the (business) functions that it performs. Simply put, a function is a collection of data and decision points that are closely related (e.g., ‘Payments’ is a function). Functions typically add little value in and of themselves – rather they form part of an end-to-end process that delivers value for a person or legal entity in some context. For example, a payment on its own means nothing: it is usually performed in the context of a specific exchange of value or service.

So a first course of action is to create a taxonomy (or, more accurately, an ontology) to describe the functions performed consistently across an enterprise. Then, various processes, products or services can be described as a composition of those functions.

If we accept (and this is far from accepted everywhere) that EA is focused on information systems complexity, then EA is not responsible for the complexity relating to the existence of processes, products or services. The creation or destruction of these is usually a direct consequence of business decisions. However, EA should be responsible for cataloging these, and ensuring these are incorporated into other enterprise processes (such as, for example, disaster recovery or business continuity processes). And EA should relate these to the functional taxonomy and the information systems architecture.

This can get very complex very quickly due to the sheer number of processes, products and services – including their various variations – most organisations have. So it is important to partition or decompose the complexity into manageable chunks to facilitate meaningful conversations.

Enterprise Equivalence Relations

One way to do this at enterprise level is to group functions into partitions (aka domains) according to synergy or autonomy (as described by Roger Sessions), for all products/services supporting a particular business. This approach is based on the mathematical concept of equivalence relations. Because different functions in different contexts may have differing equivalence relationships, functions may appear in multiple partitions. One role of EA is to assess and validate if those functions are actually autonomous or if there is the potential to group apparently duplicate functions into a new partition.
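To make the equivalence idea concrete, here is a minimal sketch for a single business context. It assumes analysts have already judged which pairs of functions are synergistic, and it groups the functions into equivalence classes (the partitions) using a simple union-find. The function names and the pairs are hypothetical, and this illustrates the mathematics rather than Sessions’ full method.

```python
def partition(functions: list, synergistic: list) -> list:
    """Group functions into equivalence classes implied by synergistic pairs."""
    parent = {f: f for f in functions}

    def find(f: str) -> str:
        # Follow parent links to the class representative, shortening the path.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for a, b in synergistic:
        parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for f in functions:
        classes.setdefault(find(f), set()).add(f)
    return list(classes.values())


functions = ["payments", "settlement", "pricing", "quoting", "reporting"]
pairs = [("payments", "settlement"), ("pricing", "quoting")]
for group in partition(functions, pairs):
    print(sorted(group))
```

Functions with no synergistic partner (here, ‘reporting’) end up in a partition of their own, which is the autonomy case. Running the same grouping with a different set of pairs for another business context is how a function can land in more than one partition.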

Once partitions are identified, it is possible to apply ‘traditional’ EA thinking to a particular partition, because that partition is of a manageable size. By ‘traditional’ EA, I mean applying Zachman, TOGAF, PEAF, or any of the myriad methodologies/frameworks that are out there. More specifically, at that level, it is possible to establish a meaningful information systems strategy or goal for a particular partition that is directly supporting business agility objectives.

The Fallacy of Functional Decomposition

Once you get down to the level of a partition, functional decomposition becomes less useful for architecting solutions. The natural tendency for architects would be to build reusable components or services that realise the various functions that comprise the partition. In fact, this may be the wrong thing to do. As Juval Löwy demonstrates in his excellent webinar, this may result in more complexity, not less (and hence less agility).

When it comes to software architecture, the real reason to modularise your architecture is to manage volatility or uncertainty – and to ensure that volatility in one part of the architecture does not unnecessarily negatively impact another part of the architecture over time. Doing this allows agility to be maintained, so volatile parts of the application can, in fact, change frequently, at low impact to other parts of the application.

When looking at a software architecture through this lens, a quite different set of components/modules/services may become evident than those which may otherwise be obvious when using functional decomposition – the example in the webinar demonstrates this very well. A key argument used by Juval in his presentation is that (to paraphrase him somewhat) functions are, in general, highly dependent on the context in which they are used, so to split them out into separate services may require making often impossible assumptions about all possible contexts the functions could be invoked in.
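A small sketch may help show what ‘encapsulating volatility’ means in code. This is not the example from the webinar: it is a hypothetical illustration in which fee rules are assumed to be the volatile part of a system and settlement logic the stable part. All class and function names are invented for this example.

```python
from typing import Protocol


class FeePolicy(Protocol):
    """The volatile part: fee rules are assumed to change often."""

    def fee(self, amount: float) -> float: ...


class FlatFee:
    def __init__(self, flat: float) -> None:
        self.flat = flat

    def fee(self, amount: float) -> float:
        return self.flat


class PercentageFee:
    def __init__(self, rate: float) -> None:
        self.rate = rate

    def fee(self, amount: float) -> float:
        return amount * self.rate


def settle(amount: float, policy: FeePolicy) -> float:
    """The stable part: depends only on the FeePolicy interface."""
    return amount + policy.fee(amount)


print(settle(100.0, FlatFee(2.0)))
print(settle(100.0, PercentageFee(0.25)))
```

The module boundary is drawn around what is expected to change (the fee rules), not around a business function such as ‘Payments’. New fee rules can be added without touching `settle`.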

In this sense, identified components, modules or services can be considered to be providing options in terms of what is done, or how it is done, within the context of a larger system with parts of variable volatility. (See my earlier post on real options in the context of agility to understand more about options in this context.)

Partitions as Enterprise Architecture

When each partition is considered with respect to its relationship with other partitions, there is a lot of uncertainty around how different partitions will evolve. To allow for maximum flexibility, every partition should assume each other partition is a volatile part of their architecture, and design accordingly for this. This allows each partition to evolve (reasonably) independently with minimum fixed co-ordination points, without compromising the enterprise architecture by having different partitions replicate the behaviours of partitions they depend on.

This then allows:

  • Investment to be expressed in terms of impact to one or more partitions
  • Partitions to establish their own implementation strategies
  • Agile principles to be agreed on a per partition basis
  • Architectural standards to be agreed on a per partition basis
  • Partitions to define internally reusable components relevant to that partition only
  • Partitions to expose partition behaviour to other partitions in an enterprise-consistent way

In generative organisation cultures, partitions do not need to be organisationally aligned. However, in other organisation cultures (pathological or bureaucratic), alignment of enterprise infrastructure functions such as IT or operations (at least) with partitions (domains) may help accelerate the architectural and cultural changes needed – especially if coupled with broader transformations around investment planning, agile adoption and enterprise architecture.

Achieving Agile at Scale

[tl;dr Scaling agile at the enterprise level will need rethinking how portfolio management and enterprise architecture are done to ensure success.]

Agility, as a concept, is gaining increasing attention within large organisations. The idea that business functions – and in particular IT – can respond quickly and iteratively to business needs is an appealing one.

The reasons why agility is getting attention are easy to spot: larger firms are getting more and more obviously unagile – i.e., the ability of business functions to respond to business needs in a timely and sustainable manner is getting progressively worse, even as a rapidly evolving competitive and technology-led commercial environment is demanding more agility.

Couple that with the heavy cost of failing to meet ever increasing regulatory compliance obligations, and ‘agile’ seems a very good idea indeed.

Agile is a great idea, but when implemented at scale (in large enterprise organisations), it can actually reduce enterprise agility, rather than increase it, unless great care is taken.

This is partly because Agile’s origins come from developing web applications: in these scenarios, there is usually a clear customer, a clear goal (to the extent that the team exists in the first place), and relatively tight timelines that favour short or non-existent analysis/design phases. Agile is perfect for these scenarios.

Let’s call this scenario ‘local agile’. It is quite easy to imagine a situation where every team, in response to the question ‘are you doing agile?’, says ‘Yes, we are!’. So if every team is doing ‘local agile’, does that mean your organisation is now ‘agile’?

The answer is No. Getting every team to adopt agile practices is a necessary but insufficient step towards achieving enterprise agility. In particular, two key factors need to be addressed before a firm can truly be said to be ‘agile’ at the enterprise level. These are:

  1. The process by which teams are created and funded, and
  2. Enterprise awareness

Creating & Funding (Agile) Teams

Historically, teams have usually been created as a result of projects being initiated: the project passes investment justification criteria, the project is initiated and a team is put in place, led by the project manager. Also, this process has been owned entirely by the IT organisation, irrespective of which other organisations were stakeholders in the project.

At this point, IT’s main consideration is, will the project be delivered on time and on budget? The business sponsor’s main consideration is, will it give us what we need when we need it? And the enterprise’s consideration (which is often ignored) is who is accountable for ensuring that the IT implementation delivers value to the enterprise. (In this sense, the ‘enterprise’ could be either a major business line with full P&L responsibility for all activities performed in support of their business, or the whole organisation, including shared enterprise functions).

Delivering ‘value’ is principally about ensuring that on-going or operational processes, roles and responsibilities are adjusted to maximise the benefits of a new technology implementation – which could include organisational change, marketing, customer engagement, etc.

However, delivering ‘value’ is not always correlated to one IT implementation; value can be derived from leveraging multiple IT capabilities in concert. Given the complexity of large organisations, it is often neither desirable nor feasible to have a single IT partner be responsible for all the IT elements that collectively deliver business value.

On this basis, it is evident that how businesses plan and structure their portfolio of IT investments needs to change dramatically, such that:

  1. The business value agenda is outcome focused and explicit about which IT capabilities are required to enable it, and
  2. IT investment is focused around the capability investment lifecycle that IT is responsible for stewarding.

In particular, ‘capabilities’ (or IT products or services) have a lifecycle: this affects the investment and expectations around those capabilities. And some capabilities need to be more ‘agile’ than others – some must be agile to be useful, whereas for others, stability may be the over-riding priority, and therefore their lack of agility must be made explicit – so agile teams can plan around that.

Enterprise Awareness

‘Locally’ agile teams are a step in the right direction – particularly if the business stakeholders all agree they are seeing the value from that agility. But often this comes at the expense of enterprise awareness. In short, agility in the strict business sense can often only deliver results by ignoring some stakeholders’ interests. So ‘locally’ agile teams may feel they must minimise their interactions with other teams – particularly if those teams are not themselves agile.

If we assume that teams have been created through a process as described in the previous section, it becomes more obvious where the team sits in relation to its obligations to other teams. Teams can then make appropriate compromises to their architecture, planning and agile SDLC to allow for those obligations.

If the team was created through ‘traditional’ planning processes, then it becomes a lot harder to figure out what ‘enterprise awareness’ is appropriate (except perhaps for IT-imposed standards or gates, which only contribute indirectly to business value).

Most public agile success stories describe very well how they achieved success up to – but not including – the point at which architecture becomes an issue. Architecture, in this sense, refers to either parts of the solution architecture which can no longer be delivered via one or two members of an agile team, or those parts of the business value chain that cannot be entirely delivered via the agile team on its own.

However, there are success stories (e.g., Spotify) that show how ‘enterprise awareness’ can be achieved without limiting agility. For many organisations, transitioning from existing organisation structures to new ‘agile-ready’ structures will be a major challenge, and far harder than simply having teams ‘adopt agile’.

Conclusions

With the increased attention on Agile, there is fortunately increased attention on scaling agile. Methodologies like Disciplined Agile Delivery (DAD) and Large-Scale Scrum (LeSS), coupled with portfolio concepts like the Scaled Agile Framework (SAFe), propose ways in which Agile can scale beyond the team and up to enterprise level, without losing the key benefits of the agile approach.

All scaled agile methodologies call for changes in how Portfolio Management and Enterprise Architecture are typically done within an organisation, as doing these activities right is key to the success of adopting Agile at scale.
