What I realized from studying AWS Services & APIs

[tl;dr The weakest link for firms wishing to achieve business agility is the financial and physical constraints imposed by managing datacenters and infrastructure. The business goals of agile, devops and enterprise architecture are fundamentally unachievable unless these constraints can be fully abstracted through software services.]

Background

Anybody who grew up with technology during the PC generation (1985-2005) will have developed software with a fairly deep understanding of how that software worked from an OS/CPU, network, and storage perspective. Much of that generation would have had some formal education in the basics of computer science.

Initially, the PC generation did not have to worry about servers and infrastructure: software ran on PCs. As PCs became more networked, dedicated PCs running ‘server’ software needed to be connected to the desktop PCs. And folks tasked with building software to run on the servers would also have to buy higher-spec PCs for the server side, install (network) operating systems, connect them to desktop PCs via LAN cables, install disk drives and databases, etc. This would all form part of the ‘waterfall’ project plan to deliver working software, and would all be rather predictable in its timeframes.

As organizations added more and more business-critical, network-based software to their portfolios, organization structures were created for datacenter management, networking, infrastructure/server management, storage and database provisioning and operation, middleware management, etc, etc. A bit like the mainframe structures that preceded the PC generation, in fact.

Introducing Agile

And so we come to Agile. While Agile was principally motivated by the flexibility in GUI design offered by HTML (vs traditional GUI design) – basically allowing development teams to iterate rapidly over, and improve on, different implementations of UI – ‘Agile’ quickly became more ‘enterprise’ oriented, as planning and coordinating demand across multiple teams, both infrastructure and application development, was rapidly becoming a massive bottleneck.

It was, and is, widely recognized that these challenges are largely cultural – i.e., that if only teams understood how to collaborate and communicate, everything would be much better for everyone – all the way from the top down. And so a thriving industry exists in coaching firms how to ‘improve’ their culture – aka the ‘agile industrial machine’.

Unfortunately, it turns out there is no silver bullet: the real goal – organizational or business agility – has been elusive. Big organizations still expend vast amounts of time and resources doing small incremental change, most activity is involved in maintaining/supporting existing operations, and truly transformational activities which bring an organization’s full capabilities together for the benefit of the customer still do not succeed.

The Reality of Agile

The basic tenet behind Agile is the idea of cross-functional teams. However, it is obvious that most teams in organizations are unable to align themselves perfectly according to the demand they are receiving (i.e., the equivalent of providing a customer account manager), and even if they did, the number of participants in a typical agile ‘scrum’ or ‘scrum of scrums’ meeting would quickly exceed the consensus maximum of about 9 participants needed for a scrum to be successful.

So most agile teams resort to the only agile they know – i.e., developers, QA and maybe product owner and/or scrum-master participating in daily scrums. Every other dependency is managed as part of an overall program of work (with communication handled by a project/program manager), or through on-demand ‘tickets’ whereby teams can request a service from other teams.

The basic impact of this is that pre-planned work (resources) gets prioritized ahead of on-demand ‘tickets’ (excluding tickets relating to urgent operational issues), and so agile teams are forced to compromise the quality of their work (if they can proceed at all).

DevOps – Managing Infrastructure Dependencies

DevOps is a response to the widening communications/collaboration chasm between application development teams and infrastructure/operations teams in organizations. It recognizes that operational and infrastructural concerns are inherent characteristics of software, and software should not be designed without these concerns being first-class requirements along with product features/business requirements.

On the other hand, infrastructure/operations providers, being primarily concerned with stability, seek to offer a small number of efficient standardized services that they know they can support. Historically, infrastructure providers could only innovate and adapt as fast as hardware infrastructure could be procured, installed, supported and amortized – which is to say, innovation cycles measured in years.

In the meantime, application development teams are constantly pushing the boundaries of infrastructure – principally because most business needs can be realized in software, with sufficiently talented engineers, and those tasked with building software often assume that infrastructure can adapt as quickly.

Microservices – Managing AppDev Team to AppDev Team Dependencies

While DevOps is a response to friction in how application development and infrastructure/operations teams engage, microservices can usefully be seen as a response to how application development teams manage dependencies on each other.

In an ideal organization, an application development team can leverage/reuse capabilities provided by another team through their APIs, with minimum pre-planning and up-front communication. Teams would expose formal APIs with relevant documentation, and most engagement could be confined to service change requests from other teams and/or major business initiatives. Teams would not be required to test/deploy in lock-step with each other.
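
As an illustration, here is a minimal sketch in Python (using FastAPI; the ‘accounts’ service and its fields are hypothetical) of a team exposing a formal, versioned, self-documenting API that other teams can consume without lock-step planning:

```python
# A hypothetical 'accounts' service: the API contract and its OpenAPI docs
# (served at /docs) are generated from the code, so consuming teams can
# integrate with minimal up-front coordination.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Accounts Service", version="1.0.0")

class Account(BaseModel):
    account_id: str
    owner: str
    status: str

# In-memory stand-in for the owning team's real data store.
ACCOUNTS = {"a-123": Account(account_id="a-123", owner="alice", status="open")}

@app.get("/v1/accounts/{account_id}", response_model=Account)
def get_account(account_id: str) -> Account:
    """Documented, versioned endpoint other teams can call directly."""
    account = ACCOUNTS.get(account_id)
    if account is None:
        raise HTTPException(status_code=404, detail="account not found")
    return account
```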

Such collaboration between teams would need to be formally recognized by business/product owners as part of the architecture of the platform – i.e., a degree of ‘mechanical sympathy’ is needed by those envisioning new business initiatives to know how best to leverage, and extend, software building blocks in the organization. This is best done by Product Management, who must steward the end-to-end business and data architecture of the organization or value-stream in partnership with business development and engineering.

Putting it all together

To date, most organizations have been fighting a losing battle. The desire to do agile and devops is strong, but the fundamental weakness in the chain is the ability of internal infrastructure providers and operators to move as fast as software development teams need them to – an issue as much related to financial management as to managing physical buildings, hardware, etc.

What cloud providers are doing is creating software-level abstractions of infrastructure services, allowing the potential of agile, devops and microservices to begin to be realized in practice.

Understanding these services and abstractions is like re-learning the basic principles of Computer Science and Engineering – but through a ‘service’ lens. The same issues need to be addressed, and the same technical challenges exist. Except now some aspects of those challenges no longer need to be solved by organizations themselves (e.g., how to efficiently abstract infrastructure services at scale), and businesses can focus on designing the infrastructure services that match the needs of application developers (rather than a compromise).
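
To make this concrete, here is a minimal sketch using boto3, the AWS SDK for Python (the region, AMI ID and tags are illustrative placeholders): what once meant procuring and racking hardware becomes a single software API call.

```python
# Provisioning a server through a software abstraction rather than a
# datacenter process. The AMI ID and tag values below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a virtual server; the physical 'datacenter' work is abstracted away.
response = ec2.run_instances(
    ImageId="ami-12345678",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "team", "Value": "payments"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```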

Conclusion

The AWS service catalog and APIs are an extraordinary achievement (as is similar work by other cloud providers, although they have yet to achieve the catalog breadth that AWS has). Architects need to know and understand these service abstractions, focus on matching application needs with business needs, and worry less about the traditional constraints infrastructure organizations have had to work with.

In many respects, these abstractions will differ across providers only in syntax and features. Ultimately (probably at least 10 years from now) all commodity services will converge, or be available through efficient ‘cross-plane’ solutions which abstract the providers. So that is why I am choosing to ‘go deep’ on the AWS APIs. This is, in my opinion, the most concrete starting point for helping firms achieve ‘agile’ nirvana.

What I learned from using Kubernetes

What is Kubernetes?

Kubernetes is fundamentally an automated resource management platform – principally for computational resources (CPU, RAM, networks, local storage). It realises the ideas behind the ‘cattle not pets‘ approach to IT infrastructure by defining in software what previously was implicit in infrastructure configuration and provisioning.
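
As a minimal illustration of that ‘defined in software’ idea (assuming a configured kubeconfig and the official kubernetes Python client), the desired state of three identical, replaceable replicas is declared in code, and Kubernetes continually reconciles reality against it:

```python
# 'Cattle not pets': declare desired state; any single pod can die and be
# replaced automatically. Names and image are illustrative.
from kubernetes import client, config

config.load_kube_config()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # three interchangeable instances, not one hand-tended box
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="web", image="nginx:1.25")]
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```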

In particular, Kubernetes enables continuous integration/continuous delivery (CI/CD) processes which are critical to achieving the business benefits (agility, reliability, security) that underpin devops (Kim et al).

The standard for abstracting computational resources today is the container – essentially, technologies based on OS virtual isolation primitives first provided by Linux. So Kubernetes (and other resource orchestration tools) focuses on orchestrating (i.e., managing the lifecycle of) containers. The de-facto container standard is Docker.

Cloud providers (such as AWS, Azure, GCP and Digital Ocean) specialize in abstracting computational resources via services. However, while all support Docker containers as the standard for runnable images, they each have different APIs and interfaces to engage with the management services they provide. Kubernetes is already a de-facto standard for resource management abstraction that can be implemented both in private data centers as well as on cloud, which makes it very attractive for firms that care about cloud-provider lock-in.

It is worth noting that currently Kubernetes focuses on orchestrating containers, which can be run on either virtual machines or on bare metal (i.e., an OS without a hypervisor). However, as VM technology becomes more efficient – and in particular as the cost of instantiating VMs decreases – it is possible that Kubernetes will orchestrate VMs, and not (just) containers. This is because containers do not provide a safe multi-tenancy model, and there are many good reasons why firms, even internally, will seek stronger isolation of processes running on the same infrastructure than that offered by containers – especially for software-as-a-service offerings, where different clients do need strong separation from each other’s process instances.

What Kubernetes is not

It is easy to mistake Kubernetes as a solution to be used by development teams: as the Kubernetes website makes clear, it is not a development platform in and of itself, but rather is technology upon which such platforms can be built.

One example is Pivotal’s Container Service (PKS), which is migrating from its own proprietary resource management solution to using Kubernetes under the hood (and providing a path for VMware users to Kubernetes in the process). For Java, frameworks like Spring Boot and Spring Cloud provide good abstractions for implementing distributed systems, and these can be implemented on top of Kubernetes without developers needing to be aware.

A key benefit of Kubernetes is that it is language/framework agnostic. The downside is that this means that some language or platform-specific abstractions (tools, APIs, policies, etc) need to be created in order for Kubernetes to be useful to development teams, and to avoid the need for developers to master Kubernetes themselves.

For legacy applications, or applications with no distribution framework, some language-neutral mechanism should be used to allow development teams to describe their applications in terms that can enable deployment and configuration automation, but that does not bind kubernetes-specific dependencies to those applications (for example, Spring Cloud).

For new projects, some investment in both general-purpose and language-specific platform tooling will be needed to maximize the benefits of Kubernetes – whether this is a 3rd party solution or developed in-house is a decision each organization needs to make. It is not a viable option to delegate this decision to individual (pizza-sized) teams, assuming such teams are staffed principally with folks charged with implementing automated business processes and not automated infrastructure processes.

How cool is Kubernetes?

It is very cool – only a little bit less cool than using a cloud provider and the increasing number of infrastructure & PaaS services they offer. Certainly, for applications which are not inherently cloud native, Kubernetes offers a viable path to achieving CI/CD nirvana, and hence offers a potential path to improvements in deployment agility, security and reliability for applications which for various reasons are not going to be migrated to cloud or re-engineered.

It takes a remarkably small amount of time for an engineer to install and configure a Kubernetes cluster on resources they have direct control over (e.g., MiniKube or Docker on a laptop, EKS on AWS, or a bunch of VMs you have root access to in your data center). Creating a dockerfile for an application whose dev/test environments are fully specified in code (e.g., Vagrant specifications) is a fairly quick exercise. Deploying to a Kubernetes cluster is quick and painless – once you have a docker image repository set up and configured.

Technical challenges arise when it comes to network configuration (e.g., load balancers, NAT translation, routers, firewalls, reverse proxies, API gateways, VPNs/VPCs, etc), automation of cluster infrastructure definition, automation of dockerfile creation, storage volume configuration and management, configuration parameter injection, secrets management, logging and monitoring, namespace management, etc. Advanced networking such as service mesh, policy control, use of multiple network interfaces, low-latency routing to external services, multi-region resilience and recovery are typically not considered a priority during development, but are critical for production environments and must also be automated. All this is doable via Kubernetes mechanisms, but is not for free.
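
To illustrate just one item from that list – secrets management – here is a minimal sketch using the official kubernetes Python client (names and values are placeholders): the credential lives in the cluster and is injected as an environment variable, rather than being baked into images or source code.

```python
# Store a credential in the cluster once, then reference it declaratively
# from containers at deployment time. All names/values are placeholders.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# The secret is managed by the platform, not the dockerfile or the repo.
core.create_namespaced_secret(
    namespace="default",
    body=client.V1Secret(
        metadata=client.V1ObjectMeta(name="db-credentials"),
        string_data={"password": "example-only"},
    ),
)

# Containers then consume the secret as an injected environment variable.
env = client.V1EnvVar(
    name="DB_PASSWORD",
    value_from=client.V1EnvVarSource(
        secret_key_ref=client.V1SecretKeySelector(name="db-credentials", key="password")
    ),
)
```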

In short, Kubernetes is only the start of the journey to implement zero-touch infrastructure and therefore to make infrastructure provisioning a seamless part of the software development process.

So what should a Kubernetes strategy be?

For many firms, Kubernetes could end up being nothing but a stepping stone to full serverless architectures – more on serverless another time, but in essence it means all resource abstractions for an application are fully encoded in code deployed as part of a serverless application.
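
As a minimal illustration of the serverless idea (assuming an AWS Lambda function behind API Gateway; the names and event shape are illustrative): the deployable unit is just the handler below – servers, scaling and routing are entirely the provider’s problem.

```python
# Hypothetical Lambda handler: compute happens only when a request arrives.
# The event shape assumes an API Gateway proxy trigger.
import json

def handler(event, context):
    """AWS Lambda entry point; no server to provision, patch or scale."""
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```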

For firms that are moving towards a more fully integrated polyglot microservices-type architecture (i.e., with more and more infrastructure and business capabilities delivered as services rather than deployable artifacts or code), and where direct control over network, storage and/or compute infrastructure is seen as a competitive necessity, then Kubernetes seems like a prudent option.

How microservice frameworks evolve with respect to their use of Kubernetes will be critical: in particular, frameworks should ideally be able to happily co-exist with other frameworks in the same Kubernetes cluster. Reducing clusters to single-application usage would be a step backwards – although in the context of many microservices written using the same language framework deployed to the same cluster, perhaps this is acceptable (provided operations teams do not see such microservices as parts of a single monolithic application).

Irrespective of a microservices strategy, deploying applications to containers may be seen as a way to efficiently manage the deployment and configuration of applications across multiple environments. In this regard, there is a convergence of innovation (see PKS above) in the definition, creation and orchestration of virtual machines and containers, which may eventually make it more sensible for enterprises that have already made the move to virtualization (via VMware) to continue along that technology path rather than prematurely taking the Kubernetes/container path, as legacy applications struggle with the ephemeral nature of containers. In either case, the goal is zero-touch infrastructure via ‘infrastructure-as-code‘ and process automation such as ‘gitops‘, as these are what will ultimately deliver devops business goals.
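
As a minimal infrastructure-as-code sketch (using the AWS CDK for Python purely as one example; Terraform or plain Kubernetes manifests serve the same goal), the entire environment lives in version control, so a reviewed git merge – not a ticket to an infrastructure team – is what changes production:

```python
# Illustrative CDK (v2) stack: the bucket below is a stand-in for any
# infrastructure resource, declared, diffed and rolled back like code.
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class ArtifactStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Declarative resource definition, reviewed via pull request.
        s3.Bucket(self, "ArtifactBucket", versioned=True)

app = App()
ArtifactStack(app, "ArtifactStack")
app.synth()
```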

Summary

So, the key takeaways are:

  • Kubernetes is important for organizations where containerization is part of a broader (microservices/distributed systems/cloud) strategy, and not just deployment/configuration automation/optimization.
  • Organizations should at least learn how to operate Kubernetes clusters at scale, as this function will likely remain siloed.
  • Developing in-house Kubernetes engineering capabilities is a strategic business question; for most enterprises, it makes sense to let 3rd parties focus on this (via open-source solutions).
  • For organizations which heavily use VMware VMs, the path to Kubernetes is likely via VMs (which are long-lived) rather than via Containers (which are ephemeral). Commercial VM managers are expensive, but effectively using Kubernetes is not free or cheap either.
  • Organizations should seriously assess the role serverless technologies could play in future technical roadmaps.

The Learning CTO’s Strategic Themes (2015) Revisited

Way back in late 2014, I published what I considered at the time to be key themes that would dominate the technology landscape in 2015. 4 years on, I’m revisiting these themes. Strategic themes for 2019 will be covered in a separate blog post.

2015 Strategic Themes Recap

  • The Lean Enterprise – The cultural and process transformations necessary to innovate and maintain agility in large enterprises – in technology, financial management, and risk, governance and compliance.
  • Enterprise Modularity – The state-of-the-art technologies and techniques which enable agile, cost-effective enterprise-scale integration and reuse of capabilities and services. Aka SOA Mark II.
  • Continuous Delivery – The state-of-the-art technologies and techniques which bring together agile software delivery with operational considerations. Aka DevOps Mark I.
  • Systems Theory & Systems Thinking – The ability to look at the whole as well as the parts of any dynamic system, and understand the consequences/impacts of decisions on the whole or those parts.
  • Machine Learning – Using business intelligence and advanced data semantics to dynamically improve automated or automatable processes.

The Lean Enterprise

Over the past few years, the most obvious themes related to the ‘lean enterprise’ have been simultaneously the focus on ‘digital transformation’ and scepticism of the ‘agile-industrial’ machine (see Fowler (2018)). 

Firms have finally realized that digital transformation extends well beyond having a mobile application and a big-data analytics solution (see Mastering the Mix). However, this understanding is far from universal, and many firms believe digital transformation involves funding a number of ‘digital projects’ without changing fundamentally how the business plans, prioritises and funds both change initiatives and on-going operations.

The implications of this are nicely captured in Mark Schwartz’s ‘Digital CFO’ blog post. In essence, if the CFO function hasn’t materially changed how it works, it’s hard to see how the rest of the organization can truly be ‘digital’, irrespective of how much it is spending on ‘digital transformation’.

Other notable material on this subject is Barry O’Reilly’s “Unlearn” book – more on this when I have read it, but essentially it recommends folks (and corporations) unlearn mindsets and behaviours in order to excel in the fast-changing, technology-enabled environments of today.

Related to digital transformation is the so-called ‘agile industrial’ machine, which aims to sell agile to enterprises. Every firm wants to ‘be’ agile, but many firms end up ‘doing’ agile (often imposing ceremony on teams) – and are then surprised when overall business agility does not change. If ‘agile’ isn’t led and implemented by appropriately structured cross-functional value-stream aligned teams, then it is not going to move the dial.

The latest and best thinking on end-to-end value-stream delivery improvement and optimization is from Mik Kersten, as captured in the book Project-to-Product. This is a significant development in digital transformation execution, and merits further understanding – in particular, its potential implications for teams adopting cloud and other self-service-based capabilities as part of delivering value to customers.

Enterprise Modularity

These days, ‘enterprise modularity’ is dominated principally by microservices, and in particular the engineering challenges associated with delivering microservices-based architectures.

While there is no easy (cheap) solution to unwinding the complexity of non-modular ‘ball-of-mud’ legacy applications, numerous patterns and solutions exist to enable such applications to participate in an enterprise’s microservices eco-system (the work by Chris Richardson in this space is especially notable). Indeed, if these ‘legacy’ applications can be packaged up in containers and orchestrated via automation-friendly, framework-neutral tools like Kubernetes, it would go some way towards extending the operable longevity and potential re-usability of these platforms. But for many legacy systems, the business case for such a migration is not yet obvious, especially when viewed in the context of overall end-to-end value-stream delivery (vs the horizontal view of central-IT optimizing for itself).

It is interesting to note that (synchronous) RESTful APIs are still the dominant communication mechanism for distributed modular applications – most likely because no good decentralized alternative to enterprise-service buses has been identified. And while event-driven architectures are getting more popular, IT organizations will generally avoid having centrally managed messaging infrastructure that is not completely self-service (and self-governed), as otherwise the central bus team becomes a significant bottleneck to every other team.

But RESTful APIs are complex when used at scale – hence the need for software patterns like ‘Circuit Breaker‘, Service Discovery and Observability, among others. Service mesh technologies like Istio and Envoy need to be introduced to fill the gaps – which need their own management, but in principle do not impair team-level progress.
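
For illustration, here is a minimal circuit-breaker sketch in Python (service mesh products implement this far more completely): after repeated failures the breaker ‘opens’ and callers fail fast instead of piling up on a struggling downstream service.

```python
# Illustrative circuit breaker; thresholds and timings are arbitrary choices.
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after   # seconds before allowing a retry
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # open the circuit
            raise
        self.failures = 0                # success closes the circuit
        return result
```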

Asynchronous messaging-based technologies such as RabbitMQ address many of these issues, but come with their own complexities. Messaging technologies, especially those providing any level of delivery guarantee, require their own management. But, for the most part, these can be delivered as managed services used on a self-service basis.
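
A minimal sketch of the messaging model, using RabbitMQ’s Python client pika (queue and host names are placeholders): the producer returns as soon as the broker accepts the message, decoupling it from its consumers.

```python
# Illustrative publish to a durable RabbitMQ queue via pika.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Durable queue: survives broker restarts (one flavour of delivery guarantee).
channel.queue_declare(queue="orders", durable=True)

channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=b'{"order_id": "o-42"}',
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()
```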

For cloud-based systems, usually the cloud provider will have some messaging services available to use. Messaging systems are also available packaged into Docker containers and can be readily configured and launched on cloud infrastructure.

Related to distributed IPC is network configuration and security: with the advent of cloud, security is no longer confined to the ‘perimeter’ of the datacenter. This means network configuration, authentication/encryption and security monitoring must be implemented and configured at a per-service level. This is not scalable without a significant level of infrastructure configuration automation (i.e., infrastructure-as-code), enabled through mesh or messaging technologies.

Finally, it is worth noting that most of the effort today is going into framework- and language-neutral solutions. Language-specific frameworks such as OSGi and Cloud Foundry (for Java) are generally assumed to be built on top of these capabilities, but are not necessary to take advantage of them. Indeed, software written in any language should be able to leverage these technologies.

So, should IT departments mandate use of frameworks? In the end, to avoid developers needing to know too much about the specifics about networking, orchestration, packaging, deployment, etc, some level of abstraction is required. The question is who provides this abstraction – the cloud provider (e.g., serverless), an in-house SRE engineering team, 3rd party software vendors (such as Pivotal), or application-development teams themselves?

Continuous Delivery

The state of DevOps has advanced a lot since 2018, as described by the research published by DORA (just bought by Google, in a sign that it seems to be getting serious about listening to the needs of its cloud customers).

The findings have been compiled into a series of very solid principles and practices published in the book Accelerate – a book well worth reading.

Continuous delivery itself is still an aspiration for most firms – even being capable of doing it in principle, vs actually doing it. But it seems to be evident that not having a continuous delivery capability will severely impact the extent to which a firm can succeed at digital transformation.

Whether a firm needs to digitally transform at all (and implement all the findings from the DevOps survey) depends largely on whether its business model is natively digital. The popular view these days is that every industry and business model must eventually go digital to compete.

A key aspect of ‘continuous delivery’ is that there is never a case of a system reaching ‘stability’ – i.e., a point where no further changes are needed (or are needed very rarely). That approach worked in the days of packaged software, but for software delivered as a service, it is impractical. Change is always needed – even if it is principally for security/operational purposes and not for features. To avoid the cost of supporting/maintaining existing software dragging down the resources available for new software, software maintenance must be highly automated – including automated builds, testing, configuration, deployment and monitoring – of both infrastructure and applications.

If devops practices are not adopted and invested in, expect a high and increasing proportion of IT costs to be towards (high value) maintenance work and less towards new (uncertain value) development. With devops practices, there should be a much more consistent balance over time, even as new features continually get deployed.

Systems Theory & Systems Thinking

Systems Theory & Systems Thinking is still a fairly niche topic area, with aspects of it being addressed by the Value Stream Architecture work mentioned above, as well as continuing work in frameworks like Cynefin, business model canvas content from Tom Graves, insightful content on architecture from Graham Berrisford, ground-breaking work from Simon Wardley on strategy maps, and some interesting capabilities and approaches from a number of EA tool vendors.

Generally, technical architects tend to focus on system behavior from the perspective of applications and infrastructure. ‘Human activity’ systems are rarely given sufficient thought by technical architects (beyond the act of actually developing the required software). In fact, much of the ‘devops’ movement is driven by human activity systems as they relate to the desired behavior of applications and infrastructure.

On the flip side, technical architects and implementation teams tend to rely on product owners to address the human activity systems relating to the use of applications. The balance between appropriate reliance/engagement of users vs technology (and supporting technologists) in the definition of application behaviour is where product owners have an opportunity to make a significant impact, and where IT implementation teams need to be demanding more from their product owners.

With respect to product roadmaps, product managers should encourage the use of techniques such as the Wardley Maps mentioned above to better position where investment should lie, and ensure IT teams are using 3rd party solutions appropriately rather than trying to engineer commodity/utility solutions in-house.

Machine Learning

Machine learning has had a massive resurgence over the past few years, driven principally by simple but popular use cases (such as recommendation engines) as well as more advanced use cases such as self-driving cars. A significant factor is the sheer volume of data available to train machine-learning algorithms.

As ever, the appropriate use of machine learning is still an area in need of development: machine learning is often seen by corporations as a means of reducing costs through eliminating the need to rely on humans, leading to fear and scepticism around the adoption of machine learning technologies.

For the most part, however, machine learning will be used to augment or support human activity in the face of ever increasing complexity and data. For example, in the realm of devops, an explosion in the number of interacting components in applications will make supporting and operating the complex distributed systems of the future orders of magnitude harder than today – and we don’t do a particularly good job of managing such systems even now! Without some form of augmentation, humans would simply not be up to the task.

The arrival of pay-as-you-go machine learning services such as AWS SageMaker and Rekognition heralds a new era where machine learning capabilities are within reach of ‘average’ development teams, without necessarily requiring AI experts or PhD-level statisticians to be part of those teams.

In reality, machine learning can only be used for mature processes for which much data is available: humans will, for the foreseeable future at least, be much better at addressing new or immature situations.

An interesting side-effect of the focus on machine learning is the increased interest in semantic data: general machine learning is impossible without learning to describe data semantics. However, most firms would benefit from this practice even without machine learning. General and deep machine learning appears to be creating an increase in interest in semantic data standards such as RDF and Open Linked Data, but hopefully interest in these will trickle down to more mundane but critical tasks such as system integration efforts and data lake implementations.
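
For a flavour of what semantic data looks like in practice, here is a minimal sketch using the Python rdflib library (the namespace URI is an illustrative placeholder): facts are stored as subject-predicate-object triples against shared vocabularies, so their meaning travels with the data.

```python
# Illustrative RDF triples using a shared vocabulary (FOAF).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/firm/")  # placeholder namespace
g = Graph()

alice = EX["alice"]
g.add((alice, RDF.type, FOAF.Person))        # "alice is a person"
g.add((alice, FOAF.name, Literal("Alice")))  # "alice's name is Alice"

# Serialize to Turtle: the semantics are explicit, not locked in a schema.
print(g.serialize(format="turtle"))
```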

Why IoT is changing Enterprise Architecture

[tl;dr The discipline of enterprise architecture as an aid to business strategy execution has failed for most organizations, but is finding a new lease of life in the Internet of Things.]

The strategic benefits of having an enterprise-architecture based approach to organizational change – at least in terms of business models and shared capabilities needed to support those models – have been the subject of much discussion in recent years.

However, enterprise architecture as a practice (as espoused by The Open Group and others) has never managed to break beyond its role as an IT-focused endeavor.

In the meantime, less technology-minded folks are beginning to discuss business strategy using terms like ‘modularity’, which is a first step towards bridging the gap between the business folks and the technology folks. And technology-minded folks are looking at disruptive business strategy through the lens of the ‘Internet of Things‘.

Business Model Capability Decomposition

Just like manufacturing-based industries decomposed their supply-chains over the past 30+ years (driving an increasingly modular view of manufacturing), knowledge-based industries are going through a similar transformation.

Fundamentally, knowledge-based industries are based on the transfer and management of human knowledge or understanding. So, for example, when you pay for something, there is an understanding on both sides that the payment has happened. Technology allows such information to be captured and managed at scale.

But the ‘units’ of knowledge have only slowly been standardized, and knowledge-based organizations are incentivized to ensure they are the only ones able to act on the information they have gathered – with often disastrous social and economic consequences (e.g., the financial crisis of 2008).

Hence, regulators are stepping in to ensure that at least some of this ‘knowledge’ is available in a form that allows governments to ensure such situations do not arise again.

In the FinTech world, every service provided by big banks is being attacked by nimble competitors able to take advantage of new, more meaningful technology-enabled means of engaging with customers, and who are willing to make at least some of this information more accessible so that they can participate in a more flexible, dynamic ecosystem.

For these upstart FinTech firms, they often have a stark choice to make in order to succeed. Assuming they have cleared the first hurdle of actually having a product people want, at some point, they must decide whether they are competing directly with the big banks, or if they are providing a key part of the electronic financial knowledge ecosystem that big banks must (eventually) be part of.

In the end, what matters is their approach to data: how they capture it (i.e., ‘UX’), what they do with it, how they manage it, and how it is leveraged for competitive and commercial advantage (without falling foul of privacy laws etc). Much of the rest is noise from businesses trying to get attention in an increasingly crowded space.

Historically, many ‘enterprise architecture’ or strategy departments have failed to have impact because firms do not treat data (or information, or knowledge) as an asset, but rather as something to be freely and easily created and shunted around, leaving a trail of complexity and lost opportunity cost wherever it goes. This attitude must change before ‘enterprise architecture’ as a concept will have a role in boardroom discussions, and before firms change how they position IT in their business strategy. (Regulators are certainly driving this for certain sectors like finance and health.)

Internet of Things

Why does the Internet Of Things (IoT) matter, and where does IoT fit into all this?

At one level, IoT presents a large opportunity for firms which see the potential implied by the technologies underpinning IoT; the technology can provide a significant level of convenience and safety to many aspects of a modern, digitally enabled life.

But fundamentally, IoT is about having a large number of autonomous actors collaborating in some way to deliver a particular service, which is of value to some set of interested stakeholders.

But this sounds a lot like what a ‘company’ is. So IoT is, in effect, a company where the actors are technology actors rather than human actors. They need some level of orchestration. They need a common language for communication. And they need laws/protocols that govern what’s permitted and what is not.

If enterprise architecture is all about establishing the functional, data and protocol boundaries between discrete capabilities within an organization, then EA for IoT is the same thing but for technical components, such as sensors or driverless cars, etc.

So IoT seems a much more natural fit for EA thinking than traditional organizations, especially as, unlike departments in traditional ‘human’ companies, technical components like standards: they like fixed protocols, fixed functional boundaries and well-defined data sets. And while the ‘things’ themselves may not be organic, their behavior in such an environment could exhibit ‘organic’ characteristics.

So, IoT and the benefits of an enterprise architecture-oriented approach to business strategy do seem like a match made in heaven.

The Converged Enterprise

For information-based industries in particular, there appears to be an inevitable convergence: as IoT and the standards, protocols and governance underpinning it mature, so too will the ‘modular’ aspects of existing firms operating models, and the eco-system of technology-enabled platforms will mature along with it. Firms will be challenged to deliver value by picking the most capable components in the eco-system around which to deliver unique service propositions – and the most successful of those solutions will themselves become the basis for future eco-systems (a Darwinian view of software evolution, if you will).

The converged enterprise will consist of a combination of human and technical capabilities collaborating in well-defined ways. Some capabilities will be highly human, others highly technical, some will be in-house, some will be part of a wider platform eco-system.

In such an organization, enterprise architects will find a natural home. In the meantime, enterprise architects must choose their starting point, behavioral or structural: focusing first on decomposed business capabilities and finishing with IoT (behavioral->structural), or focusing first on IoT and finishing with business capabilities (structural->behavioral).

Technical Footnote

I am somewhat intrigued at how the OSGi Alliance has over the years shifted its focus from basic Java applications, to discrete embedded systems, to enterprise systems and now to IoT. OSGi has (disappointingly, IMO) had a patchy record in changing how firms build enterprise software – much of this is down to a culture of undisciplined dependency management in the software industry which is very, very hard to break.

IoT raises the bar on dependency management: you simply cannot comprehensively test software updates to potentially hundreds of thousands or millions of components running that software. The ability to reliably change modules without forcing a test of all dependent instantiated components is a necessity. As enterprises get more complex and digitally interdependent, standards such as OSGi will become more critical to the plumbing of enabling technologies. But as evidenced above, for folks who have tried and failed to change how enterprises treat their technology-enabled business strategy, it’s a case of FIETIOT – Failed in Enterprise, Trying IoT. And this, indeed, seems a far more rational use of an enterprise architect’s time, as things currently stand.
Transforming IT: From a solution-driven model to a capability-driven model

[tl;dr Moving from a solution-oriented to a capability-oriented model for software development is necessary to enable enterprises to achieve agility, but has substantial impacts on how enterprises organise themselves to support this transition.]

Most organisations which manage software change as part of their overall change portfolio take a project-oriented approach to delivery: the project goals are set up front, and a solution architecture and delivery plan are created in order to achieve the project goals.

Most organisations also fix project portfolios on a yearly basis, and deviating from this plan can often be very difficult for organisations to cope with – at least partly because such plans are intrinsically tied into financial planning and cost-saving techniques such as capitalisation of expenses, etc, which reduce the bottom-line cost to the firm of the investment (even if they say nothing about the value added).

As the portfolio of change projects rises every year, due to many extraneous factors (business opportunities, revenue protection, regulatory demand, maintenance, exploration, digital initiatives, etc), cross-project dependency management becomes increasingly difficult. It becomes even more complex to manage solution architecture dependencies within that overall dependency framework.

What results is a massive set of compromises that ends up building solutions that are sub-optimal for pretty much every project, and an investment in technology that is so enterprise-specific that no other organisation could possibly derive any significant value from it.

While it is possible that even that sub-optimal technology can yield significant value to the organisation as a whole, this benefit may be short lived, as the cost-effective ability to change the architecture must inevitably decrease over time, reducing agility and therefore the ability to compete.

So a balance needs to be struck, between delivering enterprise value (even at the expense of individual projects) while maintaining relative technical and business agility. By relative I mean relative to peers in the same competitive sector…sectors which are themselves being disrupted by innovative technology firms which are very specialist and agile within their domain.

The concept of ‘capabilities’ realised through technology ‘products’, in addition to the traditional project/program management approach, is key to this. In particular, it recognises the following key trends:

  • Infrastructure- and platform-as-a-service
  • Increasingly tech-savvy work-force
  • Increasing controls on IT by regulators, auditors, etc
  • Closer integration of business functions led by ‘digital’ initiatives
  • The replacement of the desktop by mobile & IoT (Internet of Things)
  • The tension between innovation and standards in large organisations

Enterprises are adapting to all the above by recognising that the IT function cannot be responsible for both technical delivery and ensuring that all technology-dependent initiatives realise the value they were intended to realise.

As a result, many aspects of IT project and programme management are no longer driven out of the ‘core’ IT function, but by domain-specific change management functions. IT itself must consolidate its activities to focus on those activities that can only be performed by highly qualified and expert technologists.

The inevitable consequence of this transformation is that IT becomes more product driven, where a given product may support many projects. As such, IT needs to be clear on how to govern change for that product, to lead it in a direction that is most appropriate for the enterprise as a whole, and not just for any particular project or business line.

A product must provide capabilities to the stakeholders or users of that product. In the past, those capabilities were entirely decided by whatever IT built and delivered: if IT delivered something that in practice wasn’t entirely fit for purpose, then business functions had no alternative but to find ways to work around the system deficiencies – usually creating more complexity (through end-user-developed applications in tools like Excel etc) and more expense (through having to hire more people).

By taking a capability-based approach to product development, however, IT can give business functions more options and ways to work around inevitable IT shortfalls without compromising controls or data integrity – e.g., through controlled APIs and services, etc.

So, while solutions may explode in number and complexity, the number of products can be controlled – with individual businesses being more directly accountable for the complexity they create, rather than ‘IT’.

This approach requires a step-change in how traditional IT organisations manage change. Techniques from enterprise architecture, scaled agile, and DevOps are all key enablers for this new model of structuring the IT organisation.

In particular, except for product-strategy (where IT must be the leader), IT must get out of the business of deciding the relative value/importance of individual product changes requested by projects, which historically IT has been required to do. By imposing a governance structure to control the ‘epics’ and ‘stories’ that drive product evolution, projects and stakeholders have some transparency into when the work they need will be done, and demand can be balanced fairly across stakeholders in accordance with their ability to pay.

If changes implemented by IT do not end up delivering value, it should not be because IT delivered the wrong thing, but rather the right thing was delivered for the wrong reason. As long as IT maintains its product roadmap and vision, such mis-steps can be tolerated. But they cannot be tolerated if every change weakens the ability of the product platform to change.

Firms which successfully balance between the project and product view of their technology landscape will find that productivity increases, complexity is reduced and agility increases massively. This model also lends itself nicely to bounded domain development, microservices, use of container technologies and automated build/deployment – all of which will likely feature strongly in the enterprise technology platform of the future.

The changes required to support this are significant – in terms of financial governance, delivery oversight, team collaborations, and the roles of senior managers and leaders. But organisations must be prepared to make this transition, as historical approaches to enterprise IT software development are clearly unsustainable.

Culture, Collaboration & Capabilities vs People, Process & Technology

[TL;DR The term ‘people, process and technology’ has been widely understood to represent the main dimensions impacting how organisations can differentiate themselves in a fast-changing technology-enabled world. This article argues that this expression may be misinterpreted with the best of intentions, leading to undesirable/unintended outcomes. The alternative, ‘culture, collaboration and capability’ is proposed.]

People, process & technology

When teams, functions or organisations are under-performing, the underlying issues can usually be narrowed down to one or more of the dimensions of people, process and technology.

Unfortunately, these terms can lead to an incorrect focus. Specifically,

  • ‘People’ can be understood to mean individuals who are under-performing or somehow disruptive to overall performance
  • ‘Process’ can be understood to mean formal business processes, leading to a focus on business process design
  • ‘Technology’ can be understood to mean engineering or legacy technology challenges which are resolvable only by replacing or updating existing technology

In many cases, this may in fact be the approach needed: fire the disruptive individual, redesign business processes using Six Sigma experts, or find another vendor selling technology that will solve all your engineering challenges.

In general, however, these approaches are neither practical nor desirable. Removing people can be fraught with challenges, and should only be used as a last resort. Firms using this as a way to solve problems will rapidly build up a culture of distrust and self-preservation.

Redesigning business processes using Six Sigma or other techniques may work well in very mature, well understood, highly automatable situations. However, in most dynamic business situations, no sooner has the process been optimised than it requires changing again. In addition, highly optimised processes may cause the so-called ‘local optimisation’ problem, where the sum of the optimised parts yields something far from an optimised whole.

Technology is generally not easy to replace: some technologies are significantly embedded in an organisation, with a large investment in people and skills to support the technology. But technologies change faster than people can adapt, and business environments change even quicker than technology. So replacing technologies comes at a massive cost (and risk) of replacing functionally rich existing systems with relatively immature new technology, and replacing existing people with people who may be familiar with the technology, but less so with your organisation. And what to do with the folks who have invested so much of their careers in the ‘old’ technology? (Back to the ‘people’ problem.)

A new meme

In order to effect change within a team, department or organisation, the focus on ‘people, process and technology’ needs to be adapted to ‘culture, collaboration and capabilities’. The following sections lay out the subtle difference, and how it could change how one approaches solving certain types of performance challenge.

Culture

When we talk about ‘people’, we are really not talking about individuals, but about cultures. The culture of a team, department or organisation has a more significant impact on how people collectively perform than anything else.

Changing culture is hard: for a ‘bad’ culture caught early enough, it may be possible to simply replace the most senior person who is (consciously or not) leading the creation of the undesirable cultural characteristics. But once a culture is established, even replacing senior leadership does not guarantee it will change.

Changing culture requires the willing participation of most of the people involved. For this to happen, people need to realise that there is a problem, and they need to be open to new cultural leadership. Then, it is mainly a case of finding the right leadership to establish the new norms and carry them forward – something which can be difficult for some senior managers to do, particularly when they have an arms-length approach to management.

Typical ‘bad’ cultures in a (technology) organisation include poor practices such as lack of testing discipline, poor collaboration with other groups focused on different concerns (such as stability, infrastructure, etc), a lack of transparency into how work is done, or even a lack of collaboration within members of the same team (i.e., a ‘hero’ based approach to development).

Changing these can be notoriously difficult, especially if the firm is highly dependent on this team and what it does.

Collaboration

Processes are, ultimately, a way to formalise how different people collaborate at different times. Processes formalise collaborations, but often collaborations happen before the process is formalised – especially in high-performance teams who are aware of their environment and are open to collaboration.

Many challenges in teams, departments or organisations can be boiled down to collaboration challenges. People not understanding how (or even if) they should be collaborating, how often, how closely, etc.

In most organisations, ‘cooperation’ is a necessity: there are many different functions, most of which depend on each other. So there is a minimum level of cooperation in order to get certain things done. But this cooperation does not necessarily extend to collaboration, which is cooperation based on trust and a deeper understanding of the reasons why a collaboration is important.

Collaboration ultimately serves to strengthen relationships and improve the overall performance of the team, department or organisation.

Collaborations can be captured formally using business process design notation (such as BPMN) but often these treat roles as machines, not people, and can lead to forgetting the underlying goal: people need to collaborate in order to meet the goals of the organisation. Process design often aims to define people’s roles so narrowly that the individuals may as well be a machine – and as technology advances, this is exactly what is happening in many cases.

People will naturally resist this; defining processes in terms of collaborations will change the perspective and result in a more sustainable and productive outcome.

Capabilities

Much has been written here about ‘capabilities’, particularly when it comes to architecture. In this article, I am narrowing my definition to anything that allows an individual (or group of individuals) to perform better than they otherwise would.

From a technology perspective, particular technologies provide developers with capabilities they would not have otherwise. These capabilities allow developers to offer value to other people who need software developed to help them do their job, and who in turn offer capabilities to other people who need those people to perform that job.

When a given capability is ‘broken’ (for example, where people do not understand a particular technology very well, and so it limits their capabilities rather than expands them), then it ripples up to everybody who depends directly or indirectly on that capability: systems become unstable, change takes a long time to implement, users of systems become frustrated and unable to do their jobs, and the clients of those users become frustrated that people being paid to do a job are unable to do it.

In the worst case, this can bring a firm to its knees, unable to survive in an increasingly dynamic, fast-changing world where the weakest firms do not survive long.

Technology should *always* provide a capability: the capability to deliver value in the right hands. When it is no longer able to achieve that role in the eyes of the people who depend on it (or when the ‘right hands’ simply cannot be found), then it is time to move on quickly.

Conclusion

Many of today’s innovations in technology revolve around culture, collaboration and capabilities. An agile, disciplined culture, where collaboration between systems reflects collaborations between departments and vice-versa, and where technologies provide people with the capabilities they need to do their jobs, is what every firm strives for (or should be striving for).

For new startups, much of this is a given – this is, after all, how they differentiate themselves against more established players. For larger organisations that have been around for a while, the challenge has been, and continues to be, how to drive continuous improvement and change along these three axes, while remaining sensitive to the capacity of the firm to absorb disruption to the people and technologies that those firms have relied on to get them to the (presumably) successful state they are in today.

Get it wrong and firms could rapidly lose market share and become over-taken by their upstart competitors. Get it right, and those upstart competitors will be bought out by the newly agile established players.

The hidden costs of PaaS & microservice engineering innovation

[tl;dr The leap from monolithic application development into the world of PaaS and microservices highlights the need for consistent collaboration, disciplined development and a strong vision in order to ensure sustainable business value.]

The pace of innovation in the PaaS and microservice space is increasing rapidly. This, coupled with increasing pressure on ‘traditional’ organisations to deliver more value more quickly from IT investments, is causing a flurry of interest in PaaS enabling technologies such as Cloud Foundry (favoured by the likes of IBM and Pivotal), OpenShift (favoured by RedHat), Azure (Microsoft), Heroku (SalesForce), AWS, Google Application Engine, etc.

A key characteristic of all these PaaS solutions is that they are ‘devops’ enabled – i.e., it is possible to automate both code and infrastructure deployment, enabling the way to have highly automated operational processes for applications built on these platforms.

For large organisations, or organisations that prefer to control their infrastructure (because of, for example, regulatory constraints), PaaS solutions that can be run in a private datacenter rather than the public cloud are preferable, as this preserves the future option to deploy to external clouds if needed/appropriate.

These PaaS environments are feature-rich and aim to provide a lot of the building blocks needed to build enterprise applications. Meanwhile, framework initiatives such as Spring Boot, DropWizard and Vert.X aim to make it easier to build PaaS-based applications.

Combined, all of these promise to provide a dramatic increase in developer productivity: the marginal cost of developing, deploying and operating a complete application will drop significantly.

Due to the low capital investment required to build new applications, it becomes ever more feasible to move from a heavy-weight, planning intensive approach to IT investment to a more agile approach where a complete application can be built, iterated and validated (or not) in the time it takes to create a traditional requirements document.

However, this also has massive implications, as – left unchecked – the drift towards entropy will increase over time, and organisations could be severely challenged to effectively manage and generate value from the sheer number of applications and services that can be created on such platforms. So an eye on managing complexity should be in place from the very beginning.

Many of the above platforms aim to make it as easy as possible for developers to get going quickly: this is a laudable goal, and if more of the complexity can be pushed into the PaaS, then that can only be good. The consequence of this approach is that developers have less control over the evolution of key aspects of the PaaS, and this could cause unexpected issues as PaaS upgrades conflict with application lifecycles, etc. In essence, it could be quite difficult to isolate applications from some PaaS changes. How these frameworks help developers cope with such changes is something to closely monitor, as these platforms are not yet mature enough to have gone through a major upgrade with a significant number of deployed applications.

The relative benefit/complexity trade-off between established microservice frameworks such as OSGi and easier to use solutions such as described above needs to be tested in practice. Specifically, OSGi’s more robust dependency model may prove more useful in enterprise environments than environments which have a ‘move fast and break things’ approach to application development, especially if OSGi-based PaaS solutions such as JBoss Fuse on OpenShift and Paremus ServiceFabric gain more popular use.

So: all well and good from the technology side. But even if the pros and cons of the different engineering approaches are evaluated and a perfect PaaS solution emerges, that doesn’t mean Microservice Nirvana can be achieved.

A recent article on the challenges of building successful micro-service applications, coupled with a presentation by Lisa van Gelder at a recent Agile meetup in New York City, has emphasised that even given the right enabling technologies, deploying microservices is a major challenge – but if done right, the rewards are well worth it.

Specifically, there are a number of factors that impact the success of a large scale or enterprise microservice based strategy, including but not limited to:

  • Shared ownership of services
  • Setting cross-team goals
  • Performing scrum of scrums
  • Identifying swim lanes – isolating content failure & eventually consistent data
  • Provision of Circuit breakers & Timeouts (anti-fragile)
  • Service discoverability & clear ownership
  • Testing against stubs; customer driven contracts
  • Running fake transactions in production
  • SLOs and priorities
  • Shared understanding of what happens when something goes wrong
  • Focus on Mean time to repair (recover) rather than mean-time-to-failure
  • Use of common interfaces: deployment, health check, logging, monitoring
  • Tracing a user’s journey through the application
  • Collecting logs
  • Providing monitoring dashboards
  • Standardising common metric names

Some of these can be technically provided by the chosen PaaS, but a lot is based around the best practices consistently applied within and across development teams. In fact, it is quite hard to capture these key success factors in traditional architectural views – something that needs to be considered when architecting large-scale microservice solutions.
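
To make a couple of the list items concrete – common health-check interfaces and standardised metric names – here is a minimal sketch using Flask and prometheus_client (the endpoint and metric names are illustrative conventions, not a standard):

```python
# Illustrative 'common interfaces' for a microservice: a uniform /health
# probe and a /metrics scrape endpoint with a shared metric naming scheme.
from flask import Flask, jsonify
from prometheus_client import Counter, generate_latest

app = Flask(__name__)

# Using the same metric name across every service enables shared dashboards.
REQUESTS = Counter("http_requests_total", "Total HTTP requests",
                   ["service", "endpoint"])

@app.route("/health")
def health():
    """Uniform health-check endpoint any orchestrator or monitor can probe."""
    return jsonify(status="ok")

@app.route("/orders")
def orders():
    REQUESTS.labels(service="orders", endpoint="/orders").inc()
    return jsonify(orders=[])

@app.route("/metrics")
def metrics():
    """Standard scrape endpoint for monitoring dashboards (e.g., Prometheus)."""
    return generate_latest(), 200, {"Content-Type": "text/plain; version=0.0.4"}
```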

In summary, the leap from monolithic application development into the world of PaaS and microservices highlights the need for consistent collaboration, disciplined development and a strong vision in order to ensure sustainable business value.