The cloudy future of data management & governance

[tl;dr The cloud enables novel ways of handling an expected explosion in data store types and instances, allowing stakeholders to know exactly what data is where at all times without human process dependencies.]

Data management & governance is a big and growing concern for more and more organizations of all sizes. Effective data management is critical for compliance, resilience, and innovation.

Data governance is necessary to know what data you have, when you got it, where it came from, where it is being used, and whether it is of good quality or not.

While the field is relatively mature, the rise of cloud-based services and service-enabled infrastructure will, I believe, fundamentally change the nature of how data is managed in the future and enable greater agility if leveraged effectively.

Data Management Meta-Data

Data and application architects are concerned with ensuring that applications use the most appropriate data storage solution for the problem being solved. To better manage cost and complexity, firms tend to converge on a handful of data management standards (such as Oracle or SQL Server for databases; NFS or NTFS for filesystems; Netezza or Teradata for data warehousing; Hadoop/HDFS for data processing, etc). Expertise is concentrated in central teams that manage provisioning, deployments, and operations for each platform. This introduces dependencies that project teams must plan around. It also requires forward planning and long-term commitment – so not particularly agile.

Keeping up with data storage technology is a challenge – technologies like key/value stores, graph databases, columnar databases, object stores, and document databases exist because they represent particular kinds of datasets in a way that is more natural for applications to consume, reducing or eliminating the 'impedance mismatch' between how applications view state and how that state is stored.

In particular, many datastore technologies are designed to scale up rather than out; i.e., the only way to make them perform faster is to add more CPU/memory or faster IO hardware. While this keeps applications simpler, it requires significant forward planning and longer-term commitments to scale up, and is out of the control of application development teams. Cloud-based services can typically handle scale-out transparently, although applications may need to be aware of the data dimensions across which scale-out happens (e.g., sharding by primary key, etc).

Provisioning a new datastore on-premise is mostly ticket-driven, and fulfillment within enterprises is still largely performed by humans rather than software – which means an "infrastructure-as-code" approach is not feasible.

Data Store Manageability vs Application Complexity

Most firms decide that it is better to simplify the data landscape such that fewer datastore solutions are available, but to resource those solutions so that they are properly supported to handle business critical production workloads with maximum efficiency.

The trade-off is in the applications themselves: the data storage solutions available end up driving the application architecture, rather than the application architecture (i.e., the requirements) dictating the most appropriate data store – which would result in the lowest impedance mismatch.

A typical example of an impedance mismatch is an object-oriented application (written in, say, C++ or Java) that uses a relational database. Here, object/relational mapping technologies such as Hibernate or Gigaspaces are used to map the application view of the data (in-memory objects) to the relational view. These middle layers, while useful for naturally relational data, can be overly expensive to maintain and operate if what your application really needs is a more appropriate type of datastore (e.g., graph).

This mismatch gets exacerbated in a microservices environment where each microservice is responsible for its own persistence, and individual microservices are written in the language most appropriate for the problem domain. Typical imperative, object-oriented languages implementing transactional systems will lean heavily towards relational databases and ORMs, whereas applications dealing with multi-media, graphs, very-large objects, or simple key/value pairs will not benefit from this architecture.

The rise of event-driven architectures (in particular, transactional ‘sagas‘, and ‘aggregates‘ from DDD) will also tend to move architectures away from ‘kitchen-sink’ business object definitions maintained in a single code-base into multiple discrete but overlapping schemas maintained by different code-bases, and triggered by common or related events. This will ultimately lead to an increase in the number of independently managed datastores in an organisation, all of which need management and governance across multiple environments.

For on-premise solutions, the pressure to keep the number of datastore options down, while dealing with an explosion in instances, is going to limit application data architecture choices, increase application complexity (to cope with datastore impedance mismatch), and reduce the benefits from migrating to a microservices architecture (shared datastores favor a monolithic architecture).

Cloud Changes Everything

So how does cloud fundamentally change how we deal with data management and governance? The most obvious benefit cloud brings is the variety of data storage services available, covering all the typical use cases applications need. Capacity and provisioning are no longer operational concerns, as they are handled by the cloud provider. So data store resource requirements can now be formulated in code (e.g., in CloudFormation, Terraform, etc).

This, in principle, allows applications (microservices) to choose the most appropriate storage solution for their problem domain, and to minimize the need for long-term forward planning.

Using code to specify and provision database services also has another advantage: cloud service providers typically offer the means to tag all instantiated services with your own meta-data. So you can define and implement your own data management tagging standards, and enforce these using tools provided by the cloud provider. These can be particularly useful when integrating with established data discovery tools, which depend on reliable meta-data sources. For example, tags can be defined based on a data ontology defined by the chief data office (see my previous article on CDO).
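
As a minimal sketch of what this could look like (not a prescription), the snippet below uses Python and boto3 to provision a DynamoDB table with governance tags applied at creation time, and then discovers tagged datastores afterwards. The tag keys (data-owner, data-classification, retention-period) and names are hypothetical examples of what a CDO-defined ontology might mandate.

```python
# Sketch: provision a datastore with governance meta-data attached at creation time.
# Assumes AWS credentials are configured; table name and tag keys are illustrative only.
import boto3

dynamodb = boto3.client("dynamodb", region_name="eu-west-1")

dynamodb.create_table(
    TableName="customer-orders",               # hypothetical table
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
    Tags=[                                      # governance meta-data, per the CDO ontology
        {"Key": "data-owner", "Value": "order-management"},
        {"Key": "data-classification", "Value": "confidential"},
        {"Key": "retention-period", "Value": "7y"},
    ],
)

# Governance tooling can later discover tagged datastores without inspecting their contents:
tagging = boto3.client("resourcegroupstaggingapi", region_name="eu-west-1")
resources = tagging.get_resources(
    TagFilters=[{"Key": "data-classification", "Values": ["confidential"]}]
)
for resource in resources["ResourceTagMappingList"]:
    print(resource["ResourceARN"])
```

The same pattern can of course be expressed declaratively in CloudFormation or Terraform; the point is that the governance meta-data travels with the provisioning code rather than living in a separate spreadsheet.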

These mechanisms can be highly automated via service catalogs (such as AWS Service Catalog or ServiceNow), which allow compliant stacks to be provisioned without requiring developers to directly access the cloud provider's APIs.

Let a thousand flowers bloom

The obvious downside to letting teams select their storage needs is the likely explosion of data stores – even if they are selected from a managed service catalog. But the expectation is that each distinct store would be relatively simple – at least compared to relational stores which support many application use cases and queries in a single database.

In on-premise situations, data integration is also a real challenge – usually addressed by a myriad of ad-hoc jobs and processes whose purpose is to extract data from one system and send it to another (i.e., ETL). Usually no meta-data exists around these processes, except that afforded by proprietary ETL systems.

In best-case integration scenarios, 'glue' data flows are implemented in enterprise service buses, which generally have some form of governance attached – but which usually have the undesirable side-effect of introducing yet another dependency for development teams, one which needs planning and resourcing. Ideally, teams want to be able to use 'dumb' pipes for messaging and to self-serve their message governance, such that enterprise data governance tools still know what data is being published/consumed, and by whom.

Cloud provides two main game-changing capabilities for managing data complexity at scale. Specifically:

  • All resources that manage data can be tagged with appropriate meta-data – without needing to, for example, examine tables or know anything about the specifics of the data service. This can also extend to messaging services.
  • Serverless functions (e.g., AWS Lambda, Azure Functions, etc) can be used to implement 'glue' logic, and can themselves be tagged and managed in an automated way. Serverless functions can also be used to make more intelligent updates to data management meta-data – for example, updating a specific repository when a particular service is instantiated (a sketch follows this list). Serverless functions can be viewed as on-demand microservices which may have their own data stores – usually provided via a managed service.
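
A minimal sketch of such a 'glue' function is shown below, assuming an EventBridge rule that forwards CloudTrail CreateTable events to the function, and a hypothetical 'data-catalog' table acting as the metadata repository. All names and the catalog schema are illustrative.

```python
# Sketch: serverless 'glue' that keeps a metadata catalog current as datastores are created.
# Assumes an EventBridge rule matching CloudTrail DynamoDB CreateTable events triggers this
# function; the 'data-catalog' table and its item schema are hypothetical.
import boto3

catalog = boto3.resource("dynamodb").Table("data-catalog")

def handler(event, context):
    detail = event["detail"]                    # CloudTrail record delivered by EventBridge
    if detail.get("eventName") != "CreateTable":
        return
    table_name = detail["requestParameters"]["tableName"]
    catalog.put_item(Item={
        "resource_id": table_name,
        "resource_type": "dynamodb-table",
        "created_by": detail["userIdentity"]["arn"],
        "created_at": detail["eventTime"],
    })
```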

Data, Data Everywhere

By adopting a cloud-enabled microservice architecture, using datastore services provisioned by code, applying event driven architecture, leveraging serverless functions, and engaging with the chief data officer for meta-data standards, it will be possible to have an unprecedented up-to-date view of what data exists in an organization and where. It may even address static views of data in motion (through tagging queue and notification topic resources). The data would be maintained via policies and rules implemented in service catalog templates and lambda functions triggered automatically by cloud configuration changes, so it would always be current and correct.

The CDO, as well as data and enterprise architects, would be the chief consumer of this metadata – either directly or as inputs into other applications, such as data governance tools, etc.

Conclusion

The ultimate goal is to avoid data management and governance processes which rely on reactive human (IT) input to maintain high-quality data management metadata. Reliable metadata can give rise to a whole new range of capabilities for stakeholders across the enterprise, and finally take IT out of the loop for business-as-usual data management queries, freeing up valuable resources for building even more data-driven applications.


The future of modularity is..serverless

[tl;dr As platform solutions evolve and improve, the pressure for firms to reduce costs, increase agility and be resilient to failure will drive teams to adopt modern infrastructure platform solutions, and in the process decompose and simplify monoliths, adopt microservices and ultimately pave the way to building naturally modular systems on serverless platforms.]

“Modularity” – the (de)composition of complex systems into independently composable or replaceable components without sacrificing performance, security or usability – is an architectural holy grail.

Businesses may be modular (commonly expressed through capability maps), and IT systems can be modular. IT modularity can also be described as SOA (Service Oriented Architecture), although because of many aborted attempts at (commercializing) SOA in the past, the term is no longer in fashion. Ideally, the relationship between business ‘modules’ and IT application modules should be fully aligned (assuming the business itself has a coherent underlying business architecture).

Microservices are the latest manifestation of SOA, but this is born from a fundamentally different way of thinking about how applications are developed, tested, deployed and operated – without the need for proprietary vendor software.

Serverless takes the microservices concept one step further, by removing the need for developers (or, indeed, operators) to worry about looking after individual servers – whether virtual or physical.

A brief history of microservices

Commercial manifestations of microservices have been around for quite a while – for example Spring Boot, or OSGi for Java – but these have commercial roots and implement a framework tied to a particular language. Firms may successfully implement these technologies, but they will need to have already gone through much of the microservices stone soup journey. It is not possible to 'buy' a microservices culture from a technology vendor.

Because microservices are intended to be independently testable and deployable components, a microservices architecture inherently rejects the notion of a common framework for implementing/supporting the microservices nature of an application. This therefore puts the onus on the infrastructure platform to provide all the capabilities needed to build and run microservices.

So, capabilities like naming, discovery, orchestration, encryption, load balancing, retries, tracing, logging, monitoring, etc which used to be handled by language-specific frameworks are now increasingly the province of the ‘platform’. This greatly reduces the need for complex, hard-to-learn frameworks, but places a lot of responsibility on the platform, which must handle these requirements in a language-neutral way.

Currently, the most popular 'platforms' are the major cloud providers (Azure, Google, AWS, Digital Ocean, etc), IaaS vendors (e.g., VMware, HPE), core platform building blocks such as Kubernetes, and platform solutions such as Pivotal Cloud Foundry, OpenShift and Mesosphere. (IBM's BlueMix/Cloud is likely to be superseded by Red Hat's OpenShift.)

The latter solutions previously had their own underlying platform solutions (e.g., OSGi for BlueMix, Bosh for PKS), but most platform vendors have now shifted to use Kubernetes under the hood. These solutions are intended to work in multiple cloud environments or on-premise, and therefore in principle allow developers to avoid caring about whether their applications are deployed on-premise or on-cloud in an IaaS-neutral way.

Decomposing Monolithic Architectures

With the capabilities these platforms offer, developers will be incentivized to decompose their applications into logical, distributed functional components, because the marginal additional cost of maintaining/monitoring each new process is relatively low (albeit definitely not zero). This approach is naturally amenable to supporting event driven architectures, as well as more conventional RESTful and RPC architectures (such as gRPC), as running processes can be mapped naturally to APIs, services and messages.

But not all processes need to be running constantly – and indeed, many processes are ‘out-of-band’ processes, which serve as ‘glue’ to connect events that happen in one system to another system: if events are relatively infrequent (e.g., less than one every few seconds), then no resources need to be used in-between events. So provisioning long-running docker containers etc may be overkill for many of these processes – especially if the ‘state’ required by those processes can be made available in a low-latency, highly available long-running infrastructure service such as a high-performance database or cache.

Functions on Demand

Enter ‘serverless’, which aims to specify the resources required to execute a single piece of code (basically a functional monolith) on-demand in a single package – roughly the equivalent of, for example, a declarative service in OSGi. The runtime in which the code runs is not the concern of the developer in a serverless architecture. There are no VMs, containers or side-cars – only functions communicating via APIs and events.

Currently, the serverless offerings of the major cloud providers are really only intended for 'significant' functions which justify the separate allocation of compute, storage and network resources needed to run them. A popular use case is 'transformational' functions which convert binary data from one form to another – e.g., creating a thumbnail image from a full image – which may temporarily require a lot of CPU or memory. In contrast, an OSGi Declarative Service, for example, could be instantiated by the runtime inside the same process/memory space as the calling service – a handy technique for validating a modular architecture without worrying about the increased failure modes of a distributed system, while allowing the system to be readily scaled out in the future.
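
As a rough sketch of such a transformational function (using Python, Pillow and an S3 trigger as assumed choices, not a prescription), thumbnail generation might look like this:

```python
# Sketch: an S3-triggered serverless function that writes a thumbnail of each uploaded image.
# Assumes the function is packaged with Pillow and triggered by S3 ObjectCreated events;
# bucket layout and the 'thumbnails/' prefix are illustrative.
import io

import boto3
from PIL import Image

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        obj = s3.get_object(Bucket=bucket, Key=key)
        image = Image.open(io.BytesIO(obj["Body"].read()))
        image.thumbnail((128, 128))            # CPU/memory only consumed while this runs

        out = io.BytesIO()
        image.save(out, format="PNG")
        s3.put_object(Bucket=bucket, Key=f"thumbnails/{key}", Body=out.getvalue())
```

Between invocations no compute is allocated at all, which is exactly why this style of workload suits serverless better than a long-running container.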

Modular Architectures vs Distributed Architectures

Serverless functions can be viewed as 'modules' by another name – albeit modules that happen to require memory, compute and storage allocated separately from the calling component. While this is a natural fit for browser-based applications, it is not a great fit for monolithic applications that would benefit from modular architectures, but not necessarily from distributed architectures. For legacy applications, the key architectural question is whether it is necessary or appropriate to modularize the application prior to distributing it or migrating it to an orchestration platform such as Kubernetes, AWS ECS, etc.

As things currently stand, the most appropriate (lowest risk) migration route for complex monolithic applications is likely to be a migration of some form to one of the orchestrated platforms identified above. By allowing the platform to take care of ‘non-functional’ features (such as naming, resilience, etc), perhaps the monolith can be simplified. Over time, the monolith can then be decomposed into modular ‘microservices’ aligned by APIs or events, and perhaps eventually some functionality could decompose into true serverless functions.

Serverless and Process Ownership

Concurrently with decomposing the monolith, a (significant) subset of features – mainly those not built directly using the application code-base, or which straddle two applications – may be meaningfully moved to serverless solutions without depending on the functional decomposition of the monolith.

It’s interesting to note that such an architectural move may allow process owners to own these serverless functions, rather than relying on application owners, where often, in large enterprises, it isn’t even clear which application owner should own a piece of ‘glue’ code, or be accountable when such code breaks due to a change in a dependent system.

In particular, existing ‘glue’ code which relies on centralized enterprise service buses or equivalent would benefit massively from being migrated to a serverless architecture. This not only empowers teams that look after the processes the glue code supports, but also ensures optimal infrastructure resource allocation, as ESBs can often be heavy consumers of infrastructure resources. (Note that a centralized messaging system may still be needed, but this would be a ‘dumb pipe’, and should itself be offered as a service.)

Serverless First Architecture

Ultimately, nirvana for most application developers and businesses is a 'serverless-first' architecture, where delivery velocity is limited only by the capabilities of the development team, and solutions scale both in function and in usage seamlessly without significant re-engineering. It is fair to say that serverless is a long way from achieving this nirvana (technologies like 'AIOps' have a long way to go), and most teams still have to shift from monolithic to modular and distributed thinking, while knowing when a monolith remains the most appropriate solution for a given problem.

As platform solutions improve and mature, however, and the pressure mounts on businesses whose value proposition is not in the platform engineering space to reduce costs, increase agility and be increasingly resilient to failures of all kinds, the path from monolith to orchestrated microservices to serverless (and perhaps ‘low-code’) applications seems inevitable.


Message Evolution in High Performance Messaging Environments

tl;dr

Moving to an event-driven architecture in a high-performance environment has specific needs that do not yet have widely standardized solutions, and as such require a high degree of focus on both software engineering and business architecture.

Context

Event- or message-driven applications exist in at least two contexts – an application-specific context, and a domain or enterprise context. For high-performance applications, latency is typically more sensitive within the application context, and less sensitive in the domain/enterprise context.

For the purposes of this article, the application-specific context is assumed to relate to components that are typically deployed together when a new feature is released – i.e., there is high coupling and high cohesion between components. All application context components are generally tested and deployed together as a unit.

Application Context

The high-coupling and high cohesion of the application context is usually a compromise to achieve the low latency performance requirements, as normal microservice architecture best practice states that services are independently deployable and hence loosely coupled. This impacts the overall agility of the architecture, but fundamentally, with automated configuration, testing and deployment, it should be manageable by a ‘pizza-sized’ team without losing the integrity of the platform or the performance of the network communications.

A given application can reasonably require a single/common version of a highly optimized serialization library, and related message schema definitions, to be used by all components at any given time, as enforcing such deployments can be guaranteed through deployment/configuration automation processes.

In general, these deployments do not require meta-data to be included in messages, as the meta-data will be explicit in the application code. High-performance binary-encoded protocols like Protocol Buffers and Avro can handle a certain amount of schema evolution, but in general for highly coupled applications, all components should use the same version of serialized objects and be deployed simultaneously.
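
To illustrate the kind of evolution Avro can absorb, the sketch below (using the fastavro library as an assumed choice, with illustrative schemas) decodes a record written with an old writer schema using a newer reader schema that adds a defaulted field:

```python
# Sketch: Avro schema evolution - a newer reader schema with a defaulted field can still
# decode records produced with the older writer schema. Schemas are illustrative only.
import io

from fastavro import schemaless_reader, schemaless_writer

writer_schema = {
    "type": "record", "name": "Trade",
    "fields": [
        {"name": "trade_id", "type": "string"},
        {"name": "quantity", "type": "long"},
    ],
}

reader_schema = {
    "type": "record", "name": "Trade",
    "fields": [
        {"name": "trade_id", "type": "string"},
        {"name": "quantity", "type": "long"},
        {"name": "venue", "type": "string", "default": "UNKNOWN"},  # added field, defaulted
    ],
}

buf = io.BytesIO()
schemaless_writer(buf, writer_schema, {"trade_id": "T-1", "quantity": 100})
buf.seek(0)

decoded = schemaless_reader(buf, writer_schema, reader_schema)
print(decoded)   # {'trade_id': 'T-1', 'quantity': 100, 'venue': 'UNKNOWN'}
```

Breaking changes (renaming or retyping fields without aliases or defaults) are precisely what this mechanism cannot absorb, which is why coordinated deployment remains the simpler option inside a tightly coupled application.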

Application messaging contexts can get complicated, and may include both low-latency and normal-latency scenarios, with different degrees of cohesion and coupling. The key point is that messaging and deployment decisions lie within one 'pizza-sized' team and are not subject to wider enterprise governance. When a single pizza-sized team becomes insufficient for the application's complexity, technical debt relating to these decisions will be revealed as teams split, and will eventually need to be addressed, as governance, agility and latency needs may all be in conflict.

Domain/Enterprise Context

The domain context may include many pizza-sized application teams. For maximum agility, these disparate applications should have relatively low (functional) cohesion and low coupling, but each may consist of a number of high cohesion microservices – the level of coupling depending on the extent to which message codecs are optimised for network performance or agility.

To maximize decoupling and ensure maximum independence of testing, deployment, configuration and operation, the codecs used in the enterprise context should be as flexible as possible.

A consequence of this approach is that, to ensure decoupling, there will in many cases need to be some translation between events in the application context to events in the domain/enterprise context. Typically this can be done by a separate event-driven component which can do the necessary translations, consuming from one messaging channel and publishing to another. The additional overhead of this should be weighed against the agility cost of trying to maintain application schema consistency at enterprise scale, which can initially be considerable as teams begin to adopt event-driven architecture. (The legacy of Enterprise Service Buses, and the messaging bottlenecks they often cause, show how extreme this cost can be.)

Message Standards & Governance

In general, there is a trade-off between agility and data architecture compliance at the domain/enterprise level. In order to avoid the insertion of another technology team between producers and consumers, it is generally best to follow the microservice best practice of ‘dumb pipes and smart end-points’ – i.e., any compliance with standards is not enforced by the messaging infrastructure but instead at the application (or ‘project’) level.

It is feasible to develop run-time tools to assess the data architecture compliance of messages over the bus – in many circumstances this may offer the best balance between compliance and agility, especially if they run in lower environments prior to production deployment.
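
A minimal sketch of such a tool is shown below: a standalone consumer (using kafka-python and JSON payloads purely for illustration) that samples messages off the bus and flags any that are missing mandatory data-architecture fields. The topic name, field list and serialization format are all assumptions.

```python
# Sketch: a run-time compliance checker that samples messages off the bus (in a lower
# environment) and reports any missing mandatory fields. Topic name, field list and
# serialization format (JSON) are illustrative assumptions.
import json

from kafka import KafkaConsumer

MANDATORY_FIELDS = {"message_id", "schema_version", "data_owner", "timestamp"}

consumer = KafkaConsumer(
    "enterprise.events",                       # hypothetical domain topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    missing = MANDATORY_FIELDS - set(message.value)
    if missing:
        # In practice this would feed a governance dashboard rather than stdout.
        print(f"non-compliant message at offset {message.offset}: missing {sorted(missing)}")
```

Because the checker is just another consumer on a 'dumb pipe', it adds no dependency between producers and consumers and can be switched off in production if latency budgets demand it.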

Enterprise Message Characteristics

Messages in the enterprise domain have some specific characteristics:

  • No assumptions should be made about the consuming applications, in terms of the languages, libraries or frameworks used, except to the extent that the serialization mechanism is supported.
  • Messages should be readily readable by authorized human readers irrespective of schema versions (in particular, operations & support staff).
  • Messages (or parts of messages) should not be readable by unauthorized readers (human or computer)

When it comes to choosing serialization technology, it is all about compromises. There is no silver bullet. There are trade-offs between performance, flexibility, readability, precision, etc. An excellent read on this topic is Martin Kleppmann's book 'Designing Data-Intensive Applications'.

Schema Evolution using a Message Bus

Schema evolution is one of the biggest challenges any technology team has to deal with. Databases, events/messages and RESTful APIs all require schemas to be managed.

A microservices architecture aims to minimize the complexity of managing database schema evolution by ensuring that any application depending on a particular dataset accesses that data only through a microservice. In effect, the database has only one reader/writer, and so schema evolution can be tied to deployments of that one application – much easier to manage.

However, this pushes inter-application schema evolution to the messaging or API layer. For breaking schema changes (i.e., where a new version of a schema is incompatible with a prior version), two principal approaches can be considered:

  • A single ‘channel’ (topic or URI) handles all schema versions, and each consumer must be able to handle all schema versions received over that channel
  • Each schema version has its own message channel (topic or URI), and new consumers are created specifically to consume messages from that channel.

Note that the approaches above only address the technical aspects of decoding incompatible message versions: they do not address semantic changes, which can only be resolved at the application level.

Single Channel

In this case, the consumer must have multiple versions of the deserializer ‘built-in’, so it can interpret the version header and invoke the correct deserializer. For many languages, such as Java, this is difficult, as it requires supporting multiple versions of the same classes in the same process.

It is, in principle, doable using OSGi, but otherwise, the consuming application may be forced to delegate incoming messages to other processes for deserializing, which could be expensive. Alternatively, the IDL parser for the serializer could generate unique encoding/decoding classes for each version of a schema so they could reside in the same process. However, message meta-data indicating the correct schema version would need to be very reliable to ensure this works well.
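The dispatch itself is straightforward once per-version decoders can coexist in one process; the sketch below (with hypothetical decoder classes and an assumed one-byte version header) shows the idea:

```python
# Sketch: single-channel consumption where a version header selects the deserializer.
# TradeV1Decoder/TradeV2Decoder stand in for version-specific classes generated from the
# serializer's IDL; the one-byte header layout is an illustrative assumption.
class TradeV1Decoder:
    def decode(self, payload: bytes) -> dict:
        # real implementation would be generated from the v1 schema
        return {"schema_version": 1, "raw": payload}

class TradeV2Decoder:
    def decode(self, payload: bytes) -> dict:
        # real implementation would be generated from the v2 schema
        return {"schema_version": 2, "raw": payload}

DECODERS = {
    1: TradeV1Decoder(),
    2: TradeV2Decoder(),
}

def consume(message: bytes) -> dict:
    version = message[0]                       # assumed schema-version header
    payload = message[1:]
    try:
        decoder = DECODERS[version]
    except KeyError:
        raise ValueError(f"no decoder registered for schema version {version}")
    return decoder.decode(payload)
```

The fragile part is not the dispatch table but the reliability of the version header, as noted above.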

Multiple Channel

In this case, each new (breaking) schema definition will have its own channel (topic/URI), such that new processes specifically built to consume those schema messages can be deployed that subscribe to that channel.

This avoids the need for delegating deserialization, and may be easier to debug when issues occur. However, it can add additional complexity to channel/topic namespaces, and mechanisms may need to be in place to ensure all expected consumers are running and that there are no accidental 'orphan' messages being published (i.e., messages for which there is no active consumer).
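
A sketch of the multi-channel convention (again using kafka-python, with hypothetical topic names and stub functions) is shown below; each breaking schema version gets its own topic, and each consumer process is built and deployed against exactly one of them:

```python
# Sketch: one consumer process per breaking schema version, each bound to its own topic.
# The topic naming convention ("trades.v1", "trades.v2", ...), group ids and stub
# functions are illustrative assumptions.
from kafka import KafkaConsumer

SCHEMA_VERSION = 2                             # baked into this deployment
TOPIC = f"trades.v{SCHEMA_VERSION}"            # convention: one topic per breaking version

def decode_trade_v2(payload: bytes) -> dict:
    # stands in for the codec generated from the v2 schema
    return {"schema_version": 2, "raw": payload}

def handle(trade: dict) -> None:
    print(trade)

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    group_id=f"risk-engine-{TOPIC}",           # each version has its own consumer group
)

for message in consumer:
    # Only the v2 decoder ships with this process; no in-process version dispatch needed.
    handle(decode_trade_v2(message.value))
```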

Architectural Implications of EDA

Fundamentally, to handle an enterprise-wide event-driven architecture, organizations must be fully committed to implementing a microservices architecture. At the simplest level, this means that the cost and overhead of deploying and operating new application components is very low. It means that orchestration, configuration, logging, monitoring and control mechanisms are all standardized across deployed components, so that there is no operational resistance to deploying separate processes for each message type and/or channel as needed, or to deploying various adaptors/gateways to cope with potentially multiple incompatible enterprise consumers.

Implementing any form of an EDA architecture without addressing the above will likely not substantially improve business agility or lead-time reduction. Instead, it could lead to increased co-dependencies across components, reducing overall system availability and stability, and requiring coordinated integration testing and deployments on a periodic basis (every 2-3 months, for example).

Conclusion

The points above are based on observing the architectural evolution of systems I have been directly involved with, the challenges faced by multiple teams moving to an event-driven architecture, and the lessons learned from this process.

While a number of issues are squarely in the technical domain, some of the hardest decisions relate to what should be considered in the ‘application’ domain vs what belongs in the ‘enterprise’ domain. Usually, there will be strong business drivers behind merging applications previously in separate domains into a single domain – and this will have implications on team size, message standards, etc. Fundamentally, however, IT should not attempt to draw technical message or data boundaries around applications that are not directly aligned with business architecture goals.

In essence, if business architects and/or product owners are not directly involved in dictating messaging standards (including semantic definitions of fields) across applications, then Conway's Law applies: messaging standards remain local to the teams that use them, with many message flows existing as bi-lateral agreements between applications.

This naturally gives rise to a ‘spaghetti’ architecture, but if this reflects how business processes are actually aligned and communicate, and the business is happy with this, then all IT can do is manage it, not eliminate it.


Becoming a financial ‘super-power’ through emerging technologies

Recently, the Tabb Forum published an article (login needed) proposing 4 key emerging technology strategies that would enable market participants to keep pace with a trading environment that is constantly changing.

Specifically, the emerging technologies are:

  • AI for Risk Management
  • Increased focus on data and analytics
  • Accepting public cloud
  • Being open to industry standardization

It is worth noting that in the author’s view, these emerging technologies are seen as table stakes, rather than differentiators – i.e., that the market will pivot around growing use of these technologies, rather than individual firms having (temporary) advantage by using them. In the long term, this seems plausible, but does not (IMO) preclude opportunistic use cases in the short/medium term, and firms which can effectively use these technologies will basically acquire banking ‘super-powers’.

AI for Risk Management

The key idea here is that AI could be used to augment current risk management processes, and offer new insights into where market participants may be carrying risk, or provide new insights into who to trade with and what to trade.

Current risk management processes are brute force, involving complex calculations with multiple inputs, and with many outputs for different scenarios. In addition, human judgement is needed to apply various adjustments (known as XVA) to model-computed valuations to account for trade-specific context.

For AI to be used effectively for risk management, certain key technical capabilities need to be in place – specifically:

  • Data lineage, semantics and quality management
  • Feedback loops between pre-trade, trade and post-trade analytics

Many financial firms are already addressing data lineage, semantics and quality management through compliance with regulations such as BCBS239. However, these capabilities need to be infused into a firm's architecture (processes as well as technology) to be useful for AI use cases. Currently, the tools available are generally not highly integrated with each other or with the systems that depend on them, and human processes around these tools are still maturing.

With respect to developing machine learning models, an AI system needs to understand what outcomes happened in the past in order to make predictions or suggestions. For most traders today, such knowledge is encapsulated in complex spreadsheets that they amend over time as new insights are discovered. These spreadsheets are often proprietary to the trader, and over time become increasingly unmaintainable as calculations become interdependent, and changing one calculation has a high chance of breaking another. This impedes a trader's ability to keep their models aligned with their understanding of the markets.

Clearly another approach is needed. A key challenge is how to augment a trader’s capabilities with AI-enabled tooling, without at the same time suggesting the trader is himself/herself surplus to requirements. (This is a challenge all AI solutions face.)

One approach would require that, at some level, AI algorithms are biased towards individual traders' decision-making and learning processes, and that the use of such algorithms is tied to the continued employment of the trader.

Brute-force AI learning based on all data passing through pre-trade (including market data), trade and post-trade systems is possible, but the skills in selecting critical data points for different contexts are at least as valuable as basic trading skills, and the infrastructure cost of doing this is still considerable.

Increased Focus on Data and Analytics

The key point being made by the author is that managing data should be a key strategic function, rather than being left to individual areas to manage on a per-application basis.

Again, this is tied to efforts relating to data lineage, semantics and quality. Efforts in this space can be directed to specific areas (such as risk management), but every function has growing needs for analytics that traditional warehouse and analytics based solutions cannot keep pace with – especially if, as is increasingly the case, every functional domain wishes to have their own agenda to introduce AI/machine-learning into their processes to improve customer experience and/or regulatory compliance.

As Risk Management increasingly consumes more data points from across the firm to refine risk predictions, it is not unreasonable for the Risk Management function to take leadership of the requirements and/or technology for data management for a firm’s broader needs. However, significant investment is required to make data management tools and infrastructure available as an easy-to-use service across multiple domains, which is essentially what is required. Hence, the third key emerging technology..

Accept the public cloud

The traditional way of developing technology has been to

  1. identify the requirements
  2. propose an architecture
  3. procure and provision the infrastructure
  4. develop and/or procure the software
  5. test and deploy
  6. iterate

Each of these processes takes a considerable amount of time in traditional organizations. Anything learned after deploying the software and getting user feedback necessarily has to be addressed within the previously defined architecture and infrastructure – often incurring technical debt in the process.

Particularly for data and analytics use cases, this process is unsustainable. Rather than adapting applications to the infrastructure (as this approach requires), the infrastructure should be adaptable to the application, and if it is proven to be inappropriate, it must be disposed of without any concern re ‘sunk cost’ or ‘amortization’ etc.

The public cloud is the only way to viably evolve complex data and analytics architectures, to ensure infrastructure is aligned with application needs, and minimize technical debt through constantly reviewing and aligning the infrastructure with application needs.

The discovery of which data management and analytics ‘services’ need to be built and made available to users across a firm is, today, a process of learning and iteration (in the true ‘agile’ sense). Traditional solutions preclude such agility, but embracing public cloud enables it.

One point not raised in the article is around the impact of ‘serverless’ technologies. This could be a game-changer for some use cases. While serverless in general could be taken to represent the virtualization of infrastructure, serverless specifically addresses solutions where development teams do not need to manage *any* infrastructure (virtual or otherwise) – i.e., they just consume a service, and costs are directly related to usage, such that zero usage = zero costs.

‘Serverless’ is not necessarily restricted to public cloud providers – a firm’s internal IT could provide this capability. As standards mature in this space, firms should start to think about how client needs could be better met through adoption of serverless technologies, which would require a substantial rethink of the traditional software development lifecycle.

Be open-minded about industry standardization

Conversation around industry standards today is driven by blockchain/distributed ledger technology and/or the efficient processing of digital assets. Industry standardization efforts have always been a feature in capital markets, mostly driven by the need to integrate with clearing houses and exchanges – i.e., centralized services. Firms generally view their data and processing models as proprietary, and fundamentally are resistant to commoditization of these. After all, if all firms ran the same software, where would their differentiation be?

The ISDA Common Domain Model (CDM) seems to be quite different, in that it is not specifically driven by the need to integrate with a 3rd party, but rather by the need for every party to integrate with every other party – e.g., in an over-the-counter model.

Historically, the easiest way to regulate a market is to introduce a clearing house or regulated intermediary that ensures transparency to the regulators of all market activity. However, this is undesirable for products negotiated directly between trading parties or for many digital products, so some way is needed to square this circle. Irrespective of the underlying technology, only a common data model can permit both market participants and regulators to have a shared view of transactions. Blockchain/DLT may well be an enabling technology for digitally cleared assets, but other crypto-based solutions which ensure only the participants and the regulators have access to commonly defined critical data may arise.

Initially, integrations and adaptors for the CDM will need to be built with existing applications, but eventually most systems will need to natively adopt the CDM. This should eventually give rise to ‘derivatives-as-a-service’ platforms, with firms differentiating based on product innovations, and potentially major participants offering their platforms to other firms as a service.

Conclusion

Firms which can thread all four of these themes together to provide a common platform will indeed, in my view, have a major advantage over firms which do not. It is evident that judicious use of public cloud, strong common data models to drive platform evolution, shared services across business functions for data management and analytics, and AI/machine learning to augment users' work in different domains (starting with trading) all combine to yield extraordinary benefits. The big question is how quickly this will happen, and how to effectively balance investment in existing application portfolios vs new initiatives that can properly leverage these technologies, even as they continue to evolve.


What I realized from studying AWS Services & APIs

[tl;dr The weakest link for firms wishing to achieve business agility is principally based around the financial and physical constraints imposed by managing datacenters and infrastructure. The business goals of agile, devops and enterprise architecture are fundamentally unachievable unless these constraints can be fully abstracted through software services.]

Background

Anybody who grew up with technology during the PC generation (1985-2005) will have developed software with a fairly deep understanding of how the software worked from an OS/CPU, network, and storage perspective. Much of that generation would have had some formal education in the basics of computer science.

Initially, the PC generation did not have to worry about servers and infrastructure: software ran on PCs. As PCs became more networked, dedicated PCs to run ‘server’ software needed to be connected to the desktop PCs. And folks tasked with building software to run on the servers would also have to buy higher-spec PCs for server-side, install (network) operating systems, connect them to desktop PCs via LAN cables, install disk drives and databases, etc. This would all form part of the ‘waterfall’ project plan to deliver working software, and would all be rather predictable in timeframes.

As organizations added more and more business-critical, network-based software to their portfolios, organization structures were created for datacenter management, networking, infrastructure/server management, storage and database provisioning and operation, middleware management, etc, etc. A bit like the mainframe structures that preceded the PC generation, in fact.

Introducing Agile

And so we come to Agile. While Agile was principally motivated by the flexibility in GUI design offered by HTML (vs traditional GUI design) – basically allowing development teams to iterate rapidly over, and improve on, different implementations of UI – ‘Agile’ quickly became more ‘enterprise’ oriented, as planning and coordinating demand across multiple teams, both infrastructure and application development, was rapidly becoming a massive bottleneck.

It was, and is, widely recognized that these challenges are largely cultural – i.e., that if only teams understood how to collaborate and communicate, everything would be much better for everyone – all the way from the top down. And so a thriving industry exists in coaching firms how to ‘improve’ their culture – aka the ‘agile industrial machine’.

Unfortunately, it turns out there is no silver bullet: the real goal – organizational or business agility – has been elusive. Big organizations still expend vast amounts of time and resources doing small incremental change, most activity is involved in maintaining/supporting existing operations, and truly transformational activities which bring an organization’s full capabilities together for the benefit of the customer still do not succeed.

The Reality of Agile

The basic tenet behind Agile is the idea of cross-functional teams. However, it is obvious that most teams in organizations are unable to align themselves perfectly according to the demand they are receiving (i.e., the equivalent of providing a customer account manager), and even if they did, the number of participants in a typical agile ‘scrum’ or ‘scrum of scrums’ meeting would quickly exceed the consensus maximum of about 9 participants needed for a scrum to be successful.

So most agile teams resort to the only agile they know – i.e., developers, QA and maybe product owner and/or scrum-master participating in daily scrums. Every other dependency is managed as part of an overall program of work (with communication handled by a project/program manager), or through on-demand ‘tickets’ whereby teams can request a service from other teams.

The basic impact of this is that pre-planned work (resources) gets prioritized ahead of on-demand ‘tickets’ (excluding tickets relating to urgent operational issues), and so agile teams are forced to compromise the quality of their work (if they can proceed at all).

DevOps – Managing Infrastructure Dependencies

DevOps is a response to the widening communications/collaboration chasm between application development teams and infrastructure/operations teams in organizations. It recognizes that operational and infrastructural concerns are inherent characteristics of software, and software should not be designed without these concerns being first-class requirements along with product features/business requirements.

On the other hand, infrastructure/operations providers, being primarily concerned with stability, seek to offer a small number of efficient standardized services that they know they can support. Historically, infrastructure providers could only innovate and adapt as fast as hardware infrastructure could be procured, installed, supported and amortized – which is to say, innovation cycles measured in years.

In the meantime, application development teams are constantly pushing the boundaries of infrastructure – principally because most business needs can be realized in software, with sufficiently talented engineers, and those tasked with building software often assume that infrastructure can adapt as quickly.

Microservices – Managing AppDev Team to AppDev Team Dependencies

While DevOps is a response to friction in the engagement between application development and infrastructure/operations, microservices can usefully be seen as a response to how application development teams manage dependencies on each other.

In an ideal organization, an application development team can leverage/reuse capabilities provided by another team through their APIs, with minimum pre-planning and up-front communication. Teams would expose formal APIs with relevant documentation, and most engagement could be confined to service change requests from other teams and/or major business initiatives. Teams would not be required to test/deploy in lock-step with each other.

Such collaboration between teams would need to be formally recognized by business/product owners as part of the architecture of the platform – i.e., a degree of ‘mechanical sympathy’ is needed by those envisioning new business initiatives to know how best to leverage, and extend, software building blocks in the organization. This is best done by Product Management, who must steward the end-to-end business and data architecture of the organization or value-stream in partnership with business development and engineering.

Putting it all together

To date, most organizations have been fighting a losing battle. The desire to do agile and devops is strong, but the fundamental weakness in the chain is the ability of internal infrastructure providers and operators to move as fast as software development teams need them to – an issue as much related to financial management as to managing physical buildings, hardware, etc.

What cloud providers are doing is creating software-level abstractions of infrastructure services, allowing the potential of agile, devops and microservices to begin to be realized in practice.

Understanding these services and abstractions is like re-learning the basic principles of Computer Science and Engineering – but through a 'service' lens. The same issues need to be addressed, the same technical challenges exist. Except now some aspects of those challenges no longer need to be solved by organizations (e.g., how to efficiently abstract infrastructure services at scale), and businesses can focus on designing the infrastructure services that are matched with the needs of application developers (rather than a compromise).

Conclusion

The AWS service catalog and APIs are an extraordinary achievement (as is similar work by other cloud providers, although they have yet to achieve the catalog breadth that AWS has). Architects need to know and understand these service abstractions and focus on matching application needs with business needs, and can worry less about the traditional constraints infrastructure organizations have had to work with.

In many respects, the variations between these abstractions across providers will vary only in syntax and features. Ultimately (probably at least 10 years from now) all commodity services will converge, or be available through efficient ‘cross-plane’ solutions which abstract providers. So that is why I am choosing to ‘go deep’ on the AWS APIs. This is, in my opinion, the most concrete starting point to helping firms achieve ‘agile’ nirvana.


What I learned from using Kubernetes

What is Kubernetes?

Kubernetes is fundamentally an automated resource management platform – principally for computational resources (CPU, RAM, networks, local storage). It realises the ideas behind the 'cattle not pets' approach to IT infrastructure, by defining in software what was previously implicit in infrastructure configuration and provisioning.

In particular, Kubernetes enables continuous integration/continuous delivery (CI/CD) processes which are critical to achieving the business benefits (agility, reliability, security) that underpin devops (Kim et al).

The standard for abstracting computational resources today is the container  – essentially, technologies based on OS virtual isolation primitives first provided by Linux. So Kubernetes (and other resource orchestration tools) focus on orchestrating (i.e., managing the lifecycle of) containers. The de-facto container standard is Docker.

Cloud providers (such as AWS, Azure, GCP and Digital Ocean) specialize in abstracting computational resources via services. However, while all support Docker containers as the standard for runnable images, they each have different APIs and interfaces to engage with the management services they provide. Kubernetes is already a de-facto standard for resource management abstraction that can be implemented both in private data centers as well as on cloud, which makes it very attractive for firms that care about cloud-provider lock-in.

It is worth noting that currently Kubernetes focuses on orchestrating containers, which can be run on either virtual machines or on bare metal (i.e., an OS without a hypervisor). However, as VM technology becomes more efficient – and in particular as the cost of instantiating VMs decreases – it is possible that Kubernetes will orchestrate VMs, and not (just) containers. This is because containers do not provide a safe multi-tenancy model, and there are many good reasons why firms, even internally, will seek stronger isolation of processes running on the same infrastructure than that offered by containers – especially for software-as-a-service offerings where different clients need strong separation from each other's process instances.

What Kubernetes is not

It is easy to mistake Kubernetes as a solution to be used by development teams: as the Kubernetes website makes clear, it is not a development platform in and of itself, but rather is technology upon which such platforms can be built.

An example may be platforms like Pivotal’s Container Service (PKS), which is migrating from its own proprietary resource management solution to using Kubernetes under the hood (and providing a path for VMware users to Kubernetes in the process). For Java, frameworks like Spring Boot and Spring Cloud provide good abstractions for implementing distributed systems, but these can be implemented on top of Kubernetes without developers needing to be aware.

A key benefit of Kubernetes is that it is language/framework agnostic. The downside is that this means that some language or platform-specific abstractions (tools, APIs, policies, etc) need to be created in order for Kubernetes to be useful to development teams, and to avoid the need for developers to master Kubernetes themselves.

For legacy applications, or applications with no distribution framework, some language-neutral mechanism should be used to allow development teams to describe their applications in terms that can enable deployment and configuration automation, but that does not bind kubernetes-specific dependencies to those applications (for example, Spring Cloud).

For new projects, some investment in both general-purpose and language-specific platform tooling will be needed to maximize the benefits of Kubernetes – whether this is a 3rd party solution or developed in-house is a decision each organization needs to make. It is not a viable option to delegate this decision to individual (pizza-sized) teams, assuming such teams are staffed principally with folks charged with implementing automated business processes and not automated infrastructure processes.

How cool is Kubernetes?

It is very cool – only a little bit less cool than using a cloud provider and the increasing number of infrastructure & PaaS services they offer. Certainly, for applications which are not inherently cloud native, Kubernetes offers a viable path to achieving CI/CD nirvana, and hence offers a potential path to improvements in deployment agility, security and reliability for applications which for various reasons are not going to be migrated to cloud or re-engineered.

It takes a remarkably small amount of time for an engineer to install and configure a Kubernetes cluster on resources they have direct control over (e.g., MiniKube or Docker on a laptop, EKS on AWS, or a bunch of VMs you have root access to in your data center). Creating a dockerfile for an application where dev/test environments are fully specified in code (e.g., Vagrant specifications) is a fairly quick exercise. Deploying to a kubernetes cluster is quick and painless – once you have a docker image repository setup and configured.
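
As a small sketch of that last step (using the official Kubernetes Python client as an assumed choice; the image, labels and names are placeholders), a deployment can be created programmatically once the image is in a registry:

```python
# Sketch: deploying a container image to a Kubernetes cluster via the Python client.
# Assumes kubeconfig is already set up (e.g., MiniKube or EKS); names and image are
# placeholders, not a recommended layout.
from kubernetes import client, config

config.load_kube_config()                      # picks up the current kubectl context

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="hello-app"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "hello-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "hello-app"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="hello-app",
                    image="registry.example.com/hello-app:1.0.0",   # hypothetical image
                    ports=[client.V1ContainerPort(container_port=8080)],
                )
            ]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

The equivalent YAML applied via kubectl is just as quick; the painless part ends, as the next paragraph notes, when networking, storage and secrets enter the picture.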

Technical challenges arise when it comes to network configuration (e.g., load balancers, NAT translation, routers, firewalls, reverse proxies, API gateways, VPNs/VPCs, etc), automation of cluster infrastructure definition, automation of dockerfile creation, storage volume configuration and management, configuration parameter injection, secrets management, logging and monitoring, namespace management, etc. Advanced networking such as service mesh, policy control, use of multiple network interfaces, low-latency routing to external services, multi-region resilience and recovery are typically not considered a priority during development, but are critical for production environments and must also be automated. All this is doable via Kubernetes mechanisms, but is not for free.

In short, Kubernetes is only the start of the journey to implement zero-touch infrastructure and therefore to make infrastructure provisioning a seamless part of the software development process.

So what should a Kubernetes strategy be?

For many firms, Kubernetes could end up being nothing but a stepping stone to full serverless architectures – more on serverless another time, but in essence it means all resource abstractions for an application are fully encoded in code deployed as part of a serverless application.

For firms that are moving towards a more fully integrated polyglot microservices-type architecture (i.e., with more and more infrastructure and business capabilities delivered as services rather than deployable artifacts or code), and where direct control over network, storage and/or compute infrastructure is seen as a competitive necessity, then Kubernetes seems like a prudent option.

How microservice frameworks evolve with respect to their use of Kubernetes will be critical: in particular, frameworks should ideally be able to happily co-exist with other frameworks in the same Kubernetes cluster. Reducing clusters to single-application usage would be a step backwards – although in the context of many microservices written using the same language framework deployed to the same cluster, perhaps this is acceptable (provided operations teams do not see such microservices as parts of a single monolithic application).

Irrespective of a microservices strategy, deploying applications to containers may be seen as a way to efficiently manage the deployment and configuration of applications across multiple environments. In this regard, there is a convergence of innovation (see PKS above) in the definition, creation and orchestration of virtual machines and containers (as noted above), which may eventually make it more sensible for enterprises which have already made the move to virtualization (via VMWare) to continue along that technology path rather than prematurely taking the Kubernetes/Container path, as legacy applications struggle with the ephemeral nature of containers. In either case, the goal is zero-touch infrastructure via ‘infrastructure-as-code‘, and process automation such as ‘gitops‘, as these are what will ultimately deliver devops business goals.

Summary

So, the key takeaways are:

  • Kubernetes is important for organizations where containerization is part of a broader (microservices/distributed systems/cloud) strategy, and not just a deployment/configuration automation or optimization exercise.
  • Organizations should at least learn how to operate Kubernetes clusters at scale, as this function will likely remain siloed.
  • Developing in-house Kubernetes engineering capabilities is a strategic business question; for most enterprises, it is better to let 3rd parties focus on this (via open-source solutions).
  • For organizations which heavily use VMware VMs, the path to Kubernetes is likely via VMs (which are long-lived) rather than via containers (which are ephemeral). Commercial VM managers are expensive, but using Kubernetes effectively is not free or cheap either.
  • Organizations should seriously assess the role serverless technologies could play in their future technical roadmaps.

The Learning CTO’s Strategic Themes (2015) Revisited

Way back in late 2014, I published what I considered at the time to be the key themes that would dominate the technology landscape in 2015. Four years on, I'm revisiting those themes. Strategic themes for 2019 will be covered in a separate blog post.

2015 Strategic Themes Recap

  • The Lean Enterprise – The cultural and process transformations necessary to innovate and maintain agility in large enterprises – in technology, financial management, and risk, governance and compliance.
  • Enterprise Modularity – The state-of-the-art technologies and techniques which enable agile, cost-effective enterprise-scale integration and reuse of capabilities and services. Aka SOA Mark II.
  • Continuous Delivery – The state-of-the-art technologies and techniques which bring together agile software delivery and operational considerations. Aka DevOps Mark I.
  • Systems Theory & Systems Thinking – The ability to look at the whole as well as the parts of any dynamic system, and to understand the consequences/impacts of decisions on the whole or on those parts.
  • Machine Learning – Using business intelligence and advanced data semantics to dynamically improve automated or automatable processes.

The Lean Enterprise

Over the past few years, the most obvious themes related to the ‘lean enterprise’ have been the focus on ‘digital transformation’ and, simultaneously, scepticism of the ‘agile-industrial’ machine (see Fowler (2018)).

Firms have finally realized that digital transformation extends well beyond having a mobile application and a big-data analytics solution (see Mastering the Mix). However, this understanding is far from universal, and many firms believe digital transformation involves funding a number of ‘digital projects’ without fundamentally changing how the business plans, prioritises and funds both change initiatives and on-going operations.

The implications of this are nicely captured in Mark Schwartz's ‘Digital CFO’ blog post. In essence, if the CFO function hasn't materially changed how it works, it is hard to see how the rest of the organization can truly be ‘digital’, irrespective of how much it is spending on ‘digital transformation’.

Other notable material on this subject is Barry O'Reilly's “Unlearn” book – more on this when I have read it, but in essence it recommends that people (and corporations) unlearn mindsets and behaviours in order to excel in today's fast-changing, technology-enabled environments.

Related to digital transformation is the so-called ‘agile-industrial’ machine, which aims to sell agile to enterprises. Every firm wants to ‘be’ agile, but many firms end up ‘doing’ agile (often imposing ceremony on teams) – and are then surprised when overall business agility does not change. If ‘agile’ is not led and implemented by appropriately structured, cross-functional, value-stream-aligned teams, it is not going to move the dial.

The latest and best thinking on end-to-end value-stream delivery improvement and optimization is from Mik Kersten, as captured in his book Project to Product. This is a significant development in digital transformation execution and merits further understanding – in particular, its potential implications for teams adopting cloud and other self-service-based capabilities as part of delivering value to customers.

Enterprise Modularity

These days, ‘enterprise modularity’ is dominated principally by microservices, and in particular the engineering challenges associated with delivering microservices-based architectures.

While there is no easy (cheap) solution to unwinding the complexity of non-modular ‘ball-of-mud’ legacy applications, numerous patterns and solutions exist to enable such applications to participate in an enterprise's microservices eco-system (the work by Chris Richardson in this space is especially notable). Indeed, if these legacy applications can be packaged up in containers and orchestrated via automation-friendly, framework-neutral tools like Kubernetes, it would go some way to extending the operable longevity and potential re-usability of these platforms. But for many legacy systems, the business case for such a migration is not yet obvious, especially when viewed in the context of overall end-to-end value-stream delivery (versus the horizontal view of central IT optimizing for itself).

It is interesting to note that (synchronous) RESTful APIs are still the dominant communication mechanism for distributed modular applications – most likely because no good decentralized alternative to the enterprise service bus has been identified. And while event-driven architectures are becoming more popular, IT organizations will generally avoid centrally managed messaging infrastructure that is not completely self-service (and self-governed), as otherwise the central bus team becomes a significant bottleneck for every other team.

But RESTful APIs are complex to use at scale – hence the need for software patterns like ‘Circuit Breaker’, service discovery and observability, among others. Service mesh technologies like Istio and Envoy need to be introduced to fill the gaps – these need their own management, but in principle they do not impair team-level progress.
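
To illustrate what the ‘Circuit Breaker’ pattern actually involves, here is a minimal, illustrative Python sketch; service meshes and resilience libraries provide hardened versions of this, and the thresholds below are arbitrary.

```python
# Minimal illustrative circuit breaker: after `max_failures` consecutive failures
# the circuit "opens" and calls fail fast until `reset_timeout` seconds have passed.
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0                  # success resets the failure count
        return result

# Usage (hypothetical downstream call):
# breaker = CircuitBreaker()
# payload = breaker.call(fetch_customer, customer_id)
```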

Asynchronous messaging technologies such as RabbitMQ address many of these issues, but come with their own complexities. Messaging technologies, especially those providing any level of delivery guarantee, require their own management. But, for the most part, these can be delivered as managed services used on a self-service basis.
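
For example, publishing a persistent message to a durable RabbitMQ queue with the pika client looks roughly like the sketch below; the broker host and queue name are hypothetical, and the actual delivery guarantees still depend on how the broker itself is operated.

```python
# Sketch: publish a persistent message to a durable RabbitMQ queue using pika.
# Host and queue names are hypothetical.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq.internal"))
channel = connection.channel()

channel.queue_declare(queue="orders", durable=True)     # queue survives broker restarts
channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=b'{"order_id": 42}',
    properties=pika.BasicProperties(delivery_mode=2),   # mark the message as persistent
)
connection.close()
```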

For cloud-based systems, the cloud provider will usually have messaging services available to use. Messaging systems are also available packaged as Docker containers and can be readily configured and launched on cloud infrastructure.

Related to distributed IPC are network configuration and security: with the advent of cloud, security is no longer confined to the ‘perimeter’ of the datacenter. This means network configuration, authentication/encryption and security monitoring must be implemented and configured at a per-service level. This is not scalable without a significant level of infrastructure configuration automation (i.e., infrastructure-as-code), enabled through mesh or messaging technologies.

Finally, it is worth noting that most of the effort today is going into framework- and language-neutral solutions. Language-specific frameworks, such as OSGi and Cloud Foundry (for Java), are generally assumed to be built on top of these capabilities, but they are not necessary to take advantage of them. Indeed, software written in any language should be able to leverage these technologies.

So, should IT departments mandate the use of frameworks? In the end, to avoid developers needing to know too much about the specifics of networking, orchestration, packaging, deployment, etc., some level of abstraction is required. The question is who provides this abstraction – the cloud provider (e.g., serverless), an in-house SRE engineering team, 3rd-party software vendors (such as Pivotal), or application-development teams themselves?

Continuous Delivery

The state of DevOps has advanced a lot since 2018, as described in the research published by DORA (recently acquired by Google, in a sign that it is getting serious about listening to the needs of its cloud customers).

The findings have been compiled into a set of very solid principles and practices published in the book Accelerate – a book well worth reading.

Continuous delivery itself is still an aspiration for most firms – even being capable of doing it in principle, let alone actually doing it. But it seems evident that not having a continuous delivery capability will severely limit the extent to which a firm can succeed at digital transformation.

Whether a firm needs to digitally transform at all (and implement all the findings from the DevOps research) depends largely on whether its business model is natively digital – although the popular view these days is that every industry and business model must eventually go digital to compete.

A key aspect of ‘continuous delivery’ is that a system never reaches ‘stability’ – i.e., a state in which no further changes are needed (or are needed only very rarely). That model worked in the days of packaged software, but for software delivered as a service it is impractical. Change is always needed – even if it is principally for security/operational purposes rather than for features. To avoid the cost of supporting and maintaining existing software dragging down the resources available for new software, software maintenance must be highly automated – including automated builds, testing, configuration, deployment and monitoring – of both infrastructure and applications.

If devops practices are not adopted and invested in, expect a high and increasing proportion of IT costs to go towards (high-value) maintenance work and less towards new (uncertain-value) development. With devops practices, there should be a much more consistent balance over time, even as new features are continually deployed.

Systems Theory & Systems Thinking

Systems theory & systems thinking is still a fairly niche topic area, with aspects of it being addressed by the value-stream architecture work mentioned above, as well as by continuing work on frameworks like Cynefin, business model canvas content from Tom Graves, insightful content on architecture from Graham Berrisford, ground-breaking work from Simon Wardley on strategy maps, and some interesting capabilities and approaches from a number of EA tool vendors.

Generally, technical architects tend to focus on system behavior from the perspective of applications and infrastructure. ‘Human activity’ systems are rarely given sufficient thought by technical architects (beyond the act of actually developing the required software). In fact, much of the ‘devops’ movement is driven by human activity systems as they relate to the desired behavior of applications and infrastructure.

On the flip side, technical architects and implementation teams tend to rely on product owners to address the human activity systems relating to the use of applications. The balance between appropriate reliance on (and engagement of) users versus technology (and supporting technologists) in the definition of application behaviour is where product owners have an opportunity to make a significant impact, and where IT implementation teams need to demand more from their product owners.

With respect to product roadmaps, product managers should encourage the use of techniques such as the Wardley Maps mentioned above to better position where investment should lie, and to ensure IT teams use 3rd-party solutions appropriately rather than trying to engineer commodity/utility solutions in-house.

Machine Learning

Machine learning has had a massive resurgence over the past few years, driven principally by simple but popular use cases (such as recommendation engines) as well as more advanced use cases such as self-driving cars. A significant factor is the sheer volume of data available to train machine-learning algorithms.

As ever, the appropriate use of machine learning is still an area in need of development: machine learning is often seen by corporations as a means of reducing costs through eliminating the need to rely on humans, leading to fear and scepticism around the adoption of machine learning technologies.

For the most part, however, machine learning will be used to augment or support human activity in the face of ever-increasing complexity and data. For example, in the realm of devops, an explosion in the number of interacting components in applications will make supporting and operating the complex distributed systems of the future orders of magnitude harder than it is today – and we don't do a particularly good job of managing such systems even today! Without some form of augmentation, humans would simply not be up to the task.

The arrival of pay-as-you-go machine learning services such as AWS SageMaker and Rekognition heralds a new era in which machine learning capabilities are within reach of ‘average’ development teams, without necessarily requiring AI experts or PhD-level statisticians to be part of those teams.
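
As a flavour of how little code such managed services require, here is a hedged sketch calling Rekognition via boto3; the bucket and object names are hypothetical, and suitable AWS credentials and IAM permissions are assumed.

```python
# Sketch: label detection with Amazon Rekognition via boto3.
# Bucket/key are hypothetical; credentials and permissions are assumed to be configured.
import boto3

rekognition = boto3.client("rekognition", region_name="eu-west-1")

response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-photos-bucket", "Name": "cat.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))
```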

In reality, machine learning can only be used for mature processes for which much data is available: humans will, for the foreseeable future at least, be much better at addressing new or immature situations.

An interesting side-effect of the focus on machine learning is the increased interest in semantic data: general machine learning is impossible without learning how to describe data semantics. Most firms would benefit from this practice even without machine learning. General and deep machine learning appears to be creating an increase in interest in semantic data standards such as RDF and Linked Open Data, but hopefully that interest will also trickle down to more mundane but critical tasks such as system integration efforts and data lake implementations.
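
For the curious, describing data semantically need not be exotic – a minimal sketch using the rdflib library is shown below; the namespace and identifiers are purely hypothetical.

```python
# Sketch: a few RDF triples describing a trade, serialized as Turtle, using rdflib.
# The namespace and identifiers are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.com/finance#")

g = Graph()
g.bind("ex", EX)
g.add((EX.trade42, RDF.type, EX.Trade))
g.add((EX.trade42, EX.instrument, EX.AAPL))
g.add((EX.trade42, EX.quantity, Literal(100)))

print(g.serialize(format="turtle"))
```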
