Message Evolution in High Performance Messaging Environments

tl;dr

Moving to an event-driven architecture in a high-performance environment has specific needs that do not yet have widely standardized solutions, and as such require a high degree of focus on both software engineering and business architecture.

Context

Event- or message-driven applications exist in at least two contexts – an application-specific context, and a domain or enterprise context. For high-performance applications, latency requirements are typically stricter within the application context and more relaxed in the domain/enterprise context.

For the purposes of this article, the application-specific context is assumed to relate to components that are typically deployed together when a new feature is released – i.e., there is high coupling and high cohesion between components. All application context components are generally tested and deployed together as a unit.

Application Context

The high coupling and high cohesion of the application context is usually a compromise made to meet low-latency performance requirements, since normal microservice architecture best practice states that services should be independently deployable and hence loosely coupled. This impacts the overall agility of the architecture, but fundamentally, with automated configuration, testing and deployment, it should be manageable by a ‘pizza-sized’ team without losing the integrity of the platform or the performance of the network communications.

A given application can reasonably require a single, common version of a highly optimized serialization library, and the related message schema definitions, to be used by all components at any given time, since such deployments can be enforced through deployment/configuration automation processes.

In general, these deployments do not require meta-data to be included in messages, as the meta-data will be explicit in the application code. High-performance binary-encoded protocols like Protocol Buffers and Avro can handle a certain amount of schema evolution, but in general for highly coupled applications, all components should use the same version of serialized objects and be deployed simultaneously.
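As a simplified illustration of that evolution tolerance (not an example from any particular production system), the Java sketch below uses Avro schema resolution: a record written with an older writer schema is decoded by a consumer built against a newer reader schema, and the added field is filled in from its default. The ‘Trade’ record and its fields are invented for the example.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class AvroEvolutionSketch {

    // Writer schema: the (hypothetical) "Trade" record as the producer was built with it.
    static final Schema WRITER = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Trade\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"}]}");

    // Reader schema: a later, compatible version that adds a field with a default value.
    static final Schema READER = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Trade\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"venue\",\"type\":\"string\",\"default\":\"UNKNOWN\"}]}");

    public static void main(String[] args) throws Exception {
        // Producer serializes with the old schema.
        GenericRecord trade = new GenericData.Record(WRITER);
        trade.put("id", "T-1");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(WRITER).write(trade, encoder);
        encoder.flush();

        // Consumer decodes with the new schema; Avro schema resolution supplies the default.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded = new GenericDatumReader<GenericRecord>(WRITER, READER).read(null, decoder);
        System.out.println(decoded); // {"id": "T-1", "venue": "UNKNOWN"}
    }
}

If the added field had no default, or an existing field changed type incompatibly, the same read would fail – which is exactly the point at which the single-channel vs multiple-channel choices discussed later become relevant.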

Application messaging contexts can get complicated, and may include both low-latency and normal-latency scenarios, with different degrees of cohesion and coupling. The key point is that messaging and deployment decisions lie within one ‘pizza-sized’ team and are not subject to wider enterprise governance. When a single pizza-sized team eventually becomes insufficient for the application’s complexity, the technical debt behind these decisions is revealed as teams split, and it will need to be addressed, as governance, agility and latency needs may all be in conflict.

Domain/Enterprise Context

The domain context may include many pizza-sized application teams. For maximum agility, these disparate applications should have relatively low (functional) cohesion and low coupling, but each may consist of a number of high-cohesion microservices – the level of coupling depending on the extent to which message codecs are optimized for network performance or agility.

To maximize decoupling and ensure maximum independence of testing, deployment, configuration and operation, the codecs used in the enterprise context should be as flexible as possible.

A consequence of this approach is that, to ensure decoupling, there will in many cases need to be some translation between events in the application context and events in the domain/enterprise context. Typically this can be done by a separate event-driven component that performs the necessary translations, consuming from one messaging channel and publishing to another (see the sketch below). The additional overhead of this should be weighed against the agility cost of trying to maintain application schema consistency at enterprise scale, which can initially be considerable as teams begin to adopt event-driven architecture. (The legacy of Enterprise Service Buses, and the messaging bottlenecks they often cause, show how extreme this cost can be.)
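A minimal sketch of such a translation component, assuming Kafka as the messaging technology; the topic names, the string encoding and the toEnterpriseFormat() mapping are all illustrative assumptions rather than a prescribed design:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

// Translation component sitting between the application context and the enterprise context.
public class AppToEnterpriseTranslator {

    public static void main(String[] args) {
        // Shared config for brevity; in practice consumer and producer would be configured separately.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "app-to-enterprise-translator");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {

            consumer.subscribe(List.of("app.trades.internal"));   // application-context channel (assumed name)
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // Map the application-internal event onto the more flexible enterprise schema.
                    String enterpriseEvent = toEnterpriseFormat(record.value());
                    producer.send(new ProducerRecord<>("enterprise.trades.v1", record.key(), enterpriseEvent));
                }
            }
        }
    }

    // Placeholder translation; in practice this re-encodes the optimized binary application
    // event into the loosely coupled enterprise representation.
    private static String toEnterpriseFormat(String applicationEvent) {
        return applicationEvent;
    }
}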

Message Standards & Governance

In general, there is a trade-off between agility and data architecture compliance at the domain/enterprise level. In order to avoid the insertion of another technology team between producers and consumers, it is generally best to follow the microservice best practice of ‘dumb pipes and smart end-points’ – i.e., any compliance with standards is not enforced by the messaging infrastructure but instead at the application (or ‘project’) level.

It is feasible to develop run-time tools to assess the data architecture compliance of messages over the bus – in many circumstances this may offer the best balance between compliance and agility, especially if such tools run in lower environments prior to production deployment.
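As an illustration, a bare-bones check of the kind such a tool might perform, assuming JSON-encoded messages and a hypothetical data-dictionary rule that certain envelope fields must always be present; a real tool would load its rules from the enterprise data catalogue and feed results into monitoring or pre-production gates:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.stream.Collectors;

// Minimal data-architecture compliance check for messages observed on the bus.
public class MessageComplianceChecker {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Hypothetical mandatory fields, standing in for rules from an enterprise data dictionary.
    private static final List<String> REQUIRED_FIELDS = List.of("eventId", "eventTime", "sourceSystem");

    // Returns a list of violations for one message (empty list means compliant).
    public static List<String> violations(String messageJson) throws Exception {
        JsonNode message = MAPPER.readTree(messageJson);
        return REQUIRED_FIELDS.stream()
                .filter(field -> !message.hasNonNull(field))
                .map(field -> "missing mandatory field: " + field)
                .collect(Collectors.toList());
    }
}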

Enterprise Message Characteristics

Messages in the enterprise domain have some specific characteristics:

  • No assumptions should be made about the consuming applications, in terms of the languages, libraries or frameworks used, except to the extent that the serialization mechanism is supported.
  • Messages should be readily readable by authorized human readers (in particular, operations & support staff), irrespective of schema versions.
  • Messages (or parts of messages) should not be readable by unauthorized readers (human or computer).

When it comes to choosing serialization technology, it is all about compromises; there is no silver bullet. There are trade-offs between performance, flexibility, readability, precision, etc. An excellent read on this topic is Martin Kleppmann’s book ‘Designing Data-Intensive Applications’.

Schema Evolution using a Message Bus

Schema evolution is one of the biggest challenges any technology team has to deal with. Databases, events/messages and RESTful APIs all require schemas to be managed.

Microservice architecture aims to minimize the complexity of managing database schema evolution by ensuring that any application depending on a particular dataset accesses that data only through a microservice. In effect, the database has only one reader/writer, and so schema evolution can be tied to deployments of that one application – much easier to manage.

However, this pushes inter-application schema evolution to the messaging or API layer. For breaking schema changes (i.e., where a new version of a schema is incompatible with a prior version), two principal approaches can be considered:

  • A single ‘channel’ (topic or URI) handles all schema versions, and each consumer must be able to handle all schema versions received over that channel.
  • Each schema version has its own message channel (topic or URI), and new consumers are created specifically to consume messages from that channel.

Note that the approaches above only address the technical aspects of decoding incompatible message versions: they do not address semantic changes, which can only be resolved at the application level.

Single Channel

In this case, the consumer must have multiple versions of the deserializer ‘built-in’, so it can interpret the version header and invoke the correct deserializer. For many languages, such as Java, this is difficult, as it requires supporting multiple versions of the same classes in the same process.

It is, in principle, doable using OSGi, but otherwise, the consuming application may be forced to delegate incoming messages to other processes for deserializing, which could be expensive. Alternatively, the IDL parser for the serializer could generate unique encoding/decoding classes for each version of a schema so they could reside in the same process. However, message meta-data indicating the correct schema version would need to be very reliable to ensure this works well.
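A minimal sketch of that dispatch logic, assuming the schema version is carried as a leading two-byte header; the per-version decoders are stand-in lambdas here, where in practice they would be version-specific classes generated by the IDL parser:

import java.nio.ByteBuffer;
import java.util.Map;
import java.util.function.Function;

// Single-channel consumer logic: read the version header, then delegate to the matching decoder.
public final class VersionDispatchingDeserializer {

    // One decoder per supported schema version. Placeholder lambdas keep the sketch self-contained;
    // real decoders would be version-suffixed classes generated from each schema definition.
    private final Map<Short, Function<ByteBuffer, Object>> decoders = Map.of(
            (short) 1, payload -> "decoded with v1 codec",   // hypothetical v1 decoder
            (short) 2, payload -> "decoded with v2 codec");  // hypothetical v2 decoder

    public Object deserialize(ByteBuffer message) {
        short schemaVersion = message.getShort();            // assumed: version carried as a leading short
        Function<ByteBuffer, Object> decoder = decoders.get(schemaVersion);
        if (decoder == null) {
            throw new IllegalStateException("Unsupported schema version " + schemaVersion);
        }
        return decoder.apply(message.slice());               // remaining bytes are the encoded body
    }
}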

Multiple Channel

In this case, each new (breaking) schema definition will have its own channel (topic/URI), such that new processes specifically built to consume those schema messages can be deployed that subscribe to that channel.

This avoids the need for delegating deserialization, and may be easier to debug when issues occur. However, it can add additional complexity to channel/topic namespaces, and mechanisms may need to be in place to ensure that all expected consumers are running and that there are no accidental ‘orphan’ messages being published (i.e., messages for which no consumer is active).
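For completeness, a sketch of a consumer dedicated to one versioned channel, assuming Kafka and an illustrative naming convention that embeds the breaking schema version in the topic name:

import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

// Multiple-channel approach: this process is built and deployed solely against the v2 codec
// and subscribes only to the v2 topic ("enterprise.trades.v2" is an assumed naming convention).
public class TradesV2Consumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "trades-v2-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("enterprise.trades.v2"));   // one topic per breaking schema version
            while (true) {
                consumer.poll(Duration.ofMillis(500)).forEach(record ->
                        process(record.value()));                  // decode with the v2 codec only
            }
        }
    }

    private static void process(byte[] encodedV2Trade) {
        // Hypothetical: hand off to the v2-specific decoder and business logic.
    }
}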

Architectural Implications of EDA

Fundamentally, to handle an enterprise-wide event-driven architecture, organizations must be fully committed to implementing a microservices architecture. At the simplest level, this means that the cost and overhead of deploying and operating new application components is very low. This means that orchestration, configuration, logging, monitoring and control mechanisms are all standardized across all deployed components, so that there is no operational resistance to deploying separate processes for each message type and/or channel as needed, or to deploying various adaptors/gateways to cope with potentially multiple incompatible enterprise consumers.

Implementing any form of EDA without addressing the above will likely not substantially improve business agility or reduce lead times. Instead, it could lead to increased co-dependencies across components, reducing overall system availability and stability, and requiring coordinated integration testing and deployments on a periodic basis (every 2-3 months, for example).

Conclusion

The points above are based on observing the architectural evolution of systems I have been directly involved with, the challenges faced by multiple teams moving to an event-driven architecture, and the lessons learned from this process.

While a number of issues are squarely in the technical domain, some of the hardest decisions relate to what should be considered in the ‘application’ domain vs what belongs in the ‘enterprise’ domain. Usually, there will be strong business drivers behind merging applications previously in separate domains into a single domain – and this will have implications for team size, message standards, etc. Fundamentally, however, IT should not attempt to draw technical message or data boundaries around applications that are not directly aligned with business architecture goals.

In essence, if business architects and/or product owners are not directly involved in dictating messaging standards (including semantic definitions of fields) across applications, then Conway’s Law applies: messaging standards remain local to the teams that use them, with many message flows existing as bilateral agreements between applications.

This naturally gives rise to a ‘spaghetti’ architecture, but if this reflects how business processes are actually aligned and communicate, and the business is happy with this, then all IT can do is manage it, not eliminate it.
