Becoming a financial ‘super-power’ through emerging technologies

Recently, the Tabb Forum published an article (login needed) proposing 4 key emerging technology strategies that would enable market participants to keep pace with a trading environment that is constantly changing.

Specifically, the emerging technologies are:

  • AI for Risk Management
  • Increased focus on data and analytics
  • Accepting public cloud
  • Being open to industry standardization

It is worth noting that in the author’s view, these emerging technologies are seen as table stakes, rather than differentiators – i.e., that the market will pivot around growing use of these technologies, rather than individual firms having (temporary) advantage by using them. In the long term, this seems plausible, but does not (IMO) preclude opportunistic use cases in the short/medium term, and firms which can effectively use these technologies will basically acquire banking ‘super-powers’.

AI for Risk Management

The key idea here is that AI could be used to augment current risk management processes, and offer new insights into where market participants may be carrying risk, or provide new insights into who to trade with and what to trade.

Current risk management processes are brute force, involving complex calculations with multiple inputs, and with many outputs for different scenarios. In addition, human judgement is needed to apply various adjustments (known as XVA) to model-computed valuations to account for trade-specific context.

For AI to be used effectively for risk management, certain key technical capabilities need to be in place – specifically:

  • Data lineage, semantics and quality management
  • Feedback loops between pre-trade, trade and post-trade analytics

Many financial firms are already addressing data lineage, semantics and quality management through meeting compliance with regulations such as BCBS239. However, these capabilities need to be infused into a firm’s architecture (processes as well as technology) for it to be useful for AI use cases. Currently, the tools available are in generally not highly integrated with each other or with systems that depend on them, and human processes around these tools are still maturing.

With respect to developing machine learning models, an AI system needs to understand what outcomes happened in the past in order to make predictions or suggestions. For most traders today, such knowledge is encapsulated in complex spreadsheets that are amended by them over time as new insights are discovered by the trader. These spreadsheets are often proprietary to the trader, and over time become increasingly unmaintainable as calculations become interdependent on each other, and changing one calculation has a high chance of breaking another. This impedes a traders ability to keep their models aligned with their understanding of the markets.

Clearly another approach is needed. A key challenge is how to augment a trader’s capabilities with AI-enabled tooling, without at the same time suggesting the trader is himself/herself surplus to requirements. (This is a challenge all AI solutions face.)

One approach would require that at some level AI algorithms are biased towards individual traders decision-making and learning processes, and tying the use of such algorithms to the continued employment of the trader.

Brute-force AI learning based on all data passing through pre-trade (including market data), trade and post-trade systems is possible, but the skills in selecting critical data points for different contexts are at least as valuable as basic trading skills, and the infrastructure cost of doing this is still considerable.

Increased Focus on Data and Analytics

The key point being made by the author is that managing data should be a key strategic function, rather than being left to individual areas to manage on a per-application basis.

Again, this is tied to efforts relating to data lineage, semantics and quality. Efforts in this space can be directed to specific areas (such as risk management), but every function has growing needs for analytics that traditional warehouse and analytics based solutions cannot keep pace with – especially if, as is increasingly the case, every functional domain wishes to have their own agenda to introduce AI/machine-learning into their processes to improve customer experience and/or regulatory compliance.

As Risk Management increasingly consumes more data points from across the firm to refine risk predictions, it is not unreasonable for the Risk Management function to take leadership of the requirements and/or technology for data management for a firm’s broader needs. However, significant investment is required to make data management tools and infrastructure available as an easy-to-use service across multiple domains, which is essentially what is required. Hence, the third key emerging technology..

Accept the public cloud

The traditional way of developing technology has been to

  1. identify the requirements
  2. propose an architecture
  3. procure and provision the infrastructure
  4. develop and/or procure the software
  5. test and deploy
  6. iterate

Each of these processes take a considerable amount of time in traditional organizations. Anything learned after deploying the software and getting user feedback has to necessarily be addressed with the previously defined architecture and infrastructure – often incurring technical debt in the process.

Particularly for data and analytics use cases, this process is unsustainable. Rather than adapting applications to the infrastructure (as this approach requires), the infrastructure should be adaptable to the application, and if it is proven to be inappropriate, it must be disposed of without any concern re ‘sunk cost’ or ‘amortization’ etc.

The public cloud is the only way to viably evolve complex data and analytics architectures, to ensure infrastructure is aligned with application needs, and minimize technical debt through constantly reviewing and aligning the infrastructure with application needs.

The discovery of which data management and analytics ‘services’ need to be built and made available to users across a firm is, today, a process of learning and iteration (in the true ‘agile’ sense). Traditional solutions preclude such agility, but embracing public cloud enables it.

One point not raised in the article is around the impact of ‘serverless’ technologies. This could be a game-changer for some use cases. While serverless in general could be taken to represent the virtualization of infrastructure, serverless specifically addresses solutions where development teams do not need to manage *any* infrastructure (virtual or otherwise) – i.e., they just consume a service, and costs are directly related to usage, such that zero usage = zero costs.

‘Serverless’ is not necessarily restricted to public cloud providers – a firm’s internal IT could provide this capability. As standards mature in this space, firms should start to think about how client needs could be better met through adoption of serverless technologies, which would require a substantial rethink of the traditional software development lifecycle.

Be open-minded about industry standardization

Conversation around industry standards today is driven by blockchain/distributed ledger technology and/or the efficient processing of digital assets. Industry standardization efforts have always been a feature in capital markets, mostly driven by the need to integrate with clearing houses and exchanges – i.e., centralized services. Firms generally view their data and processing models as proprietary, and fundamentally are resistant to commoditization of these. After all, if all firms ran the same software, where would their differentiation be?

The ISDA Common Domain Model (CDM) seems to be quite different, in that it is not specifically driven by the need to integrate with a 3rd party, but rather with the need for every party to integrate with every other party – e.g, in an over-the-counter model.

Historically, the easiest way to regulate a market is to introduce a clearing house or regulated intermediary that ensure transparency to the regulators of all market activity. However, this is undesirable for products negotiated directly between trading parties or for many digital products, so some way is needed to square this circle. Irrespective of the underlying technology, only a common data model can permit both market participants and regulators to have a shared view of transactions. Blockchain/DLT may well be an enabling technology for digitally cleared assets, but other crypto-based solutions which ensure only the participants and the regulators have access to commonly defined critical data may arise.

Initially, integrations and adaptors for the CDM will need to be built with existing applications, but eventually most systems will need to natively adopt the CDM. This should eventually give rise to ‘derivatives-as-a-service’ platforms, with firms differentiating based on product innovations, and potentially major participants offering their platforms to other firms as a service.

Conclusion

Firms which can thread all four of these themes together to provide a common platform will indeed, in my view, have a major advantage over firms which do not. It is evident that with judicious use of public cloud, use of strong common data models to drive platform evolution, use of shared services across business functions for data management and analytics, and use of AI/machine learning for augmenting users work in different domains (starting with trading), all combine to yield extraordinary benefits. The big question is how quickly this will happen, and how to effectively balance investment in existing application portfolios vs new initiatives that can properly leverage these technologies, even as they continue to evolve.

Becoming a financial ‘super-power’ through emerging technologies

Strategic Theme # 5/5: Machine Learning

[tl;dr Business intelligence techniques coupled with advanced data semantics can dynamically improve automated or automatable processes through machine learning. But 2015 is still mainly about exploring the technologies and use cases behind machine learning.]

Given the other strategic themes outlined in this blog (lean enterprise, enterprise modularity, continuous delivery & system thinking), machine learning seems to be a strange addition. Indeed, it is a very specialist area, about which I know very little.

What is interesting about machine learning (at least in the enterprise sense), is that it heavily leans on two major data trends: big data and semantic data. It also has a significant impact on the technology that is the closest equivalent to machine learning in wide use today: business intelligence (aka human learning).

Big Data

Big data is a learning area for many organisations right now, as it has many potential benefits. Architecturally, I see big data as an innovative means of co-locating business logic with data in a scalable manner. The traditional (non big-data) approach to co-locating business logic with data is via stored procedures. But everyone knows (by now) that while stored-procedure based solutions can enable rapid prototyping and delivery, they are not a scalable solution. Typically (after all possible database optimisations have been done) the only way to resolve performance issues related to stored procedures is to buy bigger, faster infrastructure. Which usually means major migrations, etc.

Also, it is generally a very bad idea to include business logic in the database: this is why so much effort has been expended in developing frameworks which make the task of modelling database structures in the middle tier so much easier.

Big data allows business logic to be maintained in the ‘middle’ tier (or at least not the database tier) although it changes the middle tier concept from the traditional centralised application server architecture to a fundamentally distributed cluster of nodes, using tools like SparkMesos and Zookeeper to keep the nodes running as a single logical machine. (This is different from clustering application servers for reasons of resilience or performance, where as much as possible the clustering is hidden from the application developers through often proprietary frameworks.)

While the languages (like Pig, Hive, Cascading, Impala, F#, Python, Scala, Julia/R, etc)  to develop such applications continue to evolve, there is still some way to go before sophisticated big-data frameworks equivalent to JEE /Blueprint and Ruby on Rails on traditional 3-tier architectures are developed.  And clearly ‘big data’ languages are optimised for queries and not transactions.

Generally speaking, traditional 3-tier frameworks still make sense for transactional components, but for components which require querying/interpreting data, big data languages and infrastructure make a lot more sense. So increasingly we can see more standard application architectures using both (with sophisticated messaging technologies like Storm helping keep the two sides in sync).

An interesting subset of the ‘big data’ category (or, more accurately, the category of databases knowns as NoSQL), are graph databases. These are key for machine learning, as will be explained below. However, Graph databases are still evolving when it comes to truly horizontal scaling, and while they are the best fit for implementing machine learning, they do not yet fit smoothly on top of ‘conventional’ big data architectures.

Semantic Data

Semantic data has been around for a while, but only within very specialist areas focused on AI and related spheres. It has gotten more publicity in recent years through Tim Berners-Lee promoting the concept of the semantic web. It requires discipline managing information about data – or meta-data.

Initiatives like Linked Data, platforms like datahub.io, standards like RDF, coupled with increasing demand for Open Data are helping develop the technologies, tools and skillsets needed to make use of the power of semantic data.

Today, standard semantic ontologies – which aim to provide consistency of data definitions – by industry are thin on the ground, but they are growing. However, the most sophisticated ontologies are still private: for example, Wolfram Alpha has a very sophisticated machine learning engine (which forms part of Apple’s Siri capability), and they use an internally developed ontology to interpret meaning. Wolfram Alpha have said that as soon as reliable industry standards emerge, they would be happy to use those, but right now they may be leading the field in terms of general ontology development (with mobile voice tools like Apple Siri etc close behind).

Semantic data is interesting from an enterprise perspective, as it requires knowing about what data you have, and what it means. ‘Meaning’ is quite subtle, as the same data field may be interpreted in different ways at different times by different consumers. For example, the concept of a ‘trade’ is fundamental to investment banking, yet the semantic variations of the ‘trade’ concept in different contexts are quite significant.

As regulated organisations are increasingly under pressure to improve their data governance, firms have many different reasons to get on top of their data:

  • to stay in business they need to meet regulatory needs;
  • to protect against reputational risk due to lost or stolen data;
  • to provide advanced services to clients that anticipate their needs or respond more quickly to client requests
  • to anticipate and react to market changes and opportunities before the competition
  • to integrate systems and processes efficiently with service providers and partners both internally and externally
  • to increase process automation and minimise unnecessary human touch-points
  • etc

A co-ordinated, centrally led effort to gather and maintain knowledge about enterprise data is necessary, supported by federated, bottom-up efforts which tend to be project focused.

Using and applying all the gathered meta-data is a challenge and an opportunity, and will remain high on the enterprise agenda for years to come.

Business Intelligence

Business intelligence solutions can be seen as a form of ‘human learning’. They help people understand a situation from data, which can then aid decision making processes. Eventually, decisions feed into system requirements for teams to implement.

But, in general, business intelligence solutions are not appropriate as machine learning solutions. In most cases, the integrations are fairly unsophisticated (generally batch ETL), and computational ability is optimised for non-technical users to define and execute. The reports and views created in BI tools are not optimised to be included as part of a high performance application architecture, unlike big data tools.

As things stand today, the business intelligence and machine learning worlds are separate and should remain so, although eventually some convergence is inevitable. However, both benefit from the same data governance efforts.

Conclusions

Machine learning is a big topic, which ideally executes in the same context as the other strategic themes. But for 2015, this technology is still in the ‘exploratory’ stages, so localised experiments will be necessary before the technology and business problems they actually solve can be fully exploited.

Strategic Theme # 5/5: Machine Learning