Agile quantitative analytics in Financial Services

[tl;dr Technology is empowering quantitative analysts to do much more on their own. IT organisations will need to think about how to make these capabilities available to their users, and how to incorporate them into IT strategies around big data, cloud computing, security and data governance.]

The quest for positive (financial) returns in investments is helping drive considerable innovation in the space of quantitative analytics. This, coupled with the ever-decreasing capital investment required to do number-crunching, has created demand for ‘social’ analytics – where algorithms are shared and discussed amongst practitioners rather than kept sealed behind the closed doors of corporate research & trading departments.

I am not a quant, but I have in the past built systems that provided an alternative (to Excel) vehicle for quantitative research analysts to capture and publish their models. While Excel is hard to beat for experimenting with new ideas, from a quantitative analyst’s perspective it suffers from many deficiencies, including:

  • Spreadsheets get complex very quickly and are hard to maintain
  • They are not very efficient for back-end (server-side) use
  • They cannot be efficiently incorporated into scalable automated workflows
  • Models cannot be distributed or shared without losing control of the model
  • Integrating spreadsheets with multiple large data sets can be cumbersome and memory inefficient at best, and impossible at worst (constrained by machine memory limits)

QuantCon was created to provide a forum for quantitative analysts to discuss and share tools and techniques for quantitative research, with a particular focus on the sharing and distribution of models (either outcomes or logic, or both).

Some key themes from QuantCon which I found interesting were:

  • The emergence of social analytics platforms that can execute strategies on your venue of choice (e.g. quantopian.com)
  • The search for uncorrelated returns & innovation in (algorithmic) investment strategies
  • Back-testing as a means of validating algorithms – and the perils of assuming backtests would execute at the same prices in real-life
  • The rise of freely available interactive model distribution tools such as the Jupyter project (similar to Mathematica Notebooks)
  • The evolution of probabilistic programming and machine learning – in particular the PyMC3 library for Python (a short sketch follows this list)
  • The rise in the number of free and commercial data sources (APIs) of data points (signals) that can be included in models
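
To make the probabilistic programming theme a little more concrete, below is a minimal sketch – assuming PyMC3 and NumPy are installed, and using made-up daily return data – of how a quant might infer the distribution of a strategy’s mean daily return rather than settling for a single point estimate:

```python
import numpy as np
import pymc3 as pm

# Illustrative (synthetic) daily returns for a strategy
np.random.seed(42)
returns = np.random.normal(loc=0.0005, scale=0.01, size=250)

with pm.Model() as model:
    # Priors for the unknown mean and volatility of daily returns
    mu = pm.Normal('mu', mu=0.0, sd=0.01)
    sigma = pm.HalfNormal('sigma', sd=0.05)

    # Likelihood of the observed returns
    pm.Normal('obs', mu=mu, sd=sigma, observed=returns)

    # Sample from the posterior; the trace captures the uncertainty in mu and sigma
    trace = pm.sample(1000, tune=1000, cores=1)

print(pm.summary(trace))
```

The resulting posterior distributions are exactly the kind of artefact that is naturally shared and discussed in a Jupyter notebook, which is part of what makes this combination of tools so well suited to ‘social’ analytics.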

From an architectural perspective, there are some interesting implications. Specifically:

  • There really is no limit to what a single quant with access to multiple data sources (either internal or external) and access to platform- or infrastructure-as-a-service capabilities can do.
  • Big data technologies make it very easy to ingest, transform and process multiple data sources (static or real-time) at very low cost (but raise governance concerns) – a simple sketch follows this list.
  • It has never been cheaper or easier to efficiently and safely distribute, publish or share analytics models (although the tools for this are still evolving).
  • The line between the ‘IT developer’ and the user has never been more blurred.
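
As a small, hedged illustration of the first two points, the sketch below (using pandas, with hypothetical URLs and column names) joins an external signal onto a price history – the kind of multi-source analysis a single quant can now run without any bespoke infrastructure:

```python
import pandas as pd

# Hypothetical data sources - e.g. a commercial market data API and an internal extract
prices = pd.read_csv("https://example.com/api/prices.csv", parse_dates=["date"])
sentiment = pd.read_csv("https://example.com/api/sentiment.csv", parse_dates=["date"])

# Join the external signal onto the price history
combined = prices.merge(sentiment, on=["date", "ticker"], how="inner")
combined["return"] = combined.groupby("ticker")["close"].pct_change()

# Average sentiment on up days vs down days, per ticker
summary = (
    combined
    .assign(direction=lambda df: (df["return"] > 0).map({True: "up", False: "down"}))
    .groupby(["ticker", "direction"])["sentiment_score"]
    .mean()
)
print(summary)
```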

Over the coming years, we can expect analytics (and business intelligence in general) capabilities to become key functions within every functional domain in an organisation, integrated both into the function itself through feedback loops and into more conventional MIS reporting. Some of the basic building blocks will be the same as they are today, but the key characteristic is that users of these tools will be technologists specifically supporting business needs – i.e., part of ‘the business’ and not part of ‘IT’.

In the past, businesses have been supported by vendors providing expensive but easy-to-use tools to allow non-technical people to work with large datasets. The IT folks were very specifically supporting the core data warehouse & business intelligence infrastructure, or providing technical support for the development of particular reports. In these cases, the clients were typically non-technical, and the platform could only evolve as quickly as the software vendors evolved.

The emerging (low-cost) tools for quantitative analytics will give rise to a post-Excel world of innovation, scale and distribution that will empower users, give rise to whole new business models, and itself be a big driver of how enterprise IT defines its role in the agile, unbundled, decentralised and as-a-service technology landscape of the near future.

Traditional IT organisations often saw the ‘application development’ functions as business aligned, and were comfortable with the client-oriented nature of providing technical infrastructure services to development teams. However, internal development teams supporting other (business-aligned) development teams is a fairly new concept, and will likely best be done by external specialist providers. This is a good example of where IT’s biggest role (apart from governance) in the future is in sourcing relevant providers, and ensuring business technologists are able to do their job effectively and efficiently in that environment.

In summary, technology is empowering quantitative analysts to do much more on their own. IT organisations will need to think about how to make these capabilities available to their users, and how to incorporate them into IT strategies around big data, cloud computing, security and data governance.


The CDO: Enterprise Architect’s best friend

[tl;dr The CDO’s agenda may be the best way to bring the Enterprise Architecture agenda to the business end of the C-suite table.]

Enterprise Architecture as a concept has for some time claimed intellectual ownership of a holistic, integrated view of architecture within an organisation. In that lofty position, enterprise architects have put a stake in the ground with respect to key architecture activities, and formalised them all within frameworks such as TOGAF.

In TOGAF, Data Architecture has been buried within the Information Systems Architecture phase of the Architecture Development Method, along with Application Architecture.

But the real world often gets in the way of (arguably) conceptually clean thinking: I note the rise of the CDO, or Chief Data Officer, usually as part of the COO function of an organisation. (This CDO is not to be confused with the Chief Digital Officer, whose primary focus could, viewed critically, be seen as building cool apps…)

So, why has Data gotten all this attention, while Enterprise Architecture still, for the most part, languishes within IT?

Well, as with most things, it’s happening for all the wrong reasons: mostly it’s because regulators are holding businesses accountable for how they manage regulated data, and businesses need, in turn, to hold people to account for doing that. Hence the need for a CDO.

But in the act of trying to understand what data they are liable for managing, and to what extent, CDOs necessarily have to do Data Architecture and Data Governance. And at this point the CDO’s activities start dipping into new areas such as business process management, business & IT strategy – and EA.

If CDOs choose to take an expansive view of their role (and I believe they should), then it would most definitely include all the domains of Enterprise Architecture – especially if one views data complexity and technology complexity as two sides of the same enterprise complexity coin: one as a consequence of business decisions, the other as a consequence of IT decisions.

The good news is that this would at long last put EAs into a role more aligned with business goals than with the goals of the IT organisation. This is not to say that the goals of the IT organisation are in some way wrong, but rather that, because the business had abdicated its responsibility for many decisions that IT needs in order to do “the right thing”, IT has had to fill the gaps itself – and in ways which said abdicating businesses could understand/appreciate.

For anyone in the position of CTO, this should help a lot: without prescribing which technologies should be used, businesses then can give real context to their demand which the CTO can interpret strategically, and, in collaboration with his CDO colleague, agree a technology strategy that can deliver on those needs.

In this way, the CDO/CTO have a very symbiotic relationship: the boundaries should be fuzzy, although the responsibilities should be clear: the CDO provides the context, the CTO delivers the technology.

So collaboration is key: while in principle one should be able to describe what a business needs strictly in terms of process and data flows etc, the reality is that business needs and practices change in *response* to technology. In other words, it is not a one-way street. What technology is *capable* of doing can shape what the business *should* be doing. (In many cases, for example, technology may eliminate the need for entire processes or activities.)

Folks on the edge of this CDO/CTO collaboration have a choice: do I become predominantly business facing (CDO), or do I become predominantly technology facing (CTO)? Folks may choose to specialise, or some, like myself, may choose to focus on different perspectives according to the needs of the organisation.

One interesting consequence of this approach is that it may address one of the biggest questions out there at the moment: how to absorb the services and capabilities offered by innovative, efficient and nimble external businesses into the architecture of the enterprise, while retaining control, compliance and agility over the data and processes? But more on this later.

So, where does all this sit with respect to the ‘3 pillars of a digital strategy’? The points raised there are still valid. With the right collaboration and priorities, it is possible to have different folks fill the roles of ‘client centricity’, ‘chief data officer’ and ‘head of enterprise complexity’. But for the most part, all these roles begin and end with data. A proper, disciplined and thoughtful approach to how and why data is captured, processed, stored, retrieved and transmitted should properly address firm-wide themes such as improving client experience, integrating innovative services, keeping a lid on complexity and retaining an appropriate level of enterprise agility.

Some folks may be wondering where the CIO fits in all this: the CIO maintains responsibility for the management and operation of all IT processes (as defined in The Open Group’s IT4IT model). This is a big job. But it is not the job of the CTO, whose focus is to ensure the right technology is brought to bear to solve the right problem.


Strategic Theme # 5/5: Machine Learning

[tl;dr Business intelligence techniques coupled with advanced data semantics can dynamically improve automated or automatable processes through machine learning. But 2015 is still mainly about exploring the technologies and use cases behind machine learning.]

Given the other strategic themes outlined in this blog (lean enterprise, enterprise modularity, continuous delivery & system thinking), machine learning seems to be a strange addition. Indeed, it is a very specialist area, about which I know very little.

What is interesting about machine learning (at least in the enterprise sense), is that it heavily leans on two major data trends: big data and semantic data. It also has a significant impact on the technology that is the closest equivalent to machine learning in wide use today: business intelligence (aka human learning).

Big Data

Big data is a learning area for many organisations right now, as it has many potential benefits. Architecturally, I see big data as an innovative means of co-locating business logic with data in a scalable manner. The traditional (non big-data) approach to co-locating business logic with data is via stored procedures. But everyone knows (by now) that while stored-procedure based solutions can enable rapid prototyping and delivery, they are not a scalable solution. Typically (after all possible database optimisations have been done) the only way to resolve performance issues related to stored procedures is to buy bigger, faster infrastructure. Which usually means major migrations, etc.

Also, it is generally a very bad idea to include business logic in the database: this is why so much effort has been expended in developing frameworks which make the task of modelling database structures in the middle tier so much easier.

Big data allows business logic to be maintained in the ‘middle’ tier (or at least not the database tier), although it changes the middle tier concept from the traditional centralised application server architecture to a fundamentally distributed cluster of nodes, using tools like Spark, Mesos and Zookeeper to keep the nodes running as a single logical machine. (This is different from clustering application servers for reasons of resilience or performance, where as much as possible the clustering is hidden from the application developers through often proprietary frameworks.)

While the languages (like Pig, Hive, Cascading, Impala, F#, Python, Scala, Julia/R, etc.) used to develop such applications continue to evolve, there is still some way to go before big-data frameworks emerge that are as sophisticated as JEE/Blueprint or Ruby on Rails are for traditional 3-tier architectures. And clearly ‘big data’ languages are optimised for queries rather than transactions.

Generally speaking, traditional 3-tier frameworks still make sense for transactional components, but for components which require querying/interpreting data, big data languages and infrastructure make a lot more sense. So increasingly we can see more standard application architectures using both (with sophisticated messaging technologies like Storm helping keep the two sides in sync).
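
To illustrate the query side of that split, here is a minimal PySpark sketch (the file path, column names and dataset are hypothetical) in which the aggregation logic is co-located with the data across the cluster – the kind of logic that might once have lived in a stored procedure:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or connect to) a Spark session; in production this would point at a cluster
spark = SparkSession.builder.appName("trade-aggregation").getOrCreate()

# Hypothetical trade data, partitioned across the cluster
trades = spark.read.parquet("/data/trades")

# The 'business logic' (filtering and aggregation) runs where the data lives
exposure_by_desk = (
    trades
    .filter(F.col("status") == "SETTLED")
    .groupBy("desk", "currency")
    .agg(F.sum("notional").alias("total_notional"),
         F.count("*").alias("trade_count"))
)

exposure_by_desk.show()
```

The transactional side of such an architecture would remain in a conventional 3-tier application, with messaging keeping the two sides in sync, as noted above.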

An interesting subset of the ‘big data’ category (or, more accurately, of the category of databases known as NoSQL) are graph databases. These are key for machine learning, as will be explained below. However, graph databases are still evolving when it comes to truly horizontal scaling, and while they are the best fit for implementing machine learning, they do not yet fit smoothly on top of ‘conventional’ big data architectures.
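
To give a flavour of why graphs suit this space, the sketch below uses the networkx library as an in-memory stand-in for a real graph database (the entities and relationships are invented) to show how connected facts can be traversed – the kind of relationship structure that machine learning over enterprise data relies on:

```python
import networkx as nx

# A tiny, illustrative knowledge graph of clients, trades and instruments
g = nx.DiGraph()
g.add_edge("Client A", "Trade 123", relation="executed")
g.add_edge("Trade 123", "Bond XYZ", relation="references")
g.add_edge("Client B", "Trade 456", relation="executed")
g.add_edge("Trade 456", "Bond XYZ", relation="references")

# Traverse the graph: which clients are exposed to 'Bond XYZ' via their trades?
clients = [
    client
    for trade in g.predecessors("Bond XYZ")
    for client in g.predecessors(trade)
]
print(clients)  # ['Client A', 'Client B']
```

A production graph database (Neo4j, for example) offers the same style of traversal via a declarative query language, with persistence and concurrency handled for you.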

Semantic Data

Semantic data has been around for a while, but only within very specialist areas focused on AI and related spheres. It has received more publicity in recent years through Tim Berners-Lee’s promotion of the concept of the semantic web. It requires discipline in managing information about data – or meta-data.

Initiatives like Linked Data, platforms like datahub.io, standards like RDF, coupled with increasing demand for Open Data are helping develop the technologies, tools and skillsets needed to make use of the power of semantic data.

Today, industry-standard semantic ontologies – which aim to provide consistency of data definitions within an industry – are thin on the ground, but they are growing. However, the most sophisticated ontologies are still private: for example, Wolfram Alpha has a very sophisticated machine learning engine (which forms part of Apple’s Siri capability), and it uses an internally developed ontology to interpret meaning. Wolfram Alpha has said that as soon as reliable industry standards emerge it would be happy to use them, but right now it may be leading the field in terms of general ontology development (with mobile voice tools like Apple Siri etc. close behind).

Semantic data is interesting from an enterprise perspective, as it requires knowing about what data you have, and what it means. ‘Meaning’ is quite subtle, as the same data field may be interpreted in different ways at different times by different consumers. For example, the concept of a ‘trade’ is fundamental to investment banking, yet the semantic variations of the ‘trade’ concept in different contexts are quite significant.
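
As a small, hedged illustration of what ‘knowing what your data means’ can look like in practice, the sketch below uses the rdflib library (the namespace and properties are invented for the example) to record a trade as RDF triples, so that the context in which the ‘trade’ concept is being used travels with the data itself:

```python
from rdflib import Graph, Literal, Namespace, RDF

# An invented vocabulary for the example; a real firm would use an agreed ontology
EX = Namespace("http://example.com/banking#")

g = Graph()
g.bind("ex", EX)

trade = EX["trade-123"]
g.add((trade, RDF.type, EX.Trade))
g.add((trade, EX.notional, Literal(1000000)))
g.add((trade, EX.counterparty, EX["client-a"]))
# Record explicitly which sense of 'trade' this is (e.g. front-office execution vs settlement leg)
g.add((trade, EX.tradeContext, Literal("front-office execution")))

print(g.serialize(format="turtle"))
```

The value lies less in the particular syntax than in the discipline: the meta-data is captured alongside the data, so downstream consumers do not have to guess which ‘trade’ they are looking at.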

Regulated organisations are increasingly under pressure to improve their data governance, and firms have many different reasons to get on top of their data:

  • to stay in business they need to meet regulatory needs;
  • to protect against reputational risk due to lost or stolen data;
  • to provide advanced services to clients that anticipate their needs or respond more quickly to client requests;
  • to anticipate and react to market changes and opportunities before the competition;
  • to integrate systems and processes efficiently with service providers and partners, both internally and externally;
  • to increase process automation and minimise unnecessary human touch-points;
  • etc.

A co-ordinated, centrally led effort to gather and maintain knowledge about enterprise data is necessary, supported by federated, bottom-up efforts which tend to be project focused.

Using and applying all the gathered meta-data is a challenge and an opportunity, and will remain high on the enterprise agenda for years to come.

Business Intelligence

Business intelligence solutions can be seen as a form of ‘human learning’. They help people understand a situation from data, which can then aid decision making processes. Eventually, decisions feed into system requirements for teams to implement.

But, in general, business intelligence solutions are not appropriate as machine learning solutions. In most cases the integrations are fairly unsophisticated (generally batch ETL), and their computational capabilities are optimised for non-technical users to define and execute. The reports and views created in BI tools are not optimised to be included as part of a high-performance application architecture, unlike big data tools.

As things stand today, the business intelligence and machine learning worlds are separate and should remain so, although eventually some convergence is inevitable. However, both benefit from the same data governance efforts.

Conclusions

Machine learning is a big topic, one that is ideally pursued in the same context as the other strategic themes. But for 2015 this technology is still in the ‘exploratory’ stage, so localised experiments will be necessary before the technology – and the business problems it can actually solve – can be fully exploited.


How ‘uncertainty’ can be used to create a strategic advantage

[TL;DR This post outlines a strategy for dealing with uncertainty in enterprise architecture planning, with specific reference to regulatory change in financial services.]

One of the biggest challenges anyone involved in technology has is in balancing the need to address the immediate requirement with the need to prepare for future change at the risk of over-engineering a solution.

The wrong balance over time results in complex, expensive legacy technology that ends up being an inhibitor to change, rather than an enabler.

It should not be unreasonable to expect that, over time, and with appropriate investment, any firm should have significant IT capability that can be brought to bear for a multitude of challenges or opportunities – even those not thought of at the time.

Unfortunately, most legacy systems are so optimised to solve the specific problem they were commissioned to solve, they often cannot be easily or cheaply adapted for new scenarios or problem domains.

In other words, as more functionality is added to a system, the ability to change it diminishes rapidly:

[Figure: Agility vs Functionality]

The net result is that the technology platform already in place is optimised to cope with existing business models and practices, but generally incapable of (cost effectively) adapting to new business models or practices.

Addressing this requires some forward thinking: specifically, what capabilities need to be developed to support where the business needs to be, given the large number of known unknowns? (Accepting that everybody is in the same boat when it comes to dealing with unknown unknowns.)

These capabilities are generally determined by external factors – trends in the specific sector, technology, society, economics, etc, coupled with internal forward-looking strategies.

An excellent example of where a lack of focus on capabilities has caused structural challenges is the financial industry. A recent conference at the Bank for International Settlements (BIS) highlighted the following capability gaps in how banks do their IT – at least as it relates to regulators’ expectations:

  • Data governance and data architecture need to be optimised in order to enhance the quality, accuracy and integrity of data.
  • Analytical and reporting processes need to facilitate faster decision-making and direct availability of the relevant information.
  • Processes and databases for the areas of finance, control and risk need to be harmonised.
  • Increased automation of the data exchange processes with the supervisory authorities is required.
  • Fast and flexible implementation of supervisory requirements by business units and IT necessitates a modular and flexible architecture and appropriate project management methods.

The interesting aspect about the above capabilities is that they span multiple businesses, products and functional domains. Yet for the most part they do not fall into the traditional remit of typical IT organisations.

Technology today is capable of delivering these requirements from a purely technical perspective: these are challenging problems, but for the most part they have already been solved, or are being solved, in other industries or sectors – sometimes at an even larger scale than banks have to deal with. However, finding talent is, and remains, an issue.

The big challenge, rather, is in ‘business-technology’ – that amorphous space that is not quite business, but not quite (traditional) IT either. This is the capability that banks need to develop: the ability to interpret what outcomes a business requires, and to map that not only to projects, but also to capabilities – both business capabilities and IT capabilities.

So, what core capabilities are being called out by the BIS? Here’s a rough initial pass (by no means complete, but hopefully indicative):

  • Data Governance: Increased focus on Data Ontologies, Semantic Modelling, Linked/Open Data (RDF), Machine Learning, Self-Describing Systems, Integration
  • Analytics & Reporting: Application of Big Data techniques for scaling timely analysis of large data sets, not only for reporting but also as part of feedback loops into automated processes. Data science approach to analytics.
  • Processes & Databases: Use of meta-data in exposing capabilities that can be orchestrated by many business-aligned IT teams to support specific end-to-end business processes. Databases only exposed via prescribed services; model-driven product development; business architecture.
  • Automation of data exchange: Automation of all report generation, approval, publishing and distribution (i.e., throwing people at the problem won’t fix this)
  • Fast and flexible implementation: Adoption of modular-friendly practices such as portfolio planning, domain-driven design, enterprise architecture, agile project management, & microservice (distributed, cloud-ready, reusable, modular) architectures

It should be obvious looking at this list that it will not be possible or feasible to outsource these capabilities. Individual capabilities are not developed in isolation: they complement and support each other. Therefore they need to be developed and maintained in-house – although vendors will certainly have a role in successfully delivering these capabilities. And these skills are quite different from the skills existing business & IT folks have (although some are evolutionary).

Nobody can accurately predict what systems need to be built to meet the demands of any business in the next 6 months, let alone 3 years from now. But the capabilities that separate the winners from the losers in a given sector are easier to identify. Banks in particular are under massive pressure: regulatory demands, major shifts in market dynamics, competition from smaller, more nimble alternative financial service providers, and rapidly diminishing technology infrastructure costs levelling the playing field for new contenders.

Regulators have, in fact, given banks a lifeline: those that heed the regulators and take appropriate action will actually be in a strong position to deal competitively with significant structural change to the financial services industry over the next 10+ years.

The changes (client-centricity, digital transformation, regulatory compliance) that all knowledge-based industries (especially finance) will go through will depend heavily on all of the above capabilities. So this is an opportunity for financial institutions to get 3 for the price of 1 in terms of strategic business-IT investment.


THE FUTURE OF ESBs (AND SOA)

There are some interesting changes happening in technology, which will likely fundamentally change how IT approaches technology like Enterprise Service Buses (ESBs) and concepts like Service Oriented Architecture (SOA).

Specifically, those changes are:

  • An increased focus on data governance, and
  • Microservice technology

Let’s take each in turn, and conclude by suggesting how this will impact how ESBs and SOA will likely evolve.

Data Governance

Historically, IT has had an inconsistent record with respect to data governance. For sure, each application often had dedicated data modellers or designers, but its data architecture tended to be very inward-focused. Integration initiatives tended to focus on specific projects with specific requirements, and data was governed only to the extent that it enabled individual project objectives to be achieved.

Sporadic attempts at creating standard message structures and dictionaries crumbled in the face of meeting tight deadlines for critical business deliverables.

ESBs, except in the most stable, controlled environments, failed to deliver the anticipated business benefits: heavy-weight ESBs turned out to be at least as un-agile as the applications they were intended to integrate, and since the requirements on the bus evolve continually, application teams tended to favour reliable (or at least predictable) point-to-point solutions over enterprise solutions.

But there are three new drivers for improving data governance across the enterprise, and not just at the application level. These are:

  • Security/Privacy
  • Digital Transformation
  • Regulatory Control

The security/privacy agenda is the most visible, as organisations are extremely exposed to reputational risk if there are security breaches. An organisation needs to know what data it has where, and who has access to it, in order to ensure it can protect it.

Digital transformation means that every process is a digital-first process (or ‘straight-through-processing’ in the parlance of financial services). Human intervention should only be required to handle exceptions. And it means that the capabilities of the entire enterprise need to be brought to bear in order to provide a consistent connected customer experience.

For regulated industries, government regulators are now insisting that firms govern their data throughout that data’s entire lifecycle, not only from a security/privacy compliance perspective, but also from the perspective of being able to properly aggregate and report on regulated data sets.

The same governance principles, policies, processes and standards within an enterprise should underpin all three drivers – hence the increasing focus on establishing the role of ‘chief data officer’ within organisations, and on resourcing that role to materially improve how firms govern their data.

Microservice Technology

Microservice technology is an evolution of modularity in monolithic application design that started with procedures, and evolved through to object-oriented programming, and then to packages/modules (JARs and DLLs etc).

Along the way there were attempts to extend the metaphor to distributed systems – e.g., RPC, CORBA, SOA/SOAP, and most recently RESTful APIs – in addition to completely different ‘message-driven’ approaches such as that advocated by the Reactive development community.

Unfortunately, until fairly recently, most applications behind distributed end-points were architecturally monolithic – i.e., complex applications that needed to go through significant build-test-deploy processes for even minor changes, making it very difficult to adapt these applications in a timely manner to external change factors, such as integrations.

The microservices movement is a reaction to this, where the goal is to be able to deploy microservices as often as needed, without the risk of breaking the entire application (and with a simple rollback process if something does break). In addition, microservice architectures are inherently amenable to horizontal scaling, a key factor behind their use within internet-scale technology companies.

So, microservices are an architectural style that favours agile, distributed deployment.

As such, one benefit behind the use of microservices is that it allows teams, or individuals within teams, to take responsibility for all aspects of the microservice over its lifetime. In particular, where microservices are exposed to external teams, there is an implied commitment from the team to continue to support those external teams throughout the life of the microservice.

A key aspect of microservices is that they are fairly lightweight: the developer is in control. There is no need for specific heavyweight infrastructure – in fact, microservices favour anti-fragile architectures, with abundant low-cost infrastructure.

Open standards such as OSGi and abstractions such as Resource Oriented Computing allow microservices to participate in a governed, developer-driven context. And in the default (simplest) case, microservices can be exposed using plain-old RESTful standards, which every web application developer is at least somewhat familiar with.
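
As a simple illustration of that ‘plain-old RESTful’ default, here is a minimal sketch using Flask (the endpoint, port and data are hypothetical) of a self-contained microservice exposing one governed data concept over HTTP:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory data; a real microservice would own its datastore
CLIENTS = {
    "client-a": {"name": "Client A", "domicile": "UK"},
    "client-b": {"name": "Client B", "domicile": "DE"},
}

@app.route("/clients/<client_id>", methods=["GET"])
def get_client(client_id):
    """Return a single client record, or a 404 if it is unknown."""
    client = CLIENTS.get(client_id)
    if client is None:
        return jsonify({"error": "client not found"}), 404
    return jsonify(client)

if __name__ == "__main__":
    # A lightweight, independently deployable endpoint - no central bus required
    app.run(port=8080)
```

Each such endpoint can be versioned, deployed and retired independently, which is what makes the ‘smart endpoints’ style described below workable without a heavyweight ESB.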

Data Governance + Microservices = Enterprise Building Blocks

Combining the benefits of both data governance and microservices means that firms can, for the first time, start building up a real catalogue of enterprise-reusable building blocks – but without the need for a traditional ESB, or traditional ESB governance. Microservices are developed in response to developer needs (perhaps influenced by Data Governance standards), and Data Standards can be used to describe, in an enterprise context, what those (exposed) microservices do.

Because microservices technologies allow ‘smart endpoints’ to be easily created and integrated into an application architecture, the need for a central ‘bus’ is eliminated. Developers can create many endpoints with limited complexity overhead, and over time can converge these into a small number of common services.

With respect to the Service Registry function provided by ESBs, the new breed of API Management tools may be sufficient to provide any lookup/resolution capabilities required (above and beyond those provided by the microservice architecture itself). API Management tools also keep complexity out of API development by taking care of monitoring, analytics, authentication, protocol conversion and basic throttling capabilities – for those APIs that require those capabilities.

Culturally, however, microservices requires a collaborative approach to software development and evolution, with minimum top-down command-and-control intervention. Data governance, on the other hand, is necessarily driven top-down. So there is a risk of a cultural conflict between top-down data governance and bottom-up microservice delivery: both sides need to be sensitive to the needs of the other side, and be prepared to make compromises occasionally.

In conclusion, the ESB is dead…but long live (m)SOA.
