The Meaning of Data

[tl;dr The Semantic Web may be a CDO's best friend in their efforts to help the business realise the full value of enterprise data.]

Large organisations with complex legacy infrastructure are faced with a dilemma: the technology that allowed them to grow and capture market share has reached a turning point in terms of the value returned from additional spending. The only practical investments that can be made are to shore up (protect) those investments by performing essential maintenance, focusing on security, and (perhaps) moving to lower-cost, cloud-based infrastructure. Mandatory (usually regulatory) enhancements must also be made.

But in terms of protecting existing revenue or capturing new markets, legacy applications are often seen as a burden. Businesses are increasingly looking outside their core IT capability to address their needs – especially for technologies that can help pinpoint opportunities in the first place (i.e., analysing and processing the digital footprints left by customers and prospective customers, whether those footprints are internal to the firm or external).

Technology is both the problem and the solution. And herein lies the dilemma: internal IT organisations cannot possibly advise on every technology-enabled solution available to a business. Rather, businesses need to become more “tech-savvy” – a term which is bandied about a lot these days.

What does “tech-savvy” actually mean, and where is the line drawn between a “tech-savvy” business person and the professional IT service provider? And what form should this ‘line’ take?

Arguably, most businesses have always been tech-savvy: they knew by-and-large where technology was necessary to grow their business. So for the most successful firms, there is a *lot* of technology. Does that mean those businesses are tech-savvy?

Yes, if those firms are able to manage the complexity of their IT – in other words, to be able to adapt their IT to shifting market needs, and to incorporate/absorb innovations into their architecture without significantly reducing that agility. In practice, few firms have such mastery of their IT (and, by implication, their processes and business information systems).

So a new form of “tech-savvy” is needed – one that allows business folks to leverage opportunities both to find customers and to meet their needs (profitably), while preventing an unsustainable build-up of complexity in processes and systems.

In essence, tech-savvy business folk need to get better at understanding data. And IT needs to get much better at making meaningful business data available to their business – irrespective of where that business data may actually reside.

What does this mean in practice? It’s all about semantics – something previously in the domain of data geeks. But major initiatives like the semantic web (led by Tim Berners-Lee, of WWW fame) are making semantics useful to traditionally non-technical people.

The tools available to business folks for discovering what information (i.e., data + context) exists are very poor, and even when the required data is found (i.e., it is known where it resides), it can be difficult to extract it and use it productively.

A good example of this may be observed in the proliferation of Excel spreadsheets and Access databases that support critical business functions within organisations. These tools are necessary because the data those functions need from various systems is not where they need it, when they need it – and that is often due to business needs changing faster than IT can capture, absorb and deliver requirements.

Over time (at least in principle), all meaningful business data will (must) end up being processed exclusively by controlled enterprise systems – but this doesn’t mean waiting until IT has made the necessary changes before governing that data effectively and making maximum use of it.

A key principle behind the semantic web is that data can be anywhere. In an enterprise scenario, that means the data could exist in:

  • an internal enterprise system
  • a file system (e.g., Excel sheet or Access database)
  • a website (file download, or website screen scraping)
  • a commercial information provider (Reuters, Bloomberg, etc)
  • a business partner/supplier
  • etc

To take advantage of this, meta-data (i.e., data about data) needs to be made available to businesses, and it needs to be business-relevant and agnostic to internal IT systems and processes.
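
As an illustrative sketch (not a prescription), such business-relevant metadata could be captured with the W3C’s DCAT vocabulary, keeping the description of a dataset independent of whichever system happens to hold it. The dataset URI and properties below are hypothetical, and the example assumes Python’s rdflib library:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

# Hypothetical dataset, described in business terms rather than IT-system terms.
ds = URIRef("http://example.com/data/client-positions")
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Client positions (end of day)")))
g.add((ds, DCTERMS.description,
       Literal("Daily snapshot of client positions, whatever system produces it.")))
g.add((ds, DCAT.keyword, Literal("positions")))
g.add((ds, DCAT.keyword, Literal("clients")))

# Serialise as Turtle: the same description works whether the data lives in an
# enterprise system, a spreadsheet, a website or an external provider.
print(g.serialize(format="turtle"))
```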

More than this, the tools needed to discover useful information, and to retrieve and process that information, need to be available to tech-savvy business folk. The right set of tools, suitably agnostic to specific architectures and systems, can allow businesses to explore new opportunities and business models, while allowing the IT systems and platforms to catch up and evolve over time – noting that there is no presumption that the eventual source of useful business data originates from, or is stewarded by, internal systems managed by IT.

Such tools also need to enforce compliance, ensuring only the right people get access to the right data, and (if necessary) limiting the ability for people to pull data outside of the retrieval platform. In essence, they provide a controlled sandbox in which businesses can distil value from information, whatever its source.

Many organisations may approach this by implementing a ‘data lake’. This may (perhaps) be a workable solution for all data assets managed by the IT organisation, but it is not feasible for the many data sets which are not managed by the IT organisation yet still have business utility.

Emerging standards and technologies are evolving to meet this need for a data-centric view of information systems – in particular, RDF, OWL and related ‘ontology’ languages, as well as standards like SPARQL, which enable the discovery and retrieval of data originating from multiple sources. But tools are still very primitive and generally require too much technology savviness. However, efforts like the Callimachus Project give a hint of the potential of data-driven applications.

At some point, it will not be unreasonable to ask an IT organisation to expose all of its data assets semantically via SPARQL end-points (with appropriate access controls), and to provide tools that allow businesses to explore that data and (where permissible) incorporate it into spreadsheets, models and other tools, in ways that allow the business to realise value from that data without requiring IT change requests. Developing the capabilities to understand semantic data, and to use it to commercial advantage, will take time, but it will be a worthwhile investment (and arguably one to make before folks start spending money on ‘big data’ projects that have no wider context).
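
As a minimal sketch of what this could look like from the business side – assuming a hypothetical internal end-point at https://data.example.com/sparql and Python’s SPARQLWrapper library – the results of a SPARQL query could be pulled straight into a spreadsheet-friendly CSV:

```python
import csv
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical internal end-point; any SPARQL 1.1 endpoint works the same way.
sparql = SPARQLWrapper("https://data.example.com/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct:  <http://purl.org/dc/terms/>
    SELECT ?dataset ?title WHERE {
        ?dataset a dcat:Dataset ;
                 dct:title ?title .
    } LIMIT 100
""")
results = sparql.query().convert()

# Flatten the bindings into a CSV that opens directly in a spreadsheet.
with open("datasets.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["dataset", "title"])
    for row in results["results"]["bindings"]:
        writer.writerow([row["dataset"]["value"], row["title"]["value"]])
```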

In fact, I would go so far as to say that any provider (startup or established) delivering technology services to a business should provide SPARQL endpoints as a matter of course. Making data useful and available to the business will help the business realise the value of data and get more involved in how it is captured, processed and stored in the future – and make it easier to incorporate 3rd party solution providers into business operations.

In a nutshell, the Semantic Web may be the CDO's best friend in their efforts to help the business realise the full value of enterprise data.


The 3 pillars of a ‘digital’ strategy

The term ‘digital’ is bandied around quite a lot, so it is useful to be precise about what is meant by it.

I believe a ‘digital’ strategy for a business must formally address all of the following 3 pillars if it is to be considered a true ‘digital’ strategy and to achieve the goals of the business:

  • Customer- or client-centricity
  • Data is an asset
  • Achieve and maintain business agility

Customer- or client-centricity – this is about understanding the client’s needs, and in effect providing the whole capability of the organisation to meet the client’s needs effectively and efficiently. The client is assumed to be the primary source of revenue, and the balance between meeting the firm’s wants (i.e., to get clients and to make as much money as possible from them) and the client’s needs (i.e., to get the service they need) will very much shape any client-centricity programme.

Many ‘digital’ efforts tend to focus exclusively on this space, as this involves big data, social, mobile, etc, etc. It is about building (mobile or traditional desktop) applications to meet client needs, it is about providing clients a richer user experience, it is about clients being able to interact with the firm through a single portal tuned for their needs, and not the firm’s own wants. It is about joining processes and functions which historically have acted as islands.

Client-centricity efforts are usually led by the CEO and/or the COO.

Data is an asset

‘Data is an asset’ implies you ‘know what you’ve got’: what data the firm has, and where it resides.

This is becoming a bigger priority for more and more firms, as they realise that key strategic goals such as knowing your client’s needs, meeting demanding regulatory obligations or improving operational excellence are all but impossible to achieve without some stewardship of the vast amounts of data that the organisation captures every day.

Stewardship means putting in place principles, practices and tools to allow data to be discovered and accessed when and where it is needed, and to avoid the creation of redundant or duplicative processes or systems.

Data stewardship needs to be led at the level of COO at least, and potentially CEO.

Achieve and maintain business agility

Business agility is the ability to rapidly and sustainably respond to change. Many businesses have been agile in the past, in the sense that they see market opportunities and do the work needed to take advantage of them, usually within a 1-year timeframe and often much less.

However, as the complexity in an organisation grows, the ability to respond rapidly to change decreases, until eventually sclerotic processes and systems force a massive re-investment in technology with the corresponding high cost and high risk.

So maintaining business agility is a complexity management activity: first to control the complexity and then to manage it.

As with the others, business agility needs to be led at the level of COO at least, and potentially CEO. It should not be the responsibility of the CIO on their own.

Summary

At a minimum, all three of the above pillars need to be in place to achieve a successful digital transformation effort. The need for the first two pillars is generally well understood at the board and ‘CXO’ level of firms.

But here’s the question: what are they doing about business agility and complexity management? The answer is buried in the murky world of ‘enterprise architecture’, a discipline which has never quite settled down into a steady-state agreement of what it is or what it should focus on, or who should be accountable for doing it.

In a digital context Enterprise Architecture should be first and foremost about complexity management in the context of business agility – especially when viewed in light of the other two executive areas of focus. (This also applies in other non-digital business contexts, such as Mergers & Acquisitions, or Regulatory Compliance.)

Business agility requires agility in the underlying technology. The way to achieve and maintain agility is through managing complexity *continuously* – the complexity of business processes, and the complexity of the supporting IT.

A proven way to address complexity is to partition or modularise activities, from the enterprise all the way down, and to establish principles for identifying partitions and for governing change within and across partitions.

This requires new practices and skills which are specifically focused on these principles. These may be skills that more mature IT organisations already have, but the practice is a business-technology endeavour: in the end, in the same way the CEO demands focus on customers and data, they must also demand business agility and hold their business and IT to account for achieving it.

An excellent book on this topic is Roger Sessions’ ‘Simple Architectures for Complex Systems’, ideas from which will feature strongly in future posts. A related enabler is the ‘Scaled Agile Framework (SAFe)’, which drives business agility goals into the execution of projects and programs.


Understanding Big Data

Big Data is a Big Topic. So I’m trying to get my head around a few basic concepts. 

My interest stems from the following areas:

  • Managing data complexity – most organisations have more data than they know what to do with. In particular, data semantics is a big problem, as is the ability to find and access the data that is needed.
  • Machine learning – the ability to infer meaning from data and use this in highly automated processes on a real-time or near-real-time basis – e.g., Amazon/Netflix recommendation engines (a tiny sketch of the idea appears after this list). Any process which needs a human ‘4-eyes’ check is a good candidate for this. This article from IBM is a good synopsis of open source technologies that can aid machine learning. Note – this is distinct from business intelligence, which (today) assumes that the report produced by the system is the end product; i.e., business intelligence is not in and of itself intended to be part of an automated process. But the lines between business intelligence and machine learning can be expected to blur.
  • Data Discovery – for many organisations, finding the data you need is a big challenge. Graph databases, triple-stores and open standards like RDF offer a way for these to be useful to, and accessible to, non-architects. In large corporate environments, for example, these technologies can enable the creation of a useful who’s-who of experts in different technologies, recognising that the universe of technologies is constantly changing, and many technologies are closely related to each other or tend to be used together. Data discovery initiatives like Datahub and Linked Data are worth watching, as are the W3C’s efforts around the Semantic Web.
  • Modularity and Data Persistence – the relationship between data and services is historically a challenging one, with the natural tendency to have business logic as close to the data as possible (e.g., stored procedures, etc). The sheer number of alternative data store/retrieval options means that it is even more important to separate the implementation of modules from their APIs: by all means (if you must) mix data and logic in the implementation, but do not expose the data any other way except via the module API, or you will lose control of the data. This means more and more data should be exposed via services, and business logic should access the data via these services only. In principle, this allows business functionality to be exposed as modules, and data services to support multiple modules without compromising principles of modularity. It also allows a degree of flexibility over which of the many persistence solutions should be used for a given problem. A minimal sketch of this separation follows the list below.
  • Containers – many database technologies today can be deployed into a self-contained environment, as they expose their interfaces through open APIs (such as REST, etc). So they can be isolated from the technology and architecture of the rest of your platform (in much the same way your Oracle database can be on Unix, and your clients running Windows, etc). Technologies like Docker and Mesos enable distributed databases to be built on commodity technology, enabling capacity and resilience to be added horizontally (by adding more commodity nodes) rather than vertically (more big iron). The relationship between these technologies and modular, service-oriented architectures is still rather immature; however, the trend is evident and has significant implications for architectural design.
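
For the machine-learning point above, here is a deliberately tiny sketch of the idea behind item-based recommendations (the kind of inference Amazon/Netflix-style engines perform at vastly greater scale); the ratings matrix is made up for illustration:

```python
import numpy as np

# Rows = users, columns = items; 1.0 means the user consumed/liked the item.
ratings = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
])

# Item-item cosine similarity, inferred from co-consumption patterns.
norms = np.linalg.norm(ratings, axis=0)
similarity = (ratings.T @ ratings) / (np.outer(norms, norms) + 1e-9)

def recommend(user: int, top_n: int = 2) -> np.ndarray:
    """Score items by similarity to what the user already liked."""
    scores = similarity @ ratings[user]
    scores[ratings[user] > 0] = -np.inf  # exclude items the user already has
    return np.argsort(scores)[::-1][:top_n]

print(recommend(0))  # item indices ranked for user 0
```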

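And for the modularity point, a minimal sketch (hypothetical names throughout) of exposing data only through a module API, so the persistence choice behind it can change without leaking into consumers:

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class Client:
    client_id: str
    name: str

class ClientDataService(Protocol):
    """The module API – the only sanctioned route to client data."""
    def get_client(self, client_id: str) -> Optional[Client]: ...

class SqliteClientDataService:
    """One implementation; a graph or document store could sit here instead."""
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn  # persistence detail, never exposed beyond the API

    def get_client(self, client_id: str) -> Optional[Client]:
        row = self._conn.execute(
            "SELECT client_id, name FROM clients WHERE client_id = ?",
            (client_id,),
        ).fetchone()
        return Client(*row) if row else None

# Consumers depend on ClientDataService, not on the database underneath.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO clients VALUES ('c-001', 'Acme Ltd')")
service: ClientDataService = SqliteClientDataService(conn)
print(service.get_client("c-001"))
```
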
I don’t pretend to fully understand the nature or implications of all of the above – the real world will decide what is useful and what is not. But there are a number of trends here that are key:

  • Increasing focus on data semantics and data discovery
  • Massive innovation in database technologies – no one-size fits all solution
  • Technologies to support infrastructure management are advancing in lock-step with the advances in database technologies
  • Technologies to be able to do something useful with all this data on a (near) real-time basis are also improving dramatically.

All of the above is mainly concerned with data-at-rest; how data gets from where it is now to where it is needed, without resorting to building point-to-point interfaces, is a whole different subject.
