The Meaning of Data

[tl;dr The Semantic Web may be a CDO's best friend in their efforts to help the business realise the full value of enterprise data.]

Large organisations with complex legacy infrastructure face a dilemma: the technology that allowed them to grow and capture market share has reached a turning point in terms of the value returned from additional spending. The only practical investments left are those that shore up (protect) existing investments: performing essential maintenance, focusing on security and (perhaps) moving to lower-cost, cloud-based infrastructure. Mandatory (usually regulatory) enhancements also need to be made.

But in terms of protecting existing revenue or capturing new markets, legacy applications are often seen as a burden. Businesses are increasingly looking outside their core IT capability to address their needs – especially for technologies that can help pinpoint opportunities in the first place (i.e., analysing and processing the digital footprints left by customers and prospective customers, whether those footprints are internal or external to the firm).

Technology is both the problem and the solution. And herein lies the dilemma: internal IT organisations cannot possibly advise on every technology-enabled solution a business might need. Rather, businesses need to become more “tech-savvy” – a term that is bandied about a lot these days.

What does “tech-savvy” actually mean, and where is the line drawn between a “tech-savvy” business person and the professional IT service provider? And what form should this ‘line’ take?

Arguably, most businesses have always been tech-savvy: by and large, they knew where technology was necessary to grow their business. So the most successful firms have a *lot* of technology. Does that mean those businesses are tech-savvy?

Yes, if those firms are able to manage the complexity of their IT – in other words, to be able to adapt their IT to shifting market needs, and to incorporate/absorb innovations into their architecture without significantly reducing that agility. In practice, few firms have such mastery of their IT (and, by implication, their processes and business information systems).

So a new form of “tech-savvy” is needed: one that allows business folks to exploit opportunities both to find customers and to meet their needs (profitably), while preventing a build-up of unsustainable complexity in processes and systems.

In essence, tech-savvy business folk need to get better at understanding data. And IT needs to get much better at making meaningful business data available to their business – irrespective of where that business data may actually reside.

What does this mean in practice? It’s all about semantics – something previously in the domain of data geeks. But major initiatives like the Semantic Web (led by Tim Berners-Lee, of WWW fame) are making semantics useful to traditionally non-technical people.

The tools available to business folks for discovering what information (i.e., data + context) exists are very poor, and even when the required data is found (i.e., its location is known) it can be difficult to extract it and use it productively.

A good example of this may be observed in the proliferation of Excel spreadsheets and Access databases that support critical business functions in organisations. These tools are necessary because the data those areas need from various systems is not where they need it, when they need it – and that is often because business needs change faster than IT can capture, absorb and deliver requirements.

Over time (at least in principle), all meaningful business data will (must) end up being processed exclusively by controlled enterprise systems. But this doesn’t mean waiting until IT has made the necessary changes before the business can effectively govern that data and make maximum use of it.

A key principle behind the semantic web is that data can be anywhere. In an enterprise scenario, that means the data could exist in:

  • an internal enterprise system
  • a file system (e.g., Excel sheet or Access database)
  • a website (file download, or website screen scraping)
  • a commercial information provider (Reuters, Bloomberg, etc)
  • a business partner/supplier
  • etc

To take advantage of this, metadata (i.e., data about data) needs to be made available to businesses, and it needs to be business-relevant and agnostic to internal IT systems and processes.
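To make the idea concrete, here is a minimal sketch, assuming metadata is recorded as simple (subject, predicate, object) triples, the basic shape of the RDF data model. The dataset identifiers, the predicate names (`describes`, `heldIn`, `stewardedBy`) and the locations are hypothetical examples, not a standard vocabulary. A business user looks data up by the business concept it describes; the physical location is just another fact about the dataset.

```python
# Metadata ("data about data") recorded as (subject, predicate, object) triples,
# the basic shape of RDF. Every identifier and predicate here is hypothetical.
METADATA = {
    ("ds:complaints", "describes", "Customer complaint"),
    ("ds:complaints", "heldIn", "CRM system"),
    ("ds:complaints", "stewardedBy", "Customer Service"),
    ("ds:q3_sales", "describes", "Regional sales"),
    ("ds:q3_sales", "heldIn", "Excel workbook on a shared drive"),
    ("ds:q3_sales", "stewardedBy", "Sales Operations"),
    ("ds:fx_rates", "describes", "Foreign exchange rate"),
    ("ds:fx_rates", "heldIn", "Commercial information provider"),
    ("ds:fx_rates", "stewardedBy", "Treasury"),
}


def datasets_about(metadata, business_concept):
    """Find datasets by the business concept they describe, not by system."""
    return sorted(s for (s, p, o) in metadata
                  if p == "describes" and o == business_concept)


def where_is(metadata, dataset):
    """Return (location, steward) for a dataset; None marks a missing fact."""
    facts = {p: o for (s, p, o) in metadata if s == dataset}
    return facts.get("heldIn"), facts.get("stewardedBy")


if __name__ == "__main__":
    for ds in datasets_about(METADATA, "Regional sales"):
        print(ds, where_is(METADATA, ds))
```

If the sales workbook later moves into an enterprise system, only its `heldIn` fact changes; the business-facing lookup stays the same.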

More than this, the tools needed to discover useful information, and to retrieve and process that information, need to be available to tech-savvy business folk. The right set of tools, suitably agnostic to specific architectures and systems, can allow businesses to explore new opportunities and business models while the IT systems and platforms catch up and evolve over time – noting that there is no presumption that useful business data ultimately originates from, or is stewarded by, internal systems managed by IT.

Such tools also need to enforce compliance, ensuring that only the right people get access to the right data and (if necessary) limiting people's ability to pull data outside of the retrieval platform. In essence, they provide a controlled sandbox in which businesses can distil value from information, whatever its source.

Many organisations may approach this by implementing a ‘data lake’. This may (perhaps) be a workable solution for all data assets managed by the IT organisation. But it is not feasible for many sets of data which are not managed by the IT organisation, but which still have business utility.

Emerging standards and technologies are evolving to meet this need for a data-centric view of information systems – in particular RDF, OWL and related ‘ontology’ languages, as well as standards like SPARQL, which enable the discovery and retrieval of data originating from multiple sources. But the tools are still very primitive and generally demand too much technical knowledge. Nevertheless, efforts such as the Callimachus Project hint at the potential of data-driven applications.
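To illustrate what SPARQL-style retrieval across sources involves, here is a toy, in-memory sketch in plain Python (not rdflib, and not a real SPARQL engine). It assumes two hypothetical sources, an internal CRM and a supplier feed, each publishing triples. Merging them is just a set union, and a query is a list of triple patterns whose `?variables` must bind consistently, which is roughly how a SPARQL basic graph pattern behaves. All identifiers and predicates are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical data from two independent sources.
CRM_TRIPLES = {
    ("cust:42", "name", "Acme Ltd"),
    ("cust:42", "buysProduct", "prod:7"),
    ("cust:99", "name", "Globex"),
    ("cust:99", "buysProduct", "prod:3"),
}
SUPPLIER_TRIPLES = {
    ("prod:7", "leadTimeDays", "14"),
    ("prod:3", "leadTimeDays", "30"),
}


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all patterns (naive nested-loop join)."""
    triples = list(graph)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    merged = CRM_TRIPLES | SUPPLIER_TRIPLES  # combining sources is a set union
    rows = query(merged, [
        ("?customer", "name", "?name"),
        ("?customer", "buysProduct", "?product"),
        ("?product", "leadTimeDays", "?days"),
    ])
    for row in sorted(rows, key=lambda r: r["?name"]):
        print(row["?name"], row["?product"], row["?days"])
```

The join is deliberately naive; a real triple store indexes its triples, and a production SPARQL endpoint would also need to enforce access controls.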

At some point, it will not be unreasonable to ask an IT organisation to expose all of its data assets semantically via SPARQL endpoints (with appropriate access controls), and to provide tools that allow businesses to explore that data and (where permissible) incorporate it into spreadsheets, models and other tools, realising value from that data without requiring IT change requests. Developing the capability to understand semantic data, and to use it to commercial advantage, will take time, but it will be a worthwhile investment (arguably one to make before folks start spending money on ‘big data’ projects that have no wider context).

In fact, I would go so far as to say that any provider (startup or established) delivering technology services to a business should provide SPARQL endpoints as a matter of course. Making data useful and available to the business will help the business realise the value of that data and get more involved in how it is captured, processed and stored in the future – and will make it easier to incorporate third-party solution providers into business operations.

In a nutshell, the Semantic Web may be the CDO's best friend in their efforts to help the business realise the full value of enterprise data.
