Although companies today live and die by the quality of their data, there is still no universally accepted way to measure data quality. There are beneficial data inputs, such as good intelligence about your customers, and then there are the data gaps. How do we measure the net sum of all those parts?

Investors generally have low expectations about the quality of firms' data. That helps explain why the market often seems less than surprised by a data-related or data security-related fiasco, such as a bad trade made based on faulty data or a cyber breach. Yet companies that do manage their data well often fail to communicate that success effectively, losing an opportunity to build brand loyalty and retain clients.

In effect, companies need a "data balance sheet" to convey their data-related advantages to consumers, investors and other key stakeholders, because right now a lack of knowledge about data infrastructure — both good and bad — only helps the companies that are doing a poor job of managing their data.

To begin to build a data balance sheet, institutions should think about what their data "liabilities" and data "assets" are. Of course, a financial balance sheet attains its "balance" when assets equal liabilities. The objective of a data balance sheet is different: to show a balance tipped toward the assets.


Chief data officers seem to spend most of their time on the poor data areas, i.e., the liabilities. Liabilities are data elements that are not well understood and that can lead to costly failures and incidents. One example, highlighted in the movie "The Big Short," was the data around subprime mortgages. Banks did not understand the extent of their exposure because they were not properly measuring it. That ended up killing some of them.

Another example of a data liability relates to rogue traders, who can take advantage of bad or fuzzy data. Such data ranges from poorly maintained lists of individuals who have access to back-office systems (certain rogue traders have been able to profit from such uncertainty) to lists of which counterparties are real and which are not. Other traders have altered spreadsheets to distort a bank's true market risk position. This happened in the London Whale incident, for example.

These traders bank on the idea that nobody knows that these data inputs are bad or cares to find out how bad. Fuzzy data can go undetected in a complex trading environment for long periods.

Banks have also failed stress tests because they could not trace their data from its origin to the report on the stress test result. Although a bank's stress test result may have pointed to a "pass," the institution can sometimes be unaware of how its data produced that result, and that can make it hard to prove success to the regulators.

Perhaps the most obvious category of data liabilities is data breaches: unwanted penetrations of a company's data that do nothing to build investor confidence. The most recent high-profile incident, of course, involved Yahoo and its email subscribers, but other examples abound. Despite high levels of security spending, such incidents continue to occur.


So what about the asset side, the data that banks can leverage?

The first data asset is a company's secret sauce, the proprietary information that is kept under wraps and is crucial to the company's success. A good example of this has been the widely held belief that the Coca-Cola recipe is a trade secret well protected by the beverage company and known by only a few employees. The company purports to keep the "secret formula" in a "vault" that has been made an exhibit at the World of Coca-Cola museum in Atlanta. However, there have been claims that the recipe was uncovered.

Every successful company has a secret sauce of its own. But does it protect that secret adequately from its competitors? Does it use the secret data, and its methods for protecting that data, to their full potential?

The second major asset category is customer data. It is not the volume or the lists of customers that a company holds that matter. It is the data about them: how well a company knows those customers, and how well attuned it is to their moods, their desires, their needs and wants. This is data that many companies miss. Is there a team in place to monitor and feed social media?

A third asset category is data on potential customers. Again, it is not just a matter of having a list of them, but of understanding their issues, needs and concerns with their current providers.

The fourth data asset category is employees. Does a company have good data about its employees? Does it really know its top performers, for instance? Baseball and basketball teams, for example, have looked hard at the data, and this has upended long-held views about which players are really driving performance. Do banks have truly good and reliable data on the performance of traders, salespeople and others? Some banks do but others don't. Investors have little way to assess which companies are making strides in this area and which are not.

The Data Balance Sheet

A data balance sheet approach could help companies better understand their data quality — in essence knowing what they know. In turn, this could help provide greater transparency to investors.

There are, however, several challenges to executing this approach. The first is how to measure liabilities if one does not know they exist. In the case of "The Big Short," if the banks did not realize they had a lack of data about subprime credits, how could they have been transparent about that? In such cases, we must acknowledge the difficulty in providing such transparency. Nevertheless, providing information on complex securities, on potential gaps in understanding that information and on the progress in filling such gaps can help acknowledge the potential for data liabilities.

Another challenge is companies' understandable caution about providing such transparency to investors. Those companies that manage their data well, however, have much to gain from such transparency. A data balance sheet is not intended to provide the same degree of precision as a traditional financial balance sheet. Rather, it should provide indications of the ability of a company to manage its data assets and liabilities.

For example, what investments have been made to measure employee performance data? What about investments in data quality tools, or in cybersecurity to prevent and protect against data breaches? Such information may provide some assistance to investors as they try to figure out which companies handle their data well and which do not. Additional data, such as metrics around data breaches and information leakages, would also support that effort.

Third, companies need to invest in tools, systems and employees to be able to execute on the data balance sheet approach. Unlike with financial balance sheets, there is currently no single system that can act as a general ledger for data. But assembling a set of tools and systems that can analyze and measure the change over time in the quality of data will be critical in enabling companies to report the state of their data balance sheet more effectively to management, regulators and ultimately investors.
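To make the idea of measuring change in data quality over time a little more concrete, here is a minimal Python sketch. It rests on assumptions that do not come from any real system: the field names (`counterparty_id`, `exposure`), the snapshot labels and the choice of "completeness" (the share of required fields that are populated) as the quality measure are all hypothetical. A real data balance sheet would likely need several measures, not one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """A hypothetical point-in-time extract of some data set."""
    label: str
    records: list[dict[str, Optional[str]]]


def completeness(records, required_fields):
    """Share of required fields that are populated (0.0 to 1.0)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def quality_trend(snapshots, required_fields):
    """Return (label, score, change vs. previous snapshot) per snapshot."""
    trend = []
    previous = None
    for snap in snapshots:
        score = completeness(snap.records, required_fields)
        change = None if previous is None else score - previous
        trend.append((snap.label, score, change))
        previous = score
    return trend


# Illustrative, made-up data: one exposure value is missing in the first period.
REQUIRED = ("counterparty_id", "exposure")
snapshots = [
    Snapshot("period-1", [
        {"counterparty_id": "A1", "exposure": "100"},
        {"counterparty_id": "B2", "exposure": None},
    ]),
    Snapshot("period-2", [
        {"counterparty_id": "A1", "exposure": "100"},
        {"counterparty_id": "B2", "exposure": "250"},
    ]),
]

trend = quality_trend(snapshots, REQUIRED)
for label, score, change in trend:
    print(label, score, change)
```

A rising score from one period to the next is the kind of directional indication a data balance sheet could report. It is an indication of progress, not a precise valuation.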

Andrew Waxman is an associate partner in IBM Global Business Services' financial markets risk and compliance practice and can be reached on Twitter @abwaxman. The views expressed here are his own.