Information technology professionals in banking must learn to view data warehouses as our colleagues in the distribution industry view actual warehouses.
To the logistics specialists who locate, build, and operate them, warehouses are far more than simple storage facilities. They are critical links in the complex distribution chains that span the business process from raw materials production to the final sale of an end product.
Think, for example, of the journey that the fabric in a cotton garment takes from a cotton field to the consumer. The process consists of at least half a dozen stages, beginning with a cotton mill and ending with a retail bar code reader. Value is added at every stage, and the product must be routed and tracked through every step.
The journey of data through a bank is analogous to that of the cotton in the garment. Data are gathered from a wide variety of supply sources, from branch automated teller machines to commercial loan systems, processed through transaction and accounting systems, and redistributed to users throughout the bank. The information is eventually routed - in the form of general ledger summaries and management information reports - to the desks of senior executives.
To maximize the value they can extract from their data during this process, bankers have turned increasingly to the use of data warehouses.
In its simplest form, the data warehouse is an independent data base that extracts information from a variety of sources, such as transaction processing systems, and makes it available to additional applications systems, such as marketing research or management reporting.
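The definition above can be made concrete with a minimal sketch. It assumes two hypothetical source systems (an ATM feed and a loan payment feed) held as in-memory lists, and it uses an in-memory SQLite data base to stand in for the warehouse. The table name, field names, and figures are illustrative assumptions, not a description of any real bank system.

```python
import sqlite3

# Hypothetical source systems, represented as in-memory lists of records.
# All names and amounts are illustrative assumptions.
ATM_TRANSACTIONS = [
    {"account": "A-100", "amount": -200.0},
    {"account": "A-200", "amount": -50.0},
]
LOAN_PAYMENTS = [
    {"account": "A-100", "amount": 1500.0},
]


def build_warehouse(sources):
    """Extract records from each named source into one independent store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity (source TEXT, account TEXT, amount REAL)"
    )
    for name, records in sources.items():
        conn.executemany(
            "INSERT INTO activity VALUES (?, ?, ?)",
            [(name, r["account"], r["amount"]) for r in records],
        )
    conn.commit()
    return conn


def net_activity_by_account(conn):
    """A downstream reporting query that reads only from the warehouse."""
    rows = conn.execute(
        "SELECT account, SUM(amount) FROM activity "
        "GROUP BY account ORDER BY account"
    )
    return dict(rows.fetchall())


warehouse = build_warehouse({"atm": ATM_TRANSACTIONS, "loans": LOAN_PAYMENTS})
report = net_activity_by_account(warehouse)
```

The point of the sketch is the separation of roles: the reporting function never touches the source systems, only the warehouse copy.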
The concept of the data warehouse has proven a valuable approach, particularly in relation to decision-support systems. When properly applied, the data warehouse strategy helps to create a single set of numbers that constitute an integrated financial reality for the whole institution. By ensuring that all data for decision-support applications are drawn from a common source, the data warehouse represents a meaningful step in the eradication of the data chaos that is prevalent in all too many banks.
Unfortunately, the data warehouse approach can create serious problems if it is applied naively or indiscriminately. Due to the enduring legacy of the mainframe, some bank systems professionals have a tendency to think in terms of a single, all-encompassing data warehouse as a universal solution to their information processing needs. At the very least, the attempt to satisfy all users and all uses with one universal data warehouse will prove expensive and complicated. It may well prove disastrous.
Very few manufacturing enterprises, by analogy, attempt to satisfy all the firm's material distribution needs with a single warehouse. They instead implement sophisticated distribution systems which incorporate specialized or geographically dispersed warehouses, each aimed at satisfying a particular business objective. The use of multiple warehouses as part of a comprehensive distribution strategy reduces the risk inherent in dependence on a single facility, speeds response times, and reduces transportation and communications costs.
The adoption of this manufacturing distribution model will have important implications for banking systems. Manufacturing distribution systems, for example, tolerate and even require considerable duplication of inventories among various warehouses. The duplication is intentional and necessary to meet demand.
A banking system with multiple data warehouses will also require considerable duplication of data, extract feeds, and applications interfaces. Though such duplication may be undesirable from a purely theoretical point of view, it may also be essential to meet the business objectives of the various end user groups the warehouse is intended to serve.
Another important consideration is that data warehouses are actually quite limited in function. They are designed to collect and store data in some central and accessible location. These functions represent only the first stages in the overall information process. To be truly meaningful, data must be interpreted through some analytical process. In this sense, the information is like the raw material in a manufacturing process. It must be processed into the equivalent of finished goods. Most of the ultimate value of data is added as a result of this analytic processing.
Different analysts, of course, may apply different assumptions and methodologies to the same data. There will always be legitimate debates about proper methodologies, and the data warehouse strategy alone cannot impose consistency at this level. The great value of the data warehouse at this stage is that it will force opponents to argue from the same data and confine their debate to the methodological issues.
The information produced in the analytical stages must also be distributed to end users. In the case of decision-support systems, the ultimate end users are the senior managers who make strategic decisions.
But much of the information produced may also be of significant value to tactical decision-makers working on the operational level. A comprehensive information distribution strategy can create additional value for a bank by making tactical information - such as the profitability of specific customer relationships - available for use by line managers.
To realize all the potential value of the underlying data, decision-support systems must be able to capture and hold the results of the analytic process, along with the assumptions that informed them.
They must also be able to store this information for additional analysis or historical use. In effect, decision-support applications must be able to write results and assumptions back into the data warehouse while preserving the original data.
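One way to picture this write-back requirement is the sketch below. It is only an illustration under stated assumptions: the table names (`balances`, `analysis_runs`, `analysis_results`), the `run_analysis` helper, and the "spread" assumption are all hypothetical. Each analytic run appends its assumptions and results to separate tables, so the original balances are read but never updated.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balances (account TEXT PRIMARY KEY, balance REAL);
    CREATE TABLE analysis_runs (
        run_id INTEGER PRIMARY KEY,
        assumptions TEXT
    );
    CREATE TABLE analysis_results (
        run_id INTEGER REFERENCES analysis_runs(run_id),
        account TEXT,
        projected_income REAL
    );
""")
# Original warehouse data (illustrative figures only).
conn.executemany(
    "INSERT INTO balances VALUES (?, ?)",
    [("A-100", 1000.0), ("A-200", 2500.0)],
)
conn.commit()


def run_analysis(conn, spread):
    """Record one analytic run: its assumptions, then its results.

    The source table is only read; results go to append-only tables.
    """
    cur = conn.execute(
        "INSERT INTO analysis_runs (assumptions) VALUES (?)",
        (json.dumps({"spread": spread}),),
    )
    run_id = cur.lastrowid
    rows = conn.execute("SELECT account, balance FROM balances").fetchall()
    conn.executemany(
        "INSERT INTO analysis_results VALUES (?, ?, ?)",
        [(run_id, account, balance * spread) for account, balance in rows],
    )
    conn.commit()
    return run_id


# Two analysts apply different assumptions to the same underlying data.
first_run = run_analysis(conn, spread=0.02)
second_run = run_analysis(conn, spread=0.03)
```

Because every result row carries a run identifier, a later reader can recover which assumptions produced which numbers, and the two runs can be compared against identical source data.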
At this point it should be clear that builders of data warehouses at large banks will confront significant technical challenges. The total amount of data to be stored in the warehouse will obviously increase with each reporting cycle. Data warehouses will require enormous storage capacity. And when they are used with decision-support applications, they will place enormous demands on processing power, since decision-support applications may sometimes read millions of rows of data and write back additional data to every record.
These technological demands may impose significant cost and performance tradeoffs on warehouse design. If a sizable bank decides to implement a single, centralized data warehouse to support all its analytic applications, it is effectively deciding that it must implement its warehouse on a massively parallel processing server or on a mainframe. No other current technologies can yield acceptable performance for such very large data bases.
Massively parallel processing technology has already proven quite successful in supporting very large data bases for on-line transaction processing applications. Decision-support applications, however, impose much more intensive processing and data transfer loads. Systems developers are currently adapting massively parallel processing platforms for these more demanding applications, and there is no reason to doubt that they will be very successful. But bankers must realize that if they want to take advantage of the benefits of client/server architecture, the single data warehouse strategy currently means dependence on the leading edge of a new and very sophisticated technology.
If an institution chooses to implement a more highly distributed design, with multiple warehouses serving particular functions or areas, then it can employ a more familiar technology platform such as the symmetrical multiprocessor. But it may also face greater and more complex communications requirements. Fortunately, recent improvements in relational data base management systems, such as advanced data base replication techniques, make distributed data warehouse alternatives more feasible than in the past.
In addition to technological considerations, banks must also anticipate the impact that changes in the business environment may have on their data warehousing and information distribution strategies. By far the most significant and visible change is the industry's current rush to consolidation, a trend documented by the accelerated merger activity that is redrawing the map of American banking.
It is widely acknowledged that systems integration is the most difficult challenge in consolidating the operations of large financial institutions.
The impact of consolidations on the structure and uses of data warehouses will be profound. If nothing else, the increase in the amount of data to be stored could be enormous. Consolidation may also require changes to data structures, formats, extract programs, and interfaces with applications systems.
It is clear from all these considerations that the design and implementation of data warehouses will be a dynamic and evolving process at every institution. Data warehouse development, like systems development generally, will be best managed as an iterative process.
However desirable it would be to supply all the data needs of the banking enterprise from a single, comprehensive, and fully normalized data warehouse, we must acknowledge that we may never fully realize this ideal solution.
Iterative design will instead emphasize satisfaction of a limited number of high-priority objectives and enough flexibility to adapt to changing business requirements.
In practice, the wisest strategy for most banks will be to design and implement a series of limited but related data warehouses, each dedicated to the data requirements of a few major user groups and their most critical applications.
In time, it may be possible to blend many or most of these separate data facilities into a larger and more central warehouse which can populate satellite warehouses with the data their users require. But user demand will tend to create new data facilities faster than a central warehouse can absorb them.
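The relationship between a central warehouse and its satellites can be sketched very simply. The example below is a hypothetical illustration, not a replication design: the satellite names, the `line` field, and the selection rules are assumptions. Each satellite declares which rows its users require, and the central store copies only those rows to it.

```python
# Central warehouse contents (illustrative records only).
CENTRAL = [
    {"account": "A-100", "line": "retail", "balance": 1000.0},
    {"account": "C-900", "line": "commercial", "balance": 250000.0},
    {"account": "A-200", "line": "retail", "balance": 2500.0},
]

# Each hypothetical satellite states which rows its users require.
SATELLITE_NEEDS = {
    "retail_marketing": lambda row: row["line"] == "retail",
    "commercial_credit": lambda row: row["line"] == "commercial",
}


def populate_satellites(central_rows, needs):
    """Copy to each satellite only the rows its selection rule accepts."""
    return {
        name: [dict(row) for row in central_rows if wanted(row)]
        for name, wanted in needs.items()
    }


satellites = populate_satellites(CENTRAL, SATELLITE_NEEDS)
```

The rows are copied rather than shared, which mirrors the deliberate duplication that a distributed design accepts: a change made in a satellite does not alter the central record.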
This iterative approach requires us to live with an inherently fuzzy and imperfect model of the data warehouse. But it is a realistic one. The past 20 years have shown us that ideal models are seldom if ever implemented completely. And when they are, they may be quickly outmoded by business and technological change.
We need to remember at all times that data warehouses represent only partial solutions to enterprise-wide information processing needs. They must be designed as elements of larger information distribution strategies that govern all decision-support applications and uses. They must also be designed in the realization that the business objectives they serve can and do change rapidly over time.
Banks will continually create new strategic business units to enter new markets, and they will probably divest themselves of other operations as their principal markets change.
Information strategies that don't recognize the constancy of change will inevitably fail. And they will invariably reduce the value of the institutions that rely on them.