"Big Data" is more than a buzzword. It can also be a literal description.
The McKinsey Global Institute says that in 15 out of 17 business sectors, the average company stores more data than the U.S. Library of Congress. Dealing with this data is no longer a strategic choice for banks but a competitive necessity, one that challenges them not only to harness the information they have, but to decide what to discard.
Advice is available from groups such as the Open Data Center Alliance (ODCA), which has published a guide to making data-based decisions. The guide includes definitions of Big Data, use cases and steps in planning a Big Data strategy, including when to use Big Data, solution sources, and staffing considerations. ODCA suggests examining the enterprise's needs, identifying use cases and testing them.
"When I look over the [ODCA] guide, it's on the right track, but it will take time and energy to get it fully in place," says Rodney Nelsestuen, senior research director at CEB Towergroup.
"I think the big issue with Big Data is you don't try to boil the ocean. I talk with IT people who look at 'big data' as the next big thing, to be able to provide and capture all data. But how do you classify this data, the internal use versus external use?" says Nelsestuen.
Large banks are moving forward with projects that leverage not only broader data sets from traditional sources such as transactions and payments records, but also emerging alternative sources such as social networking sites. BNY Mellon (BK) and State Street (STT) are both in the midst of projects that will leverage new data sourcing to fundamentally change how the banks gather, analyze and deliver information to inform customer-facing and internal functions.
As these initiatives move down market to smaller banks out of competitive necessity, the harder challenge becomes recognizing which data is actually valuable and which is unnecessary.
Some estimates suggest that 90 percent of all existing data has been produced over the past two years. Nelsestuen says managing that vast amount of data will require added technology expense in gathering, storage and analysis. "It's going to take a great deal of resources to provide a Big Data solution that will be proprietary … including [technology such as] search engines with in-memory analytics and a data collection and crunching capability. All of these technologies are necessary to handle data of that magnitude," Nelsestuen says.
The crunching of unstructured data and structured data has its cost. "You need a big investment in enabling technology. You can use technology to take the time to do a Monte Carlo simulation from 24 hours to 24 minutes, for example, but that's not free," Nelsestuen says. (A Monte Carlo simulation is a mathematical technique used by risk managers, financial IT executives, project managers and other professionals to model the range of possible outcomes of a specific course of action. It was first used on the Manhattan Project and is a staple of risk management.)
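As a generic illustration of the technique (not drawn from any bank's actual system; the portfolio figures and parameters below are hypothetical), a Monte Carlo simulation simply repeats a randomized scenario many times and reads a distribution of outcomes off the results:

```python
import random

def simulate_portfolio(initial=1_000_000, years=5, mean=0.05,
                       stdev=0.12, trials=10_000):
    """Monte Carlo sketch: draw annual returns from a normal
    distribution and collect the ending portfolio values."""
    outcomes = []
    for _ in range(trials):
        value = initial
        for _ in range(years):
            value *= 1 + random.gauss(mean, stdev)
        outcomes.append(value)
    outcomes.sort()
    return outcomes

random.seed(42)  # fixed seed so the sketch is repeatable
results = simulate_portfolio()
print(f"median outcome: {results[len(results) // 2]:,.0f}")
# The low tail is what risk managers care about: a rough value-at-risk view.
print(f"5th-percentile outcome: {results[len(results) // 20]:,.0f}")
```

Cutting a run like this from 24 hours to 24 minutes is largely a matter of running the independent trials in parallel on more hardware, which is exactly the kind of enabling technology Nelsestuen says is not free.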
Michael Chui, a principal at the McKinsey Global Institute, says new data projects pose added challenges in security, organization and compliance. Part of the problem is the basic organization, since banks will be using data that they are not originally compiling, which adds to scrubbing needs. "There are technical challenges. Unstructured data doesn't fit nicely into rows and columns," he says.
There will also be legal and added risk expense, as the use of new data is reconciled with compliance and security protocols. "As the sourcing of data increases, it will increase the temptation of bad actors to access that data in a way that's unauthorized," Chui says.
Some of the early bank projects offer examples of such prioritization techniques. BNY Mellon's project includes an initiative to centralize and automate data retention schedules for paper and electronic delivery, including structured and unstructured data that can be subject to legal discovery.
The bank is unifying processes in legal, risk, and IT and is providing a framework to link job duties and data across departments. It's also creating a global standard taxonomy that identifies the business value of all information. The idea is to determine what data needs to be stored and what data is no longer necessary.
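A taxonomy like the one described can be pictured as a simple mapping from data classes to business value and retention rules. The sketch below is hypothetical (the class names, tiers and periods are invented for illustration, not taken from BNY Mellon's actual framework):

```python
# Hypothetical retention taxonomy: each data class maps to a
# business-value tier and a retention rule.
RETENTION_TAXONOMY = {
    "trade_confirmations": {"value_tier": "high", "retain_years": 7,
                            "legal_hold_eligible": True},
    "marketing_emails":    {"value_tier": "low",  "retain_years": 1,
                            "legal_hold_eligible": False},
    "client_onboarding":   {"value_tier": "high", "retain_years": 10,
                            "legal_hold_eligible": True},
}

def disposition(data_class, age_years, on_legal_hold=False):
    """Decide whether a record may be discarded under the taxonomy."""
    rule = RETENTION_TAXONOMY[data_class]
    if on_legal_hold and rule["legal_hold_eligible"]:
        return "retain"  # legal discovery overrides the schedule
    return "discard" if age_years > rule["retain_years"] else "retain"

print(disposition("marketing_emails", age_years=2))     # past its period
print(disposition("trade_confirmations", age_years=2))  # still within it
```

The point of such a framework is exactly the one the article makes: it gives the bank a defensible, automated answer to what must be stored and what is no longer necessary.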