How M&T Bank ensures data quality as it implements gen AI

  • Key Insight: Strict data lineage is now central to bank generative AI strategies.
  • What's at Stake: Operational, compliance and reputational risks could translate into lawsuits and financial losses.
  • Forward Look: Expect tighter governance and integrations between lineage platforms and LLM providers.

    Source: Bullets generated by AI with editorial review

As more banks deploy generative AI, they're paying closer attention to the data that feeds those models, to make sure it's accurate, relevant and drawn from a trusted source. This calls for data lineage: maintaining a detailed record of the life cycle of data, showing its entire journey from its original source to its final destination, including any changes made to it along the way.

"Data and AI come very tightly coupled, because it's quite hard often for AI deployment to be successful without the trusted data that you need for it to be successful," Andrew Foster, chief data officer at M&T Bank in Buffalo, told American Banker. Like some other data chiefs in the industry, Foster's remit includes defining and executing both an AI strategy and a data strategy for the bank.

Without proper data lineage and data governance inside a company, much can go wrong with generative AI. An Air Canada incident last year is a case in point: A customer who wanted to fly to his grandmother's funeral was assured by the airline's gen AI-based chatbot that he could apply for a bereavement discount up to 90 days after buying his ticket.

This turned out not to be so, and the airline refused to give him the discount. A civil resolution tribunal decided that Air Canada was responsible for all the information on its website, whether it came from a static page or a bot, and it ordered the airline to give the customer the discount and pay fees.

Banks that fail at or overlook data lineage face compliance, operational and reputational risks, according to John Ratzan, senior managing director at Accenture.

"The worst that can happen is that it could lead to lawsuits, diminished brand reputation and a negative impact on company financials," Ratzan told American Banker.

M&T's gen AI journey

When large language models first came out, M&T, like other banks, blocked them from its network so that employees couldn't upload sensitive company information into a public-facing bot. "It was back to that, who are we? Safety first, trust of our customers," Foster said.

Foster's attitude softened a couple of years ago, and he started vetting large language model providers, looking for "a stable, strong partner that is used to delivering complex technology into other complex institutions."

He chose Microsoft Copilot. Today, 16,000 of the bank's 22,000 employees use the gen AI model for first drafts of emails and reports, and to summarize call center conversations.

"For anything involving capturing and using and interrogating text, it's a starting point," Foster said. Generative AI can also interrogate SQL databases, he noted. M&T's software developers use GitLab to help generate code.

In most such use cases, "gen AI gets you 60% of the way, then a human reviews it and takes it the other 40%," Foster said.

The benefit is an "uplift in human efficiency, which is obviously useful," Foster said. "It makes everyone's work better, faster, stronger." Having generative AI summarize calls, for instance, saves about six minutes per call.

Employees quickly grow fond of the tools, according to Foster. At one point, M&T ran a pilot with 800 people, then got pushback when it considered shutting down the gen AI model. "People say, 'it's transcendent, I can't go back to the way things were,'" Foster said.

But he also noted one challenge of large language models: the problem of having multiple right answers.

"If you ask Copilot, help me craft an email or help me craft a press release, you could get three different versions, and each of them is right for its own version of rightness," he said. "So we've put human decision-making, critical thinking, at the center of AI adoption. You're not deferring your own judgment to the machine through the adoption of Copilot. It's giving you more tools to be effective, but the human being retains that accountability."

Building data lineage

When Foster arrived at M&T in March 2023, after 12 years in a similar job at Deutsche Bank, he started a data academy providing in-person and remote training on data governance. So far, 2,000 people have gone through the training. And he began a data lineage initiative.

"This wasn't in response to gen AI," Foster said. "I saw it as a core capability: Do we know where our data comes from and how we use it, how do we bring it to a level where we can interrogate it, how all the data goes from point A to point B?"

His team created a repository called Edison that contains authoritative documents and data on all bank policies.

The bank deployed data lineage software from Solidatus and from Monte Carlo. The Solidatus software speeds up the production of data lineage, Foster said. It also provides a single repository for the bank's data, which enables interrogation and analysis that before would not have been possible. It's helping to make M&T's data AI-ready.

Solidatus integrates with databases and applications, and it retrieves metadata and lineage from within them, explained Tina Chace, vice president of product at Solidatus.

"When we read an Oracle database, we look at things like the schema, the tables, the structure, but in order to generate the lineage, we also look at the stored procedures within a database," Chace told American Banker. "We have a tool that reads the stored procedures and then understands how the data flows throughout the database."

Solidatus also works with business intelligence tools like Microsoft Power BI and Tableau. "When we look at Power BI or Tableau, we look at the data models, the physical data sources and the logical data sources and reports that are captured within that business intelligence tool, and we're able to pull that in," Chace said.
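
To make the stored-procedure step concrete, here is a minimal Python sketch of how table-level lineage might be pulled out of procedure code. It is not Solidatus's method, which the article does not detail; the SQL, the table names and the extract_table_lineage helper are all hypothetical, and a production tool would use a full SQL parser rather than regular expressions.

```python
import re

# Hypothetical stored-procedure body; table names are invented for illustration.
PROCEDURE_SQL = """
INSERT INTO risk.customer_exposure (customer_id, exposure)
SELECT c.customer_id, SUM(l.balance)
FROM core.customers c
JOIN core.loans l ON l.customer_id = c.customer_id
GROUP BY c.customer_id;
"""

# Naive patterns: the table written to, and every table read from.
TARGET_RE = re.compile(r"INSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_table_lineage(sql: str) -> list[tuple[str, str]]:
    """Return (source_table, target_table) edges for each INSERT ... SELECT."""
    edges = []
    for statement in sql.split(";"):
        target = TARGET_RE.search(statement)
        if target is None:
            continue  # statement writes nothing, so it adds no lineage edge
        for source in SOURCE_RE.findall(statement):
            edges.append((source, target.group(1)))
    return edges


edges = extract_table_lineage(PROCEDURE_SQL)
for source, target in edges:
    print(f"{source} -> {target}")
```

Each edge records that data flowed from one table into another, which is the raw material a lineage graph is built from.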

The most challenging technology for the data lineage software to work with is mainframe applications, she said. "We have an integration that sits near the mainframe applications and pulls out languages like COBOL and interprets them to be able to capture the truthful representation of how data flows within that technology." Solidatus can import semi-structured information as well, such as CSV, XML and JSON files.

"We have transparency over lineage, quality and governance," M&T's Foster said. "If we think of individual data elements, we know where they came from, we know what they mean. This is important because it helps us have well-governed, understood data available for use, and one of those use cases is Copilot."

The large language models behind tools like Microsoft Copilot, Google Gemini, OpenAI's ChatGPT and Anthropic's Claude were trained on vast amounts of Internet data, which has raised questions about data lineage and potential copyright violations for those companies. Those issues are not relevant to M&T, Foster said.

The bank uses a process known as retrieval-augmented generation, which retrieves relevant passages at query time and supplies them to the model as context, to limit the data the generative AI models draw on to internal, governed data.
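
The idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not M&T's implementation: the Document fields, the sample corpus and the keyword-overlap ranking are all hypothetical, and real systems typically rank with vector embeddings and then send the prompt to a model, which is omitted here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str     # system the text came from
    governed: bool  # True if the source passed data-governance review
    text: str


# Invented corpus: two governed policy snippets and one ungoverned post.
CORPUS = [
    Document("pol-1", "policy_repository", True,
             "Wire transfers above the daily limit require manager approval."),
    Document("pol-2", "policy_repository", True,
             "Customers may dispute a card charge in writing."),
    Document("web-1", "public_forum", False,
             "Wire transfers never require approval."),
]


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: list[Document], top_k: int = 1) -> list[Document]:
    """Rank only governed documents by word overlap with the question."""
    question_words = tokenize(question)
    governed = [doc for doc in corpus if doc.governed]
    ranked = sorted(governed,
                    key=lambda doc: len(question_words & tokenize(doc.text)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, corpus: list[Document]) -> str:
    """Assemble the text a model would receive: cited context, then the question."""
    context = "\n".join(f"[{doc.doc_id}] {doc.text}"
                        for doc in retrieve(question, corpus))
    return f"Answer using only the context below.\n{context}\nQuestion: {question}"


prompt = build_prompt("Do wire transfers require approval?", CORPUS)
print(prompt)
```

Because the ungoverned document is filtered out before ranking, it can never reach the model, and the document ID in the prompt keeps the answer traceable to its source.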

Foster would like to see Solidatus work with Microsoft, so that the data lineage that starts in business units persists through Copilot. A Solidatus representative said the company does have an API that could be used for this purpose. It's also developing a Model Context Protocol server that would make such integrations easier.

"We fully expect more value will come from the future integration," Foster said.

Foster acknowledges there are risks to having bank employees increasingly rely on generative AI.

"You need to embrace adoption because you become more efficient. If you don't, you fall behind peers who are using it," he said. "But you need to have responsible usage and retain accountability."

All these efforts are in line with Ratzan's best-practice recommendations for clients.

He said there are technical approaches to enhance data lineage, such as automating metadata capture and traceability of data going into and emerging out of gen AI models. Clear policies and training to reinforce those policies are very important, he said.
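
As a rough illustration of automated metadata capture, the Python sketch below wraps a model call so that every request leaves a lineage record of what went in and what came out. It is a sketch under assumptions, not a description of Accenture's or M&T's tooling: the LineageRecord fields, the call_with_lineage wrapper and the stub model are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class LineageRecord:
    source_ids: list[str]  # identifiers of the governed inputs used
    prompt_sha256: str     # fingerprint of what went into the model
    output_sha256: str     # fingerprint of what came out
    recorded_at: str       # UTC timestamp in ISO 8601 format


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def call_with_lineage(model: Callable[[str], str], prompt: str,
                      source_ids: list[str], log: list[LineageRecord]) -> str:
    """Call the model and append a lineage record for the exchange."""
    output = model(prompt)
    log.append(LineageRecord(
        source_ids=list(source_ids),
        prompt_sha256=sha256(prompt),
        output_sha256=sha256(output),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    ))
    return output


def stub_model(prompt: str) -> str:
    # Stand-in for a real gen AI call, which is out of scope for this sketch.
    return f"DRAFT: {prompt[:40]}"


audit_log: list[LineageRecord] = []
summary = call_with_lineage(stub_model, "Summarize call 123",
                            ["call-transcript-123"], audit_log)
print(summary)
print(audit_log[0].source_ids)
```

Storing hashes rather than raw text keeps sensitive content out of the log while still letting an auditor confirm which inputs produced which output.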

"Data governance broadly is the key for data lineage," he said. "Clear ownership of the data and accountability for who manages the data, the source and the consumption are paramount."
