How banks can avoid the dangers of AI-generated 'workslop'

An AI-generated image of a cat wearing sunglasses. (Adobe Stock: innluga)

  • Key insight: Generative AI shifts effort downstream, creating hidden productivity costs.
  • What's at stake: Missed controls can cost banks time, reputation and millions annually.
  • Supporting data: Forty percent of U.S. employees reported receiving AI-generated "workslop" in the past month.

    Source: Bullets generated by AI with editorial review

AI slop – content produced by generative AI models that looks good at first glance but lacks substance – is encroaching on emails, reports and presentations at banks and other companies that deploy the technology.
According to a study written up in Harvard Business Review, this "workslop" is eroding the very productivity gains generative AI is supposed to deliver by creating extra work for recipients.

"As AI tools become more accessible, workers are increasingly able to quickly produce polished output: well-formatted slides, long, structured reports, seemingly articulate summaries of academic papers by non-experts, and usable code," the study's authors wrote. "But while some employees are using this ability to polish good work, others use it to create content that is actually unhelpful, incomplete, or missing crucial context about the project at hand. The insidious effect of workslop is that it shifts the burden of the work downstream, requiring the receiver to interpret, correct or redo the work. In other words, it transfers the effort from creator to receiver."

The ongoing study, which is being conducted at Stanford University among 1,150 U.S.-based full-time employees across industries, mostly in professional services such as financial services, found that 40% of respondents have received AI-generated slop in the past month. Those employees estimate that an average of 15.4% of the content they receive at work qualifies as slop. About half the time it is shared among peers, 18% of the time it is sent to managers by direct reports, and 16% of the time it is sent from managers or executives to their teams.

This lost productivity can add up. According to the Stanford study, employees reported spending an average of one hour and 56 minutes dealing with each instance of workslop.

"Based on participants' estimates of time spent, as well as on their self-reported salary, these workslop incidents carry an invisible tax of $186 per month," the study authors wrote. "For an organization of 10,000 workers, given the estimated prevalence of workslop (41%), this yields over $9 million per year in lost productivity."

Humans can be sloppy and generate junk, too. But generative AI models can do this at a speed and scale that human reviewers may struggle to keep up with.

To be sure, there's still a lot of optimism out there about AI's ability to make work more efficient. In a Gallup survey of 3,128 American adults released Thursday, 62% said AI will increase productivity in the workplace.

AI slop in banking

One Stanford survey respondent in financial services described the burden of receiving AI-generated slop from a colleague: "It created a situation where I had to decide whether I would rewrite it myself, make him rewrite it, or just call it good enough."

The worry that AI will produce slop is "certainly one of the risks that you need to control for," Zach Wasserman, chief financial officer at Huntington Bank, said in a recent interview. 

Where outputs of a model need to be relied on exactly, "you need to have lots of control," he said. "In other cases, it's generating a recommendation that somebody would then look at."

For instance, in software development, "the way the tools are working now often is to recommend the next tranche of code that somebody would potentially consider for the work that they're doing, and then they can look at it and see what parts seem right and what parts they don't need," Wasserman said. "So it depends on how you're using the tool, but it's definitely one of the big risks that has to be managed. It comes down to the nature of the use case in terms of how to control that risk."

Bank of America has an AI governance process that tests all AI models on 16 different parameters, including bias, transparency, error, hallucinations, reproducibility and predictability, according to Hari Gopalkrishnan, the bank's chief technology and information officer. 
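As a hypothetical illustration of what one such test might look like – not Bank of America's actual tooling, and with invented function names – a reproducibility check can be as simple as running the same prompt several times and comparing outputs:

```python
# Hypothetical sketch of a reproducibility check of the kind a model
# governance process might run. The `generate` callable stands in for
# any model API; nothing here reflects Bank of America's actual process.

from typing import Callable

def reproducibility_check(generate: Callable[[str], str],
                          prompt: str,
                          runs: int = 5) -> bool:
    """Return True if the model produces identical output on every run."""
    outputs = {generate(prompt) for _ in range(runs)}
    return len(outputs) == 1

# Example with a deterministic stub in place of a real model call:
assert reproducibility_check(lambda p: p.upper(), "test prompt")
```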

"When it comes to generative AI, obviously hallucination, error and getting it wrong are big concerns," he said in an earlier interview. "There's also concern that continues to hover around intellectual property and making sure that we understand who owns the rights" to information used by a model. 

In using the foundational large language models, "we make sure we know what the ins and outs look like," Gopalkrishnan said. "How do we test it? How do we put the right guardrails around it to make sure it isn't going rogue?"

In her work with banks, Alenka Grealish, principal analyst at Celent, said she has not seen much AI slop.

"I've been honed in on use cases where the risk of slop is lower, because it's very specific where you can train employees, test, have an employee in the loop and use retrieval augmented generation," Grealish told American Banker. "A prime example is call center or small-business banker queries."

The banks she works with tend to partner with foundation model providers that continually enhance their copilots and virtual assistants, she said.

"When you do these incrementals, it just gets better, because you've done the appropriate training," she said. "And something I've underscored is you've got to have employees in the loop. You have to give them guidance, demonstrations and a feedback loop when things go south. If there's slop, you want to know about it ASAP, not in some monthly survey."

But Grealish acknowledged that where companies unleash generative AI across the workforce, "everybody's going to start experimenting, and there could be wasted time." She also expects workers to figure out how to correct this. 

"They're highly motivated to be productive, and whatever cycle time can be shortened, they will do it whether it's sanctioned or not," she said.

Mike Gualtieri, vice president and principal analyst at Forrester Research, who also works with financial services firms, has not had any clients mention that AI workslop is a problem. 

"But it makes perfect sense that while one employee is using AI to save time, the person on the receiving end has to spend more time" correcting errors, he said. 

How to avoid AI workslop

There are several ways banks can make sure their AI models produce useful content rather than slop.

Conduct pilots. Grealish recommends companies use pilot groups to test use cases and determine whether there are real productivity gains. "It's easy to track and it's very easy to do control groups," Grealish said.

For instance, if a bank is thinking about using an AI assistant for small-business onboarding or for gathering documents for loan underwriting, it should compare the productivity gains from that assistant against an automated workflow without generative AI, as in the sketch below.
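A minimal sketch of that comparison, with made-up cycle times for the pilot and control groups (the task and numbers are illustrative only):

```python
# Illustrative pilot-vs-control comparison. The task times below are
# invented; a real pilot would log actual cycle times per task.

from statistics import mean

pilot_minutes = [34, 29, 41, 30, 27, 38]    # tasks done with the AI assistant
control_minutes = [45, 52, 44, 49, 47, 50]  # tasks done with the plain workflow

saved = mean(control_minutes) - mean(pilot_minutes)
pct = saved / mean(control_minutes) * 100
print(f"Average time saved per task: {saved:.1f} minutes ({pct:.0f}%)")
```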

Teach employees how to prompt. Proper prompting of generative AI models is critical, in Gualtieri's view.

"The output of AI is going to be only as good as the user's prompt and the system prompt," he said. "The system prompt is what's happening behind the scenes, it's what gets added to the prompt the user is using. That's going to get better." If a generative AI model is providing shoddy responses to a prompt, that prompt can be rewritten and fine-tuned to produce more accurate results, Gualtieri said.

This means workslop is not an inherent flaw of generative AI, but a sign of a lack of generative AI skills among workers, he said. 

Use retrieval-augmented generation, or RAG. RAG can help reduce or eliminate shoddy work, Gualtieri said. With RAG, the model looks for answers only in a specific dataset it's given. "That's the primary technique that enterprises use," he said.
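Here is a toy sketch of the RAG pattern Gualtieri describes, with an invented document set and a deliberately crude keyword retriever standing in for the vector search a production system would use:

```python
# Toy retrieval-augmented generation (RAG) loop. The documents and the
# keyword-overlap retriever are invented stand-ins; production systems
# typically use vector embeddings and send the prompt to a real LLM.

documents = [
    "Wire transfers over $10,000 require dual approval.",
    "Small-business onboarding requires two years of financial statements.",
    "Fraud alerts must be triaged within four business hours.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (toy scorer)."""
    q_words = set(query.lower().split())
    ranked = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Constrain the model to retrieved context instead of open-ended recall."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What does small-business onboarding require?"))
```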

Consider using smaller or focused models. For some use cases, such as fraud investigations, a model that's trained on a focused set of fraud data will be far more useful to investigators than a general-purpose copilot trained on all of the internet. 
