Ripple effects of The New York Times' suit against OpenAI, Microsoft

There is a common tendency to be cavalier about copyright protection, and to use or share protected material such as newspaper articles without permission because companies so rarely get caught or penalized for doing so. But if The New York Times prevails in its copyright infringement lawsuit against Microsoft and OpenAI for using its articles to train ChatGPT, this could change. 

At a minimum, companies that use generative AI will need to be more mindful of copyright law. They will need to vet the articles, research and data used to train the large language models they rely on, even if the training is done by a third party, and know whether any of that content is protected by copyright law and therefore off limits. This is especially true for banks that plan to open up a large language model to customers in the form of a chatbot, the way OpenAI has done with ChatGPT.

U.S. copyright law gives the owner of a copyright — the creator of a piece of content or that creator's employer — the right to control the copying of that work and control the modification or adaptation of the work into new works. 

Even if a bank only uses a large language model internally, if that model was trained on copyrighted material the bank did not have a license to use or permission to train on, the bank is likely violating copyright law, according to Nelson Rosario, founder and partner at Rosario Tech Law. However, he noted, it would be hard for an outsider to uncover that infringement. 

The New York Times' lawsuit centers mainly on the publicly available ChatGPT, which OpenAI and Microsoft both have a hand in training, according to the complaint. The newspaper said Microsoft's and OpenAI's generative artificial intelligence tools rely on large language models "that were built by copying and using millions of The Times's copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more." The New York Times also said that although Microsoft and OpenAI copied from many sources, "they gave Times content particular emphasis when building their LLMs — revealing a preference that recognizes the value of those works." 

Further, the paper said that through Microsoft's Bing Chat and OpenAI's ChatGPT, Microsoft and OpenAI "seek to free-ride on The Times's massive investment in its journalism by using it to build substitutive products without permission or payment." 

"We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models," an OpenAI spokeswoman said in a statement. "Our ongoing conversations with the New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development. We're hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers." Microsoft did not respond to a request for an interview by deadline.

The impact on banks

Generally speaking, banks are not developers of generative AI models the way Microsoft and OpenAI are — they are simply users. As such, they are far less likely than tech companies to get sued for copyright infringement.

To date, there have not been any U.S. lawsuits targeting users of generative AI models, noted Cleary Gottlieb intellectual property partner Angela Dunning. Lawsuits against developers of AI models like OpenAI and Microsoft "will likely ultimately turn on the question of whether training a foundational generative AI model on copyrighted content constitutes fair use," Dunning said.  

But banks are using large language models for an increasingly broad range of purposes. It is only a matter of time before they give customers access to generative AI models through chatbots. At that point, any use of copyrighted material to train their models could be discovered and called out. 

"The risk of intellectual property and copyright infringement is a significant concern for banks, especially when using pre-trained AI models from third parties or data purchased from vendors," said Ryan Favro, managing principal at Capco. 

Mitigating the risk

Large language models like OpenAI's GPT are fed vast quantities of documents, data and websites (hence the word "large"). From that material they learn statistical patterns between words and use those patterns to predict what the output to a prompt should be. 
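In rough terms, the training objective can be illustrated with a toy sketch like the one below. Real models are transformer neural networks trained on billions of documents; this simple word-pair counter only shows the underlying idea of learning which words tend to follow which:

```python
# Toy illustration of the next-word-prediction idea behind large language
# models. This is NOT how production LLMs work (they use transformer neural
# networks at vast scale); it only shows the training objective: observe
# which words follow which, then predict the likeliest next word.
from collections import Counter, defaultdict

corpus = "the bank approved the loan and the bank closed the account".split()

# "Training": count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the word most frequently observed after `word`."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "bank", the most common successor of "the"
```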

Banks that use large language models that were trained by a vendor may get indemnification protection from that vendor in the event they are sued. In November, Microsoft announced that clients licensing Azure OpenAI Service, the company's managed service that adds governance layers on top of OpenAI models, can expect to be defended and compensated by Microsoft for any adverse judgments if they're sued for copyright infringement while using Azure OpenAI Service or the outputs it generates. Microsoft did not respond to questions about its indemnification policy by deadline, but pointed to the announcement as well as a September announcement that users of its Copilot products will be protected. "If you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved," the statement said. 

However, even where a software vendor provides protection, banks must still be aware of how these models were trained, Favro said. 

"The legal landscape, particularly around what constitutes fair use in AI training, is still evolving," Favro said. 

A bank seeking to train a model on data purchased from a vendor should take care to look for appropriate representations and warranties as to the source of and rights to the data, as well as a broad indemnification provision should those representations prove unsound and lead to litigation, Dunning said.

"All companies choosing to develop AI models should set strict parameters around how and on what those models may be trained, instruct and monitor employees to ensure compliance and maintain careful records to document the ethical sourcing, composition, filtering and use of training data," she said.

Banks can further lessen risk by carefully vetting the models they choose and ensuring that their intended uses align with the safety, ethics and privacy interests of their customers, Favro said. 

"The focus should be not only on what outputs are generated, but how those outputs are ultimately used and stored," he said.

Most banks do not train their own generative AI models due to the high costs involved, Favro said. Many instead use a method called retrieval-augmented generation, or RAG, in which the AI system dynamically incorporates a bank's own proprietary data while it runs, he said. 

"Think of it as the AI having real-time access to a bank's unique database to answer questions or solve problems, ensuring that the responses are grounded in the bank's specific knowledge and context," Favro said. "This approach is particularly advantageous for banks because it significantly reduces the risk of IP conflicts. By primarily relying on their internal data, banks avoid the complexities and legal uncertainties associated with using external, potentially copyrighted material. It's a safer route, both in terms of compliance and ethical AI use."

As banks gradually move toward training their own models, the focus on using internally sourced data will become even more critical, he said. 

"They'll need to exercise rigorous scrutiny over the datasets incorporated, ensuring they align with regulatory standards and avoid any IP infringements," Favro said. "This internal vigilance is essential not only for legal compliance but also for maintaining the integrity and trustworthiness of the AI systems they deploy."

Favro also noted that banks need to educate employees about the complexities and risks associated with generative AI and intellectual property, and trace and audit generative AI usage. 

"Banks should be able to track and document their AI interactions end to end," Favro said.

A cautious approach would also involve steering clear of use cases that might provoke IP owners' grievances, Favro said. For instance, using generative AI to generate code is risky because there is a danger of adopting code that is someone else's intellectual property. Even where a bank pays licensing fees for software, there can be issues around attribution and contribution, he said. But using the technology to conduct AI-assisted code reviews and test existing code can offer cost savings without overstepping into contentious IP areas, he said.

Banks also have to think about the potential for misuse of generative AI beyond copyright infringement. 

"Banks should take care to ensure that models are not trained on documents or data constituting trade secret material or data covered by privacy rights of third parties who have not consented," Dunning said. "Generative AI should also not be used for purposes of making credit, employment, housing, insurance, legal, medical or other important decisions about any individual."
