The AI revolution continues. Already a new breed of artificial intelligence has transformed chat, search, image creation and much more. And now agentic AI, systems capable of independent, adaptive problem-solving and decision-making, is coming to financial services. Banks, fintechs, crypto providers and more are experimenting with building digital workforces that can increase efficiency and productivity, and even add some creativity to financial services. The industry is placing strong bets on the technology. As of 2024, the market value of agentic AI was $5.1 billion, and Capgemini projects it will exceed $47 billion by 2030. Hear from leading experts on how to build, where to deploy and what challenges might arise from putting the bots to work.
Transcription:
Bailey Reutzel (00:16):
All right. Hello, everyone. Who here has been on a call with an AI agent? Raise your hand. Good. Keep that hand up. Keep it up. Who here has cussed out an AI agent? Yes. Right. Maybe. No. Poor Wally. Poor Wally. I joke, but we are also here to talk about how AI agents can be really transformational in financial services. We've had some stumbling blocks, certainly, but I think we're getting there. We're going to start by focusing this conversation with a couple of definitions. So sorry, stay put. This won't take that long, but I think it's helpful to do the definitions because so many of these terms get used synonymously. So those AI bots that you are talking to and cussing at, those are generally called chatbots. And you guys check me on this, too.
(01:20):
Those are specifically created for one job. They follow a script. It's sort of like press one for X, press two for Y, press three for Z. When you say something that is not X, Y or Z, they just repeat the same thing. That's like the Dante's Inferno of AI agents. Then we have generative AI, and with generative AI we get to AI agents, which have some more autonomy, right? They are looking through huge data sets to find the answer for you. So they're not just working from one single script; they're understanding, they're responding to your specific human needs. And specifically with these AI agents, they are going to be able to act on your needs for you. So we'll get into a little more of that with these two lovely panelists. Let me introduce you. This is Rajesh Iyer. He is the VP and Global Head of Machine Learning and Gen AI at Capgemini. And then over there we have Nick Morales. He is the Head of Customer Experience at Cohere. So check me there. Should we expand those definitions? Is there anything I missed? Rajesh, I'll have you go first.
Rajesh Iyer (02:31):
No, I think that's good. I would say that most of my background is actually on the business side. Over the last six years I've gotten involved on the engineering side, mostly because a lot of the platforms to do data and to actually build some of these solutions weren't ready. So I got a chance to grow up with that stuff, and I've gotten a little bit onto the engineering side now.
Bailey Reutzel (02:52):
Okay. What about you, Nick?
Nicolas Morales (02:53):
Yeah, what I would add is actually a question on top of the one you started the audience with: How many folks have actually cussed out a human thinking it was a chatbot? I lead our support team and we've actually had some nastygrams saying, "Hey, I want to talk to a human, not a chatbot," and our support engineers said, "I'm actually a person." That's how good these chatbots have gotten. But the spectrum of agents and their capability to be autonomous is wide, and we'll get into definitions around autonomous agents versus simpler agents. The use cases range from giving them the ability to take a transaction all the way through, to informing you after fraud is detected, to keeping a human in the loop. Those are the capabilities we're now seeing.
Bailey Reutzel (03:47):
Okay. So I think the most exciting next question, going to start out strong here, is what are these AI agents doing? We can talk about what you see in the future, what you would like them to do, or also what you're seeing your customers actually use them for. I would like both. So Nick, we'll start with you.
Nicolas Morales (04:05):
So you can think about them from a departmental perspective, where you're seeing now software engineering teams who are translating old legacy code into newer code in order to develop faster. These use cases are not specific to the financial services industry but are very much happening every day in your organizations. Then there are line-of-business use cases. Take wealth management, for example: the amount of time a wealth advisor would spend prepping for a client meeting, pulling natural language or unstructured data out of the different systems in order to advise the client with the latest information from the market and from organizations they've invested in. This was an extremely long process of just retrieving information and handing it over. All of that has been reduced down to minutes, and this empowers the wealth manager to spend time with the client still doing what humans do best: provide their perspective, be strategic, and use AI as the thought partner, not the advisor itself. So that's just one example.
Bailey Reutzel (05:14):
Yeah. For that specific example, I'm wondering, there is some fact-checking that the human does once AI pulls that data. I don't know about you, because you're doing customizable AI deployments, so maybe that's not as big of a problem. But certainly with Gemini at the top of your Google results, or with ChatGPT, you can surface a lot of data, but you do have to go back and check, and maybe that adds to the manual time that is spent as well.
Nicolas Morales (05:44):
Yeah, absolutely. And hallucinations were a big headline word as AI became more prominent across all industries, and that's still very much something we all have to look out for as we take on more use cases and empower them with AI. Now, what we're doing to protect against data that was hallucinated or just factually inaccurate is to ground that data, to ground it in your sources, and then require the AI to provide citations of where the information was pulled from. How did the model reason to come up with the recommendation? And ultimately, still keep a human in the loop. AI, and I'm sure we're going to get into this topic, is not there to take the role or the job of the human. It's there to serve as the thought partner, the collaborator, the assistant to the human.
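In code, the grounding-plus-citations pattern Nick describes might look like the minimal sketch below. It assumes an OpenAI-compatible chat client; the model name, prompt wording and escalation message are illustrative placeholders, not Cohere's actual API.

```python
# Sketch: grounded generation with required citations and a human-in-the-loop fallback.
# Assumes the `openai` v1 SDK and an API key in the environment; details are illustrative.
import re
from openai import OpenAI

client = OpenAI()

def grounded_answer(question: str, sources: list[str]) -> str:
    # Number the source snippets so the model can cite them as [1], [2], ...
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    prompt = (
        "Answer ONLY from the sources below. Cite each claim like [1]. "
        "If the sources do not cover the question, reply exactly: INSUFFICIENT CONTEXT.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content

    # Guardrail: no citations or no basis means a person reviews it instead.
    if "INSUFFICIENT CONTEXT" in answer or not re.search(r"\[\d+\]", answer):
        return "Escalating to a human reviewer."
    return answer
```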
Bailey Reutzel (06:40):
Yeah. Okay. Definitely getting into AI explainability stuff later. Rajesh, I want to give it to you, though. I want you to talk about some of the use cases you've seen with your clients and your customers.
Rajesh Iyer (06:50):
Sure. I can just pick up where he was talking about the software development lifecycle, but I've actually seen things happen in three waves. First was basically software engineers doing Gen AI-based coding, then extending beyond coding to testing and across the SDLC. Software engineers get really excited about this stuff. What we see with a lot of what's happening in the software engineering world is that the coding part has gotten better, but I don't know if the enterprises are shipping products faster. That's really still a problem. Lots of good things are happening in that space, not giving up on that at all. So that's wave number one. Wave number two is what we saw happen in the contact center. Still not getting to agentic, but basically saying, can we use LLMs to stitch together data?
(07:40):
And you don't even have to have it respond or chat back; just, can you use the LLM to get the different sources of data and stitch together the context? Second, once I have the context, can I use it for self-service? If self-service fails, can I provide agent assist? Next step, can I do auto-escalation? Next step, can I do after-call work? In fact, there's a good example we have. We actually created this, I forget what it's called, a blueprint. I'm getting old so I can't remember. So it's called a blueprint, where we've stitched all of that together to form different aspects of the solution, but it's still not completely agentic. And then the last wave we're seeing is the more agentic one, where we're basically saying it's no longer about a use case, it's about a process. We're after a process right now. Can we use the LLMs to read stuff from one step, which could be a data step or some other process step, and then do some processing with that? That could be one step of intelligence or multiple steps of intelligence, but now you have tools plus multiple LLMs all being coordinated, again by LLMs, to take over an entire process. So that's the last wave, and I think that's super exciting. A lot of what we'll talk about today is that last wave.
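A rough sketch of that contact-center ladder, stitch the context, try self-service, then escalate with agent assist, might look like this. The `llm` helper, model name and routing strings are assumptions for illustration, not the Capgemini blueprint itself.

```python
# Sketch: contact-center ladder (context stitching -> self-service -> agent assist).
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    # Thin wrapper around any chat-completion call; model name is a placeholder.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def handle_call(query: str, crm: str, billing: str, tickets: str) -> dict:
    # Step 1: stitch records from separate systems into one customer view.
    context = llm(
        f"Merge these records into one customer summary:\n{crm}\n{billing}\n{tickets}"
    )

    # Step 2: attempt self-service; the model must admit when it lacks the facts.
    answer = llm(
        f"Context:\n{context}\nCustomer asks: {query}\n"
        "Answer, or reply exactly NEEDS_AGENT if you lack the facts."
    )
    if "NEEDS_AGENT" not in answer:
        return {"route": "self-service", "reply": answer}

    # Step 3: auto-escalate with agent assist: brief the human taking over.
    briefing = llm(
        f"Context:\n{context}\nQuestion: {query}\n"
        "Write a one-paragraph briefing for the human agent taking over."
    )
    return {"route": "escalate", "agent_briefing": briefing}
```

After-call work would hang off the same helper: feed the finished transcript back through `llm` to draft the wrap-up notes.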
Bailey Reutzel (08:56):
Yeah. I heard similar narratives when we were all talking about big data. And now when you think about the amount of data that AI can take in, we're talking about humongous data. But some of that data, if it's locked in silos, is not very useful. I think you're kind of talking about this, where you have different departments within your enterprise, and if they're not all on board, it sort of gets stuck. I don't know if you can expand on some of this. How do we de-silo these huge enterprises, which maybe have some incentives to keep those silos?
Rajesh Iyer (09:34):
And I think that's exactly what I was talking about in the first part. You have so many sources of structured data, and honestly there are so many sources of the same structured data, which is actually quite complicated. If you ask the question of what color is the sky, you might get blue and azure and all these kinds of different answers that we need to reconcile. So there is a way you can get all this stuff together and use LLMs themselves to make sense of it, to have a unified view of what's happening. But it also requires a pretty big uplift of the engineering, because what used to happen is you'd have the data product first, then AI on top. Now you have AI before you even get to the data product, and then AI again after.
(10:14):
So it's a completely different set of technologies, whether it's where you store your information or how you move information. All of that changes, because now you can use GPUs to do all that kind of stuff. Getting into the hardware is not that interesting to all of you, but all I'm saying is there's a significant uplift required to stitch together batch and stream, structured and unstructured data, to provide a unified view. And this technology actually exists. We're working within Capgemini on this idea called the AI Data Factory that does exactly that. The way we have been doing processing in the past is JVM- and CPU-bound, and everything we do in processing is based on that technology. But now that's completely transformed, and you've got to use all of that to move things very fast and stitch together this context very fast.
Bailey Reutzel (11:03):
Yeah. Nick, I'm going to pass it to you just because, and correct me if I'm wrong, Cohere sort of started as one of these big foundational generative AI companies and then decided to pivot. I don't know, is that the right word here? Sort of transform the business into customizable enterprise deployments. So how does that work? Maybe it helps quite a bit to focus in on one enterprise's use cases or data sets.
Nicolas Morales (11:29):
Yeah. So Cohere builds state-of-the-art large language models. These are generative models and retrieval models to improve search, the table-stakes capabilities you see with ChatGPT and other consumer applications. We still build those models from the ground up, and we're one of the few vendors that builds these proprietary models focused specifically on secure enterprise use cases. Now, what we really evolved over 2024 was to work closer with enterprises to customize their deployments. Customization can mean a lot of different things, but what we were seeing is that enterprises were looking for a competitive advantage as they deployed AI. Because if everyone is using the same model, then ultimately everyone is using the same technology, and your advantage over your competitors isn't going to be there. So we started working closer with financial services organizations to customize the model to be extremely good with capital markets use cases, wealth management use cases.
(12:38):
So we were looking at specific lines of business and saying, can we fine-tune the model to provide better responses, to be able to retrieve information from waterfall diagrams where you have text overlaid on charts, now looking at multimodal capabilities? But we also worked with organizations who said, "I have a multilingual global business, so we really want to make the Cohere model the best in Japanese or the best in Arabic." And being the model builder, we can change the shape of the model and partner with our clients to ensure it's not just best in class but gives these organizations that leg up against competitive companies.
Bailey Reutzel (13:24):
Yeah, I want to talk a little bit about how you get everyone in an enterprise on board. I think the worry that AI will eliminate some jobs is certainly not unfounded. I have personally seen it in media, and we saw quite a backlash in media with AI sort of spitting out, again, hallucinations. So I think it's really important, everybody's talking about human in the loop, but are we talking to the whole of the enterprise? So in a bank, maybe down to the tellers, all the way up to the CEO. Yeah. How are you all thinking about that, Nick?
Nicolas Morales (14:14):
Yeah, so I think about it at an organizational level and then at a person level. For an organization to get started, you do want to be strategic, and you want to think security and privacy first. So how do you develop a plan to launch a use case that has real business value, find the right technology partners, and develop a plan to test that use case, provide guardrails and governance, but ultimately ensure that the use case is going to solve a business problem for you? Otherwise, you get stuck in this pilot prison, in this stage of POC that never gets implemented. So Cohere, for example, will work with organizations to ensure that that first use case has the right technology and the right governance model. It's deployed, for example, in an air-gapped environment to ensure that your prompts and your data do not leak.
(15:11):
So there are things that organizations can do to go from that first use case into production so you get real value. For the person, there's also a journey that they will need to go through as AI becomes even more prominent. So it starts with learning how to prompt and get the most out of what you're asking from AI. Now, I'm sure many of us have now interacted with a consumer application, and the more details you give it, you probably are getting a more detailed image back or more detailed travel plans or restaurant recommendations. But you think about it for the enterprise, the more data you give the model, the more you're probably risking around client information or your own information. And that's why having a secure deployment becomes so critical to that organization so that you can get the most value back when you're using the most relevant data. So the individual will need to go through the journey of learning how to prompt, learning how to use AI as a thought partner. Email summarization, great. The more you can QA with a chatbot, great. But AI's power is really there to help you do your job better.
Bailey Reutzel (16:28):
Is there an explainer on how best to prompt an AI that you guys have found?
Nicolas Morales (16:34):
Yeah, there are several publicly available prompting 101 courses that I would recommend folks start with. You can go to Cohere and look at our docs page, where you can find recommendations on how to prompt. But for most of the best prompts, you're going to want to start by giving it context. So what is the situation around the particular topic you're looking for information on, or that you're looking for the AI to ask you questions about? You're going to want to define the role, so who is the persona that this context is about? And then you're going to want to give it the tasks or activities you're trying to solve for. So again, it goes back to, especially for work or corporate environments, the more you can give the AI, the more accurate the responses it can give you back. But then on the other side of the coin, the more you give it, the more it knows. And then you've got to really think about security and governance.
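As a toy illustration of that context / role / task structure, a prompt builder might look like the sketch below. The field values are invented examples, not Cohere's recommended wording.

```python
# Sketch: the context / role / task prompt structure described above.
def build_prompt(context: str, role: str, task: str) -> str:
    return (
        f"Context: {context}\n"      # the situation and relevant background
        f"Role: You are {role}.\n"   # the persona the model should adopt
        f"Task: {task}"              # the concrete activity to perform
    )

print(build_prompt(
    context="A retail client is rebalancing a 60/40 portfolio after a rate cut.",
    role="a wealth advisor's meeting-preparation assistant",
    task="List the five questions the client is most likely to ask, with sources to check.",
))
```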
Bailey Reutzel (17:35):
The more it knows, the more it takes over, and all of a sudden it's ruling the world. Rajesh, any tips there?
Rajesh Iyer (17:43):
I think so. There are definitely the things you were talking about. On top of that, one thing we've realized from having done a couple hundred of these use cases across different domains is that big ideas help people remember that they need to do these different steps. One example is the role you're talking about. The way the role works is that when you put that part in the prompt and say what the role is, it opens a pinhole into a part of the model. So "I want you to play the role of a teller" opens just that pinhole, and it stops the process from getting confused when it's answering questions. If you give people that kind of explanation, they remember it, as opposed to just having a checklist or something like that.
(18:30):
So that's one idea that we have. The other thing we also say is exactly what he was saying: Make the prompts long and detailed, and try to leave the details at the beginning or the end of the prompt, because the model has a curious tendency to miss things that are in the middle. It's called the lost-in-the-middle problem. Another thing we try to do is systematically intercept malformed prompts. We look at prompts that are in our prompt library, and we'll intercept things that look like what we've seen before but with less detail, and we'll reformat the prompt and send it back to the user and say, "Hey, this is what we understood you to say. Does it make sense?" And the last one is, we always say, have a conversation. You have to engage Gen AI.
(19:14):
Gen AI is not a search platform. So you get people to understand that the first answer is almost never good enough. The more you start conversing with it, the better off you are. And I've heard Sam from OpenAI say, "Don't yell at the LLM." I do it all the time, by the way. Poor Wally, are you kidding me? And it can actually understand; it comes back and gives you better stuff. So three of those are broad ideas for people to remember, and the fourth one is, do something that makes this easier for folks. If there's a library of prompts from the past where you've gotten a lot of thumbs up, why not use those thumbs-up ones instead of the one from the guy whose kid was sick at home and who didn't bother to give a whole lot of detail? And maybe that's where we should start: how do I automate this stuff and make it easy for people? We want to create that Apple-like, easy-for-the-user experience, and then talk about all the other broad things people should think about. What I would say is, don't use those checklists, because they don't work very well.
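The prompt-interception step Rajesh describes could be sketched like this: match an incoming prompt against a library of previously thumbs-upped prompts and echo the more detailed version back for the user to confirm. Here difflib's string similarity is a crude stand-in for a real embedding search, and the library entry is invented.

```python
# Sketch: intercept terse prompts and propose the detailed library version instead.
import difflib

PROMPT_LIBRARY = {
    "dispute a card charge": "Context: customer disputes a card charge of {amount} "
                             "on {date}. Role: you are a dispute-resolution assistant. "
                             "Task: list next steps and required documents.",
}

def intercept(user_prompt: str) -> str | None:
    # Find the closest known prompt; cutoff and method are placeholders for a
    # production similarity search over embeddings.
    match = difflib.get_close_matches(user_prompt.lower(), PROMPT_LIBRARY, n=1, cutoff=0.4)
    if match:
        # "Hey, this is what we understood you to say. Does it make sense?"
        return f"Did you mean this more detailed prompt?\n{PROMPT_LIBRARY[match[0]]}"
    return None  # no close match: pass the prompt through unchanged

print(intercept("dispute charge"))
```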
Bailey Reutzel (20:29):
Yeah, that's interesting.
Nicolas Morales (20:30):
And I know he's half joking about yelling at the LLM, but I think there's a real capability in using the AI to give you feedback, or asking it to interview you on any particular topic. That can then unlock for you how to better prompt it or ask it for information. I routinely will create a project requirements document or a strategic plan and ask AI to play the role of one of our leaders, or a customer, and say, "Find all the holes in this strategy document. What questions would you ask me?" And that then unlocks our ability to produce an even better output. So whether you want it to yell at you or make a joke at you or give you feedback, it will actually provide feedback in ways that a human wouldn't.
Rajesh Iyer (21:27):
No, I'm mad that he said it first, but absolutely a hundred percent.
Bailey Reutzel (21:30):
Yeah, I think we're starting to get into kink territory where your AIs are yelling at you, but we'll bypass that here. All right, I want to talk a little bit about failure. So there's definitely some questions around when an AI agent fails, who is responsible? So probably for all these folks out there, they're pretty interested in that. What are the thoughts there right now?
Rajesh Iyer (21:53):
I will start by saying it's probably a design flaw. If someone says hallucination is a big problem: it's actually the most manageable engineering problem. You send the same question twice, and if you got the same answer, you might be in the right territory. Or send it five times and see if you get the same answer. The other thing you can do is ask an LLM, "Do you have the basis for answering this question?" It'll do a pretty good job of saying, "No, I don't have it," right? So there are engineering solutions to this stuff. Hallucination is the biggest hallucination about Gen AI, if you ask me. Not a popular concept, guys, no one else says that, but I've said it before. Hallucination is probably the most easily engineered-out problem. Having done about 150 of these things, I know it's the easiest thing to fix about AI. It's all in the design. It's not the LLM that's the problem; it's the architecture from end to end that's the problem. So it's an engineering, I don't want to call it malpractice.
Bailey Reutzel (22:55):
The engineers are going to be held responsible.
Rajesh Iyer (22:56):
Yeah, I'm held responsible. But I think there needs to be significant due diligence that goes into it, number one. Number two, and this goes back to a question you were asking before: Don't build for the users, build with the users. Get some fraction of people from the user community to work with you to make this better. It helps the adoption and the change management. Let them be your advocates. Let them know the amount of work that's gone into it, and I think you'll have a much better chance of avoiding some of these issues and errors. And the last thing is, you've got to have a continuous regimen of testing and making sure there are no errors. If you have a thousand people using your system, always have a thumbs-up, thumbs-down sign. You'll very quickly find some of these issues and be able to resolve them. I don't know if I answered your question right, but I'm just
Bailey Reutzel (23:57):
Yeah, certainly. Interesting. Yeah, I don't know, Nick, if you want to add to that, who you've seen as sort of the one to blame if an AI agent fails?
Nicolas Morales (24:05):
Yeah. It goes back to Rajesh's point around the design, especially the design of the deployment and your risk tolerance for the use case. There are use cases involving critical client information where, if it pulls up the wrong account, that is not something you can afford to expose externally. That will probably need to be deployed privately, in an environment that has been fully tested before it ever makes it into production. But then there are other use cases, say a chatbot providing help recommendations on how to open a new account, or the difference between an IRA and a 529 account, where if it gets it wrong, it will just take an extra click to get to the right answer. So being able to design the risk tolerance and the deployment strategy for each use case helps mitigate against some of the failures that will happen. But ultimately, responsibility for a failure will come down through the different layers of everyone involved with the project, from the LLM builders to our partners in the organization.
Bailey Reutzel (25:15):
I was just thinking there was some recent case, I think it was a Canadian airline, where the chatbot had given the passenger a faulty description of the refund policy, and the passenger took the company to court because the company said, "No, that's not our refund policy," and the customer won. I can't remember the exact details, but I guess it would be interesting to read through those and see who they thought was at fault there. Largely the company, right?
Nicolas Morales (25:46):
Yes. And mistakes will happen whether it's AI or human.
(25:49):
And,
Bailey Reutzel (25:49):
It's rare.
Nicolas Morales (25:49):
I've been refunded before for things I probably shouldn't have been, or more than I should have, or less, and there was yelling involved and it got fixed. So it comes back, again, to the risk tolerance, to the use cases where we can afford to experiment and make mistakes, and to providing your organizations with a sandbox where you can make these mistakes before you go into production.
Bailey Reutzel (26:17):
Okay. Gosh, it's happening so fast. We're getting down to the eight-minute mark. First of all, who is adopting this technology and who should? Knowing that you're talking to a room full of bankers, financial services institutions and payments players, my thought is, how does a community bank or regional bank get involved in this stuff? Do they have more potential benefits than a larger institution, et cetera? So I know that's a lot. Sorry.
Nicolas Morales (26:43):
Yeah, the short answer is everyone. Everyone who wants a competitive advantage, everyone who wants to continue to innovate and provide a better customer experience through faster responses, better products, et cetera, should be. Now, from a banking perspective, a smaller commercial bank might want to get started with an application that already has AI embedded in it, whereas a large enterprise financial institution would probably buy the model and build and customize a project itself. So you can buy or get started with AI with an off-the-shelf product. We have a platform called North, which you essentially just plug into your proprietary data. You still maintain full control, so the data never, ever leaves your environment. But then we have organizations who want to customize a model that's uniquely theirs, that they embed into their products. I was walking through the expo yesterday and I saw so many amazing products, but so much opportunity to infuse these products with AI to provide, again, better search and retrieval capabilities, better generation, smarter insights, which historically took manual work and can now be done in seconds. Because what LLMs bring that traditional automation doesn't is the context on all this unstructured data
(28:10):
that your clients produce every day.
Bailey Reutzel (28:13):
Rajesh, what about you?
Rajesh Iyer (28:15):
Yeah, I think everybody can benefit, definitely. But especially with these agentic systems, I think of it as the anti-siloer, a word I just made up. Essentially, the more siloed your stuff is, the more spread out it is, the more disconnects you have between things, the higher the potential for using some of these new Gen AI technologies to bring that unified view and drive value. Because when you get to be a larger organization, you have scale, but then you have all these other complications that get in the way of the scale. Now you can scale even more and get your cost structures down. It absolutely helps everybody, but I think we're just scratching the surface of what it might be able to do for a mega bank, in my opinion.
Bailey Reutzel (29:02):
Nice. All right, any questions from the audience? We have some runners with microphones. Would love to hear them just to know where we're at.
Audience Member 1 (29:15):
Hi. So my question is related to evaluation metrics. So as you're fine-tuning the model and you're trying to improve or reduce bias, improve completeness, accuracy, that sort of thing, is there a quantitative way to measure how good the output from the model is? Or do you have to rely on qualitative metrics? And if it is qualitative metrics, are you using humans to judge the output or are you using other LLMs to judge the output?
Rajesh Iyer (29:43):
My bias is that you do all of those different things. For example, summarization is actually the best-looking and probably the worst-performing Gen AI application there is, because it's hard to tell how complete a summary is and how much it conflicts with the base text, but it always looks awesome. So you've got to get clever with that and say, "I'm going to generate a bunch of question-answer pairs and see how those do." There's actually a paper on this called QAG, it's not our procedure, but it quantitatively measures how accurate the summary is. You also have BLEU scores. Depending on exactly what the use case is, there are a bunch of metrics available, but that doesn't take away from the fact that you almost always have to have the thumbs-up, thumbs-down mechanism, then start looking at the thumbs-downs, use Gen AI to classify them, find the failure modes, and start eliminating them one by one.
(30:50):
It's a very, very laborious process. I think we'll have lots of people involved in that kind of work for quite some time. I don't know if I answered your question, but that's
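A rough sketch of the QAG-style check Rajesh describes above: generate questions from the source text, answer them from both the source and the summary, and score agreement. The `llm` helper and prompts are illustrative, and this simplifies whatever the QAG paper actually specifies.

```python
# Sketch: QAG-style summary evaluation via question generation and answer comparison.
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def qag_score(source: str, summary: str, n: int = 5) -> float:
    # Generate factual questions from the source text.
    questions = llm(
        f"Write {n} short factual questions answerable from this text, one per line:\n{source}"
    ).splitlines()[:n]

    agree = 0
    for q in questions:
        a_src = llm(f"Text:\n{source}\nAnswer briefly: {q}")
        a_sum = llm(f"Text:\n{summary}\nAnswer briefly, or say MISSING: {q}")
        # An LLM judge decides whether the two answers match semantically.
        verdict = llm(f"Do these two answers agree? Reply YES or NO.\nA: {a_src}\nB: {a_sum}")
        agree += verdict.strip().upper().startswith("YES")
    return agree / max(len(questions), 1)
```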
Bailey Reutzel (31:01):
Nick, do you?
Nicolas Morales (31:02):
I'd only add that evals are a really important tool for model or application selection, but part of that is also the total cost of ownership of these models, where the gap is really closing on industry eval scores. You want to really get into the efficiency of deploying the different model sizes that are available. Because the answer isn't necessarily, "I want the largest model that can answer questions on poetry and directions and connect to my enterprise." You probably want to look for very purpose-built, specific models that will help you solve your business challenges. And with generally available evals, an eval score alone isn't going to tell you that.
Bailey Reutzel (31:48):
Yeah. Fair enough. Other questions? Wow. Oh, one over here. One sec.
Audience Member 2 (32:05):
Rajesh, could you expand a bit more on how hallucinations are a problem with the engineering?
Rajesh Iyer:
Yeah, so there are some cases, for example underwriting, where I don't have to answer all the questions, but if I'm going to answer one, I've got to be a hundred percent right. There's no way around it. So when we're really worried about that, we'll create maybe five parallel paths and send the same prompt through each and see what answers we get. Are they consistent? The first question we ask is, "Hey, LLM, do you have all the facts you need to answer this question?" And it turns out it's actually pretty good at telling you when it's missing things, for the same reason he was just talking about: you can ask it to say what's missing in what I'm asking.
(32:50):
And the second question we ask on each one of those five paths is, "What's the answer to the question?" Then I compare those and ask, is there semantic uniformity across three of them, or all five of them? If I'm super worried, I'll do 10 of them, because I want to make sure all of them are saying the same thing. If I'm less worried, agreement among fewer of them is enough. There's just a lot of business judgment that goes into this stuff; it depends on your risk tolerance and so on. But hallucination is almost never a problem in the real world, is what I mean. I can just refuse to answer because I'm not confident. I don't have the facts, I refuse to answer. I don't get the semantic uniformity, I refuse to answer. And what ended up happening is we were able to answer about 45 to 50% of the questions. The rest of them, we just brought back to a human being and said, "Too confused."
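That two-gate guard, first ask whether the model has the facts, then fan the question out across parallel paths and require agreement, might be sketched as below. The helper, model name and thresholds are assumptions, and the exact-match vote is a crude stand-in for a real semantic-equivalence check.

```python
# Sketch: refuse-to-answer guard via a facts gate plus self-consistency voting.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def llm(prompt: str, temperature: float = 0.7) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

def guarded_answer(question: str, context: str, paths: int = 5, quorum: int = 3):
    # Gate 1: does the model even have a basis to answer?
    basis = llm(
        f"Context:\n{context}\nDo you have all the facts needed to answer "
        f"'{question}'? Reply YES or NO.",
        temperature=0.0,
    )
    if not basis.strip().upper().startswith("YES"):
        return None  # "Too confused": hand it back to a human

    # Gate 2: semantic uniformity across parallel paths.
    answers = [
        llm(f"Context:\n{context}\nAnswer in one short sentence: {question}")
        for _ in range(paths)
    ]
    # Exact-match counting is a crude proxy for an LLM-judged semantic comparison.
    best, count = Counter(a.strip().lower() for a in answers).most_common(1)[0]
    return best if count >= quorum else None  # refuse unless a quorum agrees
```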
Bailey Reutzel (33:40):
That's an engineer right there worried about his responsibility. Are you an engineer? Okay, I have one last question. Yes. I think one last question we'll do. Okay. If anyone in the audience, this is more like consumer facing, maybe not for your bank, for the bank itself, wanted to go out and start a cult of AI agents for themselves. What is one tool they could use right now that you think would just benefit their life so much? Nick, starting with you.
Nicolas Morales (34:14):
Well, I would obviously say our Cohere Agent platform because
Bailey Reutzel (34:18):
I knew you'd say that.
Nicolas Morales (34:19):
I know, but it's just so easy to build and deploy an agent without being a coder or an engineer. That's the gap that is closing: these really, really powerful capabilities were reserved for data scientists and machine learning engineers, and now there are so many ways to follow a simple prompt to create an agent, deploy it, and customize it to be your assistant, to solve a use case: every Monday at 8:00 AM it's going to run and generate a report for you. The word "technical" is so loaded, but you don't need to be that engineer to be able to do that today.
Bailey Reutzel (35:01):
And are you using AI agents in your day?
Nicolas Morales (35:03):
We use AI agents every single day.
Bailey Reutzel (35:05):
For you personally, what are you using?
Nicolas Morales (35:07):
Oh, okay. For me personally, it's the thought partnership. So I use it to bounce ideas off of and, yes, to ask me questions back. That's how I use it. I don't use it yet for calendaring or making restaurant reservations for me, et cetera.
Bailey Reutzel (35:28):
Okay. Not yet. Rajesh, for you?
Rajesh Iyer (35:30):
Yeah, I think we work with almost every company out there. There are really good platforms for building agents, from the CSPs and from specialty companies like LangChain and so on. But it's very easy to tell what isn't an agentic system. Just ask yourself: have you got Gen AI figuring out what it needs to do? In other words, if you're asking it to do X, it needs to be able to say, "Hey, what are the things that I'm going to get data on?" And those could be like a million APIs in your organization, and you're trying to figure out exactly which API it is. In most of the situations we see today, it's like seven tools we're choosing from. You can't really do anything agentic with that. When you get into a real enterprise, you've got like a million APIs, a million functions, 700 trillion data APIs.
(36:25):
You've got to figure out how to use an LLM to understand the "do X": I need to pull this information, feed it to this service, take that, feed it to this service. Sometimes it could even be a UI that the LLM constructs. Like in the old days, where we used to have a query plan for SQL, you need to have an action plan. And that's missing in a lot of these; it's a dead ringer for "this is not an agentic system" when you don't see that dynamic construction of a workflow that leverages the services you have within the enterprise. So there are very few LLMs that can actually help you do that stuff. It's one of them. First of all, it has to be very good at instruction following; there's a benchmark called IFEval. It has to be very good at passing data along; there's something called FRAMES, which is a Google benchmark. So there are lots of checkmarks: you have to have a very good LLM to make some of this stuff happen. In the case of a cult, it just happens to be super aligned, so it's not as big a problem. Enterprises are a lot less aligned than cults are, so that's
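The query-plan analogy suggests a sketch like the following: have the LLM emit an explicit action plan over a tool catalog, and validate the plan before anything executes. The catalog, plan schema and model name here are invented for illustration.

```python
# Sketch: LLM-constructed action plan over a tool catalog, validated before execution.
import json
from openai import OpenAI

client = OpenAI()

TOOL_CATALOG = {
    "get_account": "fetch an account record by customer id",
    "get_transactions": "list recent transactions for an account",
    "flag_fraud": "open a fraud case for a transaction",
}

def plan(request: str) -> list[dict]:
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in TOOL_CATALOG.items())
    raw = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content":
            f"Tools:\n{catalog}\n\nRequest: {request}\n"
            'Reply with ONLY a JSON list of steps like '
            '[{"tool": "...", "args": {}, "feeds": null}].'}],
    ).choices[0].message.content

    # A production version would need robust parsing; json.loads is the sketch.
    steps = json.loads(raw)
    # Validate before execution: every planned tool must exist in the catalog.
    for step in steps:
        if step["tool"] not in TOOL_CATALOG:
            raise ValueError(f"Planner invented a tool: {step['tool']}")
    return steps
```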
Bailey Reutzel (37:39):
A problem. That's fair. Yeah, it's probably a good thing. I want somebody to build me an AI cult agent, chatbot, not chatbot, AI agents to manage my cryptocurrency and only win, only number go up, never number go down. So that's what I'm looking for. You guys work on that for me. Round of applause for our two panelists. Thank you all very much. We'll be back soon.
Banking With Bots: The Benefits and Challenges of the Agentic AI Labor Force in Financial Services
June 3, 2025 9:12 AM
38:07