The phone rings. It's your sister, and though you haven't spoken to her for a couple of years, you immediately know it's her. A voice-recognition system, however, might not.
Data scientists at Pindrop, a provider of voice recognition and caller ID software, have found the human voice changes so much as we age that in as little as two years, a voice recognition system might not be able to identify an infrequent caller. The company presented its research on Friday at the RSA security conference in San Francisco.
This is not especially alarming. As we’ve written many times, every form of biometric has its limitations, and biometrics need to be combined with other forms of authentication (device, geolocation, behavior, other biometrics, etc.) to be truly effective. Also, voice recognition technology can be adapted to take the effects of aging into account.
But the research is worth considering as banks continue to deploy voice recognition to authenticate callers to their call centers. USAA, Wells Fargo, Eastern Bank, Tangerine, Barclays and HSBC are among those that use the technology.
How aging affects the voice
To analyze how the human voice changes over time, Elie Khoury, principal researcher at Pindrop, first perused medical studies on the topic.
“Our muscles change with age,” he said. “There will be less mass, less strength and more body fat, and the respiratory system will become less efficient. This will make us speak slower.”
Changes in cartilage affect pitch and volume. Changes to the nervous system that cause a person to shake affect the voice, as does hearing loss, which sometimes makes people speak louder.
Then Khoury studied how this affects voice biometric systems. He used former President Obama as a subject, comparing the first weekly presidential address he gave in 2009 with all subsequent weekly addresses until January 2017.
Khoury saw a significant degradation in the system’s ability to recognize Obama’s voice. At the high accuracy threshold banks use to identify incoming callers, most systems would start rejecting Obama at the end of two years, he said. He ran the same test on former President George W. Bush and found the rate of degradation was even higher.
All told, the study was done on 122 people speaking six different languages. Across the board, Pindrop researchers saw noticeable degradation after four months.
Gender makes a difference, too. Male speakers’ pitch decreases over time, then at a certain point increases again. Women’s voices change continuously over time.
If customers called their banks frequently, none of this would be a problem. But according to Pindrop, nearly half of call center users only call their bank once every eight months. In some cases, the customer’s voice changes enough within eight months that there’s a risk of either rejecting the caller because their voiceprint match wasn’t strong enough, or falling back on knowledge-based authentication, and thereby risking authenticating a fraudster.
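To make the threshold problem concrete, here is a minimal sketch of how a match score might be compared with a cutoff. It is an illustration only, not Pindrop's or any bank's system: the cosine-similarity scoring, the three-number "voiceprints" and the 0.90 threshold are all assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def voice_matches(enrolled, sample, threshold=0.90):
    """Return True when the sample scores at or above the threshold.

    The 0.90 default is a made-up value, not a figure from the research.
    """
    return cosine_similarity(enrolled, sample) >= threshold


# Toy three-dimensional "voiceprints"; real systems use far larger vectors.
enrolled = [1.0, 0.0, 0.0]
recent_call = [0.95, 0.10, 0.05]  # small drift from the enrolled voice
later_call = [0.70, 0.60, 0.40]   # larger drift, as after a long gap

print(voice_matches(enrolled, recent_call))
print(voice_matches(enrolled, later_call))
```

In this toy case the recent call clears the cutoff and the later one does not, which is the situation an infrequent caller could end up in.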
All is not lost
The fact that aging voices present a challenge to voice biometric systems is good to know, but shouldn’t be a deal breaker for banks considering the technology. Voice recognition systems can be calibrated to take into account the passage of time, the age of the user and their gender.
Pindrop says it has written algorithms that try to mimic the aging of male and female voices, and these are used to calibrate the voice biometric scores. It’s impossible to get a complete correction, Khoury said.
“Everyone ages differently because of physical activity, social activity, health and problems in nervous and hearing systems,” he said. The best solution is to have customers update their voiceprints periodically.
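As a rough illustration of what calibrating a score for elapsed time could look like, the sketch below adds back a capped estimate of the score lost to aging. Pindrop has not published its algorithms; the linear drift rate, the cap and the function name here are hypothetical, and a real system would also vary the correction by the caller's age and gender.

```python
def calibrated_score(raw_score, years_since_enrollment,
                     drift_per_year=0.02, max_boost=0.10):
    """Add back an estimate of the score lost to aging.

    drift_per_year and max_boost are invented values for illustration.
    The cap reflects the point that the correction can only be partial.
    """
    elapsed = max(years_since_enrollment, 0.0)
    boost = min(drift_per_year * elapsed, max_boost)
    return min(raw_score + boost, 1.0)  # a score never exceeds 1.0


# A caller who scores 0.85 two years after enrolling gets a small lift.
print(calibrated_score(0.85, 2))
```

Capping the boost is a deliberate choice: an unbounded correction would eventually let any voice pass, which would defeat the purpose of the check.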
Nuance, the voice recognition provider most used by banks in their call centers and mobile apps, said its system has rarely been affected by voice aging.
For instance, its first bank client has used Nuance technology to authenticate callers since 2001, and voice aging has never prevented the system from matching an incoming caller to their voiceprint on file, said Brett Beranek, director for product strategy at Nuance.
Nuance has run tests that show that if a bank enrolled a customer in its voice biometric platform at age 20 and never heard from them again until the age of 70 or 80, their voice would change, but not enough to alter the accuracy of the voice biometric system, Beranek said.
He also tested recordings of the actors Morgan Freeman and Arnold Schwarzenegger over a 30-year period and found Nuance’s system would recognize them even after a long break.
The impact of voice aging on biometric performance is less than that of a common cold, Beranek said.
“When you wake up in the morning and it’s dry, your voice sounds different than it does during the day as you’re hydrated and have been speaking for a while,” he said. “That variation between morning and afternoon is more significant than a 10-year period of time for an adult.”
An exception, he said, is children.
“If we took your voice at the age of 10 and then we heard it at the age of 30, maybe your immediate family members would recognize that’s you,” Beranek said.
But a voice biometric system that did not receive a sample of a child’s voice during the voice maturing process might have trouble recognizing the person.
“This is acutely problematic for teenage boys,” he said. That’s likely not surprising to anyone who has been around a 12-year-old boy when his changing voice cracks.
Nuance recommends that customers who enroll preteen children try to have them call in once every two years, so that the system can adapt to their voices.
Most banks, though, haven’t had to deal with this because most of their customers are adults, Beranek said.
A challenge with older clients is disease, Beranek said. Many types of cancer can have a degenerative effect on the voice.
Other than these extreme examples, Nuance hasn’t had to adjust its technology for the impact of aging, Beranek said. It has had to adapt to changing devices and phone networks.
Someone calling from a cellphone in rural Mexico would pose a far greater challenge to a voice biometric system than that same person, a few years older, calling from a solid landline, he said.
As Nuance’s system listens to new audio, it automatically checks for changes and adapts the voiceprint to understand that some of those changes are caused by variances in networks and devices.
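Nuance did not describe how that adaptation works. One generic way to adapt a stored voiceprint, shown here purely as an assumption-laden sketch, is to blend each confidently matched sample into the stored vector. The blending rate, the threshold and the function name are hypothetical, and the sketch does not attempt to separate network or device effects from changes in the voice itself.

```python
def update_voiceprint(enrolled, sample, score, threshold=0.90, rate=0.1):
    """Blend a confidently matched sample into the stored voiceprint.

    Samples scoring below the threshold are ignored so the stored
    voiceprint does not drift toward an impostor or a noisy call.
    """
    if score < threshold:
        return list(enrolled)
    return [(1 - rate) * e + rate * s for e, s in zip(enrolled, sample)]


stored = [1.0, 0.0]
stored = update_voiceprint(stored, [0.0, 1.0], score=0.95)
print(stored)  # the stored vector moves a small step toward the new sample
```

A small blending rate keeps any single call from rewriting the voiceprint, at the cost of adapting slowly for customers who call rarely.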
Dan Miller, lead analyst and founder of Opus Research, agreed that many things affect the accuracy of voice biometrics, including the quality of the voiceprints and variabilities in a person’s voice by time of day, by whether they have a cold or whether they’re in a bad mood. And he agreed that biometric providers already take these factors into account.
“A lot of things, including age, will cause different scoring,” Miller said. “That does tend to be baked into these solutions.”
He doesn’t dismiss the impact of aging entirely, though.
“Kudos to Pindrop for documenting it,” he said. “The warning isn’t that you shouldn’t use voice biometrics as an authentication factor, it’s that you should use multiple factors of authentication and use protocols such that should you not have confidence, not achieve the threshold, you’re not falsely rejecting a legitimate person, you’re just asking for something else or applying another factor in order to let them in.”
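The protocol Miller describes can be sketched as a three-way decision in which a weak voice score triggers a request for another factor instead of an outright rejection. This is a simplified illustration, not any vendor's implementation; the threshold and the names used are assumptions.

```python
def authenticate(voice_score, threshold=0.90, second_factor_ok=None):
    """Return 'accept', 'step_up' or 'reject' for a caller.

    second_factor_ok is None until another factor has been checked,
    then True or False depending on the result of that check.
    """
    if voice_score >= threshold:
        return "accept"
    if second_factor_ok is None:
        return "step_up"  # ask for something else rather than rejecting
    return "accept" if second_factor_ok else "reject"


print(authenticate(0.95))                         # accept
print(authenticate(0.80))                         # step_up
print(authenticate(0.80, second_factor_ok=True))  # accept
```

The point of the middle outcome is that a legitimate customer whose voice has drifted is asked for more proof, not turned away.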
The next big question, Miller says, is whether banks and other companies will be allowed to store voiceprints and other biometric information for long periods of time, because such data is personally identifiable information.
“Companies that are thinking about biometrics for authentication are well counseled to think about how they refresh their stored information, and inform their customers periodically that they’re using their voiceprints, just to keep everything transparent, above board and useful,” he said.
Editor at Large Penny Crosman welcomes feedback at email@example.com.