Like any other data tech, 'garbage in, garbage out' hurts artificial intelligence

By Sunil Madhu, June 20, 2018, 12:01 a.m. EDT

Artificial intelligence and machine learning are leapfrog technologies, transforming the way payment processors, banks, online businesses and others are interacting with current and prospective customers.

On narrow tasks, such as verifying the identity of the person on the other end of an online transaction or detecting fraud, they can far exceed human intelligence and intuition. However, there are limitations, shortcomings and misapplications of data science that can affect results.

To produce reliable and accurate decisions, AI and machine learning depend on three basic elements.

The first is good data. In fact, data engineering is 30% of data science. Without properly vetted inputs, even the most advanced AI systems cannot generate trustworthy outputs. AI is susceptible to garbage in, garbage out.

Therefore, data must be selected carefully to avoid biases. Understanding the difference between causality and correlation is a prerequisite. For instance, if sample data has a pre-existing bias tied to the races of people who live in a given ZIP code and that bias is not corrected, a machine learning system can unintentionally encode racial bias into results for that ZIP code.

The second is selecting meaningful features for training data models. For instance, in e-commerce the age of an account as a feature can be combined with a label such as “chargeback fraud” to train a model to learn how to detect malicious accounts.
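As a rough illustration of pairing a feature with a label, the sketch below learns a single cutoff on account age from a handful of labeled examples. The data, the function names and the choice of a one-threshold rule (a "decision stump") are all assumptions made for illustration; a production fraud model would use many features and far more data.

```python
# Hypothetical labeled examples: (account_age_days, is_chargeback_fraud).
# The numbers are invented for illustration, not real transaction data.
TRAINING_DATA = [
    (1, True), (2, True), (3, True), (5, True), (8, False),
    (30, False), (45, True), (90, False), (200, False), (400, False),
]


def train_stump(examples):
    """Return (threshold, errors) for the rule 'fraud if age < threshold'.

    Every observed age (plus one value above the maximum) is tried as a
    cutoff, and the one that misclassifies the fewest examples is kept.
    """
    ages = sorted({age for age, _ in examples})
    candidates = ages + [ages[-1] + 1]
    best_threshold, best_errors = None, len(examples) + 1
    for threshold in candidates:
        # An example is an error when the rule's answer differs from its label.
        errors = sum((age < threshold) != label for age, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict(threshold, account_age_days):
    """Flag an account as likely fraud when it is younger than the cutoff."""
    return account_age_days < threshold


threshold, errors = train_stump(TRAINING_DATA)
print(f"Flag accounts younger than {threshold} days ({errors} training error(s))")
```

Even on this toy data the rule is imperfect: one older account in the sample is fraudulent, which is why a single feature is rarely enough on its own.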

By contrast, a feature that tracks race, if used in models trained to assess consumer lending default risk, can introduce racial bias and result in redlining (https://en.wikipedia.org/wiki/Redlining). Once features are selected, by humans or machines, data scientists should determine which are unique, which are correlated, and which can be combined to produce new features.
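One simple way to check whether two features are correlated, and to combine features into a new one, is sketched below. The feature names and values are hypothetical, and Pearson correlation is only one of several measures a data scientist might use.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-account features, invented for illustration.
account_age_days = [10, 20, 40, 80, 160]
account_age_weeks = [d / 7 for d in account_age_days]  # same information, new unit
order_count = [5, 4, 8, 4, 8]

# A correlation near 1.0 suggests one of the two features is redundant.
redundancy = pearson(account_age_days, account_age_weeks)

# Two features combined into a new one: order rate over the account's life.
orders_per_day = [o / d for o, d in zip(order_count, account_age_days)]

print(f"days vs. weeks correlation: {redundancy:.3f}")
print(f"orders per day: {orders_per_day}")
```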

Finally, efficient algorithms are required to help machines learn and generate models that can make reliable predictions. These include algorithms for linear and nonlinear problems, depending on how the data is distributed. If the data, plotted on a 2D chart, can be divided into two distinct groups (or classes) by a straight line (or by a plane in 3D space), then it is ripe for linear algorithms. When no simple line or hyperplane exists to subdivide the data, nonlinear algorithms, such as those found in artificial neural networks, are useful.
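The difference between the two cases can be seen with a perceptron, one of the simplest linear algorithms. The sketch below is an illustrative assumption, not a method the article prescribes: it trains on two tiny four-point datasets, one that a straight line can split (logical AND) and one that no straight line can split (logical XOR).

```python
def train_perceptron(points, labels, epochs=100):
    """Train a 2D perceptron; return (weights, bias, converged).

    'converged' is True only if a full pass over the data produced no
    mistakes, meaning a separating straight line was found.
    """
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x0, x1), label in zip(points, labels):
            predicted = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            update = label - predicted  # 0 when correct, +1 or -1 when wrong
            if update != 0:
                w0 += update * x0
                w1 += update * x1
                b += update
                mistakes += 1
        if mistakes == 0:
            return (w0, w1), b, True
    return (w0, w1), b, False


CORNERS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]  # one straight line separates the classes
XOR_LABELS = [0, 1, 1, 0]  # no straight line separates the classes

_, _, and_converged = train_perceptron(CORNERS, AND_LABELS)
_, _, xor_converged = train_perceptron(CORNERS, XOR_LABELS)

print(f"AND separable by a line: {and_converged}")
print(f"XOR separable by a line: {xor_converged}")
```

The linear model settles on a line for the first dataset and never does for the second, however long it trains. That second kind of data is where a nonlinear model, such as a neural network with a hidden layer, becomes necessary.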

When properly implemented for digital identity verification, AI and machine learning can be helpful in several ways. Instead of making slow-batch decisions that require users to wait minutes, hours or days for an answer, data science enables real-time decision-making and a frictionless customer experience.

They also make it possible to proactively detect and fight fraud, since machines can go beyond human-defined rules and decision trees to recognize patterns in vast amounts of data and predict previously unseen patterns.

AI can also enable organizations to resolve and accept more qualified applicants with fewer false declines that cause friction. This is especially important among the millennial and Generation Z consumer populations, which are mobile-first, demand instant gratification and have thin credit files.

Ultimately, the application of AI and machine learning techniques for digital identity verification will enable zero-friction consumer experiences, like Amazon Go, which allows shoppers in retail outlets to pick up products and leave the store without paying at a checkout counter.

Sunil Madhu

Founder and Chief Strategy Officer, Socure