Can crowdsourcing mitigate a dearth of data scientists for banks?

As companies struggle to adopt artificial intelligence technology and offer more personalized, intuitive services to customers, demand for data scientists continues to soar.

When LinkedIn tried to quantify the shortage of data scientists in August, it concluded that nationwide, there were 151,717 data science-related jobs employers were struggling to fill. The need is felt most in big cities like New York (34,032), the San Francisco Bay Area (31,798) and Los Angeles (12,251).

As a result, some financial services firms are testing a new idea to try to meet this demand: crowdsourcing. The idea is to tap the brains of talented data analysts all over the world to solve problems. Kaggle, a platform for data science and analytics competitions that’s now owned by Google, got data scientists used to working this way. It has more than two million registered data scientists (or Kagglers, as it calls them) using its platform.

Two hedge funds, Numerai and Quantopian, have borrowed this idea. Following is a look at what they're doing, and what it means for data modeling and AI-related work going forward:


Quantopian has been running a $150 million hedge fund using crowdsourced data scientists for two years.

According to CEO John Fawcett, the approach solves two problems.

One is the data science talent shortage. Quantopian has 230,000 data scientists from all over the world using its platform. The largest hedge funds have about 150 data scientists on staff.

“We’re finding people long before anyone else because of this global reach,” he said.

The other problem crowdsourcing addresses is the concentration risk that can occur when quants share similar backgrounds and views.

“The most extreme version of this was in the summer of 2007 when there was this quant meltdown, where a bunch of quant funds were in similar strategies and positions and had similar risk controls,” Fawcett said. “There was an unwind of a fund that triggered a bunch of portfolio liquidation and compounded the risk.”

The quants had all been trained in the same way, had built similar systems, had similar strategies and were exploiting similar market inefficiencies, Fawcett said.

“There is deep demand for uncorrelated alpha that is credible,” he said, referring to market-beating returns that do not move in step with those of other funds.
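To make the idea of correlated strategies concrete, here is a minimal sketch in Python. The return series and fund names are invented for illustration and do not come from Quantopian or any real fund. The sketch computes the Pearson correlation between daily returns: two strategies built on similar ideas score close to 1, while a strategy built on a different idea scores close to 0.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length return series."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical daily returns (illustrative numbers only).
fund_a = [0.010, -0.020, 0.015, -0.010, 0.020]
fund_b = [0.012, -0.018, 0.013, -0.012, 0.021]  # similar strategy to fund_a
fund_c = [0.005, 0.003, -0.006, -0.004, 0.003]  # unrelated strategy

corr_ab = pearson(fund_a, fund_b)  # close to 1: the funds rise and fall together
corr_ac = pearson(fund_a, fund_c)  # close to 0: little shared risk

print(f"A vs. B: {corr_ab:.2f}")
print(f"A vs. C: {corr_ac:.2f}")
```

When many funds look like A and B, a forced sale at one can hurt them all at once, which is the concentration risk Fawcett describes.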

According to Fawcett, the search for alpha is the perfect problem for crowdsourcing.

“In investing because of the effects of alpha decay, where ideas atrophy and die, we have to constantly be searching for new inefficiencies” in the stock market, he said. Having a large number of people going in different directions brings new ideas to the surface.

Quantopian provides free market data, corporate fundamentals, and tools data scientists can use to build investment algorithms. It also provides educational materials. It is adding delayed FactSet data to its data library through an agreement reached last year.

It runs contests in which people submit their algorithms to qualify for cash prizes. When Quantopian wants to use a model for investment purposes, it contacts the author and offers a performance-based royalty agreement.

“That can pay them quite handsomely if they generate returns,” Fawcett said.

Registered users include academics and data scientists in oil and gas, national laboratories, tech companies, and anywhere else data modeling and predictions take place, Fawcett said. Most are trained in creating models and build them for a living.

“They’re looking for a way to get access to financial data so they can try similar things with financial data,” he said.

Motivations vary. Some users treat it as a hobby or are curious to see whether their techniques would work in the market, which Fawcett called “the ultimate puzzle.”

Others want to break into the financial services industry.

“Most of the people with this talent don’t live in New York or London or Hong Kong,” Fawcett said. “They live elsewhere in the world and don’t have a path to participate. We’re creating the opportunity to learn and to be in the industry as a quant.”

Still others try to make a living off the royalties they generate from Quantopian. A growing number of users aren’t quants at all, but investment professionals who want to learn data science.

Quantopian doesn’t crowdsource everything. Professional portfolio managers choose the models the hedge fund uses. A centralized team develops the software platform and prepares data for it. Fawcett wouldn’t say how the hedge fund is performing.

“We’re still at it. I think that counts for something,” he said. “What we’re seeing is in line with what our models predict.”


Newer to this space is Numerai, a global equities hedge fund founded in 2015 and backed by the entrepreneur Peter Diamandis, Renaissance Technologies cofounder Howard Morgan and Union Square Ventures. Like Kaggle and Quantopian, it hosts a community of data scientists around the world: 40,000 in total, 3,000 of whom are active weekly users. Some work at NASA or have Ph.D.s.

Numerai is the brainchild of Richard Craib, who was working as a quant manager in Cape Town, South Africa, for Prudential Asset Management when he became aware of Kaggle and the large international community of data scientists on it.

“A big part of the vision is tapping into this globally elite group of modelers, machine learning experts that don’t have experience with financial data and don’t work on Wall Street already, but are probably better modelers than those that exist on Wall Street,” said Matt Boyd, Numerai's president and COO.

Around the same time, Craib became intrigued by blockchain technology and how it could help parties work together that don’t trust each other. Numerai’s contests are handled by smart contracts running on Ethereum.

Numerai hosts seven data science tournaments a week. Each one asks the data scientists to build a model that forecasts something (e.g. the performance of small-cap equities). They have to ante up a bit of Numerai’s cryptocurrency, Numeraire, to participate. They are eligible for cash prizes if their models perform well for a month.

Numerai’s modeling techniques are based on machine learning.

“Machine learning algorithms can evaluate all the permutations of relationships within a data set in a much more efficient way than a team of researchers doing regression models. You’re able to find less-linear combinations and nonlinear combinations of data that have forecasting power,” Boyd said. “That means we can find additional signals in data sets that other people have been using for many years.”
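A toy example shows what a nonlinear signal can look like. This is a sketch on made-up data, not Numerai's data or method. The target depends only on the product of two features, so each feature on its own has zero linear correlation with the target, while their combination predicts it perfectly.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))


# Made-up features: every combination of two +1/-1 indicators.
x1 = [1, 1, -1, -1]
x2 = [1, -1, 1, -1]

# Hypothetical target driven only by the interaction of the two features.
y = [a * b for a, b in zip(x1, x2)]

# A linear model using x1 or x2 alone sees no relationship...
corr_x1 = pearson(x1, y)
corr_x2 = pearson(x2, y)

# ...but the nonlinear combination x1 * x2 carries all of the signal.
interaction = [a * b for a, b in zip(x1, x2)]
corr_interaction = pearson(interaction, y)

print(corr_x1, corr_x2, corr_interaction)
```

A plain regression on x1 and x2 would miss this relationship unless a researcher thought to add the product term by hand; models that search over combinations can find it on their own.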

The hedge fund uses ensemble theory to combine the best-performing machine learning algorithms. Boyd would not disclose the hedge fund’s returns or how much money it manages.
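Numerai has not described how it combines models, so the following is only a sketch of one common ensembling approach, a weighted average of predictions. The model names, predictions and weights are all hypothetical.

```python
# Hypothetical predictions from three submitted models for four stocks
# (for example, the estimated probability that each stock beats the market).
predictions = {
    "model_a": [0.62, 0.40, 0.55, 0.30],
    "model_b": [0.58, 0.47, 0.49, 0.36],
    "model_c": [0.66, 0.43, 0.52, 0.27],
}

# Hypothetical weights, e.g. reflecting each model's past performance.
weights = {"model_a": 0.5, "model_b": 0.2, "model_c": 0.3}


def ensemble(predictions, weights):
    """Combine per-model predictions into one weighted-average prediction."""
    total_weight = sum(weights[name] for name in predictions)
    n_items = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items())
        / total_weight
        for i in range(n_items)
    ]


combined = ensemble(predictions, weights)
print([round(p, 3) for p in combined])
```

Averaging tends to cancel out the individual models' errors when those errors are not strongly correlated, which is one reason a fund might want many independent contributors.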

“We understand based on feedback we’ve received we’re well above the median return, that we’re in the top quartile in that space,” he said.

Use of the platform is growing. Numerai is getting about 3,000 submissions a week, versus a couple of hundred a week in early 2018.

Boyd says Numerai is democratizing data modeling.

“Whether you went to the best school in the world and learned from the best professors or you taught yourself and live in a basement, we don’t really care,” he said. “We don’t even know. It’s not important to us if you’re a really good modeler.”

Modeling experience is more important even than knowledge of the stock market.

“It does play quite contrary to the old-school model of the portfolio manager who has developed all this gut instinct and just knows how a stock is going to move and you’ve got some minion data cleansers around him cutting out articles from the Financial Times and sending them to him,” Boyd said.

Quantopian currently offers a version of its technology to others; Numerai plans to do the same in the coming year.

What could go wrong?

Without knowing how well these funds are performing, it's hard to say objectively whether or not the idea is working well.

“There’s been a move toward these crowdfunded and follower models,” said Brad Bailey, research director at Celent. “What I like about these models is there are brilliant people in little villages in different parts of the world, so you open up to all this talent. Potentially they’ll come up with brilliant ideas that might work.”

But performance will be key, he acknowledged.

“The proof will be in the pudding,” Bailey said. “You could have someone looking at the entrails of a dog and making money and people would give them money to invest.”

David Weiss, principal analyst at Market Structure Metrics, is even more skeptical.

“There is AI in use — particularly machine learning and data mining, robotic process automation, natural language processing, in more traditional, organic, evolving ways,” he said. “It’s certainly going on on the buy side at hedge funds and prop shops, but not necessarily like this.”

Weiss hasn’t seen any established hedge funds taking the crowdsourcing route.

“It could be totally legit and way ahead of its time,” he said. “I strongly believe when it comes to product management, it’s not enough to have a good idea. You have to have a good idea at the right time. This is probably three years too early. If they have enough capital to keep their technology trade on, then we’ll see.”

Weiss also doubts whether crowdsourcing draws out data scientists’ best work.

“Let’s assert that data scientists are in low supply and high demand,” he said. “What would be the motivation for a data scientist to crowdsource their best ideas? If you want to get primo stuff from 40,000 data scientists, you want their attention and you want them to dedicate time to this.”

Editor at Large Penny Crosman welcomes feedback at
