ZestFinance: Where Machine Learning and ‘Human Artistry’ Meet Your Wallet
November 19, 2012
by Derrick Harris
Even if you don’t agree with ZestFinance’s business, it’s hard to argue with its grasp of big data and applied mathematics. Now, the company has added a human touch to machine learning in order to make its models for assessing repayment risk even more accurate.
ZestFinance, the big-data-driven loan-underwriting company from former Google CIO Douglas Merrill, is trying to lower the interest rates on short-term loans by blending machine learning and human judgment. The mission of ZestFinance (formerly ZestCash) has always been to provide needed cash to the “underbanked,” and the company says its latest predictive model is its best yet in terms of assessing an applicant’s ability to pay a loan back without defaulting.
If you’re not familiar with ZestFinance, here’s how it works: The company takes data from approximately 70,000 sources in order to produce a score — similar to a traditional credit score — that determines the relative risk of issuing a loan to any given individual. By considering so many variables, ZestFinance says it can give a more-accurate assessment than traditional underwriters that consider between 10 and 20 variables, meaning lenders that use ZestFinance’s model can offer better repayment terms because they’re confident they’ll be repaid. ZestFinance claims its original model was 40 percent more accurate than the current “best-in-class industry score” and has increased net repayment by 90 percent over those models.
“Because no individual signal is overwhelmingly powerful,” Merrill explained to me during a recent call, “we’re not tripped by one of the variables being bad.”
However, that first-generation model (which the company dubbed “Hollerith” after statistician and inventor Herman Hollerith) was limited in that it relied very heavily on machine learning to discern relationships between the variables it analyzed. Hence “Hilbert,” ZestFinance’s latest model (named after mathematician David Hilbert), which the company claims raises its accuracy to 54 percent higher than the industry standard. It achieves this improvement by retrofitting its machine-learning algorithms with good, old-fashioned human input, something Merrill refers to as “human artistry.”
“The combination of really big data and human artistry is the underlying value of Hilbert,” he said.
That’s because although machines are great at finding relationships and patterns, they’re not so great at putting them in context or pruning extraneous knowledge. For example, Merrill explained, we can teach a machine to learn whether it’s raining or snowing based on temperature, but a lot of what it learns is ultimately pointless. The machine would learn that -1 degrees is colder than zero degrees and that there’s not much difference between 50 and 51 degrees, when all it really needs to know is whether the temperature is above or below 32 degrees. (This is why companies such as Gravity Labs also apply human judgment to machine learning systems to form more-accurate interest graphs.)
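Merrill’s temperature example boils down to classic feature engineering: a human collapses a raw reading into the one distinction that matters. A minimal sketch of that idea, with a hypothetical function name (ZestFinance’s actual features are not public):

```python
# Hypothetical illustration of the "human artistry" step Merrill describes:
# collapsing a raw temperature reading into the single distinction that
# matters for rain vs. snow -- above or below freezing (32 degrees F).

def is_freezing(temp_f: float) -> int:
    """Hand-crafted binary feature: 1 if at or below 32 F, else 0."""
    return 1 if temp_f <= 32 else 0

# A model fed raw temperatures must learn on its own that -1 vs. 0 or
# 50 vs. 51 carries almost no signal; the binned feature encodes that
# human judgment directly.
raw_temps = [-1, 0, 31, 32, 33, 50, 51]
print([is_freezing(t) for t in raw_temps])  # [1, 1, 1, 1, 0, 0, 0]
```

The point is not that the model couldn’t eventually approximate the threshold itself, but that a hand-built feature spends no model capacity rediscovering it.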
Merrill speaking at Structure: Data.
One way to think about how the addition of human insight applies to banking is to consider the issue of bankruptcy. Now that Hilbert understands that U.S. citizens can only file for Chapter 7 bankruptcy once every seven years, Merrill said, the model is able to determine that someone who last filed nine years ago might be a better credit risk than someone who filed just two years ago.
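That domain rule can be expressed as a derived feature. A hypothetical sketch (the function name, encoding, and the seven-year figure as Merrill states it are illustrative, not ZestFinance’s actual implementation):

```python
# Hypothetical sketch of the bankruptcy example: encode not just "years
# since filing" but whether the applicant has cleared the waiting period
# the article cites (seven years between Chapter 7 filings), so a
# nine-years-ago filer can score differently from a two-years-ago filer.

CHAPTER_7_WAIT_YEARS = 7  # waiting period as stated in the article

def bankruptcy_features(years_since_filing):
    """Return (years_since_filing, past_waiting_period); None means never filed."""
    if years_since_filing is None:
        return (None, False)
    return (years_since_filing, years_since_filing >= CHAPTER_7_WAIT_YEARS)

print(bankruptcy_features(9))  # (9, True)  -- past the waiting period
print(bankruptcy_features(2))  # (2, False) -- recent filer
```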
Or consider the number of mobile phone numbers a credit applicant has had in the past few years. Whereas that number alone is a relatively weak signal, Merrill said it becomes a lot clearer when viewed in the context of how many addresses someone has had. Five numbers in five years at the same address might mean someone keeps defaulting on their prepaid phone contracts and buying new phones, but five numbers in five years at different addresses might mean someone was just trying to keep a local number wherever they moved.
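The phone-number example is an interaction feature: two weak signals combined into one stronger one. A hypothetical sketch, with illustrative labels and thresholds that are not from ZestFinance:

```python
# Hypothetical sketch of the interaction Merrill describes: the count of
# phone numbers is weak alone, but paired with the count of addresses it
# separates "keeps defaulting on prepaid contracts" from "kept a local
# number after each move". Labels and thresholds are illustrative only.

def phone_address_signal(num_phones: int, num_addresses: int) -> str:
    if num_phones <= 1:
        return "stable"
    if num_addresses >= num_phones:
        return "mover"    # roughly one new number per move: benign
    return "churner"      # many numbers at few addresses: riskier

print(phone_address_signal(5, 5))  # mover
print(phone_address_signal(5, 1))  # churner
print(phone_address_signal(1, 1))  # stable
```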
All told, about 25 percent of the variables Hilbert analyzes are the result of human intervention, but Merrill is quick to point out that humans are not fit to run the show when it comes to such large datasets and complex algorithms. The artistry must be cast on the math. “What’s interesting is none of these things could be solved by humans alone,” he said. “… Humans drown in a 70,000-variable list.”