Today’s book review is Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil.
Cathy O’Neil is a data scientist with a PhD in mathematics, who blogs here. She has built models herself, and has also worked to deconstruct them for the people affected by them.
This book is a thoughtful examination of the uses of Big Data, from someone who deeply understands how it really works, and what you can and can’t do with it. Despite the title, it doesn’t argue that all uses of big data are bad. Instead, through many different examples, it explains what anyone who has ever built a model knows (or should know) – models can easily (and unconsciously) replicate the biases of the real world and of the model builder. And just because you have lots of data doesn’t mean that data has predictive power.
O’Neil identifies many examples of complex models built to improve decisions which end up having consequences ranging from the unintended to the frankly terrifying. A few from the more terrifying end of the spectrum:
- models of teachers’ “added value”, where the information being modelled is too random (depending on the children they happen to have in their class from year to year) to have any statistical significance, but is used to decide which teachers get fired each year
- models used by university admissions offices, which are built from current admissions decisions – and hence replicate all of the current biases of the admissions system
- models used for predictive policing, where police knock on people’s doors to tell them that the model predicts they are more likely to commit a crime, and that the police are therefore watching them closely (even though they haven’t done anything yet)
And my recent series of blog posts on maths study and HSC scaling is an investigation into a classic example of a complex model (the ATAR scaling model), intended for a noble purpose, which seems to be having the unintended consequence of discouraging maths study.
My favourite example is in the area of credit. As O’Neil points out, back before models, bankers had their own mental models of what made someone a good credit risk:
A banker… was more likely to trust people from his own circles. This was only human. But it meant that for millions of Americans the pre-digital status quo was just as awful as some of the WMDs I’ve been describing. Outsiders, including minorities and women, were routinely locked out.
So banks started developing models that were intended to be fairer, looking only at the financial information of prospective borrowers. These models were generally fairer, and also had a feedback loop: they compared predictions with actual experience, and tweaked how the factors were used accordingly. You have the right to see your credit report for free, and credit information used in this context is strictly regulated. But it is also incredibly useful in many other contexts – for example, identifying which customers are likely to be richer, and hence more valuable, or which poorer, less creditworthy customers might be good targets for scams.
If you can model this, then you can probably make money from it. But using credit scores for wider purposes (such as direct marketing) is not allowed. So companies develop models from other data they hold that are likely to approximate the credit score. And that creates a number of problems – most particularly, that companies use data proxies they think might predict a credit score. Those proxies can take us straight back to the bad old days of race, gender and criminal history, so that people from marginalised groups are assumed to have bad credit scores and treated accordingly.
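To make the proxy problem concrete, here’s a minimal sketch – entirely synthetic data and invented names (the book contains no code) – of how a model that never sees a protected attribute can still reproduce group bias through a correlated proxy:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# A protected attribute the model never sees...
group = rng.integers(0, 2, size=n)             # 0 = majority, 1 = minority
# ...and a proxy (think postcode) strongly correlated with it.
postcode_risk = group + rng.normal(0, 0.5, size=n)

# Historical outcomes are biased: the minority group was charged more
# for credit, so it defaulted more often in the training data.
defaulted = rng.random(n) < np.where(group == 1, 0.30, 0.10)

# The model is "blind" to group membership - it only sees the proxy.
model = LogisticRegression().fit(postcode_risk.reshape(-1, 1), defaulted)
scores = model.predict_proba(postcode_risk.reshape(-1, 1))[:, 1]

# Yet its risk scores still split cleanly along group lines.
print(f"mean score, group 0: {scores[group == 0].mean():.3f}")
print(f"mean score, group 1: {scores[group == 1].mean():.3f}")
```

The model never uses group membership directly, but because the proxy carries the same information, its scores split along group lines anyway.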
As O’Neil says,
[Credit scoring’s] great advance was to ditch the proxies in favor of the relevant financial data, like past behavior with respect to paying bills. They focused their analysis on the individual in question – and not on other people with similar attributes. E-scores, by contrast, march us back in time. In a few milliseconds, they carry out thousands of “people like you” calculations. And if enough of these “similar” people turn out to be deadbeats, or worse, criminals, that individual will be treated accordingly.
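O’Neil doesn’t specify how e-scores are actually computed, but a “people like you” calculation could be as simple as this illustrative nearest-neighbours sketch (synthetic data, hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(1)

# A synthetic "people like you" database: a few behavioural/demographic
# features per person, plus whether each person turned out to be a bad debtor.
features = rng.normal(size=(5_000, 4))
bad_debtor = rng.random(5_000) < 0.15

def e_score(person, k=1_000):
    """Score an individual by the behaviour of the k most similar people,
    rather than by the individual's own financial history."""
    distances = np.linalg.norm(features - person, axis=1)
    nearest = np.argsort(distances)[:k]
    return bad_debtor[nearest].mean()  # share of "people like you" who defaulted

print(f"e-score for one individual: {e_score(rng.normal(size=4)):.3f}")
```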
Much of this credit information is used in widely varied ways – as part of hiring decisions, by dating websites, and even to sort customers in a phone queue. But in some of those examples, despite most companies’ assumption that more data is always better, using this information can just perpetuate past injustices. For example, if women have historically been paid less, and had to pay more for credit, then a woman is probably statistically more likely to default on a loan than a man. Does that mean women should be given a lower score, because being female is a useful, statistically significant proxy for credit risk? Perhaps, as O’Neil says,
in the name of fairness, some of this data should remain uncrunched.
This book is a useful caution in this era of breathless articles every day about the excitement of big data. It is a reminder that models are only as good as the biases of the data being fed into them, and the modellers who are building them. If you start with biased data, you will end with a biased model.
There are nuances – models which actively review their outputs, and test whether they are providing useful predictions, can start biased and end up unbiased. And models can be used for good – a model built by a university to identify students at higher risk of dropping out, so they could be given extra support to help them stay, is a great example. But just because a model has been built with big data doesn’t make it right, or fair.
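The book doesn’t prescribe how such output reviews should work; one simple version is a calibration check comparing predicted risk with observed outcomes per group, sketched here with invented names and synthetic data:

```python
import numpy as np

def calibration_by_group(scores, outcomes, groups):
    """Compare mean predicted risk with the observed outcome rate per group.
    A group whose predictions sit well above its outcomes is being
    systematically over-scored - one sign the model has inherited bias."""
    for g in np.unique(groups):
        mask = groups == g
        print(f"group {g}: predicted {scores[mask].mean():.3f} "
              f"vs observed {outcomes[mask].mean():.3f}")

# Synthetic example: both groups have the same true default rate,
# but the model scores group 1 much higher.
rng = np.random.default_rng(2)
groups = rng.integers(0, 2, size=8_000)
outcomes = rng.random(8_000) < 0.12
scores = np.where(groups == 1, 0.25, 0.12) + rng.normal(0, 0.02, size=8_000)
calibration_by_group(scores, outcomes, groups)
```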
In the past, when models such as credit scores were slower to build, legislative requirements could keep up – providing rules for what data could fairly be used about people, and in which contexts. But there is so much data now that this is almost impossible. O’Neil calls for open source models – for anyone using models for decision-making to make them public, so that it is possible for others to review their fairness and accuracy.
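As one illustration of what an outside reviewer could do with a published model’s decisions, here is a sketch of the “four-fifths” rule of thumb used in US employment law (the data and names are invented; O’Neil doesn’t mandate any specific test):

```python
import numpy as np

def disparate_impact_ratio(approved, groups, protected, reference):
    """Ratio of approval rates between a protected group and a reference
    group. Under the US 'four-fifths' rule of thumb, a ratio below 0.8
    is taken as evidence of adverse impact."""
    return approved[groups == protected].mean() / approved[groups == reference].mean()

# Auditing the decisions of a hypothetical published model.
rng = np.random.default_rng(3)
groups = rng.integers(0, 2, size=5_000)
approved = rng.random(5_000) < np.where(groups == 1, 0.40, 0.60)
print(f"disparate impact ratio: {disparate_impact_ratio(approved, groups, 1, 0):.2f}")
```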
As O’Neil says,
These models are constructed not just from data but from the choices we make about which data to pay attention to – and which to leave out. These choices are not just about logistics, profits and efficiency. They are fundamentally moral.
Actuaries, as builders of models, need to remember these wise words. Models are not just mathematical constructs, particularly when (as is most often the case) they are modelling human behaviour of some kind. They involve choices, and judgements. It is worth trying to make them wise ones.
I can’t find an email contact so I’ll write here. Can I send you something about how community workers use actuarial data wrongly … mostly because they don’t use professional actuarial advice? This creates problems with assessing risk: both under-responding and over-reacting. Given the line above, that “models are only as good as the biases of the data being fed into them”, I thought it might be a useful addition to your blog. I love your work.
Hi Jennifer, I really like your posts and always enjoy reading them. Traditional analysis of actuarial experience is often conducted at an aggregate level – evaluating actual-versus-expected (AE) trends by age band, tenure, sum insured band, cause of claim, etc – where the population in each bucket is relatively stable and homogeneous. The question for the business is whether it just wants to know what the AE experience looks like bucket by bucket, or whether it can do something about it – lifting the AE experience by using the data wisely. I would be inclined to believe that statistical modelling and churn propensity prediction over the portfolio is an alternative to key assumption parameters or permutation tables. Actuaries can be smarter about creating customer touch points, bringing together data from the front office, policy administration and underwriting, claims management and EV, allowing dynamic adjustments to reflect true policyholder behaviour assumptions. I would also be inclined to believe that, using machine learning tools, actuaries can engage more proactively with the business and its decision making, bringing deeper insight into policyholder behaviour – or even saying, numerically, that a 2% gain in experience from predictive modelling contributes xxx of a whole year’s worth of new business income. Now the question is how, and when, such modelling tools can be used. Can future actuaries demystify the probabilistic logic and unlock the value of lapsed customers?