AI Machine-Learning: In Bias We Trust?
According to a new study, explanation methods that help users determine whether to trust machine-learning model predictions can be less accurate for disadvantaged subgroups.
Machine-learning algorithms are sometimes employed to assist human decision-makers when the stakes are high. For example, a model may predict which law school candidates are most likely to pass the bar exam, assisting admissions officers in deciding which students to admit.
Because these models are so complex, often with millions of parameters, it is nearly impossible even for AI researchers to fully understand how they make predictions. An admissions officer with no machine-learning experience might have no idea what is going on under the hood. Scientists sometimes employ explanation methods that mimic a larger model by creating simple approximations of its predictions. These approximations, which are far easier to understand, assist users in deciding whether to trust the model’s predictions.
However, are these explanation methods fair? If an explanation method provides better approximations for men than for women, or for white people than for black people, users may be more inclined to trust the model’s predictions for some people but not for others.
MIT scientists carefully examined the fairness of some widely used explanation methods. They discovered that the approximation quality of these explanations can vary drastically between subgroups and that the quality is often significantly lower for minoritized subgroups.
In practice, this means that if the approximation quality is lower for female applicants, there is a mismatch between the explanations and the model’s predictions, which could lead the admissions officer to wrongly reject more women than men.
Once the MIT researchers saw how pervasive these fairness gaps are, they tried several techniques to level the playing field. They were able to shrink some gaps, but couldn’t eradicate them.
“What this means in the real world is that people might incorrectly trust predictions more for some subgroups than for others. So, improving explanation models is important, but communicating the details of these models to end users is equally important. These gaps exist, so users may want to adjust their expectations as to what they are getting when they use these explanations,” says lead author Aparna Balagopalan, a graduate student in the Healthy ML group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).
High fidelity
Simplified explanation models can approximate predictions of a more complex machine-learning model in a way that humans can grasp. An effective explanation model maximizes a property known as fidelity, which measures how well it matches the larger model’s predictions.
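To make the idea of fidelity concrete, here is a minimal sketch that treats it as the share of instances on which the explanation model and the larger model give the same prediction. That agreement-rate definition is an assumption made for illustration, as are the function name and the toy predictions; the study may measure fidelity differently.

```python
def fidelity(model_preds, explainer_preds):
    """Share of instances where the explanation model agrees with the larger model.

    Assumption for illustration: fidelity is a simple agreement rate
    over discrete predictions.
    """
    if len(model_preds) != len(explainer_preds):
        raise ValueError("prediction lists must have the same length")
    if not model_preds:
        raise ValueError("need at least one prediction")
    matches = sum(m == e for m, e in zip(model_preds, explainer_preds))
    return matches / len(model_preds)


# Hypothetical pass/fail predictions for eight applicants.
model_preds = [1, 0, 1, 1, 0, 1, 0, 0]      # larger model
explainer_preds = [1, 0, 1, 0, 0, 1, 1, 0]  # simple approximation

print(fidelity(model_preds, explainer_preds))  # 6 of 8 agree, so 0.75
```

A fidelity of 1.0 would mean the simple approximation reproduces every one of the larger model's predictions in the sample.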
Rather than focusing on average fidelity for the overall explanation model, the MIT researchers studied fidelity for subgroups of people in the model’s dataset. In a dataset with men and women, the fidelity should be very similar for each group, and both groups should have fidelity close to that of the overall explanation model.
“When you are just looking at the average fidelity across all instances, you might be missing out on artifacts that could exist in the explanation model,” Balagopalan says.
They developed two metrics to measure fidelity gaps, or disparities in fidelity between subgroups. One is the difference between the average fidelity across the entire explanation model and the fidelity for the worst-performing subgroup. The second calculates the absolute difference in fidelity between all possible pairs of subgroups and then computes the average.
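The two gap metrics described above can be sketched in a few lines. This is an illustrative reading of the description, not the researchers' code: it assumes fidelity is a simple agreement rate between the two sets of predictions, and the function names, group labels, and toy data are all hypothetical.

```python
from itertools import combinations


def subgroup_fidelities(model_preds, explainer_preds, groups):
    """Agreement rate between the two models, computed separately per subgroup."""
    totals, matches = {}, {}
    for m, e, g in zip(model_preds, explainer_preds, groups):
        totals[g] = totals.get(g, 0) + 1
        matches[g] = matches.get(g, 0) + (m == e)
    return {g: matches[g] / totals[g] for g in totals}


def worst_group_gap(model_preds, explainer_preds, groups):
    """Overall fidelity minus the fidelity of the worst-performing subgroup."""
    overall = sum(m == e for m, e in zip(model_preds, explainer_preds)) / len(model_preds)
    per_group = subgroup_fidelities(model_preds, explainer_preds, groups)
    return overall - min(per_group.values())


def mean_pairwise_gap(model_preds, explainer_preds, groups):
    """Average absolute fidelity difference over all pairs of subgroups."""
    per_group = subgroup_fidelities(model_preds, explainer_preds, groups)
    pairs = list(combinations(per_group.values(), 2))
    if not pairs:
        raise ValueError("need at least two subgroups")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Hypothetical data: the explanation matches the model on every instance
# in group "A" but on only half of the instances in group "B".
model_preds = [1, 0, 1, 1, 0, 1, 0, 0]
explainer_preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(subgroup_fidelities(model_preds, explainer_preds, groups))  # {'A': 1.0, 'B': 0.5}
print(worst_group_gap(model_preds, explainer_preds, groups))      # 0.75 - 0.5 = 0.25
print(mean_pairwise_gap(model_preds, explainer_preds, groups))    # |1.0 - 0.5| = 0.5
```

In this toy case the overall fidelity of 0.75 looks reasonable on its own, while both gap metrics show that the explanation is much less reliable for group "B".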