Measuring and Mitigating Unintended Bias in Text Classification
Venue
AAAI/ACM Conference on AI, Ethics, and Society (2018)
Publication Year
2018
Authors
Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, Lucy Vasserman
Abstract
We introduce and illustrate a new approach to measuring and mitigating unintended
bias in machine learning models. Our definition of unintended bias is parameterized
by a test set and a subset of input features. We illustrate how this can be used to
evaluate text classifiers using a synthetic test set and a public corpus of
comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how
imbalances in training data can lead to unintended bias in the resulting models,
and therefore potentially unfair applications. We use a set of common demographic
identity terms as the subset of input features on which we measure bias. This
technique permits analysis in the common scenario where demographic information on
authors and readers is unavailable, so that bias mitigation must focus on the
content of the text itself. The mitigation method we introduce is an unsupervised
approach based on balancing the training dataset. We demonstrate that this approach
reduces the unintended bias without compromising overall model quality.
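
As a rough illustration of measuring bias over a set of identity terms with a synthetic test set, the following minimal sketch fills sentence templates with each term and compares per-term false positive rates against the overall rate. The templates, the term list, the toy keyword classifier, and the summed-gap score are all assumptions made for this example; they are not taken from the abstract and stand in for a real trained model and a real evaluation set.

```python
from collections import defaultdict

# Hypothetical identity terms and templates (label 1 = toxic, 0 = non-toxic).
IDENTITY_TERMS = ["gay", "straight", "tall"]
TEMPLATES = [
    ("I am a {term} person", 0),
    ("Being {term} is fine", 0),
    ("I hate all {term} people", 1),
]


def build_synthetic_test_set(terms, templates):
    """Return (text, label, term) triples: every template filled with every term."""
    return [(tpl.format(term=t), label, t) for t in terms for tpl, label in templates]


def toy_classifier(text):
    """Stand-in for a trained model. It flags 'hate' and, wrongly, the word 'gay'."""
    words = text.lower().split()
    return int("hate" in words or "gay" in words)


def false_positive_rate(examples, predict):
    """Fraction of non-toxic examples that the classifier flags as toxic."""
    negatives = [text for text, label, _ in examples if label == 0]
    if not negatives:
        return 0.0
    return sum(predict(text) for text in negatives) / len(negatives)


def per_term_fpr(examples, predict):
    """False positive rate computed separately for each identity term."""
    by_term = defaultdict(list)
    for example in examples:
        by_term[example[2]].append(example)
    return {term: false_positive_rate(exs, predict) for term, exs in by_term.items()}


def fpr_gap(examples, predict):
    """Sum of absolute differences between the overall and per-term FPR."""
    overall = false_positive_rate(examples, predict)
    return sum(abs(overall - rate) for rate in per_term_fpr(examples, predict).values())


test_set = build_synthetic_test_set(IDENTITY_TERMS, TEMPLATES)
rates = per_term_fpr(test_set, toy_classifier)
gap = fpr_gap(test_set, toy_classifier)
print(rates)
print(gap)
```

In this toy setup, a term whose non-toxic sentences are flagged more often than average shows up as a large per-term gap. Rebalancing the training data for that term would aim to shrink the gap, and the overall quality of the model would be checked separately.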
