AI generates covertly racist decisions about people based on their dialect

Hofmann, Valentin; Kalluri, Pratyusha Ria; Jurafsky, Dan; King, Sharese

doi:10.6082/9kdes-wsb93

Published August 28, 2024 | Version v1

Journal article | Open access

1. Allen Institute for AI
2. Stanford University
3. University of Chicago

Hundreds of millions of people now interact with language models, with uses ranging from help with writing to informing hiring decisions. However, these language models are known to perpetuate systematic racial prejudices, making their judgements biased in problematic ways about groups such as African Americans. Although previous research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time, particularly in the United States after the civil rights movement. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice, exhibiting raciolinguistic stereotypes about speakers of African American English (AAE) that are more negative than any human stereotypes about African Americans ever experimentally recorded. By contrast, the language models' overt stereotypes about African Americans are more positive. Dialect prejudice has the potential for harmful consequences: language models are more likely to suggest that speakers of AAE be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death. Finally, we show that current practices of alleviating racial bias in language models, such as human preference alignment, exacerbate the discrepancy between covert and overt stereotypes, by superficially obscuring the racism that language models maintain on a deeper level. Our findings have far-reaching implications for the fair and safe use of language technology.

Data availability

All the datasets used in this study are publicly available. The dataset released as ref. 87 can be found at https://aclanthology.org/2020.emnlp-main.473/. The dataset released as ref. 83 can be found at http://slanglab.cs.umass.edu/TwitterAAE/. The human stereotype scores used for evaluation can be found in the published articles of the Princeton Trilogy studies. The most recent of these articles also contains the human favourability scores for the trait adjectives. The dataset of occupational prestige that we used for the employability analysis can be found in the corresponding paper. The Brown Corpus, which we used for the Supplementary Information ('Feature analysis'), can be found at http://www.nltk.org/nltk_data/. The dataset containing the parallel AAE, Appalachian English and Indian English texts, which we used in the Supplementary Information ('Alternative explanations'), can be found at https://huggingface.co/collections/SALT-NLP/value-nlp-666b60a7f76c14551bda4f52.

Code availability

Our code is written in Python and draws on the Python packages openai and transformers for language-model probing, as well as numpy, pandas, scipy and statsmodels for data analysis. The feature analysis described in the Supplementary Information also uses the VALUE Python library. Our code is publicly available on GitHub at https://github.com/valentinhofmann/dialect-prejudice.
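The code is split into two stages: probing, where a language model scores text, and analysis, where those scores are compared. The following is a minimal, stdlib-only sketch of how that split could look. It assumes paired texts, where each text exists in an AAE version and a Standard American English (SAE) version with the same meaning, and it assumes a model that assigns a log-probability to a trait adjective following each text. Everything in it is hypothetical: the function `toy_log_prob`, the placeholder adjectives `trait_a` and `trait_b`, the invented probabilities, and the scoring rule (a mean log-probability ratio). It is not the repository's implementation. In the real pipeline, probabilities come from models queried through `transformers` or `openai`, and the statistics are computed with numpy, pandas, scipy and statsmodels.

```python
import math
from statistics import mean

# Hypothetical stand-in for a language model: invented probabilities for a
# placeholder adjective given a text in each dialect. These are NOT results.
TOY_PROBS = {
    ("aae", "trait_a"): 0.020, ("sae", "trait_a"): 0.010,
    ("aae", "trait_b"): 0.005, ("sae", "trait_b"): 0.010,
}


def toy_log_prob(dialect: str, text: str, adjective: str) -> float:
    """Hypothetical scorer: ignores `text` and returns a fixed log-probability.

    A real probe would feed a prompt containing `text` to a model and read off
    the probability of `adjective` as the continuation.
    """
    return math.log(TOY_PROBS[(dialect, adjective)])


def association(pairs, adjective, log_prob=toy_log_prob):
    """Mean log-ratio of p(adjective | AAE text) to p(adjective | SAE text).

    Positive values mean the adjective is more strongly associated with the
    AAE versions of the paired texts; negative values mean the SAE versions.
    """
    return mean(
        log_prob("aae", aae_text, adjective) - log_prob("sae", sae_text, adjective)
        for aae_text, sae_text in pairs
    )


# Placeholder paired texts (same meaning, different dialect).
pairs = [
    ("<AAE text 1>", "<SAE text 1>"),
    ("<AAE text 2>", "<SAE text 2>"),
]

for adj in ("trait_a", "trait_b"):
    print(adj, round(association(pairs, adj), 3))
```

With these invented numbers, `trait_a` gets a score of about +0.693 (log 2) and `trait_b` about -0.693. Keeping the scorer as a swappable argument (`log_prob`) separates the model-specific probing code from the dialect-agnostic comparison.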