Embracing Training Dataset Bias for Automated Harmful Detection.
The increasing volume of social media content surpasses the capacity of human moderation and poses psychological risks to moderators, creating a need for automated moderation systems. However, these systems often exhibit biases against minoritized groups. One way to mitigate these biases is to alter the training data, which reflect the biases of the human annotators who label them. Increasing diversity among annotators can help, but doing so is challenging for machine learning specialists and tends to focus on minimizing identity-based bias rather than embracing diverse perspectives. Drawing on moral systems theory from social psychology, we suggest that automated systems should incorporate diverse, context-aware interpretations of harm, embracing biases in order to address moderation issues adequately. We analyze how different dimensions of 2,180 U.S.-based annotators' personal moral systems, including institutional affiliations (religion, political party), values (political ideology), and identities (age, gender, sexual orientation, and race, ethnicity, or place of origin), influenced how they judged whether 101 social media comments were harmful. We find that institutional affiliations have the greatest impact on labeling, followed by values and identities. These insights support a diversity approach that reflects community-specific user bases, allowing model developers and online communities to intentionally select biases for better moderation outcomes.
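The abstract does not specify the authors' statistical method, so the following is only a minimal sketch of how one might relate annotator attributes to harm labels. It assumes a hypothetical long-format file `annotations.csv` with one row per (annotator, comment) judgment and illustrative column names; it fits a simple logistic regression of the binary harm label on affiliation, value, and identity variables.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per (annotator, comment) judgment.
# Columns (illustrative names, not from the paper):
#   harmful (0/1), religion, party, ideology,
#   age_group, gender, sexual_orientation, race_ethnicity
df = pd.read_csv("annotations.csv")

# Logistic regression of the harm label on annotator attributes,
# grouped loosely into affiliations, values, and identities.
model = smf.logit(
    "harmful ~ C(religion) + C(party) + C(ideology) "
    "+ C(age_group) + C(gender) + C(sexual_orientation) + C(race_ethnicity)",
    data=df,
).fit()

# Inspect which attribute categories shift the odds of a "harmful" label.
print(model.summary())
```

Comparing nested models that drop each block of predictors (affiliations, values, identities) and checking the change in fit would give a rough sense of their relative influence, in the spirit of the ranking reported in the abstract; the paper itself may use a different analysis.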