Unified Adversarial Training for Bias Mitigation and Privacy Preservation
Author
Vuagniaux, Rémy
Dia, Mohamad
Türetken, Engin
Abstract
Human-Centered Machine Learning (HCML) models often face challenges due to inherent biases related to population variability and limited access to large datasets. This results in algorithms that fail to generalize and accommodate out-of-distribution samples, thereby hindering real-world applications. Additionally, standard training procedures tend to make neural networks vulnerable to privacy risks such as reconstruction attacks. To address these issues, we propose a novel training method based on an adversarial network that aims to reduce the representation bias induced by the lack of diversity among training samples. Unlike similar approaches that use a known bias predictor as the adversarial signal, our method mitigates multiple unknown biases, acting as an effective regularization term that reduces the validation gap while also removing non-essential features. This feature selection further improves privacy by preventing the model from being repurposed or used to retrieve information about training or inferred samples, as demonstrated on the IMDb-Face dataset, where the method achieves approximately a 6.7% improvement in accuracy and enhances robustness against reconstruction attacks by about 174%.
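The abstract does not specify the exact architecture, but adversarial debiasing of this kind is commonly implemented with a gradient-reversal layer: the encoder and the adversarial bias predictor share features, and the gradient from the adversary is negated before reaching the encoder, so the encoder learns representations that confuse the adversary. The sketch below illustrates only this reversal mechanism in NumPy; the class name and the scaling factor `lambda_` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

class GradientReversal:
    """Illustrative gradient-reversal layer (a common adversarial-training
    building block, not necessarily the paper's exact method): identity in
    the forward pass, negated and scaled gradient in the backward pass."""

    def __init__(self, lambda_=1.0):
        # lambda_ controls how strongly the adversary's gradient is reversed
        self.lambda_ = lambda_

    def forward(self, x):
        # Features pass through unchanged to the adversarial head
        return x

    def backward(self, grad_output):
        # Sign flip: the encoder is updated to *increase* the adversary's
        # loss, removing bias-predictive information from the features
        return -self.lambda_ * grad_output

# Tiny demonstration with a hypothetical feature vector and gradient
grl = GradientReversal(lambda_=0.5)
features = np.array([1.0, -2.0, 3.0])
assert np.allclose(grl.forward(features), features)  # forward is identity

adversary_grad = np.array([0.1, 0.2, -0.3])
encoder_grad = grl.backward(adversary_grad)  # sign-flipped, scaled by 0.5
```

In a full training loop, `encoder_grad` would be added to the gradient from the main task loss, so the encoder simultaneously solves the task and suppresses the features the adversary relies on.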
Publication Reference
Distributed Computing and Artificial Intelligence, 22nd International Conference (DCAI 2025). In: Lecture Notes in Networks and Systems, vol. X. Springer, 2025.
Year
2025-06