Unified Adversarial Training for Bias Mitigation and Privacy Preservation

Abstract
Human-Centered Machine Learning (HCML) models often face challenges due to inherent biases related to population variability and limited access to large datasets. This results in algorithms that fail to generalize and accommodate out-of-distribution samples, thereby hindering real-world applications. Additionally, standard training procedures tend to make neural networks vulnerable to privacy risks such as reconstruction attacks. To address these issues, we propose a novel training method based on an adversarial network that aims to reduce the representation bias induced by the lack of diversity among training samples. Unlike similar approaches that use a known bias predictor as the adversarial signal, our method mitigates multiple unknown biases, acting as an effective regularization term that reduces the validation gap while also removing non-essential features. This feature selection further improves privacy by preventing the model from being repurposed or used to retrieve information about training or inferred samples, as demonstrated on the IMDb-Face dataset, where the method achieves an approximately 6.7% improvement in accuracy and enhances robustness against reconstruction attacks by about 174%.
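The core idea the abstract describes, an encoder trained against an adversary so that the learned representation drops bias-carrying features, can be illustrated with a minimal NumPy sketch. This is not the paper's method: it is a generic adversarial-debiasing toy in which a linear encoder minimizes its task loss minus a weighted adversary loss (the standard gradient-reversal objective), while the adversary tries to predict a synthetic bias attribute from the representation. All names, dimensions, and the loss weight `lam` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature 0 carries the task signal, feature 1 carries a spurious
# "bias" attribute that the encoder should learn to discard.
n = 512
y = rng.integers(0, 2, n)                      # task label in {0, 1}
b = rng.integers(0, 2, n)                      # bias attribute in {0, 1}
X = np.stack([(2 * y - 1) + 0.3 * rng.standard_normal(n),
              (2 * b - 1) + 0.3 * rng.standard_normal(n)], axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters: 1-D linear encoder z = X @ w_enc, plus scalar logistic heads
# for the task and for the adversary (illustrative architecture).
w_enc = 0.1 * rng.standard_normal(2)
w_task, w_adv = 0.1, 0.1
lam, lr = 1.0, 0.1                             # adversarial weight, step size

for _ in range(2000):
    z = X @ w_enc
    g_task = sigmoid(w_task * z) - y           # dBCE/d(task logit)
    g_adv = sigmoid(w_adv * z) - b             # dBCE/d(adversary logit)

    # Adversary descends on its own loss (tries to recover b from z).
    w_adv -= lr * np.mean(g_adv * z)
    # Task head descends on the task loss.
    w_task -= lr * np.mean(g_task * z)
    # Encoder descends on L_task - lam * L_adv: the reversed adversary
    # gradient pushes the representation to hide the bias attribute.
    grad_enc = np.mean((g_task * w_task - lam * g_adv * w_adv)[:, None] * X,
                       axis=0)
    w_enc -= lr * grad_enc

# The encoder should end up weighting the task feature far more heavily
# than the bias feature.
print("encoder weights:", w_enc)
```

In a run of this sketch the bias-feature weight `w_enc[1]` is driven toward zero while the task-feature weight `w_enc[0]` grows, which is the mechanism by which adversarial training can act both as a debiasing regularizer and as a feature-selection step.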
Publication Reference
Distributed Computing and Artificial Intelligence, 22nd International Conference (DCAI 2025). In: Lecture Notes in Networks and Systems, vol. X. Springer, 2025.
Year
2025-06