Bias Detection and Mitigation Techniques in Data Science Pipelines: An Empirical Evaluation
Abstract
Background: Failure to account for algorithmic bias can result in discriminatory outcomes in machine learning systems, particularly when such models operate in high-stakes decision-making environments. Although numerous bias mitigation techniques have been proposed, most studies treat fairness assessment as a post hoc evaluation. This gap highlights the need for a lifecycle-oriented framework that examines bias and fairness as interconnected mechanisms across the pipeline.
Aims: This study empirically investigates bias propagation across the data science pipeline within a structured bias-processing framework.
Methods: The proposed framework was evaluated on benchmark datasets containing sensitive attributes. Three predictive models were implemented: Logistic Regression, Random Forest, and Gradient Boosting. Fairness was measured with the Demographic Parity, Equal Opportunity, and Average Odds metrics, and the models' predictions were further analyzed to interpret fairness outcomes. Bias mitigation strategies were applied at both the data and model levels, including fairness-regularized optimization and hybrid approaches, and a sensitivity analysis examined the trade-off between fairness constraints and model loss.
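To make the evaluation step concrete, the sketch below shows one way the three group-fairness gaps can be computed for a binary sensitive attribute. This is a minimal NumPy sketch, not the authors' implementation; the function name and the two-group 0/1 encoding are assumptions.

import numpy as np

def fairness_gaps(y_true, y_pred, sensitive):
    """Group-fairness gaps between groups s=1 and s=0 of a binary
    sensitive attribute; positive values favour group 1.
    (Illustrative sketch, not the paper's implementation.)"""
    y_true, y_pred, sensitive = map(np.asarray, (y_true, y_pred, sensitive))
    g1, g0 = sensitive == 1, sensitive == 0

    def rate(mask):
        # Mean positive-prediction rate within the masked subgroup.
        return y_pred[mask].mean()

    # Demographic Parity: gap in positive prediction rates.
    dp = rate(g1) - rate(g0)
    # Equal Opportunity: gap in true positive rates (y_true == 1).
    eo = rate(g1 & (y_true == 1)) - rate(g0 & (y_true == 1))
    # Average Odds: mean of the TPR gap and the FPR gap (y_true == 0).
    fpr_gap = rate(g1 & (y_true == 0)) - rate(g0 & (y_true == 0))
    ao = 0.5 * (eo + fpr_gap)

    return {"demographic_parity": dp,
            "equal_opportunity": eo,
            "average_odds": ao}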
Results: The empirical findings indicate that most disparities originate from bias embedded in the data rather than from model architecture. Data-level mitigation reduced disparity by 28%, fairness-regularized optimization by 35%, and the hybrid strategy by 40–45%, with an accuracy decrease of no more than 2%. Sensitivity analysis revealed a non-linear tension between fairness constraints and optimization loss, showing that early-stage bias mitigation stabilizes fairness without significantly increasing performance trade-offs.
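The sensitivity analysis can be illustrated with a small experiment: train a logistic regression on the penalized objective BCE + λ·gap², where gap is the demographic-parity difference in mean predicted scores, then sweep λ to trace the fairness-loss trade-off. The penalty form, the synthetic data, and all names below are assumptions for illustration; the paper's fairness-regularized objective may differ.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_fair_logreg(X, y, s, lam, lr=0.1, epochs=2000):
    # Gradient descent on BCE + lam * gap^2, where gap is the
    # demographic-parity difference in mean predicted scores.
    # (Illustrative penalty; the paper's objective may differ.)
    w = np.zeros(X.shape[1])
    g1, g0 = s == 1, s == 0
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad_bce = X.T @ (p - y) / len(y)
        gap = p[g1].mean() - p[g0].mean()
        # Chain rule: d(mean p)/dw = mean of p*(1-p)*x per group.
        dgap = (X[g1] * (p[g1] * (1 - p[g1]))[:, None]).mean(axis=0) \
             - (X[g0] * (p[g0] * (1 - p[g0]))[:, None]).mean(axis=0)
        w -= lr * (grad_bce + 2.0 * lam * gap * dgap)
    return w

# Sensitivity sweep on synthetic data (illustration only).
rng = np.random.default_rng(0)
n = 1000
s = (rng.random(n) < 0.5).astype(int)
X = rng.normal(size=(n, 4))
X[:, 0] += 1.2 * s   # proxy feature correlated with the sensitive attribute
y = ((X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)) > 0.6).astype(int)
for lam in (0.0, 0.5, 2.0, 8.0):
    w = fit_fair_logreg(X, y, s, lam)
    pred = (sigmoid(X @ w) > 0.5).astype(int)
    acc = (pred == y).mean()
    gap = pred[s == 1].mean() - pred[s == 0].mean()
    print(f"lam={lam}: accuracy={acc:.3f}, parity gap={gap:+.3f}")

As λ grows, the parity gap shrinks while accuracy degrades non-linearly, which is the shape of trade-off the abstract describes.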
Conclusion: This study extends both theoretical and practical understanding of lifecycle bias propagation in machine learning systems. The findings emphasize the importance of addressing bias at early stages of the data science pipeline to achieve stable and sustainable fairness outcomes. By integrating fairness engineering throughout the lifecycle, the proposed framework contributes to more robust and ethically aligned AI systems.
Article Details
Copyright (c) 2026 Deshinta Arrova Dewi, Ugochi Okengwu, Zakka Ugih Rizqi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.