Bias Detection and Mitigation Techniques in Data Science Pipelines: An Empirical Evaluation
Abstract
Background: Failure to account for algorithmic bias can result in discriminatory outcomes in machine learning systems, particularly when such models operate in high-stakes decision-making environments. Although numerous bias mitigation techniques have been proposed, most studies treat fairness assessment as a post hoc evaluation. This gap highlights the need for a lifecycle-oriented framework that examines bias and fairness as interconnected mechanisms across the pipeline.
Aims: This study empirically investigates bias propagation across the data science pipeline within a structured bias-processing framework.
Methods: The proposed framework was evaluated on benchmark datasets containing sensitive attributes. Three predictive models were implemented: Logistic Regression, Random Forest, and Gradient Boosting. Fairness was measured with the Demographic Parity, Equal Opportunity, and Average Odds metrics, and the models' predictions were further analyzed to interpret fairness outcomes. Bias mitigation strategies were applied at both the data and model levels, including fairness-regularized optimization and hybrid approaches, and a sensitivity analysis examined the trade-off between fairness constraints and model loss.
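To make the evaluation step concrete, the sketch below shows one way the three group-fairness gaps can be computed for a binary sensitive attribute. This is a minimal NumPy sketch, not the authors' implementation; the function name and the two-group 0/1 encoding are assumptions.

import numpy as np

def fairness_gaps(y_true, y_pred, sensitive):
    """Group-fairness gaps between groups s=1 and s=0 of a binary
    sensitive attribute; positive values favour group 1.
    (Illustrative sketch, not the paper's implementation.)"""
    y_true, y_pred, sensitive = map(np.asarray, (y_true, y_pred, sensitive))
    g1, g0 = sensitive == 1, sensitive == 0

    def rate(mask):
        # Mean positive-prediction rate within the masked subgroup.
        return y_pred[mask].mean()

    # Demographic Parity: gap in positive prediction rates.
    dp = rate(g1) - rate(g0)
    # Equal Opportunity: gap in true positive rates (y_true == 1).
    eo = rate(g1 & (y_true == 1)) - rate(g0 & (y_true == 1))
    # Average Odds: mean of the TPR gap and the FPR gap (y_true == 0).
    fpr_gap = rate(g1 & (y_true == 0)) - rate(g0 & (y_true == 0))
    ao = 0.5 * (eo + fpr_gap)

    return {"demographic_parity": dp,
            "equal_opportunity": eo,
            "average_odds": ao}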
Results: The empirical findings indicate that most disparities originate from bias embedded in the data rather than from model architecture. Data-level mitigation reduced disparity by 28%, fairness-regularized optimization by 35%, and the hybrid strategy by 40–45%, with an accuracy decrease of no more than 2%. Sensitivity analysis revealed a non-linear tension between fairness constraints and optimization loss, showing that early-stage bias mitigation stabilizes fairness without significantly increasing performance trade-offs.
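The sensitivity analysis can be illustrated with a small experiment: train a logistic regression on the penalized objective BCE + λ·gap², where gap is the demographic-parity difference in mean predicted scores, then sweep λ to trace the fairness-loss trade-off. The penalty form, the synthetic data, and all names below are assumptions for illustration; the paper's fairness-regularized objective may differ.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_fair_logreg(X, y, s, lam, lr=0.1, epochs=2000):
    # Gradient descent on BCE + lam * gap^2, where gap is the
    # demographic-parity difference in mean predicted scores.
    # (Illustrative penalty; the paper's objective may differ.)
    w = np.zeros(X.shape[1])
    g1, g0 = s == 1, s == 0
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad_bce = X.T @ (p - y) / len(y)
        gap = p[g1].mean() - p[g0].mean()
        # Chain rule: d(mean p)/dw = mean of p*(1-p)*x per group.
        dgap = (X[g1] * (p[g1] * (1 - p[g1]))[:, None]).mean(axis=0) \
             - (X[g0] * (p[g0] * (1 - p[g0]))[:, None]).mean(axis=0)
        w -= lr * (grad_bce + 2.0 * lam * gap * dgap)
    return w

# Sensitivity sweep on synthetic data (illustration only).
rng = np.random.default_rng(0)
n = 1000
s = (rng.random(n) < 0.5).astype(int)
X = rng.normal(size=(n, 4))
X[:, 0] += 1.2 * s   # proxy feature correlated with the sensitive attribute
y = ((X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)) > 0.6).astype(int)
for lam in (0.0, 0.5, 2.0, 8.0):
    w = fit_fair_logreg(X, y, s, lam)
    pred = (sigmoid(X @ w) > 0.5).astype(int)
    acc = (pred == y).mean()
    gap = pred[s == 1].mean() - pred[s == 0].mean()
    print(f"lam={lam}: accuracy={acc:.3f}, parity gap={gap:+.3f}")

As λ grows, the parity gap shrinks while accuracy degrades non-linearly, which is the shape of trade-off the abstract describes.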
Conclusion: This study extends both theoretical and practical understanding of lifecycle bias propagation in machine learning systems. The findings emphasize the importance of addressing bias at early stages of the data science pipeline to achieve stable and sustainable fairness outcomes. By integrating fairness engineering throughout the lifecycle, the proposed framework contributes to more robust and ethically aligned AI systems.
Article Details
Copyright (c) 2026 Deshinta Arrova Dewi, Ugochi Okengwu, Zakka Ugih Rizqi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.