Failure Mode Analysis of Machine Learning Models in Realistic Data Deployment Scenarios
Abstract
Background: Machine learning models frequently demonstrate strong performance under controlled benchmark evaluations. However, such evaluations often fail to capture hidden vulnerabilities that emerge under realistic deployment conditions. In real-world environments, models are exposed to stressors such as label corruption, feature noise, distributional shifts, and operational constraints, including reduced computational precision and increased latency. These conditions can induce performance degradation and structural instability, highlighting the need for a systematic robustness evaluation framework that goes beyond conventional accuracy metrics.
Aims: This paper introduces a formalized Failure Mode Analysis Protocol (FMAP) for evaluating machine learning model robustness under realistic operational stressors. The study reconceptualizes robustness evaluation as a distribution-based process, in which deployment itself generates a new, evolving data distribution over time.
Methods: The proposed FMAP framework evaluates model behavior under progressively adverse conditions, including symmetric label corruption, additive feature noise, distributional shifts, and operational constraints such as reduced numerical precision and increased inference latency. Experiments were conducted across diverse tabular and image benchmark datasets using representative model architectures, including linear models, ensemble methods, margin-based models, and deep neural networks.
Results: The experiments reveal distinct robustness profiles across model architectures under escalating stress conditions. Operational constraints and compositional limitations induced measurable degradation patterns, including instability and output collapse under extreme stress. The findings show that model failure is not solely a function of predictive accuracy loss; it is closely linked to operational constraints and evolving distributional conditions. The distribution-based evaluation framework captures both early-stage degradation and full failure transitions.
Conclusion: This study establishes a structured protocol for analyzing machine learning failure modes under realistic deployment scenarios. By framing robustness evaluation as a distribution-based process, the FMAP approach provides a systematic method for identifying operational risks and structural vulnerabilities.
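As an illustrative sketch only (not the authors' implementation), two of the stressors described under Methods, symmetric label corruption and additive feature noise, can be combined into a simple stress sweep. The function names (`corrupt_labels`, `add_feature_noise`, `stress_sweep`), the per-feature noise scaling, and the coupling of one stress level to both stressors are assumptions made for this example:

```python
import numpy as np

def corrupt_labels(y, rate, num_classes, rng):
    """Symmetric label corruption: flip a fraction `rate` of labels,
    each replaced by a uniformly drawn *different* class."""
    y = y.copy()
    n_flip = int(rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y[idx] = (y[idx] + rng.integers(1, num_classes, size=n_flip)) % num_classes
    return y

def add_feature_noise(X, sigma, rng):
    """Additive Gaussian feature noise, scaled by each feature's std."""
    return X + rng.normal(0.0, sigma, size=X.shape) * X.std(axis=0, keepdims=True)

def stress_sweep(model_factory, X_tr, y_tr, X_te, y_te, levels, num_classes, seed=0):
    """Train and evaluate a fresh model at each escalating stress level,
    applying label corruption at train time and feature noise at both
    train and test time. Returns {level: test accuracy}."""
    rng = np.random.default_rng(seed)
    results = {}
    for level in levels:
        model = model_factory()
        model.fit(add_feature_noise(X_tr, level, rng),
                  corrupt_labels(y_tr, level, num_classes, rng))
        preds = model.predict(add_feature_noise(X_te, level, rng))
        results[level] = float((preds == y_te).mean())
    return results
```

Any estimator exposing scikit-learn-style `fit`/`predict` methods can be passed via `model_factory`, so the same sweep can compare linear models, ensembles, margin-based models, and neural networks, as the Methods section describes. Stressors such as reduced numerical precision could be added analogously (e.g., casting inputs and weights to a lower-precision dtype before inference).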
Article Details
Copyright (c) 2026 Lau Meng Cheng, Amel Zulfukar Hassan Adlan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.