Failure Mode Analysis of Machine Learning Models in Realistic Data Deployment Scenarios


  Lau Meng Cheng
  Amel Zulfukar Hassan Adlan

Abstract

Background: Machine learning models frequently demonstrate strong performance under controlled benchmark evaluations. However, such evaluations often fail to capture hidden vulnerabilities that emerge under realistic deployment conditions. In real-world environments, models are exposed to stressors such as label corruption, feature noise, distributional shifts, and operational constraints, including reduced computational precision and increased latency. These conditions can induce performance degradation and structural instability, highlighting the need for a systematic robustness evaluation framework that goes beyond conventional accuracy metrics.
Aims: This paper introduces a formalized Failure Mode Analysis Protocol (FMAP) for evaluating machine learning model robustness under realistic operational stressors. The study reconceptualizes robustness evaluation as a distribution-based process, in which deployment itself induces a new data distribution over time.
Methods: The proposed FMAP framework evaluates model behavior under progressively adverse conditions, including symmetric label corruption, additive feature noise, distributional shifts, and operational constraints such as reduced numerical precision and increased inference latency. Experiments were conducted across diverse tabular and image benchmark datasets using representative model architectures, including linear models, ensemble methods, margin-based models, and deep neural networks.
Results: The experiments reveal distinct robustness profiles across model architectures when exposed to escalating stress conditions. Operational constraints and compositional limitations were shown to induce measurable degradation patterns, including instability and output collapse under extreme stress. The findings demonstrate that model failure is not solely a function of predictive accuracy loss but is closely linked to operational constraints and evolving distributional conditions. The distribution-based evaluation framework effectively captures both early-stage degradation and full failure transitions.
Conclusion: This study establishes a structured protocol for analyzing machine learning failure modes under realistic deployment scenarios. By framing robustness evaluation as a distribution-based process, the FMAP approach provides a systematic method for identifying operational risks and structural vulnerabilities.
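The stress conditions named in the Methods (symmetric label corruption, additive feature noise, escalating severity) can be illustrated with a minimal sketch. The helper names (`corrupt_labels`, `add_feature_noise`) and the toy threshold model are illustrative assumptions, not part of the published FMAP implementation; the sketch only shows the general shape of a severity sweep that records a degradation profile.

```python
import random

def corrupt_labels(y, rate, num_classes, rng):
    # Symmetric label corruption: with probability `rate`, replace a label
    # by a different class drawn uniformly at random.
    out = []
    for label in y:
        if rng.random() < rate:
            out.append(rng.choice([c for c in range(num_classes) if c != label]))
        else:
            out.append(label)
    return out

def add_feature_noise(X, sigma, rng):
    # Additive Gaussian feature noise with standard deviation `sigma`.
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in X]

def accuracy(model_fn, X, y):
    return sum(model_fn(row) == label for row, label in zip(X, y)) / len(y)

# Toy 1-D threshold "model" standing in for a trained classifier.
model = lambda row: int(row[0] > 0.5)

rng = random.Random(0)
X = [[rng.random()] for _ in range(1000)]
y = [int(row[0] > 0.5) for row in X]

# Sweep escalating stress levels and record the degradation profile,
# i.e. accuracy as a function of stress severity.
profile = []
for severity in [0.0, 0.1, 0.2, 0.4]:
    Xs = add_feature_noise(X, sigma=severity, rng=rng)
    ys = corrupt_labels(y, rate=severity, num_classes=2, rng=rng)
    profile.append((severity, accuracy(model, Xs, ys)))
```

A full FMAP-style evaluation would additionally vary the stressor type (distribution shift, reduced numerical precision, inference latency) and compare the resulting profiles across model families, as described in the Methods.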

Article Details

How to Cite
Meng Cheng, L., & Hassan Adlan, A. Z. (2026). Failure Mode Analysis of Machine Learning Models in Realistic Data Deployment Scenarios. International Journal of Advances in Artificial Intelligence and Machine Learning, 3(1), 54–66. https://doi.org/10.58723/ijaaiml.v3i1.651
