Speech Analysis Language Identification and Translation
Abstract
Background of study: The increasing globalization of communication has intensified the need for systems capable of automatically identifying spoken languages and providing accurate, real-time translation. With advancements in speech processing and machine learning, an integrated framework for speech analysis, language identification, and translation has become both feasible and necessary.
Aims: This paper aims to develop and evaluate a comprehensive system that performs automatic speech preprocessing, language identification, speech recognition, and machine translation. The study focuses on designing a multilingual pipeline capable of detecting multiple languages, converting speech to text, and translating the output into a target language with high accuracy and usability.
Methods: A multilingual speech corpus comprising recordings in English, Spanish, French, and Mandarin was used. Audio underwent preprocessing, feature extraction using Mel-frequency cepstral coefficients (MFCCs) and spectrograms, and language identification (LID) using CNN-based MFCC classifiers as well as i-vector and x-vector models. Automatic speech recognition (ASR) was performed with pre-trained systems such as Whisper and DeepSpeech, followed by neural machine translation (NMT). System performance was evaluated using accuracy, precision, recall, BLEU scores, real-time factor (RTF), and user experience assessments.
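The MFCC feature-extraction step described above can be sketched from first principles. The following is a minimal illustration, not the paper's implementation; the frame length, hop size, 26 mel filters, and 13 coefficients are common defaults chosen here as assumptions, and a production pipeline would typically use a library such as librosa instead:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies before analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping Hamming-windowed frames.
    n_frames = 1 + max(0, len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # Per-frame power spectrum.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # Log mel-filterbank energies, then a DCT-II to decorrelate them.
    fb = mel_filterbank(n_filters, n_fft, sr)
    energies = np.log(power @ fb.T + 1e-10)
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n + 0.5)[None, :]
                   * np.arange(n_ceps)[:, None])
    return energies @ basis.T  # shape: (n_frames, n_ceps)
```

The resulting (frames x coefficients) matrix is the kind of input the CNN-based LID classifiers described in the Methods would consume, typically stacked with deltas or paired with spectrogram features.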
Results: The proposed system demonstrated strong performance across the LID, ASR, and translation components. CNN-based language identification achieved high accuracy across multilingual inputs, while the ASR models produced coherent transcriptions suitable for downstream translation. Translation evaluation using BLEU scores and qualitative human review confirmed that the pipeline maintained contextual accuracy. The system also proved robust across varying speakers, accents, and noise conditions.
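The two quantitative metrics used above can be made concrete. This is a simplified sentence-level BLEU with a single reference and uniform n-gram weights (the study may have used a corpus-level variant such as sacrebleu), plus the standard real-time-factor ratio:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_grams, ref_grams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_grams & ref_grams).values())  # clipped counts
        total = max(sum(cand_grams.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision collapses the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

def real_time_factor(processing_seconds, audio_seconds):
    # RTF < 1 means the pipeline processes speech faster than real time.
    return processing_seconds / audio_seconds
```

A perfect match scores 1.0, any candidate sharing no 4-grams with the reference scores 0.0, and an RTF below 1 indicates the system keeps up with live input.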
Conclusion: The integrated Speech Analysis, Language Identification, and Translation system provides an effective solution for overcoming language barriers in real-time communication. By combining noise-reduced audio preprocessing, reliable language detection, and accurate translation, the system offers a user-friendly platform suitable for multilingual applications. Future improvements include expanding the language set, enhancing robustness against dialectal variation, and deploying the model on lightweight edge devices for real-time applications.
Article Details
Copyright (c) 2025 G. Ramya, N. Chandra Lekha, P. Pranathi, P. Sahana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.