Speech Analysis Language Identification and Translation
Abstract
Background of study: The increasing globalization of communication has intensified the need for systems capable of automatically identifying spoken languages and providing accurate, real-time translation. With advancements in speech processing and machine learning, an integrated framework for speech analysis, language identification, and translation has become both feasible and necessary.
Aims: This paper aims to develop and evaluate a comprehensive system that performs automatic speech preprocessing, language identification, speech recognition, and machine translation. The study focuses on designing a multilingual pipeline capable of detecting multiple languages, converting speech to text, and translating the output into a target language with high accuracy and usability.
Methods: A multilingual speech corpus comprising recordings in English, Spanish, French, and Mandarin was used. Audio underwent preprocessing, feature extraction using Mel-frequency cepstral coefficients (MFCCs) and spectrograms, and language identification (LID) using CNN-based MFCC classifiers as well as i-vector and x-vector models. Automatic speech recognition (ASR) was performed with pre-trained systems such as Whisper and DeepSpeech, followed by neural machine translation (NMT). System performance was evaluated using accuracy, precision, recall, BLEU scores, real-time factor (RTF), and user experience assessments.
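The MFCC feature-extraction step described above can be sketched from first principles. The following is a minimal illustration, not the paper's implementation; the frame length, hop size, 26 mel filters, and 13 coefficients are common defaults chosen here as assumptions, and a production pipeline would typically use a library such as librosa instead:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies before analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping Hamming-windowed frames.
    n_frames = 1 + max(0, len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # Per-frame power spectrum.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # Log mel-filterbank energies, then a DCT-II to decorrelate them.
    fb = mel_filterbank(n_filters, n_fft, sr)
    energies = np.log(power @ fb.T + 1e-10)
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n + 0.5)[None, :]
                   * np.arange(n_ceps)[:, None])
    return energies @ basis.T  # shape: (n_frames, n_ceps)
```

The resulting (frames x coefficients) matrix is the kind of input the CNN-based LID classifiers described in the Methods would consume, typically stacked with deltas or paired with spectrogram features.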
Results: The proposed system demonstrated strong performance across the LID, ASR, and translation components. CNN-based language identification achieved high accuracy across multilingual inputs, while the ASR models produced coherent transcriptions suitable for downstream translation. Translation evaluation using BLEU scores and qualitative human review confirmed that the pipeline maintained contextual accuracy. The system also proved robust across varying speakers, accents, and noise conditions.
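The two quantitative metrics used above can be made concrete. This is a simplified sentence-level BLEU with a single reference and uniform n-gram weights (the study may have used a corpus-level variant such as sacrebleu), plus the standard real-time-factor ratio:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_grams, ref_grams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_grams & ref_grams).values())  # clipped counts
        total = max(sum(cand_grams.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision collapses the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

def real_time_factor(processing_seconds, audio_seconds):
    # RTF < 1 means the pipeline processes speech faster than real time.
    return processing_seconds / audio_seconds
```

A perfect match scores 1.0, any candidate sharing no 4-grams with the reference scores 0.0, and an RTF below 1 indicates the system keeps up with live input.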
Conclusion: The integrated Speech Analysis, Language Identification, and Translation system provides an effective solution for overcoming language barriers in real-time communication. By combining noise-reduced audio preprocessing, reliable language detection, and accurate translation, the system offers a user-friendly platform suitable for multilingual applications. Future improvements include expanding the language set, enhancing robustness against dialectal variation, and deploying the model on lightweight edge devices for real-time applications.
Article Details
Copyright (c) 2025 G. Ramya, N. Chandra Lekha, P. Pranathi, P. Sahana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.