AccScience Publishing / IJOSI / Volume 9 / Issue 4 / DOI: 10.6977/IJoSI.202508_9(4).0006
ARTICLE

Decoding Marathi emotions: Enhanced speech emotion recognition through deep belief network-support vector machine integration

Varsha Nilesh Gaikwad1*, Rahul Kumar Budania2
1 Department of Electronics and Telecommunication Engineering, School of Engineering, RMD Sinhgad Technical Institute, Pune, Maharashtra, India
2 Department of Electronics and Communication Engineering, Institute of Engineering, Shri JJT University, Jhunjhunu, Rajasthan, India
Submitted: 29 October 2024 | Revised: 9 December 2024 | Accepted: 12 December 2024 | Published: 14 August 2025
© 2025 by the Publisher. Licensee AccScience Publishing, USA. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Abstract

Speech emotion recognition in Marathi presents considerable hurdles due to the language’s distinct grammatical and emotional characteristics. This paper presents a robust methodology for classifying emotions in Marathi speech using advanced signal processing, feature extraction, and machine learning techniques. The method entails collecting diverse Marathi speech samples and applying pre-processing steps such as pre-emphasis and voice activity detection to improve signal quality. Speech signals are segmented with a Hamming window to reduce frame-boundary discontinuities, and features such as Mel-frequency cepstral coefficients, pitch, intensity, and spectral properties are extracted. For classification, an attentive deep belief network is paired with a support vector machine, using attention mechanisms and batch normalization to improve performance and reduce overfitting. The proposed approach surpasses existing models, achieving 98% accuracy, 98% F1-score, 99% specificity, 99% sensitivity, 98% precision, and 98% recall.
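The front end described above (pre-emphasis followed by framing with a Hamming window) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pre-emphasis coefficient (0.97), frame length (25 ms), and hop (10 ms) at a 16 kHz sampling rate are common defaults assumed here, and voice activity detection and MFCC computation are omitted.

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal: np.ndarray, frame_len: int = 400,
                     hop: int = 160) -> np.ndarray:
    """Split the signal into overlapping frames and apply a Hamming
    window to each frame, reducing edge discontinuities before
    spectral feature extraction (e.g., MFCCs)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

# Example: 1 s of a synthetic 300 Hz tone at 16 kHz stands in for speech
x = np.sin(2 * np.pi * 300 * np.arange(16000) / 16000)
frames = frame_and_window(pre_emphasis(x))
print(frames.shape)  # (98, 400): 98 windowed frames of 400 samples
```

Each row of `frames` would then feed a spectral analysis stage; the resulting feature vectors are what a DBN-SVM classifier of the kind described above would consume.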

Keywords
Speech Emotion Recognition
Voice Activity Detection
Mel-Frequency Cepstral Coefficient
Deep Belief Network
Support Vector Machine

International Journal of Systematic Innovation, Electronic ISSN: 2077-8767, Print ISSN: 2077-7973. Published by AccScience Publishing.