[1] T. Qi, S. Wang, C. Lu and T. Song, "PromptEVC: Controllable Emotional Voice Conversion with Natural Language Prompts," in Interspeech, Rotterdam, The Netherlands, 2025.
[2] K. Zhou, B. Sisman, R. Liu and H. Li, "Emotional voice conversion: Theory, databases and ESD," Speech Communication, vol. 137, pp. 1-18, 2022.
[3] L. R. Murray and J. L. Arnott, "Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion," The Journal of the Acoustical Society of America, 1993.
[4] K. Waghmare, S. Kayte and B. Gawali, "Analysis of Pitch and Duration in Speech Synthesis using PSOLA," Communications on Applied Electronics (CAE), vol. 4, no. 4, pp. 10-18, 2016.
[5] P. Y. Oudeyer, "The production and recognition of emotions in speech: features and algorithms," International Journal of Human-Computer Studies, vol. 59, pp. 157-183, 2003.
[6] S. S. Sadeghi, H. Khotanlou and M. R. Mahand, "Automatic Persian Text Emotion Detection using Cognitive Linguistic and Deep Learning," Journal of Artificial Intelligence and Data Mining (JAIDM), vol. 9, no. 2, pp. 169-179, 2021.
[7] N. Esfandian, "Phoneme Classification using Temporal Tracking of Speech Clusters in spectro-temporal domain," International Journal of Engineering (IJE), IJE Transactions A: Basics, vol. 33, no. 1, pp. 105-111, 2020.
[8] M. Aliabadi, R. Golmohammadi, M. Mansoorizadeh, H. Khotanlou and A. O. Hamadani, "An empirical technique for predicting noise exposure level in the typical embroidery workrooms using artificial neural networks," Applied Acoustics, vol. 74, pp. 364-374, 2013.
[9] M. Karami Mollaei and M. Eshaghi, "A New Algorithm for Voice Activity Detection Based on Wavelet Packets," International Journal of Engineering (IJE), IJE Transactions A: Basics, vol. 22, no. 3, pp. 225-232, 2009.
[10] A. A. Kiaei and H. Khotanlou, "Segmentation of Medical Images using Mean Value Guided Contour," Medical Image Analysis, 2017.
[11] S. Özaydın, "Examination of Energy Based Voice Activity Detection Algorithms for Noisy Speech Signals," European Journal of Science and Technology, pp. 157-163, 2019.
[12] K. Aghajani and I. Esmaili Paeen Afrakoti, "Speech Emotion Recognition Using Scalogram Based Deep Structure," International Journal of Engineering (IJE), IJE Transactions B: Applications, vol. 33, no. 2, pp. 285-292, 2020.
[13] Y. Stylianou, "Voice Transformation: A Survey," in IEEE International Conference on Acoustics Speech and Signal Processing, 2009.
[14] O. Turk and L. M. Arslan, "Robust processing techniques for voice conversion," Computer Speech and Language, vol. 20, pp. 441-467, 2006.
[15] L. Mary, Extraction of Prosody for Automatic Speaker, Language, Emotion and Speech Recognition, 2nd ed., SpringerBriefs in Speech Technology, 2019.
[16] D. Ververidis and C. Kotropoulos, "Emotional speech classification using Gaussian mixture models," in 2005 IEEE International Symposium on Circuits and Systems, Kobe, Japan, 2005.
[17] D. Verma, S. K. Barnwal, A. Barve, M. K. J. Kannan, R. Gupta and R. Swaminathan, "Multimodal Sentiment Sensing and Emotion Recognition Based on Cognitive Computing Using Hidden Markov Model with Extreme Learning Machine," International Journal of Communication Networks and Information Security, vol. 14, no. 2, pp. 155-167, 2022.
[18] M. K. Reddy and K. S. Rao, "Excitation Modeling Method Based on Inverse Filtering for HMM-Based Speech Synthesis," Machine Intelligence and Signal Analysis, vol. 748, pp. 85-91, 2019.
[19] J. B. Singh and P. K. Lehana, "Emotional speech analysis using harmonic plus noise model and Gaussian mixture model," International Journal of Speech Technology, vol. 22, pp. 483-496, 2019.
[20] S. Karimi and M. H. Sedaaghi, "How to categorize emotional speech signals with respect to the speaker’s degree of emotional intensity," Turkish Journal of Electrical Engineering & Computer Sciences, vol. 24, pp. 1306-1324, 2016.
[21] J. Holmes and W. Holmes, Speech Synthesis and Recognition, 2nd ed., London: Taylor & Francis, 2001.
[22] L. R. Rabiner and R. W. Schafer, Introduction to Digital Speech Processing, Boston: Now Publishers, 2007.
[23] M. Mansoorizadeh and N. Moghaddam Charkari, "Multimodal information fusion application to human emotion recognition from face and speech," Multimedia Tools and Applications, vol. 49, pp. 277-297, 2010.
[24] A. V. Oppenheim, A. S. Willsky and H. Nawab, Signals and systems, New Jersey: Prentice-Hall, 1996.
[25] S. Hadiyoso, I. D. Irawati and A. Rizal, "Epileptic Electroencephalogram Classification using Relative Wavelet Sub-band Energy and Wavelet Entropy," International Journal of Engineering (IJE), IJE Transactions A: Basics, vol. 34, no. 1, pp. 75-81, 2021.
[26] M. Jalil, A. Butt and A. Malik, "Short-Time Energy, Magnitude, Zero Crossing Rate and Autocorrelation Measurement for Discriminating Voiced and Unvoiced segments of Speech Signals," in International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), Konya, Turkey, 2013.
[27] M. Hamidi and M. Mansoorizade, "Emotion Recognition from Persian Speech with Neural Network," International Journal of Artificial Intelligence & Applications (IJAIA), vol. 3, no. 5, pp. 107-112, 2012.
[28] H. K. Vydana, S. R. Kadiri and A. K. Vuppala, "Vowel-Based Non-uniform Prosody Modification for Emotion Conversion," Circuits, Systems and Signal Processing, vol. 35, pp. 1643-1663, 2016.
[29] P. R. Hill, Audio and Speech Processing with MATLAB, New York: CRC Press, 2019.
[30] N. Keshtiari, M. Kuhlmann, M. Eslami and G. Klann-Delius, "Recognizing emotional speech in Persian: A validated database of Persian emotional speech (Persian ESD)," Behavior Research Methods, vol. 47, pp. 275-294, 2015.
[31] S. H. Mohammadi and A. Kain, "An overview of voice conversion systems," Speech Communication, vol. 88, pp. 65-82, 2017.
[32] K. S. Rao, Predicting Prosody from Text for Text-to-Speech Synthesis, New York: Springer Briefs in Electrical and Computer Engineering, 2012.
[33] M. Nikzar, H. Khotanlou and M. Dezfoulian, "The Relationship Between the Number of Extrema of Compound Sinusoidal Signals and Its High-Frequency Component," Journal of Mahani Mathematical Research (JMMR), vol. 13, no. 1, pp. 181-195, 2023.
[34] A. Salarpour and H. Khotanlou, "An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering," International Journal of Engineering (IJE), IJE Transactions B: Applications, vol. 31, no. 2, pp. 250-262, 2018.
[35] R. C. Streijl, S. Winkler and D. S. Hands, "Mean Opinion Score (MOS) revisited: Methods and applications, limitations and alternatives," Multimedia Systems, vol. 22, pp. 213-227, 2016.