[1] P. Mehta and S. Pandya, "A review on sentiment analysis methodologies, practices and applications," International Journal of Scientific & Technology Research, vol. 9, no. 2, pp. 601-609, 2020.
[2] A. Gandhi, K. Adhvaryu, S. Poria, E. Cambria, and A. Hussain, "Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions," Information Fusion, vol. 91, pp. 424-444, 2023, doi: 10.1016/j.inffus.2022.09.025.
[3] S. K. D'Mello and J. Kory, "A review and meta-analysis of multimodal affect detection systems," ACM Computing Surveys (CSUR), vol. 47, no. 3, pp. 1-36, 2015, doi: 10.1145/2682899.
[4] E. Cambria, D. Hazarika, S. Poria, A. Hussain, and R. Subramanyam, "Benchmarking multimodal sentiment analysis," in International Conference on Computational Linguistics and Intelligent Text Processing, 2017: Springer, pp. 166-179, doi: 10.1007/978-3-319-77116-8_13.
[5] K. Dashtipour, M. Gogate, E. Cambria, and A. Hussain, "A novel context-aware multimodal framework for Persian sentiment analysis," Neurocomputing, vol. 457, pp. 377-388, 2021, doi: 10.1016/j.neucom.2021.02.020.
[6] A. Naseri Karimvand, S. Nemati, R. Salehi Ghegeni, and M. E. Basiri, "Multimodal Sentiment Analysis of Social Media Posts Using Deep Neural Networks," International Journal of Web Research, vol. 4, no. 1, pp. 1-9, 2021, doi: 10.22133/ijwr.2021.289091.1096.
[7] M. Zareinejad and Z. Beheshtifard, "Aks-Nazar: Introducing a Persian-English Dataset for Multimodal Sentiment Analysis," presented at the International Conference on Artificial Intelligence and Future Civilization, Tehran, 2025. [Online]. Available: https://civilica.com/doc/2195786.
[8] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[9] G. E. Hinton, A. Krizhevsky, and S. D. Wang, "Transforming auto-encoders," in International Conference on Artificial Neural Networks, 2011: Springer, pp. 44-51, doi: 10.1007/978-3-642-21735-7_6.
[10] Y. Wang, A. Sun, J. Han, Y. Liu, and X. Zhu, "Sentiment analysis by capsules," in Proceedings of the 2018 World Wide Web Conference, 2018, pp. 1165-1174, doi: 10.1145/3178876.3186015.
[11] Y. Cheng et al., "Sentiment analysis using multi-head attention capsules with multi-channel CNN and bidirectional GRU," IEEE Access, vol. 9, pp. 60383-60395, 2021, doi: 10.1109/ACCESS.2021.3073988.
[12] L. Sun, X. Li, B. Zhang, Y. Ye, and B. Xu, "Learning stance classification with recurrent neural capsule network," in Natural Language Processing and Chinese Computing: 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9–14, 2019, Proceedings, Part I 8, 2019: Springer, pp. 277-289, doi: 10.1007/978-3-030-32233-5_22.
[13] M. Memari, S. M. Nejad, A. P. Rabiei, M. Eisaei, and S. Hesaraki, "BERTCaps: BERT Capsule for Persian Multi-Domain Sentiment Analysis," arXiv preprint arXiv:2412.05591, 2024, doi: 10.48550/arXiv.2412.05591.
[14] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020, doi: 10.48550/arXiv.2010.11929.
[15] S. Taylor and F. Fauzi, "Multimodal Sentiment Analysis for the Malay Language: New Corpus using CNN-based Framework," ACM Transactions on Asian and Low-Resource Language Information Processing, 2024, doi: 10.1145/3703445.
[16] P. Chaudhari, P. Nandeshwar, S. Bansal, and N. Kumar, "MahaEmoSen: Towards Emotion-aware Multimodal Marathi Sentiment Analysis," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 22, no. 9, pp. 1-24, 2023, doi: 10.1145/3618057.
[17] R. Das and T. D. Singh, "Image–text multimodal sentiment analysis framework of assamese news articles using late fusion," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 22, no. 6, pp. 1-30, 2023, doi: 10.1145/3584861.
[18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018, doi: 10.48550/arXiv.1810.04805.
[19] M. Farahani, M. Gharachorloo, M. Farahani, and M. Manthouri, "ParsBERT: Transformer-based model for Persian language understanding," Neural Processing Letters, vol. 53, pp. 3831-3847, 2021, doi: 10.1007/s11063-021-10528-4.
[20] M. Oquab et al., "DINOv2: Learning robust visual features without supervision," arXiv preprint arXiv:2304.07193, 2023, doi: 10.48550/arXiv.2304.07193.
[21] T. Baltrušaitis, C. Ahuja, and L.-P. Morency, "Multimodal machine learning: A survey and taxonomy," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 423-443, 2018, doi: 10.1109/TPAMI.2018.2798607.
[22] L. Zhu, Z. Zhu, C. Zhang, Y. Xu, and X. Kong, "Multimodal sentiment analysis based on fusion methods: A survey," Information Fusion, vol. 95, pp. 306-325, 2023, doi: 10.1016/j.inffus.2023.02.028.
[23] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why should I trust you?" Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135-1144, doi: 10.1145/2939672.2939778.
[24] S. A. Hicks et al., "On evaluation metrics for medical applications of artificial intelligence," Scientific Reports, vol. 12, no. 1, p. 5979, 2022, doi: 10.1038/s41598-022-09954-8.
[25] N. Rodis, C. Sardianos, P. Radoglou-Grammatikis, P. Sarigiannidis, I. Varlamis, and G. T. Papadopoulos, "Multimodal explainable artificial intelligence: A comprehensive review of methodological advances and future research directions," IEEE Access, 2024, doi: 10.1109/ACCESS.2024.3467062.