[1] World Health Organization (WHO), “Depressive disorder (depression),” WHO Fact Sheets, Aug. 29, 2025. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/depression
[2] W. Guo, J. Wang, and S. Wang, “Deep multimodal representation learning: a survey,” *IEEE Access*, vol. 7, pp. 63373–63394, 2019, doi: 10.1109/ACCESS.2019.2916887.
[3] P. Xu, X. Zhu, and D. A. Clifton, “Multimodal learning with transformers: a survey,” *IEEE Trans. Pattern Anal. Mach. Intell.*, vol. 45, no. 10, pp. 12113–12132, 2023, doi: 10.1109/TPAMI.2023.3275156.
[4] D. Ailyn, “Multimodal data fusion techniques,” 2024. [Online]. Available: https://www.researchgate.net/publication/383887675_Multimodal_Data_Fusion_Techniques
[5] T. Meng, X. Jing, Z. Yan, and W. Pedrycz, “A survey on machine learning for data fusion,” *Information Fusion*, vol. 57, pp. 115–129, 2020, doi: 10.1016/j.inffus.2019.12.001.
[6] M. Pawłowski, A. Wróblewska, and S. Sysko-Romańczuk, “Effective techniques for multimodal data fusion: a comparative analysis,” *Sensors*, vol. 23, no. 5, p. 2381, 2023, doi: 10.3390/s23052381.
[7] D. Lee, S. Park, J. Kang, D. Choi, and J. Han, “Cross-lingual suicidal-oriented word embedding toward suicide prevention,” in *Findings of the Association for Computational Linguistics: EMNLP 2020*, R. Cotterell, S. Eger, and S. Wiseman, Eds., Online, Nov. 2020, pp. 2208–2217, doi: 10.18653/v1/2020.findings-emnlp.200.
[8] B. G. Bokolo and Q. Liu, “Deep learning-based depression detection from social media: comparative evaluation of ML and transformer techniques,” *Electronics*, vol. 12, no. 21, p. 4396, 2023, doi: 10.3390/electronics12214396.
[9] L. Zhu, Z. Zhu, C. Zhang, Y. Xu, and X. Kong, “Multimodal sentiment analysis based on fusion methods: a survey,” *Information Fusion*, vol. 95, pp. 306–325, 2023, doi: 10.1016/j.inffus.2023.02.028.
[10] G. Coppersmith, M. Dredze, and C. Harman, “Quantifying mental health signals in Twitter,” in *Proc. Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality*, Baltimore, MD, USA, Jun. 2014, pp. 51–60, doi: 10.3115/v1/W14-3207.
[11] M. Deshpande and V. Rao, “Depression detection using emotion artificial intelligence,” in *Proc. Int. Conf. Intelligent Sustainable Systems (ICISS)*, Palladam, Tirupur, India, Dec. 2017, pp. 858–862, doi: 10.1109/ISS1.2017.8389299.
[12] M. De Choudhury, M. Gamon, S. Counts, and E. Horvitz, “Predicting depression via social media,” in *Proc. 7th Int. AAAI Conf. Web and Social Media (ICWSM)*, Cambridge, MA, USA, Jul. 2013, vol. 7, pp. 128–137, doi: 10.1609/icwsm.v7i1.14432.
[13] A. Murarka, B. Radhakrishnan, and S. Ravichandran, “Classification of mental illnesses on social media using RoBERTa,” in *Proc. 12th Int. Workshop on Health Text Mining and Information Analysis*, E. Holderness et al., Eds., Online, 2021, pp. 59–68. [Online]. Available: https://aclanthology.org/2021.louhi-1.7/
[14] S. Yang, L. Cui, L. Wang, T. Wang, and J. You, “Enhancing multimodal depression diagnosis through representation learning and knowledge transfer,” *Heliyon*, vol. 10, no. 4, p. e25959, Feb. 2024, doi: 10.1016/j.heliyon.2024.e25959.
[15] T. Baltrušaitis, C. Ahuja, and L.-P. Morency, “Multimodal machine learning: a survey and taxonomy,” *IEEE Trans. Pattern Anal. Mach. Intell.*, vol. 41, no. 2, pp. 423–443, 2019, doi: 10.1109/TPAMI.2018.2798607.
[16] S. Yang, L. Cui, L. Wang, T. Wang, and J. You, “Cross-modal contrastive learning for multimodal sentiment recognition,” *Applied Intelligence*, vol. 54, pp. 4260–4276, 2024, doi: 10.1007/s10489-024-05355-8.
[17] M. Fang, S. Peng, Y. Liang, C.-C. Hung, and S. Liu, “A multimodal fusion model with multi-level attention mechanism for depression detection,” *Biomedical Signal Processing and Control*, vol. 82, p. 104561, Apr. 2023, doi: 10.1016/j.bspc.2022.104561.
[18] Y. Wang, Z. Wang, C. Li, Y. Zhang, and H. Wang, “Online social network individual depression detection using a multitask heterogeneous modality fusion approach,” *Information Sciences*, vol. 609, pp. 727–749, 2022, doi: 10.1016/j.ins.2022.07.109.
[19] R. Wu, H. Wang, H.-T. Chen, and G. Carneiro, “Deep multimodal learning with missing modality: a survey,” *arXiv preprint* arXiv:2409.07825, 2024.
[20] L. Cai, Z. Wang, H. Gao, D. Shen, and S. Ji, “Deep adversarial learning for multi-modality missing data completion,” in *Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining*, 2018, pp. 1158–1166, doi: 10.1145/3219819.3219963.
[21] M. Ma, J. Ren, L. Zhao, S. Tulyakov, C. Wu, and X. Peng, “SMIL: multimodal learning with severely missing modality,” in *Proc. 35th AAAI Conf. Artificial Intelligence (AAAI)*, vol. 35, no. 3, pp. 2302–2310, 2021, doi: 10.1609/aaai.v35i3.16330.
[22] H. Wang, Y. Chen, C. Ma, J. Avery, L. Hull, and G. Carneiro, “Multi-modal learning with missing modality via shared-specific feature modelling,” in *Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR)*, 2023, pp. 15878–15887, doi: 10.1109/CVPR56347.2023.
[23] M. K. Reza, A. Prater-Bennette, and M. S. Asif, “Robust multimodal learning with missing modalities via parameter-efficient adaptation,” *arXiv preprint* arXiv:2310.03986, 2024.
[24] Q. Wang, L. Zhan, P. Thompson, and J. Zhou, “Multimodal learning with incomplete modalities by knowledge distillation,” in *Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining*, 2020, pp. 1828–1838, doi: 10.1145/3394486.3403234.
[25] T. Chen, S. Kornblith, M. Norouzi, and G. E. Hinton, “A simple framework for contrastive learning of visual representations,” in *Proc. 37th Int. Conf. Mach. Learn. (ICML)*, 2020, pp. 1597–1607, doi: 10.48550/arXiv.2002.05709.
[26] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” in *Proc. 38th Int. Conf. Mach. Learn. (ICML)*, vol. 139, pp. 8748–8763, 2021, doi: 10.48550/arXiv.2103.00020.
[27] A. Jaiswal, A. R. Babu, M. Z. Zadeh, D. Banerjee, and F. Makedon, “A survey on contrastive self-supervised learning,” *arXiv preprint* arXiv:2010.05113, 2021.
[28] G. Shen, J. Jia, L. Nie, F. Feng, C. Zhang, T. Hu, T.-S. Chua, and W. Zhu, “Depression detection via harvesting social media: a multimodal dictionary learning solution,” in *Proc. 26th Int. Joint Conf. Artif. Intell. (IJCAI)*, 2017, pp. 3838–3844, doi: 10.24963/ijcai.2017/535.
[29] C. Zhang, F. Nian, and J. Lee, “Toward robust multimodal learning using multimodal foundational models,” *arXiv preprint* arXiv:2401.13697, 2024.