Contrastive Learning-Based Handling of Missing Modalities in Multimodal Data Fusion for Depression Detection in Social Networks

Article Type: Research Article

Authors

1 M.Sc. Student in Computer Science, School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran

2 Assistant Professor, Department of Computer Science, School of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran

3 Assistant Professor, School of Mathematics, Statistics and Computer Science, University of Sistan and Baluchestan, Sistan and Baluchestan, Iran

Abstract

The analysis of social network data is of fundamental importance for extracting users' behavioral patterns. Multimodal models, which combine textual, visual, and other information sources, are well suited to such analyses. However, a key challenge in these models is the absence of some modalities in part of the data samples; for example, a user may publish only text and share no images. As a result, multimodal models cannot fully exploit all of the available information. In this paper, a method is presented for exploiting incomplete data in multimodal models. First, unimodal models were trained to process each modality independently. Then, a contrastive learning-based encoder was designed and trained to estimate the feature vector of a missing modality from the features of the available modalities. Finally, the textual and visual data (real or reconstructed) were fused in a multimodal model and used to analyze user behavior. Experimental results on the MDDL dataset, evaluated with accuracy and F1-score, show that the proposed multimodal model, with an accuracy of 90.17% and an F1-score of 90.64%, outperforms the unimodal text-based (87.87% accuracy) and image-based (73.37% accuracy) models. These results confirm that the proposed model effectively exploits the available information without discarding incomplete data.
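As a rough illustration of the second stage described above, the following Python sketch shows a contrastive (InfoNCE-style) encoder that learns to estimate an image feature vector from a text feature vector. The feature dimensions, network shape, temperature, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalEncoder(nn.Module):
    """Illustrative encoder mapping a text feature vector into the image-feature space."""

    def __init__(self, text_dim=768, image_dim=512, hidden_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, image_dim),
        )

    def forward(self, text_feat):
        return self.net(text_feat)


def info_nce_loss(estimated, target, temperature=0.07):
    """Symmetric InfoNCE loss: each estimated image feature should be most similar
    to the real image feature of the same user within the batch."""
    est = F.normalize(estimated, dim=-1)
    tgt = F.normalize(target, dim=-1)
    logits = est @ tgt.t() / temperature                # (B, B) cosine-similarity logits
    labels = torch.arange(est.size(0), device=est.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


# Toy usage with random tensors standing in for the frozen unimodal encoders' outputs.
encoder = CrossModalEncoder()
text_feat = torch.randn(32, 768)    # text features of 32 users
image_feat = torch.randn(32, 512)   # matching real image features
loss = info_nce_loss(encoder(text_feat), image_feat)
loss.backward()
```

Within a batch, each user's estimated image feature is pulled toward that same user's real image feature and pushed away from the other users' features, which is the standard contrastive formulation.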



Article Title [English]

Contrastive Learning-Based Handling of Missing Modalities in Multimodal Data Fusion for Depression Detection in Social Networks

Authors [English]

  • Hamed Marvi 1
  • Abolfazl Nadi 2
  • Mohammad Mehdi Keikha 3
1 M.Sc. Student, Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
2 Assistant Professor, Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
3 Assistant Professor, School of Mathematics, Statistics and Computer Science, University of Sistan & Baluchestan, Sistan & Baluchestan, Iran
Abstract [English]

The analysis of social network data plays a fundamental role in uncovering users' behavioral patterns. Multimodal models that combine textual, visual, and other information sources are effective tools for such analyses. However, a major challenge in these models is the absence of certain modalities in parts of the dataset; for instance, a user may post only text without sharing any images. This issue prevents multimodal models from fully exploiting all available information. In this paper, a method is proposed for leveraging incomplete data within multimodal models. First, unimodal models are trained independently to process each modality. Then, a contrastive learning-based encoder is designed and trained to estimate the feature vector of the missing modality using the features of the available modalities. Finally, textual and visual data (either real or reconstructed) are fused in a multimodal model to analyze user behavior. Experimental results on the MDDL dataset, based on accuracy and F1-score metrics, demonstrate that the proposed multimodal model achieves superior performance, reaching an accuracy of 90.17% and an F1-score of 90.64%, outperforming unimodal text-based (87.87% accuracy) and image-based (73.37% accuracy) models. These findings confirm that the proposed model effectively utilizes available information without discarding incomplete data.
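To make the final fusion step concrete, the snippet below sketches one simple possibility: concatenating the text feature vector with the (real or reconstructed) image feature vector and passing the result to a small classification head. The feature dimensions, dropout rate, and class count are assumptions for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    """Illustrative fusion head: concatenates the text feature vector with the
    (real or encoder-reconstructed) image feature vector and classifies."""

    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256, num_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_feat, image_feat):
        # image_feat comes from the image encoder when the user shared images,
        # otherwise from the cross-modal encoder's estimate (assumed setup).
        return self.head(torch.cat([text_feat, image_feat], dim=-1))


# Toy usage with random features standing in for the unimodal encoders' outputs.
clf = LateFusionClassifier()
logits = clf(torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 2])
```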

Keywords [English]

  • Multimodal Models
  • Missing Modality
  • Multimodal Data Fusion
  • Contrastive Learning
  • Social Media