A Hybrid Data Augmentation Method for Sentiment Analysis with a Focus on Addressing the Challenges of Informal Persian Texts

Mehrparvar, Ghazal; Noferesti, Samira

doi:10.22034/abmir.2025.23361.1141

Article type: Research article

Authors

¹ M.Sc. Student, Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Sistan and Baluchestan, Zahedan, Iran

² Department of Information Technology Engineering, Faculty of Electrical and Computer Engineering, University of Sistan and Baluchestan

Abstract

With the ever-growing volume of user comments on social networks and service platforms, sentiment analysis has become one of the key tools for extracting insight from textual data. However, challenges such as the scarcity of labeled data and linguistic complexities, especially in Persian, have limited the performance of deep learning models. This paper presents a data augmentation method for improving sentiment analysis of Persian texts. It focuses on generating synthetic data that reflects the challenges of analyzing social media comments, including their brevity, colloquial language, and frequent spelling and grammatical errors. A data augmentation-based method for improving cross-domain sentiment analysis is also proposed. Evaluation results show that the proposed method improves the F1 score of a convolutional neural network classifier for sentiment analysis by 6.3 and 8.2 percent in the Twitter and movie domains, respectively. In addition, the two proposed methods for cross-domain data augmentation (enrichment with target-domain words and augmentation with ChatGPT) raise the F1 score of the model trained on the Twitter domain by 9.8 and 14.6 percent on the hotel domain and by 2.6 and 0.4 percent on the movie domain.

Subjects

Deep learning

Article title [English]

A Text Augmentation Approach for Sentiment Analysis with a Focus on Persian Text Processing Challenges

Authors [English]

Ghazal Mehrparvar ¹
Samira Noferesti ²

¹ MS Student, Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Sistan and Baluchestan, Zahedan, Iran

² Associate Professor, Department of Information Technology Engineering, Faculty of Electrical and Computer Engineering, University of Sistan and Baluchestan, Zahedan, Iran

Abstract [English]

With the increasing volume of user comments on social networks and service platforms, sentiment analysis has become a key tool for extracting insights from textual data. However, challenges such as the scarcity of labeled data and linguistic complexities, particularly in the Persian language, have imposed limitations on the performance of deep learning models. In this paper, we propose a data augmentation method aimed at improving sentiment analysis of Persian texts. This approach focuses on generating synthetic data by addressing challenges inherent to social media reviews, including their brevity, colloquial language, and frequent spelling and grammatical errors. Additionally, a data augmentation technique for enhancing cross-domain sentiment analysis is introduced. Evaluation results demonstrate that the proposed data augmentation method improves the F1 score of a convolutional neural network classifier for sentiment analysis by 6.8% on the Twitter domain. Furthermore, in cross-domain sentiment analysis, the proposed method increases the F1 score of the CNN model trained on Twitter data by 14.8% when tested on a hotel review dataset.

Keywords [English]

Augmentation
Sentiment analysis
Persian language challenges
Cross-domain sentiment analysis
Deep learning

Volume 3, Issue 1
Spring and Summer 1404
Shahrivar 1404
Pages 111-126
