A review of graph-based semi-supervised sparse feature selection methods

Article type: Research Article

Author

Razieh Sheikhpour
Department of Computer Engineering, Faculty of Engineering, Ardakan University, Ardakan, Iran

Abstract

In some real-world applications, data are high-dimensional, which creates many computational challenges. Feature selection is an effective technique for dimensionality reduction: by selecting an appropriate subset of features, it simplifies the model and improves its performance. In many of these applications, labeling data is costly and time-consuming, so only a small amount of labeled data is available alongside a large volume of unlabeled data. In such settings, semi-supervised feature selection methods carry out feature selection using the label information of the labeled data together with the distribution and geometric structure of both the labeled and unlabeled data. Most semi-supervised feature selection methods build a neighborhood graph and evaluate features by their ability to preserve the geometric structure of that graph. In classical graph-based semi-supervised feature selection methods, features are evaluated individually, so correlations between features are ignored during selection. Sparse feature selection methods overcome this problem by taking the correlations between features into account and computing an optimal sparse transformation matrix for feature selection. This paper reviews semi-supervised learning methods and surveys graph-based semi-supervised sparse feature selection methods, which select appropriate features using a sparse regularization term and a neighborhood graph built over the labeled and unlabeled data. While resolving the shortcoming of the classical methods, these approaches construct a neighborhood graph from the data, compute the graph Laplacian matrix, and obtain the optimal sparse transformation matrix for feature selection.
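
To make the shared template concrete, the sketch below illustrates, in Python, the generic pipeline described above: build a k-nearest-neighbor graph over the labeled and unlabeled samples, form the graph Laplacian L = D − S, and learn a transformation matrix W by minimizing a Laplacian smoothness term plus a least-squares fit on the labeled samples and an ℓ2,1 penalty that drives whole rows of W to zero, so that features can be ranked by the ℓ2-norms of the rows of W. This is a minimal illustration of the family of objectives reviewed here, roughly min over W of tr(WᵀXᵀLXW) + α‖X_l W − Y_l‖²_F + β‖W‖_{2,1}, solved by the standard iterative reweighting of the ℓ2,1 term; it is not any single published method, and the function name, parameter values, and synthetic data are purely illustrative assumptions.

import numpy as np
from sklearn.neighbors import kneighbors_graph

def semi_supervised_sparse_fs(X, y_labeled, n_labeled, k=5, alpha=1.0, beta=0.1,
                              n_iter=30, eps=1e-8):
    # X: (n_samples, n_features) with the labeled rows first;
    # y_labeled: class labels of the first n_labeled rows.
    n, d = X.shape
    # k-nearest-neighbor affinity graph over all (labeled + unlabeled) samples, symmetrized
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity', include_self=False)
    S = 0.5 * (A + A.T).toarray()
    L = np.diag(S.sum(axis=1)) - S                    # unnormalized graph Laplacian L = D - S
    Xl = X[:n_labeled]
    classes = np.unique(y_labeled)
    Y = (y_labeled[:, None] == classes[None, :]).astype(float)   # one-hot label matrix
    M = X.T @ L @ X + alpha * (Xl.T @ Xl)             # fixed part of the linear system
    rhs = alpha * (Xl.T @ Y)
    Dw = np.eye(d)                                    # reweighting matrix for the l2,1 term
    for _ in range(n_iter):
        W = np.linalg.solve(M + beta * Dw, rhs)       # (X'LX + a*Xl'Xl + b*Dw) W = a*Xl'Y
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        Dw = np.diag(1.0 / (2.0 * row_norms))         # Dw_ii = 1 / (2 * ||w_i||_2)
    return row_norms                                  # feature scores: row norms of the sparse W

# Usage on synthetic data: 20 labeled and 80 unlabeled samples with 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
y = (X[:20, 0] + X[:20, 1] > 0).astype(int)           # labels depend only on features 0 and 1
scores = semi_supervised_sparse_fs(X, y, n_labeled=20)
print("top-ranked features:", np.argsort(-scores)[:5])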

Keywords

  • Semi-supervised feature selection
  • Semi-supervised learning
  • Sparse models
  • Graph Laplacian