Analysis and Identification of Key Features for Classifying Surface Defects in Steel Plates Using Machine Learning Models and Data Balancing Techniques

Document Type : Original Article

Authors

1 Assistant Professor, Department of Electrical and Computer Engineering, University of Hormozgan, Bandar Abbas, Hormozgan, Iran

2 B.Sc. Graduate, Department of Computer, Islamic Azad University, Bandar Abbas, Iran

3 Lecturer, Department of Materials and Metallurgy, Islamic Azad University, Minab, Iran

DOI: 10.22034/abmir.2025.23555.1155

Abstract

This study investigated the main challenges in detecting surface defects of steel plates: data imbalance, feature overlap, and the difficulty of identifying influential features. The primary objective was to identify the features that contribute most to the accuracy of models that classify defects such as scratches, stains, and bumps. To this end, a three-stage analysis was conducted. In the first and second stages, the Random Forest and Extreme Gradient Boosting (XGBoost) algorithms were applied to the raw data; in the third stage, the same algorithms were applied to data balanced with the Synthetic Minority Over-sampling Technique (SMOTE). Applying SMOTE increased the accuracy of the XGBoost model from 79.18% to 91.7% and that of the Random Forest model from 80.21% to 91.0%. The main innovation of the study lies in combining machine learning methods with data balancing and numerical feature-importance analysis. Features shared across the rankings, such as geometric dimension parameters, spatial indices, and brightness characteristics, were identified as stable indicators that can serve as a foundation for designing intelligent quality-control systems in the steel industry. At the same time, the list of important features varied considerably across algorithms and data conditions, showing that feature selection in this problem is not fixed but depends on the learning method and the data preprocessing approach. The methodology provides deeper insight into steel defect detection and supports more precise selection of key features in similar industrial applications.
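The balancing step named in the abstract can be illustrated with a minimal sketch of the SMOTE interpolation idea: each synthetic sample is placed on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The sketch is written in plain Python for illustration only and is not the authors' implementation. The function names `smote_oversample` and `balance`, the toy feature vectors, and the class labels are all assumptions introduced here. In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used, and oversampling is usually restricted to the training split so that synthetic points do not leak into the evaluation data.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows from a list of minority-class feature vectors.

    Hypothetical helper: each new row is an interpolation between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("at least two minority samples are required")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(list(row))
    target = max(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        missing = target - len(rows)
        new_rows = smote_oversample(rows, missing, k, seed) if missing > 0 else []
        for row in rows + new_rows:
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


# Toy data (assumed, not taken from the study): six "other" rows, two "scratch" rows.
X_toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
         [0.5, 0.5], [2.0, 2.0], [10.0, 10.0], [11.0, 10.0]]
y_toy = ["other"] * 6 + ["scratch"] * 2

X_bal, y_bal = balance(X_toy, y_toy)
print({label: y_bal.count(label) for label in set(y_bal)})
```

Because the synthetic rows are interpolations of existing minority samples, they add no information from outside the minority class. This is one reason feature-importance rankings can shift after balancing, consistent with the variation across data conditions reported in the abstract.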

