SSLA-Net: A Semantic–Spatial Learning Architecture for Robust Steel Surface Defect Detection

Article Type: Research Article

Authors

1 Assistant Professor, Department of Computer Engineering, University of Sistan and Baluchestan, Zahedan, Iran

2 PhD in Artificial Intelligence, Faculty of Computer Engineering, Technical and Engineering Campus, Yazd University, Yazd, Iran

3 Assistant Professor, Department of Information Technology Engineering, University of Sistan and Baluchestan, Zahedan, Iran

DOI: 10.22034/abmir.2025.23624.1166

Abstract

In the steel industry, accurate and rapid detection of surface defects plays a vital role in ensuring quality and reducing waste. Nevertheless, the visual similarity among texture patterns and the complexity of defective regions limit the effectiveness of conventional machine vision methods. This study presents a multimodal hybrid framework for detecting and classifying steel surface defects that combines the region-based detection model Faster R-CNN with the semantic representations of the CLIP model through a multi-head attention module, jointly modeling the relationship between visual and conceptual features. This synergy improves accuracy in identifying small and scattered defect regions and enhances class separability in the feature space. The model is trained with three complementary loss functions, namely a region detection loss, a region–text alignment loss, and an image-level loss, so that learning proceeds simultaneously at the spatial and semantic levels. Experimental results on the standard NEU-DET dataset show that the proposed model achieves 100% accuracy in image-level classification and mAP@50 = 78.04% at the region level. This strong performance indicates that CLIP-based multimodal learning can substantially improve the accuracy and robustness of defect detection under real industrial conditions.
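The abstract names the three loss terms but not how they are combined during training. A plausible reading, written here only as a sketch with \lambda_1 and \lambda_2 as hypothetical balancing weights that are not reported in this abstract, is a weighted sum of the form

\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{det}} + \lambda_1 \, \mathcal{L}_{\mathrm{align}} + \lambda_2 \, \mathcal{L}_{\mathrm{img}}

where \mathcal{L}_{\mathrm{det}} denotes the standard Faster R-CNN objective (RPN and ROI-head classification plus bounding-box regression), \mathcal{L}_{\mathrm{align}} is the region–text alignment loss that matches pooled region features to CLIP text embeddings of the defect classes, and \mathcal{L}_{\mathrm{img}} supervises the image-level prediction.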


Article Title [English]

SSLA-Net: Semantic–Spatial Learning Architecture for Robust Steel Surface Defect Detection

Authors [English]

  • Masoumeh Rezaei 1
  • Mansoureh Rezaei 2
  • Mehri Rajaei 3
1 Assistant Professor, Department of Computer Engineering, University of Sistan and Baluchestan, Zahedan, Iran
2 PhD in Artificial Intelligence, Computer Engineering Department, Yazd University, Yazd, Iran
3 Assistant Professor, Department of Information Technology Engineering, University of Sistan and Baluchestan, Zahedan, Iran
Abstract [English]

In the steel industry, accurate and rapid detection of surface defects plays a vital role in ensuring product quality and reducing waste. However, the visual similarity between texture patterns and the complexity of defective regions limit the effectiveness of conventional machine vision methods. In this study, a multimodal hybrid framework is proposed for the detection and classification of steel surface defects. The proposed model integrates the region-based detection capability of Faster R-CNN with the semantic representations of CLIP, connected through a multi-head attention module that jointly models the relationship between visual and conceptual features. This synergy enhances accuracy in detecting small and scattered defects and improves class separability in the feature space. The architecture employs three complementary loss functions (a region detection loss, a region–text alignment loss, and an image-level loss) to enable simultaneous learning of spatial and semantic information. Experimental results on the NEU-DET benchmark dataset demonstrate that the proposed model achieves 100% accuracy in image-level classification and 78.04% mAP@50 in region-level detection. These promising results indicate that CLIP-based multimodal learning can significantly enhance the accuracy and robustness of defect detection in real industrial environments.
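As a rough illustration of the fusion step described above (a minimal sketch, not the authors' released implementation), the PyTorch snippet below lets region features from a Faster R-CNN ROI head attend over frozen CLIP text embeddings of the defect-class prompts through multi-head attention, and derives both region-level and image-level logits from the fused features. The feature dimensions (1024-d ROI features, 512-d CLIP text embeddings), the number of heads, the linear projection, the residual connection, and the shared classification head are illustrative assumptions rather than values taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionTextFusion(nn.Module):
    """Sketch of a region-text fusion block: ROI features query CLIP text embeddings."""

    def __init__(self, roi_dim=1024, clip_dim=512, num_heads=8, num_classes=6):
        super().__init__()
        self.proj = nn.Linear(roi_dim, clip_dim)          # map ROI features into the CLIP space
        self.attn = nn.MultiheadAttention(clip_dim, num_heads, batch_first=True)
        self.cls_head = nn.Linear(clip_dim, num_classes)  # shared head for region and image logits

    def forward(self, roi_feats, text_embeds):
        # roi_feats:   (B, R, roi_dim)  pooled features for R region proposals per image
        # text_embeds: (C, clip_dim)    frozen CLIP text embeddings, one per defect class
        q = self.proj(roi_feats)                                  # (B, R, clip_dim)
        kv = text_embeds.unsqueeze(0).expand(q.size(0), -1, -1)   # (B, C, clip_dim)
        fused, _ = self.attn(q, kv, kv)                           # regions attend to class texts
        fused = F.normalize(fused + q, dim=-1)                    # residual connection + L2 norm
        region_logits = self.cls_head(fused)                      # (B, R, C) for region-text alignment
        image_logits = self.cls_head(fused.mean(dim=1))           # (B, C) for image-level classification
        return region_logits, image_logits

# Usage with random tensors standing in for real ROI features and CLIP text embeddings.
fusion = RegionTextFusion()
rois = torch.randn(2, 16, 1024)                    # 16 proposals per image, batch of 2
texts = F.normalize(torch.randn(6, 512), dim=-1)   # 6 NEU-DET defect classes
region_logits, image_logits = fusion(rois, texts)
print(region_logits.shape, image_logits.shape)     # torch.Size([2, 16, 6]) torch.Size([2, 6])

In such a setup, the region logits would feed the region–text alignment loss and the pooled image logits the image-level loss, alongside the usual Faster R-CNN detection loss on the box predictions.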

Keywords [English]

  • Multimodal Learning
  • Surface Defect Detection
  • Multi-Head Attention
  • Semantic–Spatial Representation
  • Industrial Vision