A Novel Approach for Data Alignment in Infrared and RGB Image Fusion Algorithms

Article Type: Research Article

Authors

  • Raziyeh Razavi 1
  • Reza Rohani Sarvestani 2

1 MSc., Department of Computer Engineering, Faculty of Technology and Engineering, Shahrekord University, Shahrekord, Iran

2 Assistant Professor, Department of Computer Engineering, Faculty of Technology and Engineering, Shahrekord University, Shahrekord, Iran

Abstract

In recent years, human action recognition has become a key topic in computer vision. A central challenge in this area is extracting effective features to improve recognition accuracy. Infrared and RGB video data are commonly used for this purpose, yet neither modality alone provides a complete representation of the scene, so combining them can yield more discriminative features. Information fusion techniques are an effective way to achieve this. However, most human action recognition datasets are not standardized for fusion, and the two modalities are not properly aligned with each other. In this study, the NTU RGB+D dataset is used, and a method for aligning and cropping the video data is proposed so that the infrared and RGB streams can be fused. The method leverages inverse-problem techniques together with the body-joint coordinates provided with the dataset. The performance of the proposed approach is evaluated using the EN, MI, SSIM, and MS-SSIM metrics. The obtained EN (7.17) and MI (13.1) values indicate maximal information transfer and overlap, while the SSIM (0.78) and MS-SSIM (0.84) values confirm that the structure and quality of the fused data are preserved. These results validate the effectiveness of the proposed method in improving video data fusion.
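The alignment idea lends itself to a brief illustration. The sketch below is not the paper's exact inverse-problem formulation; it is a minimal Python example, assuming OpenCV, NumPy and scikit-image, with hypothetical inputs (`rgb`, `ir`, `joints_rgb`, `joints_ir`) standing in for an NTU RGB+D colour frame, an IR frame, and the per-joint pixel coordinates the dataset provides for each sensor. It estimates a homography from the joint correspondences, warps the RGB frame into the IR geometry, crops around the person, and scores a naive average fusion with the EN, MI and SSIM measures mentioned in the abstract.

```python
# Illustrative sketch only: joint-based RGB-to-IR alignment plus simple fusion metrics.
# Assumed inputs: greyscale uint8 frames `rgb` and `ir`, and two (25, 2) arrays of
# per-joint pixel coordinates, `joints_rgb` (colour frame) and `joints_ir` (IR frame).
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim


def align_rgb_to_ir(rgb, ir, joints_rgb, joints_ir):
    """Warp the RGB frame into IR pixel coordinates using a RANSAC homography
    estimated from corresponding skeleton-joint positions."""
    H, _ = cv2.findHomography(joints_rgb.astype(np.float32),
                              joints_ir.astype(np.float32),
                              cv2.RANSAC, 5.0)
    h, w = ir.shape[:2]
    return cv2.warpPerspective(rgb, H, (w, h))


def crop_around_joints(img, joints, margin=20):
    """Person-centred crop taken from the joint bounding box, clamped to the frame."""
    x0, y0 = np.floor(joints.min(axis=0)).astype(int) - margin
    x1, y1 = np.ceil(joints.max(axis=0)).astype(int) + margin
    h, w = img.shape[:2]
    return img[max(y0, 0):min(y1, h), max(x0, 0):min(x1, w)]


def entropy(img, bins=256):
    """Shannon entropy (EN) of an 8-bit image, in bits."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information(a, b, bins=256):
    """Mutual information (MI) between two images, from their joint grey-level histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins,
                                 range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())


# Example wiring with synthetic frames; real use would read the RGB/IR frames of an
# NTU RGB+D sample and the per-joint colour-image vs. IR/depth-image coordinates
# stored in its skeleton file.
rng = np.random.default_rng(0)
ir = rng.integers(0, 256, (424, 512), dtype=np.uint8)        # Kinect v2 IR resolution
rgb = rng.integers(0, 256, (1080, 1920), dtype=np.uint8)     # colour frame, greyscale here
joints_ir = rng.uniform([100.0, 80.0], [400.0, 380.0], size=(25, 2))
joints_rgb = joints_ir * [1920 / 512, 1080 / 424]            # assumed rough RGB-to-IR scaling

warped = align_rgb_to_ir(rgb, ir, joints_rgb, joints_ir)
fused = ((warped.astype(np.float32) + ir) / 2).astype(np.uint8)  # naive average fusion
person = crop_around_joints(fused, joints_ir)

print("crop:", person.shape)
print("EN  :", entropy(fused))
print("MI  :", mutual_information(fused, ir) + mutual_information(fused, warped))
print("SSIM:", ssim(fused, ir, data_range=255))
```

In practice, the reported EN, MI, SSIM, and MS-SSIM values come from the actual fusion algorithm applied to the aligned, cropped frames; the averaging step above is only a placeholder so the metric code has something to score.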

Keywords

  • Image Fusion
  • Human Action Recognition
  • Data Alignment


[1]     K. Rani and R. Sharma, “Study of different image fusion algorithm,” Int. J. Emerg. Technol. Adv. Eng., vol. 3, no. 5, pp. 288–291, May 2013.
[2]     D. Mishra and B. Palkar, “Image fusion techniques: a review,” Int. J. Comput. Appl., vol. 130, no. 9, pp. 7–13, 2015, doi: 10.5120/ijca2015907084.
[3]     J. Ma, Y. Ma, and C. Li, “Infrared and visible image fusion methods and applications: A survey,” Inf. Fusion, vol. 45, pp. 153–178, 2019, doi: 10.1016/j.inffus.2018.02.004.
[4]     S. Li, X. Kang, L. Fang, J. Hu, and H. Yin, “Pixel-level image fusion: A survey of the state of the art,” Inf. Fusion, vol. 33, pp. 100–112, 2017, doi: 10.1016/j.inffus.2016.05.004.
[5]     D. E. Nirmala and V. Vaidehi, “Comparison of Pixel-level and feature level image fusion methods,” in Proc. 2nd Int. Conf. Comput. Sustain. Global Dev. (INDIACom), New Delhi, India, Mar. 2015, pp. 743–748.
[6]     G. Xiao, D. P. Bavirisetti, G. Liu, and X. Zhang, “Decision-level image fusion,” Image Fusion, pp. 149–170, 2020.
[7]     R. Poppe, “A survey on vision-based human action recognition,” Image Vis. Comput., vol. 28, no. 6, pp. 976–990, 2010, doi: 10.1016/j.imavis.2009.11.014.
[8]     M. Karim, S. Khalid, A. Aleryani, J. Khan, I. Ullah, and Z. Ali, "Human action recognition systems: A review of the trends and state-of-the-art," IEEE Access, 2024, doi: 10.1109/ACCESS.2024.3373199.
[9]     X. Jin, Q. Jiang, S. Yao, D. Zhou, R. Nie, J. Hai, and K. He, "A survey of infrared and visual image fusion methods," Infrared Phys. Technol., vol. 85, pp. 478-501, 2017, doi: 10.1016/j.infrared.2017.07.010.
[10] Shahroudy, J. Liu, T. T. Ng, and G. Wang, "Ntu rgb+d: A large scale dataset for 3d human activity analysis," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1010-1019, doi: 10.1109/CVPR.2016.115.
[11] He, Q. Liu, H. Li, and H. Wang, "Multimodal medical image fusion based on IHS and PCA," Procedia Eng., vol. 7, pp. 280-285, 2010, doi: 10.1016/j.proeng.2010.11.045.
[12] Lu, C. Miao, and H. Wang, "Pixel level image fusion based on linear structure tensor," in 2010 IEEE Youth Conf. Inf., Comput. Telecommun., 2010, pp. 303-306.
[13] U. Patil and U. Mudengudi, "Image fusion using hierarchical PCA," in 2011 Int. Conf. Image Inf. Process., 2011, pp. 1-6, doi: 10.1109/ICIIP.2011.6108966.
[14] W. He, W. Feng, Y. Peng, Q. Chen, G. Gu, and Z. Miao, "Multi-level image fusion and enhancement for target detection," Optik, vol. 126, no. 11–12, pp. 1203-1208, 2015, doi: 10.1016/j.ijleo.2015.02.092.
[15] Y. Liu, X. Chen, H. Peng, and Z. Wang, "Multi-focus image fusion with a deep convolutional neural network," Inf. Fusion, vol. 36, pp. 191-207, 2017, doi: 10.1016/j.inffus.2016.12.001.
[16] Z. Ahmad, A. Tabassum, L. Guan, and N. Khan, "ECG heart-beat classification using multimodal image fusion," in ICASSP 2021-2021 IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2021, pp. 1330-1334, doi: 10.1109/ICASSP39728.2021.9414709.
[17] L. Tang, X. Xiang, H. Zhang, M. Gong, and J. Ma, "DIVFusion: Darkness-free infrared and visible image fusion," Inf. Fusion, vol. 91, pp. 477-493, 2023, doi: 10.1016/j.inffus.2022.10.034.
[18] Y. Chen, L. Cheng, H. Wu, F. Mo, and Z. Chen, "Infrared and visible image fusion based on iterative differential thermal information filter," Opt. Lasers Eng., vol. 148, p. 106776, 2022, doi: 10.1016/j.optlaseng.2021.106776.
[19] H. Li and X. J. Wu, "DenseFuse: A fusion approach to infrared and visible images," IEEE Trans. Image Process., vol. 28, no. 5, pp. 2614-2623, 2019, doi: 10.1109/TIP.2019.2899946.
[20] Z. Zhao, S. Xu, C. Zhang, J. Liu, P. Li, and J. Zhang, "DIDFuse: Deep image decomposition for infrared and visible image fusion," arXiv preprint arXiv:2003.09210, 2020, doi: 10.24963/ijcai.2020/135.
[21] H. Li, X. J. Wu, and T. Durrani, "NestFuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models," IEEE Trans. Instrum. Meas., vol. 69, no. 12, pp. 9645-9656, 2020, doi: 10.1109/TIM.2020.3005230.
[22] H. Li, X. J. Wu, and J. Kittler, "RFN-Nest: An end-to-end residual fusion network for infrared and visible images," Inf. Fusion, vol. 73, pp. 72-86, 2021, doi: 10.1016/j.inffus.2021.02.023.
[23] H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, "U2Fusion: A unified unsupervised image fusion network," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 502-518, 2020, doi: 10.1109/TPAMI.2020.3012548.
[24] R. Dang, C. Liu, M. Liu, and Q. Chen, “Channel attention and multi-scale graph neural networks for skeleton-based action recognition,” AI Commun., vol. 35, no. 3, pp. 187–205, 2022, doi: 10.3233/AIC-210250.
[25] P. Fieguth, Statistical Image Processing and Multidimensional Modeling. Springer, 2010, doi: 10.1007/978-1-4419-7294-1.
[26] J. Hadamard, Lectures on Cauchy’s Problem in Linear Partial Differential Equations, vol. 15. Yale Univ. Press, 1923, doi: 10.1063/1.3061337.
[27] M. Zolfaghari, G. L. Oliveira, N. Sedaghat, and T. Brox, “Chained multi-stream networks exploiting pose, motion, and appearance for action classification and detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 2904–2913, doi: 10.1109/ICCV.2017.316.
[28] P. Wang, W. Li, J. Wan, P. Ogunbona, and X. Liu, “Cooperative training of deep aggregation networks for RGB-D action recognition,” in Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, Apr. 2018, doi: 10.1609/aaai.v32i1.12228.
[29] F. Baradel, C. Wolf, and J. Mille, “Pose-conditioned spatio-temporal attention for human action recognition,” arXiv preprint arXiv:1703.10106, 2017, doi: 10.48550/arXiv.1703.10106.
[30] W. Ma, K. Wang, J. Li, S. X. Yang, J. Li, L. Song, and Q. Li, “Infrared and visible image fusion technology and application: A review,” Sensors, vol. 23, no. 2, p. 599, 2023, doi: 10.3390/s23020599.
[31] M. De Boissiere and R. Noumeir, “Infrared and 3D skeleton feature fusion for RGB-D action recognition,” IEEE Access, vol. 8, pp. 168297–168308, 2020, doi: 10.1109/ACCESS.2020.3023599.
[32] S. Hong, A. Ansari, G. Saavedra, and M. Martinez-Corral, “Full-parallax 3D display from stereo-hybrid 3D camera system,” Opt. Lasers Eng., vol. 103, pp. 46–54, 2018, doi: 10.1016/j.optlaseng.2017.11.010.
[33] G. Di Leo and A. Paolillo, “Uncertainty evaluation of camera model parameters,” in Proc. IEEE Int. Instrum. Meas. Technol. Conf. (I2MTC), May 2011, pp. 1–6, doi: 10.1109/IMTC.2011.5944307.
[34] M. Folk, G. Heber, Q. Koziol, E. Pourmal, and D. Robinson, “An overview of the HDF5 technology suite and its applications,” in Proc. EDBT/ICDT Workshop Array Databases, 2011, pp. 36–47, doi: 10.1145/1966895.1966900.
[35] S. Karim, G. Tong, J. Li, A. Qadir, U. Farooq, and Y. Yu, “Current advances and future perspectives of image fusion: A comprehensive review,” Inf. Fusion, vol. 90, pp. 185–217, 2023, doi: 10.1016/j.inffus.2022.09.019.