A Novel Approach for Data Alignment in Infrared and RGB Image Fusion Algorithms

Document Type: Original Article

Authors

1 MSc., Department of Computer Engineering, Faculty of Technology and Engineering, Shahrekord University, Shahrekord, Iran

2 Assistant Professor, Department of Computer Engineering, Faculty of Technology and Engineering, Shahrekord University, Shahrekord, Iran

Abstract

In recent years, human action recognition has become a key topic in computer vision. One of the main challenges in this area is extracting effective features that improve recognition accuracy. Infrared and RGB video data are commonly used for this purpose, yet neither modality alone provides a comprehensive representation of the scene, so combining them can yield more discriminative features. Information fusion techniques are an effective means of achieving this. However, most human action recognition datasets are not prepared for fusion, and their modalities are not spatially aligned with one another.
In this study, the NTU RGB+D dataset is used, and a method for aligning and cropping video data is proposed so that infrared and RGB video frames can be fused. The method formulates the alignment as an inverse problem and exploits the body joint coordinates available in the dataset. The performance of the proposed approach is evaluated using the EN, MI, SSIM, and MS-SSIM metrics. The EN (7.17) and MI (13.1) values indicate a high degree of information transfer and overlap, while the SSIM (0.78) and MS-SSIM (0.84) values confirm that the structure of the source frames is preserved and that the fused data are of high quality. These findings validate the effectiveness of the proposed method for video data fusion.
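The alignment step can be illustrated with a short sketch. The example below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that matched 2D joint coordinates are available for each frame in both the infrared and the RGB views (as provided by the NTU RGB+D skeleton annotations), fits a 2x3 affine transform from the IR frame to the RGB frame by least squares (the overdetermined inverse problem A x = b), warps the IR frame into the RGB coordinate system, and crops a frame to a bounding box around the joints. The function names (fit_affine_from_joints, align_ir_to_rgb, crop_to_joints) are hypothetical.

```python
import numpy as np
import cv2

def fit_affine_from_joints(ir_pts, rgb_pts):
    """Least-squares fit of a 2x3 affine map taking IR joint coordinates
    onto the matching RGB joint coordinates (hypothetical helper).

    ir_pts, rgb_pts: (N, 2) arrays of corresponding joint positions.
    The fit solves the overdetermined linear inverse problem A x = rhs,
    where x = (a, b, c, d, e, f) parameterises
    x' = a*x + b*y + c,  y' = d*x + e*y + f.
    """
    n = ir_pts.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = ir_pts   # rows for the x' equations
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = ir_pts   # rows for the y' equations
    A[1::2, 5] = 1.0
    rhs = rgb_pts.reshape(-1)
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params.reshape(2, 3)

def align_ir_to_rgb(ir_frame, affine, rgb_shape):
    """Warp an IR frame into the RGB frame's coordinate system."""
    h, w = rgb_shape[:2]
    return cv2.warpAffine(ir_frame, affine, (w, h))

def crop_to_joints(frame, joint_pts, margin=20):
    """Crop a frame to the bounding box of the body joints plus a margin."""
    x0, y0 = np.maximum(joint_pts.min(axis=0).astype(int) - margin, 0)
    x1, y1 = joint_pts.max(axis=0).astype(int) + margin
    return frame[y0:y1, x0:x1]
```

In such a pipeline, the same crop would be applied to the warped IR frame and to the RGB frame before fusion, and the quality measures reported above (EN, MI, SSIM, MS-SSIM) would then be computed on the fused output.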
