A Review of Deep Learning Methods for Violence Detection

Article type: Research article

Authors

Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract

With the very rapid growth of video surveillance systems for monitoring human behavior, demand is rising for systems that can detect violent events automatically. Violence detection is an active research area in machine learning and image processing that continues to attract new researchers. Violence detection methods fall into two main categories: traditional machine learning methods and deep learning methods. This article presents deep learning methods and examines the variety of methods and deep neural network structures used for this task. First, traditional and deep methods are compared, and the advantages of deep methods over traditional ones are examined from several aspects. Next, the different deep network structures used for violence detection are presented. In addition, the datasets available for analyzing violence in video are described. Finally, the reviewed research is discussed, which may be useful for extending future work.

Article title [English]

A Review through Deep Learning Techniques for Violence Detection

Abstract [English]

With the rapid growth of video surveillance systems for monitoring human behavior, demand is increasing for systems that can detect violent events automatically. Violence detection is an active research area in machine learning and image processing, and it continues to attract new researchers. Violence detection methods are divided into two major categories: traditional machine learning techniques and deep learning methods. This article reviews deep learning methods and examines the variety of methods and deep neural network structures used in this area. First, traditional and deep methods are compared, and the superiority of deep methods over traditional methods is investigated from different aspects. Then, different deep network structures for violence detection are presented. The datasets available for analyzing violence in video are also introduced. Finally, the reviewed research is discussed, which may be useful for the development of future work.

Keywords [English]

  • violence detection
  • deep learning
  • violent behavior detection
  • surveillance systems
  • machine learning