Implementation of a Shallow Version of the Lightweight Convolutional Neural Network SqueezeNet on the Versal Platform for Edge Computing Applications

Ghasemi, Zahra; Salimi Shahraki, Atefeh; Abbasi, Mahdi; Dastjerdi, Mohammad Lali

doi:10.22034/abmir.2026.24030.1191

Article type: Research article

Authors

¹ Department of Electrical Engineering, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran

² Department of Computer Engineering, Faculty of Engineering, Bu-Ali Sina University, Hamedan, Iran, and School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

Abstract

In recent years, deep neural networks have been widely used in computer vision systems. However, implementing these networks on portable devices faces serious challenges because of limited computational resources, memory capacity, and power budget. One effective way to overcome these limitations is to use advanced heterogeneous platforms such as AMD Versal. This study examines the implementation of a shallow version of the SqueezeNet network (containing four Fire modules) on the XCVE2302 device of the ALINX VD100 Versal AI Edge development board. The network was first implemented with the PyTorch library in the Google Colab environment and trained on the CIFAR-10 dataset. The network architecture, together with the trained weights, was then implemented in hardware using Vitis HLS and Vivado 2023. The final implementation runs at a clock frequency of 100 MHz, uses 129 BRAM blocks and 136 URAM blocks, and draws about 3.984 W. The results show that, with lightweight network design and deliberate management of memory resources, efficient accelerators can be built on the Versal platform. Going further, making the fullest use of the Versal architecture's capabilities can help reduce the challenges of implementing neural networks on hardware accelerators.
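
SqueezeNet stays lightweight because of its Fire module: a 1x1 "squeeze" convolution first reduces the channel count, and parallel 1x1 and 3x3 "expand" convolutions then restore it, with their outputs concatenated. The sketch below counts weights and biases for one Fire module and compares it with a plain 3x3 convolution of the same output width. The four channel configurations are the fire2 to fire5 settings of the original SqueezeNet v1.0 and are only an assumption; the abstract does not give the channel widths of the shallow network studied here.

```python
"""Parameter counts for SqueezeNet Fire modules (illustrative sketch)."""


def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Weights plus biases of a k x k convolution with in_ch inputs and out_ch outputs."""
    return in_ch * out_ch * k * k + out_ch


def fire_params(in_ch: int, s1x1: int, e1x1: int, e3x3: int) -> int:
    """One Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand convolutions."""
    squeeze = conv_params(in_ch, s1x1, 1)
    expand_1x1 = conv_params(s1x1, e1x1, 1)
    expand_3x3 = conv_params(s1x1, e3x3, 3)
    return squeeze + expand_1x1 + expand_3x3


# Hypothetical widths (in_ch, s1x1, e1x1, e3x3): fire2..fire5 of SqueezeNet v1.0.
# The channel widths of the shallow network in this article are not stated here.
FIRE_CONFIGS = [
    (96, 16, 64, 64),
    (128, 16, 64, 64),
    (128, 32, 128, 128),
    (256, 32, 128, 128),
]

if __name__ == "__main__":
    total = 0
    for idx, (in_ch, s, e1, e3) in enumerate(FIRE_CONFIGS, start=2):
        fire = fire_params(in_ch, s, e1, e3)
        # A plain 3x3 convolution producing the same e1 + e3 output channels.
        plain = conv_params(in_ch, e1 + e3, 3)
        total += fire
        print(f"fire{idx}: {fire} params vs. {plain} for a plain 3x3 conv")
    print(f"four Fire modules: {total} params")
```

Run as a script, this reports 11,920 parameters for the first module against 110,720 for a plain 3x3 convolution with the same 128 output channels, and 119,136 parameters for all four modules. The narrow squeeze width (s1x1) is what keeps the 3x3 expand branch cheap.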


Subjects

Deep learning

Article title [English]

A Lightweight Implementation of a Reduced-Depth SqueezeNet CNN on the Versal Platform for Edge Computing Applications

Authors [English]

Zahra Ghasemi ¹
Atefeh Salimi Shahraki ¹
Mahdi Abbasi ²
Mohammad Lali Dastjerdi ¹

¹ Department of Electrical Engineering, Isf.C., Islamic Azad University, Isfahan, Iran

² Department of Computer Engineering, Engineering Faculty, Bu-Ali Sina University, Hamedan, Iran, and School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

Abstract [English]

In recent years, deep neural networks have been widely adopted in computer vision applications. However, deploying these networks on portable and edge devices is seriously challenging because of their limited computational resources, memory capacity, and power budget. One effective way to overcome these limitations is to use advanced heterogeneous platforms such as AMD Versal. This work investigates the implementation of a shallow version of the SqueezeNet network, consisting of four Fire modules, on the XCVE2302 device of the ALINX VD100 Versal AI Edge development board. First, the network was implemented with the PyTorch library in the Google Colab environment and trained on the CIFAR-10 dataset. The network architecture, together with the trained weights, was then mapped to hardware using Vitis HLS and Vivado 2023. The final implementation operates at a clock frequency of 100 MHz, uses 129 BRAM blocks and 136 URAM blocks, and consumes approximately 3.984 W in total. These results show that efficient accelerators can be implemented on the Versal platform by designing lightweight neural networks and managing on-chip memory resources carefully. Going further, fully exploiting the architectural capabilities of AMD Versal could further reduce the challenges of deploying deep neural networks on hardware accelerators.
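
To put the reported memory utilization in perspective, the sketch below converts block counts into raw on-chip capacity, assuming the standard Versal block sizes of 36 Kb per block RAM and 288 Kb per UltraRAM. It also computes a capacity-only lower bound on the blocks a weight buffer would need. The weight count of 119,136 (the total of SqueezeNet v1.0's fire2 to fire5 modules) and the 8-bit and 16-bit word widths are hypothetical, since the abstract states neither the exact network widths nor the numeric precision.

```python
"""Rough on-chip memory arithmetic for Versal BRAM/URAM (illustrative sketch)."""
import math

BRAM_BITS = 36 * 1024   # one Versal block RAM: 36 Kb
URAM_BITS = 288 * 1024  # one Versal UltraRAM: 288 Kb (4K x 72)


def onchip_capacity_bits(n_bram: int, n_uram: int) -> int:
    """Total raw capacity, in bits, of the given numbers of BRAM and URAM blocks."""
    return n_bram * BRAM_BITS + n_uram * URAM_BITS


def min_blocks(n_words: int, word_bits: int, block_bits: int) -> int:
    """Capacity-only lower bound on blocks for a buffer (ignores port width and partitioning)."""
    return math.ceil(n_words * word_bits / block_bits)


if __name__ == "__main__":
    bits = onchip_capacity_bits(129, 136)  # utilization reported in the abstract
    print(f"129 BRAM + 136 URAM = {bits / 8 / 1024 / 1024:.2f} MiB of raw capacity")

    n_weights = 119_136  # hypothetical: four SqueezeNet v1.0-style Fire modules
    for word_bits in (8, 16):  # hypothetical weight precisions
        print(f"{word_bits}-bit weights: >= {min_blocks(n_weights, word_bits, BRAM_BITS)} BRAM"
              f" or >= {min_blocks(n_weights, word_bits, URAM_BITS)} URAM")
```

For the reported utilization this gives 44,863,488 bits (about 5.35 MiB) of raw capacity, while the hypothetical weight set would need at least 26 BRAM or 4 URAM blocks at 8 bits and 52 BRAM or 7 URAM blocks at 16 bits. Capacity arithmetic like this only bounds weight storage; feature-map buffers, array partitioning, and port widths all add to the block count that the tools report.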

Keywords [English]

Convolutional Neural Network
SqueezeNet
Versal ACAP
FPGA Accelerator
Edge AI

Volume 4, Issue 1
Spring and Summer 1405 (Solar Hijri)
Ordibehesht 1405
Pages 113-128
