[1] S. V. Mahadevkar et al., "A review on machine learning styles in computer vision—techniques and future directions," IEEE Access, vol. 10, pp. 107293-107329, 2022, doi: 10.1109/ACCESS.2022.3209825.
[2] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, "Efficient processing of deep neural networks: A tutorial and survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, 2017, doi: 10.1109/JPROC.2017.2761740.
[3] J. Lee, S. Kang, J. Lee, D. Shin, D. Han, and H.-J. Yoo, "The hardware and algorithm co-design for energy-efficient DNN processor on edge/mobile devices," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 10, pp. 3458-3470, 2020, doi: 10.1109/TCSI.2020.3021397.
[4] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," arXiv preprint arXiv:1602.07360, 2016, doi: 10.48550/arXiv.1602.07360.
[5] B. Koonce, "SqueezeNet," in Convolutional Neural Networks with Swift for TensorFlow: Image Recognition and Dataset Categorization. Berkeley, CA: Apress, 2021, pp. 73-85.
[6] S. Mittal, P. Rajput, and S. Subramoney, "A survey of deep learning on CPUs: Opportunities and co-optimizations," IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 10, pp. 5095-5115, 2021, doi: 10.1109/TNNLS.2021.3071762.
[7] W. Jeon, G. Ko, J. Lee, H. Lee, D. Ha, and W. W. Ro, "Deep learning with GPUs," in Advances in Computers, vol. 122. Elsevier, 2021, pp. 167-215.
[8] A. Shawahna, S. M. Sait, and A. El-Maleh, "FPGA-based accelerators of deep learning networks for learning and classification: A review," IEEE Access, vol. 7, pp. 7823-7859, 2018, doi: 10.1109/ACCESS.2018.2890150.
[9] M. Hussain, "Sustainable machine vision for industry 4.0: a comprehensive review of convolutional neural networks and hardware accelerators in computer vision," AI, vol. 5, no. 3, 2024, doi: 10.3390/ai5030064.
[10] Versal AI Edge Series Development Board User Manual VD100, ALINX ELECTRONIC LIMITED, 2024. [Online]. Available: https://www.alinx.com/public/upload/file/VD100_User_Manual_REV1.0.pdf.
[11] "The CIFAR-10 dataset." https://www.cs.toronto.edu/~kriz/cifar.html (accessed 2025).
[12] Y. Meng, C. Yang, S. Xiang, J. Wang, K. Mei, and L. Geng, "An efficient CNN accelerator achieving high PE utilization using a dense-/sparse-aware redundancy reduction method and data–index decoupling workflow," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 31, no. 10, pp. 1537-1550, 2023, doi: 10.1109/TVLSI.2023.3298509.
[13] S. H. Hozhabr and R. Giorgi, "A Survey on Real-Time Object Detection on FPGAs," IEEE Access, 2025, doi: 10.1109/ACCESS.2025.3544515.
[14] L. Gao, Z. Luo, and L. Wang, "Convolutional Neural Network Acceleration Techniques Based on FPGA Platforms: Principles, Methods, and Challenges," Information, vol. 16, no. 10, p. 914, 2025, doi: 10.3390/info16100914.
[15] G. C. Marinó, A. Petrini, D. Malchiodi, and M. Frasca, "Deep neural networks compression: A comparative survey and choice recommendations," Neurocomputing, vol. 520, pp. 152-170, 2023, doi: 10.1016/j.neucom.2022.11.072.
[16] K. Vineetha, M. M. S. Reddy, C. Ramesh, and D. G. Kurup, "An efficient design methodology to speed up the FPGA implementation of artificial neural networks," Engineering Science and Technology, an International Journal, vol. 47, p. 101542, 2023, doi: 10.1016/j.jestch.2023.101542.
[17] Z. Li et al., "A high-performance pixel-level fully pipelined hardware accelerator for neural networks," IEEE Transactions on Neural Networks and Learning Systems, 2024, doi: 10.1109/TNNLS.2024.3423664.
[18] V. H. Kim and K. K. Choi, "A reconfigurable CNN-based accelerator design for fast and energy-efficient object detection system on mobile FPGA," IEEE Access, vol. 11, pp. 59438-59445, 2023, doi: 10.1109/ACCESS.2023.3285279.
[19] T. Sledevič, A. Serackis, and D. Plonis, "FPGA implementation of a convolutional neural network and its application for pollen detection upon entrance to the beehive," Agriculture, vol. 12, no. 11, p. 1849, 2022, doi: 10.3390/agriculture12111849.
[20] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017, doi: 10.48550/arXiv.1704.04861.
[21] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848-6856, doi: 10.48550/arXiv.1707.01083.
[22] D. Gschwend, "ZynqNet: An FPGA-accelerated embedded convolutional neural network," arXiv preprint arXiv:2005.06892, 2020, doi: 10.48550/arXiv.2005.06892.
[23] L. H. Crockett, R. A. Elliot, M. A. Enderwitz, and R. W. Stewart, The Zynq Book: Embedded Processing with the ARM Cortex-A9 on the Xilinx Zynq-7000 All Programmable SoC. Strathclyde Academic Media, 2014.
[24] P. G. Mousouliotis and L. P. Petrou, "SqueezeJet: High-level synthesis accelerator design for deep convolutional neural networks," in International Symposium on Applied Reconfigurable Computing, 2018: Springer, pp. 55-66, doi: 10.48550/arXiv.1805.08695.
[25] P. Mousouliotis, N. Tampouratzis, and I. Papaefstathiou, "SqueezeJet-3: An HLS-based accelerator for edge CNN applications on SoC FPGAs," in 2023 XXIX International Conference on Information, Communication and Automation Technologies (ICAT), 2023: IEEE, pp. 1-6, doi: 10.1109/ICAT57854.2023.10171329.
[26] Z. Zhao et al., "A High-Throughput FPGA Accelerator for Lightweight CNNs With Balanced Dataflow," IEEE Transactions on Circuits and Systems I: Regular Papers, 2025, doi: 10.48550/arXiv.2407.19449.
[27] L. Zhou, "Design and Implementation of Embedded Image Processing Chip Based on Lightweight MobileNetV3," in 2025 6th International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), 2025: IEEE, pp. 309-313, doi: 10.1109/ICBASE66587.2025.11181362.
[28] R. Yarnell, M. Hossain, and R. F. DeMara, "Image Quantization Tradeoffs in a YOLO-based FPGA Accelerator Framework," in 2023 24th International Symposium on Quality Electronic Design (ISQED), 2023: IEEE, pp. 1-7, doi: 10.1109/ISQED57927.2023.10129324.
[29] A. Parameshwara and S. H. Mokashi, "FPGA-Accelerated RISC-V ISA Extensions for Efficient Neural Network Inference on Edge Devices," arXiv preprint arXiv:2511.06955, 2025, doi: 10.48550/arXiv.2511.06955.
[30] H. Yan and Y. Ma, "A reconfigurable heterogeneous computing-hardware design with Aimed VGG16 acceleration," in 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), 2021: IEEE, pp. 274-278, doi: 10.1109/CEI52496.2021.9574533.
[31] "AMD Versal AI Edge." https://www.amd.com/en/products/adaptive-socs-and-fpgas/versal/ai-edge-series.html (accessed 2025).
[32] T. D. S. Ramos, "Implementation of Convolutional Neural Networks on a Versal Device," Universidade do Porto (Portugal), 2023.
[33] A. Al-Zoubi, G. Martino, F. H. Bahnsen, J. Zhu, H. Schlarb, and G. Fey, "CNN implementation and analysis on Xilinx Versal ACAP at European XFEL," in 2022 IEEE 35th International System-on-Chip Conference (SOCC), 2022: IEEE, pp. 1-6, doi: 10.1109/SOCC56010.2022.9908101.
[34] W. Zhang, Y. Liu, and Z. Bao, "CAT: Customized transformer accelerator framework on Versal ACAP," arXiv preprint arXiv:2409.09689, 2024, doi: 10.48550/arXiv.2409.09689.
[35] N. Perryman, C. Wilson, and A. George, "Evaluation of Xilinx Versal architecture for next-gen edge computing in space," in 2023 IEEE Aerospace Conference, 2023: IEEE, pp. 1-11, doi: 10.1109/AERO55745.2023.10115906.
[36] T. Knopp, J. Chu, and S. Ahmad, "AMD Versal™ AI Edge Series Gen 2 for vision and automotive," in 2024 IEEE Hot Chips 36 Symposium (HCS), 2024: IEEE Computer Society, pp. 1-28, doi: 10.1109/HCS61935.2024.10665274.
[37] P. Dong et al., "EQ-ViT: Algorithm-hardware co-design for end-to-end acceleration of real-time vision transformer inference on Versal ACAP architecture," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 11, pp. 3949-3960, 2024, doi: 10.1109/tcad.2024.3443692.
[38] Z. Bao, T. Zang, Y. Liu, and W. Zhang, "Efficient Number Theoretic Transform accelerator on the Versal platform powered by the AI Engine," Future Generation Computer Systems, vol. 166, p. 107728, 2025, doi: 10.1016/j.future.2025.107728.
[39] M. Petry, G. Wuwer, A. Koch, P. Gest, M. Ghiglione, and M. Werner, "Accelerated deep-learning inference on the Versal adaptive SoC in the space domain," in 2023 European Data Handling & Data Processing Conference (EDHPC), 2023: IEEE, pp. 1-8, doi: 10.23919/EDHPC59100.2023.10396011.
[40] M. Zhang, L. Li, H. Wang, Y. Liu, H. Qin, and W. Zhao, "Optimized compression for implementing convolutional neural networks on FPGA," Electronics, vol. 8, no. 3, p. 295, 2019, doi: 10.3390/electronics8030295.