[1] V. Derhami, F. Alamiyan Harandi, and M. B. Dowlatshahi, “Reinforcement Learning” (in Persian), Yazd University, 1st ed., 2090.
[2] K. Chaudhari and A. Thakkar, “Neural network systems with an integrated coefficient of variation-based feature selection for stock price and trend prediction,” Expert Systems with Applications, Vol. 219, p. 119527, 2023.
[3] Y. Zhao and G. Yang, “Deep Learning-based Integrated Framework for stock price movement prediction,” Applied Soft Computing, Vol. 133, p. 109921, 2023.
[4] A. Chudziak, “Predictability of stock returns using neural networks: Elusive in the long term,” Expert Systems with Applications, Vol. 213, p. 119203, 2023.
[5] D. Bertsimas and A. W. Lo, “Optimal control of execution costs,” Journal of Financial Markets, Vol. 1, No. 1, pp. 1-50, 1998.
[6] R. Almgren and N. Chriss, “Optimal Execution of Portfolio Transactions,” Journal of Risk, Vol. 3, No. 2, pp. 5-40, 2001.
[7] R. F. Almgren, “Optimal execution with nonlinear impact functions and trading-enhanced risk,” Applied Mathematical Finance, Vol. 10, No. 1, pp. 1-18, 2003.
[8] J. Lorenz and R. Almgren, “Mean–Variance Optimal Adaptive Execution,” Applied Mathematical Finance, Vol. 18, No. 5, pp. 395-422, 2011.
[9] G. Huberman and W. Stanzl, “Optimal Liquidity Trading,” Review of Finance, Vol. 9, No. 2, pp. 165-200, 2005.
[10] A. A. Obizhaeva and J. Wang, “Optimal trading strategy and supply/demand dynamics,” Journal of Financial Markets, Vol. 16, No. 1, pp. 1-32, 2013.
[11] A. Schied and T. Schöneborn, “Risk aversion and the dynamics of optimal liquidation strategies in illiquid markets,” Finance and Stochastics, Vol. 13, No. 2, pp. 181-204, 2009.
[12] R. Almgren, “Optimal Trading with Stochastic Liquidity and Volatility,” SIAM Journal on Financial Mathematics, Vol. 3, No. 1, pp. 163-181, 2012.
[13] P. Forsyth, J. Kennedy, S. Tse, and H. Windcliff, “Optimal trade execution: A mean quadratic variation approach,” Journal of Economic Dynamics and Control, Vol. 36, No. 12, pp. 1971-1991, 2012.
[14] O. Guéant, “Optimal Execution and Block Trade Pricing: A General Framework,” Applied Mathematical Finance, Vol. 22, No. 4, pp. 336-365, 2015.
[15] Z. Liu, Y. Zhai, J. Li, G. Wang, Y. Miao, and H. Wang, “Graph Relational Reinforcement Learning for Mobile Robot Navigation in Large-Scale Crowded Environments,” IEEE Transactions on Intelligent Transportation Systems, Vol. 24, No. 8, pp. 8776-8787, 2023.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, … and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, Vol. 518, No. 7540, pp. 529-533, 2015.
[17] B. Xian, X. Zhang, H. Zhang, and X. Gu, “Robust Adaptive Control for a Small Unmanned Helicopter Using Reinforcement Learning,” IEEE Transactions on Neural Networks and Learning Systems, Vol. 33, No. 12, pp. 7589-7597, 2022.
[18] Y. D. Song, Q. Song, and W. C. Cai, “Fault-Tolerant Adaptive Control of High-Speed Trains Under Traction/Braking Failures: A Virtual Parameter-Based Approach,” IEEE Transactions on Intelligent Transportation Systems, Vol. 15, No. 2, pp. 737-748, 2014.
[19] F. S. Melo, “Convergence of Q-learning: A simple proof,” Institute of Systems and Robotics, Tech. Rep., pp. 1-4, 2001.
[20] T. Jaakkola, M. I. Jordan, and S. P. Singh, “Convergence of stochastic iterative dynamic programming algorithms,” Advances in Neural Information Processing Systems, pp. 703-710, 1994.
[21] G. Ritter, “Machine Learning for Trading,” SSRN Electronic Journal, 2017.
[22] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, Vol. 8, No. 3, pp. 279-292, 1992.