Exploring Proximal Policy Optimization in ViZDoom: Training Agents for Complex Tasks with Hyperparameter Optimization

Authors

  • Areej Fatemah Meghji Department of Software Engineering, Mehran University of Engineering and Technology Jamshoro, Pakistan https://orcid.org/0000-0002-7302-2767
  • Rashid Hussain Department of Cyber Security, FAST - National University of Computer & Emerging Sciences, Karachi, Pakistan https://orcid.org/0009-0009-3398-1662
  • Mirza Muhammad Abbas Department of Software Engineering, Mehran University of Engineering and Technology Jamshoro, Pakistan https://orcid.org/0009-0007-5713-6768
  • Abdul Lahad Department of Software Engineering, Mehran University of Engineering and Technology Jamshoro, Pakistan

DOI:

https://doi.org/10.21015/vtse.v14i1.2348

Abstract

ViZDoom, a Reinforcement Learning (RL) research platform based on the classic first-person shooter game Doom, allows training and evaluating RL agents across a wide range of scenarios that vary in complexity. This research trains Deep Reinforcement Learning (DRL) agents using the modern Proximal Policy Optimization (PPO) algorithm, implemented through Stable-Baselines3 (SB3), across four ViZDoom scenarios of increasing complexity: Basic, Defend the Center, Health Gathering, and Deadly Corridor. We analyze the impact of hyperparameter tuning on the PPO algorithm in these scenarios using the Optuna framework. Given the complexity of the Deadly Corridor scenario, advanced RL techniques, namely reward shaping and curriculum learning, are employed to achieve its objective. The agents are evaluated in each scenario using the mean episodic reward and the mean episode length as performance metrics. The results compare PPO agents trained with default hyperparameters against agents trained with optimized hyperparameters in each scenario. The findings demonstrate that hyperparameter optimization has a moderate impact in simple environments, yielding a 3.9% and a 12.4% increase in the mean episodic rewards of the Basic and Defend the Center scenarios, respectively, but produces significant gains in complex scenarios, achieving a 523.7% and a 203.7% improvement in the Health Gathering and Deadly Corridor (skill level 5) scenarios, respectively. These findings provide insights into how DRL agents can be trained with the PPO algorithm in complex environments with multiple challenging tasks through appropriate hyperparameter optimization.
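For readers unfamiliar with PPO's core mechanism, the clipped surrogate objective from Schulman et al. (2017), which SB3's PPO implementation optimizes, can be sketched per sample in a few lines of plain Python. This is an illustrative sketch only; the function name and toy values below are ours, not from the paper, and `eps=0.2` is the clipping range suggested in the original work:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate objective L^CLIP from the PPO paper.

    ratio     -- pi_theta(a|s) / pi_theta_old(a|s), the probability ratio
    advantage -- estimated advantage A_t for the sampled action
    eps       -- clipping range (0.2 is the paper's suggested default)
    """
    # Clamp the ratio into [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the minimum removes the incentive to push the new policy
    # far from the old one whenever that would inflate the objective.
    return min(ratio * advantage, clipped_ratio * advantage)

# A positive advantage stops rewarding ratio growth beyond 1 + eps ...
print(ppo_clip_objective(1.5, 1.0))   # -> 1.2, not 1.5
# ... and a negative advantage is penalized pessimistically.
print(ppo_clip_objective(0.5, -1.0))  # -> -0.8, not -0.5
```

In SB3 this objective is averaged over minibatches and combined with value-function and entropy terms; hyperparameters such as the clipping range, learning rate, and batch size are exactly the kind of knobs the Optuna search described in the abstract tunes.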

References

Z. Li, Q. Ji, X. Ling, and Q. Liu, “A comprehensive review of multi-agent reinforcement learning in video games,” IEEE Transactions on Games, 2025.

M. A. Mirza, A. Lahad, and A. F. Meghji, “A study of Q-learning in the Taxi-v3 environment: Reinforcement learning for optimal navigation through hyperparameter optimization,” KIET Journal of Computing & Information Sciences, vol. 8, no. 1, pp. 54–65, 2025. DOI: https://doi.org/10.51153/w4rsbd31

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998. DOI: https://doi.org/10.1109/TNN.1998.712192

C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3–4, pp. 279–292, 1992. DOI: https://doi.org/10.1023/A:1022676722315

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv:1312.5602, 2013.

R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Advances in Neural Information Processing Systems, vol. 12, 1999.

C. Berner et al., “Dota 2 with large scale deep reinforcement learning,” arXiv:1912.06680, 2019.

V. Mnih et al., “Asynchronous methods for deep reinforcement learning,” in Proc. Int. Conf. Machine Learning (ICML), 2016, pp. 1928–1937.

J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel, “Trust region policy optimization,” arXiv:1502.05477, 2015.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv:1707.06347, 2017.

A. Khan and M. Naeem, “Evaluating reinforcement learning algorithms in first-person shooter games using VizDoom,” Multimedia Tools and Applications, vol. 84, no. 15, pp. 15053–15075, 2025.

M. Wydmuch, M. Kempka, and W. Jaśkowski, “ViZDoom competitions: Playing Doom from pixels,” IEEE Transactions on Games, vol. 11, no. 3, pp. 248–259, 2018.

T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proc. 25th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2019, pp. 2623–2631.

M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaśkowski, “ViZDoom: A Doom-based AI research platform for visual reinforcement learning,” in Proc. IEEE Conf. Computational Intelligence and Games (CIG), 2016, pp. 1–8. DOI: https://doi.org/10.1109/CIG.2016.7860433

J. Karttunen, A. Kanervisto, V. Kyrki, and V. Hautamäki, “From video game to real robot: The transfer between action spaces,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 3567–3571.

A. Zakharenkov and I. Makarov, “Deep reinforcement learning with DQN vs. PPO in VizDoom,” in Proc. IEEE Int. Symp. Computational Intelligence and Informatics (CINTI), 2021, pp. 131–136.

G. Lample and D. S. Chaplot, “Playing FPS games with deep reinforcement learning,” in Proc. AAAI Conf. Artificial Intelligence, 2017.

M. Kalra and J. Patni, “Playing Doom with deep reinforcement learning,” Recent Trends in Science, Technology, Management and Social Development, p. 42, 2018.

W. Ali, S. Lakho, N. N. Bhatti, and I. A. Memon, “Adaptive bug localization framework for precision-driven bug localization in software engineering,” VFAST Transactions on Software Engineering, vol. 12, no. 3, pp. 230–242, 2024. DOI: https://doi.org/10.21015/vtse.v12i3.1832

M. R. Hossain and D. Timmer, “Machine learning model optimization with hyperparameter tuning approach,” Global Journal of Computer Science and Technology: Neural & Artificial Intelligence, vol. 21, no. 2, p. 31, 2021.

J. A. Ilemobayo et al., “Hyperparameter tuning in machine learning: A comprehensive review,” Journal of Engineering Research and Reports, vol. 26, no. 6, pp. 388–395, 2024.

M. Towers et al., “Gymnasium: A standard interface for reinforcement learning environments,” arXiv:2407.17032, 2024.

S. Narvekar, B. Peng, M. Leonetti, J. Sinapov, M. E. Taylor, and P. Stone, “Curriculum learning for reinforcement learning domains: A framework and survey,” Journal of Machine Learning Research, vol. 21, no. 181, pp. 1–50, 2020.

A. Raffin et al., “Stable-Baselines3: Reliable reinforcement learning implementations,” Journal of Machine Learning Research, vol. 22, no. 268, pp. 1–8, 2021.

M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv:1603.04467, 2016.

Google, “Google Colaboratory.” [Online]. Available: https://colab.research.google.com/. Accessed: Jan. 3, 2026.

M. A. Mannan, R. Qamar, I. U. Khan, A. Hussain, S. Ahmed, and J. Khan, “Evaluating the performance of machine learning classifier algorithms for software estimation in software development projects,” VFAST Transactions on Software Engineering, vol. 12, no. 1, pp. 70–78, 2024. DOI: https://doi.org/10.21015/vtse.v12i1.1770

H. Bhuwad, R. Nikam, and D. L. S. Gunjal, “Optimizing DOOM game using reinforcement learning,” in Proc. Int. Conf. AI and Robotics (AIR), 2025, vol. 1, p. 374.

A. Khan et al., “Using VizDoom research platform scenarios for benchmarking reinforcement learning algorithms in first-person shooter games,” IEEE Access, vol. 12, pp. 15105–15132, 2024.

A. Khan and A. Aqeel, “Benchmarking reinforcement learning algorithms in first-person shooter games using VizDoom,” Entertainment Computing, 2025.

R. Spick, T. Bradley, A. Raina, P. V. Amadori, and G. Moss, “Behavioural cloning in VizDoom,” arXiv:2401.03993, 2024.

C. Zhang, H. Hu, Y. Zhou, X. Wang, and E. S. Liu, “HIFAS: A hybrid interactive FPS agent system for large game maps,” IEEE Transactions on Games, 2025.

S. S. Wagner and S. Harmeling, “Just cluster it: An approach for exploration in high-dimensions using clustering and pre-trained representations,” arXiv:2402.03138, 2024.

Published

2026-03-15

How to Cite

Meghji, A. F., Hussain, R., Abbas, M. M., & Lahad, A. (2026). Exploring Proximal Policy Optimization in ViZDoom: Training Agents for Complex Tasks with Hyperparameter Optimization. VFAST Transactions on Software Engineering, 14(1), 153–174. https://doi.org/10.21015/vtse.v14i1.2348