Abstract
Visualizing a reinforcement learning (RL) environment and the learning dynamics of an agent is a vital step in debugging and better understanding the learnt policy. For virtual game environments, an agent's performance can be visualized by rendering game screens. However, for environments that involve optimisation over real-world, multidimensional spaces with continuous variables, such as the optimisation of chemical process parameters, observing an agent's behaviour through visualization is challenging. This area remains largely unexplored by the research community. In the current work, a reinforcement learning agent is developed to optimise the production process of rubber mixes for the tyre industry. This paper presents an attempt to visualize an RL agent's training and inference for high-dimensional problems with continuous state and action spaces. A number of techniques are presented to assist in debugging and in monitoring the convergence of an agent over a complex domain. We explore plots for studying the simulation environment, examining RL training dynamics, analysing the trained policy, and evaluating the trained policy's performance in a given environment. The techniques described here were developed for actor-critic algorithms but can easily be extended to any RL algorithm.