Abstract:
To address the shortcomings of existing path planning methods for mine patrol robots, such as low efficiency, slow convergence, and a tendency to become trapped in local optima, a path planning method based on the Actor-Critic algorithm is proposed. First, the steering angle of the patrol robot is calculated from the real-time positions of the inspection targets and the obstacles, and the forward direction is determined accordingly, which significantly improves the efficiency of path planning. With the goals of minimizing energy consumption and avoiding collisions, the patrol robot then learns the target inspection sequence and its forward speed in response to the dynamically changing mine environment. Because the mine environment changes dynamically and continuously, the state dimension is high, so the actions and rewards associated with the continuous state are estimated by deep neural networks. To improve learning efficiency, two networks are adopted, an Actor network and a Critic network, which update the policy and the value estimate in real time. Simulation results show that the proposed method can plan a safe and reasonable patrol route in a dynamic environment and can complete the patrol task with a success rate of 98% and lower energy consumption.
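As an illustration of the Actor-Critic update summarized above, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a discrete action set and replaces the deep Actor and Critic networks with linear function approximators so that the update rule stays visible; the class name `LinearActorCritic`, the learning rates, and the discount factor are hypothetical choices made only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class LinearActorCritic:
    """One-step Actor-Critic with linear approximators (illustrative only).

    Assumption: linear models stand in for the deep networks of the paper.
    """

    def __init__(self, n_features, n_actions,
                 actor_lr=0.01, critic_lr=0.1, gamma=0.99):
        # Actor: one weight row per action, defining a softmax policy.
        self.theta = [[0.0] * n_features for _ in range(n_actions)]
        # Critic: weights of a linear state-value estimate V(s) = w . s
        self.w = [0.0] * n_features
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma

    def policy(self, state):
        """Action probabilities pi(a | s)."""
        return softmax([dot(row, state) for row in self.theta])

    def value(self, state):
        """Critic estimate of the state value."""
        return dot(self.w, state)

    def update(self, state, action, reward, next_state, done):
        """Apply one Critic step and one Actor step; return the TD error."""
        bootstrap = 0.0 if done else self.gamma * self.value(next_state)
        td_error = reward + bootstrap - self.value(state)
        probs = self.policy(state)

        # Critic: semi-gradient TD(0) step on the value weights.
        for i, x in enumerate(state):
            self.w[i] += self.critic_lr * td_error * x

        # Actor: policy-gradient step, with the TD error as the advantage.
        # For a softmax-linear policy, d log pi(a|s) / d theta[b] is
        # (1[b == a] - pi(b|s)) * s.
        for b, row in enumerate(self.theta):
            indicator = 1.0 if b == action else 0.0
            for i, x in enumerate(state):
                row[i] += self.actor_lr * td_error * (indicator - probs[b]) * x

        return td_error
```

In this sketch, a positive TD error raises both the Critic's value estimate for the visited state and the Actor's probability of the action just taken, so the policy and the value are updated together at every step.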