This research summary is based on the paper 'Jump-Start Reinforcement Learning'.
In artificial intelligence, reinforcement learning (RL) is a machine learning strategy that rewards desired behaviors and penalizes undesired ones. Through trial and error, an agent perceives its environment and learns how to act, effectively receiving feedback on what works. However, learning a policy from scratch in environments with hard exploration problems is a major challenge in RL. Because the agent receives no intermediate rewards, it cannot tell how close it is to completing the goal, so it must explore the state space at random until it stumbles upon success, for example, until a door happens to open. Given the length of such tasks and the precision they require, success by pure chance is highly unlikely.
Randomly exploring the state space can be avoided by giving the agent prior information. This prior knowledge helps the agent determine which states of the environment are promising and worth exploring further. Offline data collected from human demonstrations, scripted policies, or other RL agents can be used to train an initial policy and then jump-start a new RL policy. For neural network policies, this means copying the pre-trained policy's network weights into the new RL policy, so the new policy starts out behaving like the pre-trained one. However, naively initializing a new RL policy this way often fails in practice, especially for value-based RL methods.
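The weight-copy initialization described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the "policy" here is just a dictionary of parameter values, and `jump_start_init` is a hypothetical helper name.

```python
import copy

# Hypothetical pre-trained policy: a dict of parameter arrays standing in
# for a neural network's weights.
pretrained_policy = {"hidden": [[0.1, -0.2], [0.3, 0.4]], "output": [0.5, -0.5]}

def jump_start_init(source_params):
    """Return a new policy whose parameters start as a copy of source_params.

    Deep-copying means the new policy initially behaves exactly like the
    pre-trained one, but subsequent RL updates do not modify the original.
    """
    return copy.deepcopy(source_params)

new_policy = jump_start_init(pretrained_policy)
assert new_policy == pretrained_policy      # same starting weights...
assert new_policy is not pretrained_policy  # ...but independent storage
```

The deep copy is the important detail: if the new policy merely aliased the old parameters, online training would silently overwrite the pre-trained policy as well.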
Google AI researchers have developed a meta-algorithm that uses a pre-existing policy to jump-start any RL algorithm. Jump-Start Reinforcement Learning (JSRL) learns tasks using two policies: a guide-policy and an exploration-policy. The exploration-policy is an RL policy trained online on the agent's new experiences, while the guide-policy is any pre-existing policy that is not updated during online training. JSRL forms a curriculum by rolling in the guide-policy at the start of each episode and then handing control to the self-improving exploration-policy, yielding results comparable to or better than competitive IL+RL approaches.
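The two-policy curriculum can be sketched as follows. This is a simplified interpretation of the idea, not the authors' code: the environment interface (`reset`, `env_step`), the success `threshold`, and the one-step hand-over schedule are all illustrative assumptions.

```python
def jsrl_rollout(env_step, reset, guide_policy, exploration_policy,
                 horizon, guide_steps):
    """Roll out one episode: the guide-policy acts for the first
    `guide_steps` steps, then the exploration-policy takes over."""
    state = reset()
    trajectory = []
    for t in range(horizon):
        policy = guide_policy if t < guide_steps else exploration_policy
        action = policy(state)
        state, reward, done = env_step(state, action)
        trajectory.append((state, action, reward))
        if done:
            break
    return trajectory

def jsrl_curriculum(train_fn, evaluate_fn, horizon, threshold=0.8):
    """Shrink the guide-policy's share of each episode as the
    exploration-policy improves (one variant of the curriculum idea)."""
    guide_steps = horizon
    while guide_steps > 0:
        train_fn(guide_steps)                  # update the exploration-policy
        if evaluate_fn(guide_steps) >= threshold:
            guide_steps -= 1                   # hand over one more step
    return guide_steps
```

A toy usage: on a number line where the goal is reaching state 3 and both policies always move +1, `jsrl_rollout(..., horizon=10, guide_steps=5)` terminates after three steps with the guide-policy still in control. The curriculum loop then matters only because, in a real task, the exploration-policy must earn each additional step of control by meeting the evaluation threshold.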