Reinforcement Learning

Written by Rachit Agrawal, MBA

Published on Tue, February 18, 2020 4:58 AM • Updated on Thu, February 20, 2020 7:02 AM

Reinforcement Learning is the area of artificial intelligence and machine learning concerned with how software agents ought to take actions in an environment in order to maximize a cumulative reward. It is one of the three basic paradigms of machine learning, alongside supervised learning and unsupervised learning.

Reinforcement Learning differs from Supervised Learning in that it does not need labelled input/output pairs, and sub-optimal actions do not need to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
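The exploration/exploitation balance can be illustrated with epsilon-greedy action selection on a multi-armed bandit, a one-state special case of Reinforcement Learning. This is a minimal sketch: the bandit, its reward means and the value of epsilon are hypothetical choices, and epsilon-greedy is only one of several possible exploration strategies.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy action selection on a hypothetical Gaussian bandit.

    With probability epsilon the agent explores (tries a random arm);
    otherwise it exploits its current knowledge (the arm with the highest
    estimated mean reward so far).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # current knowledge: sample-average reward per arm
    counts = [0] * n_arms       # how many times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward from the environment
        counts[arm] += 1
        # Incremental update of the sample average for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
```

Setting epsilon=0 turns this into pure exploitation, which can lock onto whichever arm happened to look best early on; a small epsilon keeps gathering information about the other arms.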

The environment is typically stated in the form of a Markov decision process (MDP), because many Reinforcement Learning algorithms use dynamic programming techniques. The main difference between classical dynamic programming and Reinforcement Learning is that the latter does not assume knowledge of an exact mathematical model of the MDP, and it targets large MDPs where exact methods become infeasible.
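To make the dynamic-programming side concrete, here is a minimal value iteration sketch for a hypothetical three-state MDP whose transition probabilities and rewards are fully known. The states, actions, probabilities and the discount factor gamma are illustrative assumptions. A Reinforcement Learning algorithm would not be handed the table P; it would have to learn from sampled transitions instead.

```python
# Known model: P[state][action] -> list of (probability, next_state, reward).
# Hypothetical 3-state chain; "right" succeeds with probability 0.8, state 2 is absorbing.
P = {
    0: {"left": [(1.0, 0, 0.0)],
        "right": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)],
        "right": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {"left": [(1.0, 2, 0.0)],
        "right": [(1.0, 2, 0.0)]},
}


def action_value(P, V, s, a, gamma):
    """Expected reward plus discounted next-state value, computed from the model."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, theta=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup using the known transition model
            best = max(action_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(P[s], key=lambda a: action_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V, policy)
```

Every backup needs the full transition table, which is exactly the knowledge that Reinforcement Learning methods do without.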

The observation typically includes a scalar, immediate reward associated with the last transition. In many formulations the agent is assumed to observe the current state of the environment directly (full observability); if it cannot, the agent has partial observability. A Reinforcement Learning agent interacts with its environment in discrete time steps. At each time step t, the agent receives an observation, which typically includes a reward. It then chooses an action from the set of available actions, and this action is sent to the environment, which moves to a new state and determines the reward associated with that transition. The agent may choose any action, possibly at random, as a function of the history of its past observations.
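The interaction loop described above can be sketched as follows. The CorridorEnv class and its reset/step methods are hypothetical, loosely modeled on the style of common RL toolkits rather than any specific library; the agent simply picks actions at random, which is the simplest possible policy.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..length, starting at 0.

    Reaching the last position ends the episode with reward +1; every other
    step gives a small penalty of -0.01.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial observation

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position
        self.position = min(max(self.position + action, 0), self.length)
        done = self.position == self.length
        reward = 1.0 if done else -0.01
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=1000):
    """Discrete-time loop: observe, choose an action, receive next observation and reward."""
    observation = env.reset()
    history = []
    for t in range(max_steps):
        action = choose_action(observation)           # agent picks an action at time t
        observation, reward, done = env.step(action)  # environment transitions and rewards
        history.append((t, action, observation, reward))
        if done:
            break
    return history


rng = random.Random(42)


def random_policy(observation):
    """The simplest agent: ignore the observation and act at random."""
    return rng.choice((-1, 1))


history = run_episode(CorridorEnv(), random_policy)
total_reward = sum(step[3] for step in history)
print(f"episode length: {len(history)}, total reward: {total_reward:.2f}")
```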

When the agent's performance is compared to that of an agent that acts optimally, the difference in performance gives rise to the notion of regret. To act near-optimally, the agent must reason about the long-term consequences of its actions, that is, maximize its future reward, even though the immediate reward associated with an action may be negative.
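A common way to formalize "long-term consequences" is the discounted return, in which a reward received k steps in the future is weighted by gamma**k for a discount factor 0 <= gamma < 1. The sketch below uses illustrative reward sequences and gamma values (assumptions, not taken from a real task) to show how a course of action that starts with a negative reward can still have the higher return.

```python
def discounted_return(rewards, gamma=0.9):
    """G = r_1 + gamma * r_2 + gamma**2 * r_3 + ... for one reward sequence."""
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards: G_t = r_{t+1} + gamma * G_{t+1}
        g = r + gamma * g
    return g


# Hypothetical reward sequences for two courses of action
short_sighted = [1.0, 0.0, 0.0, 0.0]   # positive reward now, nothing later
far_sighted = [-1.0, 0.0, 0.0, 10.0]   # pays a cost now for a larger reward later

print(discounted_return(short_sighted))  # 1.0
print(discounted_return(far_sighted))    # about 6.29, i.e. -1 + 0.9**3 * 10
```

With a much smaller discount factor, for example gamma=0.1, the ranking flips and the short-sighted sequence wins, which is exactly the long-term versus short-term trade-off.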

Reinforcement Learning is therefore particularly well suited to problems that involve a trade-off between long-term and short-term reward. It has been applied to real-world problems such as control, elevator scheduling and telecommunications.

Two elements make Reinforcement Learning powerful: the use of samples to optimize performance, and the use of function approximation to deal with large environments.
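Both elements appear together in gradient Monte Carlo value estimation with a linear function approximator: the value function is represented by a small weight vector instead of a table, and the weights are adjusted from sampled episodes. This sketch assumes a hypothetical five-state random walk (whose true state values are s/6) and hand-picked features, step size and episode count; it is an illustration, not a general-purpose implementation.

```python
import random


def features(state):
    """Hypothetical feature vector: a bias term plus the centred state index."""
    return (1.0, (state - 3) / 2.0)


def value(w, state):
    """Linear value estimate v(s) = w . x(s)."""
    x = features(state)
    return w[0] * x[0] + w[1] * x[1]


def gradient_mc_random_walk(episodes=50_000, alpha=0.0002, seed=0):
    """Gradient Monte Carlo estimation with a linear function approximator.

    Hypothetical environment: a random walk over states 1..5 that starts in
    state 3. Leaving on the right (state 6) gives reward 1, leaving on the left
    (state 0) gives 0, and there is no discounting, so the return from every
    visited state equals the final reward.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        # Sample one episode by interacting with the simulated environment
        state, visited = 3, []
        while 1 <= state <= 5:
            visited.append(state)
            state += rng.choice((-1, 1))
        G = 1.0 if state == 6 else 0.0
        # Stochastic-gradient step towards the sampled return for each visit
        for s in visited:
            x = features(s)
            error = G - value(w, s)
            w[0] += alpha * error * x[0]
            w[1] += alpha * error * x[1]
    return w


w = gradient_mc_random_walk()
for s in range(1, 6):
    print(s, round(value(w, s), 3), "true:", round(s / 6, 3))
```

Because the true values here happen to be linear in the state index, two weights are enough to represent them; in a large environment the feature vector would be richer and the approximation would generally not be exact.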

Thanks to these two elements, Reinforcement Learning can be used in large environments in the following situations:

A model of the environment is known, but an analytic solution is not available.
Only a simulation model of the environment is given.
The only way to collect information about the environment is to interact with it.

The first two situations can be viewed as planning problems, since some form of model is available, while the last one is a genuine learning problem. Reinforcement Learning, however, converts both of these planning problems into machine learning problems.

Q-learning is a model-free Reinforcement Learning technique: it learns the value of taking each action in each state directly from experience, without needing a model of the environment. Q-learning can be thought of as having two phases: a training phase and an exploitation phase. During the training phase, the agent explores the environment and updates its action-value estimates; in the exploitation phase, the agent uses what it has learned to act optimally.
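A minimal tabular Q-learning sketch, assuming a hypothetical six-state corridor with a single rewarding goal, shows both phases: during training the agent explores with an epsilon-greedy policy and applies the update Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)); afterwards it exploits by acting greedily on the learned values. The step size alpha, discount gamma, exploration rate epsilon and episode count are illustrative choices.

```python
import random

GOAL = 5               # hypothetical corridor: states 0..5, goal in state 5
ACTIONS = (-1, +1)     # action index 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reaching the goal gives +1 and ends the episode."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def argmax_random_tie(values, rng):
    """Index of the largest value, breaking ties at random."""
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


def train_q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Training phase: explore with epsilon-greedy and update Q(s, a) after every step."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(GOAL + 1)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))       # explore
            else:
                a = argmax_random_tie(Q[state], rng)  # exploit current estimates
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted value of the best next action
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][a] += alpha * (target - Q[state][a])
            state = next_state
    return Q


def greedy_policy(Q):
    """Exploitation phase: in each non-goal state take the highest-valued action."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda a: Q[s][a])] for s in range(GOAL)]


Q = train_q_learning()
print("greedy policy (moves per state 0..4):", greedy_policy(Q))
```

Ties between equal Q-values are broken at random during training; otherwise the all-zero initial table would make the agent keep picking the same action, and it would take far longer to reach the goal for the first time.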