In this project, we proposed to develop theoretical foundations of model-free reinforcement learning, following the recent success of AlphaGo Zero (AGZ) [2] in producing extraordinary players for games such as Go, Chess and Shogi. At a foundational level, we proposed a framework for the model-free setting in generic reinforcement learning, beyond the two-player games considered in AGZ. We proposed learning mechanisms that lead to high-performing decision-making policies that can be deployed in practice. We studied these policies in terms of the quality of their performance. We hope that the outcome of this work will be integrated into an end-customer application, e.g. planning which generators to switch on and off continually in order to manage an electric utility grid in a cost-effective manner.