Reward Effect in Reinforcement Learning Systems


Learning Classifier Systems (LCS) are a machine learning technique that combines reinforcement learning, evolutionary computing, and other heuristics to produce adaptive systems. The HRC (Human-Rat-Cheese) system focuses on creating an artificial creature (the Rat) in computer simulation and teaching it to choose between two basic behaviors (approach and escape), combining them into a complex behavior that represents its final response in a changing environment. HRC is built from two classifier subsystems working together: each classifier system learns a simple behavior, and the system as a whole has as its learning goal the control of activities. A flat architecture was used. The flat organization makes it possible to distinguish between two different learning activities: learning the basic behaviors and learning the switch behavior. One classifier system learns the basic behaviors (approach/escape), i.e., it learns the simulated robot's single-step movement in every direction in the environment. The other classifier system learns to control the activities of the basic classifier systems, i.e., it learns to choose between the two basic behaviors, using suppression as the composition mechanism; the resulting choice constitutes the complex behavior. Simple experiments were run on HRC to compare reinforcement learning using reward and punishment with learning using reward only. The results show that the run using reward only does not perform as well as the run using both reward and punishment.
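The flat organization and the two reinforcement schemes can be illustrated with a minimal sketch. It is not the HRC implementation: the one-dimensional world, the names `Proposal`, `approach`, `escape`, `switch`, and `reinforcement`, and the signal magnitudes +1, -1, and 0 are all assumptions made for illustration, and the learned classifier systems are replaced here by hand-written rules so that only the composition by suppression and the shape of the trainer signal are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    behavior: str  # "approach" or "escape"
    move: int      # single-step displacement on a 1-D line (assumed world)


def approach(rat: int, cheese: int) -> Proposal:
    """Basic behavior: propose one step toward the cheese (0 if already there)."""
    return Proposal("approach", (cheese > rat) - (cheese < rat))


def escape(rat: int, human: int) -> Proposal:
    """Basic behavior: propose one step away from the human."""
    return Proposal("escape", 1 if rat >= human else -1)


def switch(approach_p: Proposal, escape_p: Proposal,
           suppress_approach: bool) -> Proposal:
    """Suppression as composition: exactly one proposal is let through."""
    return escape_p if suppress_approach else approach_p


def reinforcement(chosen: str, desired: str, use_punishment: bool) -> float:
    """Trainer signal under the two schemes compared in the experiments.

    The magnitudes are placeholders (assumptions), not values from the paper.
    """
    if chosen == desired:
        return 1.0                            # reward in both schemes
    return -1.0 if use_punishment else 0.0    # punishment vs. no signal


if __name__ == "__main__":
    rat, cheese, human = 2, 6, 3
    chosen = switch(approach(rat, cheese), escape(rat, human),
                    suppress_approach=False)
    # Suppose the trainer wanted "escape" because the human is adjacent.
    print(chosen)
    print(reinforcement(chosen.behavior, "escape", use_punishment=True))
    print(reinforcement(chosen.behavior, "escape", use_punishment=False))
```

In the reward-only scheme a wrong choice yields a signal of zero, so in this sketch it is indistinguishable from receiving no feedback at all, whereas the reward-and-punishment scheme marks the wrong choice explicitly.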