Reward Shaping For Video Game Playing Agents Based on Human Motivations

Faculty Mentor

Philippe Chauveau

Format Preference

Poster

Abstract

Video games have been used for machine learning research for decades because of their complex yet controlled environments. This project uses Quantic Foundry’s gamer motivation model to develop reward functions for multiple reinforcement learning agents tasked with playing the video game Pokémon Red. By basing the agents’ training on empirically derived motivations of human players, the agents exhibit more human-like behavior than previous approaches to game-playing AI. We trained nine RL agents, each representing one of the Quantic Foundry gamer types: Acrobat, Gardener, Slayer, Skirmisher, Gladiator, Ninja, Bounty Hunter, Architect, and Bard. The agents use the same proximal policy optimization algorithm but different reward functions. Rewards are given for performing actions that correspond to the twelve gamer motivations: destruction, excitement, competition, community, challenge, strategy, completion, power, fantasy, story, design, and discovery. The reward for each action is scaled by the weight of the corresponding motivation in the gamer type's profile. We have observed distinguishable behavior among the agents: Gardener progresses the furthest in-game and earns two gym badges, while Ninja progresses the least and rarely earns a single badge. These results are consistent with the respective motivation profiles of the gamer types. The data generated by this project provides insight into how human motivations can be used to train AI to perform tasks in particular ways. It is becoming increasingly relevant for AI systems to behave and think more like humans, and our process and results may be extrapolated to more practical applications of RL agents, such as robotics and autonomous vehicles.
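The core mechanism described above is a weighted sum over motivation-specific reward signals. Below is a minimal Python sketch of that idea; the weight values, the `shaped_reward` helper, and the example event signals are hypothetical placeholders for illustration, not the actual weights from the Quantic Foundry model or the project's emulator event hooks.

```python
# Sketch of motivation-weighted reward shaping (illustrative only).

MOTIVATIONS = [
    "destruction", "excitement", "competition", "community",
    "challenge", "strategy", "completion", "power",
    "fantasy", "story", "design", "discovery",
]

# Made-up weights for one gamer type; each of the nine agents
# would get its own profile derived from the motivation model.
GARDENER_WEIGHTS = {m: 0.0 for m in MOTIVATIONS}
GARDENER_WEIGHTS.update({"completion": 1.0, "discovery": 0.8, "fantasy": 0.6})

def shaped_reward(events, weights):
    """Sum per-motivation reward signals, scaled by the gamer type's weights.

    `events` maps motivation names to raw reward signals emitted this step,
    e.g. {"discovery": 1.0} when the agent enters a previously unseen area.
    """
    return sum(weights.get(m, 0.0) * r for m, r in events.items())

# Example step: the agent explored a new area and completed an objective.
step_events = {"discovery": 1.0, "completion": 0.5}
print(shaped_reward(step_events, GARDENER_WEIGHTS))  # 0.8*1.0 + 1.0*0.5 = 1.3
```

Under this scheme, all agents share the same PPO learner; only the weight profile passed to the shaping function differs between gamer types.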
