Mario Learning AI
Mario AI is a group project I worked on for the project course CS 175: Project in Artificial Intelligence during the Fall 2013 quarter at UC Irvine.
While we were deciding on possible group projects, a few of us suggested building an AI platformer agent that could make its way through a level on its own, having been inspired by this video we saw while taking the introductory Artificial Intelligence course.
When the game starts, Mario processes the environment around him. The system is given everything in a surrounding 40×40 grid, with Mario at the center. Classes implementing the IEnvironmentProcessor interface, such as our main processor, EnvironmentProcessor, iterate through each row and column of the grid from the left edge to the right. EnvironmentProcessor does this three times, once for each of three different types of sprites (the iterations are defined in Table 1). The first iteration looks for enemies and records information about each one. Most values are saved as booleans in order to accommodate the neural network. For each enemy, the following values are recorded: relativex, relativey, canStomp, canFireball, dropsShell, isFlying, score, and isPiranaPlant. The coordinate values are relative to Mario's position.
The second iteration focuses on bonus or goal items and records the following information: pointvalue, relativex, relativey, and canMove. The third iteration searches for environment pieces: anything Mario can break or walk on. For each of these, the program records the following values: canbebroken, candisappear, relativex, relativey, and canproduceitem. These values describe how Mario can interact with each object and, from that, what course he should take. The full process outputs an array of float values that the neural network then uses to decide what to do.
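To make the shape of that output concrete, here is a minimal sketch of what the enemy pass might look like, assuming simplified, hypothetical class and method names rather than our actual code; booleans become 0.0 or 1.0 so the neural network can take them as inputs:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the enemy pass (hypothetical names, not our exact code).
// Booleans are encoded as 0.0f / 1.0f so the neural network can consume them.
public class EnemyPassSketch {

    // Stand-in for whatever sprite type the engine exposes.
    interface Sprite {
        boolean isEnemy();
        boolean canStomp();
        boolean canFireball();
        boolean dropsShell();
        boolean isFlying();
        float score();
        boolean isPiranhaPlant();
    }

    public float[] processEnemies(Sprite[][] grid, int marioRow, int marioCol) {
        List<Float> values = new ArrayList<>();
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                Sprite s = grid[row][col];
                if (s == null || !s.isEnemy()) continue;
                values.add((float) (col - marioCol));      // relativex
                values.add((float) (row - marioRow));      // relativey
                values.add(s.canStomp()       ? 1.0f : 0.0f);
                values.add(s.canFireball()    ? 1.0f : 0.0f);
                values.add(s.dropsShell()     ? 1.0f : 0.0f);
                values.add(s.isFlying()       ? 1.0f : 0.0f);
                values.add(s.score());                     // point value
                values.add(s.isPiranhaPlant() ? 1.0f : 0.0f);
            }
        }
        float[] out = new float[values.size()];
        for (int i = 0; i < out.length; i++) out[i] = values.get(i);
        return out;
    }
}
```

The bonus-item and environment passes work the same way with their own feature lists, and the three resulting arrays are combined into the single float array the network consumes.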
The decision-making logic is handled by a neural network. The output of the environment processor serves as the input to the neural network, and the network's output consists of six booleans that map to the buttons making up Mario's actions (up, down, left, right, jump, and speed). As Mario runs through a level, the following happens on each game tick: 1) the environment processor generates an array of values representing Mario's interpretation of his surroundings at that tick, and 2) the array is passed through the neural network to determine Mario's action for that tick. This repeats until Mario's run on the level ends, whether by death, victory, or time running out.
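In sketch form, the per-tick decision reduces to two calls; the class and method names below are hypothetical stand-ins, since the real agent hooks into the benchmark's agent interface rather than a bare method like this:

```java
// Rough per-tick decision step (hypothetical names and types).
public class TickLoopSketch {

    interface IEnvironmentProcessor {
        float[] process(Object environment);   // flattens the grid into floats
    }

    interface NeuralNetwork {
        boolean[] feedForward(float[] inputs);  // six booleans, one per button
    }

    // Assumed button order for illustration: up, down, left, right, jump, speed.
    public boolean[] decide(IEnvironmentProcessor processor,
                            NeuralNetwork network,
                            Object environment) {
        float[] inputs = processor.process(environment); // Mario's view of this tick
        return network.feedForward(inputs);              // the buttons to press this tick
    }
}
```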
The neural network implementation provides methods to get and set all of the weights it contains as a single array. This array representation (given that the size of the network stays the same) is what enables us to use a genetic algorithm to let Mario learn which weights lead to a higher score.
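The key idea is that a fixed-topology network can be read and written as one flat array; a minimal sketch (with made-up internals, not our actual network class) might look like this:

```java
// Sketch of flattening a fixed-topology network's weights into one array
// (hypothetical internals; our network stored its layers differently).
public class FlatWeightsSketch {
    private final float[][][] layers; // [layer][neuron][incoming weight]

    public FlatWeightsSketch(float[][][] layers) {
        this.layers = layers;
    }

    public float[] getWeights() {
        int n = 0;
        for (float[][] layer : layers)
            for (float[] neuron : layer) n += neuron.length;
        float[] flat = new float[n];
        int i = 0;
        for (float[][] layer : layers)
            for (float[] neuron : layer)
                for (float w : neuron) flat[i++] = w;
        return flat;
    }

    public void setWeights(float[] flat) {
        int i = 0;
        for (float[][] layer : layers)
            for (float[] neuron : layer)
                for (int j = 0; j < neuron.length; j++) neuron[j] = flat[i++];
    }
}
```

Because the topology never changes, the array always has the same length and ordering, which is exactly what the genetic algorithm needs to treat it as a chromosome.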
The program uses a genetic algorithm to find the neural network weights that produce the best runs. For each generation, it runs through the level a set number of times, determined by the main method and configurable in its Java file. The score for a given run is computed by the computeWeightedScore() method provided by the Mario AI Coding Competition engine. The genetic algorithm gathers the top scores and creates the next generation of weights from them: the weights that produced the top scores of the previous generation are inherited by the next one, and the AI then decides what moves to make based on the scores those weights produce. Training continues through a configurable number of generations before the final Mario agent is run.
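Conceptually, the evaluation pass for one generation looks roughly like the sketch below; the Agent and LevelRunner types are hypothetical stand-ins, with the per-run score in our project coming from computeWeightedScore():

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one generation's evaluation pass (hypothetical names).
public class GenerationSketch {

    interface Agent {
        float[] getWeights();
    }

    interface LevelRunner {
        // Runs the agent through the level and returns its weighted score.
        double run(Agent agent);
    }

    static class ScoredAgent {
        final Agent agent;
        final double score;
        ScoredAgent(Agent agent, double score) { this.agent = agent; this.score = score; }
    }

    public List<ScoredAgent> evaluateGeneration(List<Agent> agents, LevelRunner runner) {
        List<ScoredAgent> results = new ArrayList<>();
        for (Agent agent : agents) {
            results.add(new ScoredAgent(agent, runner.run(agent)));
        }
        // Sort best-first; the genetic algorithm breeds the next generation from the top scorers.
        results.sort((a, b) -> Double.compare(b.score, a.score));
        return results;
    }
}
```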
The genetic algorithm itself runs much like the one outlined in the Russell and Norvig book, where each “chromosome” is defined as a set of weights for the neural network. The algorithm creates descendants by taking two chromosomes as input, picking a random pivot point between 0 and the length of the chromosome, and swapping the values of the two chromosomes from the pivot point onward, so that each result is the same length as its parents.
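As a sketch, that single-point crossover over two weight arrays might look like this (a hypothetical helper, not our exact code):

```java
import java.util.Random;

// Single-point crossover over two equal-length weight arrays (sketch).
public class CrossoverSketch {
    private final Random random = new Random();

    public float[] crossover(float[] parentA, float[] parentB) {
        int pivot = random.nextInt(parentA.length);   // random point in [0, length)
        float[] child = new float[parentA.length];
        for (int i = 0; i < child.length; i++) {
            // Take parent A's genes before the pivot, parent B's from the pivot on.
            child[i] = (i < pivot) ? parentA[i] : parentB[i];
        }
        return child;
    }
}
```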
A mutation is a random chance to alter one of the weight values within a chromosome. We set the chance of mutation at 1 in 10 (a 10% chance). While iterating through the chromosome, the AI generates a random value between 1 and 100 for each weight; if the value falls between 1 and 10 the weight mutates, otherwise it is left unchanged. When a weight is chosen for mutation, a delta equal to the weight value multiplied by 0.1 (10%) is computed, and the AI then generates a random value of 0 or 1: on a 0 the delta is subtracted from the weight, and on a 1 it is added.
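Put into code, the mutation step might look like the following sketch (again a hypothetical helper rather than our exact implementation):

```java
import java.util.Random;

// Per-weight mutation sketch: each weight has a 10% chance of being nudged
// up or down by 10% of its own value.
public class MutationSketch {
    private final Random random = new Random();

    public void mutate(float[] weights) {
        for (int i = 0; i < weights.length; i++) {
            int roll = random.nextInt(100) + 1;        // value in [1, 100]
            if (roll <= 10) {                          // 1-in-10 chance of mutating
                float delta = weights[i] * 0.1f;       // 10% of the weight value
                // Coin flip: subtract the delta on 0, add it on 1.
                weights[i] += (random.nextInt(2) == 0) ? -delta : delta;
            }
        }
    }
}
```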
A Genetic Manager class keeps track of all of the agents used, storing their final scores and weights at the end of their runs. Statistics such as the highest, lowest, and average scores of a generation are available through it, as is the highest-scoring agent across all generations. When a new generation is to be created, a pool of agents is built by tournament selection, and two agents from that pool are selected at random to produce the next generation: their weights are sent to the genetic algorithm to create a new child.
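A rough sketch of that selection-and-breeding step, reusing the crossover and mutation sketches above and using hypothetical names for everything else:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of tournament selection followed by breeding (hypothetical names).
public class SelectionSketch {
    private final Random random = new Random();

    static class Candidate {
        final float[] weights;
        final double score;
        Candidate(float[] weights, double score) { this.weights = weights; this.score = score; }
    }

    // Each tournament pits a few random agents against each other; the best one joins the pool.
    public List<Candidate> tournamentPool(List<Candidate> generation, int poolSize, int tournamentSize) {
        List<Candidate> pool = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) {
            Candidate best = generation.get(random.nextInt(generation.size()));
            for (int j = 1; j < tournamentSize; j++) {
                Candidate challenger = generation.get(random.nextInt(generation.size()));
                if (challenger.score > best.score) best = challenger;
            }
            pool.add(best);
        }
        return pool;
    }

    // Picks two parents at random from the pool and hands their weights to the genetic algorithm.
    public float[] breed(List<Candidate> pool, CrossoverSketch crossover, MutationSketch mutation) {
        Candidate a = pool.get(random.nextInt(pool.size()));
        Candidate b = pool.get(random.nextInt(pool.size()));
        float[] child = crossover.crossover(a.weights, b.weights);
        mutation.mutate(child);
        return child;
    }
}
```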
I had a lot of fun building this project with my group. I was fortunate to be in a group with people who knew more about machine learning (an area of artificial intelligence I was fascinated by), and I learned far more about the field than I had known going in. Unfortunately, despite several attempts at optimization, we were unable to get the AI to finish the level; it hit a plateau around three quarters of the way through, and given the severe time constraints we had to finish the project under, it remains there. We can say, however, that the project accomplished its goal: the AI does learn. With more training generations, it gets farther and farther through the level, so in that respect I can call the project a success. AI is an incredibly difficult subject, as we learned the deeper into the project we got, and I'm proud of the team for accomplishing what we did.