AI Plays Football For Science!
Ever played FIFA or PES? Yeah, the football games. No? Well, you’re missing out on a lot of fun :)
But that shouldn’t stop you from understanding all the cool stuff you’re about to read, so let me give you some intuition. Imagine you’re 1–0 down in the 87th minute. You’re Barcelona, it’s El Clásico at Camp Nou, and you simply cannot afford to lose to your arch-nemesis, Real Madrid. Benzema is looking dangerous in the penalty area. A goal here, and surely the game’s over. Ter Stegen blocks the shot with a terrific save! Jordi Alba gets the ball and you launch a swift counter-attack, with the game’s scripts swinging the momentum in your favor. Alba to Busquets. Busquets to Coutinho. Coutinho to Messi. Messi back to Coutinho. Coutinho with a beautiful through ball past the defenders. Messi... GOOOOAAAAL! It’s 1–1 now, and the match hangs in the balance as Real Madrid kick off in the 89th minute with 3 minutes of added time. It’s looking tense with Bale on the ball, but Mascherano intercepts just in time to hand Barcelona the final play. The team’s in all-out attack now, and the ball’s at Messi’s feet. He’s surrounded by three defenders at the edge of the penalty area. Your heart’s pumping. This is a decisive moment.
In a stroke of brilliance, Messi cuts in past the defenders and chips the ball over the advancing goalkeeper into the back of the net just seconds before the final whistle. The magician’s done it again, turning the match around in the space of five minutes. As you hear the final whistle, you’re screaming with joy and your friend’s crying in despair. These are the emotions we play for.
At some point, most players catch themselves marveling at the skill and tenacity of the engineers and designers who made this possible. The graphics and the algorithms work in perfect harmony to create something special.
Most games are built on game engines, the foundations upon which developers work their magic. These engines provide core functionality like 3D rendering and physics engines, which encode the laws of physics into games so that they feel more familiar and realistic. EA’s FIFA is built on a game engine called Frostbite. Many of us assume there must be some sophisticated machine learning trickery that lets a computer play against humans and often win. However, the underlying mechanism is simply handcrafted and logic-based. Though the algorithm must account for a lot of possibilities, most of the gameplay comes down to finding the player closest to the ball and then passing accurately.
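To make the handcrafted idea concrete, here is a minimal Python sketch of the two rules just described: find the player closest to the ball, then pick a pass. It is not EA’s actual logic, which isn’t public; the Player class, the pitch coordinates, the min_space marking threshold and the “furthest-forward free teammate” rule are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    x: float  # hypothetical pitch coordinates: larger x = further up the pitch
    y: float


def dist(x1, y1, x2, y2):
    return math.hypot(x1 - x2, y1 - y2)


def closest_to_ball(players, ball_x, ball_y):
    """Rule 1: the player nearest the ball is the one who goes for it."""
    return min(players, key=lambda p: dist(p.x, p.y, ball_x, ball_y))


def choose_pass(carrier, teammates, opponents, min_space=0.1):
    """Rule 2: pass to the unmarked teammate furthest up the pitch.

    A teammate counts as unmarked if no opponent is within min_space of them.
    Returns None (keep dribbling) when nobody is free.
    """
    free = [
        m for m in teammates
        if m is not carrier
        and all(dist(m.x, m.y, o.x, o.y) >= min_space for o in opponents)
    ]
    return max(free, key=lambda m: m.x, default=None)


team = [Player("Alba", -0.3, 0.2), Player("Busquets", 0.0, 0.0), Player("Messi", 0.6, -0.1)]
rivals = [Player("Ramos", 0.62, -0.05), Player("Casemiro", 0.1, 0.05)]

carrier = closest_to_ball(team, -0.25, 0.2)
target = choose_pass(carrier, team, rivals)
print(carrier.name, "->", target.name if target else "dribble")  # Alba -> Busquets
```

Messi is closest to goal but tightly marked, so the rules pick Busquets. Every decision here is a fixed if-this-then-that check; nothing is learned from how the user plays.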
These algorithms aren’t intelligent, in the sense that they don’t learn from how the user plays. The latest versions have some rudimentary AI to work out high-level strategy, but it is very limited. This is by design, with compatibility in mind. But since football (for Americans: football = soccer) is such a dynamic game, studying how an AI could learn simply by playing more games could help us better understand how these AIs learn.
A few researchers at Google got together and built the Google Research Football Environment. They say it is “a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator.” Essentially, it’s a cool way for the open-source community to contribute to research in reinforcement learning. That is, by playing football! We’ll get to how all that works, but first, let’s take a step back and understand what reinforcement learning is.
Reinforcement learning is an important subfield of machine learning research in which we teach an agent to choose a sequence of actions in an environment so as to maximize a score. In supervised learning, the kind of machine learning you hear about most, computers are given training data together with the answer key, so they can find patterns and then extend that knowledge to new data. In reinforcement learning, however, the computer, or agent, is simply given the task it must solve and a reward function that it has to maximize.
Consider this: you need Pratham, a robot, to travel through a maze and obtain a diamond while avoiding fire. To solve this problem with reinforcement learning, we could tell Pratham that if he reaches the diamond, he earns 1 point, but every time he steps on fire along the way, he loses 0.3 points. This is his reward function, and Pratham’s objective is to maximize it. Pratham would solve this by trying many possible paths through trial and error and then picking the one that maximizes his reward, which in this case is the path where he avoids all the fire and reaches the diamond. As he repeats the task more and more times, he figures out which paths lead to a favorable outcome. He learns the best move to make at every square, so when he’s presented with the environment again, he knows to take the optimal path. As you can probably infer, reinforcement learning is all about making the best possible decisions sequentially.
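Pratham’s trial-and-error learning can be sketched with tabular Q-learning, one classic reinforcement learning algorithm (chosen here for illustration; the example above doesn’t name a specific method). The maze layout, the learning rate alpha, the discount gamma and the exploration rate epsilon are assumptions; the +1 diamond and -0.3 fire rewards come straight from the example.

```python
import random

# Pratham's maze (hypothetical layout): S = start, D = diamond, F = fire, . = floor.
MAZE = [
    "S..F",
    ".F..",
    "...D",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
DIAMOND_REWARD, FIRE_PENALTY = 1.0, -0.3


def find(symbol):
    for r, row in enumerate(MAZE):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(symbol)


START, DIAMOND = find("S"), find("D")


def step(state, action):
    """Move one square. Walking into the outer wall leaves Pratham where he is."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])):
        r, c = state
    if (r, c) == DIAMOND:
        return (r, c), DIAMOND_REWARD, True   # the episode ends at the diamond
    if MAZE[r][c] == "F":
        return (r, c), FIRE_PENALTY, False    # fire hurts but doesn't end the run
    return (r, c), 0.0, False


def best_action(q, state, rng=None):
    """Highest-valued move; ties broken randomly during training, by index otherwise."""
    values = [q.get((state, a), 0.0) for a in range(len(ACTIONS))]
    ties = [a for a, v in enumerate(values) if v == max(values)]
    return rng.choice(ties) if rng is not None else ties[0]


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate the value of every (square, move) pair."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = best_action(q, state, rng)  # exploit what he knows
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q.get((nxt, a), 0.0) for a in range(len(ACTIONS)))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned values from the start, always taking the best move."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, best_action(q, state))
        path.append(state)
        if done:
            break
    return path


print(greedy_path(train()))
```

The discount gamma makes short safe routes worth more than long detours, and epsilon keeps Pratham occasionally trying random moves so he doesn’t lock onto the first route he stumbles upon.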
This kind of learning is especially useful in complex simulations. Given a set of rules, the right rewards, and enough training, an AI can discover novel solutions to problems, some of which we haven’t even thought of.
The idea of using reinforcement learning to train AIs to play games is not new. OpenAI’s agents made headlines for defeating world champions at Dota 2. It’s also used by companies like DeepMind, whose AlphaZero can beat chess grandmasters by simply playing against itself again and again, learning entirely through constant reinforcement.
Reinforcement learning also has huge applications in robotics. For instance, look at this bot trying to learn to walk. It learns how to walk with absolutely no input from the user, who just has to restart the process again and again.
In a sense, this is how we learn too: by trial and error. However, we learn much faster because we have low-latency senses and the ability to process information from all of them simultaneously. This works hand in hand with an intuition we develop from prior, unrelated knowledge and our understanding of how the world works. Some futurists argue that by 2050 we could build an AGI (Artificial General Intelligence) that can reason and think on par with humans, and that reinforcement learning would be a crucial part of it.
Right now, although quite a bit of research on reinforcement learning in simulated games has been done, there are a few gaps. Most of the training cannot be replicated on everyday machines because it is so computationally expensive. Many of the environments used lack stochasticity, i.e. the irregularity and randomness that is characteristic of real-world problems. Moreover, most of them are not open source, so researchers cannot inspect the underlying code and modify the environment when needed to test new research ideas. Football is particularly challenging for reinforcement learning, as it requires a natural balance between short-term control, learned concepts such as passing, and long-term strategic planning.
To investigate these problems, researchers from Google built the football environment on top of the Football Engine (very creative name, Google 👏👏). The Football Engine provides several crucial components: a highly optimized game engine, a demanding set of research problems, and a set of progressively harder RL scenarios. The environment looks like a barebones version of EA’s FIFA 10, but it is written in highly optimized C++ code, which allows it to run on ordinary, everyday computers.
The Football Engine has additional features geared specifically towards reinforcement learning. It supports learning both from state information, such as players’ locations and speeds, and from raw pixels. To investigate the impact of randomness, it can be run in a stochastic mode, in which there is randomness in both the environment and the opponent’s actions, or in a deterministic mode, where there is no randomness. The Football Engine is also compatible out of the box with the widely used OpenAI Gym API, a standard interface for developing and testing reinforcement learning algorithms.
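The Gym API boils down to two calls: reset() starts an episode and returns an observation, and step(action) advances the game and returns the new observation, a reward, a done flag and an info dict (the classic Gym convention; newer Gymnasium releases split done into terminated and truncated). The sketch below uses a made-up, one-dimensional ToyFootballEnv, not the real engine, to show that loop along with a stochastic and a deterministic mode. The real environment is created through the gfootball package (for example gfootball.env.create_environment); its README lists the actual scenarios and options.

```python
import random


class ToyFootballEnv:
    """A hypothetical, drastically simplified stand-in for a football environment.

    Observation: [ball_x], from 0.0 (halfway line) to 1.0 (opponent's goal line).
    Actions: 0 = hold, 1 = pass forward, 2 = shoot.
    """

    def __init__(self, stochastic=True, seed=None):
        self.stochastic = stochastic
        self.rng = random.Random(seed)
        self.ball_x = 0.0

    def reset(self):
        self.ball_x = 0.0
        return [self.ball_x]

    def step(self, action):
        reward, done = 0.0, False
        if action == 1:
            gain = 0.25
            if self.stochastic:
                gain += self.rng.uniform(-0.1, 0.1)  # passes don't always land the same way
            self.ball_x = min(1.0, self.ball_x + gain)
        elif action == 2:
            if self.stochastic:
                scored = self.rng.random() < self.ball_x  # closer shots go in more often
            else:
                scored = self.ball_x >= 0.75  # same situation, same outcome, every time
            if scored:
                reward, done = 1.0, True
            else:
                self.ball_x = 0.0  # toy rule: a missed shot restarts play from halfway
        return [self.ball_x], reward, done, {}


def run_episode(env, policy, max_steps=50):
    """The standard agent-environment loop: observe, act, collect reward, repeat."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


def simple_policy(obs):
    return 2 if obs[0] >= 0.75 else 1  # shoot near goal, otherwise pass forward


print(run_episode(ToyFootballEnv(stochastic=False), simple_policy))         # 1.0
print(run_episode(ToyFootballEnv(stochastic=True, seed=7), simple_policy))  # depends on the seed
```

Fixing the seed makes even the stochastic mode reproducible, which helps when comparing two algorithms on exactly the same sequence of random events.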
One of the main research objectives of this project is reward design. Though members of the open-source community keep trying new ways to define the reward function, these can broadly be classified into two categories. The first is to give rewards only for goals (with own goals attracting a massive penalty) and to take away points, slightly smaller in magnitude, for yellow and red cards. The other approach is to divide the pitch into zones and distribute rewards based on how far up the pitch the team gets while in possession of the ball, with a goal carrying a proportionately larger reward.
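Both reward schemes can be written as small functions over what happened on each step. The StepEvents record, the penalty magnitudes, and the ten-zone split with a 0.1 bonus per zone are hypothetical values picked for illustration, not the project’s published settings.

```python
from dataclasses import dataclass


@dataclass
class StepEvents:
    """Hypothetical summary of what happened on one step (not a real library type)."""
    goal_scored: bool = False
    own_goal: bool = False
    yellow_card: bool = False
    red_card: bool = False
    ball_x: float = 0.0  # 0.0 = halfway line, 1.0 = opponent's goal line, < 0 = own half
    in_possession: bool = False


# Illustrative magnitudes only: own goals hurt a lot, cards less than a goal is worth.
GOAL, OWN_GOAL, YELLOW, RED = 1.0, -2.0, -0.3, -0.6


def scoring_reward(e: StepEvents) -> float:
    """Scheme 1: goals and discipline only."""
    return (GOAL * e.goal_scored + OWN_GOAL * e.own_goal
            + YELLOW * e.yellow_card + RED * e.red_card)


class ZoneReward:
    """Scheme 2: split the opponent's half into n_zones bands. The first time the
    team carries the ball into a band, it earns zone_bonus. A goal pays out any
    bands not yet claimed plus GOAL, so scoring always outweighs merely advancing."""

    def __init__(self, n_zones=10, zone_bonus=0.1):
        self.n_zones, self.zone_bonus = n_zones, zone_bonus
        self.reset()

    def reset(self):
        self.claimed = set()  # call at the start of every episode

    def __call__(self, e: StepEvents) -> float:
        new = set()
        if e.in_possession and e.ball_x > 0.0:
            zone = min(int(e.ball_x * self.n_zones), self.n_zones - 1)
            new = set(range(zone + 1)) - self.claimed  # this band and any skipped behind it
        if e.goal_scored:
            new = set(range(self.n_zones)) - self.claimed
        self.claimed |= new
        return len(new) * self.zone_bonus + (GOAL if e.goal_scored else 0.0)


zones = ZoneReward()
print(zones(StepEvents(ball_x=0.35, in_possession=True)))  # first push into the attacking half
print(zones(StepEvents(goal_scored=True, ball_x=1.0, in_possession=True)))
```

The zone scheme hands out small, frequent rewards, which gives an agent feedback long before it ever scores; the scoring-only scheme is simpler but can leave a beginner agent with almost no signal to learn from.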
These methods continue to be researched and developed with the help of the large open-source community. What makes this especially interesting is that the researchers benchmarked the environment with state-of-the-art learning algorithms: DQN and IMPALA.
One of the main challenges in training a single agent on many tasks at once is scalability. Since older methods like A3C (Asynchronous Advantage Actor-Critic) can require as many as a billion frames and multiple days to master a single task, training them on tens of tasks at once is too slow to be practical. Yet scalability is critical in a dynamic game like football, where there are a lot of parameters to predict and learn from. IMPALA (Importance Weighted Actor-Learner Architecture) helps scale reinforcement learning algorithms massively. It not only uses resources more efficiently on single-machine systems but also scales to thousands of machines without sacrificing data efficiency or resource utilization. As a result, it can train very quickly.
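The “Importance Weighted” in IMPALA refers to how the learner corrects for actors generating data with slightly out-of-date copies of the policy. The IMPALA paper does this with an off-policy correction called V-trace. Below is a plain-Python sketch of its value-target recursion, v_s − V(x_s) = δ_s + γ·c_s·(v_{s+1} − V(x_{s+1})) with δ_s = ρ_s·(r_s + γ·V(x_{s+1}) − V(x_s)), where ρ_s and c_s are importance ratios clipped at rho_bar and c_bar. It is an illustration under simplifying assumptions (one trajectory segment, no episode end inside it), not DeepMind’s implementation.

```python
def vtrace_targets(rewards, values, bootstrap_value, ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets v_s for one trajectory segment of length n.

    rewards[t]       reward received after action a_t
    values[t]        the learner's current estimate V(x_t)
    bootstrap_value  V(x_n), the value of the state after the last step
    ratios[t]        pi(a_t | x_t) / mu(a_t | x_t): learner policy vs. the
                     (possibly stale) actor policy that actually chose a_t
    Assumes the episode does not end inside the segment.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    correction = 0.0  # v_{s+1} - V(x_{s+1}); zero past the end because v_n = V(x_n)
    for s in reversed(range(n)):
        rho = min(rho_bar, ratios[s])  # truncated importance weight on the TD error
        c = min(c_bar, ratios[s])      # truncated trace weight for passing corrections back
        delta = rho * (rewards[s] + gamma * next_values[s] - values[s])
        correction = delta + gamma * c * correction
        targets[s] = values[s] + correction
    return targets


# When actor and learner agree (all ratios 1), this is the ordinary n-step return.
print(vtrace_targets([1.0, 1.0], [0.0, 0.0], 0.0, [1.0, 1.0], gamma=0.5))  # [1.5, 1.0]
```

Clipping the ratios keeps a single very unlikely action from blowing up the update, which is what lets the learner safely reuse data from actors that are a few steps behind.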
In older methods like A3C, each actor (the program that makes the AI perform its next move) communicates with a parameter server, after which the parameters are updated by the learner (the program that updates the parameters based on the outcomes of previous moves). This is slow and hard to scale. IMPALA, on the other hand, takes a more distributed approach. It has multiple actors, on multiple machines, that communicate with a central learner in parallel. The learner then uses all this data from multiple input streams to update the parameters, which the actors pick up for their subsequent moves.
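The actor-learner split can be sketched in a single Python process with threads and a queue: several actor threads read the latest parameters, generate trajectories and push them onto a shared queue, while one learner pulls batches, updates the parameters and records how stale each trajectory was. Real IMPALA spreads actors across many machines and trains a neural network on accelerators; the ParameterStore class, the one-number “policy” and the toy update rule here are hypothetical stand-ins.

```python
import queue
import random
import threading


class ParameterStore:
    """Latest policy parameters: actors read them, only the learner writes them."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version, self.weight = 0, 0.0

    def get(self):
        with self._lock:
            return self.version, self.weight

    def set(self, weight):
        with self._lock:
            self.version += 1
            self.weight = weight


def actor(actor_id, store, trajectories, n_unrolls, unroll_length=5):
    """Grab the current parameters, 'play' a few steps, ship the trajectory, repeat."""
    rng = random.Random(actor_id)
    for _ in range(n_unrolls):
        version, weight = store.get()
        # Toy rollout: rewards simply track the policy weight plus noise.
        rewards = [weight + rng.gauss(0.0, 0.1) for _ in range(unroll_length)]
        trajectories.put({"version": version, "rewards": rewards})


def learner(store, trajectories, n_updates, batch_size, lr=0.1):
    """Consume batches from all actors and update the shared parameters.

    Returns each trajectory's policy lag: how many updates old its parameters
    were when the learner used it.
    """
    lags = []
    for _ in range(n_updates):
        batch = [trajectories.get() for _ in range(batch_size)]
        version, weight = store.get()
        lags += [version - t["version"] for t in batch]
        total = sum(sum(t["rewards"]) for t in batch)
        mean_reward = total / sum(len(t["rewards"]) for t in batch)
        store.set(weight + lr * (1.0 - mean_reward))  # toy update toward a target of 1.0
    return lags


def run(n_actors=4, unrolls_per_actor=8, batch_size=4):
    store, trajectories = ParameterStore(), queue.Queue()
    actors = [threading.Thread(target=actor, args=(i, store, trajectories, unrolls_per_actor))
              for i in range(n_actors)]
    for t in actors:
        t.start()
    n_updates = n_actors * unrolls_per_actor // batch_size  # use every trajectory once
    lags = learner(store, trajectories, n_updates, batch_size)
    for t in actors:
        t.join()
    return store.get(), lags


(final_version, final_weight), lags = run()
print(final_version, max(lags))
```

Because actors never wait for the learner, some trajectories arrive having been generated with parameters that are a few updates old; that lag is exactly what IMPALA’s importance weighting is meant to correct.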
DQN, or Deep Q-Network, is similar to A3C in the sense that it is not a distributed algorithm, but it uses a deep neural network (think of AI loosely modeled after the brain) to learn from high-dimensional sensory inputs like the pixels of the game, the way you and I see it.
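At its core, DQN learns a function Q(s, a) that estimates the future reward of taking action a in state s, and trains it toward the target r + γ · max over a′ of Q_target(s′, a′). Two ingredients from the original DQN work help keep this stable: a replay buffer of past transitions sampled at random, and a separate, periodically updated target network. The sketch below shows those pieces in plain Python, with lookup tables standing in for the neural networks; it uses a squared error for simplicity, whereas DQN implementations commonly use a Huber-style (clipped) loss.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of past transitions. DQN trains on random mini-batches
    drawn from it, which breaks up the correlation between consecutive frames."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # the oldest transitions fall off the end

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def dqn_targets(batch, target_q, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrapping past a terminal step.

    target_q(state) returns a list of action values from the target network.
    """
    return [r if done else r + gamma * max(target_q(s2)) for (_s, _a, r, s2, done) in batch]


def td_loss(batch, online_q, target_q, gamma=0.99):
    """Mean squared error between the online network's Q(s, a) and the target y."""
    ys = dqn_targets(batch, target_q, gamma)
    errors = [online_q(s)[a] - y for (s, a, _r, _s2, _d), y in zip(batch, ys)]
    return sum(e * e for e in errors) / len(errors)


# Lookup tables stand in for the two neural networks in this toy example.
online = {"s0": [1.0, 0.0], "s1": [0.0, 1.0]}
target = {"s1": [0.0, 2.0], "s2": [0.0, 0.0]}
batch = [("s0", 0, 1.0, "s1", False), ("s1", 1, 0.0, "s2", True)]
print(dqn_targets(batch, target.get, gamma=0.5))          # [2.0, 0.0]
print(td_loss(batch, online.get, target.get, gamma=0.5))  # 1.0
```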
Training from pixels is easy to understand but extremely hard to do. DQN does it by training a neural network that looks at what is happening on the screen and estimates how good each possible action is, so that perception and the strategic, gameplay-related decisions are learned together. This way, it can learn directly from raw frames, without hand-crafted features.
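Before pixels reach the network, DQN-style agents usually shrink and simplify them; the original Atari DQN converted frames to grayscale, downscaled them and stacked the last four so the network could see motion. Here is a dependency-free sketch of that preprocessing on tiny frames. The luminance weights, the factor-of-2 downsampling and k = 4 are assumptions for illustration, and the convolutional network that would consume the stack is not shown.

```python
from collections import deque


def to_grayscale(frame):
    """frame: rows of (r, g, b) tuples with values 0-255 -> rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def downsample(gray, factor=2):
    """Keep every factor-th pixel in each direction (a crude nearest-neighbour shrink)."""
    return [row[::factor] for row in gray[::factor]]


class FrameStack:
    """Keep the last k processed frames. A single still image can't show which way
    the ball is moving; a short stack of consecutive frames can."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def _process(self, frame):
        return downsample(to_grayscale(frame))

    def reset(self, frame):
        first = self._process(frame)
        self.frames.clear()
        self.frames.extend([first] * self.k)  # pad with copies of the first frame
        return list(self.frames)

    def push(self, frame):
        self.frames.append(self._process(frame))  # the oldest frame drops out
        return list(self.frames)


black = [[(0, 0, 0)] * 4 for _ in range(4)]
white = [[(255, 255, 255)] * 4 for _ in range(4)]
stack = FrameStack(k=4)
stack.reset(black)
obs = stack.push(white)
print(len(obs), len(obs[0]), len(obs[0][0]))  # 4 2 2
```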
From the graph above, it is clear that in these scenarios, when a large number of training steps is executed, IMPALA performs significantly better than DQN. This conclusion may well extend to other simulations too. More research is being done on how the kind of parallelism IMPALA uses can be combined with a neural-network-based method like DQN. That is definitely something to watch out for!
After plodding through all this abstraction, you might be thinking: why? Well, football creates a dynamic environment that is quite challenging for RL agents. The researchers hope that findings from this study, along with contributions from the open-source community, will help us better understand the black box of reinforcement learning. This could ultimately help us design better algorithms for robots, self-driving cars, and more. In other words, you could play football on your computer at home and help researchers develop RL algorithms for self-driving cars!
If you’ve read this far, I hope you’ve come away with quite a few takeaways. I would love to discuss the topic further, and I’ll be thrilled to hear your feedback too. Shoot me an email at firstname.lastname@example.org. Connect with me on LinkedIn.