Can You Train AI Agents in Unity Without Python?
Early experiments with in-app on-device machine learning training
I’ve been using Unity’s MLAgents (code) to train various behaviours for robots and games using reinforcement learning. Reinforcement learning is a type of machine learning that teaches an agent how to act in an environment by giving it rewards or penalties for its actions. The agent learns from its own experience and tries to find the actions that maximize its total reward over time.
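The reward-driven loop described above can be sketched in a few lines. This is a deliberately tiny, stateless example (a three-option "bandit"), not the ball balancing task, and the reward values, `epsilon` and the function name are made up for illustration. It is written in Python only for compactness; it uses nothing beyond the standard library, and the arithmetic ports directly to C#.

```python
import random

def run_bandit(rewards, episodes=500, epsilon=0.1, seed=0):
    """Learn which action pays best purely from trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # the agent's current guess per action
    counts = [0] * len(rewards)       # how often each action was tried
    total = 0.0
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # the environment hands back a reward or penalty
        counts[action] += 1
        # Running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return estimates, total

# Hypothetical rewards: action 0 is penalised, action 2 is the best choice.
estimates, total = run_bandit([-1.0, 0.5, 1.0])
best_action = max(range(3), key=lambda a: estimates[a])
print(best_action, total)
```

The agent is never told which action is best; it only sees the rewards it receives, which is the same principle the ball balancing agent relies on at a much larger scale.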
I’ve wanted to try to create a simulation that could run within an app, one that doesn’t require the gigabytes and non-trivial installation of the machine learning Python stack. Unity’s MLAgents is a really good solution for training; it’s well thought through and well implemented. It’s not something that you can embed in a runtime, though.
Is it possible to run machine learning models in your app, or even inside a framework like Unity3d without the standard Python technology stack?
My first port of call was thumbing through some GitHub repositories, looking for a small, simple library. I considered adapting something like one of James McCaffrey’s C# algorithms. I like the idea of using such a simple bit of code, but it didn’t quite fit the bill.
I found this reworking of some standard reinforcement learning algorithms by EpicSpacesWorks: github.com/EpicSpaces/Reinforcement-Learning-c-sharp-Unity-ppo-ddpg-dqn
It’s a C# implementation of reinforcement learning using PPO, DDPG and DQN (each is described briefly at the end of this post). It’s really janky! After a bit of refinement I got it working; you can access my fork here: https://github.com/makeplayhappy/Unity-CSharp-ML
I’ve attempted to optimise some of the Unity-specific sections. The ball starts to balance after around 150 episodes. I’ve changed the rewards to mimic those in the Unity ML Agents Ball3D example.
Here’s a view of the demo running a classic ball balancing environment. While it runs you’ll experience some large performance spikes; in the video you can see them periodically hammering the profiler! The main code is far from optimal: the spikes are the learning algorithm running its matrix operations, and the matrices aren’t stored in an efficient way at all.
As a proof of concept it was an interesting project: it is certainly possible to run machine learning models in Unity without the standard Python technology stack.
There are some serious performance issues with that code though, and it couldn’t be used for what I’m hoping to do, which is to run the simulations at runtime, ideally on mobile. I will continue to progress this and see if I can put together some more appropriate, performant code.
Let me know if you do take the code out for a spin. If you’ve got better suggestions for runtime learning, feel free to share them in the comments section below.
PPO (Proximal Policy Optimization) is a way of teaching an agent what to do by scoring the actions it takes. The agent tries to improve its score by changing its behaviour slightly, but not too much: each update is limited so that the new policy stays close to the old one. PPO works well for problems that have many possible actions and are hard to solve.
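The "slightly, but not too much" idea is usually expressed as PPO's clipped objective. The sketch below shows that one calculation for a single action, assuming a clip range of 0.2 (a commonly used default); the function and parameter names are illustrative and not taken from the C# repository. It is in Python for brevity only.

```python
def ppo_clipped_objective(new_prob, old_prob, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one sampled action.

    new_prob / old_prob: probability of the action under the new and old policy.
    advantage: how much better than expected the action turned out.
    """
    ratio = new_prob / old_prob
    # Keep the ratio inside [1 - eps, 1 + eps] so one update cannot move too far.
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the more pessimistic of the two estimates.
    return min(ratio * advantage, clipped_ratio * advantage)

# A good action whose probability jumped a lot: the gain is capped at 1.2.
capped = ppo_clipped_objective(new_prob=0.9, old_prob=0.5, advantage=1.0)
# A small change is left untouched.
small = ppo_clipped_objective(new_prob=0.55, old_prob=0.5, advantage=1.0)
print(capped, small)
```

Because the gain is capped once the ratio leaves the clip range, there is no incentive for a single update to push the policy far from the one that collected the data.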
DDPG (Deep Deterministic Policy Gradient) is a way of teaching an agent what to do by having two helpers: one that tells the agent the best action for each situation (the actor), and another that tells the agent how good each action is (the critic). The agent learns from both helpers and tries to find the best actions. DDPG works well for problems that have many possible actions and are hard to solve, but it needs a lot of fine-tuning.
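A toy sketch of how the two helpers interact: the first helper (the actor) nudges its single weight in the direction the second helper (the critic) says will improve the action. To keep it short, the critic here is a fixed, hand-written function that prefers the action `2 * state`; in real DDPG the critic is itself a neural network learned from experience, and both helpers also have slowly updated target copies. All names and numbers are assumptions for illustration, in Python for brevity.

```python
def critic(state, action):
    # Hypothetical fixed critic: the best action for a state is 2 * state.
    return -(action - 2.0 * state) ** 2

def critic_grad_wrt_action(state, action):
    # Derivative of the critic above with respect to the action.
    return -2.0 * (action - 2.0 * state)

def train_actor(states, steps=200, lr=0.1):
    weight = 0.0  # the actor is just: action = weight * state
    for step in range(steps):
        state = states[step % len(states)]
        action = weight * state
        # Deterministic policy gradient: dQ/da * da/dweight (da/dweight = state).
        weight += lr * critic_grad_wrt_action(state, action) * state
    return weight

learned_weight = train_actor([1.0, -0.5, 0.5, -1.0])
print(learned_weight)
```

The actor never sees a reward directly in this sketch; it only follows the critic's opinion, which is why the quality of the critic matters so much in practice.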
DQN (Deep Q-Network) is a way of teaching an agent what to do by giving it a score for each action it takes. The agent remembers its past actions and scores, and uses them to learn how to get higher scores in the future. DQN works well for problems that have a few possible actions, but it may overestimate how good some actions are.
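The "remembering" part is usually a replay buffer, and the learning signal is a target score built from the reward plus the best predicted score of the next situation. The sketch below shows both pieces without any neural network; the class and function names, the capacity and the discount `gamma` are illustrative assumptions, written in Python for brevity.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of past experience; the oldest entries drop out first."""
    def __init__(self, capacity, seed=0):
        self.items = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.items.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks up the order in which experience was collected.
        return self.rng.sample(list(self.items), batch_size)

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Target score for the action taken: reward now plus discounted best score next."""
    if done:
        return reward  # the episode ended, so there is no future score to add
    # Taking the max over noisy estimates is what can lead to overestimation.
    return reward + gamma * max(next_q_values)

buffer = ReplayBuffer(capacity=2)
buffer.add(0, 1, 1.0, 1, False)
buffer.add(1, 0, 0.0, 2, False)
buffer.add(2, 1, -1.0, 3, True)  # capacity is 2, so the first entry is evicted

target = dqn_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.9)
print(len(buffer.items), target)
```

Since the target takes one `max` per action in a list of scores, this approach fits problems with a small, discrete set of actions.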