import { h } from "preact";
import Img01 from "../img/extra/switchbrain/sheet-clmb.jpg";
import Img02 from "../img/extra/switchbrain/tensorboard01.png";
import Img03 from "../img/extra/switchbrain/airtable.png";

// Tell Babel to transform JSX into h() calls:
/** @jsx h */

function InfoSwitchBrain() {
  return (
    <div>
      <p>
        <em>Note: this video is only a temporary preview of the game.</em>
      </p>
      <p>
        For this project, I combined two domains that I'm interested in: machine
        learning and video games. I asked myself:{" "}
        <strong>
          what could gameplay look like if it were entirely based on the new
          possibilities offered by machine learning?
        </strong>
      </p>
      <p>
        My answer is SwitchBrain, a game where instead of directly controlling
        the movements of the agents, the player controls which "brain" is used
        to decide where to move. By getting to know each brain more intimately
        and having in-game access to information like comments from the
        developer who trained the model, the player can gain both an
        intellectual and an intuitive understanding of machine learning and its
        limitations.
      </p>
      <p>
        I learned a lot of machine learning concepts myself during this
        project. While I didn't get involved with the mathematics of neural
        networks, I became familiar with Reinforcement Learning and related
        notions like local minima, reward hacking, overfitting, entropy,
        Proximal Policy Optimization and Curriculum Learning. TensorBoard graphs
        like this one became valuable and easy to read:
      </p>
      <img alt="SwitchBrain TensorBoard example" src={Img02} />
      <p>
        The current limitations of Unity3D ML-Agents v0.6 require training to
        take place before the game is compiled. Therefore, all the ML models
        must be trained beforehand and shipped to the user, which limits the
        possible interactions between the user and machine learning itself.
        <br />
        Note: there is one exception, Imitation Learning. With Imitation
        Learning, a model can, for example, quickly learn to "pilot" a car from
        only a few minutes of user gameplay. However, it seems to work only in
        very basic cases (at the time of ML-Agents beta v0.5 at least).
        For this reason, I focused on more typical Reinforcement Learning
        setups.
      </p>
      <p>
        While training the different models, I was often frustrated by the
        agents, but also regularly surprised by interesting or unexpected
        behaviors. For example, agents rewarded for reaching the target and
        punished for falling would often end up deciding that the best solution
        was to stay perfectly still to avoid falling! Careful adjustments were
        always needed, and I discovered that trying to compensate for unwanted
        behaviors was rarely a good idea. What usually worked best was to have
        one or two carefully distributed rewards, very basic tasks at the
        beginning and levels as randomly generated as possible.
      </p>
      <p>
        An unexpected challenge was keeping a complete track record of each
        training process, because each one produced many wildly different
        types of data (including the trained model weights, the raw
        TensorBoard data, the TensorBoard graphs, the code that produced the
        observations and rewards, the Unity3D setup, the curriculum learning
        configuration and the actual resulting behavior of the agent). I
        ended up using Airtable to gather this information for each
        training session:
      </p>
      <ul>
        <li>A description of the goal of the training</li>
        <li>A description of the observed agent behavior</li>
        <li>A screenshot of the TensorBoard data</li>
        <li>A screencast of the agent behavior</li>
        <li>Some training parameters</li>
        <li>Some meta-information about the resulting model</li>
        <li>
          The hash of the Git commit in which I checked in the code, model
          weights and training data
        </li>
      </ul>
      <img alt="SwitchBrain Airtable screenshot" src={Img03} />
      <p>
        In an attempt to demystify machine learning and give the player
        information about the brains they collect, some of this meta-information
        is reused in the game, establishing a direct channel of communication
        from the developer to the player.
      </p>
      <img alt="SwitchBrain brain CLMB_v0.2 data" src={Img01} />
      <p>
        The sheer amount of data produced per training session, combined with
        the 10 to 90 minutes needed to train each model, made the development of
        the game slower than I had hoped, but I was still able to experiment
        with a lot of setups, train models for different behaviors (hunt target,
        explore, climb and dodge enemies) and add ML-controlled enemy bots to
        the game.
      </p>
      <p>
        My first experiments also included a completely ML-generated mode
        of locomotion (like Puppo The Corgi:{" "}
        <a href="https://www.youtube.com/watch?v=shWRx2N_9jU">
          www.youtube.com/watch?v=shWRx2N_9jU
        </a>
        ) but that made training iterations too slow. Instead, I used a simple
        ball to keep the physics as simple as possible for the ML model to
        learn.
      </p>
      <p>Tech stack: Unity3D, ML-Agents plugin for Unity (version beta 0.6)</p>
      <p>
        ECAL/Nathan Vogel
        <br />
        Supervised by Alain Bellet
      </p>
    </div>
  );
}

export default InfoSwitchBrain;
