• nmca 3 hours ago

    A wonderful treatise on the same topic, “Reinforcement Learning, Bit by Bit”, for anyone looking for a more advanced treatment of explore/exploit.

    https://arxiv.org/abs/2103.04047

    • matheist 3 hours ago

      See also Thompson sampling[+] for a different approach to multi-armed bandits, one that doesn't depend on explicitly distinguishing between exploration and exploitation.

      [+] https://en.wikipedia.org/wiki/Thompson_sampling
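
      To make that concrete, here is a minimal sketch of Thompson sampling, assuming Bernoulli (0/1) rewards and a Beta(1, 1) prior on each arm. The function name, the simulated arm probabilities, and the seed are all made up for illustration. Note that there is no separate "explore" branch: each round just samples a plausible success rate per arm from its posterior and plays the best-looking one.

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior per arm, stored as pseudo-counts of successes and failures.
    successes = [1] * n_arms
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its current posterior.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # Play the arm with the highest sampled rate. Uncertain arms sometimes
        # produce high samples, which is where the exploration comes from.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated environment: true_probs is hidden from the decision rule.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical three-armed bandit; the last arm is the best one.
pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

      As the posteriors sharpen, the samples for the weaker arms rarely come out on top, so pulls should concentrate on the best arm without any explicit exploration schedule.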