For anyone looking for a more advanced treatment of explore/exploit, “Reinforcement Learning Bit by Bit” is a wonderful treatise on the same topic.
See also Thompson sampling[+] for a different approach to multi-armed bandits, one that doesn't rely on explicitly distinguishing between exploration and exploitation.
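To make that concrete, here is a minimal Beta-Bernoulli Thompson sampling sketch on a simulated bandit. The function name, arm probabilities, and round count are illustrative choices, not from either reference; the key point is that there is no explicit explore/exploit branch anywhere in the loop.

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated Bernoulli bandit.

    true_probs: hypothetical per-arm reward probabilities (illustration only).
    Returns the number of times each arm was pulled after n_rounds.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) (uniform) prior on each arm's unknown reward probability.
    successes = [1] * n_arms
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Sample one plausible reward rate per arm from its posterior,
        # then play the arm whose sample came out highest. Exploration
        # emerges on its own: arms we are uncertain about occasionally
        # draw high samples and get tried again.
        samples = [rng.betavariate(successes[i], failures[i])
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = rng.random() < true_probs[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Run on a three-armed bandit, the posterior sampling concentrates pulls on the best arm without any tuned exploration schedule, e.g. `thompson_sampling([0.2, 0.5, 0.8], 2000)`.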