Across many disciplines in science and engineering, practitioners must make sequential decisions while the outcome at each stage is uncertain. Setting aside computational complexity, such problems can be solved using dynamic programming techniques when the sufficient statistics of the system are known. However, when complete information about the statistical properties of the underlying system is unavailable, as is often the case in practice, one has to resort to real-time learning and control. For instance, in the medical field, doctors face the challenge of selecting the optimal sequence of treatment plans or drug administrations for a particular patient. This challenge becomes even more critical in personalized medicine, where a specific treatment plan must be tailored to a patient based on limited historical precedent and data. Similar problems arise in modern farming. Advances in technology unlock an increasing number of ways to tailor farming practices to the characteristics of a particular farm. For example, data collected from various sensors can guide the customization of fertilizers or other upgrades and help evaluate their effects. However, the complexity of customization, that is, of selecting the optimal sequence of decisions, multiplies as the number of factors considered increases. At the conceptual level, our goal is to design algorithms that learn approximately optimal strategies for many agents or entities through a sequence of online experiments, that is, through the selective choice of control actions among the available levers.