A Thompson Sampling algorithm using a Beta probability distribution was introduced in a previous post. The Beta distribution is well-suited for binary multi-armed bandits (MABs), where arm rewards are restricted to values of 0 or 1.
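As a quick refresher, the core of that Beta variant fits in a few lines. This is a minimal sketch assuming per-arm success and failure counts with a uniform Beta(1, 1) prior; the function name is illustrative, not the code from the earlier post:

```python
import numpy as np

def beta_ts_arm(successes: np.ndarray, failures: np.ndarray,
                rng: np.random.Generator) -> int:
    # Sample a plausible win rate for each arm from its Beta posterior,
    # then play the arm with the highest sampled win rate.
    samples = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(samples))
```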
In this article, we introduce an alternative MAB sampling algorithm designed for the more general case where arm rewards are continuous: Thompson Sampling with a Gaussian Distribution (TSG).
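Before diving in, here is a rough sketch of what a TSG selector can look like. The class name `GaussianThompsonSelector`, the `select_arm`/`update` interface, the unit-variance noise model, and the standard normal prior are all assumptions made for illustration, not necessarily the implementation discussed below:

```python
import numpy as np

class GaussianThompsonSelector:
    """Thompson Sampling with a Gaussian posterior over each arm's mean.

    Assumes unit-variance reward noise and a standard normal prior, so
    the posterior for arm i is N(sum_i / (n_i + 1), 1 / (n_i + 1)).
    """

    def __init__(self, n_arms, seed=None):
        self.counts = np.zeros(n_arms)       # pulls per arm
        self.reward_sums = np.zeros(n_arms)  # cumulative reward per arm
        self.rng = np.random.default_rng(seed)

    def select_arm(self):
        # Draw one candidate mean per arm from its posterior and
        # play the arm whose sample is highest.
        post_means = self.reward_sums / (self.counts + 1.0)
        post_stds = 1.0 / np.sqrt(self.counts + 1.0)
        return int(np.argmax(self.rng.normal(post_means, post_stds)))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.reward_sums[arm] += reward
```

Because the posterior standard deviation shrinks as an arm accumulates pulls, exploration fades naturally over time without any explicit schedule.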
We have previously explored two other MAB strategies: Maximum Average Reward (MAR) and Upper Confidence Bound (UCB). Both rely on the observed average reward to determine which arm to pull next, scoring arms deterministically rather than by sampling.
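To make "deterministic scoring" concrete, the MAR rule can be sketched as follows; the function and argument names are illustrative assumptions, not the framework's API:

```python
import numpy as np

def max_average_reward_arm(counts: np.ndarray, reward_sums: np.ndarray) -> int:
    # MAR scoring is deterministic: given the same counts and sums,
    # it always picks the arm with the highest observed average reward.
    averages = reward_sums / np.maximum(counts, 1)  # avoid division by zero
    return int(np.argmax(averages))
```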
In this article, I explore the balance between exploration and exploitation, a key concept in reinforcement learning and optimization problems, using the multi-armed bandit problem as a running example. I then show how the epsilon-greedy strategy manages this trade-off.
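As a preview, the entire epsilon-greedy decision rule fits in a few lines; this is a hypothetical sketch with names chosen for illustration:

```python
import numpy as np

def epsilon_greedy_arm(counts: np.ndarray, reward_sums: np.ndarray,
                       epsilon: float, rng: np.random.Generator) -> int:
    # Explore: with probability epsilon, pick an arm uniformly at random.
    if rng.random() < epsilon:
        return int(rng.integers(len(counts)))
    # Exploit: otherwise, pick the arm with the best observed average.
    averages = reward_sums / np.maximum(counts, 1)
    return int(np.argmax(averages))
```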
In a previous article, I introduced the design and implementation of a MAB framework, built to simplify the implementation of new MAB strategies and provide a structured approach to analyzing them.
Three strategies have already been integrated into the framework: RandomSelector, MaxAverageRewardSelector, and UpperConfidenceBoundSelector. The goal of this article is to compare these three strategies.
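A comparison of this kind can be sketched as a shared loop over all selectors. The `select_arm()`/`update()` interface and the Gaussian reward model below are assumptions about the framework, not its documented contract:

```python
import numpy as np

def run_comparison(selectors: dict, true_means, n_rounds: int, seed: int = 0):
    # Run each selector on the same Gaussian bandit and report total reward.
    results = {}
    for name, selector in selectors.items():
        rng = np.random.default_rng(seed)  # same seed => comparable runs
        total = 0.0
        for _ in range(n_rounds):
            arm = selector.select_arm()
            reward = rng.normal(true_means[arm], 1.0)  # noisy arm reward
            selector.update(arm, reward)
            total += reward
        results[name] = total
    return results
```

Reusing the same seed across selectors keeps the environment identical between runs, so differences in total reward reflect the strategies rather than the noise.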
This article explores the implementation of the Upper Confidence Bound (UCB) algorithm. Reinforcement learning, a subfield of artificial intelligence, involves an agent interacting with an environment over a series of episodes, or rounds. In each round, the agent makes a decision that may yield a reward; its ultimate objective is to learn a strategy that maximizes cumulative reward over time.
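For reference, the classic UCB1 score has the form below; whether the article's implementation uses exactly this exploration constant is an assumption:

```python
import math

def ucb1_score(avg_reward: float, total_rounds: int, arm_pulls: int) -> float:
    # UCB1: optimistic estimate = observed average + exploration bonus.
    # The bonus grows slowly with total rounds and shrinks as the arm
    # accumulates pulls, so under-explored arms get revisited.
    if arm_pulls == 0:
        return math.inf  # force every arm to be tried at least once
    return avg_reward + math.sqrt(2.0 * math.log(total_rounds) / arm_pulls)
```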