Reinforcement learning

By Skander, 8 December, 2024

Thompson Sampling With Gaussian Distribution - A Stochastic Multi-armed Bandit

A Thompson Sampling algorithm using a Beta probability distribution was introduced in a previous post. The Beta distribution is well-suited for binary multi-armed bandits (MABs), where arm rewards are restricted to values of 0 or 1. In this article, we introduce an alternative MAB sampling algorithm designed for the more general case where arm rewards are continuous: Thompson Sampling with a Gaussian Distribution (TSG).
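The idea can be sketched in a few lines: keep a Gaussian posterior over each arm's mean reward, sample one plausible mean per arm each round, and pull the argmax. This is a minimal illustration, not the post's implementation; the three arm means, the noise level, and the simple 1/√(n+1) posterior width are hypothetical choices assuming unit observation variance.

```python
import random

random.seed(0)

# Hypothetical 3-armed bandit with Gaussian rewards of unknown mean.
true_means = [0.2, 0.5, 0.8]
n_arms = len(true_means)

counts = [0] * n_arms      # pulls per arm
sums = [0.0] * n_arms      # cumulative reward per arm

for t in range(3000):
    # Sample a plausible mean for each arm from its Gaussian posterior.
    # The posterior standard deviation shrinks as 1 / sqrt(n + 1), so
    # well-explored arms produce tightly concentrated samples.
    samples = [
        random.gauss(sums[i] / counts[i] if counts[i] else 0.0,
                     1.0 / (counts[i] + 1) ** 0.5)
        for i in range(n_arms)
    ]
    arm = max(range(n_arms), key=lambda i: samples[i])

    reward = random.gauss(true_means[arm], 0.1)
    counts[arm] += 1
    sums[arm] += reward
```

Because sampling replaces a deterministic score, exploration happens automatically: an under-explored arm's wide posterior occasionally produces the winning sample.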
By Skander, 29 November, 2024

Stochastic Multi-armed Bandit - Thompson Sampling With Beta Distribution


We have previously explored two multi-armed bandit (MAB) strategies: Maximum Average Reward (MAR) and Upper Confidence Bound (UCB). Both approaches rely on the observed average reward to determine which arm to pull next, using a deterministic scoring mechanism for decision-making.
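Thompson Sampling with a Beta posterior, the stochastic alternative this post introduces, can be sketched as follows for Bernoulli arms. This is an illustrative toy, not the post's code; the payout probabilities and round count are hypothetical.

```python
import random

random.seed(42)

# Hypothetical Bernoulli bandit: arm i pays 1 with probability probs[i].
probs = [0.3, 0.55, 0.7]
alpha = [1] * len(probs)   # Beta posterior: 1 + observed successes
beta = [1] * len(probs)    # Beta posterior: 1 + observed failures

for t in range(5000):
    # Draw one sample per arm from its Beta posterior and play the argmax:
    # uncertain arms still win occasionally (exploration), while arms with
    # a high observed success rate win most often (exploitation).
    samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    arm = max(range(len(probs)), key=lambda i: samples[i])

    reward = 1 if random.random() < probs[arm] else 0
    alpha[arm] += reward
    beta[arm] += 1 - reward

pulls = [a + b - 2 for a, b in zip(alpha, beta)]
```

Unlike MAR or UCB, no explicit confidence bonus is needed: the randomness of the posterior draw supplies the exploration.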

By Skander, 15 November, 2024

The Exploration-Exploitation Balance: The Epsilon-Greedy Approach in Multi-Armed Bandits


In this article, I will explore the balance between exploration and exploitation, a key concept in reinforcement learning and optimization problems. To illustrate this, I will use the multi-armed bandit problem as an example. I will also explain how the epsilon-greedy strategy effectively manages this balance.
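The strategy is simple enough to sketch directly: with probability epsilon pick a random arm (explore), otherwise pick the arm with the best running average (exploit). A minimal sketch, assuming hypothetical Bernoulli payout rates and an epsilon of 0.1:

```python
import random

random.seed(1)

EPSILON = 0.1
probs = [0.2, 0.5, 0.8]        # hypothetical Bernoulli payout rates
counts = [0] * len(probs)      # pulls per arm
values = [0.0] * len(probs)    # running average reward per arm

for t in range(10000):
    if random.random() < EPSILON:
        arm = random.randrange(len(probs))                     # explore
    else:
        arm = max(range(len(probs)), key=lambda i: values[i])  # exploit

    reward = 1 if random.random() < probs[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
```

Epsilon directly sets the trade-off: a larger value keeps sampling suboptimal arms, a smaller one risks locking onto an early lucky arm.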

By Skander, 12 November, 2024

Comparison of Three Multi-armed Bandit Strategies

In a previous article, I introduced the design and implementation of a multi-armed bandit (MAB) framework. This framework was built to simplify the implementation of new MAB strategies and provide a structured approach for their analysis. Three strategies have already been integrated into the framework: RandomSelector, MaxAverageRewardSelector, and UpperConfidenceBoundSelector. The goal of this article is to compare these three strategies.
By Skander, 1 November, 2024

A Python Implementation of The Upper Confidence Bound Reinforcement Learning Algorithm


This article explores the implementation of a reinforcement learning algorithm called the Upper Confidence Bound (UCB) algorithm. Reinforcement learning, a branch of machine learning, involves an agent interacting with an environment over a series of episodes or rounds. In each round, the agent makes a decision that may yield a reward. The agent's objective is to learn a strategy that maximizes its cumulative reward over time.
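In the UCB1 variant, each arm is scored by its average reward plus a confidence bonus that grows with total rounds and shrinks with the arm's pull count, so rarely tried arms stay competitive. A minimal sketch (not the article's implementation; the payout rates are hypothetical):

```python
import math
import random

random.seed(7)

probs = [0.25, 0.5, 0.75]    # hypothetical Bernoulli payout rates
n_arms = len(probs)
counts = [0] * n_arms        # pulls per arm
values = [0.0] * n_arms      # running average reward per arm

for t in range(1, 10001):
    if 0 in counts:
        arm = counts.index(0)    # play each arm once to initialize
    else:
        # UCB1 score: average reward plus sqrt(2 ln t / n_i),
        # a bonus that decays as arm i accumulates pulls.
        arm = max(range(n_arms),
                  key=lambda i: values[i]
                  + math.sqrt(2 * math.log(t) / counts[i]))

    reward = 1 if random.random() < probs[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]
```

The score is fully deterministic given the history, which is the property the later Thompson Sampling posts contrast against.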
