Reinforcement learning

By Skander, 8 December, 2024

Thompson Sampling With Gaussian Distribution - A Stochastic Multi-armed Bandit

A Thompson Sampling algorithm using a Beta probability distribution was introduced in a previous post. The Beta distribution is well-suited for binary multi-armed bandits (MABs), where arm rewards are restricted to values of 0 or 1. In this article, we introduce an alternative MAB sampling algorithm designed for the more general case where arm rewards are continuous: Thompson Sampling with a Gaussian Distribution (TSG).
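The idea can be sketched as follows: maintain a running mean and pull count per arm, and sample each arm's posterior as a Gaussian whose spread shrinks as the arm is played more. This is a minimal sketch, not the article's implementation; the function name, the known-variance assumption (`sigma`), and the wide prior for unplayed arms are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_arm_tsg(means, counts, sigma=1.0):
    """Thompson Sampling with Gaussian posteriors (known reward variance).

    For each arm, draw a sample from N(mean, sigma^2 / n) and play the
    arm with the largest draw; unplayed arms get a wide prior draw so
    they are explored early.
    """
    samples = [
        rng.normal(m, sigma / np.sqrt(n)) if n > 0 else rng.normal(0.0, 10.0 * sigma)
        for m, n in zip(means, counts)
    ]
    return int(np.argmax(samples))
```

Because the selection is a random draw rather than a deterministic score, two calls with the same statistics can pick different arms, which is what gives Thompson Sampling its built-in exploration.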
By Skander, 29 November, 2024

Stochastic Multi-armed Bandit - Thompson Sampling With Beta Distribution

We have previously explored two multi-armed bandit (MAB) strategies: Maximum Average Reward (MAR) and Upper Confidence Bound (UCB). Both approaches rely on the observed average reward to determine which arm to pull next, using a deterministic scoring mechanism for decision-making.
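In contrast to those deterministic scores, Beta Thompson Sampling draws a random sample from each arm's Beta posterior over its success probability and plays the arm with the largest draw. A minimal sketch of that idea, assuming a uniform Beta(1, 1) prior and 0/1 rewards (the function name is illustrative, not the article's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def select_arm_beta(successes, failures):
    """Sample each arm's Beta(successes + 1, failures + 1) posterior
    and return the index of the arm with the largest draw."""
    samples = rng.beta(np.asarray(successes) + 1, np.asarray(failures) + 1)
    return int(np.argmax(samples))
```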

By Skander, 15 November, 2024

The Exploration-Exploitation Balance: The Epsilon-Greedy Approach in Multi-Armed Bandits

In this article, I will explore the balance between exploration and exploitation, a key concept in reinforcement learning and optimization problems. To illustrate this, I will use the multi-armed bandit problem as an example. I will also explain how the epsilon-greedy strategy effectively manages this balance.
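The epsilon-greedy rule itself fits in a few lines: with probability epsilon, explore a uniformly random arm; otherwise, exploit the arm with the best observed average reward. A minimal sketch (function name and default epsilon are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

def select_arm_eps_greedy(avg_rewards, epsilon=0.1):
    """Explore a random arm with probability epsilon,
    otherwise exploit the arm with the highest average reward."""
    if rng.random() < epsilon:
        return int(rng.integers(len(avg_rewards)))
    return int(np.argmax(avg_rewards))
```

Tuning epsilon trades off the two behaviors: epsilon = 0 is pure exploitation, epsilon = 1 is pure random exploration.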

By Skander, 12 November, 2024

Comparison of Three Multi-armed Bandit Strategies

In a previous article, I introduced the design and implementation of a multi-armed bandit (MAB) framework. This framework was built to simplify the implementation of new MAB strategies and provide a structured approach for their analysis. Three strategies have already been integrated into the framework: RandomSelector, MaxAverageRewardSelector, and UpperConfidenceBoundSelector. The goal of this article is to compare these three strategies.
By Skander, 1 November, 2024

A Python Implementation of The Upper Confidence Bound Reinforcement Learning Algorithm

This article explores the implementation of a reinforcement learning algorithm called the Upper Confidence Bound (UCB) algorithm. Reinforcement learning, a subset of artificial intelligence, involves an agent interacting with an environment through a series of episodes or rounds. In each round, the agent makes a decision that may yield a reward. The agent's ultimate objective is to learn a strategy that maximizes its cumulative reward over time.
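The core of UCB1 can be sketched briefly: each arm's score is its average reward plus a confidence radius that grows with the round number and shrinks with the arm's pull count, and the agent plays the arm with the highest score. This sketch is illustrative, not the article's implementation; the exploration constant `c` and the play-each-arm-once warm-up are common conventions.

```python
import numpy as np

def ucb_scores(avg_rewards, counts, t, c=2.0):
    """UCB1 score per arm: average reward plus sqrt(c * ln(t) / n)."""
    counts = np.asarray(counts, dtype=float)
    return np.asarray(avg_rewards) + np.sqrt(c * np.log(t) / counts)

def select_arm_ucb(avg_rewards, counts, t):
    """Play any arm not yet tried; otherwise pick the highest UCB1 score."""
    counts = np.asarray(counts)
    if (counts == 0).any():
        return int(np.argmax(counts == 0))  # first untried arm
    return int(np.argmax(ucb_scores(avg_rewards, counts, t)))
```

The confidence radius is what makes UCB "optimistic in the face of uncertainty": rarely played arms get a bonus that keeps them in contention until enough evidence accumulates.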



Skander Kort