Analyzing the Upper Confidence Bound Algorithm

By Skander, 3 November, 2024

Number of impressions for each ad over time.

This article focuses on evaluating the implementation of the Upper Confidence Bound (UCB) algorithm discussed herein. The evaluation is conducted using a single dataset provided by Super Data Science.

The Dataset

The dataset is a matrix with $10,000$ rows and $10$ columns. Each row represents a request made to an online ad server to display an advertisement. The ad server has a catalog of $10$ ads: ${Ad_1, \ldots, Ad_{10}}$. The value in cell $(i, j)$ is 1 if and only if ad $Ad_j$ would be clicked when displayed to a user visiting the website; otherwise, the value is 0. Importantly, this information is unknown to the ad server at the time of ad selection.

The ad server aims to select an ad that is most likely to be clicked, using historical data to make informed decisions. This approach seeks to maximize revenue for publishers (website owners) while optimizing reach for advertisers.

The problem can be modeled as a multi-armed bandit game, where:

Each advertisement represents an arm of the bandit, yielding a reward of 1 if the corresponding ad is clicked.
Each ad request event corresponds to a round of the game.
The ad server acts as the player in this game, tasked with selecting the optimal arm to maximize rewards.

Initializing and Running UCB

Let’s now set up the UCB algorithm and put it to the test by playing the multi-armed bandit game using the dataset described earlier. To begin, we’ll import all the necessary Python libraries required for this analysis, laying the groundwork for our implementation.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
import pandas as pd
import math
from ucb import UpperConfidenceBound

Loading the dataset is made easy, thanks to Pandas library:

dataset = pd.read_csv('ad_server.csv')

Finally, we create an instance of the UpperConfidenceBound class and initiate the game, or reinforcement learning process, by calling its learn() method.

n_rounds = 10000
training_set = dataset[:n_rounds + 1]
ucb = UpperConfidenceBound(training_set.values)
ucb.learn()

Total Reward

The objective of the ad server is to maximize revenue, which translates to maximizing the total reward by the end of the multi-armed bandit game. Upon examining the dataset, we notice rows where all columns contain zeroes (like rows $2$ and $4$ in the figure above). This means that for these specific ad request events, regardless of which ad is served, no clicks will occur from the ad impressions.

So, first, let's calculate the optimum reward one hopes to reach. This is equivalent to counting the rows where all columns are zeroes.

rows_with_all_zeros = dataset[(dataset == 0).all(axis=1)].shape[0]
print(f"Number of rows with all zeros: {rows_with_all_zeros}")
optimum_reward = n_rounds - rows_with_all_zeros
print(f"The optmum reward is {optimum_reward}")

In this dataset, the optimum reward is $7424$. We define the efficiency of a multi-armed bandit strategy as: $$efficiency = \frac{total\_reward(strategy)}{optimum\_reward}$$

The code snippet below, calculates the total reward and the efficiency of UCB:

total_reward = ucb.total_reward()
efficiency = round(total_reward * 100.0 / optimum_reward)

The UCB algorithm achieved a total reward of $2211$, which is only $30\%$ of the optimal reward. This indicates significant potential for improvement.

Most Served Ad

In this section, I will demonstrate how often each advertisement was served. Recall the array UpperConfidenceBound.selected_candidates(). To visualize this, we can simply plot the array as a histogram using the code provided below.

freqs, bins, patches = plt.hist(ucb.selected_candidates())
print(f"Frequency of each ad: {','.join(map(lambda fr: str(int(fr)), freqs))}")
plt.xlabel('Ad index')
plt.ylabel('Impressions')
plt.title('Number of times an ad is displayed')
norm_freqs = freqs / freqs.max() 
norm = colors.Normalize(norm_freqs.min(), norm_freqs.max())
print(norm)
for thisfrac, thispatch in zip(norm_freqs, patches):
    color = plt.cm.cividis(norm(thisfrac))
    thispatch.set_facecolor(color)
plt.show()

This produces the histogram shown below.

The diagram reveals that $Ad_5$ was the most frequently displayed advertisement, with $Ad_8$ being the second most served.

Add new comment