Thompson Sampling: A Data-Driven Approach to Optimizing Digital Engagement

The modern business landscape is increasingly defined by data-driven decision-making. Organizations across industries are amassing vast quantities of information, with numerous teams relying on this data to inform strategic choices. From analyzing user clickstream traffic and data generated by wearable edge devices to processing telemetry from complex systems, the velocity and scale of data generation are accelerating exponentially. This surge in data fuels the growing integration of machine learning and artificial intelligence frameworks to extract actionable insights.
Among the most reliable and time-tested frameworks for data-driven decision-making is A/B testing. This methodology is particularly prevalent in digital environments such as websites and applications, where customer interactions like clicks and orders provide near-instantaneous, large-scale feedback. A/B testing’s power lies in its ability to isolate and control numerous variables, allowing stakeholders to precisely assess the impact of introducing a specific element on key performance indicators (KPIs). However, A/B testing can be time-consuming. The process of concluding a test, communicating results, and then deliberating and implementing decisions can incur significant opportunity costs, especially if the tested experience proves beneficial. This is where algorithms like Thompson Sampling offer a compelling alternative by systematically automating this decision-making process.
The Multi-Armed Bandit Problem: A Casino Analogy
The conceptual foundation for Thompson Sampling can be illustrated through the "Multi-Armed Bandit Problem." Imagine a scenario where an individual encounters three slot machines at a casino, each with an unknown payout rate. To determine the most lucrative machine, a strategy might involve pulling the arms of each machine a few times at random, meticulously recording the outcomes. After an initial phase, the player would analyze the win rates and, based on this preliminary data, begin to favor the machine that appears to offer the highest payout. This iterative process of exploration (trying different machines) and exploitation (focusing on the most promising one) is central to solving such problems.

In this hypothetical casino scenario, after initial random pulls, a player might observe the following win rates:
- Machine A: 15% win rate
- Machine B: 18% win rate
- Machine C: 22% win rate
Based on these initial observations, the player might decide to pull Machine C’s arm more frequently than the others, believing it has the highest win rate, while still collecting more data to confirm this hypothesis. After further iterations, the win rates might evolve, leading to increased confidence in Machine C.
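The casino loop can be sketched directly in code. The following standalone simulation is illustrative only (the seed and iteration count are arbitrary; the 15%/18%/22% payout rates mirror the hypothetical machines above) and uses the Beta-posterior sampling that this article develops in detail below:

```python
import numpy as np

# Hypothetical payout rates for Machines A, B, and C (unknown to the player)
true_rates = np.array([0.15, 0.18, 0.22])
rng = np.random.default_rng(7)

# One Beta posterior per machine, starting from the uniform Beta(1, 1) prior
alpha = np.ones(3)
beta = np.ones(3)

for _ in range(5_000):
    # Sample a plausible win rate for each machine, then play the best sample
    machine = int(np.argmax(rng.beta(alpha, beta)))
    win = rng.binomial(n=1, p=true_rates[machine])
    alpha[machine] += win       # a win increments alpha (successes)
    beta[machine] += 1 - win    # a loss increments beta (failures)

pulls = alpha + beta - 2  # pulls per machine, net of the prior pseudo-counts
print(pulls)  # the highest-payout machine typically attracts the most pulls
```

Because each machine's sampled win rate is random, uncertain machines still get occasional pulls (exploration), while machines with strong track records win the argmax more often (exploitation).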
This classic example highlights how Thompson Sampling, a Bayesian algorithm, is designed to select among multiple options with unknown reward distributions to maximize expected rewards. It achieves this by navigating the exploration-exploitation trade-off. Because the reward distributions are initially unknown, the algorithm explores by choosing options randomly, gathers data on the results, and progressively favors options that yield higher average rewards. This article will guide readers through building a Thompson Sampling algorithm in Python and applying it to a practical, real-world scenario.
Optimizing Email Headlines for Higher Open Rates
Consider the role of a marketing professional responsible for email campaigns. Historically, the team might have used A/B testing to determine which email headlines result in higher open rates. However, adopting a multi-armed bandit approach could accelerate the realization of value. To demonstrate the effectiveness of Thompson Sampling, a Python simulation will be developed to compare it against a purely random approach.

Step 1: Establishing a Base Email Simulation Framework
A foundational class, BaseEmailSimulation, will serve as a template for both random and bandit simulations. This class will store essential information such as email headlines and their true, albeit unknown to the algorithm, open rates. The "true open rates" are treated as probabilities that govern the outcome of sending an email. A random number generator is included to ensure simulation reproducibility. Additionally, a reset_results() function is implemented to clear simulation state for fresh runs.
```python
import numpy as np
import pandas as pd


class BaseEmailSimulation:
    """
    Base class for email headline simulations.

    Shared responsibilities:
    - store headlines and their true open probabilities
    - simulate a binary email-open outcome
    - reset simulation state
    - build a summary table from the latest run
    """

    def __init__(self, headlines, true_probabilities, random_state=None):
        self.headlines = list(headlines)
        self.true_probabilities = np.array(true_probabilities, dtype=float)

        if len(self.headlines) == 0:
            raise ValueError("At least one headline must be provided.")
        if len(self.headlines) != len(self.true_probabilities):
            raise ValueError("headlines and true_probabilities must have the same length.")
        if np.any(self.true_probabilities < 0) or np.any(self.true_probabilities > 1):
            raise ValueError("All true_probabilities must be between 0 and 1.")

        self.n_arms = len(self.headlines)
        self.rng = np.random.default_rng(random_state)

        # Ground-truth best arm information for evaluation
        self.best_arm_index = int(np.argmax(self.true_probabilities))
        self.best_headline = self.headlines[self.best_arm_index]
        self.best_true_probability = float(self.true_probabilities[self.best_arm_index])

        # Results from the latest completed simulation
        self.reset_results()

    def reset_results(self):
        """
        Clear all results from the latest simulation.
        Called automatically at initialization and at the start of each run().
        """
        self.reward_history = []
        self.selection_history = []
        self.history = pd.DataFrame()
        self.summary_table = pd.DataFrame()
        self.total_opens = 0
        self.cumulative_opens = []

    def send_email(self, arm_index):
        """
        Simulate sending an email with the selected headline.

        Returns
        -------
        int
            1 if opened, 0 otherwise.
        """
        if arm_index < 0 or arm_index >= self.n_arms:
            raise IndexError("arm_index is out of bounds.")
        true_p = self.true_probabilities[arm_index]
        reward = self.rng.binomial(n=1, p=true_p)
        return int(reward)

    def _finalize_history(self, records):
        """
        Convert round-level records into a DataFrame and populate
        shared result attributes.
        """
        self.history = pd.DataFrame(records)
        if not self.history.empty:
            self.reward_history = self.history["reward"].tolist()
            self.selection_history = self.history["arm_index"].tolist()
            self.total_opens = int(self.history["reward"].sum())
            self.cumulative_opens = self.history["reward"].cumsum().tolist()
        else:
            self.reward_history = []
            self.selection_history = []
            self.total_opens = 0
            self.cumulative_opens = []
        self.summary_table = self.build_summary_table()

    def build_summary_table(self):
        """
        Build a summary table from the latest completed simulation.

        Returns
        -------
        pd.DataFrame
            Summary by headline.
        """
        if self.history.empty:
            return pd.DataFrame(columns=[
                "arm_index",
                "headline",
                "selections",
                "opens",
                "realized_open_rate",
                "true_open_rate"
            ])
        summary = (
            self.history
            .groupby(["arm_index", "headline"], as_index=False)
            .agg(
                selections=("reward", "size"),
                opens=("reward", "sum"),
                realized_open_rate=("reward", "mean"),
                true_open_rate=("true_open_rate", "first")
            )
            .sort_values("arm_index")
            .reset_index(drop=True)
        )
        return summary
```
The reset_results() method ensures that each simulation run starts with a clean slate, crucial for comparing different approaches fairly. The send_email() function simulates the binary outcome of an email being opened (reward=1) or not (reward=0) based on the predefined true open rate for a given headline. Finally, _finalize_history() and build_summary_table() process the raw simulation data into a comprehensive summary, detailing metrics like the number of times each headline was selected, the total opens, and the realized open rate compared to the true open rate.
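The coin-flip mechanic behind send_email() can be verified in isolation. This standalone check (the 20% rate and seed are arbitrary, chosen for illustration) confirms that a binomial draw with n=1 is a Bernoulli trial whose empirical mean converges to the true open rate:

```python
import numpy as np

rng = np.random.default_rng(0)
true_open_rate = 0.20  # arbitrary illustrative rate

# 100,000 simulated sends: each draw is 1 (opened) or 0 (not opened)
outcomes = rng.binomial(n=1, p=true_open_rate, size=100_000)

print(outcomes[:10])              # individual sends are just 0s and 1s
print(round(outcomes.mean(), 3))  # empirical open rate, close to 0.20
```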
Step 2: The Random Email Simulation Subclass
To establish a baseline for comparison, a RandomEmailSimulation subclass is introduced. This class mirrors the behavior of a standard A/B test where options are chosen uniformly at random.
```python
class RandomEmailSimulation(BaseEmailSimulation):
    """
    Random selection email headline simulation.
    """

    def select_headline(self):
        """
        Select one headline uniformly at random.
        """
        return int(self.rng.integers(low=0, high=self.n_arms))

    def run(self, num_iterations):
        """
        Run a fresh random simulation from scratch.

        Parameters
        ----------
        num_iterations : int
            Number of simulated email sends.
        """
        if num_iterations <= 0:
            raise ValueError("num_iterations must be greater than 0.")
        self.reset_results()

        records = []
        cumulative_opens = 0
        for round_number in range(1, num_iterations + 1):
            arm_index = self.select_headline()
            reward = self.send_email(arm_index)
            cumulative_opens += reward
            records.append({
                "round": round_number,
                "arm_index": arm_index,
                "headline": self.headlines[arm_index],
                "reward": reward,
                "true_open_rate": self.true_probabilities[arm_index],
                "cumulative_opens": cumulative_opens
            })
        self._finalize_history(records)
```
The select_headline() method within this subclass randomly picks one of the available headlines. The run() method orchestrates the simulation, repeatedly calling select_headline() and send_email() to gather results over a specified number of iterations.

Thompson Sampling and the Beta Distribution: A Deeper Dive
Before implementing the Thompson Sampling subclass, it is essential to understand the underlying mathematical principles, specifically the Beta distribution. In the context of email headlines, we have a set of options (headlines) with unknown open rates. Thompson Sampling leverages the Beta distribution to model the uncertainty surrounding these rates.
The Beta distribution is a continuous probability distribution defined on the interval [0, 1]. It is characterized by two parameters: alpha (α) and beta (β), often interpreted as representing "successes" and "failures," respectively. Initially, for each headline, the algorithm assumes a prior distribution where α = 1 and β = 1. When an email with a particular headline is opened (a success), the α value for that headline’s distribution is incremented. If the email is not opened (a failure), the β value is incremented.
This initial setting (α=1, β=1) does not necessarily imply a 50% assumed open rate. Instead, it represents a state of maximum uncertainty, where all probabilities between 0 and 1 are equally likely. As data is collected, these α and β values are updated, and the Beta distribution for each headline becomes more concentrated around its true open rate.
For example, after an initial phase, if a headline has been sent 18 times with 9 opens (successes) and 9 non-opens (failures), its α would be 1 + 9 = 10, and its β would be 1 + 9 = 10. The mean of this Beta distribution (α / (α + β)) would be 10 / (10 + 10) = 0.5, indicating a realized open rate of 50%. As more data is gathered, the distribution becomes narrower, indicating greater certainty.
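This arithmetic can be checked against the closed-form moments of the Beta distribution: the mean is α / (α + β) and the variance is αβ / ((α + β)²(α + β + 1)). The helper functions below are illustrative, not part of the simulation classes:

```python
# Mean and standard deviation of a Beta(alpha, beta) distribution,
# computed from the standard closed-form expressions.
def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

def beta_std(alpha, beta):
    variance = (alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return variance ** 0.5

# 9 opens and 9 non-opens on top of a Beta(1, 1) prior -> Beta(10, 10)
print(beta_mean(10, 10))   # 0.5

# Ten times the data at the same 50% open rate -> Beta(91, 91)
print(beta_mean(91, 91))   # still 0.5

# More data concentrates the distribution around the same mean:
# the standard deviation shrinks from roughly 0.109 to roughly 0.037
print(round(beta_std(10, 10), 3), round(beta_std(91, 91), 3))
```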

Thompson Sampling works by sampling a random value from the current Beta distribution for each headline at each iteration. The headline corresponding to the highest sampled value is then selected. This process inherently balances exploration (sampling from distributions that might still have high uncertainty) and exploitation (favoring headlines whose distributions suggest higher probabilities of success).
Step 3: The Bandit Email Simulation Subclass
The BanditSimulation subclass implements the Thompson Sampling algorithm. It inherits from BaseEmailSimulation and introduces specific attributes for managing the Beta posteriors.
```python
class BanditSimulation(BaseEmailSimulation):
    """
    Thompson Sampling email headline simulation.

    Each headline is modeled with a Beta posterior over its
    unknown open probability. At each iteration, one sample is drawn
    from each posterior, and the headline with the largest sample is selected.
    """

    def __init__(
        self,
        headlines,
        true_probabilities,
        alpha_prior=1.0,
        beta_prior=1.0,
        random_state=None
    ):
        super().__init__(
            headlines=headlines,
            true_probabilities=true_probabilities,
            random_state=random_state
        )
        if alpha_prior <= 0 or beta_prior <= 0:
            raise ValueError("alpha_prior and beta_prior must be positive.")
        self.alpha_prior = float(alpha_prior)
        self.beta_prior = float(beta_prior)
        self.reset_bandit_state()

    def reset_bandit_state(self):
        """
        Reset posterior state for a fresh Thompson Sampling run.
        """
        self.alpha = np.full(self.n_arms, self.alpha_prior, dtype=float)
        self.beta = np.full(self.n_arms, self.beta_prior, dtype=float)

    def posterior_means(self):
        """
        Return the posterior mean for each headline.
        """
        return self.alpha / (self.alpha + self.beta)

    def select_headline(self):
        """
        Draw one sample from each arm's Beta posterior and
        select the headline with the highest sampled value.
        """
        sampled_values = self.rng.beta(self.alpha, self.beta)
        return int(np.argmax(sampled_values))

    def update_posterior(self, arm_index, reward):
        """
        Update the selected arm's Beta posterior using the observed reward.
        """
        if arm_index < 0 or arm_index >= self.n_arms:
            raise IndexError("arm_index is out of bounds.")
        if reward not in (0, 1):
            raise ValueError("reward must be either 0 or 1.")
        self.alpha[arm_index] += reward
        self.beta[arm_index] += (1 - reward)

    def run(self, num_iterations):
        """
        Run a fresh Thompson Sampling simulation from scratch.

        Parameters
        ----------
        num_iterations : int
            Number of simulated email sends.
        """
        if num_iterations <= 0:
            raise ValueError("num_iterations must be greater than 0.")
        self.reset_results()
        self.reset_bandit_state()

        records = []
        cumulative_opens = 0
        for round_number in range(1, num_iterations + 1):
            arm_index = self.select_headline()
            reward = self.send_email(arm_index)
            self.update_posterior(arm_index, reward)
            cumulative_opens += reward
            records.append({
                "round": round_number,
                "arm_index": arm_index,
                "headline": self.headlines[arm_index],
                "reward": reward,
                "true_open_rate": self.true_probabilities[arm_index],
                "cumulative_opens": cumulative_opens,
                "posterior_mean": self.posterior_means()[arm_index],
                "alpha": self.alpha[arm_index],
                "beta": self.beta[arm_index]
            })
        # _finalize_history() calls the overridden build_summary_table(),
        # so the summary already includes the posterior columns.
        self._finalize_history(records)

    def build_summary_table(self):
        """
        Build a summary table for the latest Thompson Sampling run,
        including the final posterior parameters for each headline.
        """
        if self.history.empty:
            return pd.DataFrame(columns=[
                "arm_index",
                "headline",
                "selections",
                "opens",
                "realized_open_rate",
                "true_open_rate",
                "final_posterior_mean",
                "final_alpha",
                "final_beta"
            ])
        summary = (
            self.history
            .groupby(["arm_index", "headline"], as_index=False)
            .agg(
                selections=("reward", "size"),
                opens=("reward", "sum"),
                realized_open_rate=("reward", "mean"),
                true_open_rate=("true_open_rate", "first")
            )
            .sort_values("arm_index")
            .reset_index(drop=True)
        )
        # Index by arm_index so the posterior columns stay aligned even
        # if some headlines were never selected during the run.
        arm_idx = summary["arm_index"].to_numpy()
        summary["final_posterior_mean"] = self.posterior_means()[arm_idx]
        summary["final_alpha"] = self.alpha[arm_idx]
        summary["final_beta"] = self.beta[arm_idx]
        return summary
```
The reset_bandit_state() method ensures that the α and β values are reset for each new simulation run, preventing data leakage. The select_headline() method samples from each headline’s Beta posterior and chooses the one with the highest sampled value. The update_posterior() function then updates the α or β parameter based on the observed reward (email open or not). The run() method orchestrates the Thompson Sampling process, and build_summary_table() is overridden to include posterior-related metrics.
Running the Simulation and Analyzing Results
To provide a comprehensive comparison, a run_comparison_experiment function is implemented. This function executes both the random and bandit simulations for various iteration counts and generates a detailed comparison report.

```python
def run_comparison_experiment(
    headlines,
    true_probabilities,
    iteration_list=(100, 1000, 10000, 100000, 1000000),
    random_seed=42,
    bandit_seed=123,
    alpha_prior=1.0,
    beta_prior=1.0
):
    """
    Run RandomEmailSimulation and BanditSimulation side by side across
    multiple iteration counts.

    Returns
    -------
    comparison_df : pd.DataFrame
        High-level comparison table across iteration counts.
    detailed_results : dict
        Nested dictionary containing simulation objects and summary tables
        for each iteration count.
    """
    comparison_rows = []
    detailed_results = {}
    for n in iteration_list:
        # Fresh objects for each simulation size
        random_sim = RandomEmailSimulation(
            headlines=headlines,
            true_probabilities=true_probabilities,
            random_state=random_seed
        )
        bandit_sim = BanditSimulation(
            headlines=headlines,
            true_probabilities=true_probabilities,
            alpha_prior=alpha_prior,
            beta_prior=beta_prior,
            random_state=bandit_seed
        )

        # Run both simulations
        random_sim.run(num_iterations=n)
        bandit_sim.run(num_iterations=n)

        # Core metrics
        random_opens = random_sim.total_opens
        bandit_opens = bandit_sim.total_opens
        random_open_rate = random_opens / n if n > 0 else 0
        bandit_open_rate = bandit_opens / n if n > 0 else 0
        additional_opens = bandit_opens - random_opens
        opens_lift_pct = (
            ((bandit_opens - random_opens) / random_opens) * 100
            if random_opens != 0 else np.nan
        )
        open_rate_lift_pct = (
            ((bandit_open_rate - random_open_rate) / random_open_rate) * 100
            if random_open_rate != 0 else np.nan
        )

        comparison_rows.append({
            "iterations": n,
            "random_opens": random_opens,
            "bandit_opens": bandit_opens,
            "additional_opens_from_bandit": additional_opens,
            "opens_lift_pct": opens_lift_pct,
            "random_open_rate": random_open_rate,
            "bandit_open_rate": bandit_open_rate,
            "open_rate_lift_pct": open_rate_lift_pct
        })
        detailed_results[n] = {
            "random_sim": random_sim,
            "bandit_sim": bandit_sim,
            "random_summary_table": random_sim.summary_table.copy(),
            "bandit_summary_table": bandit_sim.summary_table.copy()
        }

    comparison_df = pd.DataFrame(comparison_rows)
    # Optional formatting helpers
    comparison_df["random_open_rate"] = comparison_df["random_open_rate"].round(4)
    comparison_df["bandit_open_rate"] = comparison_df["bandit_open_rate"].round(4)
    comparison_df["opens_lift_pct"] = comparison_df["opens_lift_pct"].round(2)
    comparison_df["open_rate_lift_pct"] = comparison_df["open_rate_lift_pct"].round(2)
    return comparison_df, detailed_results


# Example usage:
headlines = [
    "48 Hours Only: Save 25%",
    "Your Exclusive Spring Offer Is Here",
    "Don’t Miss Your Member Discount",
    "Ending Tonight: Final Chance to Save",
    "A Little Something Just for You"
]
true_open_rates = [0.18, 0.21, 0.16, 0.24, 0.20]

comparison_df, detailed_results = run_comparison_experiment(
    headlines=headlines,
    true_probabilities=true_open_rates,
    iteration_list=(100, 1000, 10000, 100000, 1000000),
    random_seed=42,
    bandit_seed=123
)

display_df = comparison_df.copy()
display_df["random_open_rate"] = (display_df["random_open_rate"] * 100).round(2).astype(str) + "%"
display_df["bandit_open_rate"] = (display_df["bandit_open_rate"] * 100).round(2).astype(str) + "%"
display_df["opens_lift_pct"] = display_df["opens_lift_pct"].round(2).astype(str) + "%"
display_df["open_rate_lift_pct"] = display_df["open_rate_lift_pct"].round(2).astype(str) + "%"
print(display_df)
```
The simulation is run with a set of example headlines and their true open rates. The results clearly demonstrate the advantage of Thompson Sampling as the number of iterations increases.
Simulation Results Analysis:
At 100 and 1,000 iterations, the performance difference between the random approach and Thompson Sampling is negligible, with the bandit approach even lagging slightly in the 1,000-iteration scenario. However, as the simulation scales to 10,000 iterations and beyond, the Thompson Sampling approach consistently outperforms the random method, showing a lift of approximately 20%.
Consider the implications for a large enterprise. A 20% improvement in email open rates, when sending millions of emails in a single campaign, could translate into millions of dollars in incremental revenue. This highlights the significant business value that can be unlocked by optimizing decision-making processes through advanced algorithms like Thompson Sampling.

Conclusion: When to Deploy Thompson Sampling
Thompson Sampling presents a powerful alternative to traditional A/B testing, particularly for optimizing online campaigns, recommendation systems, and other scenarios requiring continuous learning and adaptation. However, its effectiveness can vary depending on the specific context. Here is a checklist to help determine if a Thompson Sampling approach is suitable for a given problem:
- Frequent Decision-Making: The problem involves making a high volume of decisions over time.
- Unknown Reward Distributions: The outcomes (rewards) of different options are not precisely known beforehand.
- Need for Rapid Learning: There is a requirement to quickly identify and capitalize on the best-performing options.
- Exploration-Exploitation Trade-off: The situation naturally involves balancing the need to explore new options with exploiting known good ones.
- Online Environment: Decisions and their outcomes occur in real-time, allowing for continuous updates.
By understanding the principles of the Multi-Armed Bandit problem and leveraging the statistical power of the Beta distribution, organizations can implement Thompson Sampling to drive more efficient and profitable outcomes in their digital strategies. The transition from static A/B tests to dynamic, adaptive bandit algorithms represents a significant advancement in data-driven decision-making, enabling businesses to continuously optimize their engagement strategies.