TIL but from youtube: Who's that pokémon

Today I talk about a video I saw that explains the geometric distribution with the help of Pokémon.

Introduction

The video explores the expected number of encounters needed before we catch every Pokémon in the game. It makes quite a few simplifying assumptions along the way, which you can check out in the video itself (it is super fun). I tried to simulate the same setup myself in a Jupyter notebook to see what results I get.

Data Preparation and Loading

No external dataset is needed here, since the encounters are simulated. The notebook begins by importing the libraries used for the simulation and the plots.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Simulation of Pokémon Encounters

To estimate how many encounters are required to catch all the Pokémon, we simulate the process, assuming every encounter is equally likely to be any of the num_pokemon species:

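def simulate_encounters(num_pokemon=150, num_simulations=10000):
    """Return, for each simulation, the encounters needed to see every Pokémon."""
    np.random.seed(42)  # fixed seed so the results are reproducible
    encounter_results = []
    for _ in range(num_simulations):
        caught = set()  # distinct Pokémon seen so far in this run
        encounters = 0
        while len(caught) < num_pokemon:
            # Each encounter is a uniform draw from 1..num_pokemon (upper bound is exclusive)
            pokemon = np.random.randint(1, num_pokemon + 1)
            caught.add(pokemon)  # adding a repeat leaves the set unchanged
            encounters += 1
        encounter_results.append(encounters)
    return encounter_results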

encounters = simulate_encounters()

Analysis of Simulation Results

Geometric Distribution

The geometric distribution models the number of trials needed for a first success in a series of Bernoulli trials. The probability of success in each trial is p, and the probability of failure is 1-p.

For example, if the probability of encountering a new Pokémon is p, the number of encounters needed to find a new Pokémon follows a geometric distribution. The expected number of encounters for a new Pokémon is 1/p.

In the Pokémon context, the probability p changes as you catch new Pokémon because the pool of unseen Pokémon decreases.
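
As a quick sanity check of the 1/p claim, here is a minimal sketch that samples waiting times for a single stage. The function name mean_wait_for_new and its parameters are my own for illustration, and it assumes every encounter is a uniform draw over all n_total Pokémon, so that p = (n_total - n_caught) / n_total.

```python
import numpy as np

def mean_wait_for_new(n_total, n_caught, n_samples=100_000, seed=0):
    """Average number of encounters until an unseen Pokémon appears."""
    rng = np.random.default_rng(seed)
    # Chance that a single encounter is a Pokémon we have not caught yet
    p = (n_total - n_caught) / n_total
    # Generator.geometric counts trials up to and including the first success
    waits = rng.geometric(p, size=n_samples)
    return waits.mean(), 1 / p

sim, theory = mean_wait_for_new(n_total=150, n_caught=120)
print(f"simulated mean: {sim:.2f}, theoretical 1/p: {theory:.2f}")
```

With 120 of 150 already caught, p is 0.2, so the simulated mean should land close to 5 encounters.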

Expected Value

The expected value is the long-run average of a random quantity, with each possible outcome weighted by its probability. Here the quantity is the number of trials needed for an event to occur. For a geometric distribution with probability p, the expected value (or mean) is 1/p.

In our scenario:

  • The first encounter is guaranteed to be a new Pokémon (expected value = 1).

  • After the first catch, each encounter has a probability of (n-1)/n of being a new Pokémon, so the expected number of encounters to find the second one is n/(n-1).

  • This pattern continues until the last Pokémon is caught.

The total expected number of encounters to catch all n Pokémon is the sum of these expected values: n/n + n/(n-1) + ... + n/2 + n/1.
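
The sum described above can be sketched directly. This is only an illustration under the same uniform-encounter assumption; the function name expected_total_encounters is mine, and it uses exact fractions so no rounding creeps in.

```python
from fractions import Fraction

def expected_total_encounters(n):
    """Sum the expected waits n/(n-k) over the stages k = 0, 1, ..., n-1."""
    # At stage k, k Pokémon are already caught, so a new one appears
    # with probability (n-k)/n and takes n/(n-k) encounters on average.
    return sum(Fraction(n, n - k) for k in range(n))

print(expected_total_encounters(3))           # small case: 1 + 3/2 + 3
print(float(expected_total_encounters(150)))  # the 150-Pokémon case
```

For n = 3 the stages contribute 1, 3/2 and 3, which gives 11/2 encounters on average.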

Harmonic Series

The harmonic series is the sum of the reciprocals of the natural numbers. Its n-th partial sum, the n-th harmonic number, adds up the first n terms: H_n = 1 + 1/2 + 1/3 + ... + 1/n

The n-th harmonic number grows logarithmically with n, approximately as H_n ≈ ln(n) + γ, where γ is the Euler-Mascheroni constant. For large n, this approximation is very useful.

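In the context of catching Pokémon, the expected number of encounters is exactly n * H_n, because the sum n/n + n/(n-1) + ... + n/1 factors as n * (1/n + 1/(n-1) + ... + 1). For large n this is approximately n * (ln(n) + γ).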
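
To see how good the logarithmic approximation is, here is a small sketch comparing the exact harmonic number with ln(n) + γ. The helper names harmonic and harmonic_approx are my own, and γ is hard-coded to double precision.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic(n):
    """Exact n-th harmonic number, computed by direct summation."""
    return sum(1 / k for k in range(1, n + 1))

def harmonic_approx(n):
    """Logarithmic approximation H_n ≈ ln(n) + γ."""
    return math.log(n) + EULER_GAMMA

for n in (10, 150, 10_000):
    exact, approx = harmonic(n), harmonic_approx(n)
    # n * (exact - approx) is the gap in expected encounters, n*H_n vs n*(ln n + γ)
    print(n, round(exact, 4), round(approx, 4), round(n * (exact - approx), 3))
```

The gap between H_n and ln(n) + γ is roughly 1/(2n), so the approximation undershoots the expected number of encounters by about half an encounter regardless of n.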

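num_pokemon = 150  # must match the num_pokemon used in simulate_encounters()
df_encounters = pd.DataFrame(encounters, columns=['encounters'])
mean_encounters = df_encounters['encounters'].mean()
median_encounters = df_encounters['encounters'].median()
# Approximate expected value n * (ln(n) + γ), with γ ≈ 0.57721 (Euler-Mascheroni constant)
expected = num_pokemon * (np.log(num_pokemon) + 0.57721)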

Visualisation of Results

plt.figure(figsize=(12, 6))
sns.histplot(df_encounters['encounters'], bins=30, kde=True)
plt.axvline(expected, color='r', linestyle='--', label=f'Expected: {expected:.2f}')
plt.title('Distribution of Encounters Needed to Catch 150 Pokémon')
plt.xlabel('Number of Encounters')
plt.ylabel('Frequency')
plt.legend()
plt.show()

Code: https://github.com/SammithSB/TIL/blob/main/day5_pokemon/pokemon.ipynb