(3) – Not Magic, Just Math: Unlocking the Probability Behind Machine Learning

Olivier Krieger
18.01.2026 · 3 min read

From the Monty Hall Paradox to Self-Driving Cars: My takeaways from Module 3.

This week, I tackled Module 3 of my Machine Learning Professional Certificate. If Module 2 was about the “messy reality” of cleaning data, Module 3 was about the mathematical engine that makes sense of that mess: Probability.

We often talk about AI as if it’s magic, but at a fundamental level, it is rooted in mathematical principles. Because real-world data is often incomplete or noisy, machine learning relies on probability to model uncertainty and optimize decisions.

The AI Hierarchy: Who sits where?

First, we cleared up the confusion between common terms. They aren’t synonyms; they are a hierarchy.

  • Artificial Intelligence (AI): The broad umbrella of creating systems that emulate human behavior and intelligence.
  • Machine Learning (ML): A subset of AI where machines learn patterns from data instead of following explicitly programmed rules.
  • Deep Learning (DL): A subset of ML that uses multi-layered neural networks, loosely inspired by the human brain, to learn patterns from raw data.
  • Generative AI: A subset of DL that creates new content (text, images, code).

Statistics vs. Probability: Two Sides of the Same Coin

I learned an interesting distinction between these two fields. They address different questions using different approaches.

  • Probability is forward-looking: It starts with a known model (conditions) and predicts likely outcomes. For example, “If I have a fair coin, what are the chances I get heads?”
  • Statistics is backward-looking: It analyzes past data to infer the model. For example, “I flipped a coin 100 times and got 60 heads; is this coin biased?”

In Machine Learning, we need both. Probability helps us make predictions, while statistics helps us validate those predictions against observed data.
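To make the two coin questions concrete, here is a minimal sketch using only Python's standard library. The function names are my own, and the "at least as unlikely as what I saw" rule is just one common way to define a two-sided p-value, not necessarily the method taught in the course.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p_value(heads: int, flips: int, p: float = 0.5) -> float:
    """Total probability of all outcomes no more likely than the observed one."""
    observed = binomial_pmf(heads, flips, p)
    total = 0.0
    for k in range(flips + 1):
        prob = binomial_pmf(k, flips, p)
        # Small tolerance so floating-point noise does not drop the symmetric tail
        if prob <= observed * (1 + 1e-9):
            total += prob
    return total

# Probability (forward-looking): known model -> chance of an outcome
print(binomial_pmf(1, 1, 0.5))  # one fair flip, chance of heads: 0.5

# Statistics (backward-looking): observed data -> is the fair-coin model plausible?
p_value = two_sided_p_value(60, 100)
print(f"p-value for 60 heads in 100 flips: {p_value:.3f}")
```

The p-value comes out at about 0.057, so 60 heads in 100 flips is suspicious but not strong evidence of a biased coin under the usual 5% threshold.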

Handling Uncertainty with Bayes’ Rule

One of the most powerful tools in ML is Bayes’ Rule. It allows a system to update its beliefs based on new evidence.

  • Prior: The initial probability (belief).
  • Likelihood: The probability of seeing the evidence given that belief.
  • Posterior: The updated probability after considering the evidence.
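
Written as a formula, the three pieces fit together as follows. The symbols H (the hypothesis or belief) and E (the evidence) are my own notation, not necessarily the course's.

```latex
\underbrace{P(H \mid E)}_{\text{posterior}}
  = \frac{\overbrace{P(E \mid H)}^{\text{likelihood}} \; \overbrace{P(H)}^{\text{prior}}}{P(E)},
\qquad
P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)
```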

This is one of the ideas behind self-driving cars. They blend sensor data (evidence) with probability models to estimate where they are and what pedestrians might do, updating those estimates in real time.
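
As a toy illustration of that updating loop, the sketch below applies Bayes' Rule repeatedly to a hypothetical pedestrian detector. All the numbers (a 10% prior, a 90% detection rate, a 20% false-alarm rate) are invented for illustration; real perception systems are far more involved.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return the posterior P(hypothesis | evidence) from a prior and two likelihoods."""
    numerator = p_evidence_if_true * prior
    # Total probability of seeing the evidence, whether or not the hypothesis holds
    evidence = numerator + p_evidence_if_false * (1 - prior)
    return numerator / evidence

# Hypothetical numbers: belief that a pedestrian is ahead, before any sensor reading
belief = 0.10
history = [belief]

# Three consecutive "pedestrian detected" readings; each posterior becomes the next prior
for _ in range(3):
    belief = bayes_update(belief, p_evidence_if_true=0.90, p_evidence_if_false=0.20)
    history.append(belief)

print([round(b, 3) for b in history])
```

With these made-up numbers, a single detection lifts the belief from 10% to about 33%, and repeated detections push it above 90%.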

The Monty Hall Paradox

The module used the famous Monty Hall problem to show how easily human intuition fails at probability.

  • The Setup: You pick one of three doors (Door 1). There is a car behind one and goats behind the others. The host, who knows where the car is, opens a different door (Door 3) to reveal a goat. He asks if you want to switch to Door 2.
  • The Intuition: Most people think it’s a 50/50 split, so it doesn’t matter if you switch.
  • The Math: You should always switch. Sticking wins only if your first pick was right (a 1/3 chance), while switching wins whenever your first pick was wrong, doubling your chances to 2/3.
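
The same answer falls out of Bayes' Rule. In the sketch below, C_i means "the car is behind Door i" and O_3 means "the host opens Door 3" after you picked Door 1. The notation is mine, and it assumes the host picks at random when both unchosen doors hide goats.

```latex
P(O_3 \mid C_1) = \tfrac{1}{2}, \qquad
P(O_3 \mid C_2) = 1, \qquad
P(O_3 \mid C_3) = 0

P(C_2 \mid O_3)
  = \frac{P(O_3 \mid C_2)\,P(C_2)}{\sum_{i=1}^{3} P(O_3 \mid C_i)\,P(C_i)}
  = \frac{1 \cdot \tfrac{1}{3}}{\tfrac{1}{2} \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3}}
  = \frac{2}{3}
```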

I even ran a simulation in Python to check this. After running the game 10,000 times, the graph showed the win rate for switching converging to roughly 66%, in line with the theoretical 2/3.
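
For anyone who wants to try it, here is a minimal sketch of such a simulation. It is a reconstruction using only the standard library, not my original notebook, and the function names and fixed seed are illustrative choices.

```python
import random

def play_round(switch: bool, rng: random.Random) -> bool:
    """Play one game and return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch: bool, rounds: int = 10_000, seed: int = 0) -> float:
    """Fraction of games won over `rounds` plays with a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(rounds))
    return wins / rounds

print(f"stick:  {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

The exact figures depend on the seed, but sticking should land near 1/3 and switching near 2/3.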

Probability Distributions

Finally, we covered how variables are classified:

  • Discrete Variables: Countable values (e.g., number of customers). These are often modeled using distributions such as the Binomial or Poisson.
  • Continuous Variables: Any value within a range, with infinitely many possibilities (e.g., temperature). These are often modeled using a Normal Distribution.

These distributions are used everywhere, from stock market analysis (where returns are often approximated by a normal distribution) to risk assessment in insurance (where claim counts are often modeled with a Poisson distribution).
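
To get a feel for the three distributions, the sketch below draws a few samples with Python's standard library. The scenarios and parameters (20 visitors with a 30% purchase chance, temperatures around 21 degrees, an average of 2 claims per day) are made up for illustration.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_binomial(n: int, p: float) -> int:
    """Number of successes in n independent yes/no trials (discrete)."""
    return sum(rng.random() < p for _ in range(n))

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when `rate` events are expected on average."""
    return rate**k * math.exp(-rate) / math.factorial(k)

# Discrete: how many of 20 visitors buy, if each buys with probability 0.3
purchases = [sample_binomial(20, 0.3) for _ in range(5)]

# Continuous: temperatures drawn from a normal distribution (mean 21, std dev 2)
temperatures = [rng.gauss(21.0, 2.0) for _ in range(5)]

print(purchases)
print([round(t, 1) for t in temperatures])
print(round(poisson_pmf(0, 2.0), 3))  # chance of zero claims in a day: 0.135
```

The binomial samples are always whole numbers between 0 and 20, while the normal samples can take any decimal value, which is the discrete versus continuous distinction in action.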