(4) - The Statistics Survival Guide: Probing the Data and Quantifying the Unknown

From the Challenger disaster to Bootstrapping: How stats keep ML models honest

In Module 4 of my Machine Learning journey, I moved into the “engine room” of AI: Applied Statistics. While coding is the tool, statistics is the mathematical backbone that allows computers to actually learn from data.

Below are the four pillars that defined my study this week.

Maximum Likelihood Estimation (MLE)

MLE is the bridge between classical math and modern AI. Its goal is to find the specific parameter values that make our observed data “most likely” to have occurred.

The Log-Likelihood Trick: Multiplying many tiny probabilities can underflow to zero in floating-point arithmetic. To avoid this, we take the logarithm of the likelihood function, l(θ, X) = log L(θ, X), which turns the product of probabilities into a sum of log-probabilities. Because the logarithm is monotonically increasing, the θ that maximizes l also maximizes L.
Applications: MLE is the logic behind Logistic Regression coefficients and the Loss Functions used in deep learning.
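
To make the idea concrete, here is a minimal sketch of MLE for the simplest case I could think of: estimating the success probability p of a coin-flip (Bernoulli) model. The data, the grid search, and the function names are my own illustrative assumptions, not something from the course material. The last few lines show the underflow problem directly.

```python
import math

# Hypothetical observations: 1 = success, 0 = failure.
observations = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(p, data):
    """Bernoulli log-likelihood: the sum of each observation's log-probability."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

# Simple grid search over candidate values of p.
# 0 and 1 are excluded because log(0) is undefined.
candidates = [i / 1000 for i in range(1, 1000)]
p_hat = max(candidates, key=lambda p: log_likelihood(p, observations))

# For a Bernoulli model the MLE has a closed form: the sample proportion.
closed_form = sum(observations) / len(observations)

# Underflow demo: the raw product of 2000 probabilities of 0.5 rounds to 0.0
# in double precision, while the sum of their logs is an ordinary number.
raw_product = math.prod([0.5] * 2000)
log_sum = sum(math.log(0.5) for _ in range(2000))

print(p_hat, closed_form, raw_product, log_sum)
```

The grid search is only there to show "try parameters, keep the most likely one"; real libraries typically use calculus or numerical optimizers instead.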

Mastering the Mess: Outlier Detection

Outliers are data points that deviate significantly from the rest of the set, often caused by measurement errors, typos, or unique system conditions.

Detection Methods:
- The Median vs. Mean: Unlike the mean, the median is “resistant” to outliers, making it a more reliable measure of central tendency when your data is noisy.
- The IQR Rule: We sort the data and identify the Interquartile Range (IQR = Q3 - Q1). Low outliers are defined as any point x < Q1 - 1.5 * IQR while high outliers are defined as any point x > Q3 + 1.5 * IQR.
- Z-Scores: A Z-score represents how many standard deviations a point is from the mean; points beyond ±3 are usually flagged.
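
The detection rules above can be sketched in a few lines using only Python's standard library. The readings are made-up numbers, and the helper names (`iqr_outliers`, `zscore_outliers`) are hypothetical. One assumption to flag: there are several conventions for computing quartiles, and this sketch simply uses the default of `statistics.quantiles`.

```python
import statistics

# Hypothetical sensor readings; the final value looks suspicious.
readings = [10, 11, 9, 10, 12, 11, 10, 9, 10, 11,
            10, 12, 9, 10, 11, 10, 9, 11, 10, 50]

# Mean vs. median: the single extreme value pulls the mean, not the median.
mean_value = statistics.fmean(readings)
median_value = statistics.median(readings)

def iqr_outliers(values, k=1.5):
    """Return points below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

def zscore_outliers(values, threshold=3.0):
    """Return points more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) / sd > threshold]

print(mean_value, median_value)
print(iqr_outliers(readings), zscore_outliers(readings))
```

I used 20 points on purpose: with the sample standard deviation, a Z-score can never exceed (n - 1) / sqrt(n), so in a very small dataset no point can reach ±3 no matter how extreme it is.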
How to Handle Them: Once found, we can remove them (if they are clearly errors), impute them (replace with the median), or transform them (using log scales to compress their impact).
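
Here is a rough sketch of those three handling options side by side. The values are invented, the IQR rule is just one possible way to flag the outlier, and which option is appropriate depends on why the outlier exists in the first place.

```python
import math
import statistics

# Hypothetical measurements; 500 is assumed to be a data-entry error.
values = [10, 11, 9, 10, 12, 11, 10, 9, 500]

# Flag outliers with the IQR rule (bounds computed once, up front).
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x):
    return x < low or x > high

# Option 1: remove the flagged points.
removed = [x for x in values if not is_outlier(x)]

# Option 2: impute, replacing each flagged point with the median.
median_value = statistics.median(values)
imputed = [median_value if is_outlier(x) else x for x in values]

# Option 3: transform. A log scale compresses large values
# (this only works for strictly positive data).
logged = [math.log(x) for x in values]

print(removed)
print(imputed)
print([round(v, 2) for v in logged])
```

Removal shrinks the dataset, imputation keeps its size, and the log transform keeps every point but reduces how far the extreme value sits from the rest.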

The Workhorses: Regression and Correlation

This week highlighted Linear Regression as the “workhorse” of applied statistics.

Linear Regression: We use the “least squares” method to find the “best-fitting line” (Y = aX + b) by minimizing the sum of squared errors between the data and our predictions.
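Correlation (r): This measures the strength and direction of linear “co-movement” between two variables on a scale of -1 to 1. A positive correlation means they tend to increase together, a negative one means they tend to move in opposite directions, and a value near 0 means there is little linear relationship.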
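
Both ideas fit in a short sketch. The closed-form least-squares formulas below are the standard ones for a single predictor; the study-hours data and the function names (`fit_line`, `pearson_r`) are my own illustrative assumptions.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope that minimizes the sum of squared errors
    b = mean_y - a * mean_x  # the fitted line passes through the point of means
    return a, b

def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

a, b = fit_line(hours, scores)
r = pearson_r(hours, scores)
print(f"Y = {a:.2f}X + {b:.2f}, r = {r:.3f}")
```

Note that the slope and r share the same numerator, which is why they always have the same sign: the slope says how steep the line is, while r says how tightly the points cluster around it.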

The Stakes: We studied the Challenger Space Shuttle disaster, where a failure to properly account for the correlation between O-ring distress and cold temperatures contributed to the decision to launch on an unusually cold morning, with fatal results.

Bootstrapping: Measuring Trust

How much can we trust a data summary (like a mean) that was computed from a single sample? To answer that, we use Bootstrapping, a resampling technique.

The Process: We create thousands of new datasets by sampling from our original data “with replacement”—meaning one data point can appear multiple times in a single set.
The Goal: By looking at the distribution of results across these thousands of simulations, we can quantify uncertainty and establish confidence intervals.
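
The process above can be sketched as follows. This is a minimal version of what is usually called the percentile bootstrap, which is only one of several ways to turn resampled results into an interval. The sample values, the function name `bootstrap_mean_ci`, and the choice of 5000 resamples are assumptions for illustration.

```python
import random
import statistics

def bootstrap_mean_ci(data, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each resample draws n points WITH replacement, so a point can repeat.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical sample of ten measurements.
sample = [12, 15, 9, 14, 10, 13, 11, 16, 8, 12]

lower, upper = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"95% bootstrap CI = ({lower:.2f}, {upper:.2f})")
```

A wide interval signals that the mean would shift noticeably if we had collected a different sample, while a narrow one suggests the estimate is stable.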