From If-Then logic to Information Gain—my deep dive into Module 9.

Two weeks ago, I started Module 9 of my Machine Learning Professional Certificate, and it introduced one of the most intuitive models in the field: Decision Trees.
If k-NN (from last week) was about finding your closest neighbors, Decision Trees are about breaking decisions down into a sequence of simple “if-then” questions. It mirrors human reasoning—narrowing down options step by step until you reach a conclusion.
Here are the core concepts I learned about building and controlling these trees:
1. The “White Box” Advantage
While models like Neural Networks are often called “black boxes” because their internal logic is hard to inspect, Decision Trees are highly interpretable “white box” models. You can follow the exact sequence of tests that led to a prediction, which makes a reasonably small tree easy to explain to non-technical stakeholders.
2. The Anatomy of a Tree
A decision tree is an upside-down structure made of a few key components:
- Root Node: The starting point that holds the entire dataset before any splits occur.
- Internal Nodes: The points where the data is evaluated and split based on a specific feature.
- Branches: The arrows leaving a node, each representing one outcome of that node’s test (e.g., the “yes” and “no” answers to “Is income > $50k?”).
- Leaf Nodes: The final terminal points that provide the actual prediction or class label.
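To make the anatomy concrete, here is a minimal sketch of a hand-built tree stored as nested dictionaries. The features, thresholds, and labels are invented for illustration, and `predict` is a hypothetical helper, not part of any library. It walks from the root to a leaf and records each test on the way, which is exactly the “follow the path” property that makes trees easy to explain.

```python
# A hand-built toy tree; feature names, thresholds, and labels are made up.
tree = {
    "feature": "income",            # root node: first test applied to every sample
    "threshold": 50_000,
    "left": {"label": "deny"},      # leaf node reached when income <= 50k
    "right": {                      # internal node reached when income > 50k
        "feature": "age",
        "threshold": 30,
        "left": {"label": "review"},    # leaf node
        "right": {"label": "approve"},  # leaf node
    },
}


def predict(node, sample, path=None):
    """Walk from the root to a leaf, recording each test along the way."""
    path = [] if path is None else path
    if "label" in node:  # reached a leaf: return its prediction
        return node["label"], path
    went_right = sample[node["feature"]] > node["threshold"]
    answer = "yes" if went_right else "no"
    path.append(f"{node['feature']} > {node['threshold']}? {answer}")
    branch = node["right"] if went_right else node["left"]  # follow one branch
    return predict(branch, sample, path)


label, path = predict(tree, {"income": 72_000, "age": 41})
print(label)  # approve
print(path)   # ['income > 50000? yes', 'age > 30? yes']
```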
3. The Math of Splitting: Entropy and Gini
How does the algorithm actually know where to split the data? It tries to make the resulting groups as “pure” as possible using mathematical metrics:
- Entropy: A measure of the amount of uncertainty, chaos, or impurity in a node.
- Information Gain: The goal of a split is to maximize Information Gain, which is calculated by taking the entropy of the parent node and subtracting the weighted average entropy of the newly created child nodes.
- Gini Index: An alternative to Entropy, this measures the probability of misclassifying a randomly chosen element in the node if it were labeled at random according to the node’s class distribution.
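The three metrics above are short enough to write out by hand. This is a rough sketch using only the standard library. The function names and the toy “yes”/“no” labels are my own choices for illustration, and entropy is measured in bits (log base 2).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the classes present."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy split: a perfectly mixed parent divided into two mostly-pure children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(entropy(parent))                          # 1.0 (maximum for two classes)
print(gini(parent))                             # 0.5 (maximum for two classes)
print(information_gain(parent, [left, right]))  # about 0.278
```

A split that produced two perfectly pure children would recover the full 1.0 bit of the parent’s entropy, so 0.278 tells us this split helps but leaves some impurity behind.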
4. The Overfitting Trap and “Pruning”
The biggest weakness of a Decision Tree is that it doesn’t know when to quit. Left unconstrained, a tree will keep branching until every leaf is pure or no further split is possible, which usually means it overfits the training data and memorizes the noise.
To fix this, we use a technique called Pruning:
- Pre-pruning: Stopping the tree from growing early by setting strict rules, such as a maximum depth or a minimum number of samples required to make a split.
- Post-pruning: Allowing the tree to grow to its full, complex maximum, and then removing branches that add little predictive power, typically judged on validation data.
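Pre-pruning is easy to see in a toy tree builder. The sketch below handles a single numeric feature and uses Gini impurity to pick thresholds. The names `build`, `best_split`, and `count_leaves` are hypothetical helpers written for this post, and the data is deliberately noisy. Post-pruning is not shown here.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split, or None if no split exists."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best


def build(xs, ys, depth=0, max_depth=None, min_samples_split=2):
    majority = Counter(ys).most_common(1)[0][0]
    # Stop if the node is pure, or if a pre-pruning rule fires.
    if (len(set(ys)) == 1
            or (max_depth is not None and depth >= max_depth)
            or len(ys) < min_samples_split):
        return {"label": majority}
    split = best_split(xs, ys)
    if split is None:  # every feature value is identical, nothing to split on
        return {"label": majority}
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build(*zip(*left), depth + 1, max_depth, min_samples_split),
        "right": build(*zip(*right), depth + 1, max_depth, min_samples_split),
    }


def count_leaves(node):
    if "label" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "b", "a", "b", "b", "a", "b"]  # deliberately noisy labels

full = build(xs, ys)                  # grows until every leaf is pure
shallow = build(xs, ys, max_depth=1)  # pre-pruned: one split only
print(count_leaves(full), count_leaves(shallow))
```

With `max_depth=1` the tree stops after a single split and ends up with two leaves, while the unconstrained tree keeps carving out extra leaves to fit the noisy points. For what it’s worth, scikit-learn’s `DecisionTreeClassifier` exposes similar knobs as `max_depth` and `min_samples_split`, plus `ccp_alpha` for cost-complexity post-pruning.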
Conclusion
Decision trees are fantastic because they require almost no data preprocessing. Each split compares a single feature against a threshold, so only the ordering of the values matters, and you don’t need to scale or normalize your data the way you do with a distance-based model like k-NN. However, they suffer from high variance, meaning even small changes in your training data can result in a completely different tree structure.
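A quick way to convince yourself about scaling: find the best threshold split on a raw feature, then on a min-max scaled copy, and compare which samples land on each side. This is only a sketch with invented income values, and `best_partition` is a hypothetical helper that scores splits with Gini impurity.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_partition(xs, ys):
    """Return the sample indices sent left by the lowest-Gini threshold split."""
    best_score, best_left = None, None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring values
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(ys)
        if best_score is None or score < best_score:
            best_score, best_left = score, left
    return best_left


incomes = [22_000, 35_000, 48_000, 61_000, 75_000, 90_000]  # made-up values
labels = ["no", "no", "no", "yes", "yes", "yes"]

# Min-max scale the feature into the range [0, 1].
low, high = min(incomes), max(incomes)
scaled = [(x - low) / (high - low) for x in incomes]

print(best_partition(incomes, labels))  # [0, 1, 2]
print(best_partition(scaled, labels))   # [0, 1, 2]
```

The threshold value itself changes after scaling, but the samples on each side of it do not, so the resulting tree makes the same predictions.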
Next, we’ll look at how to solve this instability by combining multiple trees together!