Decision Trees are one of the most intuitive machine learning algorithms. They are built by splitting datasets into branches, guided by
Information Gain (based on Entropy). In this post, we’ll walk step by step through coding the building blocks of a decision tree:
- Computing entropy
- Splitting the dataset
- Calculating information gain
- Choosing the best split
1. Computing Entropy
Entropy is a measure of impurity at a node.
The formula is:

$$H(p_1) = -p_1 \log_2(p_1) - (1 - p_1)\log_2(1 - p_1)$$

where $p_1$ is the fraction of positive examples (e.g., “edible” mushrooms).
If the node is pure (all edible or all poisonous), entropy = 0.
Here’s a minimal sketch of an implementation, assuming the labels arrive as a NumPy array of 0s and 1s (the names are illustrative):
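```python
import numpy as np

def compute_entropy(y):
    """Entropy of a binary label array y (1 = edible, 0 = poisonous)."""
    if len(y) == 0:
        return 0.0
    p1 = np.mean(y)              # fraction of positive examples at this node
    if p1 == 0 or p1 == 1:       # pure node -> no impurity
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)
```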
✅ Quick test:
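With a small toy label array (values chosen purely for illustration), a 60/40 mix should give entropy close to 1, and a pure node exactly 0:

```python
y = np.array([1, 1, 0, 1, 0, 1, 0, 1, 1, 0])   # 6 edible, 4 poisonous
print(compute_entropy(y))                       # ~0.971 (quite impure)
print(compute_entropy(np.ones(5)))              # 0.0 (pure node)
```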
2. Splitting the Dataset
To build a decision tree, we split data based on feature values.
If an example’s value for the chosen feature is 1 → it goes to the left branch; otherwise → the right branch.
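A minimal sketch of such a split, assuming one-hot (0/1) features stored in a NumPy array and a node described by a list of row indices (again, the names are illustrative):

```python
def split_dataset(X, node_indices, feature):
    """Split the examples at a node on one binary feature.

    X            : (m, n) array of 0/1 feature values
    node_indices : row indices of the examples currently at this node
    feature      : index of the feature to split on
    """
    left_indices = [i for i in node_indices if X[i, feature] == 1]   # feature == 1 -> left
    right_indices = [i for i in node_indices if X[i, feature] == 0]  # feature == 0 -> right
    return left_indices, right_indices
```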
✅ Quick test:
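On a tiny hand-made matrix, splitting on feature 0 should send the first two rows to the left branch:

```python
X = np.array([[1, 1],
              [1, 0],
              [0, 1],
              [0, 0]])
left, right = split_dataset(X, [0, 1, 2, 3], feature=0)
print(left, right)   # [0, 1] [2, 3]
```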
3. Computing Information Gain
Now, we measure how much a split reduces impurity. Information gain compares the entropy at the node with the weighted entropy of its two branches:

$$\text{Information Gain} = H\!\left(p_1^{\text{node}}\right) - \left(w^{\text{left}} H\!\left(p_1^{\text{left}}\right) + w^{\text{right}} H\!\left(p_1^{\text{right}}\right)\right)$$

where $w^{\text{left}}$ and $w^{\text{right}}$ are the fractions of the node’s examples that go to the left and right branches.
One possible implementation, reusing the two helpers above:
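```python
def compute_information_gain(X, y, node_indices, feature):
    """Information gain from splitting this node on the given feature."""
    left_indices, right_indices = split_dataset(X, node_indices, feature)

    y_node = y[node_indices]
    y_left, y_right = y[left_indices], y[right_indices]

    # Each branch is weighted by the fraction of the node's examples it receives
    w_left = len(left_indices) / len(node_indices)
    w_right = len(right_indices) / len(node_indices)

    weighted_entropy = w_left * compute_entropy(y_left) + w_right * compute_entropy(y_right)
    return compute_entropy(y_node) - weighted_entropy
```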
✅ Quick test:
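Using the same toy matrix as above, with labels that line up perfectly with feature 0:

```python
y = np.array([1, 1, 0, 0])
print(compute_information_gain(X, y, [0, 1, 2, 3], feature=0))   # 1.0 (perfect split)
print(compute_information_gain(X, y, [0, 1, 2, 3], feature=1))   # 0.0 (no help at all)
```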
4. Choosing the Best Split
Finally, we loop through all features and pick the one that maximizes Information Gain.
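A sketch of that loop, reusing the helpers defined earlier (returning -1 when no feature improves on the node’s entropy):

```python
def get_best_split(X, y, node_indices):
    """Return the index of the feature with the highest information gain (-1 if none helps)."""
    best_feature, best_gain = -1, 0.0
    for feature in range(X.shape[1]):
        gain = compute_information_gain(X, y, node_indices, feature)
        if gain > best_gain:
            best_feature, best_gain = feature, gain
    return best_feature
```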
✅ Quick test:
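On the toy data, feature 0 separates the labels perfectly, so it should be the one selected:

```python
print(get_best_split(X, y, [0, 1, 2, 3]))   # 0
```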
Key Takeaways
- Entropy measures impurity.
- Splitting separates data into left (1) and right (0).
- Information Gain tells us how good a split is.
- Best Split is the feature with the highest information gain.