3.3 Information Gain
1. Definitions
1.1 Entropy
Entropy is a measure of impurity or uncertainty in a dataset. It comes from information theory and is calculated as:

$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where $p_i$ is the proportion of examples in dataset $S$ that belong to class $i$ (out of $c$ classes).
- High entropy → Data is more mixed (uncertain).
- Low entropy → Data is more pure (certain).
Example:
- Dataset 1: 50% cats, 50% dogs → High entropy (uncertain).
- Dataset 2: 100% cats → Low entropy (pure).
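Plugging these two datasets into the formula above (a quick hand calculation):

$$
H(\text{Dataset 1}) = -\left(0.5\log_2 0.5 + 0.5\log_2 0.5\right) = 1 \text{ bit}, \qquad
H(\text{Dataset 2}) = -\left(1 \cdot \log_2 1\right) = 0 \text{ bits}
$$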
1.2 Information Gain (IG)
Information gain measures how much entropy decreases after splitting on a feature:

$$IG(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$

where:

- $H(S)$ = entropy before the split
- $S_v$ = subsets created by the split (one for each value $v$ of feature $A$)
- $H(S_v)$ = entropy of each subset
A higher information gain means the feature reduces uncertainty more, making it a better choice.
2. Toy Example
A Python example demonstrating entropy and information gain, computing entropy with NumPy and scipy.stats.entropy:
```python
import numpy as np
from scipy.stats import entropy

# Function to compute entropy
def calculate_entropy(y):
    class_counts = np.bincount(y)
    probabilities = class_counts / np.sum(class_counts)
    return entropy(probabilities, base=2)

# Function to compute information gain
def information_gain(X, y, feature_index):
    # Original entropy
    original_entropy = calculate_entropy(y)

    # Split dataset based on feature
    values = np.unique(X[:, feature_index])
    weighted_entropy = 0
    for value in values:
        subset_y = y[X[:, feature_index] == value]
        weighted_entropy += (len(subset_y) / len(y)) * calculate_entropy(subset_y)

    # Information Gain
    return original_entropy - weighted_entropy

# Sample dataset (Feature: Weather, Target: Play Tennis)
X = np.array([
    [0],  # Sunny
    [0],  # Sunny
    [1],  # Overcast
    [2],  # Rain
    [2],  # Rain
    [2],  # Rain
    [1],  # Overcast
    [0],  # Sunny
    [0],  # Sunny
    [2],  # Rain
    [0],  # Sunny
    [1],  # Overcast
    [1],  # Overcast
    [2]   # Rain
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])  # 1 = Play, 0 = No Play

# Compute Information Gain for the "Weather" feature
ig = information_gain(X, y, 0)
print(f"Information Gain for Weather: {ig:.4f}")
```
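As a sanity check (hand arithmetic, not part of the original script): the dataset has 9 Play / 5 No Play overall, and the Weather split yields Sunny = 2/3, Overcast = 4/0, Rain = 3/2 (Play/No Play), so the printed value should be close to:

$$
\begin{aligned}
H(S) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940,\\
H(S_{\text{Sunny}}) = H(S_{\text{Rain}}) &\approx 0.971, \qquad H(S_{\text{Overcast}}) = 0,\\
IG &\approx 0.940 - \tfrac{5}{14}(0.971) - \tfrac{4}{14}(0) - \tfrac{5}{14}(0.971) \approx 0.247.
\end{aligned}
$$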
- Entropy Calculation:
  - Measures uncertainty in `y` before and after splitting on "Weather".
- Information Gain:
  - Computes the reduction in entropy after the split.
  - A higher value means the feature is more important.
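For comparison, scikit-learn's DecisionTreeClassifier can apply the same entropy criterion when choosing splits. A minimal sketch, reusing the `X` and `y` arrays defined above; with only a single feature it mainly confirms that the root impurity matches `calculate_entropy(y)`:

```python
from sklearn.tree import DecisionTreeClassifier

# Fit a tree that picks splits by entropy reduction (information gain).
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# With a single feature, its importance is 1.0, and the impurity stored
# at the root node equals the dataset entropy computed above (~0.940).
print(clf.feature_importances_)
print(clf.tree_.impurity[0])
```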