
3.3 Information Gain

1. Definitions

1.1 Entropy

Entropy is a measure of impurity or uncertainty in a dataset. It comes from information theory and, for a dataset S containing c classes, is calculated as:

H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

where p_i is the probability of class i in S.

  • High entropy → Data is more mixed (uncertain).
  • Low entropy → Data is more pure (certain).

Example:

  • Dataset 1: 50% cats, 50% dogs → High entropy (uncertain).
  • Dataset 2: 100% cats → Low entropy (pure).
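Plugging these two example datasets into the entropy formula makes the difference concrete:

H_{\text{Dataset 1}} = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 \text{ bit (maximum uncertainty for two classes)}

H_{\text{Dataset 2}} = -(1 \cdot \log_2 1) = 0 \text{ bits (no uncertainty)}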

1.2 Information Gain (IG)

Information gain measures how much entropy decreases after splitting the dataset S on a feature A:

IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)

where:

  • H(S) = entropy before the split
  • S_v = the subsets created by splitting on feature A
  • H(S_v) = entropy of each subset, weighted by its relative size |S_v| / |S|

A higher information gain means the feature reduces uncertainty more, making it a better choice.
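As a quick illustration with made-up numbers: suppose a parent node holds 10 samples split evenly between two classes (H(S) = 1 bit), and a candidate feature separates them into one pure subset of 6 samples (entropy 0) and one evenly mixed subset of 4 samples (entropy 1). Then:

IG = 1.0 - \left(\tfrac{6}{10} \cdot 0 + \tfrac{4}{10} \cdot 1.0\right) = 0.6 \text{ bits}

so this feature removes most, but not all, of the uncertainty.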

2. Toy Example

A Python example demonstrating entropy and information gain using NumPy and SciPy.

import numpy as np
from scipy.stats import entropy

# Function to compute the entropy of a label array
def calculate_entropy(y):
    class_counts = np.bincount(y)
    probabilities = class_counts / np.sum(class_counts)
    return entropy(probabilities, base=2)

# Function to compute information gain for one feature
def information_gain(X, y, feature_index):
    # Entropy before the split
    original_entropy = calculate_entropy(y)
    # Split the dataset on each unique value of the feature
    values = np.unique(X[:, feature_index])
    weighted_entropy = 0
    for value in values:
        subset_y = y[X[:, feature_index] == value]
        weighted_entropy += (len(subset_y) / len(y)) * calculate_entropy(subset_y)
    # Information gain = reduction in entropy
    return original_entropy - weighted_entropy

# Sample dataset (Feature: Weather, Target: Play Tennis)
X = np.array([
    [0],  # Sunny
    [0],  # Sunny
    [1],  # Overcast
    [2],  # Rain
    [2],  # Rain
    [2],  # Rain
    [1],  # Overcast
    [0],  # Sunny
    [0],  # Sunny
    [2],  # Rain
    [0],  # Sunny
    [1],  # Overcast
    [1],  # Overcast
    [2],  # Rain
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])  # 1 = Play, 0 = No Play

# Compute information gain for the "Weather" feature (column 0)
ig = information_gain(X, y, 0)
print(f"Information Gain for Weather: {ig:.4f}")
  1. Entropy calculation:
    • calculate_entropy measures the uncertainty in the labels y, both before the split and within each subset created by splitting on “Weather”.
  2. Information gain:
    • information_gain computes the reduction in entropy achieved by the split.
    • A higher value means the feature is more informative for predicting the target.
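For the dataset above, the script should print an information gain of roughly 0.247 (pre-split entropy ≈ 0.940 bits, weighted post-split entropy ≈ 0.694 bits). As an optional cross-check, here is a minimal sketch using sklearn's DecisionTreeClassifier with criterion="entropy" on the same arrays; note that sklearn makes binary threshold splits on the numeric Weather encoding rather than the three-way split used above, so only the root entropy is directly comparable:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Same toy data as above (Feature: Weather, Target: Play Tennis)
X = np.array([[0], [0], [1], [2], [2], [2], [1],
              [0], [0], [2], [0], [1], [1], [2]])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

# criterion="entropy" makes sklearn use the same impurity measure as the manual code
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# The root node's impurity should match the pre-split entropy (~0.940 bits),
# and the root split uses feature index 0 (Weather), the only feature available
print(f"Root entropy: {clf.tree_.impurity[0]:.4f}")
print(f"Root split feature index: {clf.tree_.feature[0]}")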