
3.3 Information Gain

1. Definitions

1.1 Entropy

Entropy is a measure of impurity or uncertainty in a dataset. It comes from information theory and, for a dataset S containing c classes, is calculated as:

H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

where p_i is the probability of class i in S.

  • High entropy → Data is more mixed (uncertain).
  • Low entropy → Data is more pure (certain).

Example:

  • Dataset 1: 50% cats, 50% dogs → High entropy (uncertain).
  • Dataset 2: 100% cats → Low entropy (pure).
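Plugging these two example datasets into the entropy formula makes the difference concrete:

H_{\text{Dataset 1}} = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 \text{ bit (maximum uncertainty for two classes)}

H_{\text{Dataset 2}} = -(1 \cdot \log_2 1) = 0 \text{ bits (no uncertainty)}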

1.2 Information Gain (IG)

Information gain measures how much entropy decreases after splitting the dataset S on a feature A:

IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)

where:

  • H(S) = entropy before the split
  • S_v = the subsets created by splitting on feature A
  • H(S_v) = entropy of each subset, weighted by its relative size |S_v| / |S|

A higher information gain means the feature reduces uncertainty more, making it a better choice.
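As a quick illustration with made-up numbers: suppose a parent node holds 10 samples split evenly between two classes (H(S) = 1 bit), and a candidate feature separates them into one pure subset of 6 samples (entropy 0) and one evenly mixed subset of 4 samples (entropy 1). Then:

IG = 1.0 - \left(\tfrac{6}{10} \cdot 0 + \tfrac{4}{10} \cdot 1.0\right) = 0.6 \text{ bits}

so this feature removes most, but not all, of the uncertainty.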

2. Toy Example

A Python example demonstrating entropy and information gain using NumPy and SciPy.

import numpy as np
from scipy.stats import entropy

# Function to compute the entropy of a label array
def calculate_entropy(y):
    class_counts = np.bincount(y)
    probabilities = class_counts / np.sum(class_counts)
    return entropy(probabilities, base=2)

# Function to compute information gain for one feature
def information_gain(X, y, feature_index):
    # Entropy before the split
    original_entropy = calculate_entropy(y)
    # Split the dataset on each unique value of the feature
    values = np.unique(X[:, feature_index])
    weighted_entropy = 0
    for value in values:
        subset_y = y[X[:, feature_index] == value]
        weighted_entropy += (len(subset_y) / len(y)) * calculate_entropy(subset_y)
    # Information gain = reduction in entropy
    return original_entropy - weighted_entropy

# Sample dataset (Feature: Weather, Target: Play Tennis)
X = np.array([
    [0],  # Sunny
    [0],  # Sunny
    [1],  # Overcast
    [2],  # Rain
    [2],  # Rain
    [2],  # Rain
    [1],  # Overcast
    [0],  # Sunny
    [0],  # Sunny
    [2],  # Rain
    [0],  # Sunny
    [1],  # Overcast
    [1],  # Overcast
    [2],  # Rain
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])  # 1 = Play, 0 = No Play

# Compute information gain for the "Weather" feature (column 0)
ig = information_gain(X, y, 0)
print(f"Information Gain for Weather: {ig:.4f}")
  1. Entropy calculation:
    • calculate_entropy measures the uncertainty in the labels y, both before the split and within each subset created by splitting on “Weather”.
  2. Information gain:
    • information_gain computes the reduction in entropy achieved by the split.
    • A higher value means the feature is more informative for predicting the target.
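For the dataset above, the script should print an information gain of roughly 0.247 (pre-split entropy ≈ 0.940 bits, weighted post-split entropy ≈ 0.694 bits). As an optional cross-check, here is a minimal sketch using sklearn's DecisionTreeClassifier with criterion="entropy" on the same arrays; note that sklearn makes binary threshold splits on the numeric Weather encoding rather than the three-way split used above, so only the root entropy is directly comparable:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Same toy data as above (Feature: Weather, Target: Play Tennis)
X = np.array([[0], [0], [1], [2], [2], [2], [1],
              [0], [0], [2], [0], [1], [1], [2]])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

# criterion="entropy" makes sklearn use the same impurity measure as the manual code
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# The root node's impurity should match the pre-split entropy (~0.940 bits),
# and the root split uses feature index 0 (Weather), the only feature available
print(f"Root entropy: {clf.tree_.impurity[0]:.4f}")
print(f"Root split feature index: {clf.tree_.feature[0]}")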