2.1 CNN Basics
1. Convolutional Layers vs Fully-Connected Layers
Convolutional layers and fully-connected layers are both fundamental components of neural networks, especially in deep learning. They differ significantly in structure, function, and use cases.
1.1 Comparison Table
Aspect | Convolutional Layers | Fully-Connected Layers |
---|---|---|
Structure | Use filters (kernels) sliding across the input | Every neuron connected to all neurons in the previous layer |
Parameter Sharing | Yes: the same weights (filter) are reused across spatial locations | No: each connection has a unique weight |
Sparsity of Connections | Sparse: each neuron connects to a local region (receptive field) | Dense: all neurons are fully connected |
Input Type | Typically used with grid-like data (e.g., images) | Accepts flattened 1D vectors |
Spatial Information | Preserves spatial structure (e.g., image locality) | Discards spatial layout after flattening |
Parameters | Fewer, due to weight sharing | More; every connection has its own weight |
Computational Cost | Lower (per parameter) | Higher (scales with input and layer size) |
Translation Invariance | Yes: the same filter detects a feature regardless of its position | No: position invariance is lost after flattening |
Typical Use | Feature extraction in CNNs | Final classification layers |
1.2 Key Insights
- Convolutional Layers: Ideal for detecting spatial features such as edges or textures in images. Shared filters generalize across positions with fewer parameters.
- Fully-Connected Layers: Effective for decision making once features have been extracted; commonly used near the output of deep networks (e.g., classifiers in CNNs).
1.3 Example in a CNN (e.g., image classification):
- Conv Layers: Extract patterns (e.g., edges, shapes, object parts).
- Pooling Layers: Reduce size while keeping important features.
- Fully-Connected Layer(s): Combine high-level features to make predictions (e.g., dog vs. cat).
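To make this pipeline concrete, here is a minimal PyTorch sketch; the 3-channel 32×32 input, the channel counts, and the two-class output are illustrative assumptions rather than values fixed by the text.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # conv: extract low-level patterns
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pool: 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # conv: combine into higher-level patterns
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pool: 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully-connected head

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)            # flatten: spatial layout is discarded here
        return self.classifier(x)          # combine high-level features into class scores

logits = TinyCNN()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 2]), e.g. dog vs. cat scores
```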
2. Role of Pooling Layers in CNNs
Pooling layers are a key component of Convolutional Neural Networks (CNNs) that help in downsampling feature maps while preserving important information. They reduce the spatial dimensions (width and height) of the input, helping control overfitting, speed up computation, and extract dominant features.
2.1 Main Functions of Pooling Layers
Function | Description |
---|---|
Dimensionality Reduction | Shrinks feature maps, lowering computational cost and memory usage. |
Translation Invariance | Makes detection of features more robust to small shifts or distortions in input. |
Noise Suppression | Emphasizes dominant features and reduces the influence of less relevant details. |
Control Overfitting | Shrinking feature maps reduces the parameters of downstream layers, helping prevent the model from memorizing noise (the pooling layer itself has 0 parameters). |
Expanding the Receptive Field | Pooling increases the effective receptive field of neurons in subsequent layers. |
2.2 Types of Pooling
Pooling Type | Description | Use Case |
---|---|---|
Max Pooling | Takes the maximum value in each region | Most common; captures strongest activation |
Average Pooling | Takes the average value in each region | Smooths features; sometimes used in older architectures |
Global Average Pooling | Averages the entire spatial map into a single value per feature map | Used just before classification (e.g., in modern CNNs like GoogLeNet) |
2.3 Example: Max Pooling
Input Feature Map (4×4):
[[1, 3, 2, 4], [5, 6, 1, 2], [1, 2, 0, 1], [3, 4, 2, 0]]
After applying 2×2 Max Pooling with stride 2:
[[6, 4], [4, 2]]
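This result can be reproduced directly; a small sketch with PyTorch's `F.max_pool2d` (the extra batch and channel dimensions are just what the API expects):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1, 3, 2, 4],
                  [5, 6, 1, 2],
                  [1, 2, 0, 1],
                  [3, 4, 2, 0]], dtype=torch.float32)

# add batch and channel dimensions: (4, 4) -> (1, 1, 4, 4)
y = F.max_pool2d(x[None, None], kernel_size=2, stride=2)
print(y[0, 0])  # tensor([[6., 4.], [4., 2.]])
```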
2.4 Where Pooling Layers Appear
Pooling layers typically follow one or more convolutional layers. In modern CNNs, they're used less frequently, as strided convolutions or attention mechanisms sometimes replace them.
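As a rough sketch of that trade-off, both of the following halve the spatial resolution of a feature map; the channel count of 16 is an arbitrary choice for illustration:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)

pooled = nn.MaxPool2d(kernel_size=2, stride=2)(x)                   # no parameters
strided = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)(x)  # learned downsampling

print(pooled.shape, strided.shape)  # both (1, 16, 16, 16)
```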
3. Receptive Field of Neurons
3.1 Concept: Receptive Field of Neurons
In a neural network, especially in CNNs, the receptive field of a neuron refers to the region of the input that influences the neuron's activation.
- In the first convolutional layer, it's just the size of the filter (e.g., 3×3).
- In deeper layers, a neuron's receptive field grows, since it depends on multiple earlier neurons, each of which sees its own portion of the input.
3.2 Effective Receptive Field
The effective receptive field is the total area in the input image that affects a specific output neuron after multiple layers.
3.3 Example
Given:
- Three 3×3 convolutional layers
- Stride = 1
- No padding (assumed unless stated otherwise)
Calculation of Effective Receptive Field
For n stacked conv layers with kernel size k and stride 1, the receptive field is RF = 1 + n x (k - 1).
Effective receptive field after three 3×3 layers: 1 + 3 x (3 - 1) = 7.
So, the effective receptive field is 7×7.
- The number of parameters of three 3×3 conv layers is:
  - Each conv filter: 3 x 3 x C + 1 = 9C + 1 (assuming C input channels; the +1 is the bias)
  - Each layer: (9C + 1) x C = 9C² + C (assuming C output channels as well)
  - Three layers: 3 x (9C² + C) = 27C² + 3C
- The number of parameters of one 7×7 conv layer is:
  - Each filter: 7 x 7 x C + 1 = 49C + 1
  - Each layer: (49C + 1) x C = 49C² + C
Since 27C² + 3C < 49C² + C for any C ≥ 1, stacking smaller conv filters achieves the same receptive field with fewer parameters!
Summary
Item | Value |
---|---|
Effective Receptive Field | 7×7 (three stacked 3×3 layers, stride 1) |
Number of Parameters | 27C² + 3C (three 3×3 layers) vs. 49C² + C (one 7×7 layer) |
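A quick sanity check of these numbers with PyTorch; the channel count C = 64 is an arbitrary choice for illustration:

```python
import torch.nn as nn

n, k, C = 3, 3, 64
print(1 + n * (k - 1))  # 7 -> effective receptive field of 7x7

three_3x3 = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3) for _ in range(3)])
one_7x7 = nn.Conv2d(C, C, kernel_size=7)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(three_3x3))  # 110784 = 27C^2 + 3C
print(count(one_7x7))    # 200768 = 49C^2 + C
```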
4. Issues in CNN Model Training
4.1 Problem 01 (Underfitting): Possible Causes and Solutions
1. Model is Too Simple
- Cause: Not enough layers, filters, or complexity to learn the patterns in your data.
- Solution:
- Add more convolutional layers.
- Increase the number of filters per layer.
- Use deeper architectures (e.g., ResNet, VGG).
2. Insufficient Training Time
- Cause: The model hasn't trained for enough epochs.
- Solution:
- Increase the number of epochs.
- Monitor the training/validation loss curves.
3. Learning Rate is Too High or Too Low
- Cause: Poor optimization due to a bad learning rate.
- Solution:
- Try a smaller learning rate (e.g., 1e-4 or 1e-5).
- Use learning rate scheduling or adaptive optimizers like Adam.
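A minimal sketch of these adjustments in PyTorch; the placeholder model, learning rate, and scheduler settings are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, kernel_size=3)  # placeholder for the real network

# adaptive optimizer with a smaller learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# reduce the learning rate when validation loss stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=3)

# inside the training loop, after computing the validation loss:
# scheduler.step(val_loss)
```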
4. Input Data Issues
- Cause: Bad-quality data, unnormalized inputs, or incorrect labels.
- Solution:
- Normalize/standardize input images.
- Check dataset for label errors or imbalances.
- Use data augmentation (e.g., flipping, cropping, color jittering).
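A sketch of such a preprocessing pipeline with torchvision; the crop size and the ImageNet normalization statistics are assumed choices:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                      # random crop + resize
    transforms.RandomHorizontalFlip(),                      # flip augmentation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # color jittering
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],        # standardize inputs
                         std=[0.229, 0.224, 0.225]),
])
```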
5. Inappropriate Loss Function or Evaluation Metric
- Cause: The loss function is not suitable for the task.
- Solution:
- Use cross-entropy loss for classification.
- Make sure accuracy is being computed correctly (e.g., after applying softmax or argmax).
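A short PyTorch sketch of this point: `CrossEntropyLoss` expects raw logits (no softmax beforehand), and accuracy can be computed via argmax; the batch size and class count here are arbitrary:

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 10)           # raw model outputs for 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))   # integer class labels

loss = nn.CrossEntropyLoss()(logits, labels)               # softmax applied internally
accuracy = (logits.argmax(dim=1) == labels).float().mean()
print(loss.item(), accuracy.item())
```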
6. Over-regularization
- Cause: Too much dropout, weight decay, or early stopping.
- Solution:
- Reduce dropout rate or regularization strength.
- Allow more training before early stopping.
In Practice: Debugging Strategy
- Overfit a small batch: Train on a small number of samples and check whether the model can overfit them. If not, there's a bug or model design flaw.
- Visualize activations and filters: Check if the CNN is learning any meaningful features.
- Try a pretrained model: Fine-tune a known architecture like ResNet on your data as a sanity check.
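A sketch of the "overfit a small batch" check, using a deliberately simple placeholder model and random data; in practice you would plug in your own model and one real batch:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # placeholder model
images = torch.randn(16, 3, 32, 32)                               # one fixed small batch
labels = torch.randint(0, 10, (16,))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for step in range(200):                 # train repeatedly on the same batch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

print(loss.item())  # should steadily approach 0; if not, suspect a bug or design flaw
```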
4.2 Adjusting the Loss Function or Optimizer
Yes, adjusting the loss function or optimizer can help reduce underfitting, but only in certain situations.
1. Adjusting the Loss Function
Case | Explanation | Impact on Underfitting |
---|---|---|
Wrong loss function | Using mean squared error (MSE) for classification instead of cross-entropy | May cause poor learning; switching to the correct loss helps |
Class imbalance | The loss doesn't reflect the imbalance (e.g., vanilla cross-entropy) | Use a weighted loss (e.g., focal loss or weighted cross-entropy) to help the model focus on hard examples |
Correct loss, but poor performance | Adjusting won't help much unless the loss is fundamentally mismatched | Limited effect on underfitting |
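For the class-imbalance case, a minimal sketch of weighted cross-entropy in PyTorch; the three-class weights are made-up illustration values (e.g., derived from inverse class frequencies):

```python
import torch
import torch.nn as nn

class_weights = torch.tensor([0.2, 0.3, 0.5])        # larger weight for rarer classes
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 3)
labels = torch.randint(0, 3, (4,))
print(criterion(logits, labels))
```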
2. Adjusting the Optimizer
Optimizer | Behavior | Impact on Underfitting |
---|---|---|
SGD | May be too slow or get stuck | Switching to Adam or RMSProp may speed up learning |
Adam / RMSProp | Adaptive learning rates | Can help escape flat regions and converge faster |
Learning Rate | Too high skips minima; too low learns too slowly | Tuning this is critical for fixing underfitting |
So yes, changing the optimizer or tuning its hyperparameters (especially the learning rate) can significantly help if your model isn't learning well.
Summary
Action | Helps Underfitting? | When to Try |
---|---|---|
Use the correct loss function | Yes | If you're using the wrong one (e.g., MSE instead of cross-entropy) |
Tune the optimizer | Yes | If training is very slow or the loss isn't decreasing |
Change the learning rate | Yes | If gradients aren't flowing effectively |
4.3 Problem 02: Causes and Solutions for Overfitting
1. Not Enough Training Data
- Cause: The model memorizes the limited training examples.
- Solution:
- Collect more data if possible.
- Use data augmentation (e.g., random crop, flip, rotate, color jitter).
- Try synthetic data generation if feasible.
2. Lack of Regularization
- Cause: The model learns noise or irrelevant details from the training set.
- Solution:
- Apply dropout (e.g., 0.3–0.5 between layers).
- Use L2 regularization (weight decay).
- Use early stopping based on validation loss.
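A sketch of these regularizers in PyTorch; the layer sizes, the dropout rate of 0.5, and the weight decay of 1e-4 are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),      # randomly zeroes activations during training
    nn.Linear(128, 10),
)

# weight_decay adds an L2 penalty on the weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```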
3. Model is Too Complex
- Cause: Too many parameters relative to the amount of data.
- Solution:
- Reduce number of layers or filters.
- Try a simpler architecture.
- Apply model pruning or reduce width/depth.
4. Training Too Long
- Cause: The model starts to memorize the training data after a point.
- Solution:
- Use early stopping on validation accuracy/loss.
- Track the gap between training and validation curves.
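A minimal sketch of early stopping on validation loss; the fake validation loss stands in for a real evaluation pass, and the patience of 5 epochs is an arbitrary choice:

```python
import random

def fake_validation_loss(epoch):
    # placeholder for a real validation pass over a held-out set
    return 1.0 / (epoch + 1) + 0.05 * random.random()

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    # ... train for one epoch here ...
    val_loss = fake_validation_loss(epoch)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # validation loss stopped improving
            break

print(f"stopped after epoch {epoch}, best validation loss {best_val:.3f}")
```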
5. Train-Validation Mismatch
- Cause: Different distributions between training and validation data (data leakage, preprocessing issues).
- Solution:
- Ensure consistent preprocessing across train and test sets.
- Check for data leakage (e.g., same subjects in train/test).
6. Advanced Techniques (optional)
- Transfer learning: Use pretrained models and fine-tune.
- Ensembling: Combine predictions from multiple models.
- Label smoothing: Reduce confidence on predictions to improve generalization.
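A sketch of the transfer-learning and label-smoothing ideas with torchvision and PyTorch (assuming a recent version of both); the two-class head and the smoothing value of 0.1 are illustrative choices, and the pretrained weights are downloaded on first use:

```python
import torch.nn as nn
from torchvision import models

# load an ImageNet-pretrained ResNet-18 and replace its classification head
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

# label smoothing softens the one-hot targets to reduce over-confidence
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```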
Debugging Checklist
- Does validation loss increase while training loss decreases?
- Are you using augmentation during training?
- Are preprocessing steps the same across training and test sets?