
2.1 CNN Basics

1. Convolutional Layers vs Fully-connected Layers

Convolutional layers and fully-connected layers are both fundamental components of neural networks, especially in deep learning. They differ significantly in structure, function, and use cases.

1.1 🔍 Comparison Table

| Aspect | Convolutional Layers | Fully-Connected Layers |
|---|---|---|
| Structure | Use filters (kernels) sliding across the input | Every neuron connected to all neurons in the previous layer |
| Parameter Sharing | Yes: the same weights (filter) are reused across spatial locations | No: each connection has a unique weight |
| Sparsity of Connections | Sparse: each neuron connects to a local region (receptive field) | Dense: all neurons are fully connected |
| Input Type | Typically used with grid-like data (e.g., images) | Accepts 1D vectors |
| Spatial Information | Preserves spatial structure (e.g., image locality) | Discards spatial layout after flattening |
| Parameters | Fewer, due to weight sharing | More: every connection has its own weight |
| Computational Cost | Lower (per parameter) | Higher (scales with input and layer size) |
| Translation Invariance | Yes: detects features regardless of position | No: position sensitivity is lost |
| Typical Use | Feature extraction in CNNs | Final classification layers |

1.2 🧠 Key Insights

  • Convolutional Layers: Ideal for detecting spatial features like edges or textures in images. Filters help generalize better across positions with fewer parameters.

  • Fully-Connected Layers: Effective for decision making once features have been extracted. Commonly used near the output of deep networks (e.g., classifiers in CNNs).

1.3 🧠 Example in a CNN (e.g., image classification):

  1. Conv Layers: Extract patterns (e.g., edges, shapes, object parts).
  2. Pooling Layers: Reduce size while keeping important features.
  3. Fully-Connected Layer(s): Combine high-level features to make predictions (e.g., dog vs. cat).
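
To make this pipeline concrete, here is a minimal sketch of the conv → pool → fully-connected flow, assuming PyTorch (the original does not name a framework); the 32×32 RGB input size, channel counts, and the 2-class output (e.g., dog vs. cat) are illustrative choices only.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Illustrative conv -> pool -> fully-connected pipeline (hypothetical sizes)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 1. conv: extract local patterns
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 2. pool: downsample 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # 3. FC: combine features into class scores

    def forward(self, x):
        x = self.features(x)       # spatial feature maps
        x = torch.flatten(x, 1)    # flatten before the fully-connected layer
        return self.classifier(x)

logits = TinyCNN()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 2])
```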

2. Role of pooling layers in CNN

Pooling layers are a key component of Convolutional Neural Networks (CNNs) that help in downsampling feature maps while preserving important information. They reduce the spatial dimensions (width and height) of the input, helping control overfitting, speed up computation, and extract dominant features.

2.1 🎯 Main Functions of Pooling Layers

| Function | Description |
|---|---|
| Dimensionality Reduction | Shrinks feature maps, lowering computational cost and memory usage. |
| Translation Invariance | Makes feature detection more robust to small shifts or distortions in the input. |
| Noise Suppression | Emphasizes dominant features and reduces the influence of less relevant details. |
| Overfitting Control | Reduces complexity without adding any learnable parameters (pooling layers have none), helping prevent the model from memorizing noise. |
| Receptive Field Expansion | Increases the effective receptive field of neurons in subsequent layers. |

2.2 🧪 Types of Pooling

| Pooling Type | Description | Use Case |
|---|---|---|
| Max Pooling | Takes the maximum value in each region | Most common; captures the strongest activation |
| Average Pooling | Takes the average value in each region | Smooths features; sometimes used in older architectures |
| Global Average Pooling | Averages each entire spatial map into a single value per feature map | Used just before classification (e.g., in modern CNNs like GoogLeNet) |
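
As a small illustration of global average pooling, the sketch below (assuming PyTorch; the 512×7×7 feature-map shape is a placeholder) averages each feature map down to a single value:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 512, 7, 7)      # hypothetical feature maps: 512 channels of 7x7
gap = nn.AdaptiveAvgPool2d(1)      # global average pooling: one value per channel
out = gap(x).flatten(1)            # shape (1, 512), ready for a final linear classifier
print(out.shape)
```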

2.3 📏 Example: Max Pooling

Input Feature Map (4Γ—4):

[[1, 3, 2, 4],
[5, 6, 1, 2],
[1, 2, 0, 1],
[3, 4, 2, 0]]

After applying 2Γ—2 Max Pooling with stride 2:

[[6, 4],
[4, 2]]
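
The same result can be reproduced directly with a pooling call; a minimal check, assuming PyTorch:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 3., 2., 4.],
                  [5., 6., 1., 2.],
                  [1., 2., 0., 1.],
                  [3., 4., 2., 0.]]).reshape(1, 1, 4, 4)  # (batch, channel, H, W)

out = F.max_pool2d(x, kernel_size=2, stride=2)
print(out.reshape(2, 2))   # tensor([[6., 4.], [4., 2.]])
```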

2.4 🔚 Where Pooling Layers Appear

Pooling layers typically follow one or more convolutional layers. In modern CNNs, they’re used less frequently, as strided convolutions or attention mechanisms sometimes replace them.

3. Receptive Field of Neurons

3.1 🎯 Concept: Receptive Field of Neurons

In a neural network, especially CNNs, the receptive field of a neuron refers to the region of the input that influences the neuron’s activation.

  • In the first convolutional layer, it’s just the size of the filter (e.g., 3×3).
  • In deeper layers, a neuron’s receptive field grows, since it depends on multiple earlier neurons, each of which sees a portion of the input.

3.2 🧠 Effective Receptive Field

The effective receptive field is the total area in the input image that affects a specific output neuron after multiple layers.

3.3 Example

πŸ“ Given:

  • Three 3Γ—3 convolutional layers
  • Stride = 1
  • No padding (assumed unless stated otherwise)

🔎 Calculation of Effective Receptive Field

For stacked convolutional layers with kernel size k and stride 1, each layer adds (k - 1) to the receptive field:

Effective receptive field after L layers: RF = 1 + L x (k - 1)

Here k = 3 and L = 3, so RF = 1 + 3 x 2 = 7.

So, the effective receptive field is 7×7.
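
The growth can also be computed layer by layer; the small helper below is a sketch of the standard receptive-field recurrence (the function name is ours, not from any library):

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of stacked conv layers: rf grows by (k - 1) * jump; jump scales by the stride."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([3, 3, 3]))  # 7 -> three stride-1 3x3 layers see a 7x7 region
```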

Parameter Count: Three 3×3 Conv Layers vs. One 7×7 Conv Layer
  • The number of parameters of three stacked 3×3 conv layers is:
    • Each conv filter: 3 x 3 x C + 1 = 9C + 1 (assuming C input channels)
    • Each layer: (9C + 1) x C = 9C² + C (assuming C output channels as well)
    • Three layers: 3 x (9C² + C) = 27C² + 3C
  • The number of parameters of one 7×7 conv layer is:
    • Each filter: 7 x 7 x C + 1 = 49C + 1
    • Each layer: (49C + 1) x C = 49C² + C
  • Since 27C² + 3C < 49C² + C for any C ≥ 1, stacking smaller conv filters covers the same receptive field with fewer parameters!

✅ Summary

| Item | Value |
|---|---|
| Effective Receptive Field | 7×7 |
| Number of Parameters (assuming same in/out channels C) | Three 3×3 layers: 27C² + 3C; one 7×7 layer: 49C² + C |
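
These counts can be double-checked by instantiating the layers and summing their parameters; a sketch assuming PyTorch, with C = 64 chosen arbitrarily:

```python
import torch.nn as nn

C = 64
three_3x3 = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3) for _ in range(3)])
one_7x7 = nn.Conv2d(C, C, kernel_size=7)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(three_3x3))  # 3 * (9*64 + 1) * 64 = 110784
print(count(one_7x7))    # (49*64 + 1) * 64   = 200768
```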

4. Issues in CNN Model Training

4.1 🔍 Problem 01 (Underfitting): Possible Causes and Solutions

1. Model is Too Simple

  • Cause: Not enough layers, filters, or complexity to learn the patterns in your data.

  • Solution:

    • Add more convolutional layers.
    • Increase the number of filters per layer.
    • Use deeper architectures (e.g., ResNet, VGG).

2. Insufficient Training Time

  • Cause: The model hasn’t trained for enough epochs.

  • Solution:

    • Increase the number of epochs.
    • Monitor the training/validation loss curves.

3. Learning Rate is Too High or Too Low

  • Cause: Poor optimization due to a bad learning rate.

  • Solution:

    • Try a smaller learning rate (e.g., 1e-4 or 1e-5).
    • Use learning rate scheduling or adaptive optimizers like Adam.
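
As an illustration, a sketch of switching to Adam with a smaller learning rate plus a plateau-based scheduler, assuming PyTorch (the placeholder model stands in for your CNN):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder for your CNN

# Adam with a smaller learning rate often learns faster here than plain SGD
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Drop the LR by 10x whenever the validation loss stops improving for 3 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min",
                                                       factor=0.1, patience=3)

# In the training loop, after computing val_loss each epoch:
# scheduler.step(val_loss)
```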

4. Input Data Issues

  • Cause: Bad quality data, unnormalized inputs, or incorrect labels.

  • Solution:

    • Normalize/standardize input images.
    • Check dataset for label errors or imbalances.
    • Use data augmentation (e.g., flipping, cropping, color jittering).
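
For example, a sketch of an input pipeline with normalization and light augmentation, assuming torchvision; the mean/std statistics are placeholders you would replace with your dataset's own:

```python
from torchvision import transforms

# Augmentation + normalization for training images (placeholder statistics)
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Validation/test images get the same normalization but no augmentation
eval_tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```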

5. Inappropriate Loss Function or Evaluation Metric

  • Cause: Loss function not suitable for the task.

  • Solution:

    • Use cross-entropy loss for classification.
    • Make sure accuracy is being computed correctly (e.g., after applying softmax or argmax).
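
One common pitfall is worth spelling out: with PyTorch's nn.CrossEntropyLoss (assumed here), the model should output raw logits (the loss applies log-softmax internally), and accuracy comes from an argmax over those logits. A minimal sketch:

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 10)          # model outputs: raw logits for 8 samples, 10 classes
labels = torch.randint(0, 10, (8,))  # integer class labels

criterion = nn.CrossEntropyLoss()    # applies log-softmax internally, so no softmax before it
loss = criterion(logits, labels)

preds = logits.argmax(dim=1)         # predicted class = index of the largest logit
accuracy = (preds == labels).float().mean()
print(loss.item(), accuracy.item())
```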

6. Over-regularization

  • Cause: Too much dropout, weight decay, or early stopping.

  • Solution:

    • Reduce dropout rate or regularization strength.
    • Allow more training before early stopping.

🧠 In Practice: Debugging Strategy

  1. Overfit a small batch: Train on a small number of samples and check if the model can overfit. If not, there’s a bug or model design flaw (see the sketch after this list).
  2. Visualize activations and filters: Check if the CNN is learning any meaningful features.
  3. Try a pretrained model: Fine-tune a known architecture like ResNet on your data as a sanity check.
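
A sketch of the "overfit a small batch" check from step 1, assuming PyTorch; the tiny linear model and random batch are placeholders, and the point is only that the loss should be driven close to zero:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder model
x = torch.randn(16, 3, 32, 32)                                    # one small, fixed batch
y = torch.randint(0, 10, (16,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):               # repeatedly fit the same batch
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())  # should be close to 0 if the training code is sound
```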

4.2 Adjusting the loss function or optimizer

✅ Adjusting the loss function or optimizer can help reduce underfitting, but only in certain situations.

πŸ” 1. Adjusting the Loss Function

| Case | Explanation | Impact on Underfitting |
|---|---|---|
| ✅ Wrong loss function | Using mean squared error (MSE) for classification instead of cross-entropy | May cause poor learning; switching to the correct loss helps |
| ✅ Class imbalance | The loss doesn’t reflect the imbalance (e.g., using vanilla cross-entropy) | Use a weighted loss (e.g., focal loss or weighted cross-entropy) to help the model focus on hard examples |
| ⚠️ Correct loss, but poor performance | Adjusting it won’t help much unless the loss is fundamentally mismatched | Limited effect on underfitting |

🔧 2. Adjusting the Optimizer

| Optimizer / Setting | Behavior | Impact on Underfitting |
|---|---|---|
| SGD | May be too slow or get stuck | Switching to Adam or RMSProp may speed up learning |
| Adam / RMSProp | Adaptive learning rates | Can help escape flat regions and converge faster |
| Learning Rate | Too high → skips minima; too low → learns too slowly | Tuning this is critical for fixing underfitting |

✅ So yes, changing the optimizer or tuning its hyperparameters (especially the learning rate) can significantly help if your model isn’t learning well.

🧠 Summary

| Action | Helps Underfitting? | When to Try |
|---|---|---|
| ✔️ Use the correct loss function | ✅ Yes | If you’re using the wrong one (e.g., MSE instead of cross-entropy) |
| ✔️ Tune the optimizer | ✅ Yes | If training is very slow or the loss isn’t decreasing |
| ✔️ Change the learning rate | ✅ Yes | If gradients aren’t flowing effectively |

4.3 🔍 Problem 02: Causes and Solutions for Overfitting

1. Not Enough Training Data

  • Cause: Model memorizes the limited training examples.

  • Solution:

    • Collect more data if possible.
    • Use data augmentation (e.g., random crop, flip, rotate, color jitter).
    • Try synthetic data generation if feasible.

2. Lack of Regularization

  • Cause: Model learns noise or irrelevant details from the training set.

  • Solution:

    • Apply dropout (e.g., 0.3–0.5 between layers).
    • Use L2 regularization (weight decay).
    • Use early stopping based on validation loss.
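
For illustration, a sketch of dropout between layers plus L2 regularization via the optimizer's weight decay, assuming PyTorch; the layer sizes are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),          # randomly zero 50% of activations during training
    nn.Linear(256, 10),
)

# weight_decay applies L2 regularization to the parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```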

3. Model is Too Complex

  • Cause: Too many parameters relative to the amount of data.

  • Solution:

    • Reduce number of layers or filters.
    • Try a simpler architecture.
    • Apply model pruning or reduce width/depth.

4. Training Too Long

  • Cause: Model starts to memorize training data after a point.

  • Solution:

    • Use early stopping on validation accuracy/loss.
    • Track the gap between training and validation curves.
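
Early stopping is usually a few lines of bookkeeping around the training loop; the sketch below simulates it with a made-up list of per-epoch validation losses standing in for a real validation pass:

```python
# Hypothetical per-epoch validation losses, standing in for a real validation step
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.60, 0.61]

best_val, patience, wait = float("inf"), 3, 0
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val:       # improvement: remember it and reset the counter
        best_val, wait = val_loss, 0
        # torch.save(model.state_dict(), "best.pt")  # would checkpoint the best model here
    else:
        wait += 1
        if wait >= patience:      # no improvement for `patience` epochs: stop training
            print(f"early stopping at epoch {epoch}, best val loss {best_val}")
            break
```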

5. Train-Validation Mismatch

  • Cause: Different distributions (data leakage, preprocessing issues).

  • Solution:

    • Ensure consistent preprocessing across train and test sets.
    • Check for data leakage (e.g., same subjects in train/test).

6. Advanced Techniques (optional)

  • Transfer learning: Use pretrained models and fine-tune.
  • Ensembling: Combine predictions from multiple models.
  • Label smoothing: Reduce confidence on predictions to improve generalization.
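
Two of these are nearly one-liners in recent PyTorch/torchvision (assumed here): label smoothing is an argument of nn.CrossEntropyLoss, and transfer learning typically starts from a pretrained backbone with its final layer replaced. A brief sketch:

```python
import torch.nn as nn
from torchvision import models

# Label smoothing: spread a little probability mass over the non-target classes
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Transfer learning: load a pretrained backbone and replace the final classifier
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # e.g., 2 classes for dog vs. cat
```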

🧠 Debugging Checklist

✅ Does validation loss increase while training loss decreases?
✅ Are you using augmentation during training?
✅ Are preprocessing steps the same across training and test sets?