
1.3 MLP

1. MLP (Multi-Layer Perceptron)

An MLP (Multilayer Perceptron) is a type of artificial neural network composed of layers of neurons. It's one of the simplest and most foundational neural network architectures.

1.1 🧱 Structure of an MLP

At a high level, an MLP has:

  1. Input layer – takes the input features.
  2. One or more hidden layers – where computation happens using weights and activation functions.
  3. Output layer – produces the final prediction (regression value or classification label).

Each layer is fully connected (i.e., each neuron in one layer connects to every neuron in the next). Hence MLPs are often called fully connected networks or dense networks.
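
As a quick illustration in PyTorch, a dense layer is just a weight matrix of shape (out_features, in_features) plus a bias vector, one weight per connection (the layer sizes here are arbitrary):

import torch.nn as nn

layer = nn.Linear(3, 5)      # 3 inputs fully connected to 5 neurons
print(layer.weight.shape)    # torch.Size([5, 3]): one weight per input/neuron pair
print(layer.bias.shape)      # torch.Size([5]): one bias per neuron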

1.2 🧮 How does it work?

Each neuron in a layer performs:

  y = f(w · x + b)

Where:

  • x: input vector
  • w: weight vector
  • b: bias
  • f: activation function (e.g., ReLU, sigmoid)
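
A minimal sketch of this computation for a single neuron (the input, weight, and bias values below are arbitrary placeholders):

import torch

x = torch.tensor([0.1, 0.2, 0.3])    # input vector
w = torch.tensor([0.5, -0.3, 0.8])   # weight vector
b = torch.tensor(0.1)                # bias
y = torch.relu(w @ x + b)            # activation applied to the weighted sum
print(y)                             # a single scalar output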

1.3 🔄 Example Flow

Input → [Dense Layer + Activation] → [Dense Layer + Activation] → Output

E.g., for a 3-layer MLP:

x (input)
↓
Layer 1: W1·x + b1 → ReLU
↓
Layer 2: W2·h1 + b2 → ReLU
↓
Output Layer: W3·h2 + b3 → Output
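
A minimal PyTorch sketch of this flow with made-up sizes (3 inputs, two hidden layers of 4 units, 1 output); the weights are random placeholders:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3)                       # one input sample with 3 features

W1, b1 = torch.randn(4, 3), torch.randn(4)  # layer 1 parameters
W2, b2 = torch.randn(4, 4), torch.randn(4)  # layer 2 parameters
W3, b3 = torch.randn(1, 4), torch.randn(1)  # output layer parameters

h1 = F.relu(x @ W1.T + b1)   # Layer 1: W1·x + b1 → ReLU
h2 = F.relu(h1 @ W2.T + b2)  # Layer 2: W2·h1 + b2 → ReLU
out = h2 @ W3.T + b3         # Output layer: W3·h2 + b3 (no activation)
print(out)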

1.4 🧠 What can MLPs do?

Given enough hidden units, MLPs can approximate any continuous function on a compact domain (the Universal Approximation Theorem), and are used for:

  • Regression
  • Classification
  • Function approximation
  • Time-series prediction (when used with context)

1.5 🧠 MLP Diagram

Input Layer        Hidden Layer(s)        Output Layer

 [x₁] ──┐              o     o
 [x₂] ──┼──────►       o ... o   ──────►      [ŷ]
 [x₃] ──┘              o     o
                 (e.g. ReLU activation)

Each circle is a neuron. Each layer is fully connected to the next. Hidden layers apply a nonlinear function like ReLU.

1.6 🔧 Code Example

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # input → hidden
        self.fc2 = nn.Linear(hidden_size, output_size)  # hidden → output

    def forward(self, x):
        x = F.relu(self.fc1(x))  # activation after first layer
        x = self.fc2(x)          # no activation if doing regression
        return x

# Example usage
model = MLP(input_size=3, hidden_size=5, output_size=1)  # 3 inputs → 5 hidden → 1 output
input_data = torch.tensor([[0.1, 0.2, 0.3]])
output = model(input_data.float())
print(output)

You can tweak:

  • output_size = 1 for regression
  • output_size = 2 or more with softmax for classification
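
For instance, continuing from the code above, a classification variant might look like this (3 classes is an arbitrary choice here; note that softmax is usually applied only when you want probabilities, since nn.CrossEntropyLoss works on raw logits):

clf = MLP(input_size=3, hidden_size=5, output_size=3)  # one output neuron per class
logits = clf(input_data.float())                       # raw class scores (logits)
probs = F.softmax(logits, dim=1)                       # probabilities summing to 1
print(probs)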

2. MLP for Regression vs Classification

Adapting Multilayer Perceptrons (MLPs) for regression vs. classification tasks mainly involves changes in:

  1. Output layer architecture
  2. Activation functions
  3. Loss functions

2.1 ๐Ÿ” Shared parts

Regardless of the task, MLPs usually have:

  • Input layer (based on feature size)
  • One or more hidden layers
  • Non-linear activations (e.g., ReLU, tanh) in hidden layers

2.2 🔵 For Regression Tasks

1. Output layer:

  • Usually 1 neuron (or more if multi-output regression).
  • No activation function (i.e., linear output): ŷ = W·h + b

2. Loss function:

  • Mean Squared Error (MSE) or Mean Absolute Error (MAE), e.g. MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

3. Interpretation:

  • Output is a continuous value, modeling things like temperature, price, etc.
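
A minimal sketch of a regression training step, reusing the MLP class from section 1.6 (the data below is random dummy data):

import torch
import torch.nn as nn

model = MLP(input_size=3, hidden_size=5, output_size=1)   # single linear output neuron
criterion = nn.MSELoss()                                   # mean squared error
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3)    # 8 samples, 3 features
y = torch.randn(8, 1)    # continuous targets

pred = model(x)          # linear output, no final activation
loss = criterion(pred, y)
loss.backward()
optimizer.step()
print(loss.item())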

2.3 🔴 For Classification Tasks

1. Binary Classification:

Output layer
  • 1 neuron
  • Sigmoid activation to squash the output into [0, 1]: σ(z) = 1 / (1 + e⁻ᶻ)
Loss function
  • Binary Cross-Entropy: L = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)]

2. Multi-class Classification:

Output layer
  • One neuron per class (i.e., size = number of classes)
  • Softmax activation to get probabilities that sum to 1: softmax(zᵢ) = e^(zᵢ) / Σⱼ e^(zⱼ)
Loss function
  • Categorical Cross-Entropy: L = −Σᵢ yᵢ·log(ŷᵢ)
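
A minimal sketch of both setups, again reusing the MLP class from section 1.6 and random dummy data. In PyTorch the sigmoid/softmax is typically folded into the loss: nn.BCEWithLogitsLoss and nn.CrossEntropyLoss both expect raw logits:

import torch
import torch.nn as nn

x = torch.randn(8, 3)                                  # 8 samples, 3 features

# Binary classification: 1 output neuron, sigmoid folded into the loss
binary_model = MLP(input_size=3, hidden_size=5, output_size=1)
y_bin = torch.randint(0, 2, (8, 1)).float()            # labels in {0, 1}
print(nn.BCEWithLogitsLoss()(binary_model(x), y_bin))

# Multi-class classification: one neuron per class, softmax folded into the loss
multi_model = MLP(input_size=3, hidden_size=5, output_size=4)  # e.g. 4 classes
y_cls = torch.randint(0, 4, (8,))                      # integer class labels
print(nn.CrossEntropyLoss()(multi_model(x), y_cls))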

2.4 🧠 Summary

Task            Output Neurons   Output Activation   Loss Function
Regression      1 (or more)      None (linear)       MSE / MAE
Binary class.   1                Sigmoid             Binary Cross-Entropy
Multi-class     # of classes     Softmax             Categorical Cross-Entropy