1.3 MLP
1. MLP (Multi-Layer Perceptron)
An MLP (Multilayer Perceptron) is a type of artificial neural network composed of layers of neurons. It's one of the simplest and most foundational neural network architectures.
1.1 Structure of an MLP
At a high level, an MLP has:
- Input layer – takes the input features.
- One or more hidden layers – where computation happens using weights and activation functions.
- Output layer – produces the final prediction (regression value or classification label).
Each layer is fully connected (i.e., each neuron in one layer connects to every neuron in the next). Hence MLPs are often called fully connected networks or dense networks.
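In PyTorch, such a dense layer is nn.Linear. As a small sketch (the sizes here are arbitrary), its weight matrix has one entry for every input–output pair:

```python
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=5)  # fully connected: 3 inputs → 5 neurons
print(layer.weight.shape)  # torch.Size([5, 3]) — one weight per input–output pair
print(layer.bias.shape)    # torch.Size([5])    — one bias per neuron
```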
1.2 How does it work?
Each neuron in a layer computes:

y = f(w · x + b)

where:

- x: input vector
- w: weight vector
- b: bias
- f: activation function (e.g., ReLU, sigmoid)
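A minimal sketch of this computation with plain PyTorch tensors (the values are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([0.1, 0.2, 0.3])   # input vector
w = torch.tensor([0.4, -0.2, 0.7])  # weight vector
b = torch.tensor(0.1)               # bias

y = F.relu(torch.dot(w, x) + b)     # f(w · x + b) with f = ReLU
print(y)                            # tensor(0.3100)
```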
1.3 Example Flow
Input → [Dense Layer + Activation] → [Dense Layer + Activation] → Output
E.g., for a 3-layer MLP:
x (input) → Layer 1: W1·x + b1 → ReLU → Layer 2: W2·h1 + b2 → ReLU → Output Layer: W3·h2 + b3 → Output
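The same flow written out with raw tensors might look like this (the layer sizes 3 → 4 → 4 → 1 are chosen only for illustration):

```python
import torch
import torch.nn.functional as F

x = torch.randn(3)                          # input vector
W1, b1 = torch.randn(4, 3), torch.randn(4)  # Layer 1 parameters
W2, b2 = torch.randn(4, 4), torch.randn(4)  # Layer 2 parameters
W3, b3 = torch.randn(1, 4), torch.randn(1)  # Output layer parameters

h1 = F.relu(W1 @ x + b1)   # Layer 1: W1·x + b1 → ReLU
h2 = F.relu(W2 @ h1 + b2)  # Layer 2: W2·h1 + b2 → ReLU
out = W3 @ h2 + b3         # Output layer: W3·h2 + b3
print(out.shape)           # torch.Size([1])
```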
1.4 What can MLPs do?
MLPs can approximate any continuous function on a compact domain, given enough hidden units (the Universal Approximation Theorem), and are used for:
- Regression
- Classification
- Function approximation
- Time-series prediction (when used with context)
1.5 MLP Diagram
(Diagram: input-layer neurons x₁ … xₙ, fully connected to the hidden-layer neurons (e.g. ReLU activation), which connect to the output neuron ŷ.)
Each circle is a neuron. Each layer is fully connected to the next. Hidden layers apply a nonlinear function like ReLU.
1.6 Code Example
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # input → hidden
        self.fc2 = nn.Linear(hidden_size, output_size)  # hidden → output

    def forward(self, x):
        x = F.relu(self.fc1(x))  # activation after first layer
        x = self.fc2(x)          # no activation if doing regression
        return x

# Example usage
model = MLP(input_size=3, hidden_size=5, output_size=1)  # 3 inputs → 5 hidden → 1 output
input_data = torch.tensor([[0.1, 0.2, 0.3]])
output = model(input_data.float())
print(output)
```
You can tweak:

- output_size = 1 for regression
- output_size = 2 or more, with softmax, for classification
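As a rough sketch, reusing the MLP class from the example above (the sizes are arbitrary):

```python
import torch
import torch.nn.functional as F

# Regression: a single continuous output, no activation on the output
reg_model = MLP(input_size=3, hidden_size=5, output_size=1)

# Classification: one output per class; softmax turns the logits into probabilities
clf_model = MLP(input_size=3, hidden_size=5, output_size=2)
logits = clf_model(torch.tensor([[0.1, 0.2, 0.3]]))
probs = F.softmax(logits, dim=1)  # probabilities summing to 1
```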
2. MLP for Regression vs Classification
Adapting Multilayer Perceptrons (MLPs) for regression vs classification tasks mainly involves changes in:
- Output layer architecture
- Activation functions
- Loss functions
2.1 Shared parts
Regardless of the task, MLPs usually have:
- Input layer (based on feature size)
- One or more hidden layers
- Non-linear activations (e.g., ReLU, tanh) in hidden layers
2.2 For Regression Tasks
1. Output layer:
- Usually 1 neuron (or more if multi-output regression).
- No activation function (i.e., linear output: ŷ = W·h + b)
2. Loss function:
- Mean Squared Error (MSE) or Mean Absolute Error (MAE), e.g. MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)² (see the code sketch after this list)
3. Interpretation:
- Output is a continuous value, modeling things like temperature, price, etc.
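A minimal regression sketch in PyTorch (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3, 16),
    nn.ReLU(),
    nn.Linear(16, 1),      # 1 output neuron, no activation (linear output)
)

criterion = nn.MSELoss()   # could also be nn.L1Loss() for MAE
x = torch.randn(8, 3)      # batch of 8 samples, 3 features
y = torch.randn(8, 1)      # continuous targets
loss = criterion(model(x), y)
loss.backward()
```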
2.3 For Classification Tasks
1. Binary Classification:
Output layer
- 1 neuron
- Sigmoid activation to squash the output into [0, 1]: ŷ = σ(z) = 1 / (1 + e^(−z))
Loss function
- Binary Cross-Entropy: L = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)] (see the sketch below)
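A minimal binary-classification sketch (layer sizes are illustrative). In PyTorch, nn.BCEWithLogitsLoss folds the sigmoid into the loss for numerical stability, which is equivalent in spirit to applying sigmoid explicitly and using nn.BCELoss:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3, 16),
    nn.ReLU(),
    nn.Linear(16, 1),              # 1 output neuron (logit)
)

criterion = nn.BCEWithLogitsLoss() # sigmoid + binary cross-entropy in one step
x = torch.randn(8, 3)
y = torch.randint(0, 2, (8, 1)).float()  # binary labels 0/1
loss = criterion(model(x), y)

probs = torch.sigmoid(model(x))    # probabilities in [0, 1] for prediction
```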
2. Multi-class Classification:
Output layer
- One neuron per class (i.e., size = number of classes)
- Softmax activation to get probabilities that sum to 1: softmax(z)ᵢ = e^(zᵢ) / Σⱼ e^(zⱼ)
Loss function
- Categorical Cross-Entropy: L = −Σᵢ yᵢ·log(ŷᵢ), with y one-hot (see the sketch below)
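A minimal multi-class sketch (3 classes, illustrative sizes). Note that nn.CrossEntropyLoss applies log-softmax internally, so the model outputs raw logits; an explicit softmax is only needed when you want probabilities:

```python
import torch
import torch.nn as nn

num_classes = 3
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, num_classes),          # one neuron per class (logits)
)

criterion = nn.CrossEntropyLoss()        # softmax + categorical cross-entropy
x = torch.randn(8, 4)
y = torch.randint(0, num_classes, (8,))  # class indices 0..2
loss = criterion(model(x), y)

probs = torch.softmax(model(x), dim=1)   # probabilities summing to 1 per sample
```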
2.4 Summary
| Task | Output Neurons | Output Activation | Loss Function |
|---|---|---|---|
| Regression | 1 (or more) | None (linear) | MSE / MAE |
| Binary classification | 1 | Sigmoid | Binary cross-entropy |
| Multi-class | # of classes | Softmax | Categorical cross-entropy |