1.2 Model Mechanics
1. Calculation of the Gradient
Let’s walk through the gradient calculation for a simple linear regression problem to make it easy to understand.
Problem Setup
Suppose we have the following data points for a linear regression problem:
| $x$ | $y$ |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
We want to fit a linear model:

$$\hat{y} = \theta_0 + \theta_1 x$$

where:
- $\theta_0$ is the intercept,
- $\theta_1$ is the slope.

Our goal is to find the values of $\theta_0$ and $\theta_1$ that minimize the mean squared error (MSE) loss. Here, the loss is

$$J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}_i - y_i\right)^2$$

where $m = 3$ is the number of training samples.
Step 1: Initialize Parameters
Let’s start with initial guesses for the parameters, say $\theta_0 = 0$ and $\theta_1 = 0$.
Step 2: Compute the Loss
Using the initial parameters, compute the predicted values and the loss.
Predicted Values:
- For $x_1 = 1$: $\hat{y}_1 = 0 + 0 \cdot 1 = 0$
- For $x_2 = 2$: $\hat{y}_2 = 0 + 0 \cdot 2 = 0$
- For $x_3 = 3$: $\hat{y}_3 = 0 + 0 \cdot 3 = 0$

Loss (MSE):

$$J = \frac{1}{3}\left[(0-2)^2 + (0-4)^2 + (0-6)^2\right] = \frac{4 + 16 + 36}{3} \approx 18.67$$
Step 3: Compute the Gradients
The gradients of the loss function with respect to $\theta_0$ and $\theta_1$ are:

$$\frac{\partial J}{\partial \theta_0} = \frac{2}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right), \qquad \frac{\partial J}{\partial \theta_1} = \frac{2}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right) x_i$$

Compute $\frac{\partial J}{\partial \theta_0}$:

$$\frac{\partial J}{\partial \theta_0} = \frac{2}{3}\left[(0-2) + (0-4) + (0-6)\right] = \frac{2}{3}(-12) = -8$$

Compute $\frac{\partial J}{\partial \theta_1}$:

$$\frac{\partial J}{\partial \theta_1} = \frac{2}{3}\left[(0-2)\cdot 1 + (0-4)\cdot 2 + (0-6)\cdot 3\right] = \frac{2}{3}(-28) \approx -18.67$$
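As a sanity check, the loss and gradients above can be reproduced numerically. This is a minimal sketch, assuming the $\theta_0 = \theta_1 = 0$ initialization chosen in Step 1; only NumPy is used.

```python
import numpy as np

# Toy dataset from the problem setup: y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
m = len(x)

theta0, theta1 = 0.0, 0.0                        # assumed initial guesses from Step 1
y_hat = theta0 + theta1 * x                      # predictions: [0., 0., 0.]

loss = np.mean((y_hat - y) ** 2)                 # MSE, about 18.67
grad_theta0 = (2 / m) * np.sum(y_hat - y)        # dJ/d(theta0) = -8.0
grad_theta1 = (2 / m) * np.sum((y_hat - y) * x)  # dJ/d(theta1), about -18.67

print(loss, grad_theta0, grad_theta1)
```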
Step 4: Update the Parameters
Let’s use a learning rate of, say, $\alpha = 0.1$. The update rule for each parameter is $\theta_j \leftarrow \theta_j - \alpha \frac{\partial J}{\partial \theta_j}$.

Update $\theta_0$:

$$\theta_0 = 0 - 0.1 \cdot (-8) = 0.8$$

Update $\theta_1$:

$$\theta_1 = 0 - 0.1 \cdot (-18.67) \approx 1.87$$
Step 5: Repeat the Process
Now, we repeat the process with the updated parameters $\theta_0 = 0.8$ and $\theta_1 \approx 1.87$:
- Compute the new predictions.
- Compute the new loss.
- Compute the new gradients and update the parameters again (see the sketch below).
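The whole procedure is just this loop run until convergence. Below is a minimal sketch in NumPy, assuming the same $\theta_0 = \theta_1 = 0$ initialization and $\alpha = 0.1$ used above; on this data it converges toward $\theta_0 \approx 0$ and $\theta_1 \approx 2$.

```python
import numpy as np

# Toy dataset: y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
m = len(x)

theta0, theta1 = 0.0, 0.0   # assumed initial guesses (Step 1)
alpha = 0.1                 # assumed learning rate (Step 4)

for step in range(1000):
    y_hat = theta0 + theta1 * x                   # Step 2: predictions
    grad0 = (2 / m) * np.sum(y_hat - y)           # Step 3: dJ/d(theta0)
    grad1 = (2 / m) * np.sum((y_hat - y) * x)     # Step 3: dJ/d(theta1)
    theta0 -= alpha * grad0                       # Step 4: move against the gradient
    theta1 -= alpha * grad1

print(theta0, theta1)  # approaches theta0 = 0, theta1 = 2
```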
Key Insight
- The gradient $\frac{\partial J}{\partial \theta_0}$ is negative, which tells us that increasing $\theta_0$ will reduce the loss.
- The gradient $\frac{\partial J}{\partial \theta_1}$ is negative, which tells us that increasing $\theta_1$ will reduce the loss.
- By moving in the opposite direction of the gradient (i.e., updating $\theta_0$ and $\theta_1$ positively), we reduce the loss.
2. Convexity of the Cost Function
In the context of machine learning, the convexity of a function refers to whether the function has a single global minimum that can be found efficiently. For a convex cost function, every local minimum is also a global minimum, and gradient-based optimization methods (like Gradient Descent) will converge to it.
Linear regression typically refers to the process of fitting a line (or hyperplane in higher dimensions) to the data, and it is formulated as an optimization problem. To understand whether linear regression is convex, let’s look at the cost function and how the optimization process works.
2.1 Linear Regression Objective Function (Cost Function)
In linear regression, we aim to find the optimal parameters $\theta_0$ and $\theta_1$ that minimize the mean squared error (MSE) cost function:

$$J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2$$

where:
- $m$ is the number of training samples,
- $\hat{y}_i$ is the predicted value for the $i$-th data point,
- $y_i$ is the true value for the $i$-th data point.
2.2 Convexity of the Cost Function
The mean squared error (MSE) is a quadratic function in terms of the parameters $\theta_0$ and $\theta_1$.
- The cost function $J(\theta_0, \theta_1)$ is convex because it is a sum of squared terms that are quadratic with respect to $\theta_0$ and $\theta_1$ (see the expansion after this list).
- Since it is a quadratic function, it has a unique global minimum and no local minima, making the optimization problem convex.
- This convexity ensures that gradient descent or any other convex optimization method will converge to the global minimum, provided the learning rate is chosen correctly.
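To make the "quadratic" claim concrete, here is the single-feature case from Section 1 written out; expanding the square shows that every term is at most second degree in $\theta_0$ and $\theta_1$:

$$J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x_i - y_i\right)^2 = \frac{1}{m}\sum_{i=1}^{m}\left(\theta_0^2 + 2 x_i\,\theta_0\theta_1 + x_i^2\,\theta_1^2 - 2 y_i\,\theta_0 - 2 x_i y_i\,\theta_1 + y_i^2\right)$$

The quadratic part of this expression has a positive semi-definite coefficient matrix (it is the Gram matrix of the features), which is exactly the convexity argument made in Section 2.3.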
2.3 Why Is the Linear Regression Cost Function Convex?
To demonstrate the convexity, let’s consider the second derivative (Hessian matrix) of the cost function with respect to the parameters $\theta_0$ and $\theta_1$.
In more detail:
- The first derivative (gradient) of the cost function is:

$$\frac{\partial J}{\partial \theta_0} = \frac{2}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right), \qquad \frac{\partial J}{\partial \theta_1} = \frac{2}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right) x_i$$

- The second derivative (Hessian) with respect to $\theta_0$ and $\theta_1$ is constant and positive semi-definite:

$$H = \frac{2}{m}\begin{pmatrix} m & \sum_{i} x_i \\ \sum_{i} x_i & \sum_{i} x_i^2 \end{pmatrix}$$

This matrix is positive semi-definite, meaning it doesn’t have any negative eigenvalues, which confirms the convexity of the function (a numerical check on the toy data follows below).
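As a quick numerical check, the sketch below (assuming the toy data from Section 1) builds this Hessian and confirms that its eigenvalues are non-negative:

```python
import numpy as np

# Toy dataset from Section 1
x = np.array([1.0, 2.0, 3.0])
m = len(x)

# Hessian of the MSE cost for the model y_hat = theta0 + theta1 * x
H = (2 / m) * np.array([[m,       x.sum()],
                        [x.sum(), (x ** 2).sum()]])

print(np.linalg.eigvalsh(H))  # both eigenvalues >= 0  ->  J is convex
```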
2.4 Graphical Intuition
- If you plot the cost function $J(\theta_0, \theta_1)$ as a function of $\theta_0$ and $\theta_1$, you’ll see that it forms a convex bowl shape.
- The gradient descent algorithm will converge to the lowest point of the bowl, which represents the global minimum of the cost function.
3. Closed-Form Solution?
In linear regression, we want to find the values of the parameters $\mathbf{w}$ (the weights) and $b$ (the bias) that minimize the cost function. Because the cost function is convex and quadratic, the minimizer can be written down directly in closed form rather than found iteratively.
3.1 Linear Regression Model
The linear regression model is given by:

$$\hat{y} = \mathbf{w}^\top \mathbf{x} + b$$

Where:
- $\mathbf{x}$ is the input feature vector,
- $\mathbf{w}$ is the vector of weights,
- $b$ is the bias term,
- $\hat{y}$ is the predicted value.
3.2 Cost Function (Mean Squared Error)
The cost function $J(\mathbf{w}, b)$ is the mean squared error:

$$J(\mathbf{w}, b) = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2$$

where:
- $m$ is the number of training samples,
- $\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i + b$ is the predicted value for the $i$-th training sample,
- $y_i$ is the true value.
3.3 Objective
We aim to minimize the cost function $J(\mathbf{w}, b)$ with respect to the parameters $\mathbf{w}$ and $b$.
3.4 Closed-Form Solution Derivation
The closed-form solution involves finding the optimal parameters by setting the gradients of the cost function to zero and solving for $\mathbf{w}$ and $b$ analytically.
Step 1: Derivative with respect to $\mathbf{w}$
The gradient of the cost function with respect to $\mathbf{w}$ is:

$$\frac{\partial J}{\partial \mathbf{w}} = \frac{2}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)\mathbf{x}_i$$

Where $\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i + b$.
Setting the gradient equal to zero to find the optimal $\mathbf{w}$:

$$\sum_{i=1}^{m}\left(\mathbf{w}^\top \mathbf{x}_i + b - y_i\right)\mathbf{x}_i = \mathbf{0}$$
Step 2: Derivative with respect to $b$
Similarly, the gradient of the cost function with respect to $b$ is:

$$\frac{\partial J}{\partial b} = \frac{2}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)$$

Setting this equal to zero to find the optimal $b$:

$$\sum_{i=1}^{m}\left(\mathbf{w}^\top \mathbf{x}_i + b - y_i\right) = 0$$
Step 3: Solve for $\mathbf{w}$ and $b$
By solving these equations simultaneously, we can obtain the closed-form solution. However, it’s more efficient to express the solution in matrix form, especially for multiple features.
The problem in matrix notation:
- $X$ is the matrix of input features, where each row corresponds to one data point,
- $\mathbf{y}$ is the vector of true labels,
- $\mathbf{w}$ is the vector of weights.

The model equation becomes:

$$\hat{\mathbf{y}} = X\mathbf{w}$$

where a column of ones is appended to $X$ so that the bias term $b$ is absorbed into $\mathbf{w}$.
The cost function in matrix form is:

$$J(\mathbf{w}) = \frac{1}{m}\left(X\mathbf{w} - \mathbf{y}\right)^\top\left(X\mathbf{w} - \mathbf{y}\right)$$

By minimizing this cost function, i.e., setting its gradient $\frac{2}{m} X^\top \left(X\mathbf{w} - \mathbf{y}\right)$ to zero, we get the closed-form solution for the weights:

$$\mathbf{w}^* = \left(X^\top X\right)^{-1} X^\top \mathbf{y}$$
3.5 Closed-Form Solution in Simple Terms
For a dataset with design matrix $X$ and label vector $\mathbf{y}$, the closed-form solution (the normal equation) is:

$$\mathbf{w}^* = \left(X^\top X\right)^{-1} X^\top \mathbf{y}$$

Where:
- $X$ is the matrix of input features (with dimensions $m \times n$, where $m$ is the number of data points and $n$ is the number of features),
- $\mathbf{y}$ is the vector of true labels (with dimension $m$),
- $\mathbf{w}^*$ is the vector of optimal weights (with dimension $n$).
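Here is a minimal sketch of this formula applied to the toy data from Section 1, with a column of ones appended so the bias is part of $\mathbf{w}$; `np.linalg.lstsq` is shown as the numerically preferable alternative to forming the inverse explicitly.

```python
import numpy as np

# Toy dataset from Section 1: y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Design matrix with a leading column of ones (absorbs the bias into w)
X = np.column_stack([np.ones_like(x), x])       # shape (3, 2)

# Normal equation: w* = (X^T X)^{-1} X^T y
w_star = np.linalg.inv(X.T @ X) @ X.T @ y
print(w_star)                                   # ~[0.0, 2.0]  ->  bias 0, slope 2

# More stable in practice: solve the least-squares problem directly
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_lstsq)
```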
3.6 Why Is It Called the Closed-Form Solution?
This is called the closed-form solution because it provides an explicit formula for computing the optimal parameters in a single step, directly from the data, without requiring an iterative optimization procedure such as gradient descent.