1.2 Model Mechanics

1. Calculation of the Gradient

Let's walk through the gradient calculation for a simple linear regression problem, to make the mechanics easy to understand.

Problem Setup

Suppose we have the following data points for a linear regression problem:

x (Feature)    y (Target)
1              2
2              4
3              6

We want to fit a linear model:

$$\hat{y} = \theta_0 + \theta_1 x$$

where:

  • $\theta_0$ is the intercept,
  • $\theta_1$ is the slope.

Our goal is to find the values of $\theta_0$ and $\theta_1$ that minimize the Mean Squared Error (MSE) loss function:

$$J(\theta_0, \theta_1) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2$$

Here, $n = 3$ (the number of data points).
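
To make the steps below easy to reproduce, here is a minimal NumPy sketch of the dataset, the model, and the MSE loss. The helper names `predict` and `mse` are purely illustrative, not part of the original derivation.

```python
import numpy as np

# Toy dataset from the table above (y = 2x exactly).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
n = len(x)  # n = 3

def predict(theta0, theta1, x):
    """Linear model: y_hat = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def mse(theta0, theta1, x, y):
    """Mean Squared Error: (1/n) * sum((y_hat - y)^2)."""
    return np.mean((predict(theta0, theta1, x) - y) ** 2)
```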

Step 1: Initialize Parameters

Let’s start with initial guesses for the parameters, for example:

$$\theta_0 = 0, \qquad \theta_1 = 0$$

Step 2: Compute the Loss

Using the initial parameters, compute the predicted values and the loss.

Predicted Values:

For $x_1 = 1$: $\hat{y}_1 = 0 + 0 \cdot 1 = 0$

For $x_2 = 2$: $\hat{y}_2 = 0 + 0 \cdot 2 = 0$

For $x_3 = 3$: $\hat{y}_3 = 0 + 0 \cdot 3 = 0$

Loss (MSE):

$$J = \frac{1}{3}\left[(0 - 2)^2 + (0 - 4)^2 + (0 - 6)^2\right] = \frac{4 + 16 + 36}{3} = \frac{56}{3} \approx 18.67$$
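
Continuing the sketch above, evaluating the loss at the starting point chosen in Step 1 reproduces the hand-computed value:

```python
theta0, theta1 = 0.0, 0.0            # initial guesses from Step 1

y_hat = predict(theta0, theta1, x)   # -> array([0., 0., 0.])
loss = mse(theta0, theta1, x, y)     # -> (4 + 16 + 36) / 3 ≈ 18.67
print(y_hat, loss)
```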

Step 3: Compute the Gradients

The gradients of the loss function with respect to $\theta_0$ and $\theta_1$ are:

$$\frac{\partial J}{\partial \theta_0} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i), \qquad \frac{\partial J}{\partial \theta_1} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i$$

Compute $\frac{\partial J}{\partial \theta_0}$:

$$\frac{\partial J}{\partial \theta_0} = \frac{2}{3}\left[(0 - 2) + (0 - 4) + (0 - 6)\right] = \frac{2}{3}(-12) = -8$$

Compute $\frac{\partial J}{\partial \theta_1}$:

$$\frac{\partial J}{\partial \theta_1} = \frac{2}{3}\left[(0 - 2) \cdot 1 + (0 - 4) \cdot 2 + (0 - 6) \cdot 3\right] = \frac{2}{3}(-28) \approx -18.67$$
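
The same gradients take only a couple of NumPy lines. This continues the sketch above; `gradients` is an illustrative helper name.

```python
def gradients(theta0, theta1, x, y):
    """Analytic gradients of the MSE with respect to theta0 and theta1."""
    errors = predict(theta0, theta1, x) - y       # (y_hat - y)
    d_theta0 = 2.0 * np.mean(errors)              # dJ/dtheta0
    d_theta1 = 2.0 * np.mean(errors * x)          # dJ/dtheta1
    return d_theta0, d_theta1

g0, g1 = gradients(theta0, theta1, x, y)          # -> (-8.0, ≈ -18.67)
```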

Step 4: Update the Parameters

Let’s use a learning rate of, say, $\alpha = 0.1$. The update rules are:

$$\theta_0 := \theta_0 - \alpha \frac{\partial J}{\partial \theta_0}, \qquad \theta_1 := \theta_1 - \alpha \frac{\partial J}{\partial \theta_1}$$

Update $\theta_0$:

$$\theta_0 = 0 - 0.1 \times (-8) = 0.8$$

Update $\theta_1$:

$$\theta_1 = 0 - 0.1 \times (-18.67) \approx 1.87$$
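
In code, the update is one line per parameter (continuing the sketch, with the same illustrative learning rate):

```python
alpha = 0.1                     # assumed learning rate

theta0 = theta0 - alpha * g0    # 0 - 0.1 * (-8.0)   =  0.8
theta1 = theta1 - alpha * g1    # 0 - 0.1 * (-18.67) ≈  1.87
```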

Step 5: Repeat the Process

Now, we repeat the process with the updated parameters $\theta_0 = 0.8$ and $\theta_1 \approx 1.87$.

  1. Compute the new predictions:

     $\hat{y}_1 = 0.8 + 1.87 \cdot 1 \approx 2.67, \quad \hat{y}_2 = 0.8 + 1.87 \cdot 2 \approx 4.53, \quad \hat{y}_3 = 0.8 + 1.87 \cdot 3 \approx 6.40$

  2. Compute the new loss:

     $J \approx \frac{1}{3}\left[(2.67 - 2)^2 + (4.53 - 4)^2 + (6.40 - 6)^2\right] \approx 0.30$

  3. Compute the new gradients and update the parameters again. Repeating this loop drives the loss toward zero and the parameters toward the exact fit $\theta_0 = 0$, $\theta_1 = 2$.

Key Insight

  • The gradient $\frac{\partial J}{\partial \theta_0} = -8$ is negative, which tells us that increasing $\theta_0$ will reduce the loss.
  • The gradient $\frac{\partial J}{\partial \theta_1} \approx -18.67$ is negative, which tells us that increasing $\theta_1$ will reduce the loss.
  • By moving in the opposite direction of the gradient (i.e., updating $\theta_0$ and $\theta_1$ in the positive direction), we reduce the loss, as the full gradient-descent loop sketched below shows.
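
Putting the pieces together, a short gradient-descent loop (same assumed starting point $\theta_0 = \theta_1 = 0$ and learning rate $\alpha = 0.1$, continuing the snippets above) shows the loss shrinking toward zero:

```python
theta0, theta1, alpha = 0.0, 0.0, 0.1

for step in range(1000):
    g0, g1 = gradients(theta0, theta1, x, y)
    theta0 -= alpha * g0
    theta1 -= alpha * g1

print(theta0, theta1, mse(theta0, theta1, x, y))
# The parameters approach theta0 ≈ 0 and theta1 ≈ 2 (the line y = 2x),
# and the loss approaches 0.
```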

2. Convexity of the Cost Function

In the context of machine learning, the convexity of a function determines whether it has a single global minimum (with no spurious local minima) and whether that minimum can be found efficiently. For a convex function, every local minimum is a global minimum, and gradient-based optimization methods (like Gradient Descent) will converge to it, given a suitable learning rate.

Linear regression typically refers to the process of fitting a line (or hyperplane in higher dimensions) to the data, and it is formulated as an optimization problem. To understand whether linear regression is convex, let’s look at the cost function and how the optimization process works.

2.1 Linear Regression Objective Function (Cost Function)

In linear regression, we aim to find the optimal parameters $w$ (weights) and $b$ (bias) that minimize the mean squared error (MSE) between the predicted values $\hat{y}_i$ and the actual labels $y_i$. The objective function to minimize is:

$$J(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2, \qquad \hat{y}_i = w x_i + b$$

where:

  • $n$ is the number of training samples,
  • $\hat{y}_i$ is the predicted value for the $i$-th data point,
  • $y_i$ is the true value for the $i$-th data point.

2.2 Convexity of the Cost Function

The mean squared error (MSE) is a quadratic function in terms of $w$ and $b$. Each term $(\hat{y}_i - y_i)^2 = (w x_i + b - y_i)^2$ is the square of an expression that is linear in the parameters, and a sum of such squares is always convex: it forms a parabolic bowl with a global minimum and no local minima. Specifically:

  • The cost function is convex because it is a sum of squared terms that are quadratic with respect to $w$ and $b$.
  • Since it is a quadratic function, it has a unique global minimum and no local minima, making the optimization problem convex.
  • This convexity ensures that gradient descent or any other convex optimization method will converge to the global minimum, provided the learning rate is chosen correctly.
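
As a quick numerical sanity check (a sketch, not a proof), we can verify the defining convexity inequality $J(t\,p + (1-t)\,q) \le t\,J(p) + (1-t)\,J(q)$ on the toy data from Section 1 for random parameter pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def J(params):
    """MSE as a function of the parameter vector (b, w)."""
    b, w = params
    return np.mean((w * x + b - y) ** 2)

for _ in range(1000):
    p, q = rng.normal(size=2), rng.normal(size=2)
    t = rng.uniform()
    # Convexity: the function evaluated on the segment lies below the chord.
    assert J(t * p + (1 - t) * q) <= t * J(p) + (1 - t) * J(q) + 1e-9
```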

2.3 Why Is the Linear Regression Cost Function Convex?

To demonstrate the convexity, let’s consider the second derivative (Hessian matrix) of the cost function with respect to the parameters $w$ and $b$. The cost function is quadratic in form, and its Hessian is positive semi-definite, which guarantees convexity.

In more detail:

  • The first derivative (gradient) of the cost function is:

    $$\frac{\partial J}{\partial w} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i, \qquad \frac{\partial J}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$

  • The second derivative (Hessian) with respect to $(w, b)$ is constant and positive semi-definite (shown here for a single feature):

    $$H = \frac{2}{n} \begin{pmatrix} \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i & n \end{pmatrix}$$

    This matrix is positive semi-definite, meaning it doesn’t have any negative eigenvalues, which confirms the convexity of the function.
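
On the toy data from Section 1, this Hessian can be formed explicitly and its eigenvalues checked (a small NumPy sketch; the same conclusion holds for any dataset):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
n = len(x)

# Constant Hessian of the MSE with respect to (w, b).
H = (2.0 / n) * np.array([[np.sum(x ** 2), np.sum(x)],
                          [np.sum(x),      n       ]])

eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)                      # both eigenvalues are non-negative
assert np.all(eigenvalues >= -1e-9)     # -> positive semi-definite, hence convex
```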

2.4 Graphical Intuition

  • If you plot the cost function as a function of $w$ and $b$, you’ll see that it forms a convex bowl shape.
  • The gradient descent algorithm will converge to the lowest point of the bowl, which represents the global minimum of the cost function.
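
For the graphical intuition, a short matplotlib sketch (again using the toy data; the grid ranges are arbitrary) draws the cost contours, which form nested ellipses around a single minimum:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Evaluate J(w, b) on a grid of parameter values.
w_grid, b_grid = np.meshgrid(np.linspace(-1.0, 5.0, 200), np.linspace(-3.0, 3.0, 200))
J = np.mean((w_grid[..., None] * x + b_grid[..., None] - y) ** 2, axis=-1)

plt.contour(w_grid, b_grid, J, levels=30)
plt.xlabel("w"); plt.ylabel("b")
plt.title("MSE cost surface: a convex bowl")
plt.show()
```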

3. Closed-Form Solution?

In linear regression, we want to find the values of the parameters $w$ (weights) and $b$ (bias) that minimize the cost function, typically the mean squared error (MSE). The closed-form solution allows us to solve for the parameters directly without using iterative optimization methods like gradient descent.

3.1 Linear Regression Model

The linear regression model is given by:

$$\hat{y} = w^\top x + b$$

Where:

  • $x$ is the input feature vector,
  • $w$ is the vector of weights,
  • $b$ is the bias term,
  • $\hat{y}$ is the predicted value.

3.2 Cost Function (Mean Squared Error)

The cost function for linear regression is the mean squared error (MSE), which is the average of the squared differences between the predicted values $\hat{y}_i$ and the true values $y_i$:

$$J(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2$$

where:

  • $n$ is the number of training samples,
  • $\hat{y}_i$ is the predicted value for the $i$-th training sample,
  • $y_i$ is the true value.

3.3 Objective

We aim to minimize the cost function with respect to the parameters $w$ and $b$.

3.4 Closed-Form Solution Derivation

The closed-form solution involves finding the optimal parameters $w$ and $b$ by taking the derivatives of the cost function with respect to $w$ and $b$, setting them to zero, and solving the resulting system of equations.

Step 1: Derivative with respect to $w$

The gradient of the cost function with respect to $w$ is:

$$\frac{\partial J}{\partial w} = \frac{2}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right) x_i$$

Where $\hat{y}_i = w^\top x_i + b$.

Setting the gradient equal to zero to find the optimal $w$:

$$\sum_{i=1}^{n} \left( w^\top x_i + b - y_i \right) x_i = 0$$

Step 2: Derivative with respect to $b$

Similarly, the gradient of the cost function with respect to $b$ is:

$$\frac{\partial J}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)$$

Setting this equal to zero to find the optimal $b$:

$$\sum_{i=1}^{n} \left( w^\top x_i + b - y_i \right) = 0 \quad\Longrightarrow\quad b = \bar{y} - w^\top \bar{x}$$

Step 3: Solve for $w$ and $b$

By solving these equations simultaneously, we can obtain the closed-form solution. However, it’s more efficient to express the solution in matrix form, especially for multiple features.
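
For a single feature, the two equations can be solved directly. A small SymPy sketch (using the toy data from Section 1 purely for illustration) recovers the exact fit:

```python
import sympy as sp

w, b = sp.symbols('w b')
data = [(1, 2), (2, 4), (3, 6)]   # toy data from Section 1

# Normal equations: both gradients of the MSE set to zero.
eq_w = sp.Eq(sum((w * xi + b - yi) * xi for xi, yi in data), 0)
eq_b = sp.Eq(sum((w * xi + b - yi) for xi, yi in data), 0)

print(sp.solve([eq_w, eq_b], [w, b]))  # -> {w: 2, b: 0}, i.e. the line y = 2x
```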

The problem in matrix notation:

  • $X$ is the matrix of input features, where each row corresponds to one data point.
  • $y$ is the vector of true labels.
  • $w$ is the vector of weights.

The model equation becomes:

$$\hat{y} = X w$$

where $\hat{y}$ is the vector of predicted values.

The cost function in matrix form is:

$$J(w) = \frac{1}{n} \left( X w - y \right)^\top \left( X w - y \right)$$

By minimizing this cost function, we get the closed-form solution for the weights (ignoring $b$ for simplicity; it can be absorbed into $w$ by appending a column of ones to $X$):

$$w^* = \left( X^\top X \right)^{-1} X^\top y$$

3.5 Closed-Form Solution in Simple Terms

For a dataset $X$ and corresponding labels $y$, the closed-form solution for the optimal weights is:

$$w^* = \left( X^\top X \right)^{-1} X^\top y$$

Where:

  • $X$ is the matrix of input features (with dimensions $n \times d$, where $n$ is the number of data points and $d$ is the number of features),
  • $y$ is the vector of true labels (with dimension $n \times 1$),
  • $w^*$ is the vector of optimal weights (with dimension $d \times 1$).
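
A minimal NumPy sketch of this formula on the toy data (with a column of ones appended so the bias is handled too) recovers the same parameters that gradient descent approached in Section 1. In practice, `np.linalg.solve` or `np.linalg.lstsq` is preferred over forming the inverse explicitly.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Design matrix with a column of ones so the bias is absorbed into the weights.
X = np.column_stack([np.ones_like(x), x])      # shape (n, d) with d = 2

w_star = np.linalg.solve(X.T @ X, X.T @ y)     # solves (X^T X) w = X^T y
print(w_star)                                  # -> [0. 2.]  (bias ≈ 0, slope ≈ 2)
```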

3.6 Why Is It Called the Closed-Form Solution?

This is called the closed-form solution because it provides an explicit formula for computing the optimal parameters without requiring iterative optimization methods (like gradient descent). Once you have the data, you can directly calculate $w^*$ using matrix operations.