Linear Regression with Gradient Descent Algorithm
Gradient descent is an optimization algorithm used to find the parameter values of a function (linear regression, logistic regression, etc.) that minimize a cost function. In very simple terms, it helps us find the best-fit line.
(Demo video: screen_record.mov)
This project demonstrates the fundamental principles of machine learning by implementing a simple linear regression model without using high-level ML libraries. The goal is to predict car prices based on their mileage using a linear function optimized through gradient descent.
To understand what derivatives are, we must first know what a slope is.
A slope is defined as the ratio of the vertical change to the horizontal change between any two points on a line:
Slope = Δy / Δx
It is used to describe both:
- Direction of the line
- Steepness of the line
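For example, the line through the points (1, 2) and (3, 6) has slope Δy / Δx = (6 − 2) / (3 − 1) = 2: it rises two units for every one unit it moves to the right.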
Gradient descent is a first-order, iterative optimization algorithm to find a local/global minimum of a differentiable function.
We first plot a 3D graph of the MSE (cost) against the two coefficients, slope (m) and intercept (b).
Since the gradient descent algorithm is an iterative approach, we first randomly take the values of m and b and then change them such that the cost function becomes less and less until we reach the local/global minimum.
1. First, let m = 0 and b = 0. For these initial values we get a high MSE (according to the plot above, approximately 1000).
2. Then we take a step (a step here is a change in the coefficients m and b) that adjusts m and b so that the MSE decreases (to approximately 900). This is repeated until we reach the local/global minimum.
3. Once we reach the local/global minimum, we use these optimal values of m and b in our prediction function:
y = mx + b
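As a minimal sketch, the prediction function can be written as follows (the coefficient values here are hypothetical, purely for illustration, not from a trained model):

```python
def predict_price(mileage, m, b):
    """Estimate a car's price from its mileage using the line y = m*x + b."""
    return m * mileage + b

# Hypothetical coefficients, for illustration only:
m, b = -0.021, 8500.0
print(predict_price(100_000, m, b))  # estimated price for a car with 100,000 km
```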
Now let us see how the values of m and b are changed so that the MSE decreases with each step. For this, let us plot the MSE against the intercept (b):
The graph shows that the step size shrinks each time we take a step, so gradient descent eventually converges to the local/global minimum. But how do we take steps of decreasing size? At each point we find the slope of the tangent, which tells us which direction to move in. This slope is simply the derivative at that point. Since there are two coefficients, slope (m) and intercept (b), we take the partial derivative of the MSE (cost) with respect to each of them. We have seen how to find partial derivatives in the sections above, and the end result is:
The partial derivatives are:
∂MSE/∂m = (2/n) × Σ(ŷᵢ - yᵢ) × xᵢ
∂MSE/∂b = (2/n) × Σ(ŷᵢ - yᵢ)
Where:
- n = number of data points
- yᵢ = actual value
- ŷᵢ = predicted value = m × xᵢ + b
- xᵢ = feature value (mileage)
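These two derivatives translate directly into code. A minimal NumPy sketch (the function name `gradients` is my own):

```python
import numpy as np

def gradients(x, y, m, b):
    """Partial derivatives of the MSE with respect to m and b."""
    n = len(x)
    error = (m * x + b) - y                 # ŷᵢ - yᵢ
    dm = (2.0 / n) * np.sum(error * x)      # ∂MSE/∂m
    db = (2.0 / n) * np.sum(error)          # ∂MSE/∂b
    return dm, db
```

When the line already fits the data perfectly, both derivatives are zero, which is exactly the convergence condition described below.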
Along with the partial derivatives, we have another parameter called the learning rate (α), which decides the size of the steps we take.
The learning rate (α) is a hyperparameter that determines how large the steps will be, that is, how much the coefficients can change at each update.
Typically, the user will set the learning rate for the gradient descent algorithm, and it will be constant for the entire algorithm.
Once we have calculated the derivatives and chosen a learning rate, the next step is to use these two values to update the coefficients m and b. This is done with the following formulas:
m = m - α × ∂MSE/∂m
b = b - α × ∂MSE/∂b
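One such update, as a small sketch (the function name is my own):

```python
def update_step(m, b, dm, db, alpha):
    """Move the coefficients one step against the gradient."""
    return m - alpha * dm, b - alpha * db

# With gradients dm = 2.0, db = 4.0 and α = 0.1, both coefficients shrink a little:
m, b = update_step(1.0, 1.0, dm=2.0, db=4.0, alpha=0.1)
```

Because the derivative points uphill, subtracting it moves the coefficients downhill on the cost surface.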
1. Take some initial values for the coefficients m and b and calculate the MSE (cost function).

   Initial: m = 0, b = 0

   MSE = (1/n) × Σ(yᵢ - ŷᵢ)²

2. Calculate the partial derivatives of the MSE with respect to m and b:

   ∂MSE/∂m = (2/n) × Σ(ŷᵢ - yᵢ) × xᵢ

   ∂MSE/∂b = (2/n) × Σ(ŷᵢ - yᵢ)

3. Set a value for the learning rate (α) and update m and b using:

   m = m - α × ∂MSE/∂m

   b = b - α × ∂MSE/∂b

4. Use these new values of m and b to calculate the new MSE.

5. Repeat steps 2, 3, and 4 until the changes in m and b no longer significantly reduce the MSE (cost). This is the point where the algorithm has converged to the optimal solution.
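The steps above can be sketched as a single NumPy training loop. This version runs a fixed number of iterations instead of testing convergence, and the function name and default hyperparameters are assumptions of mine, not the project's actual `train.py`:

```python
import numpy as np

def fit(x, y, alpha=0.05, epochs=2000):
    """Fit y = m*x + b by gradient descent, following steps 1-5 above."""
    m = b = 0.0                              # step 1: initial coefficients
    n = len(x)
    for _ in range(epochs):
        error = (m * x + b) - y              # ŷᵢ - yᵢ
        dm = (2.0 / n) * np.sum(error * x)   # step 2: ∂MSE/∂m
        db = (2.0 / n) * np.sum(error)       #         ∂MSE/∂b
        m -= alpha * dm                      # step 3: update the coefficients
        b -= alpha * db                      # steps 4-5: repeat each epoch
    return m, b
```

On real mileage data the feature should be normalized first; with raw values in the tens of thousands, the gradients blow up for any reasonable α.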
```
pip install numpy pandas matplotlib scikit-learn
python train.py
```

This will:
- Load data from `data.csv`
- Train the model using gradient descent
- Display real-time visualization of the learning process
- Save the trained parameters (`m` and `b`)
- Print the R² score
```
python predict.py
```

Enter a mileage value to get the estimated price based on the trained model.
Reference: https://medium.com/geekculture/mathematics-behind-gradient-descent-f2a49a0b714f