Linear Regression with Gradient Descent Algorithm
Gradient descent is an optimization algorithm used to find the parameter values of a function (linear regression, logistic regression, etc.) that minimize a cost function. In very simple terms, it helps us find the best-fit line.
(Demo video: screen_record.mov)
This project demonstrates the fundamental principles of machine learning by implementing a simple linear regression model without using high-level ML libraries. The goal is to predict car prices based on their mileage using a linear function optimized through gradient descent.
To understand what derivatives are, we must first know what a slope is.
A slope is defined as the ratio of the vertical change to the horizontal change between any two points on a line:
Slope = Δy / Δx
It is used to describe both:
- Direction of the line
- Steepness of the line
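For example, the line through the points (1, 2) and (3, 6) has slope Δy / Δx = (6 − 2) / (3 − 1) = 2: it rises two units for every one unit it moves to the right.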
Gradient descent is a first-order, iterative optimization algorithm to find a local/global minimum of a differentiable function.
We first plot a 3D graph of the MSE (cost) against the two coefficients, slope (m) and intercept (b).
Since the gradient descent algorithm is an iterative approach, we first randomly take the values of m and b and then change them such that the cost function becomes less and less until we reach the local/global minimum.
1. First, let m = 0 and b = 0. For these initial values we get a high MSE (according to the plot above, approximately 1000).
2. Then we take a step (a step here is a change in the coefficients m and b) that adjusts m and b so that the MSE decreases (to approximately 900). This is repeated until we reach the local/global minimum.
3. Once we reach the local/global minimum, we use these optimal values of m and b in our prediction function:
y = mx + b
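As a minimal sketch, the prediction function can be written as follows (the coefficient values here are hypothetical, purely for illustration, not from a trained model):

```python
def predict_price(mileage, m, b):
    """Estimate a car's price from its mileage using the line y = m*x + b."""
    return m * mileage + b

# Hypothetical coefficients, for illustration only:
m, b = -0.021, 8500.0
print(predict_price(100_000, m, b))  # estimated price for a car with 100,000 km
```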
Now let us see how the values of m and b are changed so that the MSE decreases with each step. For this, let us plot the MSE against the intercept (b):
The graph shows that the step size shrinks each time we take a step, so gradient descent eventually converges to the local/global minimum. But how do we take steps of decreasing size? At each point we find the slope of the tangent, which tells us which direction to move in. This slope is simply the derivative at that point. Since there are two coefficients, slope (m) and intercept (b), we take the partial derivative of the MSE (cost) with respect to each of them. We have seen how to find partial derivatives in the sections above, and the end result is:
The partial derivatives are:
∂MSE/∂m = (2/n) × Σ(ŷᵢ - yᵢ) × xᵢ
∂MSE/∂b = (2/n) × Σ(ŷᵢ - yᵢ)
Where:
- n = number of data points
- yᵢ = actual value
- ŷᵢ = predicted value = m × xᵢ + b
- xᵢ = feature value (mileage)
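These two derivatives translate directly into code. A minimal NumPy sketch (the function name `gradients` is my own):

```python
import numpy as np

def gradients(x, y, m, b):
    """Partial derivatives of the MSE with respect to m and b."""
    n = len(x)
    error = (m * x + b) - y                 # ŷᵢ - yᵢ
    dm = (2.0 / n) * np.sum(error * x)      # ∂MSE/∂m
    db = (2.0 / n) * np.sum(error)          # ∂MSE/∂b
    return dm, db
```

When the line already fits the data perfectly, both derivatives are zero, which is exactly the convergence condition described below.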
Along with the partial derivatives, we have another parameter called the learning rate (α), which decides the size of the steps we take.
The learning rate (α) is a hyperparameter that determines how large the steps will be, that is, how much the coefficients can change at each update.
Typically, the user will set the learning rate for the gradient descent algorithm, and it will be constant for the entire algorithm.
Once we have calculated the derivatives and chosen a learning rate, the next step is to use these two values to update the coefficients m and b. This is done with the following formulas:
m = m - α × ∂MSE/∂m
b = b - α × ∂MSE/∂b
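One such update, as a small sketch (the function name is my own):

```python
def update_step(m, b, dm, db, alpha):
    """Move the coefficients one step against the gradient."""
    return m - alpha * dm, b - alpha * db

# With gradients dm = 2.0, db = 4.0 and α = 0.1, both coefficients shrink a little:
m, b = update_step(1.0, 1.0, dm=2.0, db=4.0, alpha=0.1)
```

Because the derivative points uphill, subtracting it moves the coefficients downhill on the cost surface.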
1. Take some initial values for the coefficients m and b and calculate the MSE (cost function).

   Initial: m = 0, b = 0

   MSE = (1/n) × Σ(yᵢ - ŷᵢ)²

2. Calculate the partial derivatives of the MSE with respect to m and b:

   ∂MSE/∂m = (2/n) × Σ(ŷᵢ - yᵢ) × xᵢ

   ∂MSE/∂b = (2/n) × Σ(ŷᵢ - yᵢ)

3. Set a value for the learning rate (α) and update m and b using:

   m = m - α × ∂MSE/∂m

   b = b - α × ∂MSE/∂b

4. Use these new values of m and b to calculate the new MSE.

5. Repeat steps 2, 3, and 4 until the changes in m and b no longer significantly reduce the MSE (cost). This is the point where the algorithm has converged to the optimal solution.
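The steps above can be sketched as a single NumPy training loop. This version runs a fixed number of iterations instead of testing convergence, and the function name and default hyperparameters are assumptions of mine, not the project's actual `train.py`:

```python
import numpy as np

def fit(x, y, alpha=0.05, epochs=2000):
    """Fit y = m*x + b by gradient descent, following steps 1-5 above."""
    m = b = 0.0                              # step 1: initial coefficients
    n = len(x)
    for _ in range(epochs):
        error = (m * x + b) - y              # ŷᵢ - yᵢ
        dm = (2.0 / n) * np.sum(error * x)   # step 2: ∂MSE/∂m
        db = (2.0 / n) * np.sum(error)       #         ∂MSE/∂b
        m -= alpha * dm                      # step 3: update the coefficients
        b -= alpha * db                      # steps 4-5: repeat each epoch
    return m, b
```

On real mileage data the feature should be normalized first; with raw values in the tens of thousands, the gradients blow up for any reasonable α.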
```
pip install numpy pandas matplotlib scikit-learn
python train.py
```

This will:
- Load data from `data.csv`
- Train the model using gradient descent
- Display real-time visualization of the learning process
- Save the trained parameters (`m` and `b`)
- Print the R² score
```
python predict.py
```

Enter a mileage value to get the estimated price based on the trained model.
Reference: https://medium.com/geekculture/mathematics-behind-gradient-descent-f2a49a0b714f