Learnt what an ADAptive LInear NEuron (Adaline) is, how it is defined, and the math behind it, including **batch gradient descent** with an **identity** activation function and **MSE loss**. Implemented a basic Adaline in Python, used it to fit data from the Iris flower dataset, and looked at how the learning rate affects convergence.
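A minimal sketch of such an Adaline, assuming made-up hyperparameter names (`eta`, `n_iter`) and 0/1 class labels; the identity activation and full-batch MSE gradient match the description above:

```python
import numpy as np

class Adaline:
    """ADAptive LInear NEuron: identity activation, MSE loss,
    full-batch gradient descent (illustrative sketch)."""

    def __init__(self, eta=0.1, n_iter=200, seed=1):
        self.eta = eta        # learning rate
        self.n_iter = n_iter  # number of full passes over the data
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.w_ = rng.normal(scale=0.01, size=X.shape[1])
        self.b_ = 0.0
        self.losses_ = []
        for _ in range(self.n_iter):
            # identity activation: output is the raw net input
            output = X @ self.w_ + self.b_
            errors = y - output
            # gradient of MSE w.r.t. weights/bias, averaged over the batch
            self.w_ += self.eta * 2.0 * X.T @ errors / X.shape[0]
            self.b_ += self.eta * 2.0 * errors.mean()
            self.losses_.append((errors ** 2).mean())
        return self

    def predict(self, X):
        # threshold the linear output at 0.5 for 0/1 labels
        return np.where(X @ self.w_ + self.b_ >= 0.5, 1, 0)
```

Tracking `losses_` per epoch is what lets you see how the learning rate affects convergence: too large an `eta` makes the loss diverge, too small makes it crawl.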
Derived the mathematical background for a perceptron and an Adaline by hand. Learnt about feature scaling and one feature-scaling technique, standardization. Fixed a bug in the Adaline's predictions that I had overlooked earlier. Used scaled features to train an Adaline and compared its convergence before and after scaling.
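Standardization centers each feature at zero mean and unit variance, which keeps the gradient descent steps well-conditioned. A small sketch with made-up data:

```python
import numpy as np

# Illustrative data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardize column-wise: subtract the mean, divide by the std dev.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this transform each column has mean 0 and standard deviation 1, so no single feature dominates the weight updates.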

Implemented the stochastic gradient descent algorithm for an Adaline and saw how it affects the convergence rate (the Adaline now converged faster). Also implemented online learning using the SGD Adaline.
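A sketch of the SGD variant, assuming the same illustrative hyperparameter names as before. The weights update after every sample rather than once per epoch, and a `partial_fit` method supports online learning on streaming data:

```python
import numpy as np

class AdalineSGD:
    """Adaline trained with stochastic gradient descent;
    partial_fit allows online learning (illustrative sketch)."""

    def __init__(self, eta=0.02, n_iter=30, seed=1):
        self.eta, self.n_iter = eta, n_iter
        self.rng = np.random.default_rng(seed)
        self.w_ = None

    def _init_weights(self, n_features):
        self.w_ = self.rng.normal(scale=0.01, size=n_features)
        self.b_ = 0.0

    def _update(self, xi, target):
        # gradient step on a single sample's squared error
        error = target - (xi @ self.w_ + self.b_)
        self.w_ += self.eta * 2.0 * xi * error
        self.b_ += self.eta * 2.0 * error
        return error ** 2

    def fit(self, X, y):
        self._init_weights(X.shape[1])
        self.losses_ = []
        for _ in range(self.n_iter):
            # shuffle each epoch so update order doesn't bias training
            idx = self.rng.permutation(len(y))
            epoch_losses = [self._update(X[i], y[i]) for i in idx]
            self.losses_.append(float(np.mean(epoch_losses)))
        return self

    def partial_fit(self, xi, target):
        # online learning: incorporate one new example as it arrives
        if self.w_ is None:
            self._init_weights(xi.shape[0])
        self._update(xi, target)
        return self

    def predict(self, X):
        return np.where(X @ self.w_ + self.b_ >= 0.5, 1, 0)
```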

Used scikit-learn's implementation of a Perceptron and fit it to classify points into three different classes. The classes were not linearly separable, so the Perceptron did not reach convergence (one of the limitations of perceptrons). Also learned about train-test splitting.
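A sketch of that workflow with scikit-learn; the particular feature columns and hyperparameter values here are my own illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

# All three Iris classes; taken together they are not linearly separable.
X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]  # petal length and width (illustrative choice)

# Hold out 30% for testing; stratify keeps the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Standardize using statistics from the training set only,
# so no information from the test set leaks into training.
sc = StandardScaler().fit(X_train)

ppn = Perceptron(eta0=0.1, random_state=1)
ppn.fit(sc.transform(X_train), y_train)
acc = ppn.score(sc.transform(X_test), y_test)
```

Because the three classes overlap, the perceptron's weight updates never stop entirely; accuracy can still be decent, but the algorithm has no guarantee of settling.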
Learned how logistic regression is modelled using logits and derived by hand the mathematics behind it, including the maximum likelihood and log-likelihood, the loss function, and gradient descent on the loss (equivalently, gradient ascent on the log-likelihood). Examined the behaviour of the loss function. Implemented a simple logistic regression algorithm that learns via full-batch gradient descent.
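A minimal sketch of binary logistic regression with full-batch gradient descent, assuming 0/1 labels and illustrative hyperparameter names. Descending the negative log-likelihood is the same as ascending the log-likelihood, which shows up in the `y - p` error term:

```python
import numpy as np

class LogisticRegressionGD:
    """Binary logistic regression via full-batch gradient descent
    on the negative log-likelihood (illustrative sketch)."""

    def __init__(self, eta=0.5, n_iter=500, seed=1):
        self.eta, self.n_iter, self.seed = eta, n_iter, seed

    @staticmethod
    def _sigmoid(z):
        # clip the logit to avoid overflow in exp
        return 1.0 / (1.0 + np.exp(-np.clip(z, -250, 250)))

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.w_ = rng.normal(scale=0.01, size=X.shape[1])
        self.b_ = 0.0
        self.losses_ = []
        for _ in range(self.n_iter):
            p = self._sigmoid(X @ self.w_ + self.b_)
            errors = y - p
            # gradient of the log-likelihood; adding it = gradient ascent,
            # i.e. descent on the negative log-likelihood loss
            self.w_ += self.eta * X.T @ errors / X.shape[0]
            self.b_ += self.eta * errors.mean()
            loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
            self.losses_.append(loss)
        return self

    def predict(self, X):
        return np.where(self._sigmoid(X @ self.w_ + self.b_) >= 0.5, 1, 0)
```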
Used scikit-learn's implementation of logistic regression to train on the Iris flower dataset and predict three class labels using OneVsRest classification. Derived the loss function with a regularization term added, and explored the effect of increasing the regularization strength on the weights.
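A sketch of the regularization experiment with scikit-learn; the `C` values and the L1-norm summary of the weights are my own illustrative choices. In scikit-learn's parameterization, smaller `C` means a stronger L2 penalty:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# One-vs-rest: one binary logistic regression per class.
# Sum |w| across all three classifiers as C shrinks
# (stronger regularization) to watch the weights shrink with it.
weight_norms = []
for C in [100.0, 1.0, 0.01]:
    ovr = OneVsRestClassifier(LogisticRegression(C=C, max_iter=1000))
    ovr.fit(X, y)
    total = sum(np.abs(est.coef_).sum() for est in ovr.estimators_)
    weight_norms.append(total)
```

Plotting `weight_norms` against `C` makes the shrinkage visible: as the penalty grows, the weights are pulled toward zero.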