This repository contains the four course projects for the course Probabilistic Artificial Intelligence (Fall 2020), held at ETH Zürich by Prof. Krause. The projects cover Gaussian Processes, Bayesian Neural Networks (using Bayes by Backprop), Bayesian Optimization, and Reinforcement Learning (an Actor-Critic algorithm). A short overview of the results of each project is presented in this document.
This project is mostly about classifying two-dimensional datapoints given a dataset. Since the dataset is rather large, computing the inverse of the full kernel matrix directly is infeasible, so a workaround has to be found. We implemented the Nyström method: essentially, we approximate the kernel using a subset of the data and use the Schur complement to compute the required inverse efficiently.
In this project, the goal was to create a "well calibrated" neural network classifier, i.e. a NN whose predicted class probabilities actually match the observed frequency of the ground-truth labels (e.g. in a two-class problem, among all inputs to which the network assigns probability 0.8 for a class, roughly 80% should truly belong to that class). We evaluated on:

- the MNIST dataset
- the Fashion-MNIST dataset

We approached the problem using Bayesian Neural Networks with a Gaussian prior over the network weights, trained with Bayes by Backprop.
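The two core ingredients of Bayes by Backprop are a reparameterized Gaussian weight posterior and a KL penalty toward the prior. A minimal NumPy sketch of these (forward pass only; class and parameter names are ours, and a real implementation would use an autodiff framework):

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

class BayesLinear:
    """Minimal Bayes-by-Backprop style linear layer (forward pass only).

    Weights have a factorized Gaussian posterior q(w) = N(mu, sigma^2) with
    sigma = softplus(rho); the prior is a standard normal N(0, 1).
    """
    def __init__(self, n_in, n_out, rng=0):
        rng = np.random.default_rng(rng)
        self.mu = rng.normal(scale=0.1, size=(n_in, n_out))
        self.rho = np.full((n_in, n_out), -3.0)  # small initial sigma
        self.rng = rng

    def forward(self, x):
        # Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, I).
        sigma = softplus(self.rho)
        eps = self.rng.normal(size=self.mu.shape)
        return x @ (self.mu + sigma * eps)

    def kl(self):
        # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over weights.
        sigma = softplus(self.rho)
        return 0.5 * np.sum(sigma**2 + self.mu**2 - 1.0 - 2.0 * np.log(sigma))
```

The training loss is then the usual data log-likelihood plus this KL term, minimized over `mu` and `rho`.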
Here, the goal is maximizing an unknown function $f$ from noisy evaluations, using Bayesian Optimization.
Mathematical / Implementation details
Assuming we can draw noisy samples $y = f(x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, at a set of points $A$, the posterior covariance of the Gaussian Process is
$$ k'(x, x') = k(x, x') - k_{x,A}(K_{AA} + \sigma^2 I )^{-1}k_{x',A}^T $$
where $k_{x,A}$ denotes the row vector of kernel values $k(x, a)$ for $a \in A$, $K_{AA}$ is the kernel matrix of the observed points, and $\sigma^2$ is the observation noise variance.
Note that the matrix inverse $(K_{AA} + \sigma^2 I)^{-1}$ can be updated incrementally as new observations are added to $A$, using the block-matrix (Schur complement) inversion formula, instead of being recomputed from scratch.
Unfortunately, the result turns out not to be numerically stable when applied a large number of times. Thus, we additionally apply a few steps of an iterative refinement scheme after each update, increasing the precision of the inverse (see Wu, X. and Wu, J., 2009. Iterative refinement of computing inverse matrix.), i.e.
$$
M \leftarrow M\left(I + (I - BM)\right),
$$

where $B = K_{AA} + \sigma^2 I$ is the matrix being inverted and $M$ is the current approximation of $B^{-1}$.
Note that this update step can be derived from minimizing the residual error $\|I - BM\|$, with $B = K_{AA} + \sigma^2 I$; it is a Newton–Schulz iteration, which converges quadratically whenever $\|I - BM\| < 1$.
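A NumPy sketch of the refinement step, for a generic matrix $B$ and an approximate inverse $M$ (the function name is ours):

```python
import numpy as np

def refine_inverse(B, M, steps=3):
    """Iterative refinement (Newton-Schulz) of an approximate inverse M of B.

    Each step applies M <- M (2I - B M), which roughly squares the residual
    ||I - B M|| and therefore converges quadratically when that norm is < 1.
    """
    I = np.eye(B.shape[0])
    for _ in range(steps):
        M = M @ (2.0 * I - B @ M)
    return M
```

A few steps after each incremental update are typically enough to keep the accumulated numerical error in check.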
To our knowledge, these two methods have not previously been combined in this way in the context of GPs and Bayesian Optimization.
In this project we implement a general Actor-Critic agent and apply it to the LunarLander gym environment. We use a simple MLP for both the actor and the critic, a trajectory storage buffer, and the policy-gradient algorithm to train the networks. After about 110 iterations, our agent converges to a reasonable result (see video).
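The per-trajectory quantities that drive such an update can be sketched as follows (a hedged NumPy illustration of standard actor-critic targets, not the repository's actual code; function names are ours):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Discounted returns G_t = sum_k gamma^k r_{t+k} for one trajectory."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return np.array(out[::-1])

def actor_critic_targets(rewards, values, gamma=0.99):
    """One-step TD targets and advantages for an actor-critic update.

    values[t] is the critic's estimate V(s_t); values has length T+1,
    where the last entry bootstraps from the final state.
    The actor is trained on A_t = r_t + gamma * V(s_{t+1}) - V(s_t),
    while the critic regresses V(s_t) toward the TD target.
    """
    rewards, values = np.asarray(rewards), np.asarray(values)
    targets = rewards + gamma * values[1:]
    advantages = targets - values[:-1]
    return targets, advantages
```

In training, the actor's loss is `-log pi(a_t | s_t) * A_t` and the critic's is the squared error between `V(s_t)` and the TD target, both backpropagated through the respective MLPs.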

