The code in this repository is for a course in Experiments in NLP at the Vrije Universiteit Amsterdam (2025).
The experiments performed in this repository are designed to analyze the learning dynamics of input and contextual embeddings of BERT-like models.
This repository contains several notebooks for:
- Training an LTG-BERT model on the WikiText-103 dataset.
- Training a Wiki2Vec model on the WikiText-103 dataset.
- Creating a dataset of target words for embedding analysis.
- Analyzing the embeddings of the LTG-BERT model.
- Analyzing the embeddings of the Wiki2Vec model.
All notebooks were written to be run in Google Colab; file paths point to Google Drive and requirements are resolved automatically by the Colab runtime.
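
Because the file paths assume Google Drive, a notebook run elsewhere needs its paths redirected. The snippet below is a minimal sketch of one way to do that, not the code used in the notebooks: the helper `resolve_project_dir` and the folder name `embedding-experiments` are hypothetical, and `/content/drive` is assumed as the mount point (the usual Colab convention).

```python
from pathlib import Path


def resolve_project_dir(subdir="embedding-experiments"):
    """Return a project directory on Google Drive when in Colab, else a local one.

    `subdir` is a hypothetical folder name; replace it with the folder
    the notebooks actually use.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        # Not in Colab: fall back to a folder under the current directory.
        return Path.cwd() / subdir

    # In Colab: mount Drive (prompts for authorization on first use).
    drive.mount("/content/drive")
    return Path("/content/drive/MyDrive") / subdir


project_dir = resolve_project_dir()
print(project_dir)
```

Building every data and checkpoint path from `project_dir` keeps the Drive-specific part in one place.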