This repository contains the Python code used in the dissertation "Deep Sustainable Finance: An End-to-End Text Analysis of the Financial and Environmental Narratives in Corporate Disclosures". The code for each model is in the folder "Models". The folder "Files" contains the names of the 10-K and 10-Q reports together with the corresponding labels, market capitalization, and industry. The labels are already split into training, validation, and test sets ("train_data_rs0.txt", "val_data_rs0.txt", and "test_data_rs0.txt"). The environmental word list (see Chapter 5.5) is also in the folder "Files".
Note: This repository does not contain the 10-K and 10-Q filings themselves. The pre-processed filings can be downloaded from SRAF (Software Repository for Accounting and Finance); the unprocessed filings can be downloaded from the U.S. Securities and Exchange Commission (SEC).
- Install Anaconda.
- Create a virtual environment and install TensorFlow and PyTorch in it.
- Install Hugging Face Transformers (transformers).
- Install pandas, numpy, nltk, scikit-learn, and lime.
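After completing the steps above, a quick way to confirm that every package is visible in the active environment is to look up each import name without actually importing it. The sketch below is illustrative and not part of the repository; the helper name `missing_modules` is hypothetical. Note that some import names differ from the install names: PyTorch is imported as `torch` and scikit-learn as `sklearn`.

```python
import importlib.util

# Import names of the packages from the installation steps.
# PyTorch is imported as "torch" and scikit-learn as "sklearn".
REQUIRED_MODULES = ["tensorflow", "torch", "transformers", "pandas",
                    "numpy", "nltk", "sklearn", "lime"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found in the active environment.

    importlib.util.find_spec locates a top-level module without importing it
    and returns None when the module is not installed.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Using `find_spec` instead of a plain `import` keeps the check fast, because it avoids loading heavy frameworks such as TensorFlow just to confirm that they are installed.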
This repository was tested with the following versions: NumPy 1.17.0, pandas 1.2.4, TensorFlow 2.5.0, Keras 2.3.1, NLTK 3.4.5, PyTorch 1.0.1, and scikit-learn 0.24.2. Please refer to the respective installation pages for the exact install commands.
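To see how the active environment differs from the tested setup, the installed distribution versions can be compared against the list above. This is a minimal sketch, not part of the repository; the helper name `compare_versions` is hypothetical, and it assumes Python 3.8 or later for `importlib.metadata`. PyTorch is looked up under its distribution name `torch`.

```python
from importlib import metadata

# Versions listed in this README as tested.
TESTED_VERSIONS = {
    "numpy": "1.17.0",
    "pandas": "1.2.4",
    "tensorflow": "2.5.0",
    "keras": "2.3.1",
    "nltk": "3.4.5",
    "torch": "1.0.1",  # PyTorch is distributed under the name "torch"
    "scikit-learn": "0.24.2",
}


def compare_versions(tested=TESTED_VERSIONS, lookup=metadata.version):
    """Map each package to (tested version, installed version or None)."""
    report = {}
    for package, expected in tested.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[package] = (expected, installed)
    return report


if __name__ == "__main__":
    for package, (expected, installed) in compare_versions().items():
        status = "OK" if installed == expected else "differs"
        shown = installed if installed is not None else "missing"
        print(f"{package:<14} tested {expected:<8} installed {shown:<10} {status}")
```

A mismatch is not necessarily an error, but it is the first thing to check if results or behavior differ from those reported in the dissertation.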