- Download the data sample from: http://repo.pi.ingv.it/instance/Instance_sample_dataset_v2.tar.bz2
- Create an environment variable file called `.env` at the root of the project. This sets the data and output paths. Change the variable values to your actual data and output locations. The folders and files must exist before you run the code.
- EVENT_HDF5_FILE="data/instance_samples/Instance_events_counts_10k.hdf5"
- EVENT_METADATA_FILE="data/instance_samples/metadata_Instance_events_10k.csv"
- NOISE_HDF5_FILE="data/instance_samples/Instance_noise_1k.hdf5"
- NOISE_METADATA_FILE="data/instance_samples/metadata_Instance_noise_1k.csv"
- FINAL_OUTPUT_DIR="output"
- TEMP_DIR="temp"
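The project presumably loads these variables with a helper such as python-dotenv; as a minimal stdlib-only sketch of how a `.env` file like the one above can be parsed (the `load_env` helper is hypothetical, not part of the repo):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY="value" lines from a .env file into os.environ.
    Blank lines, comments, and lines without '=' are skipped."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip optional surrounding double quotes from the value.
            os.environ[key.strip()] = value.strip().strip('"')
```

If the repo already calls `load_dotenv()` from python-dotenv, prefer that over rolling your own parser.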
- Create a `venv` environment using Python 3.11 and install the packages from the `requirements.txt` found at the root of the project. Use this environment to run the code.
- Run `python train_my_eq.py` from the root of the project
- Check progress in the terminal. At the end, the results will be written to the `output` folder mentioned in the `.env` file.
- Run `python train_my_cnn1.py` or `python train_my_cnn2.py`
- Check progress in the terminal. At the end, the results will be written to the `output` folder mentioned in the `.env` file.
- Copy one of the files (e.g. `train_my_eq.py`) and edit it to change the model and hyperparameters
- Train your model by running `python train_my_own_model.py`
Instance data have been downloaded and are available here: `~/projects/def-sponsor00/earthquake/data/instance`
STEAD data are not yet downloaded but can be added here: `~/projects/def-sponsor00/earthquake/data/stead`
To download the STEAD files in parallel:
- put all the URLs in a file (`files.txt`)
- then run:
  `cat files.txt | xargs -n 1 -P 0 wget -q`
- `-P 0` lets xargs choose the number of parallel workers. You can set a fixed number instead if you want
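The same parallel download can be sketched in Python with only the standard library (a hypothetical helper, not part of the repo); `urlretrieve` also accepts `file://` URLs, which makes dry-running easy:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlretrieve

def download_all(urls, dest_dir, workers=8):
    """Fetch every URL into dest_dir, `workers` downloads at a time.
    Returns the list of downloaded file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def fetch(url):
        # Name the local file after the last path component of the URL.
        target = dest / url.rsplit("/", 1)[-1]
        urlretrieve(url, target)
        return target

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Like `-P 0` above, you can tune `workers` to match your bandwidth and the server's tolerance.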
- Use `tmux` to run your session. Once connected to the server:
  - Type `tmux` to start a new session, or `tmux attach` to recover an old session. I suggest using it, as tmux keeps your terminal session running even if you lose the connection; otherwise you might need to start from scratch
  - Type `ctrl-b + %` to split your screen and `ctrl-b + <arrow>` to navigate through the panes. I use this to run multiple terminals simultaneously, as one might be blocked by a long-running task
  - Use `ctrl-b + z` to toggle a pane between full screen and normal
  - Type `exit` to close a tmux pane
- Using the terminal, log in to your cluster, preferably through ssh (e.g. `ssh username@ift6759.calculquebec.cloud`)
- Create a folder where you will clone your repo: `mkdir documents`
- Get into the folder and clone the repo, or pull if already cloned before (preferably using ssh): `cd documents` then `git clone git@github.com:damoursm/earthquake.git` OR `git pull`
- Get into the scripts folder and run the setup script. This creates the sbatch and scratch folders and moves the code and scripts to their proper locations: `cd scripts` then `./setup.sh`
- Go back to home and add your `.env` file in the code folder: `cd ~` then `vim scratch/code-snapshots/earthquake/.env`
  - Set the variables as in the section above. You can leave all variables except `FINAL_OUTPUT_DIR` with empty values (`""`), as the code detects the cluster and selects the paths itself.
  - Set `FINAL_OUTPUT_DIR="scratch/<your username>/output/default-train"`. `default-train` is used as the default, but you can change it if you want to save the output of different experiments. Just make sure the folder exists
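The fallback behaviour described above (empty values deferring to cluster detection) can be sketched like this; `resolve_path` and the defaults are hypothetical, the actual detection lives in the repo's code:

```python
import os

def resolve_path(var_name, cluster_default):
    """Return the .env value if non-empty, otherwise the cluster default.
    Mirrors the described behaviour: an empty ("") value means
    'let the code's cluster detection pick the path'."""
    value = os.environ.get(var_name, "")
    return value or cluster_default
```

For example, `resolve_path("EVENT_HDF5_FILE", "<cluster data path>")` would fall back to the cluster location when the variable is left empty.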
- Stay in the home folder and start training. Replace `train_transformer_elisee.py` with the file containing your code (for EqModel, use `train_my_eq.py`; for Cnn, use `train_my_cnn.py`): `cd ~` then `./sbatch/run.sh -p train_transformer_elisee.py`. Here you can optionally specify a few arguments:
  - `-m 16Gb` for memory (by default 8Gb)
  - `-t hh:mm:ss` for how long to run (by default 1h)
  - `-p /train_xxx.py` for the file to execute (by default it will run train.py)
  - `-c 1` for the number of CPUs to use
  - `-g 1` for the number of GPUs to use
- Once training is done, the files will be in the `FINAL_OUTPUT_DIR` specified in the `.env`. To download them one by one to your local computer, use this command line: `scp <username>@ift6759.calculquebec.cloud:/scratch/<username>/output/default-train/<filename> <local path, e.g. /Users/ekabore/Downloads>`
- Useful slurm commands:
  - `squeue -u username`: shows your currently submitted jobs
  - `scontrol show job jobid`: shows details about a job
  - `scancel jobid`: cancels a job
- Start the MLflow server by running `mlflow server` in the terminal
- Fill in your hyperparameters and configuration in the config file `config.py`
- Activate the `earthquake` environment
- Run the script: `python main.py`
- You can access the MLflow experiment in the UI by going to `http://localhost:5000` in your browser