This is an implementation of a Deep Convolutional Generative Adversarial Network (DCGAN) modified to generate 128×128 spectrograms that can be converted back into WAV files using the Analysis & Resynthesis Sound Spectrograph (ARSS).
- install ffmpeg
- install arss
- pip install tqdm
Put wav files (songs of the same genre) in a folder and run
python process.py
Follow the prompts after running. process.py splits the WAV files into 2-second segments and converts each segment into a BMP spectrogram using ARSS. Since the BMP files are greyscale, they are then converted to single-channel PNGs in a folder called png.
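The two external-tool steps can be sketched as command builders like the ones below. This is not the contents of process.py; the ffmpeg segment-muxer flags are standard, but the ARSS flag names and frequency range are assumptions (check `arss --help`), and the PNG conversion step is only noted in a comment:

```python
import subprocess
from pathlib import Path

def split_wav(src: Path, out_dir: Path, seconds: int = 2) -> list[str]:
    """Build the ffmpeg command that splits src into fixed-length WAV segments."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return [
        "ffmpeg", "-i", str(src),
        "-f", "segment", "-segment_time", str(seconds),
        "-c", "copy", str(out_dir / f"{src.stem}_%03d.wav"),
    ]

def wav_to_bmp(src: Path, dst: Path) -> list[str]:
    """Build the ARSS analysis command (flag names/values are assumptions)."""
    return ["arss", "-q", str(src), str(dst), "--analysis",
            "--min-freq", "27", "--max-freq", "20000"]

# Each command would be run per segment, e.g.:
# subprocess.run(split_wav(Path("song.wav"), Path("segments")), check=True)
# The greyscale BMPs can then be saved as single-channel PNGs,
# e.g. with Pillow: Image.open(bmp_path).convert("L").save(png_path)
```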
python train.py
will prompt for the name of the image data folder (in this case png) and for an appropriate batch size and number of epochs.
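For orientation, a DCGAN generator for 128×128 single-channel images might look like the sketch below. The layer widths and the `nz`/`ngf` hyperparameters are assumptions following the standard DCGAN recipe, not necessarily what model.py uses:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """DCGAN-style generator producing 128x128 single-channel spectrograms.
    Sizes are assumptions modeled on the standard DCGAN architecture."""
    def __init__(self, nz: int = 100, ngf: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 16, 4, 1, 0, bias=False),      # 4x4
            nn.BatchNorm2d(ngf * 16), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False), # 8x8
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # 16x16
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # 32x32
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # 64x64
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, 1, 4, 2, 1, bias=False),            # 128x128, 1 channel
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

g = Generator()
out = g(torch.randn(2, 100, 1, 1))  # out.shape == (2, 1, 128, 128)
```

Each stride-2 transposed convolution doubles the spatial size, so five doublings from a 4×4 seed reach the 128×128 spectrogram resolution.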
After the trained model is saved to a .pth file, generate.py and then bmp_to_wav.py will produce new spectrograms and WAV files.
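The generation side amounts to mapping the generator's tanh output back to 8-bit greyscale pixels and handing the BMP to ARSS for synthesis. A minimal sketch (the ARSS `--sine` flag and frequency range are assumptions; check `arss --help`):

```python
import numpy as np

def to_uint8(spec: np.ndarray) -> np.ndarray:
    """Map a generator output in [-1, 1] (tanh range) to 8-bit greyscale pixels."""
    return np.clip((spec + 1.0) * 127.5, 0, 255).astype(np.uint8)

def bmp_to_wav_cmd(bmp: str, wav: str) -> list[str]:
    """Build an ARSS sine-synthesis command (flag names are assumptions)."""
    return ["arss", "-q", bmp, wav, "--sine",
            "--min-freq", "27", "--max-freq", "20000"]
```

The uint8 array would be saved as a greyscale BMP (e.g. via Pillow) into bmp_generated, and the synthesis command run per file to fill wav_generated.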
128DCGAN-Spectogram/
├── data/
│   ├── bmp/
│   ├── png/
│   ├── bmp_generated/
│   └── wav_generated/
├── bmp_to_wav.py
├── data.py
├── generate.py
├── model.py
├── process.py
└── train.py

