S2I Translation

Implementation of a sound-to-image (S2I) translation system

Research

Our research aims to develop a sound-to-image (S2I) translation system that enables a human receiver to visually infer the occurrence of sound-related events. We expect the computer to ‘imagine’ the scene from the captured sound, generating original images that depict the sound-emitting source.

Published papers

Sound-to-imagination: an exploratory study on cross-modal translation using diverse audiovisual data
Leonardo A. Fanzeres, Climent Nadeu
Appl. Sci., vol. 13, no. 19, p. 10833, Jan. 2023, doi: 10.3390/app131910833 | MDPI

Sound-to-image translation through direct cross-modal connection using a convolutional–attention generative model
Leonardo A. Fanzeres, Climent Nadeu, José Adrián R. Fonollosa
Appl. Sci., vol. 16, no. 6, p. 2942, Jan. 2026, doi: 10.3390/app16062942 | MDPI

Setup

Requirements (tested versions)

csv (1.0)
matplotlib (2.2.2 to 3.1.1)
numpy (1.14.2 to 1.17.2)
python (3.5.2 to 3.7.4)
scipy (1.0.1 to 1.3.1)
torch (1.1.0)
torchvision (0.3.0)
The code can be executed in CPU mode, but running on a GPU with CUDA (9.0.176) + cuDNN is recommended
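As a quick sanity check before running anything, the following minimal sketch compares installed package versions against the tested ranges listed above and picks a device. It is not part of this repository: the helper names (`parse`, `in_tested_range`, `report`, `pick_device`) are hypothetical, and versions outside the tested ranges may still work.

```python
import importlib
import itertools

# Tested (min, max) versions, copied from the requirements list above.
TESTED = {
    "matplotlib": ("2.2.2", "3.1.1"),
    "numpy": ("1.14.2", "1.17.2"),
    "scipy": ("1.0.1", "1.3.1"),
    "torch": ("1.1.0", "1.1.0"),
    "torchvision": ("0.3.0", "0.3.0"),
}


def parse(version):
    """Turn a string such as '1.17.2+cu90' into (1, 17, 2).

    Only the leading digits of the first three dot-separated fields are
    used; missing or non-numeric fields count as 0.
    """
    parts = []
    for piece in version.split("+")[0].split(".")[:3]:
        lead = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(lead) if lead else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_tested_range(version, low, high):
    """True if low <= version <= high, compared as numeric tuples."""
    return parse(low) <= parse(version) <= parse(high)


def report():
    """Map each package to 'ok', 'untested version X', or 'not installed'."""
    status = {}
    for name, (low, high) in TESTED.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = "not installed"
            continue
        found = getattr(module, "__version__", "unknown")
        ok = in_tested_range(found, low, high)
        status[name] = "ok" if ok else "untested version " + found
    return status


def pick_device():
    """Return 'cuda' when PyTorch sees a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Calling `report()` returns a dictionary with one entry per package, and `pick_device()` returns a string that can be passed to `torch.device`.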

Get Started

Install PyTorch and the other required packages listed above
Clone or download this repository
Download the binary data files from https://github.com/leofanzeres/s2i_data.git
Execute a quantitative test using the interpretability classifiers
Execute a qualitative test by generating the translated images
Train the autoencoder model from scratch by executing actions/train_net_audio_autoencoder.py
Train the visual generator model from scratch and report the achieved interpretability by executing ... (to be made available)
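The data download and autoencoder training steps above could be scripted roughly as follows. This is a sketch under assumptions that this README does not confirm: that the data repository can be cloned into a directory named `s2i_data` (a hypothetical destination), and that the training script is launched from the repository root with no arguments. The helpers `build_commands` and `run_all` are illustrative names, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Both values are taken from the steps above.
DATA_REPO = "https://github.com/leofanzeres/s2i_data.git"
TRAIN_SCRIPT = Path("actions") / "train_net_audio_autoencoder.py"


def build_commands(data_dir="s2i_data"):
    """Return the commands as argument lists, without running them.

    data_dir is a hypothetical destination; check the repository for
    where the binary data files are actually expected.
    """
    return [
        ["git", "clone", DATA_REPO, data_dir],
        # Assumes the script takes no arguments and runs from the repo root.
        [sys.executable, str(TRAIN_SCRIPT)],
    ]


def run_all(commands, cwd="."):
    """Run each command in order from cwd, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

Building the command list separately from running it makes it easy to inspect or adjust the commands first, for example `run_all(build_commands(), cwd="path/to/this/repository")`.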

Acknowledgments

This work was supported in part by the Brazilian National Council for Scientific and Technological Development (CNPq) under PhD grant 200884/2015-8, and in part by the Spanish State Research Agency (AEI) project PID2019-107579RB-I00/AEI/10.13039/501100011033. The authors thank Santiago Pascual for his advice on the implementation of GANs, and Josep Pujal for his support in using the computational resources of the Signal Theory and Communications Department at the Polytechnic University of Catalonia (UPC).
