@LTMeyer (Collaborator) commented Apr 10, 2025

This PR implements the codec for DESI spectra (checkpoint stored at /mnt/ceph/users/polymathic/MMOMA/outputs/mmoma_codec_sdss+desi/6kzi0iz9/checkpoints/last.pt).

I checked that, given the same random input, this implementation reproduces the same encoded data as the original codec.
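A check of this kind can be sketched as follows. This is only an illustration under stated assumptions: `ToyEncoder` and `encodings_match` are hypothetical names, the toy model stands in for both the original and the ported codec, and the real comparison would load the actual checkpoint instead of copying weights between two toy instances.

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a codec encoder; not the code from this PR."""

    def __init__(self, n_levels: int = 16):
        super().__init__()
        self.proj = nn.Conv1d(1, 4, kernel_size=3, padding=1)
        self.n_levels = n_levels

    def forward(self, flux: torch.Tensor) -> torch.Tensor:
        # Squash the latent to (0, 1), then round to integer codes in [0, n_levels - 1].
        z = torch.sigmoid(self.proj(flux))
        return torch.round(z * (self.n_levels - 1)).long()


def encodings_match(reference: nn.Module, ported: nn.Module, seed: int = 0) -> bool:
    # Copy the reference weights into the port, as one would from a checkpoint.
    ported.load_state_dict(reference.state_dict())
    reference.eval()
    ported.eval()
    # Same seed, hence the same random input for both models.
    generator = torch.Generator().manual_seed(seed)
    flux = torch.randn(2, 1, 64, generator=generator)
    with torch.no_grad():
        # Integer codes are compared exactly, not within a tolerance.
        return torch.equal(reference(flux), ported(flux))


same = encodings_match(ToyEncoder(), ToyEncoder())
```

Comparing the integer codes with `torch.equal` is stricter than comparing decoded spectra with a tolerance, which seems appropriate when the goal is to confirm that a port is behavior-preserving.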

Reflecting on this PR and the other related one, we may want to reorganize the codecs a bit. For instance, removing the pytorch-lightning dependency turns the codecs into plain Python classes, whereas we would like them to be torch.nn.Module subclasses and ultimately to offer HF support.
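One possible shape for such a reorganization is sketched below. The names `Codec` and `ToyScalarCodec` are hypothetical and the binning logic is a toy, not the spectrum codec from this PR; the point is only that a codec deriving from torch.nn.Module gets `state_dict`, device handling, and buffers without any extra machinery.

```python
from abc import ABC, abstractmethod

import torch
from torch import nn


class Codec(nn.Module, ABC):
    """Hypothetical base class: a codec that is also a torch.nn.Module."""

    @abstractmethod
    def encode(self, x: torch.Tensor) -> torch.Tensor: ...

    @abstractmethod
    def decode(self, tokens: torch.Tensor) -> torch.Tensor: ...


class ToyScalarCodec(Codec):
    """Toy codec that snaps values in [-1, 1] to the nearest level of a uniform grid."""

    def __init__(self, n_levels: int = 9):
        super().__init__()
        # A buffer is saved in state_dict and follows the module across devices.
        self.register_buffer("levels", torch.linspace(-1.0, 1.0, n_levels))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Token = index of the nearest level.
        distances = (x.clamp(-1.0, 1.0).unsqueeze(-1) - self.levels).abs()
        return distances.argmin(dim=-1)

    def decode(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.levels[tokens]


codec = ToyScalarCodec()
tokens = codec.encode(torch.tensor([-1.0, 0.0, 0.26, 1.0]))
values = codec.decode(tokens)
```

For the HF side, one route that could be explored is the `PyTorchModelHubMixin` class from `huggingface_hub`, which adds `from_pretrained`, `save_pretrained`, and `push_to_hub` to an nn.Module subclass. It is not used in the sketch above.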

@LTMeyer LTMeyer changed the base branch from main to add_tokenizers April 10, 2025 11:26
Base automatically changed from add_tokenizers to main May 22, 2025 13:44
@EiffL EiffL requested review from EiffL and Copilot May 23, 2025 18:58
Copilot AI left a comment

Pull Request Overview

This PR implements a new autoencoder-based codec for processing DESI spectra in the Aion framework. Key changes include introducing a Spectrum modality with dedicated fields, implementing the SpectrumCodec together with its encoding and decoding logic (built on ConvNeXt-based modules and quantizers), and adding supporting test data and dependency updates.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tests/tokenizers/test_spectrum_tokenizer.py` | Added tests for the new spectrum codec using a Hugging Face–pretrained model. |
| `tests/test_data/SPECTRUM_input_batch.pt` | Added sample input data for spectrum modality. |
| `tests/test_data/SPECTRUM_encoded_batch.pt` | Added sample encoded output data for spectrum codec verification. |
| `tests/test_data/SPECTRUM_decoded_batch.pt` | Added sample decoded output data for spectrum codec verification. |
| `pyproject.toml` | Updated dependencies to include vector_quantize_pytorch. |
| `aion/modalities.py` | Introduced the Spectrum modality with fields for flux, ivar, mask, and wavelength. |
| `aion/codecs/tokenizers/spectrum.py` | Implemented a Spectrum codec class with autoencoder logic and quantization integration. |
| `aion/codecs/quantizers/__init__.py` | Added new LFQ and scalar quantizers for handling the latent space in the codec. |
| `aion/codecs/modules/utils.py` | Added custom LayerNorm and GRN utility modules. |
| `aion/codecs/modules/spectrum.py` | Provided interpolation functions and a latent spectral grid for converting between grids. |
| `aion/codecs/modules/convnext.py` | Added 1D ConvNeXt-based encoder and decoder modules for processing spectral data. |
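To illustrate what a spectrum container and a grid-to-grid conversion might look like, here is a minimal sketch. Only the four field names come from the summary above. The `Spectrum` dataclass, the tensor shapes, the mask convention, and the `interp1d` helper are all assumptions made for illustration and are not taken from `aion/modalities.py` or `aion/codecs/modules/spectrum.py`.

```python
from dataclasses import dataclass

import torch


@dataclass
class Spectrum:
    """Illustrative container; shapes and mask convention are assumed."""

    flux: torch.Tensor        # (batch, n_pix)
    ivar: torch.Tensor        # (batch, n_pix), inverse variance
    mask: torch.Tensor        # (batch, n_pix), bool, True = bad pixel (assumed)
    wavelength: torch.Tensor  # (batch, n_pix), increasing along the last axis


def interp1d(x: torch.Tensor, xp: torch.Tensor, fp: torch.Tensor) -> torch.Tensor:
    """Linearly interpolate fp(xp) at x; values outside xp take the edge values.

    x has shape (n_new,); xp and fp have shape (batch, n_pix).
    """
    x = x.expand(xp.shape[0], -1).contiguous()
    # Index of the right-hand neighbour, kept inside [1, n_pix - 1].
    idx = torch.searchsorted(xp.contiguous(), x).clamp(1, xp.shape[-1] - 1)
    x0, x1 = xp.gather(-1, idx - 1), xp.gather(-1, idx)
    y0, y1 = fp.gather(-1, idx - 1), fp.gather(-1, idx)
    # Clamping the weight to [0, 1] gives constant extrapolation at both ends.
    weight = ((x - x0) / (x1 - x0)).clamp(0.0, 1.0)
    return y0 + weight * (y1 - y0)


wavelength = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
spectrum = Spectrum(
    flux=2.0 * wavelength,
    ivar=torch.ones_like(wavelength),
    mask=torch.zeros_like(wavelength, dtype=torch.bool),
    wavelength=wavelength,
)
grid = torch.tensor([1.5, 2.5, 4.0, 5.0])  # hypothetical common grid
resampled = interp1d(grid, spectrum.wavelength, spectrum.flux)
```

A fuller version would presumably also propagate `ivar` and `mask` to the new grid; the sketch resamples only the flux.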

@EiffL EiffL merged commit c2626a4 into main May 23, 2025
2 checks passed
@EiffL EiffL deleted the add-spectrum-tokenizer branch May 23, 2025 22:49