Add Nucleotide Transformer v2 SeqTrainer tutorial for promoter tasks #6
Gonza10V wants to merge 1 commit into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ba4856a8a2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=1,
    eval_strategy="epoch",
Use `evaluation_strategy` for Transformers 4.30 compatibility
This notebook passes `eval_strategy` to `TrainingArguments`, but that keyword is not accepted by the oldest `transformers` releases the project declares support for (e.g., 4.30.x, which uses `evaluation_strategy`). Running the tutorial in an environment that meets the dependency floor therefore raises a `TypeError` before training starts. The same argument appears in both the classification and the regression `TrainingArguments` blocks, so the tutorial fails for users who install `seqtrainer[torch]` at the documented minimum versions.
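Hard-coding either name ties the notebook to one side of the rename: `eval_strategy` fails on 4.30.x, and recent Transformers releases no longer accept `evaluation_strategy`. One option is to pick whichever keyword the installed `TrainingArguments` actually accepts. The sketch below is a minimal illustration under that assumption; the helper `eval_strategy_kwargs` is hypothetical (not part of SeqTrainer or Transformers) and only relies on `inspect.signature`, which works on dataclasses such as `TrainingArguments`.

```python
import inspect
from typing import Any, Callable, Dict


def eval_strategy_kwargs(args_cls: Callable[..., Any], value: str = "epoch") -> Dict[str, str]:
    """Return {name: value} for whichever evaluation-strategy keyword args_cls accepts.

    Hypothetical helper: newer Transformers releases name the field
    ``eval_strategy``; older ones such as 4.30.x only accept ``evaluation_strategy``.
    """
    params = inspect.signature(args_cls).parameters
    # Prefer the new name, so releases that still accept the old one as a
    # deprecated alias use the current spelling.
    for name in ("eval_strategy", "evaluation_strategy"):
        if name in params:
            return {name: value}
    raise TypeError(
        f"{getattr(args_cls, '__name__', args_cls)!r} accepts neither "
        "'eval_strategy' nor 'evaluation_strategy'"
    )
```

In both argument blocks of the notebook the call would then look like `TrainingArguments(..., per_device_train_batch_size=2, num_train_epochs=1, **eval_strategy_kwargs(TrainingArguments))`, keeping the remaining arguments unchanged.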
Motivation
`Trainer` to serve as a reproducible example for users.

Description
- `src/seqtrainer/torch/hf.py`, which defines `NucleotideTransformerV2Spec` and provides `register_default_torch_backbones` and `get_nucleotide_transformer_v2_backbone` to register and resolve a built-in backbone spec for `InstaDeepAI/nucleotide-transformer-v2-500m-multi-species`.
- `seqtrainer.torch`: `src/seqtrainer/torch/__init__.py` is updated so the backbone can be resolved via `from seqtrainer.torch import get_nucleotide_transformer_v2_backbone`.
- `notebooks/tutorials/06_ntv2_promoter_tasks.ipynb`, which loads the example promoter dataset, builds SeqTrainer `MaterializedDataset` splits, tokenizes with the resolved tokenizer, and shows HF `Trainer` setup for both classification (median-thresholded target) and regression (raw target). The training and evaluation calls are commented out to avoid accidental heavy runs.
- `README.md`: the new notebook is added to the tutorial list.

Testing
- `python -m compileall src/seqtrainer` to check that the package files compile and are syntactically valid; this succeeded.
- `PYTHONPATH=src python - <<'PY' ... get_nucleotide_transformer_v2_backbone() ... PY`, which returned the expected `BackboneSpec`.
- `PYTHONPATH=src`; this import failure is environment-specific and not a code defect.

Codex Task
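The tutorial's classification task binarizes the promoter target at its median, while the regression task keeps the raw value. A minimal sketch of that thresholding step follows, assuming that values strictly above the median become label 1 and all others label 0; the notebook may break ties differently, and the names `median_cutoff` and `binarize` are hypothetical, not SeqTrainer APIs.

```python
from statistics import median
from typing import List, Sequence


def median_cutoff(train_targets: Sequence[float]) -> float:
    """Compute the classification threshold from training targets only (hypothetical helper)."""
    return median(train_targets)


def binarize(targets: Sequence[float], cutoff: float) -> List[int]:
    """Map continuous targets to 0/1 labels; assumption: ties at the cutoff become 0."""
    return [int(t > cutoff) for t in targets]
```

Splitting the cutoff computation from the labeling makes it easy to fit the threshold on the training split and reuse it for validation and test splits, so evaluation targets do not influence the label definition; the PR does not state which split the notebook uses to compute its median.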