Describe the bug
write_structure in openfold3/core/data/io/structure/cif.py raises NotImplementedError: Only .cif, .bcif, and .pkl formats are supported when writing .pdb output for queries whose IDs contain dots (e.g. 7txk__1__1.A__1.C).
Introduced in commit 0e84102, which changed the suffix extraction in cif.py from:
suffix = output_path.suffix
to:
suffix = "".join(output_path.suffixes) # to handle .cif.gz
The intent was to detect .cif.gz as a compound extension. However, Path.suffixes returns every dot-separated segment after the first dot in the filename, not just the file extension. When the output filename contains dots from the query ID (e.g. 7txk__1__1.A__1.C_seed_101001_sample_1_model.pdb), output_path.suffixes returns ['.A__1', '.C_seed_101001_sample_1_model', '.pdb']. Joining them produces ".A__1.C_seed_101001_sample_1_model.pdb", which matches none of the case branches and falls through to the catch-all NotImplementedError.
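The pathlib behavior described here can be checked in isolation, without OpenFold3 installed. The snippet below is a minimal sketch that uses only the standard library and the example filename from this report:

```python
from pathlib import Path

# Example output filename built from a query ID that contains dots.
output_path = Path("7txk__1__1.A__1.C_seed_101001_sample_1_model.pdb")

# .suffix looks only at the text after the last dot.
print(output_path.suffix)

# .suffixes splits on every dot after the first segment of the name,
# so the parts of the query ID are treated as extensions too.
print(output_path.suffixes)

# This mirrors the joined value that cif.py currently matches against.
joined = "".join(output_path.suffixes)
print(joined)
```

The joined string starts with ".A__1", so it cannot equal ".pdb" or any other supported extension.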
To Reproduce
Steps to reproduce the behavior. If possible please include:
- A query (e.g. json) that triggers the issue:
{
"queries": {
"8q0u__1__1.A__1.D": {
"chains": [
{
"molecule_type": "protein",
"chain_ids": [
"A"
],
"sequence": "SNAQIDGFVRTLRARPEAGGKVPVFVFHPAGGSTVVYEPLLGRLPADTPMYGFERVEGSIEERAQQYVPKLIEMQGDGPYVLVGWSLGGVLAYACAIGLRRLGKDVRFVGLIDAVRAGEEIPQTKEEIRKRWDRYAAFAEKTFNVTIPAIPYEQLEELDDEGQVRFVLDAVSQSGVQIPAGIIEHQRTSYLDNRAIDTAQIQPYDGHVTLYMADRYHDDAIMFEPRYAVRQPDGGWGEYVSDLEVVPIGGEHIQAIDEPIIAKVGEHMSRALGQIEADRTSEVGKQ",
"main_msa_file_paths": "/data/of3/colabfold_msas/main/65d30549a1d4539a138d452e53d382751619acfb757b94ad03782111643321e6.npz",
"template_alignment_file_path": "/data/of3/colabfold_msas/template/65d30549a1d4539a138d452e53d382751619acfb757b94ad03782111643321e6/colabfold_template.m8"
},
{
"molecule_type": "ligand",
"chain_ids": [
"B"
],
"smiles": "COc1ccc(-c2noc(C(=O)N[C@@H](C#N)Cc3ccc(C(=O)N4CCC4)cc3)n2)cc1OC"
}
]
},
Run inference with structure_format: pdb (the default) on any query whose ID contains a dot. Every such prediction fails to write output.
Stack trace
ERROR:openfold3.core.runners.writer:Failed to write predictions for query_id(s) 8q0u__1__1.A__1.D: Only .cif, .bcif, and .pkl formats are supported
Traceback (most recent call last):
File "/opt/openfold3/openfold3/core/runners/writer.py", line 331, in on_predict_batch_end
self.write_all_outputs(
File "/opt/openfold3/openfold3/core/runners/writer.py", line 263, in write_all_outputs
self.write_structure_prediction(
File "/opt/openfold3/openfold3/core/runners/writer.py", line 126, in write_structure_prediction
write_structure(
File "/opt/openfold3/openfold3/core/data/io/structure/cif.py", line 371, in write_structure
raise NotImplementedError(
NotImplementedError: Only .cif, .bcif, and .pkl formats are supported
Suggested Fix:
Use output_path.suffix (which returns only the final extension) and detect .cif.gz explicitly.
Replace line 329 in cif.py (suffix = "".join(output_path.suffixes)) with:
suffix = output_path.suffix
if suffix == ".gz" and output_path.stem.endswith(".cif"):
suffix = ".cif.gz"
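To sanity-check the suggested logic outside the codebase, it can be wrapped in a small standalone function. The helper name resolve_suffix is hypothetical and does not exist in openfold3; this is only a sketch of the proposed behavior, not the actual patch:

```python
from pathlib import Path


def resolve_suffix(output_path: Path) -> str:
    """Return the format suffix, treating .cif.gz as one compound extension.

    Hypothetical helper for illustration; not part of openfold3.
    """
    # Only the final extension is considered, so dots in the query ID are ignored.
    suffix = output_path.suffix
    # A trailing .gz counts as .cif.gz only when the remaining name ends in .cif.
    if suffix == ".gz" and output_path.stem.endswith(".cif"):
        suffix = ".cif.gz"
    return suffix


# Filenames with dotted query IDs, as produced for the queries in this report.
pdb_name = Path("7txk__1__1.A__1.C_seed_101001_sample_1_model.pdb")
gz_name = Path("8q0u__1__1.A__1.D_seed_101001_sample_1_model.cif.gz")

print(resolve_suffix(pdb_name))
print(resolve_suffix(gz_name))
```

With this approach a dotted query ID resolves to ".pdb", and a gzipped CIF file still resolves to ".cif.gz" even when the query ID contains dots.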