[BUG] Query IDs with dots in the name are not allowed.


**Describe the bug**
`write_structure` in `openfold3/core/data/io/structure/cif.py` raises `NotImplementedError: Only .cif, .bcif, and .pkl formats are supported` when writing `.pdb` output for queries whose IDs contain dots (e.g. `7txk__1__1.A__1.C`).

This was introduced in [commit 0e84102](https://github.com/aqlaboratory/openfold-3/commit/0e84102efff5ab774d82c877cbd57ec2520f568c), which changed the suffix extraction in `cif.py` from:
```
suffix = output_path.suffix
```
to:
```
suffix = "".join(output_path.suffixes)  # to handle .cif.gz
```
The intent was to detect `.cif.gz` as a compound extension. However, `Path.suffixes` treats every dot-separated segment after the first as a suffix, not just the trailing file extension. When the output filename contains dots from the query ID (e.g. `7txk__1__1.A__1.C_seed_101001_sample_1_model.pdb`), `output_path.suffixes` returns `['.A__1', '.C_seed_101001_sample_1_model', '.pdb']`, and joining them produces `".A__1.C_seed_101001_sample_1_model.pdb"`, which matches none of the case branches and falls through to the catch-all `NotImplementedError`.
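
The behavior described above can be checked in isolation with `pathlib` alone. This is a minimal sketch that uses the example filename from this report plus a made-up `.cif.gz` name (`prediction_model.cif.gz` is an assumption for illustration, not a file OpenFold produces):

```python
from pathlib import PurePosixPath

# Output filename from this report: the query ID itself contains dots.
output_path = PurePosixPath("7txk__1__1.A__1.C_seed_101001_sample_1_model.pdb")

# .suffix only looks at the text after the last dot.
last_suffix = output_path.suffix

# Joining .suffixes picks up the dots that belong to the query ID.
joined_suffixes = "".join(output_path.suffixes)

# Hypothetical compressed output name, the case the commit was targeting.
gz_path = PurePosixPath("prediction_model.cif.gz")
gz_joined = "".join(gz_path.suffixes)

print(last_suffix)
print(joined_suffixes)
print(gz_joined)
```

Joining the suffixes only gives the expected result when the filename stem contains no dots, which does not hold for these query IDs.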

**To Reproduce**
Use a query JSON whose query ID contains a dot, for example:
```
{
    "queries": {
        "8q0u__1__1.A__1.D": {
            "chains": [
                {
                    "molecule_type": "protein",
                    "chain_ids": [
                        "A"
                    ],
                    "sequence": "SNAQIDGFVRTLRARPEAGGKVPVFVFHPAGGSTVVYEPLLGRLPADTPMYGFERVEGSIEERAQQYVPKLIEMQGDGPYVLVGWSLGGVLAYACAIGLRRLGKDVRFVGLIDAVRAGEEIPQTKEEIRKRWDRYAAFAEKTFNVTIPAIPYEQLEELDDEGQVRFVLDAVSQSGVQIPAGIIEHQRTSYLDNRAIDTAQIQPYDGHVTLYMADRYHDDAIMFEPRYAVRQPDGGWGEYVSDLEVVPIGGEHIQAIDEPIIAKVGEHMSRALGQIEADRTSEVGKQ",
                    "main_msa_file_paths": "/data/of3/colabfold_msas/main/65d30549a1d4539a138d452e53d382751619acfb757b94ad03782111643321e6.npz",
                    "template_alignment_file_path": "/data/of3/colabfold_msas/template/65d30549a1d4539a138d452e53d382751619acfb757b94ad03782111643321e6/colabfold_template.m8"
                },
                {
                    "molecule_type": "ligand",
                    "chain_ids": [
                        "B"
                    ],
                    "smiles": "COc1ccc(-c2noc(C(=O)N[C@@H](C#N)Cc3ccc(C(=O)N4CCC4)cc3)n2)cc1OC"
                }
            ]
        }
    }
}
```

Run inference with structure_format: pdb (the default) on any query whose ID contains a dot. Every such prediction fails to write output.

**Stack trace**
```
ERROR:openfold3.core.runners.writer:Failed to write predictions for query_id(s) 8q0u__1__1.A__1.D: Only .cif, .bcif, and .pkl formats are supported
Traceback (most recent call last):
  File "/opt/openfold3/openfold3/core/runners/writer.py", line 331, in on_predict_batch_end
    self.write_all_outputs(
  File "/opt/openfold3/openfold3/core/runners/writer.py", line 263, in write_all_outputs
    self.write_structure_prediction(
  File "/opt/openfold3/openfold3/core/runners/writer.py", line 126, in write_structure_prediction
    write_structure(
  File "/opt/openfold3/openfold3/core/data/io/structure/cif.py", line 371, in write_structure
    raise NotImplementedError(
NotImplementedError: Only .cif, .bcif, and .pkl formats are supported
```
**Suggested Fix**:
Use `output_path.suffix` (which returns only the last extension) and detect `.cif.gz` explicitly.

Replace line 329 in `cif.py` (`suffix = "".join(output_path.suffixes)`) with:

```
    suffix = output_path.suffix
    if suffix == ".gz" and output_path.stem.endswith(".cif"):
        suffix = ".cif.gz"
```
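
To sanity-check the suggested logic outside the codebase, here is a small standalone sketch. The helper name `resolve_output_suffix` and the test filenames are hypothetical and are not part of `cif.py`; only the three-line suffix logic comes from the fix proposed in this report.

```python
from pathlib import Path


def resolve_output_suffix(output_path: Path) -> str:
    """Return the last extension, treating '.cif.gz' as one compound suffix."""
    suffix = output_path.suffix
    if suffix == ".gz" and output_path.stem.endswith(".cif"):
        suffix = ".cif.gz"
    return suffix


# Hypothetical filenames mapped to the suffix the writer should dispatch on.
cases = {
    "8q0u__1__1.A__1.D_seed_1_sample_1_model.pdb": ".pdb",
    "8q0u__1__1.A__1.D_seed_1_sample_1_model.cif": ".cif",
    "8q0u__1__1.A__1.D_seed_1_sample_1_model.cif.gz": ".cif.gz",
    "8q0u__1__1.A__1.D_seed_1_sample_1_model.bcif": ".bcif",
    "plain_model.pkl": ".pkl",
}

for name, expected in cases.items():
    assert resolve_output_suffix(Path(name)) == expected, name
```

Something along these lines could also serve as a regression test for dotted query IDs, assuming the project's test layout allows it.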


