Tool accessibility: allow use of simplefold script outside of the installed repo #33

@sterrettjd-idt

Hey Simplefold team,

Thanks for publishing this tool - it works great if installed according to the instructions in the readme.

However, because the model architecture YAMLs are loaded from this repo's configs/ directory via relative paths, simplefold can't be run from another directory or included as a dependency of another project.

Example 1 - simplefold can't be included as a package dependency

For example, if I install my own package "folding-tools" (via poetry), where I include simplefold as a dependency from this git repo, the config files aren't included in my installation of simplefold.

[project]
name = "folding-tools"
version = "0.0.0"
description = "folding"
authors = [
    {name = "John Sterrett", email = "jsterrett@idtdna.com"}
]
requires-python = ">=3.10,<3.13"
dependencies = [
    "simplefold @ git+ssh://git@github.com/apple/ml-simplefold.git",
    "mlx==0.28.0",
    "fair-esm @ git+ssh://git@github.com/facebookresearch/esm.git"
]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

This installs simplefold without any errors, but running the simplefold script raises a FileNotFoundError at runtime:

# assume local directory /path/to/dir/folding-tools
# install
conda create -n protein_folding_dev python=3.10
conda activate protein_folding_dev
pip install poetry

poetry install

# run simplefold script
simplefold --simplefold_model "simplefold_100M" --num_steps 100 --tau 0.01  --nsample_per_protein 1  --plddt --fasta_path test.fa --output_dir simplefold_output_test --backend mlx

Error:

Running protein folding with SimpleFold ...
Seed set to 42
Traceback (most recent call last):
  File "/path/to/miniconda3/envs/protein_folding_dev/bin/simplefold", line 8, in <module>
    sys.exit(main())
  File "/path/to/miniconda3/envs/protein_folding_dev/lib/python3.10/site-packages/simplefold/cli.py", line 39, in main
    predict_structures_from_fastas(args)
  File "/path/to/miniconda3/envs/protein_folding_dev/lib/python3.10/site-packages/simplefold/inference.py", line 271, in predict_structures_from_fastas
    model, device = initialize_folding_model(args)
  File "/path/to/miniconda3/envs/protein_folding_dev/lib/python3.10/site-packages/simplefold/inference.py", line 72, in initialize_folding_model
    model_config = omegaconf.OmegaConf.load(cfg_path)
  File "/path/to/miniconda3/envs/protein_folding_dev/lib/python3.10/site-packages/omegaconf/omegaconf.py", line 189, in load
    with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/path/to/dir/folding-tools/configs/model/architecture/foldingdit_100M.yaml'

Source of error

The following lines in simplefold cause this error. They reference files that weren't included in the package build, and they look for the configs/ directory relative to the current working directory.

cfg_path = os.path.join("configs/model/architecture", f"foldingdit_{simplefold_model[11:]}.yaml")
plddt_module_path = "configs/model/architecture/plddt_module.yaml"
and
plddt_latent_config_path = "configs/model/architecture/foldingdit_1.6B.yaml"
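
To make the failure mode concrete, here is a minimal sketch of how the first of these lines resolves. The helper name resolve_cfg is hypothetical; the path expression is copied from the line above, and the os.path.abspath call mirrors what the omegaconf frame in the traceback does. The same relative path ends up pointing somewhere different for every working directory.

```python
import os
import tempfile


def resolve_cfg(simplefold_model: str) -> str:
    """Hypothetical helper: build the config path the way the line above does."""
    # "simplefold_" is 11 characters long, so [11:] leaves the size suffix, e.g. "100M".
    cfg_path = os.path.join(
        "configs/model/architecture", f"foldingdit_{simplefold_model[11:]}.yaml"
    )
    # A relative path is resolved against the current working directory.
    return os.path.abspath(cfg_path)


original_cwd = os.getcwd()
resolved = []
for _ in range(2):
    with tempfile.TemporaryDirectory() as workdir:
        os.chdir(workdir)
        resolved.append(resolve_cfg("simplefold_100M"))
        os.chdir(original_cwd)  # leave the temp dir before it is cleaned up

for path in resolved:
    print(path)
```

The two printed paths differ only in their working-directory prefix, which is why the file is found only when the script is launched from the repo root.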

Example 2 - simplefold can't be run from another directory

If I install simplefold according to the readme, cd .., and then try to run the simplefold script from the new directory, it also fails because of the relative path to the configs/ dir.

simplefold --simplefold_model "simplefold_100M" --num_steps 100 --tau 0.01  --nsample_per_protein 1  --plddt --fasta_path test.fa --output_dir simplefold_output_test --backend mlx --ckpt_dir ml-simplefold/artifacts

Running protein folding with SimpleFold ...
Traceback (most recent call last):
  File "/path/to/miniforge3/envs/simplefold/bin/simplefold", line 7, in <module>
    sys.exit(main())
  File "/path/to/ml-simplefold/src/simplefold/cli.py", line 38, in main
    predict_structures_from_fastas(args)
  File "/path/to/ml-simplefold/src/simplefold/inference.py", line 267, in predict_structures_from_fastas
    model, device = initialize_folding_model(args)
  File "/path/to/ml-simplefold/src/simplefold/inference.py", line 78, in initialize_folding_model
    with open(cfg_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'configs/model/architecture/foldingdit_100M.yaml'

Allowing simplefold script to be run from anywhere

In my fork, I've copied the relevant config files into src/simplefold/configs and used importlib.resources to access these from anywhere. Now, simplefold can be included as a dependency by other packages.
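
A rough sketch of the idea, not the exact code from the fork: the snippet builds a throwaway package so it can run standalone, and the package name demo_fold, the helper packaged_config_text, and the placeholder YAML content are all made up for illustration. In simplefold the package argument would presumably be "simplefold", assuming the YAMLs are shipped under src/simplefold/configs as described above.

```python
import importlib
import importlib.resources as resources
import os
import sys
import tempfile
from pathlib import Path

# Build a throwaway package (hypothetical name "demo_fold") with a bundled config,
# standing in for a package that ships its YAMLs under <package>/configs/.
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_fold"
cfg_dir = pkg_dir / "configs" / "model" / "architecture"
cfg_dir.mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(cfg_dir / "foldingdit_100M.yaml").write_text("placeholder: true\n")  # not a real config
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()


def packaged_config_text(simplefold_model: str, package: str = "demo_fold") -> str:
    """Hypothetical helper: read a bundled architecture YAML as text."""
    size = simplefold_model.removeprefix("simplefold_")  # e.g. "100M"
    # files() locates the installed package, so the lookup ignores the working directory.
    resource = (
        resources.files(package)
        / "configs"
        / "model"
        / "architecture"
        / f"foldingdit_{size}.yaml"
    )
    return resource.read_text(encoding="utf-8")


# Move somewhere unrelated to show the lookup no longer depends on the cwd.
os.chdir(tempfile.gettempdir())
text = packaged_config_text("simplefold_100M")
print(text)
```

This only works if the YAML files are actually included in the built package, which is why they need to live inside the package directory rather than at the repo root.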

[project]
name = "folding-tools"
version = "0.0.0"
description = "folding"
authors = [
    {name = "John Sterrett", email = "jsterrett@idtdna.com"}
]
requires-python = ">=3.10,<3.13"
dependencies = [
    "simplefold @ git+ssh://git@github.com/sterrettjd-idt/ml-simplefold.git",
    "mlx==0.28.0",
    "fair-esm @ git+ssh://git@github.com/facebookresearch/esm.git"
]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

I can open a PR with these changes so you can see them - would it be possible to incorporate these so that simplefold can be more widely used as a dependency for other tools?

Thanks!
John

Labels: bug (Something isn't working)
