odcambc/DIMPLE

 
 

Open In Colab

DIMPLE: Deep Indel Missense Programmable Library Engineering

Protein domain insertion via programmed oligo libraries

A Python script for generating oligo libraries and PCR primers for Deep Mutational Scanning library generation incorporating indel variation.

See the protocol on protocols.io for more information on generating and assembling libraries.

Note: This is the active repository for DIMPLE development. The archived repository containing the code used in the publication is here, and is also archived at Zenodo.

Installation

Simple method

Run DIMPLE in a Google Colab notebook (also linked above) without downloading or installing on your computer. Follow the prompts and generate a library.

Local install

Install requirements

Using uv (recommended)

Use uv to create a local virtual environment and install the project plus dependencies from pyproject.toml and uv.lock.

Install and sync dependencies:

uv sync

Run commands in the environment:

uv run python run_dimple.py -h
uv run python run_dimple_gui.py
uv run pytest

Using Conda (alternative)

Use the supplied Conda environment file to install and manage dependencies. This creates a new environment called dimple_env:

conda env create -f dimple_env.yml
conda activate dimple_env

Using pip requirements (alternative)

DIMPLE requires the following packages:

  • python
  • numpy
  • biopython
  • tkinter (only required for GUI version)

Install with:

python -m pip install -r requirements.txt

Note: DIMPLE has been tested on Python version 3.12. Biopython is currently incompatible with Python 3.13 in some cases, and we recommend using Python 3.12 for now.
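
Before running DIMPLE locally, it can help to confirm the interpreter version and that the packages listed above can be located. The following is a minimal sketch and is not part of DIMPLE: the helper name `missing_modules` is hypothetical, and the import names used (`numpy`, `Bio` for Biopython, `tkinter`) are the usual top-level module names for those packages. Note that `find_spec` only checks that a module can be found, not that it imports cleanly.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Biopython is imported as "Bio"; tkinter is only needed for the GUI.
    required = ["numpy", "Bio"]
    optional = ["tkinter"]

    print("Python version:", ".".join(map(str, sys.version_info[:3])))
    if sys.version_info[:2] != (3, 12):
        print("Note: DIMPLE has been tested on Python 3.12.")
    print("Missing required packages:", missing_modules(required) or "none")
    print("Missing optional packages:", missing_modules(optional) or "none")
```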

Inputs

Target gene file

Targeted genes should be supplied in FASTA format. To allow DIMPLE to check for nonspecific amplification, include the entire plasmid sequence of the library generation construct in the file.

The ORF can optionally be specified in the FASTA header of each target gene. To do so, include the start and end positions of the gene in the plasmid in the header, as follows:

>gene1 start:10 end:100
ATGTT...

The start position should be the first base of the first codon, and the end position should be the last base of the last codon. If the header does not include these positions, specify the ORF on the command line.
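
To illustrate the header convention, here is a minimal sketch that pulls the `start:` and `end:` values out of a FASTA header line. It is not DIMPLE's own parser; the function name `parse_orf_header` and the regular expressions are assumptions made for illustration.

```python
import re


def parse_orf_header(header):
    """Extract (name, start, end) from a header like '>gene1 start:10 end:100'.

    start and end are returned as ints, or None when the field is absent.
    """
    text = header.lstrip(">").strip()
    name = text.split()[0] if text else ""
    start = re.search(r"\bstart:(\d+)", text)
    end = re.search(r"\bend:(\d+)", text)
    return (
        name,
        int(start.group(1)) if start else None,
        int(end.group(1)) if end else None,
    )


name, start, end = parse_orf_header(">gene1 start:10 end:100")
print(name, start, end)  # gene1 10 100
```

A header without the two fields returns `None` for both positions, which is the case where the ORF has to be chosen some other way.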

Headless / non-interactive runs

DIMPLE auto-detects ORFs by scanning all six reading frames and looking for stretches longer than 100 amino acids. In an interactive run this prompts the user to pick one. Headless runs (Colab, CI, run_dimple.py --non_interactive, automation pipelines) cannot prompt, so you must disambiguate the ORF up-front in one of two ways:

  1. Embed start: and end: in the FASTA header (recommended — eliminates ambiguity entirely; ORF auto-detection is skipped):

    >gene1 start:10 end:100
    ATGTT...
    
  2. Pass --non_interactive --orf_index N to run_dimple.py to pick the Nth detected ORF, where N is 1-based:

    uv run python run_dimple.py -geneFile gene.fa -DMS \
        --non_interactive --orf_index 1
    

If neither is provided and --non_interactive is set, DIMPLE raises a clear ValueError rather than hanging on a prompt:

  • No ORF candidates found → no ≥100-aa frame matched. Add explicit start:/end: header.
  • Multiple ORF candidates found → multiple ≥100-aa frames matched. Add start:/end: header or --orf_index N.
  • Preferred ORF index N is out of range → --orf_index value larger than the detected count.

The Colab notebook defaults to non_interactive=True and exposes orf_index as a form field for the same reason.
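
The detection and selection behaviour described in this section can be sketched as follows. This is an illustration under stated assumptions, not DIMPLE's implementation: it treats an ORF as an ATG-to-stop stretch, uses the standard stop codons, reads the threshold as "at least 100 codons", and the names `find_orfs` and `choose_orf` are hypothetical. Only the six-frame scan, the 100-amino-acid threshold, the 1-based index, and the three error messages come from the text above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_aa=100):
    """Scan all six reading frames for ATG..stop ORFs of at least min_aa codons.

    Returns (strand, start, end) tuples. start/end are 0-based, end-exclusive
    offsets on the scanned strand and exclude the stop codon. ORFs that run
    off the end of the sequence without a stop codon are not reported.
    """
    seq = seq.upper()
    strands = {"+": seq, "-": seq.translate(COMPLEMENT)[::-1]}
    orfs = []
    for strand, s in strands.items():
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i
                elif orf_start is not None and codon in STOP_CODONS:
                    if (i - orf_start) // 3 >= min_aa:
                        orfs.append((strand, orf_start, i))
                    orf_start = None
    return orfs


def choose_orf(orfs, orf_index=None):
    """Pick an ORF without prompting; orf_index is 1-based, as in --orf_index."""
    if not orfs:
        raise ValueError("No ORF candidates found")
    if orf_index is None:
        if len(orfs) > 1:
            raise ValueError("Multiple ORF candidates found")
        return orfs[0]
    if not 1 <= orf_index <= len(orfs):
        raise ValueError(f"Preferred ORF index {orf_index} is out of range")
    return orfs[orf_index - 1]


# A toy plasmid: one 121-codon ORF flanked by two bases on each side.
plasmid = "CC" + "ATG" + "GCT" * 120 + "TAA" + "CC"
orfs = find_orfs(plasmid)
print(orfs)                           # [('+', 2, 365)]
print(choose_orf(orfs, orf_index=1))  # ('+', 2, 365)
```

Raising instead of prompting is what lets a pipeline fail fast: with zero or several candidates and no index, `choose_orf` stops immediately rather than waiting for input that will never arrive.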

Running DIMPLE

Colab version

Using the Google Colab notebook, follow the prompts and explanations. Also check the options below for additional usage.

Local version

We have supplied two methods to run DIMPLE: a command-line version, and a GUI. Both offer the same functionality, but the GUI is more user-friendly.

GUI usage

Start the GUI with the following command:

uv run python run_dimple_gui.py
# or, without uv:
python run_dimple_gui.py

[Screenshot: DIMPLE GUI]

The following are required:

  • Target gene file (see Inputs above for format requirements)
  • One or more of the mutations to make to the target gene

Supply the options, then generate the library by pressing the 'Run DIMPLE' button.

Command-line usage

To see a description of the options for the command-line version:

uv run python run_dimple.py -h
# or, without uv:
python run_dimple.py -h

Full list of options:

options:
  -h, --help            show this help message and exit
  -wDir WDIR            Working directory for fasta files and output folder
  -geneFile GENEFILE    Input all gene sequences including backbone in a fasta format. Place all in one fasta file. Name description can include start and end points (>gene1 start:1
                        end:2)
  -handle HANDLE        Genetic handle for domain insertion. This is important for defining the linker. Currently uses BsaI (4 base overhang), but this can be swapped for SapI (3
                        base overhang).
  -dis DIS              use the handle to insert domains at every position in POI
  -matchSequences       Find similar sequences between genes to avoid printing the same oligos multiple times. Default: No matching
  -oligoLen OLIGOLEN    Synthesized oligo length
  -fragmentLen FRAGMENTLEN
                        Maximum length of gene fragment
  -overlap OVERLAP      Enter number of bases to extend each fragment for overlap. This will help with insertions close to fragment boundary
  -DMS                  Choose if you will run a deep mutational scan
  -custom_mutations CUSTOM_MUTATIONS
                        Path to file that includes custom mutations with the format position:AA
  -usage USAGE          Default is "human". Or select "ecoli". Or change code
  -insertions INSERTIONS [INSERTIONS ...]
                        Enter a list of insertions (nucleotides) to make at every position. Note, you should enter multiples of 3 nucleotides to maintain reading frame
  -deletions DELETIONS [DELETIONS ...]
                        Enter a list of deletions (number of nucleotides) to symmetrically delete (it will make deletions in multiples of 2x). Note you should enter multiples of 3 to
                        maintain reading frame
  -include_substitutions INCLUDE_SUBSTITUTIONS
                        If you are running DMS but only want to insert or delete AA
  -barcode_start BARCODE_START
                        To run DIMPLE multiple times, you will need to avoid using the same barcodes. This allows you to start at a different barcode.
  -restriction_sequence RESTRICTION_SEQUENCE
                        Recommended using BsmBI - CGTCTC(G)1/5 or BsaI - GGTCTC(G)1/5. Do not use N
  -avoid_sequence AVOID_SEQUENCE [AVOID_SEQUENCE ...]
                        Avoid these sequences in the backbone - BsaI and BsmBI. For multiple sequences use a space between inputs. Example -avoid_sequence CGTCTC GGTCTC
  -include_stop_codons  Include stop codons in the list of scanning mutations.
  -include_synonymous   Include synonymous codons in the list of scanning mutations.
  -make_double          Make each combination of mutations within a fragment
  -maximize_nucleotide_change
                        Maximize the number of nucleotide changes in each codon for easier detection in NGS
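
Two notes in the option list lend themselves to a quick pre-flight check: insertion and deletion lengths should be multiples of 3 to maintain the reading frame, and the backbone should be free of the sites passed to `-avoid_sequence`. Because the BsmBI and BsaI recognition sites (CGTCTC and GGTCTC) are not palindromic, a site can occur on either strand. The sketch below is not part of DIMPLE; the helper names `in_frame` and `find_sites` are hypothetical, and it assumes insertions are given as nucleotide strings and deletions as nucleotide counts.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_frame(insertions, deletions):
    """True if every insertion (nucleotide string) and every deletion
    (number of nucleotides) has a length that is a multiple of 3."""
    return all(len(ins) % 3 == 0 for ins in insertions) and all(
        int(d) % 3 == 0 for d in deletions
    )


def find_sites(seq, sites=("CGTCTC", "GGTCTC")):
    """Return (site, strand, position) for each occurrence on either strand.

    Positions are 0-based offsets of the leftmost matching base on the + strand.
    """
    seq = seq.upper()
    hits = []
    for site in sites:
        for strand, pattern in (("+", site), ("-", reverse_complement(site))):
            pos = seq.find(pattern)
            while pos != -1:
                hits.append((site, strand, pos))
                pos = seq.find(pattern, pos + 1)
    return hits


print(in_frame(["GGG", "GGGAGC"], [3, 6]))  # True
print(in_frame(["GG"], [3]))                # False
print(find_sites("AAACGTCTCAAAGAGACCAAA"))
# [('CGTCTC', '+', 3), ('GGTCTC', '-', 12)]
```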

Example output

Example output files are located in the examples directory.

Running tests

To test DIMPLE, run the following command from the root directory:

uv run pytest
# or, without uv:
pytest

This should pass without any errors. If you encounter any issues, please open an issue on the GitHub repository.

Citing DIMPLE

If you found DIMPLE useful, feel free to cite the publication describing it:

License

This code is licensed under the terms of the MIT license.

Contributing

Contributions and feedback are welcome. Please submit an issue or pull request.

Getting help

For any issues, please open an issue on the GitHub repository. For questions or feedback, email Chris.
