This is a filtering/hypothesis-generating workflow for plasmid detection and characterization. Its outputs can be used to identify which isolates merit long-read sequencing for higher-resolution plasmid and MGE characterization.
PlasmidPreview previews plasmid, resistance gene, and resistance mutation content from short-read data before long-read sequencing. By combining MOB-recon and AMRFinderPlus across an isolate dataset, it can surface candidates where plasmid context is ambiguous or where mobile resistance elements could warrant additional resolution. Plasmid and AMR detections are summarized and visualized with an R notebook, which also generates files that can be used to visualize plasmid and AMR sequence presence across phylogenetic relationships. The workflow can help labs and surveillance programs direct long-read sequencing capacity to the isolates most likely to contain plasmids and mobile resistance. It was developed at the Washington State Department of Health for genomic surveillance workflows.
This workflow detects and characterizes plasmids in bacterial whole genome sequencing (WGS) assemblies generated from Illumina short-read shotgun sequencing. It is based on the plasmid characterization approach described in Sauerborn et al. 2026 (doi: 10.1099/mgen.0.001644), adapted for short-read assemblies.
- Tools
- Input
- Installation
- Usage
- How Results Are Combined
- R Analysis
- Output Files
- Notes on Short-Read Assemblies
- Citations
- MOB-suite v3.1.9 — plasmid contig classification, reconstruction and typing
- AMRFinderPlus v4.0.3 — antimicrobial resistance gene detection
- Database: NCBI Reference Gene Catalog
Assembled contigs in FASTA format, one file per sample inside a folder named assemblies
The scripts in this repo reference the paths listed in config/params.sh. Edit config/params.sh to point to your assemblies directory and results path. Note that you may need to create the directories for your assemblies and results first (or use an S3 URI for assemblies):
THREADS=8
RESULTS_DIR=~/plasmid-triage/results
AMR_DB=~/plasmid-triage/amrfinderplus_db
INPUT_PATH=~/plasmid-triage/assemblies
Installation using mamba
conda create -n mamba-env -c conda-forge mamba -y
conda activate mamba-env
mamba create -n plasmid -c conda-forge -c bioconda python=3.9 ncbi-amrfinderplus -y
conda activate plasmid
# MOB-suite is not available on conda and must be installed via pip.
# Dependencies must be installed separately via conda before running mob_init:
conda install -c bioconda -c conda-forge mash blast muscle
pip install mob-suite
mob_init --database_directory ~/mobsuite_db
# Download AMRFinderPlus database
amrfinder_update --database ./amrfinderplus_db
After creating and activating the conda environment, the bundled database may be outdated. Always update the database before running:
cd ~
mkdir -p ~/tmp
export TMPDIR=~/tmp
amrfinder --update
Verify the software and database versions:
amrfinder --database_version
As of this writing, the expected versions are:
- Software: 4.2.7
- Database: 2026-05-15.1
Note:
`amrfinder --update` must be run from `~` (not a subdirectory that may not persist), and `TMPDIR` must point to a directory with sufficient space. On EC2 instances, `/tmp` is often too small to build the BLAST index; redirecting to `~/tmp` resolves this.
Edit params.sh with the paths or S3 URIs where your assemblies are located and where you want the output data to go.
By default the workflow uses whatever AWS credentials are active in your shell. If you need to access assemblies in an S3 bucket that belongs to a different group (e.g. waphl), do a one-time setup and set the profile name in params.sh:
aws configure --profile waphl
Then in 'config/params.sh':
AWS_PROFILE_NAME="waphl"
Leave 'AWS_PROFILE_NAME=""' to use default credentials. The profile is only used for syncing assemblies from S3.
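As an illustration of the profile behaviour described above, the following is a minimal sketch and not the workflow's own sync code. The function name `build_sync_command` and the bucket `s3://example-bucket/assemblies` are hypothetical. It only prints the `aws s3 sync` command that would be used, adding `--profile` when a profile name is set, so you can review it before running anything.

```bash
# Print the aws command that would sync assemblies from S3 to a local folder.
# Hypothetical helper for illustration; it prints the command and runs nothing.
# Arguments: <s3 source URI> <local destination> <profile name, may be empty>
build_sync_command() {
    src=$1
    dest=$2
    profile=$3
    if [ -n "$profile" ]; then
        # A named profile is set: pass it explicitly to the AWS CLI.
        printf 'aws s3 sync %s %s --profile %s\n' "$src" "$dest" "$profile"
    else
        # Empty profile name: rely on the credentials active in the shell.
        printf 'aws s3 sync %s %s\n' "$src" "$dest"
    fi
}

# Example (hypothetical bucket name):
#   build_sync_command s3://example-bucket/assemblies ./assemblies waphl
```

Printing the command first is a deliberate choice for a sketch like this: it makes it easy to confirm which credentials a sync would use.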
Run scripts in this order from your working directory (plasmid-triage):
Note: this runs each sample sequentially, and MOB-recon will take several minutes per sample. TODO: parallelize the workflow and/or move it to Seqera or AWS Batch so that hundreds of samples take roughly as long as one.
# 1. Classify and reconstruct plasmids
bash scripts/run_mobrecon.sh
# 2. Detect AMR genes on all contigs (plasmid and chromosomal, as determined by MOB-recon)
bash scripts/run_amrfinder.sh
# 3. Combine outputs across all samples
bash scripts/combine_contig_reports.sh results/ combined_contig_report.tsv
bash scripts/combine_amrfinder.sh results/ combined_amrfinder.tsv
If the plasmid/chromosome classification of a contig is ambiguous, treat the contig as a possible plasmid sequence when filtering. This workflow is a screen: it favors detecting plasmids at the cost of some false positives, because a false-negative plasmid call means missing a sample that should go to long-read sequencing for more definitive plasmid analysis.
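The screening rule above can be sketched with `awk`. This is a hedged example, not part of the repo's scripts. The function name `plasmid_candidate_samples` is hypothetical, and it assumes `combined_contig_report.tsv` is tab-separated with a header row containing `sample` and `molecule_type` columns. To stay on the sensitive side, it counts every contig whose `molecule_type` is anything other than `chromosome` as a possible plasmid contig.

```bash
# List samples with at least one possible plasmid contig, plus the contig count.
# Assumes a tab-separated file with a header row that includes the columns
# "sample" and "molecule_type" (as described for combined_contig_report.tsv).
plasmid_candidate_samples() {
    awk -F'\t' '
        NR == 1 {
            # Look up column positions by header name.
            for (i = 1; i <= NF; i++) col[$i] = i
            s = col["sample"]; m = col["molecule_type"]
            if (!s || !m) {
                print "expected sample and molecule_type columns" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Anything not called chromosome is kept as a possible plasmid contig.
        $s != "" && $m != "chromosome" { n[$s]++ }
        END { for (k in n) printf "%s\t%d\n", k, n[k] }
    ' "$1" | sort
}

# Example:
#   plasmid_candidate_samples combined_contig_report.tsv
```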
AMRFinderPlus can apply species-specific point mutation information for some species when you provide the species name.
To check whether a species is supported, run:
amrfinder --list_organisms
Either leave the organism in params.sh blank, or set it to the species of interest, formatted as shown in the --list_organisms output (e.g. Klebsiella_pneumoniae).
Note: A single organism value is applied to all samples in the run. For mixed-species datasets, run each species separately.
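A small helper can make the organism check scriptable. This is a sketch under assumptions: the function name `organism_supported` is hypothetical, and it assumes the organism names in the `amrfinder --list_organisms` output are separated by commas and/or whitespace. It requires an exact match, so a partial name such as `Klebsiella` is not accepted in place of `Klebsiella_pneumoniae`.

```bash
# Return success (exit 0) if the given organism name appears as an exact
# token in the text read from stdin.
# Assumption: names are separated by commas, spaces, or tabs.
organism_supported() {
    awk -v want="$1" '
        {
            n = split($0, tok, /[, \t]+/)
            for (i = 1; i <= n; i++) if (tok[i] == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
#   amrfinder --list_organisms | organism_supported Klebsiella_pneumoniae \
#       && echo "supported" || echo "not listed; leave organism blank"
```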
MOB-recon and AMRFinderPlus answer different but complementary questions:
MOB-recon takes your full assembly and classifies every contig as either chromosome or plasmid. It groups plasmid contigs into bins (one bin per reconstructed plasmid) and types each bin for:
- Incompatibility group / replicon type (e.g. IncN, IncF)
- Mobility class (conjugative, mobilizable, non-mobilizable)
- Relaxase and mate-pair formation (MPF) type
AMRFinderPlus searches the plasmid bin and chromosome FASTAs produced by MOB-recon for antimicrobial resistance genes, point mutations, and virulence factors.
The key linkage is the contig bin — each bin appears in both:
- `contig_report.txt` from MOB-recon (which plasmid bin it belongs to, replicon type, mobility; column: `primary_cluster_id`)
- AMRFinderPlus output (which resistance genes it carries; column: `plasmid_bin`)
Joining on plasmid bins lets you filter (triage) short-read shotgun genome sequences for samples where the following questions may be relevant:
- Which Inc groups are carrying which resistance genes?
- Are carbapenemase genes on conjugative plasmids?
- Which plasmid bins carry multiple resistance genes?
Note: these questions can only be answered definitively with long-read data.
Read the combined outputs into R and join on contig ID to link AMR hits to plasmid metadata, using the Rmd notebook in the notebooks section. The notebook creates summary tables, visualizations, and exported files that can be used to visualize plasmid and AMR detections across WGS-based phylogenetic relationships.
| File | Description |
|---|---|
| `results/{sample}/contig_report.txt` | Per-contig MOB-recon classification for each sample |
| `results/{sample}/plasmid_*.fasta` | Reconstructed plasmid bin sequences |
| `results/{sample}/chromosome.fasta` | Chromosomal contigs |
| `results/{sample}/plasmid_*_amrfinder.tsv` | AMR genes per plasmid bin |
| `results/{sample}/chromosome_amrfinder.tsv` | AMR genes on chromosomal contigs |
| `combined_contig_report.tsv` | All contig reports merged with sample column |
| `combined_amrfinder.tsv` | All AMR results merged with sample and plasmid bin columns |
| Column | Description |
|---|---|
| `sample` | Sample name |
| `contig_id` | Contig identifier; links to AMRFinderPlus output |
| `molecule_type` | plasmid or chromosome |
| `primary_cluster_id` | Plasmid bin identifier |
| `rep_type` | Replicon/incompatibility group |
| `relaxase_type` | Relaxase classification |
| `predicted_mobility` | conjugative / mobilizable / non-mobilizable |
| `contig_size` | Contig length in bp |
| Column | Description |
|---|---|
| `sample` | Sample name |
| `molecule_type` | plasmid or chromosome |
| `contig_bin` | Which plasmid bin or chromosome this hit came from |
| `Gene symbol` | Resistance gene name |
| `Class` | Antibiotic class |
| `Subclass` | Antibiotic subclass |
| `% Coverage of reference sequence` | Gene coverage |
| `% Identity to reference sequence` | Gene identity |
- This workflow may misclassify plasmid sequences as chromosomal, which can happen with multireplicon and large plasmids. This could manifest as no plasmids detected while MOB-recon marks multiple contigs as chromosome; that is, plasmids may be present but not detected by MOB-recon.
- Large plasmids (>100 kb) will often be fragmented across multiple contigs assigned to the same `primary_cluster_id`; sum `contig_size` within a bin to estimate total plasmid size
- Some plasmid contigs may be unclassified if they lack known replicon or relaxase sequences
- Results should be interpreted at the plasmid bin level rather than individual contig level
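The per-bin size estimate mentioned in the notes above can be sketched as follows. This is a minimal example with a hypothetical function name, `plasmid_bin_sizes`. It assumes `combined_contig_report.tsv` is tab-separated with a header row containing `sample`, `molecule_type`, `primary_cluster_id`, and `contig_size`. Because short-read bins can be incomplete, the totals are best read as rough lower-bound estimates.

```bash
# Sum contig_size within each (sample, plasmid bin) pair.
# Output columns: sample, bin, number of contigs, total size in bp
plasmid_bin_sizes() {
    awk -F'\t' '
        NR == 1 {
            # Look up column positions by header name.
            for (i = 1; i <= NF; i++) col[$i] = i
            s = col["sample"]; m = col["molecule_type"]
            b = col["primary_cluster_id"]; z = col["contig_size"]
            if (!s || !m || !b || !z) {
                print "expected sample, molecule_type, primary_cluster_id, contig_size" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Only plasmid contigs contribute to a bin total.
        $m == "plasmid" { key = $s "\t" $b; total[key] += $z; n[key]++ }
        END { for (k in total) printf "%s\t%d\t%d\n", k, n[k], total[k] }
    ' "$1" | sort
}

# Example:
#   plasmid_bin_sizes combined_contig_report.tsv
```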
This workflow was conceptualized and written by Shawn Hawken, PhD, MPH, at the Washington State Department of Health Molecular Epidemiology Program, Bacterial & Mycotic Unit.
Alpha/Beta testing by Marcela Torres and Dahlia Walters, Washington State Department of Health Molecular Epidemiology Program, Bacterial & Mycotic Unit.
If you use this workflow please also cite the following tools:
MOB-suite Robertson J, Nash JHE. (2018) MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. Microbial Genomics 4(8). doi: 10.1099/mgen.0.000206
AMRFinderPlus Feldgarden M, et al. (2021) AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Scientific Reports 11:12728. doi: 10.1038/s41598-021-91456-0
Feldgarden M, et al. (2022) Curation of the AMRFinderPlus databases: applications, functionality and impact. Microbial Genomics 8:mgen000832. doi: 10.1099/mgen.0.000832
Reference workflow inspiration Sauerborn et al. (2026) Resolving plasmid-encoded carbapenem resistance dynamics and reservoirs in a hospital setting through nanopore sequencing. Microbial Genomics 12(2). doi: 10.1099/mgen.0.001644
MIT Licence