A reverse-engineered parser and writer for SnapGene .dna files (DNA, RNA, protein). Supports all 17 known block types with typed Python models, a chainable builder pattern, and a history operations API.
Important
Found an unknown block type? Run sff check your_file.dna -l — blocks marked [NEW] are genuinely unknown, [*] are known but undecoded. Please report [NEW] blocks in #1 with a dump (sff check your_file.dna -d).
pip install sgffp
Requires Python 3.10+.
from sgffp import SgffReader, SgffWriter, SgffObject
# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")
# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)
# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")
# Create a new file from scratch
sgff = (
SgffObject.new("ATGCATGCATGC", topology="circular")
.add_feature("GFP", "CDS", 0, 8)
.add_primer("fwd", "ATGC", bind_position=0)
)
SgffWriter.to_file(sgff, "new_plasmid.dna")
Record cloning operations with automatic history tracking:
sgff.ops.insert_fragment("ATCGATCG")
sgff.ops.digest("GGCC", InputSummary={"manipulation": "digest"})
# Or build an entire tree from multiple source files
vector = SgffReader.from_file("vector.dna")
insert = SgffReader.from_file("insert.dna")
sgff.ops.build_from_spec(
[
{"id": 1, "operation": "insertFragment", "sequence": "...",
"name": "Final", "children": [2, 3]},
{"id": 2, "source": vector},
{"id": 3, "source": insert},
],
final_sequence="...",
)
SnapGene files use a TLV (Type-Length-Value) binary format after a 19-byte header. Each block has a 1-byte type ID and a 4-byte length, with encoding varying by type: UTF-8 for sequences, XML for annotations, 2-bit GATC encoding for compressed DNA, LZMA for history, and ZTR for chromatogram traces.
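The block layout described above can be walked with a short loop. The following is a minimal sketch, not sgffp's actual reader: `iter_blocks` and `HEADER_SIZE` are hypothetical names, the big-endian byte order of the length field is an assumption, and the input is synthetic bytes rather than a real .dna file.

```python
import struct
from typing import Iterator, Tuple

HEADER_SIZE = 19  # header length stated in the text above


def iter_blocks(data: bytes, offset: int = HEADER_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (type_id, payload) pairs from a TLV byte stream."""
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError("truncated block header")
        # Assumption: 1-byte type ID, then a big-endian unsigned 32-bit length.
        type_id, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated block payload")
        offset += length
        yield type_id, payload


# Synthetic stream: a zeroed 19-byte header followed by two made-up blocks.
fake = (
    bytes(HEADER_SIZE)
    + struct.pack(">BI", 0, 9) + b"payload-a"
    + struct.pack(">BI", 6, 9) + b"payload-b"
)
blocks = list(iter_blocks(fake))
```

Each payload still needs its own type-specific decoding (UTF-8, XML, LZMA, and so on); this loop only splits the stream into blocks.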
SgffReader parses blocks via the SCHEME dispatch table and stores them in SgffObject.blocks (a Dict[int, List]). Typed model properties (sgff.sequence, sgff.features, sgff.history, etc.) are lazily loaded from the blocks dict and sync changes back automatically. SgffWriter serializes blocks back to binary in sorted order.
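The flow described above (dispatch table, shared blocks dict, lazily built models that write back, sorted output) can be illustrated with a toy version. This is a sketch under assumptions and does not reproduce sgffp's internals: `Container`, `Sequence`, `decode_text`, and `decode_raw` are hypothetical stand-ins, and the `SCHEME` dict here only mimics the idea of a dispatch table.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical decoders; the real SCHEME entries are not shown in this README.
def decode_text(payload: bytes) -> str:
    return payload.decode("utf-8")


def decode_raw(payload: bytes) -> bytes:
    return payload


# Toy dispatch table: block type ID -> decoder function.
SCHEME: Dict[int, Callable[[bytes], object]] = {0: decode_text, 6: decode_text}


class Sequence:
    """Tiny model that reads from and writes to the shared blocks dict."""

    def __init__(self, blocks: Dict[int, List[object]], type_id: int) -> None:
        self._blocks = blocks
        self._type_id = type_id

    @property
    def value(self) -> object:
        return self._blocks[self._type_id][0]

    @value.setter
    def value(self, new: str) -> None:
        # Writing through the model updates the underlying block directly.
        self._blocks[self._type_id][0] = new


class Container:
    def __init__(self) -> None:
        self.blocks: Dict[int, List[object]] = {}
        self._sequence: Optional[Sequence] = None

    def add_raw(self, type_id: int, payload: bytes) -> None:
        # Unknown types fall back to keeping the raw bytes.
        decoder = SCHEME.get(type_id, decode_raw)
        self.blocks.setdefault(type_id, []).append(decoder(payload))

    @property
    def sequence(self) -> Sequence:
        if self._sequence is None:  # built on first access only
            self._sequence = Sequence(self.blocks, 0)
        return self._sequence

    def sorted_blocks(self) -> List[Tuple[int, object]]:
        # Emit blocks in ascending type-ID order, as a writer might.
        return [(tid, item) for tid in sorted(self.blocks) for item in self.blocks[tid]]


obj = Container()
obj.add_raw(6, b"<Notes/>")
obj.add_raw(0, b"ATGC")
obj.sequence.value = "ATGCAA"
order = [tid for tid, _ in obj.sorted_blocks()]
```

Because the model holds a reference to the same dict rather than a copy, no explicit sync step is needed before writing in this toy version.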
| ID | Block Type | Format | Model |
|---|---|---|---|
| 0 | DNA Sequence | UTF-8 | SgffSequence |
| 1 | Compressed DNA | 2-bit GATC | SgffSequence |
| 5 | Primers | XML | SgffPrimerList |
| 6 | Notes | XML | SgffNotes |
| 7 | History Tree | LZMA + XML | SgffHistory |
| 8 | Sequence Properties | XML | SgffProperties |
| 10 | Features | XML | SgffFeatureList |
| 11 | History Nodes | Binary + TLV | SgffHistory |
| 14 | Custom Enzyme Sets | XML | |
| 16 | Trace Container | Binary + TLV | SgffTraceList |
| 17 | Alignable Sequences | XML | SgffAlignmentList |
| 18 | ZTR Trace (in 16) | ZTR | SgffTrace |
| 20 | Strand Colors | XML | |
| 21 | Protein Sequence | UTF-8 | SgffSequence |
| 23 | File Attachments | Binary + zlib XML | SgffAttachmentList |
| 27 | Trace Alignment | BGZF + BAM | SgffTraceAlignment |
| 28 | Enzyme Visibilities | XML | |
| 29 | History Modifier | LZMA + XML | SgffHistory |
| 30 | History Content | LZMA + TLV | SgffHistory |
| 32 | RNA Sequence | UTF-8 | SgffSequence |
| 34 | RNA Structure | LZMA + JSON | |
Blocks 2, 3, 13, 35 are auto-generated by SnapGene and intentionally skipped.
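To make the "2-bit GATC" format listed for block 1 more concrete, here is a generic 2-bit packing sketch. The code mapping (G=0, A=1, T=2, C=3), the most-significant-bits-first order, and the helper names `pack_2bit` and `unpack_2bit` are assumptions inferred from the format's name; the real block may also carry extra header fields that are not modelled here.

```python
ALPHABET = "GATC"  # assumption: code 0=G, 1=A, 2=T, 3=C


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte, first base in the highest two bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | ALPHABET.index(base)
        # Left-align a short final chunk so unused low bits are zero.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, n_bases: int) -> str:
    """Unpack n_bases bases; the count must be stored alongside the data."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(ALPHABET[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


packed = pack_2bit("GATCA")
restored = unpack_2bit(packed, 5)
```

The base count has to travel with the packed bytes, since padding bits in the last byte are otherwise indistinguishable from real bases.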
sff parse plasmid.dna # Export to JSON
sff info plasmid.dna -v # Show detailed file info
sff tree plasmid.dna # Display history timeline
sff check plasmid.dna -l # List block types
sff filter plasmid.dna -k 0,10 -o minimal.dna
All read commands accept stdin (cat file.dna | sff info).
git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync --dev
# Run tests
uv run pytest tests/ -v
# Docs (VitePress)
cd docs && bun install && bun run docs:dev
Full guides, API reference, CLI reference, and binary format specification:
This project would not have been possible without previous work by:
- Damien Goutte-Gattat, see his PDF on SGFF structure: https://incenp.org/dvlpt/docs/binary-sequence-formats/binary-sequence-formats.pdf
- Isaac Luo, for his version of SnapGene reader: https://github.com/IsaacLuo/SnapGeneFileReader
- Kale Kundert, for autosnapgene, a SnapGene automation tool: https://github.com/kalekundert/autosnapgene
Thanks also to the people who helped the project:
- Manuel Lera-Ramirez (@manulera) for his PRs and suggestions
- Cory Tobin (@cory-mozza) for reviewing new blocks
Distributed under the MIT licence; see LICENSE for details.