SnapGene File Format Parser

A reverse-engineered parser and writer for SnapGene .dna files (DNA, RNA, protein). Supports all 17 known block types with typed Python models, a chainable builder pattern, and a history operations API.

Important

Found an unknown block type? Run sff check your_file.dna -l. Blocks marked [NEW] are genuinely unknown; blocks marked [*] are known but not yet decoded. Please report [NEW] blocks in #1 with a dump (sff check your_file.dna -d).

Installation

pip install sgffp

Requires Python 3.10+.

Quick Start

from sgffp import SgffReader, SgffWriter, SgffObject

# Read a SnapGene file
sgff = SgffReader.from_file("plasmid.dna")

# Access data via typed properties
print(sgff.sequence.value)
print(sgff.features[0].name)

# Modify and write back
sgff.sequence.topology = "circular"
SgffWriter.to_file(sgff, "output.dna")

# Create a new file from scratch
sgff = (
    SgffObject.new("ATGCATGCATGC", topology="circular")
    .add_feature("GFP", "CDS", 0, 8)
    .add_primer("fwd", "ATGC", bind_position=0)
)
SgffWriter.to_file(sgff, "new_plasmid.dna")

History Operations

Record cloning operations with automatic history tracking:

sgff.ops.insert_fragment("ATCGATCG")
sgff.ops.digest("GGCC", InputSummary={"manipulation": "digest"})

# Or build an entire tree from multiple source files
vector = SgffReader.from_file("vector.dna")
insert = SgffReader.from_file("insert.dna")

sgff.ops.build_from_spec(
    [
        {"id": 1, "operation": "insertFragment", "sequence": "...",
         "name": "Final", "children": [2, 3]},
        {"id": 2, "source": vector},
        {"id": 3, "source": insert},
    ],
    final_sequence="...",
)

How It Works

SnapGene files use a TLV (Type-Length-Value) binary format after a 19-byte header. Each block has a 1-byte type ID and a 4-byte length, with encoding varying by type: UTF-8 for sequences, XML for annotations, 2-bit GATC encoding for compressed DNA, LZMA for history, and ZTR for chromatogram traces.
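The block layout described above can be walked with a few lines of standard-library Python. This is a minimal sketch, not the sgffp implementation: the names iter_blocks and make_block are hypothetical, the length field is assumed to be big-endian, and the demo bytes are synthetic placeholders rather than a real .dna file.

```python
import struct
from typing import Iterator, Tuple

HEADER_SIZE = 19  # header length as stated in the text above


def iter_blocks(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type_id, payload) pairs from the TLV region after the header."""
    offset = HEADER_SIZE
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError("truncated block header")
        # Assumption: 1-byte type ID followed by a big-endian 4-byte length.
        type_id, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated block payload")
        offset += length
        yield type_id, payload


def make_block(type_id: int, payload: bytes) -> bytes:
    """Build one TLV block (hypothetical helper used only for the demo)."""
    return struct.pack(">BI", type_id, len(payload)) + payload


# Synthetic input: a zero-filled header followed by two placeholder blocks.
demo = bytes(HEADER_SIZE) + make_block(0, b"ATGC") + make_block(6, b"<Notes/>")
blocks = list(iter_blocks(demo))
```

The payloads are returned as raw bytes; decoding them depends on the block type, as listed in the table of supported block types.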

SgffReader parses blocks via the SCHEME dispatch table and stores them in SgffObject.blocks (a Dict[int, List]). Typed model properties (sgff.sequence, sgff.features, sgff.history, etc.) are lazily loaded from the blocks dict and sync changes back automatically. SgffWriter serializes blocks back to binary in sorted order.
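To illustrate the storage and write-back steps, the sketch below groups raw payloads into a dict keyed by block type ID and serializes them in sorted order. It is a simplified stand-in under stated assumptions: group_blocks and serialize_blocks are hypothetical names, payloads stay as raw bytes, the length field is assumed big-endian, and the file header and the lazy typed models are left out.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_blocks(pairs: Iterable[Tuple[int, bytes]]) -> Dict[int, List[bytes]]:
    """Collect payloads per type ID, keeping repeated blocks in file order."""
    blocks: Dict[int, List[bytes]] = defaultdict(list)
    for type_id, payload in pairs:
        blocks[type_id].append(payload)
    return dict(blocks)


def serialize_blocks(blocks: Dict[int, List[bytes]]) -> bytes:
    """Write blocks back as TLV records, ordered by ascending type ID."""
    out = bytearray()
    for type_id in sorted(blocks):
        for payload in blocks[type_id]:
            # Assumption: 1-byte type ID, then big-endian 4-byte length.
            out += struct.pack(">BI", type_id, len(payload))
            out += payload
    return bytes(out)


# Placeholder payloads; block 10 appears twice and out of order on input.
pairs = [(10, b"<a/>"), (0, b"ATGC"), (10, b"<b/>")]
grouped = group_blocks(pairs)
body = serialize_blocks(grouped)
```

Using a list per type ID is what lets a type that occurs more than once survive a read and write cycle.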

Supported Block Types

ID	Block Type	Format	Model
0	DNA Sequence	UTF-8	SgffSequence
1	Compressed DNA	2-bit GATC	SgffSequence
5	Primers	XML	SgffPrimerList
6	Notes	XML	SgffNotes
7	History Tree	LZMA + XML	SgffHistory
8	Sequence Properties	XML	SgffProperties
10	Features	XML	SgffFeatureList
11	History Nodes	Binary + TLV	SgffHistory
14	Custom Enzyme Sets	XML
16	Trace Container	Binary + TLV	SgffTraceList
17	Alignable Sequences	XML	SgffAlignmentList
18	ZTR Trace (in 16)	ZTR	SgffTrace
20	Strand Colors	XML
21	Protein Sequence	UTF-8	SgffSequence
23	File Attachments	Binary + zlib XML	SgffAttachmentList
27	Trace Alignment	BGZF + BAM	SgffTraceAlignment
28	Enzyme Visibilities	XML
29	History Modifier	LZMA + XML	SgffHistory
30	History Content	LZMA + TLV	SgffHistory
32	RNA Sequence	UTF-8	SgffSequence
34	RNA Structure	LZMA + JSON

Blocks 2, 3, 13, and 35 are auto-generated by SnapGene and intentionally skipped.
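A per-type decoder table is one way to picture how the Format column maps to decoding logic. The sketch below covers only three of the IDs from the table and is not the library's SCHEME table: the names DECODERS, SKIPPED_IDS, and decode_block are hypothetical, and it assumes the LZMA payloads use a container that Python's lzma.decompress can auto-detect.

```python
import lzma
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional, Union

SKIPPED_IDS = {2, 3, 13, 35}  # auto-generated blocks, per the note above


def decode_xml(raw: bytes) -> ET.Element:
    """Parse an XML payload into an element tree root."""
    return ET.fromstring(raw)


def decode_lzma_xml(raw: bytes) -> ET.Element:
    """Decompress, then parse. Assumes an auto-detectable LZMA container."""
    return ET.fromstring(lzma.decompress(raw))


# Hypothetical dispatch table covering a subset of the documented IDs.
DECODERS: Dict[int, Callable[[bytes], ET.Element]] = {
    6: decode_xml,        # Notes
    7: decode_lzma_xml,   # History Tree
    10: decode_xml,       # Features
}


def decode_block(type_id: int, raw: bytes) -> Optional[Union[ET.Element, bytes]]:
    """Return None for skipped IDs, raw bytes for unknown IDs, else decoded data."""
    if type_id in SKIPPED_IDS:
        return None
    decoder = DECODERS.get(type_id)
    return raw if decoder is None else decoder(raw)


# Synthetic payload; the element name here is a placeholder, not the real schema.
history = decode_block(7, lzma.compress(b"<Placeholder/>"))
```

Returning the raw bytes for IDs without a decoder keeps unknown blocks intact instead of dropping them.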

CLI

sff parse plasmid.dna           # Export to JSON
sff info plasmid.dna -v         # Show detailed file info
sff tree plasmid.dna            # Display history timeline
sff check plasmid.dna -l        # List block types
sff filter plasmid.dna -k 0,10 -o minimal.dna

All read commands accept stdin (cat file.dna | sff info).
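Accepting piped input usually comes down to reading from the binary stdin stream when no path is given. The following is a sketch of that pattern only: read_input is a hypothetical helper, not the sff CLI code, and treating "-" as a stdin alias is an assumption.

```python
import io
import sys
from typing import BinaryIO, Optional


def read_input(path: Optional[str], stdin: Optional[BinaryIO] = None) -> bytes:
    """Read all bytes from a file path, or from stdin when no path is given."""
    if path is None or path == "-":  # "-" as a stdin alias is an assumption
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read()
    with open(path, "rb") as handle:
        return handle.read()


# Demo with an in-memory stream standing in for a pipe.
piped = read_input(None, stdin=io.BytesIO(b"\x09\x00\x00\x00\x0e"))
```

Reading from sys.stdin.buffer rather than sys.stdin matters here because .dna files are binary and must not pass through text decoding.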

Development

git clone https://github.com/merv1n34k/sgffp.git
cd sgffp
uv sync --dev

# Run tests
uv run pytest tests/ -v

# Docs (VitePress)
cd docs && bun install && bun run docs:dev

Documentation

Full guides, API reference, CLI reference, and binary format specification:

merv1n34k.github.io/sgffp

Acknowledgments

This project would not have been possible without previous work by:

Damien Goutte-Gattat, see his PDF on SGFF structure: https://incenp.org/dvlpt/docs/binary-sequence-formats/binary-sequence-formats.pdf
Isaac Luo, for his version of SnapGene reader: https://github.com/IsaacLuo/SnapGeneFileReader
Kale Kundert, for autosnapgene, a SnapGene automation tool: https://github.com/kalekundert/autosnapgene

Contributions

Thanks also to the people who have helped the project:

Manuel Lera-Ramirez (@manulera) for his PRs and suggestions
Cory Tobin (@cory-mozza) for reviewing new blocks

License

Distributed under the MIT License; see LICENSE for details.
