Skip to content

testing + overall architecture refactors#2

Merged
odcambc merged 26 commits into
masterfrom
architecture
May 24, 2026
Merged

testing + overall architecture refactors#2
odcambc merged 26 commits into
masterfrom
architecture

Conversation

@odcambc
Copy link
Copy Markdown
Owner

@odcambc odcambc commented May 24, 2026

overly large pr containing three big strands - thanks for the help claude.

  1. Pool migration — split the DIMPLE god-class into a per-gene DIMPLE object plus a Pool carrying a typed DimpleRuntimeConfig. All runtime config now lives on pool.config; no class-level mutable state remains. The runners (run_dimple.py, run_dimple_gui.py) were converted to the explicit-config flow.
  2. Module split — moved function bodies out of the ~2.3k-line DIMPLE/DIMPLE.py into focused modules: core, pool, fragment_layout, primers, qc, oligo_assembly, outputs, run_settings. DIMPLE.py is now ~1.2k lines and holds only generate_DMS_fragments + align_genevariation, plus back-compat re-exports so existing from DIMPLE.DIMPLE import ... callers are unaffected.
  3. Test + tooling foundation — added pyproject.toml (hatchling, dev extras: pytest, pytest-cov, black, isort, ruff, mypy), uv.lock, devcontainer (Python 3.12-bookworm), GitHub Actions CI on ubuntu + macos, pinned numpy==1.26.4 / biopython==1.84 / pydna==5.4.0, and grew the test suite from one brittle integration test to 63 passing tests across tests/unit/ and tests/regression/.

Highlights

Architecture

  • New DIMPLE/pool.py owns the Pool object + DimpleRuntimeConfig (dataclass) + addgene factory. addgene now requires an explicit config — no implicit singleton fallback.
  • New DIMPLE/core.py holds only the per-gene DIMPLE class with instance state + fixed constants (maxfrag_offset, minfrag, primerTm). Pure leaf in the import graph.
  • New DIMPLE/run_settings.py is the shared CLI/GUI helper layer (apply_handle, resolve_codon_usage, apply_restriction_settings, compute_overlaps_and_maxfrag, apply_barcode_start, normalize_avoid_list, apply_random_seed, apply_runtime_policies, validate_insertions, apply_instance_settings, configure_dimple_logging).
  • All three interactive input() sites (findORF, the breaksites setter, the linked-gene prompt in align_genevariation) are now gated by cfg.non_interactive plus link_policy / breaksite_change_policy. Headless runs never block on a prompt.
  • New CLI flags: --non_interactive, --orf_index, --link_policy, --breaksite_change_policy.

Tests

  • pytest is the configured runner; markers (slow, interactive) + --update-golden flag for regenerating goldens.
  • tests/unit/ (~61 cases across 8 modules): addgene, check_overhangs, codon_usage, find_fragment_primer, findorf, parse_custom_mutations, qc paths, full run_settings surface (including the new validator).
  • tests/regression/: full Kir DMS pipeline + Kir DIS pipeline, byte-compared against tests/expected/ goldens.
  • tests/conftest.py: lean fixtures (dimple_human_usage, kir_fa, update_golden). The old dimple_state class-attribute snapshot/restore fixture is gone — tests are isolated by construction now.

Tooling / packaging

  • uv + pyproject.toml is the primary local setup path; Conda envs remain as alternatives.
  • README updated to document both paths.
  • CI matrix: ubuntu-latest + macos-latest, Python 3.12. (3.13 deferred — biopython compatibility.)

Bug fixes that came along for the ride

  • DIS sequences weren't being registered in gene.designed_variants — caused KeyError during oligo assembly. Fixed; covered by tests/regression/test_dis_pipeline_kir.py.
  • Runner CLI/GUI were reading stale class attributes (DIMPLE.enzyme, DIMPLE.cutsite, DIMPLE.dms) — switched to runtime_config.*. CLI smoke-tested on Kir DMS; GUI verified on Shaker.
  • align_genevariation had a 2024 regression that wrapped a list-comp in (...) parens, turning it into [<generator>]. Fixed.
  • Dead code removed: find_mutations (utilities.py, never called), ochre/amber/opal methods on DIMPLE (broken since 2019 first commit — referenced self.usage_ecoli/self.usage_human instance attrs that have never been assigned in any version of the codebase), empty __getitem__ stub.
  • resolve_codon_usage now validates resolved tables (64 codons, values in [0,1]) and the file-path branch emits a "did you mean ecoli/human?" hint on a missing path instead of a bare FileNotFoundError.

Migration impact

  • Public surfacefrom DIMPLE.DIMPLE import ... callers (e.g. addgene, generate_DMS_fragments, post_qc, print_all) keep working via back-compat re-exports. New code should prefer the explicit modules (DIMPLE.pool, DIMPLE.run_settings, etc.).
  • addgene signature changed — now addgene(genefile, config) instead of addgene(genefile) reading class-level state. Returns a Pool, not a bare list (the Pool object is iterable/indexable so most call patterns still work).
  • The Colab notebook on this branch is stale — it still uses the pre-Pool API. The canonical user-facing Colab lives on the colab branch and will need a follow-up update.

odcambc and others added 26 commits May 20, 2026 16:42
Phase 0 of the module-migration refactor. The DIMPLE class and addgene
FASTA loader move verbatim into a new leaf module, DIMPLE/core.py, which
depends only on DIMPLE.utilities and biopython. DIMPLE.py imports and
re-exports both for back-compat; run_settings.py imports the class from
core directly.

This breaks the import cycle that would otherwise arise once the
functional modules (primers, qc, fragment_layout, oligo_assembly,
outputs) own their implementations: core is now a true leaf, functional
modules depend on it, and the DIMPLE.py orchestrator depends on both.

No behavior change; full suite green (63 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 + 2 of the module-migration refactor. The ten functions that the
architecture skeleton had stubbed as re-export shims now own real
implementations in their focused modules:

  - primers.py          find_geneprimer, find_fragment_primer, check_nonspecific
  - fragment_layout.py  recalculate_num_fragments, switch_fragmentsize, check_overhangs
  - oligo_assembly.py   combine_fragments
  - outputs.py          print_all
  - qc.py               post_qc, check_final_assembly

Each module imports the DIMPLE class from the core.py leaf, so the import
graph is a clean DAG: core <- {functional modules} <- DIMPLE.py.

DIMPLE.py is reduced from ~2.3k to ~1.2k lines: it now holds only the
generate_DMS_fragments orchestrator and align_genevariation, and
re-exports the migrated functions at end-of-file so existing
`from DIMPLE.DIMPLE import ...` callers (run_dimple, run_dimple_gui,
tests) are unaffected. The _legacy_* aliases are removed and now-unused
imports (numpy, pydna, MeltingTemp, findORF) pruned.

Also fixed in passing:
  - combine_fragments: the old shim declared a wrong 1-arg signature;
    the real (tandem, num_frag_per_oligo, split) signature is restored.
  - test_qc.py: mock patch target moved to DIMPLE.qc.check_final_assembly
    since post_qc now resolves that call within the qc module.

Full suite green (63 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First phase of splitting run-wide configuration off the DIMPLE class.

New DIMPLE/pool.py owns three cohesive things:
  - DimpleRuntimeConfig  (moved verbatim from run_settings.py)
  - Pool                 (new: a run's genes + their shared config)
  - addgene              (moved from core.py; now builds and returns a Pool)

A DIMPLE run is a batch of genes producing one oligo pool; the pipeline
has always passed this around as a bare `OLS` list. Pool gives it a class
and a home for the run config. Pool is iterable/indexable over its genes,
so it substitutes transparently for the old list. Each gene gets a
`gene.pool` back-reference.

addgene moves out of core.py because returning a Pool would otherwise
make core import pool and re-create the cycle Phase 0 removed. core.py is
now purely the DIMPLE gene class. addgene's `config` arg is transitional-
optional (falls back to the run_settings singleton) so existing callers
are unaffected; Phase 3 makes it required.

Read paths are unchanged: the dual-write keeps the DIMPLE class attrs
authoritative, so pool.config is stored but not yet read. No behavior
change. Suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`OLS` (Oligo Library Synthesis) is Agilent's term for an oligo pool. The
pipeline used it as the name of the gene-list / pool passed between
stages. Rename every `OLS` identifier to `pool` so the codebase is
vendor-agnostic and the name matches the type it now holds (a `Pool`
since Phase 1 of the Pool migration).

Pure mechanical rename of an identifier; all call sites pass positionally
so no signatures break. Suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of Phase 2: pipeline functions read run configuration from
the Pool's config object instead of DIMPLE class attributes.

  - qc.py: post_qc / check_final_assembly read enzyme via pool.config /
    gene.pool.config.
  - fragment_layout.py: primerBuffer and cutsite_overhang reads flip to
    gene.pool.config (primer_buffer / cutsite_overhang).
  - conftest.py: dimple_state fixture now resets the run_settings runtime
    config singleton per test, so each test's addgene() rebuilds the
    config from that test's class state (no stale cache leak now that
    pool.config is actually read).
  - test_qc.py: builds a real Pool + DimpleRuntimeConfig instead of
    poking DIMPLE.enzyme and passing bare lists/objects.

The dual-write still keeps DIMPLE class attrs valid for not-yet-flipped
readers. Suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se 2)

generate_DMS_fragments and align_genevariation now read run configuration
from pool.config (105 DIMPLE.<attr> reads flipped); the config-splat
block at the top of generate_DMS_fragments stays as dual-write for
not-yet-flipped readers (core.py).

find_geneprimer and combine_fragments take raw sequences with no gene in
scope, so they gain a `pool` parameter; their three call sites in
generate_DMS_fragments pass it. primers.py keeps DIMPLE.primerTm as a
class constant (find_fragment_primer); oligo_assembly.py no longer needs
the DIMPLE import.

maxfrag fix: the insert/delete/dis branch adjusts pool.config.maxfrag,
but switch_fragmentsize runs before the per-gene gene.maxfrag is seeded.
Previously gene.maxfrag fell through to the class attr that line 248
wrote; now the per-gene maxfrag is explicitly seeded from
pool.config.maxfrag before that early switch_fragmentsize call. Golden
regression output is byte-identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DIMPLE.__init__ gains a `pool` parameter, sets self.pool, and reads run
configuration from pool.config; addgene passes the pool into the
constructor (the post-construction `instance.pool = pool` assignment is
gone). The breaksites setter reads policy fields from self.pool.config.

The maxfrag try/except in __init__ (which lazily created the DIMPLE.maxfrag
class attr) becomes an explicit "use config.maxfrag, else derive from
synth_len" — config.maxfrag is a real dataclass field, always present.

addgene's barcode-count print reads config.barcode_f.

test_run_settings.py: the two breaksites-policy tests build a Pool +
DimpleRuntimeConfig instead of poking DIMPLE class attrs.

With this, all pipeline code reads run config from pool.config. The only
remaining DIMPLE.<config> references are the dual-write splats and
run_settings internals, all removed in Phase 3. Suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 begins: move callers off implicit class-attr configuration onto
an explicit DimpleRuntimeConfig, while the dual-write still stands.

- Bundled barcode FASTAs now load in pool.py (module level); the
  barcode_f/barcode_r config fields default-factory to a fresh copy each
  (barcodes are consumed per run). This is where the DIMPLE class body's
  barcode load is headed.
- core.py __init__ sets self.maxfrag (per-gene, seeded from config) so a
  gene fresh from addgene has it as a real instance attribute -- needed
  by standalone switch_fragmentsize/check_overhangs calls.
- test_addgene, test_check_overhangs, test_dms_pipeline_kir build a
  DimpleRuntimeConfig and pass it to addgene instead of poking DIMPLE
  class attributes. Regression golden output is byte-identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_dis_pipeline_kir.py builds a DimpleRuntimeConfig and passes it to
addgene, matching test_dms_pipeline_kir.

The runners already threaded config=runtime_config into every apply_*
call; the only gaps were two lines each -- they now build the config
with DimpleRuntimeConfig() instead of the get_runtime_config() singleton
and pass it to addgene. All addgene callers now supply an explicit
config, so the singleton + dual-write scaffolding can be deleted next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ol Phase 3)

The dual-write / singleton / class-attribute scaffolding is removed. Run
configuration lives solely on the Pool's DimpleRuntimeConfig.

run_settings.py: deleted get_runtime_config, _RUNTIME_CONFIG,
reset_runtime_config, and _sync_dimple_from_config. The apply_* helpers
now take a config and mutate it directly -- no dual-write to class attrs.
validate_insertions gains a config parameter.

DIMPLE class (core.py): the ~12 mutable config class-attributes are gone
(cutsite, enzyme, barcodes, usage, policies, ...). What remains is three
genuine constants (maxfrag_offset, minfrag, primerTm) and per-gene
instance state. The dead synth_len property and allhangF/allhangR
accumulators are removed; barcode FASTAs load in pool.py.

generate_DMS_fragments / qc: config-splat blocks deleted. addgene's
config parameter is now required.

Tests: test_run_settings rebuilt against the config-only API; conftest's
dimple_state snapshot fixture removed (tests isolate by constructing
their own config); test_dimple_state_isolation retired (it tested that
fixture). Runners build DimpleRuntimeConfig() directly.

Full suite green (59).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The comment referenced DIMPLE.primerBuffer, a class attribute removed in
the config-scaffolding deletion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@odcambc odcambc merged commit 0b5092b into master May 24, 2026
2 checks passed
@odcambc odcambc deleted the architecture branch May 24, 2026 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant