testing + overall architecture refactors#2
Merged
Merged
Conversation
Moved tests.
Phase 0 of the module-migration refactor. The DIMPLE class and addgene FASTA loader move verbatim into a new leaf module, DIMPLE/core.py, which depends only on DIMPLE.utilities and biopython. DIMPLE.py imports and re-exports both for back-compat; run_settings.py imports the class from core directly. This breaks the import cycle that would otherwise arise once the functional modules (primers, qc, fragment_layout, oligo_assembly, outputs) own their implementations: core is now a true leaf, functional modules depend on it, and the DIMPLE.py orchestrator depends on both. No behavior change; full suite green (63 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 + 2 of the module-migration refactor. The ten functions that the
architecture skeleton had stubbed as re-export shims now own real
implementations in their focused modules:
- primers.py find_geneprimer, find_fragment_primer, check_nonspecific
- fragment_layout.py recalculate_num_fragments, switch_fragmentsize, check_overhangs
- oligo_assembly.py combine_fragments
- outputs.py print_all
- qc.py post_qc, check_final_assembly
Each module imports the DIMPLE class from the core.py leaf, so the import
graph is a clean DAG: core <- {functional modules} <- DIMPLE.py.
DIMPLE.py is reduced from ~2.3k to ~1.2k lines: it now holds only the
generate_DMS_fragments orchestrator and align_genevariation, and
re-exports the migrated functions at end-of-file so existing
`from DIMPLE.DIMPLE import ...` callers (run_dimple, run_dimple_gui,
tests) are unaffected. The _legacy_* aliases are removed and now-unused
imports (numpy, pydna, MeltingTemp, findORF) pruned.
Also fixed in passing:
- combine_fragments: the old shim declared a wrong 1-arg signature;
the real (tandem, num_frag_per_oligo, split) signature is restored.
- test_qc.py: mock patch target moved to DIMPLE.qc.check_final_assembly
since post_qc now resolves that call within the qc module.
Full suite green (63 passed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First phase of splitting run-wide configuration off the DIMPLE class. New DIMPLE/pool.py owns three cohesive things: - DimpleRuntimeConfig (moved verbatim from run_settings.py) - Pool (new: a run's genes + their shared config) - addgene (moved from core.py; now builds and returns a Pool) A DIMPLE run is a batch of genes producing one oligo pool; the pipeline has always passed this around as a bare `OLS` list. Pool gives it a class and a home for the run config. Pool is iterable/indexable over its genes, so it substitutes transparently for the old list. Each gene gets a `gene.pool` back-reference. addgene moves out of core.py because returning a Pool would otherwise make core import pool and re-create the cycle Phase 0 removed. core.py is now purely the DIMPLE gene class. addgene's `config` arg is transitional- optional (falls back to the run_settings singleton) so existing callers are unaffected; Phase 3 makes it required. Read paths are unchanged: the dual-write keeps the DIMPLE class attrs authoritative, so pool.config is stored but not yet read. No behavior change. Suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`OLS` (Oligo Library Synthesis) is Agilent's term for an oligo pool. The pipeline used it as the name of the gene-list / pool passed between stages. Rename every `OLS` identifier to `pool` so the codebase is vendor-agnostic and the name matches the type it now holds (a `Pool` since Phase 1 of the Pool migration). Pure mechanical rename of an identifier; all call sites pass positionally so no signatures break. Suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of Phase 2: pipeline functions read run configuration from
the Pool's config object instead of DIMPLE class attributes.
- qc.py: post_qc / check_final_assembly read enzyme via pool.config /
gene.pool.config.
- fragment_layout.py: primerBuffer and cutsite_overhang reads flip to
gene.pool.config (primer_buffer / cutsite_overhang).
- conftest.py: dimple_state fixture now resets the run_settings runtime
config singleton per test, so each test's addgene() rebuilds the
config from that test's class state (no stale cache leak now that
pool.config is actually read).
- test_qc.py: builds a real Pool + DimpleRuntimeConfig instead of
poking DIMPLE.enzyme and passing bare lists/objects.
The dual-write still keeps DIMPLE class attrs valid for not-yet-flipped
readers. Suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se 2) generate_DMS_fragments and align_genevariation now read run configuration from pool.config (105 DIMPLE.<attr> reads flipped); the config-splat block at the top of generate_DMS_fragments stays as dual-write for not-yet-flipped readers (core.py). find_geneprimer and combine_fragments take raw sequences with no gene in scope, so they gain a `pool` parameter; their three call sites in generate_DMS_fragments pass it. primers.py keeps DIMPLE.primerTm as a class constant (find_fragment_primer); oligo_assembly.py no longer needs the DIMPLE import. maxfrag fix: the insert/delete/dis branch adjusts pool.config.maxfrag, but switch_fragmentsize runs before the per-gene gene.maxfrag is seeded. Previously gene.maxfrag fell through to the class attr that line 248 wrote; now the per-gene maxfrag is explicitly seeded from pool.config.maxfrag before that early switch_fragmentsize call. Golden regression output is byte-identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DIMPLE.__init__ gains a `pool` parameter, sets self.pool, and reads run configuration from pool.config; addgene passes the pool into the constructor (the post-construction `instance.pool = pool` assignment is gone). The breaksites setter reads policy fields from self.pool.config. The maxfrag try/except in __init__ (which lazily created the DIMPLE.maxfrag class attr) becomes an explicit "use config.maxfrag, else derive from synth_len" — config.maxfrag is a real dataclass field, always present. addgene's barcode-count print reads config.barcode_f. test_run_settings.py: the two breaksites-policy tests build a Pool + DimpleRuntimeConfig instead of poking DIMPLE class attrs. With this, all pipeline code reads run config from pool.config. The only remaining DIMPLE.<config> references are the dual-write splats and run_settings internals, all removed in Phase 3. Suite green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 begins: move callers off implicit class-attr configuration onto an explicit DimpleRuntimeConfig, while the dual-write still stands. - Bundled barcode FASTAs now load in pool.py (module level); the barcode_f/barcode_r config fields default-factory to a fresh copy each (barcodes are consumed per run). This is where the DIMPLE class body's barcode load is headed. - core.py __init__ sets self.maxfrag (per-gene, seeded from config) so a gene fresh from addgene has it as a real instance attribute -- needed by standalone switch_fragmentsize/check_overhangs calls. - test_addgene, test_check_overhangs, test_dms_pipeline_kir build a DimpleRuntimeConfig and pass it to addgene instead of poking DIMPLE class attributes. Regression golden output is byte-identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_dis_pipeline_kir.py builds a DimpleRuntimeConfig and passes it to addgene, matching test_dms_pipeline_kir. The runners already threaded config=runtime_config into every apply_* call; the only gaps were two lines each -- they now build the config with DimpleRuntimeConfig() instead of the get_runtime_config() singleton and pass it to addgene. All addgene callers now supply an explicit config, so the singleton + dual-write scaffolding can be deleted next. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ol Phase 3) The dual-write / singleton / class-attribute scaffolding is removed. Run configuration lives solely on the Pool's DimpleRuntimeConfig. run_settings.py: deleted get_runtime_config, _RUNTIME_CONFIG, reset_runtime_config, and _sync_dimple_from_config. The apply_* helpers now take a config and mutate it directly -- no dual-write to class attrs. validate_insertions gains a config parameter. DIMPLE class (core.py): the ~12 mutable config class-attributes are gone (cutsite, enzyme, barcodes, usage, policies, ...). What remains is three genuine constants (maxfrag_offset, minfrag, primerTm) and per-gene instance state. The dead synth_len property and allhangF/allhangR accumulators are removed; barcode FASTAs load in pool.py. generate_DMS_fragments / qc: config-splat blocks deleted. addgene's config parameter is now required. Tests: test_run_settings rebuilt against the config-only API; conftest's dimple_state snapshot fixture removed (tests isolate by constructing their own config); test_dimple_state_isolation retired (it tested that fixture). Runners build DimpleRuntimeConfig() directly. Full suite green (59). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The comment referenced DIMPLE.primerBuffer, a class attribute removed in the config-scaffolding deletion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d test files, add input validator
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
overly large pr containing three big strands - thanks for the help claude.
DIMPLEgod-class into a per-geneDIMPLEobject plus aPoolcarrying a typedDimpleRuntimeConfig. All runtime config now lives onpool.config; no class-level mutable state remains. The runners (run_dimple.py,run_dimple_gui.py) were converted to the explicit-config flow.DIMPLE/DIMPLE.pyinto focused modules:core,pool,fragment_layout,primers,qc,oligo_assembly,outputs,run_settings.DIMPLE.pyis now ~1.2k lines and holds onlygenerate_DMS_fragments+align_genevariation, plus back-compat re-exports so existingfrom DIMPLE.DIMPLE import ...callers are unaffected.pyproject.toml(hatchling, dev extras: pytest, pytest-cov, black, isort, ruff, mypy),uv.lock, devcontainer (Python 3.12-bookworm), GitHub Actions CI on ubuntu + macos, pinnednumpy==1.26.4/biopython==1.84/pydna==5.4.0, and grew the test suite from one brittle integration test to 63 passing tests acrosstests/unit/andtests/regression/.Highlights
Architecture
DIMPLE/pool.pyowns thePoolobject +DimpleRuntimeConfig(dataclass) +addgenefactory.addgenenow requires an explicit config — no implicit singleton fallback.DIMPLE/core.pyholds only the per-geneDIMPLEclass with instance state + fixed constants (maxfrag_offset,minfrag,primerTm). Pure leaf in the import graph.DIMPLE/run_settings.pyis the shared CLI/GUI helper layer (apply_handle,resolve_codon_usage,apply_restriction_settings,compute_overlaps_and_maxfrag,apply_barcode_start,normalize_avoid_list,apply_random_seed,apply_runtime_policies,validate_insertions,apply_instance_settings,configure_dimple_logging).input()sites (findORF, thebreaksitessetter, the linked-gene prompt inalign_genevariation) are now gated bycfg.non_interactivepluslink_policy/breaksite_change_policy. Headless runs never block on a prompt.--non_interactive,--orf_index,--link_policy,--breaksite_change_policy.Tests
slow,interactive) +--update-goldenflag for regenerating goldens.tests/unit/(~61 cases across 8 modules): addgene, check_overhangs, codon_usage, find_fragment_primer, findorf, parse_custom_mutations, qc paths, full run_settings surface (including the new validator).tests/regression/: full Kir DMS pipeline + Kir DIS pipeline, byte-compared againsttests/expected/goldens.tests/conftest.py: lean fixtures (dimple_human_usage,kir_fa,update_golden). The olddimple_stateclass-attribute snapshot/restore fixture is gone — tests are isolated by construction now.Tooling / packaging
uv+pyproject.tomlis the primary local setup path; Conda envs remain as alternatives.Bug fixes that came along for the ride
gene.designed_variants— causedKeyErrorduring oligo assembly. Fixed; covered bytests/regression/test_dis_pipeline_kir.py.DIMPLE.enzyme,DIMPLE.cutsite,DIMPLE.dms) — switched toruntime_config.*. CLI smoke-tested on Kir DMS; GUI verified on Shaker.align_genevariationhad a 2024 regression that wrapped a list-comp in(...)parens, turning it into[<generator>]. Fixed.find_mutations(utilities.py, never called),ochre/amber/opalmethods onDIMPLE(broken since 2019 first commit — referencedself.usage_ecoli/self.usage_humaninstance attrs that have never been assigned in any version of the codebase), empty__getitem__stub.resolve_codon_usagenow validates resolved tables (64 codons, values in [0,1]) and the file-path branch emits a "did you mean ecoli/human?" hint on a missing path instead of a bareFileNotFoundError.Migration impact
from DIMPLE.DIMPLE import ...callers (e.g.addgene,generate_DMS_fragments,post_qc,print_all) keep working via back-compat re-exports. New code should prefer the explicit modules (DIMPLE.pool,DIMPLE.run_settings, etc.).addgenesignature changed — nowaddgene(genefile, config)instead ofaddgene(genefile)reading class-level state. Returns aPool, not a bare list (the Pool object is iterable/indexable so most call patterns still work).colabbranch and will need a follow-up update.