Releases: NygenAnalytics/CyteType

0.19.4

02 Apr 03:13
7dbb3c5

What's Changed

Full Changelog: 0.19.3...0.19.4

0.19.3

08 Mar 22:34
e27bd80

What's Changed

Full Changelog: 0.19.2...0.19.3

0.19.2

08 Mar 10:33
7a9b4a1

What's Changed

  • Update version to 0.19.2 and enhance metadata handling in CyteType by @parashardhapola in #71

Full Changelog: 0.19.1...0.19.2

0.19.1

07 Mar 18:21
98cbecd

What's Changed

Full Changelog: 0.19.0...0.19.1

0.19.0

07 Mar 13:04
bd0a977

What's Changed

Full Changelog: 0.18.1...0.19.0

0.18.1

03 Mar 20:13
78cc799

What's Changed

Full Changelog: 0.18.0...0.18.1

0.18.0

03 Mar 15:30
27ae521

✨ What's New

🧬 Raw Counts in vars.h5 Artifact

  • The save_features_matrix function now writes an optional raw group to the H5 artifact containing integer raw counts (LZ4-compressed CSR).
  • CyteType.__init__ auto-resolves raw counts from adata.layers['counts'], adata.raw.X, or adata.X (if integer-valued), and embeds them alongside normalized counts.
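
The lookup order described above can be sketched roughly as follows. This is an illustration only, not CyteType's code: `resolve_raw_counts` and `is_integer_valued` are hypothetical names, the inputs are plain arrays rather than an AnnData object, and applying the integer check only to `X` is an assumption based on the wording of the note.

```python
import numpy as np


def is_integer_valued(matrix) -> bool:
    """Return True when every entry is a whole number (e.g. 3.0, not 3.2)."""
    values = np.asarray(matrix)
    return bool(np.all(np.mod(values, 1) == 0))


def resolve_raw_counts(layers, raw_x, x):
    """Pick a raw-count source in the order given in the release note.

    layers: dict of named matrices (stand-in for adata.layers)
    raw_x:  matrix or None (stand-in for adata.raw.X)
    x:      matrix (stand-in for adata.X)
    Returns (matrix, source_label), or (None, None) if nothing qualifies.
    """
    if "counts" in layers:
        return layers["counts"], "layers['counts']"
    if raw_x is not None:
        return raw_x, "raw.X"
    # Assumption: only X is screened for integer values.
    if is_integer_valued(x):
        return x, "X"
    return None, None
```

When none of the three sources qualifies, the sketch returns `(None, None)`, which mirrors the "optional raw group" wording: the artifact would simply be written without raw counts.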

🚀 rank_genes_groups_backed — Memory-Efficient Differential Expression

  • New public function — a drop-in replacement for sc.tl.rank_genes_groups that works on backed/on-disk _CSRDataset matrices.
  • Streams cell chunks in a single pass, computes Welch's t-test (one-vs-rest) with BH or Bonferroni correction, and writes scanpy-compatible output to adata.uns.
  • Exported at cytetype.rank_genes_groups_backed.
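
To make the single-pass idea concrete, here is a minimal NumPy sketch of a streamed one-vs-rest Welch t-statistic plus a Benjamini-Hochberg adjustment. It is not the library's implementation: the function names, the dense chunk format, and the integer group codes are all assumptions made for illustration, and the p-value step (a t-distribution lookup) is left out.

```python
import numpy as np


def welch_t_one_vs_rest(chunks, n_groups):
    """Welch t-statistics per (group, gene) from one pass over cell chunks.

    chunks yields (x, codes): x is a dense cells-by-genes array and codes
    holds the integer group code of each cell in that chunk.
    """
    n = np.zeros(n_groups)
    sums = sq_sums = None
    for x, codes in chunks:
        x = np.asarray(x, dtype=np.float64)
        if sums is None:
            sums = np.zeros((n_groups, x.shape[1]))
            sq_sums = np.zeros((n_groups, x.shape[1]))
        # Accumulate count, sum and sum of squares per group.
        np.add.at(n, codes, 1)
        np.add.at(sums, codes, x)
        np.add.at(sq_sums, codes, x * x)

    n_all, sum_all, sq_all = n.sum(), sums.sum(axis=0), sq_sums.sum(axis=0)
    t = np.empty_like(sums)
    for g in range(n_groups):
        # "Rest" statistics are the totals minus the group's own share.
        n1, n2 = n[g], n_all - n[g]
        m1, m2 = sums[g] / n1, (sum_all - sums[g]) / n2
        v1 = (sq_sums[g] - n1 * m1**2) / (n1 - 1)
        v2 = ((sq_all - sq_sums[g]) - n2 * m2**2) / (n2 - 1)
        t[g] = (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)
    return t


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=np.float64)
    order = np.argsort(p)
    scaled = p[order] * p.size / np.arange(1, p.size + 1)
    # Enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty_like(p)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted
```

Because only counts, sums, and sums of squares are kept per group, memory use depends on the number of groups and genes rather than the number of cells. The sum-of-squares variance formula is simple but can lose precision on data with very large means.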

✂️ subsample_by_group — Per-Group Cell Subsampling

  • New preprocessing utility that caps each cluster to a configurable maximum number of cells (max_cells_per_group), keeping smaller groups intact.
  • Works with both in-memory and backed AnnData objects.
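
The capping behaviour can be illustrated with a small index-based sketch. `subsample_indices` and its `seed` argument are hypothetical; only `max_cells_per_group` comes from the release note, and the real function's signature may differ.

```python
import numpy as np


def subsample_indices(labels, max_cells_per_group, seed=0):
    """Return sorted cell indices with each group capped at the given size."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    kept = []
    for group in np.unique(labels):
        idx = np.flatnonzero(labels == group)
        if idx.size > max_cells_per_group:
            # Only oversized groups are sampled; smaller ones stay intact.
            idx = rng.choice(idx, size=max_cells_per_group, replace=False)
        kept.append(idx)
    return np.sort(np.concatenate(kept))
```

Returning sorted positions rather than a copied matrix keeps the sketch usable for on-disk data, where rows are typically read by increasing index.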

🔍 Auto-Detection of Gene Symbols Column

  • gene_symbols_column now defaults to None, which triggers auto-detection: well-known column names (feature_name, gene_symbols, etc.) are checked first, then adata.var_names, then a heuristic scan of all var columns.
  • Detects and skips composite gene values (e.g., TSPAN6_ENSG00000000003).
  • Candidates are scored by ID-like percentage, uniqueness ratio, and priority — the best non-ID column wins.
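
A rough sketch of this kind of scoring is shown below. Everything here is assumed for illustration: the regular expressions, the 0.5 cutoff, the way the three scores are combined, and the function names are not taken from CyteType, and `KNOWN_NAMES` lists only the two column names mentioned in the note.

```python
import re

# Ensembl-style gene IDs, e.g. ENSG00000000003 or ENSMUSG00000000001.2
ID_LIKE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")
# Composite values such as TSPAN6_ENSG00000000003
COMPOSITE = re.compile(r"^.+[_|]ENS[A-Z]*G\d{11}(\.\d+)?$")
KNOWN_NAMES = ("feature_name", "gene_symbols")
VAR_NAMES = "adata.var_names"  # label used when the index itself wins


def score_column(values):
    """Return (id_like_fraction, composite_fraction, uniqueness_ratio)."""
    values = [str(v) for v in values]
    n = len(values) or 1
    id_like = sum(bool(ID_LIKE.match(v)) for v in values) / n
    composite = sum(bool(COMPOSITE.match(v)) for v in values) / n
    return id_like, composite, len(set(values)) / n


def pick_gene_symbols_column(columns, var_names, max_bad=0.5):
    """Choose the best non-ID candidate; columns maps name -> list of values."""
    ordered = [(name, columns[name]) for name in KNOWN_NAMES if name in columns]
    ordered.append((VAR_NAMES, var_names))
    ordered += [(n, v) for n, v in columns.items() if n not in KNOWN_NAMES]

    best = None
    for priority, (name, values) in enumerate(ordered):
        id_like, composite, unique = score_column(values)
        if id_like > max_bad or composite > max_bad:
            continue  # mostly IDs or composite values: skip this candidate
        key = (id_like + composite, -unique, priority)
        if best is None or key < best[0]:
            best = (key, name)
    return None if best is None else best[1]
```

With Ensembl IDs in `var_names` and symbols in a `feature_name` column, this sketch picks `feature_name`; if the only extra column holds composite values and `var_names` already contains symbols, it falls back to `var_names`.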

🎨 marker_dotplot — Category-Grouped Dot Plot

  • New plotting module (cytetype.plotting) with marker_dotplot that reads stored CyteType results and creates a scanpy dotplot grouped by cluster categories with top supporting marker genes.

⚡ Improvements

🏗️ Artifact Pipeline Restructuring

  • Artifacts (vars.h5, obs.duckdb) are now built during __init__ and uploaded during run(), decoupling build from upload.
  • vars_h5_path and obs_duckdb_path moved from run() to __init__() parameters.
  • New cleanup() method replaces the removed cleanup_artifacts parameter on run().

💾 CSR-Backed Write Path for Normalized Counts

  • New two-pass column-group scatter algorithm (_write_csc_via_row_batches) converts CSR-backed data to CSC in the H5 file without loading the full matrix.
  • Configurable memory budget via WRITE_MEM_BUDGET (default 4 GB).
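
For readers unfamiliar with the technique, the following is a simplified, in-memory sketch of a two-pass CSR-to-CSC conversion driven by row batches. It is not `_write_csc_via_row_batches`: the function name and arguments are hypothetical, it writes to NumPy arrays instead of an H5 file, and it omits the column grouping and memory-budget logic that the release note mentions.

```python
import numpy as np


def csr_to_csc_two_pass(data, indices, indptr, n_cols, row_batch=1024):
    """Convert CSR arrays to CSC arrays while reading only row batches."""
    n_rows = len(indptr) - 1
    batches = [(s, min(s + row_batch, n_rows)) for s in range(0, n_rows, row_batch)]

    # Pass 1: count nonzeros per column to lay out the CSC column pointers.
    counts = np.zeros(n_cols, dtype=np.int64)
    for start, stop in batches:
        counts += np.bincount(indices[indptr[start]:indptr[stop]], minlength=n_cols)
    out_indptr = np.concatenate(([0], np.cumsum(counts)))

    # Pass 2: scatter each batch's entries to their final CSC positions.
    out_data = np.empty(len(data), dtype=data.dtype)
    out_rows = np.empty(len(data), dtype=np.int64)
    cursor = out_indptr[:-1].copy()  # next free slot for every column
    for start, stop in batches:
        lo, hi = indptr[start], indptr[stop]
        cols = indices[lo:hi]
        rows = np.repeat(np.arange(start, stop), np.diff(indptr[start:stop + 1]))
        order = np.argsort(cols, kind="stable")  # keeps row order per column
        sorted_cols = cols[order]
        batch_counts = np.bincount(cols, minlength=n_cols)
        first = np.concatenate(([0], np.cumsum(batch_counts)[:-1]))
        rank = np.arange(cols.size) - first[sorted_cols]
        pos = cursor[sorted_cols] + rank
        out_data[pos] = data[lo:hi][order]
        out_rows[pos] = rows[order]
        cursor += batch_counts
    return out_data, out_rows, out_indptr
```

Only one row batch is held in memory at a time, and row indices within each output column stay sorted because batches are visited in row order and the per-batch sort is stable.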

📊 Expression Percentage Calculation

  • Refactored to use single-pass row-batched accumulation (reuses _accumulate_group_stats) instead of gene-batched pandas groupby.
  • Default pcent_batch_size increased from 2000 to 5000.
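
As an illustration of row-batched accumulation, the sketch below computes the percentage of cells expressing each gene per group in a single pass over cell batches. The function name, the dense input, and the `batch_size` argument are assumptions; it does not reproduce `_accumulate_group_stats`.

```python
import numpy as np


def expression_percentages(x, codes, n_groups, batch_size=5000):
    """Percent of cells with expression > 0, per group and gene.

    x:     cells-by-genes array (any object supporting row slicing)
    codes: integer group code per cell
    """
    codes = np.asarray(codes)
    n_cells, n_genes = x.shape
    expressed = np.zeros((n_groups, n_genes), dtype=np.int64)
    sizes = np.zeros(n_groups, dtype=np.int64)
    for start in range(0, n_cells, batch_size):
        stop = min(start + batch_size, n_cells)
        # Read one batch of rows and count expressing cells per group.
        hits = (np.asarray(x[start:stop]) > 0).astype(np.int64)
        batch_codes = codes[start:stop]
        np.add.at(expressed, batch_codes, hits)
        np.add.at(sizes, batch_codes, 1)
    # Guard against empty groups to avoid division by zero.
    return 100.0 * expressed / np.maximum(sizes, 1)[:, None]
```

Each batch is touched once and only a groups-by-genes counter is retained, which is the property that makes this approach suitable for backed matrices.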

☁️ Upload Enhancements

  • vars_h5 max upload size increased from 10 GB to 50 GB.
  • Upload progress now uses tqdm progress bars when available.
  • Default connect timeout increased from 30s to 60s.

📈 Progress Reporting

  • tqdm progress bars added throughout: rank_genes_groups, subsampling, raw counts writing, normalized counts writing, and chunk uploads.

🐛 Bug Fixes / Error Handling

  • 🆕 New ClientDisconnectedError exception for HTTP 499 / CLIENT_DISCONNECTED responses.
  • 🗑️ Removed stale hasattr(adata.var, gene_symbols_col) check (was always True for DataFrames).
  • 🛡️ Raw counts write failures are caught gracefully — the raw group is cleaned up and skipped with a warning.

⚠️ Breaking Changes

  • gene_symbols_column default changed from "gene_symbols" to None (auto-detect).
  • vars_h5_path and obs_duckdb_path moved from run() to CyteType.__init__().
  • cleanup_artifacts parameter removed from run(); use cleanup() method instead.
  • pcent_batch_size default changed from 2000 to 5000.
  • batch_size parameter in aggregate_expression_percentages renamed to cell_batch_size.

0.17.0

23 Feb 23:05
9a8fcf4

What's Changed

Full Changelog: 0.16.1...0.17.0

0.16.1

20 Feb 22:09
fab6711

What's Changed

Full Changelog: 0.16.0...0.16.1

0.16.0

19 Feb 14:07
bee38ca

What's Changed

Full Changelog: 0.15.0...0.16.0