
MemoryManager.remember(sync=True) dominates bulk ingest; YARA p95 plyara tail #72

@rolandpg


Spun out of PR #70 (Phase 3: detection rules first-class); tracks Phase 4
performance-benchmark findings #1 and #2.

Finding #1 — sync=True is the bulk-ingest bottleneck

Both the Sigma and YARA ingest paths currently call
mm.remember(..., sync=True), so each note is persisted, vector-indexed,
and enrichment-flushed inline before control returns to the caller.
Under bulk ingest (SigmaHQ ~3k rules, CCCS-Yara ~400 rules) this is the
dominant cost.

Phase 4 bench numbers (paste from the perf report when wiring this up):

  • Sigma: ingest_rules_dir on 4 fixtures — ~4s wall, ~95% in
    remember(sync=True).
  • YARA: same pattern — parse is microseconds, persistence is seconds.
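The ~95% attribution comes from per-phase wall-clock accounting; a
minimal sketch of such a harness is below (the names `bench_ingest`,
`parse`, and `remember` here are illustrative stand-ins, not the
project's API):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(costs, phase):
    """Accumulate wall time per phase into the costs dict."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        costs[phase] = costs.get(phase, 0.0) + (time.perf_counter() - t0)

def bench_ingest(rules, parse, remember):
    """Attribute ingest wall time to parsing vs. persistence."""
    costs = {}
    for raw in rules:
        with timed(costs, "parse"):
            parsed = parse(raw)
        with timed(costs, "remember"):
            remember(parsed)  # the sync=True path: persist + index inline
    return costs
```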

Finding #2 — YARA p95 plyara tail

plyara.Plyara().parse_string has a heavy latency tail under repeated
invocations on large rule files: p50 is fine, but p95 can exceed p50 by
10x on ~50 kB multi-rule files (observed on the 3 CCCS-Yara fixtures
under a tight loop).
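A tail like this can be reproduced with a small tight-loop harness
(a sketch; in the real bench, fn would wrap plyara parsing of a ~50 kB
multi-rule fixture, which is an assumption about the setup, not code
from the repo):

```python
import statistics
import time

def latency_percentiles(fn, payload, n=200):
    """Time n repeated calls of fn(payload); return (p50, p95) in seconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn(payload)
        samples.append(time.perf_counter() - t0)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return cuts[49], cuts[94]                    # p50, p95
```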

Ask

Benchmark mm.remember(..., sync=False) plus an explicit mm.flush() at
the end of ingest_rules_dir, and/or introduce a bulk=True path on
MemoryManager that defers the vector-index write. Add a CI benchmark (a
pytest bench plugin is fine) that fails if p95 exceeds a threshold on
the fixtures tree.
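The deferred-flush shape can be sketched with a stand-in MemoryManager
(the remember(sync=...) and flush() signatures are assumed from this
issue, not verified against the real class):

```python
class MemoryManager:
    """Minimal stand-in for the real MemoryManager (assumed API)."""
    def __init__(self):
        self.persisted = []   # durable note store
        self.indexed = []     # completed vector-index writes
        self._pending = []    # notes awaiting a batched index write

    def remember(self, note, sync=True):
        self.persisted.append(note)
        if sync:
            self.indexed.append(note)   # current path: index inline, per call
        else:
            self._pending.append(note)  # proposed path: defer until flush()

    def flush(self):
        self.indexed.extend(self._pending)  # one batched index write
        self._pending.clear()

def ingest_rules_dir(mm, rules):
    # Bulk ingest pays only parse/persist cost per rule; the expensive
    # index write happens once at the end.
    for rule in rules:
        mm.remember(rule, sync=False)
    mm.flush()
```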

For plyara: cache the Plyara() instance per directory walk (it is
currently created per call, so we pay the grammar-compile cost ~400x on
CCCS-Yara). Confirm thread-safety before sharing an instance across
workers.
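One shape for the cache, assuming plyara is not thread-safe (unverified),
is a thread-local parser instance; StubParser below is a stand-in with
plyara's parse_string/clear surface so the sketch runs self-contained:

```python
import threading

class ParserCache:
    """One cached parser per thread instead of one per parse call."""
    def __init__(self, factory):
        # factory would be plyara.Plyara in the real code (assumption).
        self._factory = factory
        self._local = threading.local()
        self.creations = 0  # shows we stop paying construction cost per call

    def parse(self, text):
        parser = getattr(self._local, "parser", None)
        if parser is None:
            parser = self._factory()
            self._local.parser = parser
            self.creations += 1
        try:
            return parser.parse_string(text)
        finally:
            # plyara parsers accumulate state across calls, so reset
            # between files when reusing an instance.
            parser.clear()

class StubParser:
    """Stand-in exposing plyara's parse_string/clear surface."""
    def parse_string(self, text):
        return [{"raw": text}]

    def clear(self):
        pass
```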

Deliberately NOT in PR #70

Changing the sync/async boundary risks regressions in existing ingest
paths (OpenCTI sync, enrichment worker), so it needs its own PR with
real before/after numbers and a regression bench.
