ProjectBlockRegistry: add shared block directories with recursive scan #19

@nicolasguelfi

Description

Enhance ProjectBlockRegistry to support shared block directories with recursive scanning, enabling multi-module projects to share common blocks (glossary, references, trainers, exercise templates) from a single source of truth.

Currently, ProjectBlockRegistry scans only its own blocks_dir (flat, non-recursive). Multi-module projects that need to share blocks (e.g., a training whose modules ai4se6d_genai_intro, ai4se6d_vibecoding, and ai4se6d_gensem share a glossary) must duplicate block files across modules, which leads to definition drift and a maintenance burden.

Motivation

Real-world problem encountered: A 6-day training project has 3 modules that each need the same glossary, trainer profile slides, and reference bibliography blocks. Despite creating a shared-blocks/blocks/ directory with a shared glossary (bck_shared_glossary.py), the ProjectBlockRegistry cannot discover it. Each module ended up with its own bck_glossary.py with diverging definitions of the same terms (e.g., VibeCoding defined as "pair programmer" in one module and "without closely reviewing" in another).

The setup.py convention already adds shared-blocks/ to sys.path, and book.py already adds shared-blocks/static/ to static sources via set_static_sources(). Block sharing is the missing piece.

Use cases:

  1. Intra-project sharing (immediate): Multiple modules in a training share glossary, references, trainer profiles
  2. Inter-project sharing (future): A library of reusable block templates installed as a Python package
  3. Organized shared blocks: Shared blocks organized in subdirectories by theme (closing/, trainers/, exercises/)

Proposed Solution

1. New constructor parameters

from pathlib import Path


class ProjectBlockRegistry:
    def __init__(
        self,
        blocks_dir: Path,
        shared: list[Path] | None = None,   # Explicit shared directories
        auto_shared: bool = True,            # Auto-detect convention dir
    ):
        ...  # proposed signature only; body omitted

  • shared: explicit list of additional directories to scan for bck_*.py files
  • auto_shared (default True): automatically detect blocks_dir.parent.parent / "shared-blocks" / "blocks" if it exists. It is ON by default because the convention already exists (setup.py adds shared-blocks/ to sys.path, book.py adds shared-blocks/static/ to static sources).
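
As an illustration of how the two parameters could combine, here is a minimal sketch. The helper name resolve_shared_dirs is hypothetical and not part of StreamTeX. It assumes that missing directories are skipped silently and that relative explicit paths are resolved from blocks_dir, as the test list in this issue suggests.

```python
from pathlib import Path
import tempfile


def resolve_shared_dirs(blocks_dir, shared=None, auto_shared=True):
    """Return existing shared dirs: explicit ones first, then the convention dir."""
    candidates = []
    for p in shared or []:
        p = Path(p)
        # Assumption: relative explicit paths are taken relative to blocks_dir.
        candidates.append(p if p.is_absolute() else blocks_dir / p)
    if auto_shared:
        # Convention from the proposal: <project>/shared-blocks/blocks, where
        # <project> is two levels above the module's blocks/ directory.
        candidates.append(blocks_dir.parent.parent / "shared-blocks" / "blocks")
    resolved = []
    for candidate in candidates:
        candidate = candidate.resolve()
        # Missing directories are skipped silently; duplicates are dropped.
        if candidate.is_dir() and candidate not in resolved:
            resolved.append(candidate)
    return resolved


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve()
    blocks_dir = project / "module_a" / "blocks"
    convention_dir = project / "shared-blocks" / "blocks"
    blocks_dir.mkdir(parents=True)
    convention_dir.mkdir(parents=True)

    detected = resolve_shared_dirs(blocks_dir)
    found_convention = detected == [convention_dir]
    disabled = resolve_shared_dirs(blocks_dir, auto_shared=False)
    missing = resolve_shared_dirs(
        blocks_dir, shared=[project / "nope"], auto_shared=False
    )
```

With the layout created above, the convention directory is the only entry detected, while auto_shared=False or a non-existent explicit path yields an empty list.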

2. Resolution priority (first match wins)

1. Local blocks_dir (project's own blocks/)
2. Explicit shared dirs (in order given)
3. Auto-detected shared-blocks/blocks/ (convention)

A local block with the same name as a shared block shadows it (local wins).
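
The priority rule can be sketched as a merge of per-directory manifests. The function name merge_manifests and the sample paths are hypothetical. Recording only the first hidden path in "shadows" is an assumption, since the proposal does not say what happens when several shared directories define the same name.

```python
def merge_manifests(local, shared_manifests):
    """Merge per-directory manifests; the earliest source wins on a name clash."""
    merged = {name: dict(entry, shadows=None) for name, entry in local.items()}
    # shared_manifests is ordered: explicit dirs first, then the auto-detected one.
    for shared in shared_manifests:
        for name, entry in shared.items():
            if name not in merged:
                merged[name] = dict(entry, shadows=None)
            elif merged[name]["shadows"] is None:
                # Assumption: remember only the first block hidden by the winner.
                merged[name]["shadows"] = entry["path"]
    return merged


local = {
    "bck_glossary": {"path": "module_a/blocks/bck_glossary.py", "source": "local"},
}
shared_first = {
    "bck_glossary": {"path": "shared-blocks/blocks/bck_glossary.py", "source": "shared"},
    "bck_trainer_ng": {"path": "shared-blocks/blocks/bck_trainer_ng.py", "source": "shared"},
}
shared_second = {
    "bck_trainer_ng": {"path": "extra/bck_trainer_ng.py", "source": "shared"},
}

merged = merge_manifests(local, [shared_first, shared_second])
```

Here the local bck_glossary wins and records the shared path it hides, and bck_trainer_ng comes from the first shared directory rather than the second.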

3. Recursive scan (rglob)

Replace glob("bck_*.py") with rglob("bck_*.py") for all directories (local and shared). This enables subdirectory organization:

shared-blocks/blocks/
  closing/
    bck_shared_glossary.py
    bck_shared_references.py
  trainers/
    bck_trainer_ng.py
    bck_trainer_ts.py
  exercises/
    bck_exercise_template.py

All are discovered as blocks.bck_shared_glossary, blocks.bck_trainer_ng, etc. Subdirectories are organizational only; they do not create namespaces.

Exclusion: Files inside _atomic/ subdirectories are skipped (already loaded explicitly by composite blocks via load_atomic_block()).
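
The difference between the flat and recursive scans, and the _atomic/ exclusion, can be seen in a small self-contained sketch. The helper find_block_files and the file names bck_intro.py and bck_glossary_entry.py are illustrative only. The exclusion test looks only at path components below the scanned directory, which is one possible reading of "inside _atomic/ subdirectories".

```python
from pathlib import Path
import tempfile


def find_block_files(directory):
    """Recursively list bck_*.py files, skipping anything below an _atomic/ dir."""
    return [
        path
        for path in sorted(directory.rglob("bck_*.py"))
        # Check only the parts below `directory`, so an "_atomic" component
        # higher up in the absolute path does not hide every block.
        if "_atomic" not in path.relative_to(directory).parts
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "blocks"
    for rel in (
        "bck_intro.py",
        "closing/bck_shared_glossary.py",
        "trainers/bck_trainer_ng.py",
        "closing/_atomic/bck_glossary_entry.py",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    flat_names = sorted(p.stem for p in root.glob("bck_*.py"))
    recursive_names = sorted(p.stem for p in find_block_files(root))
```

The flat glob sees only bck_intro, while the recursive scan also finds the blocks under closing/ and trainers/ and leaves out the file under _atomic/.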

4. Name collision handling: warning + first wins

If two bck_*.py files in different subdirectories share the same stem, log a warning and keep the first one found (deterministic via sorted()):

logger.warning(
    "Block name collision: '%s' found in both '%s' and '%s'. Keeping first occurrence.",
    name, existing_path, new_path,
)
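
To make "first one found" concrete, the following sketch applies the rule to a list of relative paths without touching the filesystem. The function name first_wins, the logger name, and the bck_profile.py files are hypothetical. Sorting path objects compares them component by component, so closing/ sorts before trainers/.

```python
import logging
from pathlib import PurePosixPath

logger = logging.getLogger("block_scan_sketch")


def first_wins(paths):
    """Map each stem to the first path in sorted order; warn on later duplicates."""
    kept = {}
    for path in sorted(paths):
        name = path.stem
        if name in kept:
            logger.warning(
                "Block name collision: '%s' found in both '%s' and '%s'. "
                "Keeping first occurrence.",
                name, kept[name], path,
            )
            continue
        kept[name] = path
    return kept


candidates = [
    PurePosixPath("trainers/bck_profile.py"),
    PurePosixPath("closing/bck_profile.py"),
    PurePosixPath("closing/bck_shared_glossary.py"),
]
winners = first_wins(candidates)
```

Regardless of the order in which the candidates are listed, closing/bck_profile.py is kept and a warning is logged for trainers/bck_profile.py.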

5. Enriched manifest

manifest[name] = {
    "path": str(path),
    "loaded": False,
    "type": "composite",                     # or "atomic"
    "source": "local",                       # NEW: "local" or "shared"
    "source_dir": str(source_directory),     # NEW
    "subdirectory": "trainers",              # NEW: relative subdir, or None if at the root
    "shadows": str(shadowed_path),           # NEW: path of the shadowed shared block, or None
}

6. New/updated public methods

# Updated: new optional filter
def list_blocks(self, block_type=None, source=None) -> list:
    """Filter by type ('atomic'/'composite') and/or source ('local'/'shared')."""

# New
def get_source(self, block_name: str) -> str:
    """Return 'local' or 'shared'."""

# New
def get_shared_dirs(self) -> list[Path]:
    """Return the resolved shared directories (explicit + auto-detected)."""

# Updated: new fields
def get_stats(self) -> dict:
    """Now includes 'local', 'shared', 'shadowed' counts."""

# Updated: shows shared_dirs count
def __repr__(self) -> str:
    ...
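
As a rough sketch of how the filter and the new counters could work on the enriched manifest, the functions below operate on a plain dict rather than on the registry. The "total" key and the rule that "shadowed" counts entries with a non-empty shadows field are assumptions; the existing get_stats() keys are not shown in this issue.

```python
def list_blocks(manifest, block_type=None, source=None):
    """Return sorted block names matching the optional type and source filters."""
    return sorted(
        name
        for name, entry in manifest.items()
        if (block_type is None or entry["type"] == block_type)
        and (source is None or entry["source"] == source)
    )


def get_stats(manifest):
    """Count blocks by source, plus how many entries shadow another block."""
    entries = list(manifest.values())
    return {
        "total": len(entries),  # assumed key, for illustration
        "local": sum(e["source"] == "local" for e in entries),
        "shared": sum(e["source"] == "shared" for e in entries),
        "shadowed": sum(e.get("shadows") is not None for e in entries),
    }


manifest = {
    "bck_glossary": {
        "type": "composite",
        "source": "local",
        "shadows": "shared-blocks/blocks/bck_glossary.py",
    },
    "bck_intro": {"type": "atomic", "source": "local", "shadows": None},
    "bck_trainer_ng": {"type": "atomic", "source": "shared", "shadows": None},
}

shared_names = list_blocks(manifest, source="shared")
local_atomic = list_blocks(manifest, block_type="atomic", source="local")
stats = get_stats(manifest)
```

Both filters are optional and combine with a logical AND, so calling list_blocks(manifest) with no arguments returns every block name.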

7. Internal refactoring

Extract a reusable _scan_directory(directory, source) method used by _build_manifest() for both local and shared dirs:

def _scan_directory(self, directory: Path, source: str) -> dict[str, dict]:
    """Scan a directory recursively for bck_*.py files, skipping _atomic/."""
    composites = self._detect_composites_in(directory)
    found: dict[str, dict] = {}
    for path in sorted(directory.rglob("bck_*.py")):
        rel_parent = path.parent.relative_to(directory)
        # Test only the parts below `directory`; an "_atomic" component higher
        # up in the absolute path must not exclude every block.
        if "_atomic" in rel_parent.parts:
            continue
        name = path.stem
        if name in found:
            logger.warning(
                "Block name collision: '%s' in '%s' and '%s'. Keeping first.",
                name, found[name]["path"], path,
            )
            continue
        found[name] = {
            "path": str(path),
            "loaded": False,
            "type": "composite" if name in composites else "atomic",
            "source": source,
            "source_dir": str(directory),
            # None when the file sits directly in `directory`
            "subdirectory": str(rel_parent) if rel_parent.parts else None,
        }
    return found

Extract _detect_composites_in(directory) as a @staticmethod (currently _detect_composites is hardcoded to self.blocks_dir).

Files to modify

| File | Changes |
| --- | --- |
| streamtex/blocks.py | ProjectBlockRegistry.__init__, _build_manifest, new _scan_directory, _detect_composites_in staticmethod, updated list_blocks, get_stats, __repr__, new get_source, get_shared_dirs |
| tests/test_blocks.py | New test cases (see below) |
| README.md / docs | Document shared blocks convention |

Tests to add

| Test | Verifies |
| --- | --- |
| test_no_shared_dir | Registry without shared-blocks works exactly as before |
| test_auto_shared_detection | Convention shared-blocks/blocks/ auto-detected when it exists |
| test_auto_shared_disabled | auto_shared=False disables detection |
| test_auto_shared_missing_dir | Missing convention dir silently ignored |
| test_explicit_shared | shared=[Path(...)] adds blocks |
| test_explicit_shared_missing | Non-existent explicit path silently ignored |
| test_local_overrides_shared | Local block with same name shadows shared |
| test_shared_priority_order | shared[0] wins over shared[1] |
| test_manifest_source_field | Each entry has correct source |
| test_manifest_subdirectory_field | Blocks in subdirs have subdirectory set |
| test_list_blocks_filter_source | list_blocks(source="shared") works |
| test_get_source | Returns correct source |
| test_shadowed_detection | Local block that shadows a shared block has the shadows field set |
| test_recursive_scan | bck_*.py in subdirectories found |
| test_atomic_excluded | Files in _atomic/ skipped |
| test_name_collision_warning | Same name in different subdirs logs warning, first wins |
| test_invalidate_rescans | After invalidate(), shared dirs re-scanned |
| test_relative_shared_path | Relative path resolved from blocks_dir |
| test_flat_backward_compat | Flat project without subdirs works identically |

Backward compatibility

  • No breaking changes: shared defaults to None, auto_shared defaults to True
  • Projects without shared-blocks/ directory: auto_shared=True detects nothing, behavior identical to current
  • Projects with flat blocks/ (no subdirs): rglob("bck_*.py") finds the same files as glob("bck_*.py")
  • get() method unchanged (uses manifest paths, already supports arbitrary locations)
  • Existing __init__.py files in projects work without modification (the registry handles everything internally)

Workaround (currently deployed)

Until this feature is implemented, the ai4se6d project uses a chained-registry workaround in each module's blocks/__init__.py:

# `registry` is the module's existing local ProjectBlockRegistry, defined earlier in this file.
# Fallback: shared blocks (convention: ../shared-blocks/blocks/)
_shared_dir = Path(__file__).resolve().parent.parent.parent / "shared-blocks" / "blocks"
_shared_registry = ProjectBlockRegistry(_shared_dir) if _shared_dir.exists() else None

def __getattr__(name: str):
    try:
        return registry.get(name)
    except (BlockNotFoundError, BlockImportError):
        pass
    if _shared_registry:
        try:
            return _shared_registry.get(name)
        except (BlockNotFoundError, BlockImportError):
            pass
    raise AttributeError(f"Block '{name}' not found in local or shared blocks")

This works, but it duplicates the same fallback logic in every module's blocks/__init__.py. The proposed solution (Solution B) moves this logic into the library.

Environment

| Key | Value |
| --- | --- |
| StreamTeX | 0.6.8 |
| Python | 3.10.8 |
| OS | Darwin 25.3.0 arm64 |
| UV | 0.10.2 |
| Project | ai4se6d |
| Branch | main |
| Commit | 91460e4 |

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests