dheerajbatra/adaptive-model-sdk
Adaptive Model SDK (note: the kit will be added soon)

Intelligent model selection and adaptive pruning for multi-model local or remote systems. Run larger workloads within smaller RAM budgets by coordinating specialists under a coordinator, pruning and quantizing the KV cache adaptively, and improving specialists over time through Autoresearcher learning loops.

Vision

A self-improving hierarchy, not a single giant model:

  • Coordinator handles planning, long context, cross-domain reasoning
  • Specialists own narrow tasks (coding, math, vision, video) and stay small and fast
  • Autoresearcher watches escalations, teaches specialists from coordinator corrections, and promotes updates only when regression-guarded
  • Pruning + quantization are model-aware and pressure-aware, not global constants

The cycle: small model runs → overhang detected → coordinator answers → teaching trace → capability delta → regression-guarded promotion → leaner specialist next run.
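The cycle above can be sketched as a minimal loop. All class and method names here are illustrative inventions, not the SDK's actual API:

```python
# Hypothetical sketch of the improvement cycle: specialist runs,
# overhang escalates, the coordinator's answer becomes a teaching
# trace, and a non-negative capability delta gates promotion.

class Specialist:
    def __init__(self):
        self.skill = 0.5

    def answer(self, task):
        return f"draft:{task}"

    def overhang(self, task):
        # Toy overhang detector: escalate on ambiguous tasks.
        return "ambiguous" in task

    def train(self, trace):
        before = self.skill
        self.skill = min(1.0, self.skill + 0.1)  # learn from correction
        return self.skill - before               # capability delta

class Coordinator:
    def answer(self, task):
        return f"final:{task}"

def run_cycle(task, specialist, coordinator, promoted):
    out = specialist.answer(task)
    if not specialist.overhang(task):        # small model runs, no overhang
        return out
    better = coordinator.answer(task)        # coordinator answers
    trace = {"task": task, "correction": better}  # teaching trace
    delta = specialist.train(trace)
    if delta >= 0:                           # regression-guarded promotion
        promoted.append(specialist)
    return better
```

The guard is the key design point: a specialist is only promoted when its measured capability did not regress.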

Install & Quick Start

From a clone of this repository, install the package itself:

pip install .

For a home-computer setup that will run local MLX models and the benchmarking tools, install the optional extras:

pip install ".[mlx,bench,plots]"

Then ask the package what your machine can realistically do:

amk doctor

If you want a durable user-local workspace for registry evolution, Autoresearcher artifacts, and runtime evidence, initialize it once:

amk init

One command to see the whole loop

Once amk init and (optionally) the MLX venv are ready, a single command runs the full Autoresearcher loop end-to-end: baseline measurement, adaptive KV-cache measurement, regression-guarded promotion, and a report with PNG charts:

amk auto --mlx-python /tmp/mlx-venv/bin/python

Outputs land in <workspace>/artifacts/auto/<timestamp>/:

  • report.md — scorecard with before/after KV footprint and throughput
  • kv_savings.png, throughput.png, convergence.png — charts
  • summary.json — machine-readable rollup

[Figure: kv_savings chart]

Each amk auto call also teaches the registry: the observer compares the measured kv_bytes_per_token against the predicted value and applies an EMA-damped, regression-guarded update (lineage + rollback intact) so the next run's policy decisions are grounded in this machine's reality, not the manufacturer's spec sheet.
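As a rough sketch of that observer update (the smoothing constant and guard threshold are assumptions, not the SDK's actual values):

```python
def observe_kv_bytes(predicted, measured, alpha=0.2, guard=0.5):
    """EMA-damped update of a predicted kv_bytes_per_token value,
    with a crude regression guard that rejects wildly divergent
    measurements so one bad sample cannot corrupt the registry.

    alpha and guard are illustrative defaults, not the SDK's.
    """
    if measured <= 0:
        return predicted                      # ignore invalid samples
    rel_err = abs(measured - predicted) / predicted
    if rel_err > guard:
        return predicted                      # guard: keep prior value
    return (1 - alpha) * predicted + alpha * measured
```

Damping with an EMA means a single run nudges the prediction toward this machine's measured reality without discarding the prior.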

See docs/closing-the-gap.md for the full story of how the predicted → measured gap was closed.

Running amk doctor reports:

  • bundled default registry
  • initialized user-local workspace when present
  • detected RAM tier
  • likely MLX Python runtimes
  • discovered local models under ~/.mlx/models
  • suggested next commands

After amk init, AMK prefers the user-local registry and artifact workspace. Without initialization, the bundled registry is still used by default. You only need --registry when you want to point at a custom file.
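That resolution order can be sketched as follows (the workspace path and file names are illustrative assumptions):

```python
from pathlib import Path

def resolve_registry(cli_override=None,
                     workspace=Path.home() / ".amk",
                     bundled=Path("registry/default.json")):
    """Pick the registry file: an explicit --registry flag wins,
    then an initialized user-local workspace, then the bundled
    default. Paths here are hypothetical, not the SDK's actual
    layout."""
    if cli_override is not None:
        return Path(cli_override)
    user_local = workspace / "registry.json"
    if user_local.exists():
        return user_local
    return bundled
```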

# route a task
amk route --task coding --context-length 16000 --requires-tools

# simulate a foreground session
amk session --task coding \
  --prompt "Update pricing logic for the new packaging model" \
  --requires-business-context --ambiguity-level 0.85

# benchmark a specialist, producing a FrontierVector
amk benchmark --target qwen-coder-specialist

Legacy scripts/*.py entry points still work for backward compatibility; new work should use python3 -m adaptive_model_kit.cli <subcommand>.

Design Rules

  • Prefer the smallest specialist that can likely succeed.
  • Escalate to the coordinator when ambiguity, cross-domain, or high-impact conditions appear.
  • Make pruning and quantization depend on model profile and runtime pressure, not a global constant.
  • Specialists own tasks strongly but never own truth absolutely — overhang always escalates.
  • High-impact outputs pass through consensus before commit.
  • Every new concept earns its place by closing a loop, not opening one.
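The first two rules can be sketched as a routing function. The model records and thresholds below are invented for illustration, not taken from the SDK:

```python
def route(task, models, ambiguity, impact, cross_domain):
    """Pick the smallest specialist that can likely succeed;
    escalate to the coordinator when ambiguity, impact, or
    cross-domain conditions appear. Thresholds are illustrative."""
    if ambiguity > 0.8 or impact > 0.8 or cross_domain:
        return "coordinator"
    candidates = [m for m in models if task in m["owns"]]
    if not candidates:
        return "coordinator"                  # no owning specialist
    # Prefer the smallest specialist by parameter count.
    return min(candidates, key=lambda m: m["params_b"])["name"]
```

Making the coordinator the fallback for both overhang conditions and unowned tasks keeps the rule set closed: every request has a route.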

Package Layout

src/adaptive_model_kit/
├── models/             # data types: routing, compression, lifecycle, evolution
├── routing.py          # model selection with context-safe fallbacks
├── policies.py         # runtime-aware pruning + quantization decisions
├── profiling.py        # static model priors (attention, modality, KV density)
├── pruning.py          # token scoring and cold/hot decisions
├── tiered_cache.py     # hot/cold KV demotion and restore
├── lifecycle.py        # foreground session loop with escalation + consensus
├── consensus.py        # overhang detection, agent contracts, consensus policy
├── specialist_evolution.py  # teaching traces, capability deltas, frontier
├── evolution/          # closed-loop: benchmark → train → promote
│   ├── benchmark.py    # BenchmarkSuite → FrontierVector from real runs
│   ├── trainer.py      # v0 prompt-hint injection distillation
│   └── promote.py      # registry swap with lineage + rollback
├── artifacts.py        # JSON serialization for all artifact types
├── registry.py         # JSON model registry loader
├── autoresearcher_runtime.py  # session persistence and update detection
├── mlx_executor.py     # real MLX backend executor
└── cli.py              # unified `amk` entry point

Core Design Concepts

Agent Contract. Every model carries an agent contract: owns, can_assist, blind_spots, overhang_conditions, escalation_targets, consensus_required, and commit_mode. Specialists complete work locally but know when to defer.
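Sketched as a dataclass, using the field names from the list above (the defaults and helper method are assumptions, not the SDK's definition):

```python
from dataclasses import dataclass, field

@dataclass
class AgentContract:
    owns: list                                  # tasks completed locally
    can_assist: list = field(default_factory=list)
    blind_spots: list = field(default_factory=list)
    overhang_conditions: list = field(default_factory=list)
    escalation_targets: list = field(default_factory=list)
    consensus_required: bool = False
    commit_mode: str = "direct"                 # assumed default

    def should_escalate(self, condition: str) -> bool:
        """Defer when a known overhang condition appears."""
        return condition in self.overhang_conditions
```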

Overhang. The zone where a specialist is locally competent but not globally authoritative: business context, architecture sensitivity, multimodal interpretation, high ambiguity or impact. Detected per-session, triggers escalation.

Consensus. First-class decision: none, reviewer, coordinator, arbitration.

Frontier. FrontierVector tracks 7 dimensions (quality, calibration, escalation quality, latency/token/memory/context efficiency). HeadroomReport surfaces the gap and ranks the cheapest next interventions.
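A sketch of the frontier/headroom idea. The seven dimension names come from the text; the ranking heuristic (largest gap first) is an invented stand-in for the SDK's actual intervention ranking:

```python
DIMENSIONS = ["quality", "calibration", "escalation_quality",
              "latency_eff", "token_eff", "memory_eff", "context_eff"]

def headroom_report(measured, target):
    """Compute the gap per frontier dimension and rank the biggest
    shortfalls first, surfacing where the next intervention should
    go. Ranking heuristic is illustrative only."""
    gaps = {d: target[d] - measured[d] for d in DIMENSIONS}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
```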

Closed Loop. benchmark measures a specialist on a fixed suite, trainer applies a DistillationBatch, promote updates the registry with lineage once RegressionGuard passes. Rollback swaps back to the prior entry.
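A minimal sketch of the promote/rollback step under a regression guard, treating the registry as a plain dict (all names and the guard comparison are illustrative assumptions):

```python
def promote(registry, name, new_entry, before_score, after_score):
    """Swap a registry entry once the regression guard passes,
    recording the prior entry as lineage so rollback can restore
    it. A score drop refuses the promotion."""
    if after_score < before_score:
        return False                          # regression guard fails
    prior = registry.get(name)
    registry[name] = {**new_entry, "lineage": prior}
    return True

def rollback(registry, name):
    """Restore the prior entry recorded in lineage, if any."""
    prior = registry.get(name, {}).get("lineage")
    if prior is not None:
        registry[name] = prior
        return True
    return False
```

Storing lineage inline keeps rollback a single swap back to the prior entry, matching the "lineage + rollback intact" guarantee described above.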

More Documentation

  • docs/closing-the-gap.md — technical article on how the predicted→measured gap was closed end-to-end (AdaptiveKVCache → runtime observer → amk auto)
  • docs/validation.md — how the three kit-level claims are tested (improves, fits, persists) + the amk validate scorecard
  • docs/benchmarks.md — KV footprint computed from real ~/.mlx configs (stage-by-stage)
  • docs/real-run-evidence.md — live MLX measurements + the "can a 35B dense run on 8 GiB?" feasibility answer
  • docs/testing.md — repeatable operator playbook with all CLI scenarios
  • docs/autoresearcher-test-cases.md — focused improvement-loop scenarios
  • examples/model-registry.json — reference registry for the five included specialists
