Pocket-Agent

This repository implements a synthetic-data-first pipeline for the Pocket-Agent tool-calling challenge.

Current Status

Phase 1 is implemented:

schema-first tool contract source of truth
synthetic generation spec scaffold
strict output canonicalizer/validator
conversation history contract
smoke test for Phase 1 behavior

Phase 2 is implemented:

synthetic data generator with tool/refusal balance
multi-turn sample generation
prompt dedup and hash export
optional Gemini-based paraphrase augmentation using GEMINI_API_KEY
deterministic local augmentation fallback
smoke test for dataset quality and formatting

Phase 3 is implemented:

QLoRA fine-tuning pipeline for TinyLlama 1.1B (configurable)
training text formatter for prompt/history/response supervision
dry-run path for fast validation before GPU training
adapter export to artifacts/phase3/adapter

Phase 4 is implemented:

local 4-bit quantization pipeline (quanto backend)
adapter merge then quantize flow
artifact size report for <=500 MB gate checks
CPU latency benchmark with 20-turn default prompt pack
dry-run smoke checks for fast validation

Phase 5 is implemented:

grader-facing inference.py contract
runtime with model generation + rule fallback
strict output canonicalization before returning responses
multi-turn CLI chatbot demo with visible tool-call output
smoke test for contract and behavior checks

Phase 6 is implemented:

unified pre-submit gate checker for hard requirements
strict and non-strict verification modes
leakage overlap check against starter/public_test.jsonl when available
forbidden import and run() signature checks for inference.py
demo launch validation and report generation

Phase 1 Test

Run from repository root:

PYTHONPATH=. python scripts/phase1_smoke_test.py

Expected output:

Phase 1 smoke test passed.

Phase 2 Generate Synthetic Data

Place your key in .env:

GEMINI_API_KEY=your_key_here

Run generator from repository root:

PYTHONPATH=. python3 scripts/generate_synthetic_data.py

Optional deterministic mode (no network augmentation):

PYTHONPATH=. python3 scripts/generate_synthetic_data.py --disable-gemini

Outputs:

data/synthetic/train.jsonl
data/synthetic/validation.jsonl
data/synthetic/all.jsonl
data/synthetic/prompt_hashes.txt
data/synthetic/summary.json

Phase 2 Test

PYTHONPATH=. python3 scripts/phase2_smoke_test.py

Expected output:

Phase 2 smoke test passed.

Phase 3 Dry Run (No Model Load)

Validate formatting and dataset wiring quickly:

PYTHONPATH=. python3 scripts/phase3_smoke_test.py

Expected output:

Phase 3 smoke test passed.

You can also inspect dry-run summary directly:

PYTHONPATH=. python3 scripts/train_phase3.py --dry-run --max-train-samples 32 --max-eval-samples 8

Phase 3 Train (Colab T4)

Run full QLoRA fine-tuning:

PYTHONPATH=. python3 scripts/train_phase3.py

Outputs:

artifacts/phase3/adapter/ (LoRA adapter + tokenizer files)
artifacts/phase3/training_summary.json

Phase 4 Quantization (Gate: size <=500 MB)

Dry run first:

PYTHONPATH=. python3 scripts/phase4_smoke_test.py

Quantize adapter + base model:

PYTHONPATH=. python3 scripts/quantize_phase4.py

Quantization outputs:

artifacts/phase4/quantized/
artifacts/phase4/quantization_summary.json

Phase 4 CPU Latency Benchmark (Gate: mean <=200 ms)

Run benchmark on quantized model:

PYTHONPATH=. python3 scripts/benchmark_latency_phase4.py

Output:

artifacts/phase4/latency_summary.json

Phase 5 Inference Contract

Grader entry point is:

inference.py with run(prompt: str, history: list[dict]) -> str

Run smoke validation:

PYTHONPATH=. python3 scripts/phase5_smoke_test.py

Expected output:

Phase 5 smoke test passed.

Phase 5 CLI Demo

Run multi-turn local chatbot demo:

PYTHONPATH=. python3 scripts/demo_cli.py

Demo behavior:

keeps conversation history in memory
prints assistant output each turn
shows wrapped <tool_call>...</tool_call> outputs when tools are selected

Phase 6 Preflight (Final Gate Check)

Development smoke check (non-strict):

PYTHONPATH=. python3 scripts/phase6_smoke_test.py

Strict pre-submit check:

PYTHONPATH=. python3 scripts/preflight_phase6.py

If starter/public_test.jsonl is not present locally during development:

PYTHONPATH=. python3 scripts/preflight_phase6.py --non-strict --allow-pending-leakage

Preflight reports:

artifacts/phase6/preflight_report.json
artifacts/phase6/preflight_report.md

Files Added In Phase 1

configs/tool_schemas.json
configs/synthetic_generation_spec.json
pocket_agent/contracts.py
pocket_agent/schema_loader.py
pocket_agent/output_validator.py
scripts/phase1_smoke_test.py

Files Added In Phase 2

pocket_agent/synthetic_data.py
scripts/generate_synthetic_data.py
scripts/phase2_smoke_test.py

Files Added In Phase 3

configs/training_config.json
pocket_agent/training.py
scripts/train_phase3.py
scripts/phase3_smoke_test.py

Files Added In Phase 4

configs/phase4_config.json
configs/latency_prompts.json
pocket_agent/quantization.py
pocket_agent/latency.py
scripts/quantize_phase4.py
scripts/benchmark_latency_phase4.py
scripts/phase4_smoke_test.py

Files Added In Phase 5

configs/inference_config.json
pocket_agent/inference_runtime.py
inference.py
scripts/demo_cli.py
scripts/phase5_smoke_test.py

Files Added In Phase 6

configs/phase6_config.json
pocket_agent/preflight.py
scripts/preflight_phase6.py
scripts/phase6_smoke_test.py

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
configs		configs
pocket_agent		pocket_agent
scripts		scripts
.gitignore		.gitignore
ML-PS.docx		ML-PS.docx
ML-PS.md		ML-PS.md
README.md		README.md
inference.py		inference.py
requirements.txt		requirements.txt
~$ML-PS.docx		~$ML-PS.docx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pocket-Agent

Current Status

Phase 1 Test

Phase 2 Generate Synthetic Data

Phase 2 Test

Phase 3 Dry Run (No Model Load)

Phase 3 Train (Colab T4)

Phase 4 Quantization (Gate: size <=500 MB)

Phase 4 CPU Latency Benchmark (Gate: mean <=200 ms)

Phase 5 Inference Contract

Phase 5 CLI Demo

Phase 6 Preflight (Final Gate Check)

Files Added In Phase 1

Files Added In Phase 2

Files Added In Phase 3

Files Added In Phase 4

Files Added In Phase 5

Files Added In Phase 6

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Pocket-Agent

Current Status

Phase 1 Test

Phase 2 Generate Synthetic Data

Phase 2 Test

Phase 3 Dry Run (No Model Load)

Phase 3 Train (Colab T4)

Phase 4 Quantization (Gate: size <=500 MB)

Phase 4 CPU Latency Benchmark (Gate: mean <=200 ms)

Phase 5 Inference Contract

Phase 5 CLI Demo

Phase 6 Preflight (Final Gate Check)

Files Added In Phase 1

Files Added In Phase 2

Files Added In Phase 3

Files Added In Phase 4

Files Added In Phase 5

Files Added In Phase 6

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages