iamalishayan/vyro-hackathon

Pocket-Agent

This repository implements a synthetic-data-first pipeline for the Pocket-Agent tool-calling challenge.

Current Status

Phase 1 is implemented:

  • schema-first tool contract source of truth
  • synthetic generation spec scaffold
  • strict output canonicalizer/validator
  • conversation history contract
  • smoke test for Phase 1 behavior

Phase 2 is implemented:

  • synthetic data generator with tool/refusal balance
  • multi-turn sample generation
  • prompt dedup and hash export
  • optional Gemini-based paraphrase augmentation using GEMINI_API_KEY
  • deterministic local augmentation fallback
  • smoke test for dataset quality and formatting

Phase 3 is implemented:

  • QLoRA fine-tuning pipeline for TinyLlama 1.1B (configurable)
  • training text formatter for prompt/history/response supervision
  • dry-run path for fast validation before GPU training
  • adapter export to artifacts/phase3/adapter

Phase 4 is implemented:

  • local 4-bit quantization pipeline (quanto backend)
  • adapter merge then quantize flow
  • artifact size report for <=500 MB gate checks
  • CPU latency benchmark with 20-turn default prompt pack
  • dry-run smoke checks for fast validation

Phase 5 is implemented:

  • grader-facing inference.py contract
  • runtime with model generation + rule fallback
  • strict output canonicalization before returning responses
  • multi-turn CLI chatbot demo with visible tool-call output
  • smoke test for contract and behavior checks

Phase 6 is implemented:

  • unified pre-submit gate checker for hard requirements
  • strict and non-strict verification modes
  • leakage overlap check against starter/public_test.jsonl when available
  • forbidden import and run() signature checks for inference.py
  • demo launch validation and report generation
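The forbidden-import and run() signature checks listed for Phase 6 could be done statically with the standard-library ast module. The following is a minimal sketch, not the code in pocket_agent/preflight.py: the FORBIDDEN set and the function name check_inference_source are assumptions made for illustration.

```python
import ast

# Hypothetical deny-list; the project's real list is not shown in this README.
FORBIDDEN = {"requests", "socket", "urllib"}


def check_inference_source(source: str) -> list[str]:
    """Return a list of problems found in the given inference.py source text."""
    problems = []
    tree = ast.parse(source)

    # Collect the top-level package name of every import statement.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        for name in names:
            if name in FORBIDDEN:
                problems.append(f"forbidden import: {name}")

    # Require a module-level function run(prompt, history).
    run_defs = [
        n for n in tree.body
        if isinstance(n, ast.FunctionDef) and n.name == "run"
    ]
    if not run_defs:
        problems.append("missing top-level run()")
    else:
        params = [a.arg for a in run_defs[0].args.args]
        if params != ["prompt", "history"]:
            problems.append(f"unexpected run() parameters: {params}")
    return problems
```

Because the check parses the file instead of importing it, it can run without loading the model or any heavy dependency.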

Phase 1 Test

Run from repository root:

PYTHONPATH=. python3 scripts/phase1_smoke_test.py

Expected output:

Phase 1 smoke test passed.

Phase 2 Generate Synthetic Data

Place your key in .env:

GEMINI_API_KEY=your_key_here

Run the generator from the repository root:

PYTHONPATH=. python3 scripts/generate_synthetic_data.py

Optional deterministic mode (no network augmentation):

PYTHONPATH=. python3 scripts/generate_synthetic_data.py --disable-gemini

Outputs:

  • data/synthetic/train.jsonl
  • data/synthetic/validation.jsonl
  • data/synthetic/all.jsonl
  • data/synthetic/prompt_hashes.txt
  • data/synthetic/summary.json
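As a rough illustration of how prompt dedup and the prompt_hashes.txt export might work, the sketch below hashes a normalized form of each prompt and keeps the first sample per hash. The normalization rule (lowercase, collapsed whitespace), the SHA-256 choice, and the "prompt" field name are assumptions, not details taken from pocket_agent/synthetic_data.py.

```python
import hashlib


def prompt_hash(prompt: str) -> str:
    """Hash a prompt after lowercasing and collapsing whitespace (assumed rule)."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedup_samples(samples: list[dict]) -> tuple[list[dict], list[str]]:
    """Keep the first sample for each prompt hash; return samples and hashes."""
    seen = set()
    unique, hashes = [], []
    for sample in samples:
        digest = prompt_hash(sample["prompt"])  # "prompt" key is an assumption
        if digest in seen:
            continue
        seen.add(digest)
        unique.append(sample)
        hashes.append(digest)
    return unique, hashes
```

Writing the returned hashes one per line would give a file that a later leakage check can compare against without exposing the prompts themselves.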

Phase 2 Test

PYTHONPATH=. python3 scripts/phase2_smoke_test.py

Expected output:

Phase 2 smoke test passed.

Phase 3 Dry Run (No Model Load)

Validate formatting and dataset wiring quickly:

PYTHONPATH=. python3 scripts/phase3_smoke_test.py

Expected output:

Phase 3 smoke test passed.

You can also inspect the dry-run summary directly:

PYTHONPATH=. python3 scripts/train_phase3.py --dry-run --max-train-samples 32 --max-eval-samples 8

Phase 3 Train (Colab T4)

Run full QLoRA fine-tuning:

PYTHONPATH=. python3 scripts/train_phase3.py

Outputs:

  • artifacts/phase3/adapter/ (LoRA adapter + tokenizer files)
  • artifacts/phase3/training_summary.json

Phase 4 Quantization (Gate: size <=500 MB)

Dry run first:

PYTHONPATH=. python3 scripts/phase4_smoke_test.py

Merge the adapter into the base model, then quantize:

PYTHONPATH=. python3 scripts/quantize_phase4.py

Quantization outputs:

  • artifacts/phase4/quantized/
  • artifacts/phase4/quantization_summary.json
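The <=500 MB gate can be checked by summing file sizes under the quantized artifact directory. This is a minimal sketch with hypothetical function names; it assumes 1 MB = 1024 * 1024 bytes, which may differ from the unit the grader uses.

```python
from pathlib import Path

SIZE_LIMIT_MB = 500  # gate value from this README


def directory_size_mb(root) -> float:
    """Total size of all regular files under root, in MB (1 MB = 1024**2 bytes, assumed)."""
    total_bytes = sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)


def passes_size_gate(root, limit_mb: float = SIZE_LIMIT_MB) -> bool:
    """True when the directory is at or under the size limit."""
    return directory_size_mb(root) <= limit_mb
```

For example, passes_size_gate("artifacts/phase4/quantized") would report whether the quantized output fits the gate once that directory exists.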

Phase 4 CPU Latency Benchmark (Gate: mean <=200 ms)

Run the benchmark on the quantized model:

PYTHONPATH=. python3 scripts/benchmark_latency_phase4.py

Output:

  • artifacts/phase4/latency_summary.json
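A mean-latency gate of this kind can be measured with time.perf_counter around each generation call. The sketch below is an illustration only: the benchmark function, its warmup parameter, and the summary keys are assumptions and may not match pocket_agent/latency.py or latency_summary.json.

```python
import statistics
import time

LATENCY_GATE_MS = 200.0  # gate value from this README


def benchmark(generate, prompts, warmup: int = 1) -> dict:
    """Time generate(prompt) for each prompt and summarize the results."""
    # Warm-up calls are run but not timed, so one-off startup cost is excluded.
    for prompt in prompts[:warmup]:
        generate(prompt)

    timings_ms = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.fmean(timings_ms)
    return {
        "turns": len(timings_ms),
        "mean_ms": mean_ms,
        "max_ms": max(timings_ms),
        "passes_gate": mean_ms <= LATENCY_GATE_MS,
    }
```

The prompts list must be non-empty, since statistics.fmean raises an error on empty input.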

Phase 5 Inference Contract

The grader entry point is:

  • inference.py with run(prompt: str, history: list[dict]) -> str
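To make the contract concrete, here is a minimal sketch of a module that satisfies the run(prompt, history) -> str signature using only a rule fallback. The "weather" tool, its argument names, and the refusal text are invented for illustration; the real tool set is defined in configs/tool_schemas.json and is not reproduced here.

```python
import json


def _rule_fallback(prompt: str) -> str:
    """Hypothetical keyword rule standing in for the project's rule fallback."""
    if "weather" in prompt.lower():
        # "weather" and its "query" argument are made-up names for this sketch.
        call = {"tool": "weather", "args": {"query": prompt.strip()}}
        return f"<tool_call>{json.dumps(call, sort_keys=True)}</tool_call>"
    return "Sorry, I can't help with that using the available tools."


def run(prompt: str, history: list[dict]) -> str:
    """Entry point with the signature the grader expects."""
    if not isinstance(prompt, str) or not isinstance(history, list):
        raise TypeError("run() expects (prompt: str, history: list[dict])")
    # A real runtime would try model generation first and consult history;
    # this sketch goes straight to the rule fallback and ignores history.
    return _rule_fallback(prompt)
```

Keeping run() as a thin wrapper makes the signature easy to verify statically and leaves model loading to a separate runtime module.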

Run smoke validation:

PYTHONPATH=. python3 scripts/phase5_smoke_test.py

Expected output:

Phase 5 smoke test passed.

Phase 5 CLI Demo

Run the multi-turn local chatbot demo:

PYTHONPATH=. python3 scripts/demo_cli.py

Demo behavior:

  • keeps conversation history in memory
  • prints assistant output each turn
  • shows wrapped <tool_call>...</tool_call> outputs when tools are selected
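One way to canonicalize a wrapped tool call before returning it is to extract the body, parse it as JSON, and re-serialize it with a fixed key order and separators. This sketch assumes the body is a JSON object; the function name and the exact canonical form are assumptions and are not copied from pocket_agent/output_validator.py.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def canonicalize(raw: str) -> str:
    """Return a single canonical <tool_call> string, or stripped plain text."""
    match = TOOL_CALL_RE.search(raw)
    if match is None:
        # No tool call present: treat the output as a plain-text reply.
        return raw.strip()
    try:
        payload = json.loads(match.group(1))
    except json.JSONDecodeError as exc:
        raise ValueError("tool call body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise ValueError("tool call body must be a JSON object")
    # Sorted keys and compact separators give one stable spelling per call.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return f"<tool_call>{body}</tool_call>"
```

Any text the model emits around the tags is dropped, so the caller receives either a clean tool call or a plain reply, never a mix.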

Phase 6 Preflight (Final Gate Check)

Development smoke check (non-strict):

PYTHONPATH=. python3 scripts/phase6_smoke_test.py

Strict pre-submit check:

PYTHONPATH=. python3 scripts/preflight_phase6.py

If starter/public_test.jsonl is not present locally during development, run in non-strict mode and allow the leakage check to stay pending:

PYTHONPATH=. python3 scripts/preflight_phase6.py --non-strict --allow-pending-leakage
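The leakage check and its "pending" state can be pictured as a set intersection that is skipped when the public test file is absent. In this sketch the "prompt" field name, the normalization rule, and the use of None to mean "pending" are all assumptions about how such a check could be written.

```python
import json
from pathlib import Path
from typing import Optional


def load_prompts(path) -> set[str]:
    """Read a JSONL file and return its normalized prompts."""
    prompts = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                record = json.loads(line)
                # "prompt" is an assumed field name.
                prompts.add(" ".join(record["prompt"].lower().split()))
    return prompts


def leakage_overlap(train_path, public_test_path) -> Optional[set[str]]:
    """Return overlapping prompts, or None when the test file is missing."""
    test_file = Path(public_test_path)
    if not test_file.exists():
        return None  # check is pending, not passed
    return load_prompts(train_path) & load_prompts(test_file)
```

Returning None instead of an empty set keeps "not checked yet" distinct from "checked and clean", which is the distinction a strict mode needs.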

Preflight reports:

  • artifacts/phase6/preflight_report.json
  • artifacts/phase6/preflight_report.md

Files Added In Phase 1

  • configs/tool_schemas.json
  • configs/synthetic_generation_spec.json
  • pocket_agent/contracts.py
  • pocket_agent/schema_loader.py
  • pocket_agent/output_validator.py
  • scripts/phase1_smoke_test.py

Files Added In Phase 2

  • pocket_agent/synthetic_data.py
  • scripts/generate_synthetic_data.py
  • scripts/phase2_smoke_test.py

Files Added In Phase 3

  • configs/training_config.json
  • pocket_agent/training.py
  • scripts/train_phase3.py
  • scripts/phase3_smoke_test.py

Files Added In Phase 4

  • configs/phase4_config.json
  • configs/latency_prompts.json
  • pocket_agent/quantization.py
  • pocket_agent/latency.py
  • scripts/quantize_phase4.py
  • scripts/benchmark_latency_phase4.py
  • scripts/phase4_smoke_test.py

Files Added In Phase 5

  • configs/inference_config.json
  • pocket_agent/inference_runtime.py
  • inference.py
  • scripts/demo_cli.py
  • scripts/phase5_smoke_test.py

Files Added In Phase 6

  • configs/phase6_config.json
  • pocket_agent/preflight.py
  • scripts/preflight_phase6.py
  • scripts/phase6_smoke_test.py
