From 154f0f6f3d6f0ee7c81c9e56ccc198e1a75de3ed Mon Sep 17 00:00:00 2001
From: Puneeth <85823685+puneethkotha@users.noreply.github.com>
Date: Tue, 17 Mar 2026 01:27:55 -0700
Subject: [PATCH] docs: add CHANGELOG.md for v1.0.0 release

Documented notable changes for version 1.0.0, including new features,
performance improvements, and infrastructure updates.
---
 CHANGELOG.md | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 CHANGELOG.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..947d078
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,55 @@
+# Changelog
+
+All notable changes to Falcon are documented here.
+Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
+Versioning follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+---
+
+## [1.0.0] - 2026-02-01
+
+### Added
+- Production ML inference API: `POST /infer` for single-text sentiment classification (negative, neutral, positive)
+- Batch inference endpoint: `POST /infer/batch` supporting up to 50 texts per request
+- Nginx load balancer routing across multiple FastAPI workers (least-connections algorithm)
+- Redis response caching keyed by normalized input for sub-millisecond cache hits
+- Redis-backed idempotency via `X-Idempotency-Key` header to prevent duplicate processing
+- PostgreSQL async request logging with in-memory buffer (max 1,000 entries) as a fallback during Postgres outages
+- Circuit breaker: opens on 5 consecutive failures, 60s timeout, half-open recovery probe
+- Exponential backoff retry: 3 attempts, 100ms–5s delay range
+- Graceful shutdown: SIGTERM handler drains in-flight requests and flushes log buffer
+- Prometheus metrics: 20+ metrics including latency histograms, cache hit ratio, circuit breaker state, dropped log count
+- Grafana pre-provisioned dashboard: RPS, p50/p95/p99 latency, error rate, cache hit ratio, worker health
+- Prometheus alert rules: high p95 latency, error rate spike, worker down, Redis/Postgres unhealthy
+- k6 load testing suite: baseline (50 VUs/5 min), stress (ramp to 500 VUs), spike (10→300 VUs), soak (100 VUs/10 min)
+- Failure injection scripts: `kill_worker.sh`, `redis_down.sh`, `postgres_slow.sh`, `cpu_spike.sh`
+- Docker Compose local dev stack: API workers, Nginx, Redis, PostgreSQL, Prometheus, Grafana
+- GitHub Actions CI/CD pipeline with automated deployment to GitHub Pages (demo site)
+- Full documentation suite: `RUNBOOK.md`, `CAPACITY_PLAN.md`, `SECURITY.md`, `TRADEOFFS.md`, `PERFORMANCE_NOTES.md`, `UBUNTU_DEPLOYMENT.md`
+
+### Performance
+- p95 latency reduced by 30% vs single-worker baseline under 500 VU load
+- Cache hit ratio: 70%+ on repeated inference requests
+- Graceful shutdown completes within configured `GRACEFUL_SHUTDOWN_TIMEOUT_SECONDS`
+- Circuit breaker eliminates cascading failure propagation to upstream clients
+
+### Infrastructure
+- Python 3.11 runtime
+- FastAPI 0.109 + Uvicorn 0.27
+- Nginx 1.25
+- Redis 7
+- PostgreSQL 15
+- Prometheus 2.48 + Grafana 10.2
+
+---
+
+## [Unreleased]
+
+### Planned
+- GPU inference support with CUDA-accelerated model serving
+- Multi-model registry with hot-swap capability
+- Horizontal autoscaling via Kubernetes HPA
+- OpenTelemetry distributed tracing integration
+- gRPC endpoint alongside REST for lower-latency clients
+- A/B testing framework for model version comparison
+- Token-based authentication (JWT) for API access control
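
For reference, the circuit-breaker behavior described in the 1.0.0 notes (open after 5 consecutive failures, 60s cooldown, half-open recovery probe) can be sketched in plain Python. This is an illustrative sketch of the pattern, not Falcon's actual implementation; the class and method names here are hypothetical, and the clock is injectable so the cooldown can be simulated:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after a failure threshold,
    then lets a single half-open probe through after a cooldown."""

    def __init__(self, failure_threshold=5, reset_timeout=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can fake time
        self.failures = 0
        self.state = "closed"
        self.opened_at = None

    def allow_request(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # allow one recovery probe
                return True
            return False                   # still cooling down
        return True

    def record_success(self):
        self.failures = 0
        self.state = "closed"
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


# Demo with a fake clock so the 60 s cooldown is observable instantly.
now = [0.0]
cb = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    cb.record_failure()
print(cb.state)            # "open" after 5 consecutive failures
print(cb.allow_request())  # False while the cooldown runs
now[0] += 61.0
print(cb.allow_request())  # True: half-open probe allowed
cb.record_success()
print(cb.state)            # "closed" again
```

A failed half-open probe reopens the breaker immediately, which is what keeps a still-broken dependency from seeing more than one request per cooldown window.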
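
Similarly, the retry policy in the notes (3 attempts, 100ms–5s delay range) suggests a capped exponential schedule. The helper below is an assumed reading of those numbers, not code from the repository:

```python
def backoff_delays(attempts=3, base=0.1, cap=5.0):
    """Delay in seconds before retry n: base * 2**n, clamped to cap."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]


print(backoff_delays())   # [0.1, 0.2, 0.4]
print(backoff_delays(8))  # later delays are clamped at the 5.0 s cap
```

With only 3 attempts the 5s cap is never reached; it only matters if the attempt count is raised. Production retry loops typically also add jitter to avoid synchronized retries, which this sketch omits.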