From ea27346186446243df26162ecbf1a9ef187afb64 Mon Sep 17 00:00:00 2001
From: GopalRajD
Date: Thu, 9 Apr 2026 10:18:03 -0700
Subject: [PATCH] Rename Inference Benchmarks to Inference Metrics

Updated section titles and metrics for clarity.
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index ae906c8..3b83da1 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ An AI-powered application that generates comprehensive system design specificati
   - [Project Structure](#project-structure)
   - [Usage Guide](#usage-guide)
   - [Performance Tips](#performance-tips)
-  - [Inference Benchmarks](#inference-benchmarks)
+  - [Inference Metrics](#inference-metrics)
   - [Model Capabilities](#model-capabilities)
     - [GPT-4o](#gpt-4o)
     - [Llama 3.2 3B Instruct](#llama-32-3b-instruct)
@@ -321,7 +321,7 @@ SpecForge/
 
 ---
 
-## Inference Benchmarks
+## Inference Metrics
 
 The table below compares inference performance across different providers and models using a standardized SpecForge workload (3 runs: questions generation + spec generation with 1000 max output tokens).
 
@@ -333,7 +333,7 @@ The table below compares inference performance across different providers and mo
 
 > **Notes:**
 >
-> - All benchmarks use identical SpecForge workflows: idea input → 5 questions → spec generation with `LLM_MAX_TOKENS=1000`.
+> - All metrics use identical SpecForge workflows: idea input → 5 questions → spec generation with `LLM_MAX_TOKENS=1000`.
 > - Token counts are actual values from API responses (not estimates).
 > - GPT-4o delivers 2.5x faster P50 latency and 2.1x better throughput compared to Llama 3.2 3B on the tested infrastructure.
 > - Llama 3.2 3B performance is limited by CPU-only inference on the test gateway. Local GPU inference would significantly improve these numbers.