From b5718e15709995e1b17a389052441ffeba3552cf Mon Sep 17 00:00:00 2001
From: Xinyu Qu <47442607+XinyuQu@users.noreply.github.com>
Date: Fri, 17 Apr 2026 11:46:41 -0400
Subject: [PATCH] feat: add codebase-documentor plugin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add a documentation plugin that analyzes codebases to produce a single
CODEBASE_ANALYSIS.md with source-of-truth citations. Designed for legacy
and AI-generated codebases where engineers need deep understanding to
operate, debug, and extend the system.

Key capabilities:

- Outline-driven pipeline: file tree → outline → iterative analysis → assembly
- Clickable citations: every finding links to source code via markdown links
- Discrepancy detection: cross-references README/metadata vs actual code
- Actionable failure modes: detection methods + recovery commands for oncall
- Architecture diagrams: delegates to aws-architecture-diagram skill
  (deploy-on-aws plugin) for draw.io output; Mermaid fallback for flow
  diagrams and architecture overview when skill unavailable
- Deep analysis: iterative deepening (scan → question → search → write)
- Tool-agnostic: works on Claude Code, Cursor, Codex, and other tools
- Large codebase support: tracked sequential analysis with resumable
  progress file; optional parallel workers when environment supports them

Output sections: Architecture Overview, Code Analysis, Request Lifecycle,
Domain Logic Deep-Dive, Startup & Initialization, Components, API Contracts,
Data Models, Deployment, Configuration, Monitoring & Observability, Security,
Local Development, Discrepancies, Failure Modes, Timeout/Dependency Chain,
Runbook Hints, Business Context.

Plugin structure:

- One skill: document-service (auto-triggers on documentation requests)
- Two MCP servers: awsknowledge (HTTP) and awsiac (stdio/uvx)
- 8 reference files for progressive disclosure
- Codex and Claude Code marketplace support

---
 .agents/plugins/marketplace.json              |  12 +
 .claude-plugin/marketplace.json               |  21 ++
 .github/CODEOWNERS                            |   1 +
 README.md                                     |  24 ++
 .../.claude-plugin/plugin.json                |  24 ++
 .../.codex-plugin/plugin.json                 |  47 ++++
 plugins/codebase-documentor/.mcp.json         |  15 ++
 plugins/codebase-documentor/README.md         |  68 ++++++
 .../skills/document-service/SKILL.md          | 213 ++++++++++++++++++
 .../references/business-context.md            |  83 +++++++
 .../references/citation-format.md             |  86 +++++++
 .../references/discovery-patterns.md          | 131 +++++++++++
 .../references/error-scenarios.md             |  42 ++++
 .../references/exclusion-patterns.md          |  77 +++++++
 .../references/framework-patterns.md          | 105 +++++++++
 .../references/recursive-analysis.md          |  71 ++++++
 .../references/technical-doc-template.md      | 211 +++++++++++++++++
 17 files changed, 1231 insertions(+)
 create mode 100644 plugins/codebase-documentor/.claude-plugin/plugin.json
 create mode 100644 plugins/codebase-documentor/.codex-plugin/plugin.json
 create mode 100644 plugins/codebase-documentor/.mcp.json
 create mode 100644 plugins/codebase-documentor/README.md
 create mode 100644 plugins/codebase-documentor/skills/document-service/SKILL.md
 create mode 100644 plugins/codebase-documentor/skills/document-service/references/business-context.md
 create mode 100644 plugins/codebase-documentor/skills/document-service/references/citation-format.md
 create mode 100644 plugins/codebase-documentor/skills/document-service/references/discovery-patterns.md
 create mode 100644 plugins/codebase-documentor/skills/document-service/references/error-scenarios.md
 create mode 100644 plugins/codebase-documentor/skills/document-service/references/exclusion-patterns.md
 create mode 100644
plugins/codebase-documentor/skills/document-service/references/framework-patterns.md create mode 100644 plugins/codebase-documentor/skills/document-service/references/recursive-analysis.md create mode 100644 plugins/codebase-documentor/skills/document-service/references/technical-doc-template.md diff --git a/.agents/plugins/marketplace.json b/.agents/plugins/marketplace.json index b28bbad4..2f032e6b 100644 --- a/.agents/plugins/marketplace.json +++ b/.agents/plugins/marketplace.json @@ -40,6 +40,18 @@ }, "category": "Development" }, + { + "name": "codebase-documentor", + "source": { + "source": "local", + "path": "./plugins/codebase-documentor" + }, + "policy": { + "installation": "AVAILABLE", + "authentication": "ON_INSTALL" + }, + "category": "Documentation" + }, { "name": "databases-on-aws", "source": { diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 71301933..83285d7e 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -79,6 +79,27 @@ ], "version": "1.0.0" }, + { + "category": "documentation", + "description": "Analyze AWS-deployed services and codebases to generate structured technical documentation, architecture diagrams, and business documentation — all with source-of-truth citations linking every finding back to code. Delegates diagram generation to the aws-architecture-diagram skill (deploy-on-aws plugin) for professional draw.io output. Supports CDK, CloudFormation, and Terraform.", + "keywords": [ + "aws", + "aws agent skills", + "documentation", + "architecture", + "diagram", + "draw.io", + "codebase-analysis", + "cdk", + "cloudformation", + "terraform", + "onboarding" + ], + "name": "codebase-documentor", + "source": "./plugins/codebase-documentor", + "tags": ["aws", "documentation", "architecture", "diagram"], + "version": "0.1.0" + }, { "category": "database", "description": "Expert database guidance for the AWS database portfolio. 
Design schemas, execute queries, handle migrations, and choose the right database for your workload.", diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 2f282a92..18fc20a2 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -33,6 +33,7 @@ tools/ @awslabs/agent-plugins-admins plugins/amazon-location-service @awslabs/agent-plugins-admins @awslabs/agent-plugins-maintainers @awslabs/agent-plugins-amazon-location-service plugins/aws-amplify @awslabs/agent-plugins-admins @awslabs/agent-plugins-maintainers @awslabs/agent-plugins-amplify plugins/aws-serverless @awslabs/agent-plugins-admins @awslabs/agent-plugins-maintainers @awslabs/agent-plugins-aws-serverless +plugins/codebase-documentor @awslabs/agent-plugins-admins @awslabs/agent-plugins-maintainers plugins/databases-on-aws @awslabs/agent-plugins-admins @awslabs/agent-plugins-maintainers @awslabs/agent-plugins-dsql plugins/deploy-on-aws @awslabs/agent-plugins-admins @awslabs/agent-plugins-maintainers @awslabs/agent-plugins-deploy-on-aws plugins/migration-to-aws @awslabs/agent-plugins-admins @awslabs/agent-plugins-maintainers @awslabs/agent-plugins-migrate-to-aws diff --git a/README.md b/README.md index 1e51475e..fdb4d798 100644 --- a/README.md +++ b/README.md @@ -33,6 +33,7 @@ To maximize the benefits of plugin-assisted development while maintaining securi | **amazon-location-service** | Add maps, geocoding, routing, places search, and geospatial features to applications with Amazon Location Service | Available | | **aws-amplify** | Build full-stack apps with AWS Amplify Gen 2 using guided workflows for auth, data, storage, and functions | Available | | **aws-serverless** | Build serverless applications with Lambda, API Gateway, EventBridge, Step Functions, and durable functions | Available | +| **codebase-documentor** | Analyze AWS-deployed services and codebases to generate structured technical documentation with source-of-truth citations | Available | | **databases-on-aws** | Database guidance for 
the AWS database portfolio — schema design, queries, migrations, and multi-tenant patterns | Some Services Available (Aurora DSQL) | | **deploy-on-aws** | Deploy applications to AWS with architecture recommendations, cost estimates, and IaC deployment | Available | | **migration-to-aws** | Migrate GCP infrastructure to AWS with resource discovery, architecture mapping, cost analysis, and execution planning | Available | @@ -90,6 +91,12 @@ or /plugin install sagemaker-ai@agent-plugins-for-aws ``` +or + +```bash +/plugin install codebase-documentor@agent-plugins-for-aws +``` + ### Codex Codex supports repo-local marketplaces and plugin manifests through @@ -227,6 +234,23 @@ Design, build, deploy, test, and debug serverless applications with AWS Lambda, | --------------------------- | --------------------------------------------- | --------------------------------------------- | | **SAM template validation** | After edits to `template.yaml`/`template.yml` | Runs `sam validate` and reports errors inline | +## codebase-documentor + +Analyzes codebases to generate structured technical documentation with source-of-truth citations linking every finding back to the exact code that produced it. Uses an outline-driven pipeline to systematically analyze codebases of any size with a persistent task board for resumability. 
+ +### Agent Skill Triggers + +| Agent Skill | Triggers | +| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **document-service** | "analyze this codebase", "generate documentation", "document this service", "I inherited this code", "help me understand this system", "draw the architecture", "what does this system look like" | + +### MCP Servers + +| Server | Purpose | +| ---------------- | -------------------------------------------------------------------- | +| **awsknowledge** | AWS service descriptions, architecture guidance, documentation links | +| **awsiac** | CDK/CloudFormation resource schema validation and IaC best practices | + ## databases-on-aws Database guidance for the AWS database portfolio. Design schemas, execute queries, handle migrations, build applications, and choose the right database for your workload. Currently includes Aurora DSQL — a serverless, PostgreSQL-compatible distributed SQL database. diff --git a/plugins/codebase-documentor/.claude-plugin/plugin.json b/plugins/codebase-documentor/.claude-plugin/plugin.json new file mode 100644 index 00000000..66eae151 --- /dev/null +++ b/plugins/codebase-documentor/.claude-plugin/plugin.json @@ -0,0 +1,24 @@ +{ + "author": { + "name": "Amazon Web Services" + }, + "description": "Analyze AWS-deployed services and codebases to generate structured technical documentation, architecture diagrams, and business documentation — all with source-of-truth citations linking every finding back to code. Delegates diagram generation to the aws-architecture-diagram skill (deploy-on-aws plugin) for professional draw.io output. 
Supports CDK, CloudFormation, and Terraform.", + "homepage": "https://github.com/awslabs/agent-plugins", + "keywords": [ + "aws", + "aws agent skills", + "documentation", + "architecture", + "diagram", + "draw.io", + "codebase-analysis", + "cdk", + "cloudformation", + "terraform", + "onboarding" + ], + "license": "Apache-2.0", + "name": "codebase-documentor", + "repository": "https://github.com/awslabs/agent-plugins", + "version": "0.1.0" +} diff --git a/plugins/codebase-documentor/.codex-plugin/plugin.json b/plugins/codebase-documentor/.codex-plugin/plugin.json new file mode 100644 index 00000000..6ca307ec --- /dev/null +++ b/plugins/codebase-documentor/.codex-plugin/plugin.json @@ -0,0 +1,47 @@ +{ + "name": "codebase-documentor", + "version": "0.1.0", + "description": "Analyze AWS-deployed services and codebases to generate structured technical documentation, architecture diagrams, and business documentation — all with source-of-truth citations linking every finding back to code. Delegates diagram generation to the aws-architecture-diagram skill (deploy-on-aws plugin) for professional draw.io output. 
Supports CDK, CloudFormation, and Terraform.", + "author": { + "name": "Amazon Web Services", + "email": "aws-agent-plugins@amazon.com", + "url": "https://github.com/awslabs/agent-plugins" + }, + "homepage": "https://github.com/awslabs/agent-plugins", + "repository": "https://github.com/awslabs/agent-plugins", + "license": "Apache-2.0", + "keywords": [ + "aws", + "aws agent skills", + "documentation", + "architecture", + "diagram", + "draw.io", + "codebase-analysis", + "cdk", + "cloudformation", + "terraform", + "onboarding" + ], + "skills": "./skills/", + "mcpServers": "./.mcp.json", + "interface": { + "displayName": "Codebase Documentor", + "shortDescription": "Analyze codebases and generate structured technical documentation with citations.", + "longDescription": "Analyze AWS-deployed services and codebases to generate structured technical documentation with source-of-truth citations linking every finding back to code.", + "defaultPrompt": [ + "Analyze this codebase and generate documentation.", + "I inherited this code — help me understand and document it.", + "Document the architecture and components of this service." 
+ ], + "developerName": "Amazon Web Services", + "category": "Documentation", + "capabilities": [ + "Read" + ], + "websiteURL": "https://github.com/awslabs/agent-plugins", + "privacyPolicyURL": "https://aws.amazon.com/privacy/", + "termsOfServiceURL": "https://aws.amazon.com/service-terms/", + "brandColor": "#FF9900" + } +} diff --git a/plugins/codebase-documentor/.mcp.json b/plugins/codebase-documentor/.mcp.json new file mode 100644 index 00000000..11023104 --- /dev/null +++ b/plugins/codebase-documentor/.mcp.json @@ -0,0 +1,15 @@ +{ + "mcpServers": { + "awsknowledge": { + "type": "http", + "url": "https://knowledge-mcp.global.api.aws" + }, + "awsiac": { + "args": [ + "awslabs.aws-iac-mcp-server@latest" + ], + "command": "uvx", + "type": "stdio" + } + } +} diff --git a/plugins/codebase-documentor/README.md b/plugins/codebase-documentor/README.md new file mode 100644 index 00000000..ddd0e2a3 --- /dev/null +++ b/plugins/codebase-documentor/README.md @@ -0,0 +1,68 @@ +# Codebase Documentor + +Analyze codebases — especially legacy and AWS-deployed services — to produce structured technical documentation and architecture diagrams with source-of-truth citations linking every finding back to the code. Understands CDK, CloudFormation, Terraform, and enriches output with AWS service context. Delegates architecture diagram generation to the `aws-architecture-diagram` skill (from the `deploy-on-aws` plugin) for professional draw.io output with official AWS4 icons, or falls back to inline Mermaid. + +## How It Works + +Unlike ad-hoc "explain this code" prompts, codebase-documentor uses an **outline-driven pipeline** to systematically analyze codebases of any size — from small microservices to large legacy monoliths. The entire process runs autonomously after initial context gathering. + +**The pipeline:** + +1. **Build file tree** — Recursively list all files, filter out noise, detect the project type and entry points +2. 
**Generate outline** — Analyze the file tree, README, and entry points to produce a documentation outline mapping each section to source files +3. **Analyze** — Two parallel paths: (A) application code analysis (APIs, data models, integrations) and (B) infrastructure-as-code analysis (CDK, CloudFormation, Terraform resources and relationships) +4. **Generate diagram** — Delegate to the `aws-architecture-diagram` skill (deploy-on-aws plugin) for draw.io output, or fall back to inline Mermaid +5. **Assemble** — Combine all sections into `CODEBASE_ANALYSIS.md` (single file with business context included) + +**For large codebases**, the outline sections are tracked on a persistent `.codebase-documentor-progress.md` task board, making the process **resumable** — if a session is interrupted, a new session reads the progress file and continues from where it left off. + +## Skills + +| Skill | Purpose | +| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `document-service` | Analyze codebases to produce `CODEBASE_ANALYSIS.md` with architecture diagrams (via `aws-architecture-diagram` skill, Mermaid fallback), business context, and source citations. Two core analysis paths: application code (APIs, data models, integrations) and IaC (CDK, CloudFormation, Terraform). 
| + +## MCP Servers + +| Server | Type | Purpose | When Used | +| -------------- | ----- | -------------------------------------------------------------------- | ----------------------------------------- | +| `awsknowledge` | HTTP | AWS service descriptions, architecture guidance, documentation links | When AWS services detected in code or IaC | +| `awsiac` | stdio | CDK/CloudFormation resource schema validation and IaC best practices | When CDK or CloudFormation files detected | + +## Installation + +```bash +/plugin install codebase-documentor@agent-plugins-for-aws +``` + +Or test locally: + +```bash +claude --plugin-dir ./plugins/codebase-documentor +``` + +## Examples + +**Analyze an inherited codebase:** + +```text +I inherited this codebase and need to understand it. Analyze and document it. +``` + +**Document a specific service:** + +```text +Analyze the order-processing service and generate documentation. +``` + +**Analyze with existing context:** + +```text +Analyze this service. Here's an existing design doc: [link] +``` + +**Understand a legacy system:** + +```text +Help me understand this legacy system. What does it do and how is it architected? +``` diff --git a/plugins/codebase-documentor/skills/document-service/SKILL.md b/plugins/codebase-documentor/skills/document-service/SKILL.md new file mode 100644 index 00000000..66c6fd63 --- /dev/null +++ b/plugins/codebase-documentor/skills/document-service/SKILL.md @@ -0,0 +1,213 @@ +--- +name: document-service +description: "This skill should be used when the user asks to \"analyze this codebase\", \"document this service\", \"generate technical docs\", \"I inherited this code\", \"help me understand this system\", \"create docs for this project\", \"what does this system look like\", \"onboard me to this codebase\", \"this codebase has no docs\", \"visualize the architecture from code\", or any explicit request to produce structured documentation or architecture diagrams from an existing codebase. 
Specifically optimized for AWS workloads (CDK, CloudFormation, Terraform) with source-of-truth citations. Do NOT activate for code reviews, single-function explanations, generating new code, or general coding tasks." +--- + +# Codebase Analyzer + +Analyze codebases to produce structured technical documentation and architecture diagrams with source-of-truth citations. Every finding links back to the exact file and line it was derived from. Optimized for AWS workloads but works with any codebase. + +## Core Principles + +- **Explain WHY, not just WHAT.** The reader inherited this codebase and has zero context. Listing components is not enough — explain why the architecture is shaped this way. Search for code comments, TODOs, and commit messages that reveal design rationale. When no rationale exists, mark it `[RATIONALE UNKNOWN]`. +- **Trace end-to-end flows.** For every API endpoint or message handler, trace the complete request path from entry to response. Note every intermediate step, transformation, timeout, and failure point. This is the "if it breaks at 3am, where do I look?" analysis. +- **Deep-dive complex logic.** Identify the most complex or domain-specific code paths (ML pipelines, business rule engines, state machines, custom algorithms). Document HOW they work at the implementation level — the algorithm, key parameters, edge cases, and where production bugs will occur. Surface-level summaries of complex code provide no value over a naive AI prompt. +- **Surface implicit knowledge.** Look for hardcoded values, magic numbers, environment-dependent behavior, and undocumented assumptions. These are the tribal knowledge items that disappear when teams leave. +- **Every claim must be traceable.** Include `file:line` citations for every finding. See [citation-format.md](references/citation-format.md). Verify citations precisely — re-read the cited file and confirm the line number is within ±3 lines. Anchor with function/variable names. 
+- **Code is the source of truth.** Document what actually exists in code, not what READMEs or wikis claim. Flag every discrepancy between documentation and reality. +- **Mark unknowns and risks explicitly.** Use `[UNKNOWN]` for items not inferable from code, `[RISK]` for unhandled failure modes, `[INFERRED]` for educated guesses, `[RATIONALE UNKNOWN]` for unexplained architecture choices. Omitting markers undermines trust. +- **Verify quantitative claims.** List directory entries programmatically and use exact counts. + +## Workflow + +The workflow runs autonomously from Step 2 onward. Step 1 is the only interactive step. + +### Step 1: Gather Context + +Gather from the user: + +- Target directory or service to analyze +- Any existing documentation, design docs, or business context (accept "nothing" — this skill is designed for undocumented codebases) + +If existing docs are provided, read them first to establish baseline context. If the target directory and context are already known (e.g., provided via automation or a pre-configured prompt), skip the interactive step and proceed directly to Step 2. + +Check whether `CODEBASE_ANALYSIS.md` already exists at the output path. If so, ask the user: "Overwrite or write to a different filename?" Resolve this before proceeding — the rest of the workflow runs autonomously. + +### Step 2: Build File Tree and Detect Project Type + +1. List all files recursively in the target directory +2. Apply exclusion patterns from [exclusion-patterns.md](references/exclusion-patterns.md). Also respect `.gitignore`. +3. Detect project type and framework from characteristic files. See [discovery-patterns.md](references/discovery-patterns.md). +4. Identify entry points based on detected project type. See [discovery-patterns.md](references/discovery-patterns.md). +5. Read the README, CLAUDE.md, or AGENTS.md if present — these contain project context. +6. 
Check git branch names (`git branch -a`) for strategic context (e.g., a `dev/rust` branch signals a language migration in progress). Note active branches in the Architecture Overview. + +### Step 3: Generate Documentation Outline + +Produce a hierarchical outline mapping each documentation section to specific source files: + +```markdown +## Documentation Outline + +1. Architecture Overview → [entry points, IaC stack files] — explain WHY, not just WHAT +2. [Module A: detected name] → [source files for module A] +3. [Module B: detected name] → [source files for module B] +4. Shared Utilities → [shared/common source files] +5. Request Lifecycle → [trace end-to-end flows through the system] +6. Domain Logic Deep-Dive → [core services at implementation level: algorithms, parameters, edge cases] +7. Startup & Initialization → [boot sequence, model loading, cache warmup, dependency checks] +8. API Contracts → [route definitions, OpenAPI specs] +9. Data Models → [schema files, ORM models] +10. Deployment & IaC → [IaC files, Dockerfiles] +11. Configuration & Secrets → [config files, .env.example, prompt templates, YAML configs] +12. Monitoring & Observability → [log groups, metrics, tracing, alarms, dashboards] +13. Security → [auth, encryption, IAM, network isolation] +14. Local Development → [how to run/test locally, CPU fallback, dev environment setup] +15. Discrepancies → (cross-reference README/metadata vs actual code) +16. Failure Modes → (cross-cutting — include detection + recovery) +17. Timeout/Dependency Chain → (map cascading timeouts across layers) +``` + +Follow the section structure in [technical-doc-template.md](references/technical-doc-template.md) but adapt to the actual codebase — add sections for significant modules, skip sections that don't apply. Aim for balance: each section should map to a meaningful subset of files. If a module maps to more than ~30 files, consider splitting it into sub-sections. 
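
The balance guideline above (split a module once it maps to more than roughly 30 files) can be sketched in Python as follows. This is only an illustration under stated assumptions: `group_by_depth`, `build_outline`, and `MAX_FILES_PER_SECTION` are hypothetical names that do not appear in the plugin, and grouping by directory depth is one plausible way to detect modules, not the skill's actual method.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: the "~30 files" guideline treated as a soft threshold.
MAX_FILES_PER_SECTION = 30


def group_by_depth(paths, depth):
    """Group file paths by their first `depth` directory components."""
    groups = defaultdict(list)
    for p in paths:
        parts = PurePosixPath(p).parts
        # Drop the file name, keep up to `depth` directories; root files map to ".".
        key = "/".join(parts[:-1][:depth]) or "."
        groups[key].append(p)
    return dict(groups)


def build_outline(paths, max_files=MAX_FILES_PER_SECTION):
    """Map each top-level module to its files, splitting oversized modules."""
    outline = {}
    for module, files in group_by_depth(paths, 1).items():
        if len(files) <= max_files:
            outline[module] = sorted(files)
            continue
        # Oversized module: split one directory level deeper into sub-sections.
        for sub, sub_files in group_by_depth(files, 2).items():
            outline[sub] = sorted(sub_files)
    return outline
```

In this sketch a module is split only one level deeper; a real outline would likely also weigh entry points and README structure rather than directory layout alone.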
+ +**Do NOT pause for user review.** Proceed immediately to analysis. + +### Step 4: Analyze + +Two core analysis paths: + +#### Path A: Application Code + +For each outline section, read mapped source files and extract: + +1. API and service definitions — route handlers, controllers, gRPC services, GraphQL resolvers +2. Data model definitions — database schemas, ORM models, type definitions +3. Internal dependencies — imports between modules, shared utilities, event handlers +4. External integrations — SDK clients, HTTP calls, queue producers/consumers +5. Configuration — environment variables, feature flags, secrets references + +Consult [framework-patterns.md](references/framework-patterns.md) for framework-specific extraction patterns. + +#### Path B: Infrastructure-as-Code + +When IaC files are detected (CDK, CloudFormation, Terraform, Serverless Framework): + +1. Parse resource definitions. Identify AWS resource types, relationships, and networking topology. +2. Map infrastructure to application components that use them. +3. Extract networking topology — VPCs, subnets, security groups. +4. Consult MCP servers — use `awsiac` to confirm resource interpretations, `awsknowledge` for service descriptions. + +When no IaC is found, infer infrastructure from application code (SDK clients, connection strings, environment variables) and mark components as `[INFERRED]`. + +**Note on CDK projects:** In CDK codebases, the IaC IS application code (TypeScript/Python constructs). Process CDK files in a single pass covering both Path A and Path B rather than treating them as separate analyses. Extract both the resource definitions (Path B) and the application logic interleaved with them (Lambda bundling, environment wiring, IAM grants — Path A) simultaneously. + +#### Writing Sections + +For each outline section: + +1. Re-read mapped source files for exact line numbers — do not rely on memory from earlier steps. +2. 
Use grep for patterns — route definitions, model declarations, error handlers. +3. Write content with inline citations. See [citation-format.md](references/citation-format.md). +4. **Document every source file.** Enumerate ALL non-generated source files. Every file should appear somewhere in the documentation — in a module table, component table, or at minimum a file inventory. Files that define symbols never imported by any execution path should be flagged as `[UNUSED]` potential dead code. +5. **Analyze the test suite.** Document what tests verify, what coverage gaps exist, and how to interpret test failures. Tests reveal expected behavior and edge cases. + +Process cross-cutting sections (Failure Modes, Configuration, Security, Discrepancies) last, drawing on accumulated knowledge. + +**Discrepancy detection**: After analyzing the codebase, re-read the README, CLAUDE.md, package.json description, and any project metadata. Flag every claim that does not match the actual code — features referenced but not implemented, resource types that differ, architecture components that don't exist. For legacy codebases, this "trust but verify" pass is the single most valuable output. + +**Actionable failure modes**: For each failure mode, include the detection method (CloudWatch metric, log pattern, symptom) and recovery steps (actual commands), not just a description. The reader is an on-call engineer at 3am. + +#### Deep Analysis Approach + +Do not attempt a single-pass skim. For each module or service, use iterative deepening: + +1. **First pass** — scan file structure and entry points to understand scope +2. **Second pass** — read core files, identify questions (what calls this? where is this configured? what happens on error?) +3. **Third pass** — search for answers to those questions across the codebase, trace cross-module dependencies +4. **Write** — only write the section after all three passes. Re-read cited files to verify exact line numbers. 
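
The final "Write" pass asks for cited files to be re-read so that line numbers are exact, and the Core Principles allow a ±3 line tolerance anchored on a function or variable name. A minimal sketch of that check, assuming the file has already been read into a list of lines; `verify_citation` and its parameters are hypothetical names for illustration, not part of the plugin:

```python
def verify_citation(file_lines, cited_line, anchor, tolerance=3):
    """Return the 1-based line nearest to `cited_line` that contains `anchor`,
    or None if `anchor` does not occur within `tolerance` lines of it."""
    lo = max(1, cited_line - tolerance)
    hi = min(len(file_lines), cited_line + tolerance)
    # Collect every line in the window that contains the anchor text.
    hits = [n for n in range(lo, hi + 1) if anchor in file_lines[n - 1]]
    if not hits:
        return None
    # Prefer the hit closest to the cited line (the lower line wins a tie).
    return min(hits, key=lambda n: abs(n - cited_line))
```

A `None` result would suggest the citation has drifted and the section should be re-checked against the file before it is written.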
+ +#### Large Codebase Strategy + +For codebases with multiple top-level modules, deep nesting, or hundreds of source files: + +- **Primary: tracked sequential analysis.** Create a `.codebase-documentor-progress.md` task board to track progress through sections, enabling resumability if interrupted. This works on all platforms (Claude Code, Cursor, Codex, or any coding assistant). +- **Acceleration: parallel workers.** If the environment supports spawning parallel agents, assign outline sections to independent workers. Each worker reads its mapped files and produces section content with citations. Keep Architecture Overview and cross-cutting sections in the main session for assembly. + +See [recursive-analysis.md](references/recursive-analysis.md) for detailed instructions on both approaches. + +### Step 5: Generate Diagrams + +Two types of diagrams serve different purposes: + +**Sequence/flow diagrams — inline Mermaid.** For request lifecycle traces and data pipeline flows identified in Step 4, generate Mermaid `sequenceDiagram` or `flowchart` blocks inline in the relevant CODEBASE_ANALYSIS.md sections. Mermaid is the community standard for simple flow diagrams and renders natively on GitHub. Keep these focused — one diagram per major request path or data flow. + +**Architecture diagram — always attempt the `aws-architecture-diagram` skill first.** For the system-level architecture diagram (services, infrastructure, boundaries): invoke the `aws-architecture-diagram` skill (part of the `deploy-on-aws` plugin) with "analyze [target-directory]" to trigger Mode A. It produces a validated draw.io diagram (`docs/*.drawio`) with official AWS4 icons and professional styling. **Only if** the skill is genuinely unavailable (not installed, invocation fails), fall back to a Mermaid `flowchart TD` architecture overview directly in the Architecture Overview section. 
Include all major services, data stores, external dependencies, and infrastructure boundaries (VPC/subnets as subgraphs when IaC is present). + +After diagram generation, try to export to PNG for embedding in the report. Run `drawio -x -f png -e -b 10 -o docs/.drawio.png docs/.drawio`. If `drawio` is not on PATH, skip the PNG export — the report will link to the `.drawio` file directly instead of embedding an image. + +Cross-reference the diagram against the Architecture Overview text. Update documentation or diagram if they diverge. + +### Step 6: Assemble and Deliver + +1. Assemble all sections into `CODEBASE_ANALYSIS.md` following [technical-doc-template.md](references/technical-doc-template.md) +2. Embed the architecture diagram as an image with a link to the editable source: + + ```markdown + ![Architecture](./docs/.drawio.png) + + > Editable source: [`docs/.drawio`](./docs/.drawio) + ``` + + If PNG export was not possible, link to the `.drawio` file directly. Mermaid flow diagrams go inline in relevant sections. +3. When the codebase reveals clear business capabilities (API contracts, domain models, data flows, SLA configs), include a **Business Context** section at the end of `CODEBASE_ANALYSIS.md` following [business-context.md](references/business-context.md). Skip only for pure libraries or infrastructure-only code. Do NOT include speculative content — but a README describing the product IS sufficient business context. +4. Tag items not inferable from code with `[UNKNOWN]` +5. Write `CODEBASE_ANALYSIS.md` to the target directory +6. 
Present summary: components documented, APIs found, unknowns tagged, citations included + +## Output Files + +| File | Purpose | +| ---------------------- | ------------------------------------------------------------------------------ | +| `CODEBASE_ANALYSIS.md` | Single output — technical docs, business context, citations, and flow diagrams | +| `docs/*.drawio` | Architecture diagram source (editable in draw.io) | +| `docs/*.drawio.png` | Architecture diagram image (embedded in report, if CLI export available) | + +## Defaults + +| Setting | Default | Override | +| -------------------- | ------------------------------------------------------------------------ | --------------- | +| Primary output | CODEBASE_ANALYSIS.md | - | +| Flow diagrams | Mermaid inline (sequenceDiagram / flowchart) | "skip diagrams" | +| Architecture diagram | draw.io via aws-architecture-diagram skill (Mermaid fallback if missing) | "skip diagrams" | +| IaC reading | Read-only (never modify) | - | +| AWS enrichment | Enabled when AWS services detected | "skip AWS" | +| Scope | User-specified directory | - | + +## Error Handling + +See [error-scenarios.md](references/error-scenarios.md) for handling of empty directories, missing entry points, missing IaC, existing output files, and MCP server failures. + +## MCP Servers + +### awsknowledge + +Consult when AWS services are detected. Use for enrichment (adding official service descriptions and documentation links to CODEBASE_ANALYSIS.md) and validation (confirming the analysis interpretation is correct). When the codebase is self-explanatory, validation is more valuable than enrichment — do not add MCP content just because the server is available. + +Example queries: search for "Amazon ECS on EC2 GPU instances" to confirm GPU support patterns, or read the official service page for an unfamiliar AWS service to get a one-line description. + +### awsiac + +Consult when CDK, CloudFormation, or Terraform files are detected. 
Use primarily for validation — confirm that the interpretation of a construct or resource type matches its actual behavior. Particularly useful for complex constructs with non-obvious defaults. + +Example queries: confirm properties of `ecs.FargateService` vs `ecs.Ec2Service`, look up Terraform resource attributes for unfamiliar providers, or verify CloudFormation resource relationships. + +## References + +- [Output template](references/technical-doc-template.md) — CODEBASE_ANALYSIS.md section structure +- [Business context](references/business-context.md) — Business Context section guidance (when to include, what to cover) +- [Citation format](references/citation-format.md) — Clickable citation rules and anchoring +- [Error scenarios](references/error-scenarios.md) — Handling common failure conditions +- [Exclusion patterns](references/exclusion-patterns.md) — Files and directories to skip during scanning +- [Project detection](references/discovery-patterns.md) — Project type detection and entry point identification +- [Code extraction](references/framework-patterns.md) — Framework-specific data extraction patterns +- [Large codebase strategy](references/recursive-analysis.md) — Tracked sequential analysis and parallel workers diff --git a/plugins/codebase-documentor/skills/document-service/references/business-context.md b/plugins/codebase-documentor/skills/document-service/references/business-context.md new file mode 100644 index 00000000..cecfd87f --- /dev/null +++ b/plugins/codebase-documentor/skills/document-service/references/business-context.md @@ -0,0 +1,83 @@ +# Business Context Section Template + +Structure for the **Business Context** section at the end of `CODEBASE_ANALYSIS.md`. Include only when sufficient business context is available from code, existing docs, or user input. 
+ +## When to Include + +Include the Business Context section when: + +- Existing README or docs contain business context +- User provides business context during the gather step +- Code clearly reveals business domain (e.g., e-commerce order flow, payment processing) + +Skip the Business Context section when: + +- Code is purely infrastructure or utility +- No business context is available from any source +- Including would require substantial speculation + +## Document Structure + +```markdown +# [Service Name] — Business Documentation + +> Generated by codebase-documentor on [date] +> Target directory: [path] + +## Service Overview + +[One paragraph: what the service does in business terms, +who uses it, and why it exists.] + +> Sources: +> +> - `file:line` — [what this source reveals] + +## Business Capabilities + +| Capability | Description | Source | +| ------------ | ------------------------------- | ----------- | +| [capability] | [what it does for the business] | `file:line` | + +## Data Flows + +### [Flow Name] + +1. [Step 1] — `source:line` +2. [Step 2] — `source:line` +3. [Step 3] — `source:line` + +[Describe the key business processes and how data moves +through the system.] + +## Dependencies + +### Upstream (services this depends on) + +| Service | Purpose | Source | +| ------- | ------------------ | ----------- | +| [name] | [what it provides] | `file:line` | + +### Downstream (services that depend on this) + +| Service | Purpose | Source | +| ------- | ------------------ | ----------- | +| [name] | [what it consumes] | `file:line` | + +## SLAs and Constraints + +[Performance requirements, availability targets, rate limits — +only if documented in code or configuration.] + +| Constraint | Value | Source | +| ------------ | ------- | ----------- | +| [constraint] | [value] | `file:line` | +``` + +## Section Guidelines + +- **Service Overview**: Derive from README, code comments, API names, domain objects. Keep to one paragraph. 
+- **Business Capabilities**: Map code functionality to business terms. E.g., `processOrder()` → "Order processing". +- **Data Flows**: Trace key business processes through the code. Cite each step. +- **Dependencies**: Identify from API calls, SDK clients, queue consumers/producers, database connections. +- **SLAs and Constraints**: Only include if found in code (timeout configs, rate limit settings, health check thresholds). Mark missing SLAs as `[UNKNOWN]`. diff --git a/plugins/codebase-documentor/skills/document-service/references/citation-format.md b/plugins/codebase-documentor/skills/document-service/references/citation-format.md new file mode 100644 index 00000000..34a1ee8e --- /dev/null +++ b/plugins/codebase-documentor/skills/document-service/references/citation-format.md @@ -0,0 +1,86 @@ +# Source-of-Truth Citation Format + +Every finding in generated documentation must cite the source code that produced it. + +## Why Citations Matter + +- **Verifiability**: Readers can follow any citation to confirm a finding +- **Maintainability**: When cited code changes, stale documentation is identifiable +- **Trust**: Documentation with citations is trusted over unsourced claims + +## Citation Format + +All citations use **clickable markdown links** so readers can cmd+click (Mac) or ctrl+click to navigate directly to the source code. The link format uses `#L{line}` anchors which work on GitHub and in most editors. 
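A minimal sketch of the link format described above, assuming paths are already relative to the analyzed directory root and carry no leading `./`. The helper name `format_citation` and its parameters are hypothetical and exist only for illustration; the label-plus-`#L{line}` shape is the only part taken from this reference.

```python
from typing import Optional


def format_citation(path: str, start: int, end: Optional[int] = None,
                    symbol: Optional[str] = None) -> str:
    """Build a clickable markdown citation for a file location.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if start < 1:
        raise ValueError("line numbers are 1-based")
    if end is not None and end < start:
        raise ValueError("end line must not precede start line")

    # The label shows the full range; the anchor points at the first line.
    label = f"{path}:{start}" if end in (None, start) else f"{path}:{start}-{end}"
    link = f"[`{label}`](./{path}#L{start})"

    # Naming the cited symbol keeps the citation useful if lines shift.
    return f"{link} (`{symbol}`)" if symbol else link
```

For example, `format_citation("src/app.ts", 42, 58, "handleRequest()")` yields a range label whose link opens at line 42, followed by the symbol name in parentheses.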
+ +### Inline Citations (for tables) + +Use markdown links in table cells: + +| Component | Type | Purpose | Source | +| ------------- | ------ | ----------------------- | --------------------------------------------------- | +| order-queue | SQS | Buffers incoming orders | [`lib/order-stack.ts:23`](./lib/order-stack.ts#L23) | +| process-order | Lambda | Processes queued orders | [`lib/order-stack.ts:31`](./lib/order-stack.ts#L31) | + +### Block Citations (for narrative sections) + +Use `> Sources:` blocks after narrative paragraphs: + +```markdown +The service uses an event-driven architecture with SQS queues +triggering Lambda handlers for asynchronous order processing. + +> Sources: +> +> - [`lib/order-stack.ts:45-52`](./lib/order-stack.ts#L45) — SQS queue to Lambda event source mapping +> - [`src/handlers/process-order.ts:12-18`](./src/handlers/process-order.ts#L12) — SQS event handler entry point +> - [`src/handlers/charge-payment.ts:8`](./src/handlers/charge-payment.ts#L8) — SNS trigger for payment processing +``` + +### Line Ranges + +Use `file:start-end` for multi-line references. The link anchors to the start line: + +- Single line: [`src/app.ts:42`](./src/app.ts#L42) +- Line range: [`src/app.ts:42-58`](./src/app.ts#L42) — link opens at line 42 +- Entire file (avoid unless small): [`config/database.yml`](./config/database.yml) + +### Anchoring Citations + +When citing a specific function, class, or constant, include its name alongside the line number. This makes citations resilient to line shifts when code changes: + +- Good: [`src/auto-reply/tokens.ts:4`](./src/auto-reply/tokens.ts#L4) (`SILENT_REPLY_TOKEN`) +- Good: [`src/app.ts:42-58`](./src/app.ts#L42) (`handleRequest()`) +- Weak: `src/auto-reply/tokens.ts:4` (no link, no anchor — brittle and not navigable) + +## Citation Rules + +1. **Cite every finding** — No claim without a source +2. **Be specific** — Cite exact lines, not entire files +3. 
**Verify accuracy** — Read the cited lines to confirm they support the finding +4. **Use relative paths** — Paths relative to the analyzed directory root +5. **Annotate citations** — Include a brief description of what the citation shows +6. **Group related citations** — Multiple sources for one finding go in one `> Sources:` block + +## Citation Verification Process + +REQUIRED: Before including any citation, verify it is accurate: + +1. **Read the cited lines** — Confirm the code at the cited location supports the finding +2. **Use exact line numbers** — Do not estimate; read the file and count lines +3. **Mark unverifiable claims** — Mark unverifiable citations `[UNVERIFIED]` with an explanation + +## When Citations Cannot Be Provided + +Mark findings that are inferred rather than directly cited: + +```markdown +The service likely communicates with a payment gateway. +[INFERRED — based on `PaymentService` import at `src/app.ts:3`, +but no direct API call found] +``` + +Do not present inferences as facts. Clearly distinguish between: + +- **Cited findings**: Directly supported by code +- **Inferences**: Logically deduced but not directly visible +- **Unknowns**: Cannot be determined from code diff --git a/plugins/codebase-documentor/skills/document-service/references/discovery-patterns.md b/plugins/codebase-documentor/skills/document-service/references/discovery-patterns.md new file mode 100644 index 00000000..abf343b0 --- /dev/null +++ b/plugins/codebase-documentor/skills/document-service/references/discovery-patterns.md @@ -0,0 +1,131 @@ +# Discovery Patterns + +Framework-specific patterns for extracting information during codebase analysis. + +## Project Type Detection + +Detect the project type early — it determines which entry points to look for, which frameworks to expect, and which patterns matter most. 
+ +| Indicator File(s) | Project Type | Primary Language | +| ----------------------------------------------------------- | ------------- | --------------------- | +| `package.json` | Node.js | JavaScript/TypeScript | +| `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` | Python | Python | +| `go.mod` | Go | Go | +| `Cargo.toml` | Rust | Rust | +| `pom.xml`, `build.gradle`, `build.gradle.kts` | Java/Kotlin | Java/Kotlin | +| `*.csproj`, `*.sln` | .NET | C# | +| `Gemfile` | Ruby | Ruby | +| `mix.exs` | Elixir | Elixir | +| `composer.json` | PHP | PHP | +| `cdk.json` (with above) | AWS CDK | (see above) | +| `serverless.yml` | Serverless | (see above) | +| `.projenrc.ts`, `.projenrc.js` | Projen | (see above) | +| `pnpm-workspace.yaml` | pnpm monorepo | (see above) | +| `*.tf` | Terraform | HCL | + +When multiple indicators are present (e.g., `package.json` + `requirements.txt`), the project is **polyglot** — note all detected types and analyze each stack's entry points. + +### Entry Points by Project Type + +| Project Type | Entry Point Files | +| ------------ | ---------------------------------------------------------- | +| Node.js | `index.ts`, `index.js`, `app.ts`, `server.ts`, `main.ts` | +| Python | `main.py`, `app.py`, `manage.py`, `__main__.py`, `wsgi.py` | +| Go | `main.go`, `cmd/*/main.go` | +| Rust | `src/main.rs`, `src/lib.rs` | +| Java/Kotlin | `*Application.java`, `src/main/java/**/Main.java` | +| .NET | `Program.cs`, `Startup.cs` | +| Ruby | `config.ru`, `app.rb`, `bin/rails` | +| PHP | `index.php`, `public/index.php`, `artisan` | +| CDK | `bin/*.ts`, `bin/*.py`, `app.py` | + +## General Discovery Order + +After detecting the project type and entry points, analyze files in this order for maximum information yield: + +1. Package manifests (dependencies reveal the technology stack) +2. IaC files (infrastructure reveals the architecture) +3. Entry points (reveal the application structure) +4. Route/handler definitions (reveal API surface) +5. 
Data models and schemas (reveal domain objects) +6. Tests (reveal expected behavior and edge cases) +7. CI/CD configs (reveal deployment and build process) +8. README and docs (reveal intent, even if stale) + +## Framework Detection + +| Indicator | Framework | Key Files | +| --------------------------------- | -------------------- | ---------------------------------------- | +| `package.json` with `express` | Express.js | `app.js`, `routes/`, `middleware/` | +| `package.json` with `next` | Next.js | `pages/`, `app/`, `next.config.*` | +| `package.json` with `react` | React SPA | `src/App.*`, `src/components/` | +| `requirements.txt` with `django` | Django | `manage.py`, `settings.py`, `urls.py` | +| `requirements.txt` with `fastapi` | FastAPI | `main.py`, `routers/`, `models/` | +| `requirements.txt` with `flask` | Flask | `app.py`, `routes/`, `templates/` | +| `go.mod` | Go | `main.go`, `cmd/`, `internal/` | +| `Cargo.toml` | Rust | `src/main.rs`, `src/lib.rs` | +| `pom.xml` or `build.gradle` | Java/Spring | `src/main/java/`, `application.yml` | +| `cdk.json` | AWS CDK | `lib/*-stack.ts`, `bin/*.ts` | +| `.projenrc.ts` | Projen | `.projenrc.ts` (full project config) | +| `pnpm-workspace.yaml` | pnpm monorepo | `packages/*/`, `apps/*/`, `extensions/*` | +| `serverless.yml` | Serverless Framework | `handler.*`, `functions/` | + +## IaC Detection + +| File/Pattern | IaC Type | Key Information | +| ---------------------------------- | -------------------- | ------------------------------ | +| `cdk.json` + `lib/*-stack.ts` | CDK TypeScript | Constructs, resources, props | +| `cdk.json` + `lib/*_stack.py` | CDK Python | Constructs, resources, props | +| `template.yaml` or `template.json` | CloudFormation | Resources, outputs, parameters | +| `*.tf` files | Terraform | Resources, modules, variables | +| `serverless.yml` | Serverless Framework | Functions, events, resources | +| `sam-template.yaml` | AWS SAM | Functions, APIs, tables | + +## API Surface Detection + +### 
REST APIs + +Look for route definitions: + +- Express: `app.get()`, `router.post()`, `app.use()` +- FastAPI: `@app.get()`, `@router.post()` +- Django: `urlpatterns`, `path()`, `re_path()` +- Spring: `@GetMapping`, `@PostMapping`, `@RequestMapping` +- Go: `http.HandleFunc()`, `mux.Handle()`, `gin.GET()` + +### GraphQL + +Look for schema definitions: + +- `schema.graphql`, `*.graphql` files +- `typeDefs` in code +- `@Query`, `@Mutation` decorators + +### Event-Driven + +Look for event handlers: + +- SQS: `SqsEvent`, `sqs.receiveMessage`, queue URL references +- SNS: `SnsEvent`, topic ARN references +- EventBridge: rule definitions, event patterns +- Kafka: consumer group configs, topic references + +## Data Model Detection + +| Pattern | What to Extract | +| ------------------------------------------------------------ | ------------------------------------------ | +| ORM models (Sequelize, SQLAlchemy, TypeORM, Prisma) | Entity names, fields, types, relationships | +| Migration files | Schema evolution, table structures | +| Type definitions (TypeScript interfaces, Python dataclasses) | Domain object shapes | +| JSON Schema files | Validation rules, field constraints | +| Protobuf/Avro definitions | Message formats, service contracts | + +## Dependency Detection + +Look for external and internal dependencies: + +- SDK client instantiation (`new S3Client()`, `boto3.client('s3')`) +- HTTP client calls to external or internal service URLs +- Database connection strings and queue/topic ARN references +- Environment variables referencing endpoints +- Shared library imports and cross-service event publishing/subscribing diff --git a/plugins/codebase-documentor/skills/document-service/references/error-scenarios.md b/plugins/codebase-documentor/skills/document-service/references/error-scenarios.md new file mode 100644 index 00000000..fd7be292 --- /dev/null +++ b/plugins/codebase-documentor/skills/document-service/references/error-scenarios.md @@ -0,0 +1,42 @@ +# Error 
Scenarios + +How to handle common failure conditions during codebase analysis. + +## Empty or Non-Existent Directory + +- Report: "Directory [path] not found" or "No source code files detected in [path]" +- Ask user to confirm or provide the correct target directory +- Do NOT generate documentation for an empty directory + +## No Entry Point Found + +- Report: "No clear entry point detected in [directory]" +- List files analyzed and ask user to specify the main entry point + +## IaC Not Found + +- Inform: "No IaC files detected. Deployment section will be based on code analysis only." +- Ask: "Is your IaC in a different directory?" +- If user provides IaC location: analyze that location for infrastructure context +- If no IaC exists: proceed with code-only analysis, note in Architecture Overview + +## Architecture Diagram Skill Not Available + +When the `deploy-on-aws` plugin (which provides the `aws-architecture-diagram` skill) is not installed: + +- Generate a Mermaid `flowchart TD` architecture overview directly in the Architecture Overview section +- Include all major services, data stores, external dependencies, and infrastructure boundaries +- A simple Mermaid overview is always better than no diagram +- Mermaid sequence/flow diagrams are generated inline regardless of plugin availability + +## Existing Documentation at Output Path + +- Checked in Step 1 (before autonomous workflow begins) +- If `CODEBASE_ANALYSIS.md` already exists: ask user "Overwrite or write to a different filename?" +- Do NOT proceed to Step 2 without resolving the output path + +## MCP Server Unavailable + +- Inform: "AWS documentation enrichment unavailable (MCP server not responding)" +- Proceed without AWS-specific enrichment +- Ask: "Continue without AWS service documentation links?" 
diff --git a/plugins/codebase-documentor/skills/document-service/references/exclusion-patterns.md b/plugins/codebase-documentor/skills/document-service/references/exclusion-patterns.md new file mode 100644 index 00000000..734e47b5 --- /dev/null +++ b/plugins/codebase-documentor/skills/document-service/references/exclusion-patterns.md @@ -0,0 +1,77 @@ +# Exclusion Patterns + +Standard patterns for files and directories to skip during codebase analysis. These are noise — they add no documentation value and waste analysis time. + +## Excluded Directories + +Skip these directory names at any depth: + +| Directory | Reason | +| ---------------- | ------------------------------------------------------------- | +| `.git` | Version control internals | +| `node_modules` | npm/yarn dependencies | +| `vendor` | Go/PHP vendored dependencies | +| `__pycache__` | Python bytecode cache | +| `.venv` / `venv` | Python virtual environments | +| `dist` | Build output | +| `build` | Build output | +| `out` | Build output | +| `.next` | Next.js build cache | +| `.nuxt` | Nuxt build cache | +| `target` | Rust/Java/Scala build output | +| `bin` | Compiled binaries | +| `obj` | .NET intermediate build output | +| `.idea` | JetBrains IDE config | +| `.vs` | Visual Studio config | +| `.vscode` | VS Code config (usually) | +| `coverage` | Test coverage reports | +| `.terraform` | Terraform provider cache | +| `cdk.out` | CDK synthesized output | +| `.serverless` | Serverless Framework build output | +| `.aws-sam` | SAM build output | +| `logs` | Application log files | +| `tmp` / `temp` | Temporary files | +| `.cache` | Various tool caches | +| `packages` | Monorepo packages (scan each individually, not the container) | + +## Excluded Files + +Skip these file patterns: + +| Pattern | Reason | +| ----------------------------------------------------------- | ------------------------- | +| `package-lock.json`, `yarn.lock`, `pnpm-lock.yaml` | Dependency lock files | +| `poetry.lock`, 
`Pipfile.lock`, `Cargo.lock`, `Gemfile.lock` | Dependency lock files | +| `go.sum` | Go checksum database | +| `*.min.js`, `*.min.css` | Minified assets | +| `*.map` | Source maps | +| `*.pyc`, `*.pyo` | Python bytecode | +| `*.exe`, `*.dll`, `*.so`, `*.dylib` | Compiled binaries | +| `*.wasm` | WebAssembly binaries | +| `*.png`, `*.jpg`, `*.gif`, `*.ico`, `*.svg` | Image assets | +| `*.woff`, `*.woff2`, `*.ttf`, `*.eot` | Font files | +| `*.zip`, `*.tar.gz`, `*.jar`, `*.war` | Archives | +| `*.pb`, `*.proto` (compiled) | Compiled protocol buffers | +| `.DS_Store`, `Thumbs.db` | OS metadata | +| `.env` (actual, not `.env.example`) | Secrets — do not read | + +## Files to Always Include + +Even if they match no code pattern, these files carry high documentation value: + +- `README.md` / `README.*` +- `CONTRIBUTING.md` +- `CHANGELOG.md` +- `Dockerfile`, `docker-compose.yml` +- `.env.example`, `.env.template` +- `Makefile`, `Justfile` +- CI/CD configs (`.github/workflows/*.yml`, `buildspec.yml`, `.gitlab-ci.yml`) +- IaC files (`cdk.json`, `*.tf`, `template.yaml`, `serverless.yml`) +- Package manifests (`package.json`, `go.mod`, `Cargo.toml`, `pom.xml`, `requirements.txt`, `pyproject.toml`) + +## Applying Exclusions + +1. Build the file tree first (full recursive listing) +2. Remove all paths matching excluded directories and files +3. The filtered tree becomes the working set for all subsequent analysis +4. If a `.gitignore` exists, respect its patterns as additional exclusions diff --git a/plugins/codebase-documentor/skills/document-service/references/framework-patterns.md b/plugins/codebase-documentor/skills/document-service/references/framework-patterns.md new file mode 100644 index 00000000..a8dab3a8 --- /dev/null +++ b/plugins/codebase-documentor/skills/document-service/references/framework-patterns.md @@ -0,0 +1,105 @@ +# Framework Patterns + +Common framework conventions for extracting architecture and documentation from application code. 
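One way to apply pattern tables like those in this file is a line-oriented regex scan that keeps line numbers, so each hit can be cited directly. The sketch below covers only Express-style route calls; the regex, the `find_routes` name, and the restriction to `app` and `router` receivers are assumptions, and real code with other variable names or multi-line calls would need a broader pattern.

```python
import re
from typing import List, Tuple

# Assumed regex for Express-style routes such as router.get('/orders', ...).
ROUTE_PATTERN = re.compile(
    r"""\b(?:app|router)\.(get|post|put|delete)\(\s*['"]([^'"]+)['"]"""
)


def find_routes(source: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, METHOD, path) for each route-like call."""
    found = []
    # Line numbers are 1-based so they can be used in citations as-is.
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in ROUTE_PATTERN.finditer(line):
            found.append((line_number, match.group(1).upper(), match.group(2)))
    return found
```

Each hit should still be confirmed by reading the cited lines, since a regex match alone does not prove the route is registered.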
+ +## Web Frameworks + +### Express.js / Node.js + +| Pattern | Where to Find | What It Reveals | +| ----------------------------------------- | ---------------- | -------------------------------------- | +| `app.listen(port)` | Entry point file | Server port, startup sequence | +| `app.use(middleware)` | App setup | Middleware chain (auth, logging, CORS) | +| `router.get/post/put/delete` | Route files | API endpoints | +| `mongoose.model()` / `sequelize.define()` | Model files | Data models and relationships | +| `new SQSClient()` / `new S3Client()` | Service files | AWS service dependencies | + +### FastAPI / Python + +| Pattern | Where to Find | What It Reveals | +| ---------------------------- | -------------- | ---------------------------------- | +| `@app.get()` / `@app.post()` | Router files | API endpoints with type hints | +| `class Model(BaseModel)` | Schema files | Request/response models (Pydantic) | +| `class Model(Base)` | Model files | Database models (SQLAlchemy) | +| `Depends()` | Route handlers | Dependency injection chain | +| `boto3.client('service')` | Service files | AWS service dependencies | + +### Django / Python + +| Pattern | Where to Find | What It Reveals | +| ----------------------------------------------- | -------------- | ------------------------------------ | +| `urlpatterns = [path()]` | urls.py | URL routing structure | +| `class Model(models.Model)` | models.py | Database schema | +| `class Serializer(serializers.ModelSerializer)` | serializers.py | API contracts (DRF) | +| `DATABASES` in settings | settings.py | Database configuration | +| `INSTALLED_APPS` | settings.py | Application modules and dependencies | + +### Spring Boot / Java + +| Pattern | Where to Find | What It Reveals | +| -------------------------------------------- | --------------------- | ------------------------------- | +| `@RestController` | Controller classes | API endpoint groups | +| `@GetMapping` / `@PostMapping` | Controller methods | Individual 
endpoints | +| `@Entity` | Entity classes | JPA data models | +| `@Repository` | Repository interfaces | Data access patterns | +| `application.yml` / `application.properties` | Config files | All configuration including AWS | + +### Go + +| Pattern | Where to Find | What It Reveals | +| ----------------------------------------------------- | -------------------- | ------------------------------------ | +| `http.HandleFunc()` / `mux.Handle()` | Main or router files | HTTP endpoints | +| `struct` definitions | Model files | Data structures | +| `sql.Open()` / `gorm.Open()` | Database setup | Database connections | +| `config.LoadDefaultConfig()` / `session.NewSession()` | AWS client setup | AWS service dependencies (v2/v1 SDK) | + +## AWS CDK Patterns + +| Pattern | What It Creates | Key Properties | +| -------------------------------------- | -------------------- | ------------------------------------ | +| `new lambda.Function()` | Lambda function | handler, runtime, environment | +| `new sqs.Queue()` | SQS queue | visibilityTimeout, deadLetterQueue | +| `new dynamodb.Table()` | DynamoDB table | partitionKey, sortKey, billingMode | +| `new apigateway.RestApi()` | API Gateway | endpoints, authorizers | +| `new ecs.FargateService()` | Fargate service | taskDefinition, desiredCount | +| `new s3.Bucket()` | S3 bucket | encryption, versioned, removalPolicy | +| `addEventSource(new SqsEventSource())` | Event source mapping | Lambda-to-SQS binding | + +## Configuration Patterns + +| Source | What to Extract | +| --------------------------------- | ---------------------------------------- | +| `.env.example` / `.env.template` | Environment variable names and purposes | +| `process.env.VAR_NAME` (Node.js) | Runtime configuration dependencies | +| `os.environ['VAR_NAME']` (Python) | Runtime configuration dependencies | +| `ssm.GetParameter()` | AWS Systems Manager parameter references | +| `secretsmanager.GetSecretValue()` | AWS Secrets Manager references | + +## Monorepo 
Patterns + +| Pattern | Where to Find | What It Reveals | +| ------------------------------- | ------------------- | ---------------------------------- | +| `pnpm-workspace.yaml` | Root | Workspace roots and package layout | +| `packages/*/package.json` | Package manifests | Internal dependencies, shared libs | +| `extensions/*/package.json` | Extension manifests | Plugin/extension ecosystem | +| `.projenrc.ts` / `.projenrc.js` | Root | Full project config, deps, tasks | +| `turbo.json` / `nx.json` | Root | Build pipeline, task dependencies | +| `lerna.json` | Root | Package management strategy | + +## CLI Patterns + +| Pattern | Where to Find | What It Reveals | +| ------------------------------------------ | ---------------------- | ----------------------------------- | +| `commander` / `yargs` / `clipanion` import | Entry point / CLI file | CLI framework and command structure | +| `.command()` / `.addCommand()` | CLI setup | Available commands and subcommands | +| `.option()` / `.argument()` | Command definitions | CLI arguments and flags | +| `bin` field in `package.json` | Package manifest | CLI entry point binary name | + +## Test Patterns + +| Pattern | What It Reveals | +| ---------------------- | -------------------------------------------- | +| Integration test setup | External service dependencies, test fixtures | +| Mock definitions | Expected interfaces of external services | +| Test data factories | Domain object shapes and relationships | +| E2E test scenarios | Key user workflows and business processes | diff --git a/plugins/codebase-documentor/skills/document-service/references/recursive-analysis.md b/plugins/codebase-documentor/skills/document-service/references/recursive-analysis.md new file mode 100644 index 00000000..c208ff87 --- /dev/null +++ b/plugins/codebase-documentor/skills/document-service/references/recursive-analysis.md @@ -0,0 +1,71 @@ +# Large Codebase Strategy + +For codebases too large to analyze in a single pass, use parallel 
decomposition or tracked sequential analysis. + +## When to Use + +Apply this strategy when **any** of these conditions is true: + +- Multiple top-level modules or packages exist (e.g., monorepo with `packages/`, `apps/`, `services/`) +- Deep directory structure (3+ levels of nesting with source files) +- Hundreds of source files after applying exclusion patterns +- The documentation outline from Step 3 produces 6+ sections mapped to distinct file sets + +For single-module codebases with a flat structure, process outline sections directly without this strategy. + +## Approach A: Tracked Sequential Analysis (Primary) + +The primary approach works on all platforms (Claude Code, Cursor, Codex, or any coding assistant). Create a progress file to track analysis state: + +```markdown +# Analysis Progress + +| Section | Mapped Files | Status | Key Findings | +| ----------------------- | ---------------------------- | ----------- | ------------ | +| Architecture Overview | lib/*-stack.ts, src/app.ts | DONE | Event-driven | +| Order Processing Module | src/order-api/**, handlers/* | IN_PROGRESS | 3 endpoints | +| Payment Integration | src/payment/** | PENDING | - | +``` + +Save as `.codebase-documentor-progress.md` in the target directory. This enables: + +- **Resumability** — A new session reads the progress file and continues from the next PENDING section +- **Visibility** — Users can check progress at any time +- **Completeness tracking** — Clear view of what has been analyzed + +## Section Analysis Steps + +For each outline section (in either approach): + +1. **Read mapped source files on-demand** — Re-read for exact line numbers; do not rely on earlier scan memory +2. **Grep for patterns** — Route definitions, model declarations, error handlers. Faster than reading every file. +3. **Record findings with citations** — Every claim needs a verified `file:line` citation +4. **Note cross-module dependencies** — Track which modules depend on which +5. 
**Mark section complete** — Update progress (agent result or task board) + +## Assembly + +After all sections are complete: + +1. Resolve cross-module dependencies into a coherent Architecture Overview +2. Generate the architecture diagram +3. Assemble final `CODEBASE_ANALYSIS.md` following the template structure +4. Remove the progress file (if created) + +## Principles + +- **Outline-driven** — The documentation outline determines analysis structure, not the directory layout +- **On-demand reading** — Read source files when writing each section, not in a bulk scan phase +- **Depth-first** — Complete one section fully before starting the next +- **Bottom-up for dependencies** — Analyze shared/utility modules before their consumers when possible +- **Cross-cutting sections last** — Sections like Failure Modes that span modules are analyzed after all module-specific sections + +## Approach B: Parallel Workers (Optional Acceleration) + +When the environment supports spawning parallel workers (sub-agents, background tasks, or similar): + +1. **Assign outline sections to independent workers** — Each section becomes a task for a parallel worker. Each reads only its mapped source files and produces section content with citations. +2. **Keep architecture and cross-cutting sections in the main session** — Architecture Overview, Failure Modes, and Configuration span multiple modules and require accumulated context. +3. **Collect and assemble** — Gather findings from all workers, resolve cross-module dependencies, and assemble into the final CODEBASE_ANALYSIS.md. + +This is an acceleration of Approach A — the same section analysis steps apply. The progress file is still useful for tracking which workers have completed. 
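Resuming from the progress file in Approach A can be sketched as a small table parser. This assumes the four-column layout shown in the example progress table; the names `SectionProgress`, `parse_progress`, and `next_section` are hypothetical, and picking up an `IN_PROGRESS` section before any `PENDING` one is an assumption about how an interrupted session should resume.

```python
from typing import List, NamedTuple, Optional


class SectionProgress(NamedTuple):
    section: str
    mapped_files: str
    status: str
    key_findings: str


def parse_progress(markdown: str) -> List[SectionProgress]:
    """Extract data rows from the progress table, skipping header rows."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4:
            continue
        # Skip the column header and the |---|---| separator row.
        if cells[0] == "Section" or set(cells[0]) <= set("-: "):
            continue
        rows.append(SectionProgress(*cells))
    return rows


def next_section(rows: List[SectionProgress]) -> Optional[SectionProgress]:
    """Pick the section to work on: interrupted work first, then pending."""
    for wanted in ("IN_PROGRESS", "PENDING"):
        for row in rows:
            if row.status == wanted:
                return row
    return None
```

When `next_section` returns `None`, every section is `DONE` and the workflow can move on to assembly.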
diff --git a/plugins/codebase-documentor/skills/document-service/references/technical-doc-template.md b/plugins/codebase-documentor/skills/document-service/references/technical-doc-template.md
new file mode 100644
index 00000000..c4bb53ae
--- /dev/null
+++ b/plugins/codebase-documentor/skills/document-service/references/technical-doc-template.md
@@ -0,0 +1,211 @@
+# Codebase Analysis Template
+
+Structure for the generated `CODEBASE_ANALYSIS.md` file. Every section must include source citations. Business context is included as a section at the end (not a separate file).
+
+## Document Structure
+
+```markdown
+# [Service Name] — Codebase Analysis
+
+> Generated by codebase-documentor on [date]
+> Target directory: [path]
+
+## Architecture Overview
+
+[1-3 paragraphs: design patterns, communication style (REST, event-driven, GraphQL),
+key technology choices. For each major technology choice, explain WHY it was chosen
+if the rationale is visible in code comments, TODOs, or naming conventions.
+If no rationale is visible, note [RATIONALE UNKNOWN].]
+
+![Architecture](./docs/.drawio.png)
+
+> Editable source: [`docs/.drawio`](./docs/.drawio)
+
+[If PNG export was not available, link to .drawio file directly.
+If architecture skill was not available, use Mermaid flowchart here instead.]
+
+> Sources:
+>
+> - [`file:line`](./file#L1) — [what this source reveals]
+
+## Code Analysis
+
+[Module structure, entry points, key code paths, internal dependencies.
+Give readers enough context to navigate the codebase.]
+
+> Sources:
+>
+> - `file:line` — [what this source reveals]
+
+## Request Lifecycle
+
+[For each major entry point (API endpoint, message handler, CLI command),
+trace the COMPLETE path from entry to response. Show every intermediate step,
+function call, transformation, and external interaction. Annotate with timeouts
+and failure points. This section answers: "if it breaks, where do I look?"]
+
+### [Entry Point Name]
+
+1. [Step] — `file:line` (`functionName`) [timeout if applicable]
+2. [Step] — `file:line` (`functionName`)
+3. [Step] — `file:line` (`functionName`)
+
+## Domain Logic Deep-Dive
+
+[Identify the 2-3 most complex or domain-specific code paths in the codebase.
+These are where production bugs will occur. Document HOW they work — the
+algorithm, key parameters, edge cases, and non-obvious behavior. A surface-level
+summary provides no value over a naive AI prompt.]
+
+### [Complex Component Name]
+
+**What it does:** [one sentence]
+**How it works:** [step-by-step algorithm description with citations]
+**Key parameters:** [table of parameters that control behavior]
+**Edge cases:** [known edge cases, boundary conditions, error paths]
+
+> Sources:
+>
+> - `file:line` — [what this source reveals]
+
+## Startup and Initialization
+
+[Trace the boot sequence: what happens from process start to ready-to-serve?
+Document model loading, cache warmup, dependency checks, health check registration.
+Flag any startup failures that would cause silent unavailability.]
+
+## Components
+
+[One row per logical component. Use a table.]
+
+| Component | Type                           | Purpose | Source      |
+| --------- | ------------------------------ | ------- | ----------- |
+| [name]    | [Lambda/Fargate/Queue/DB/etc.] | [role]  | `file:line` |
+
+## API Contracts
+
+[Document every detected endpoint. Include OpenAPI/Swagger docs if detected.]
+
+### [Endpoint/Route]
+
+- Method: [GET/POST/PUT/DELETE]
+- Path: [route path]
+- Request/Response: [schema or description]
+- Auth: [authentication method]
+- Source: `file:line`
+
+## Data Models
+
+[Core entities with field-level detail. Include relationships.]
+
+| Field   | Type   | Description | Source      |
+| ------- | ------ | ----------- | ----------- |
+| [field] | [type] | [purpose]   | `file:line` |
+
+## Deployment
+
+[IaC stack description. If no IaC, note "No IaC detected" and describe
+deployment from CI/CD configs, Dockerfiles, or scripts.]
+
+| Resource | Type                   | Configuration | Source      |
+| -------- | ---------------------- | ------------- | ----------- |
+| [name]   | [AWS service/resource] | [key config]  | `file:line` |
+
+## Configuration
+
+[All environment variables, feature flags, and secrets references.
+Also analyze prompt templates, YAML configs, and any configuration files
+that control application behavior.]
+
+| Variable | Purpose            | Default          | Source      |
+| -------- | ------------------ | ---------------- | ----------- |
+| [name]   | [what it controls] | [default if any] | `file:line` |
+
+## Monitoring and Observability
+
+[Document what observability is available: log groups, metrics, tracing,
+alarms, dashboards. Include how to correlate request IDs across layers.
+If monitoring is sparse, flag what SHOULD be monitored but isn't.]
+
+| Signal      | Type               | Location/Config    | Source      |
+| ----------- | ------------------ | ------------------ | ----------- |
+| [log group] | CloudWatch Logs    | [log group name]   | `file:line` |
+| [metric]    | CloudWatch Metrics | [namespace/metric] | `file:line` |
+| [trace]     | X-Ray / tracing    | [configuration]    | `file:line` |
+
+## Security
+
+[Consolidate all security-relevant findings: encryption, IAM, network isolation,
+authentication, secrets management, compliance. Only include what is found in code.]
+
+| Aspect         | Implementation         | Source      |
+| -------------- | ---------------------- | ----------- |
+| Encryption     | [at-rest / in-transit] | `file:line` |
+| Authentication | [method]               | `file:line` |
+| Network        | [VPC, SGs, endpoints]  | `file:line` |
+| Secrets        | [management approach]  | `file:line` |
+| Compliance     | [checks, if any]       | `file:line` |
+
+> Sources:
+>
+> - `file:line` — [what this source reveals]
+
+## Local Development
+
+[How to run/test the system locally. Look for docker-compose files,
+Makefile targets, scripts/, dev server commands, CPU fallback modes,
+and test fixtures. If no local dev setup exists, note this as a gap.
+Include environment setup prerequisites.]
+
+## Discrepancies
+
+[CRITICAL for legacy codebases. Cross-reference README, CLAUDE.md, package.json
+descriptions, and code comments against the actual deployed code. Flag every
+mismatch. For legacy inheritors, knowing what the previous team SAID vs what
+they BUILT is the most valuable possible insight.]
+
+| Claim Source       | Claim                      | Reality                   | Source      |
+| ------------------ | -------------------------- | ------------------------- | ----------- |
+| [README/package/…] | [what docs/metadata claim] | [what code actually does] | `file:line` |
+
+Also flag: features referenced in project metadata (package.json deps, README
+sections, CI configs) that have no corresponding implementation in the codebase.
+
+## Failure Modes
+
+[Both handled failures and unhandled risks. Mark unhandled with [RISK].
+For each failure mode, include detection method and recovery steps — not just
+a description. The reader is an on-call engineer at 3am.]
+
+| Failure    | Detection            | Recovery          | Risk Level        | Source      |
+| ---------- | -------------------- | ----------------- | ----------------- | ----------- |
+| [scenario] | [metric/log/symptom] | [commands to fix] | [HIGH/MEDIUM/LOW] | `file:line` |
+
+## Timeout and Dependency Chain
+
+[When multiple layers have timeouts or retry configs, map the complete chain
+so operators know exactly where a request will fail and why. Include: API
+gateway timeout, load balancer idle timeout, Lambda/container timeout,
+HTTP client timeout, health check intervals.]
+
+| Layer         | Timeout/Interval | Source      |
+| ------------- | ---------------- | ----------- |
+| [API Gateway] | [29s]            | `file:line` |
+| [ALB]         | [300s idle]      | `file:line` |
+| [Lambda]      | [5min]           | `file:line` |
+
+## Runbook Hints
+
+[Only if inferable from scripts, CI configs, or Dockerfiles. Do not speculate.]
+
+> Sources:
+>
+> - `file:line` — [what this source reveals]
+
+## Business Context
+
+[Include when the codebase reveals clear business capabilities.
+Skip only for pure libraries or infrastructure-only code.
+See business-context.md for section details.
+Key sections: Business Capabilities, Data Flows, SLAs.]
+```
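Reviewer sketch (not part of the patch): the template states that every section must include source citations. A check along the lines below could flag sections of a generated `CODEBASE_ANALYSIS.md` that lack one. The regular expression is an assumed concrete form of the `file:line` placeholder, and `sections_missing_citations` is a hypothetical helper. It does not skip `## ` lines inside fenced code blocks.

```python
import re

# Assumed concrete shape of a `file:line` citation, e.g. `src/app.py:12`.
CITATION = re.compile(r"`[^`\s]+:\d+`")


def sections_missing_citations(markdown):
    """Return titles of level-2 sections that contain no `file:line` citation."""
    missing = []
    title, has_citation = None, False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Close out the previous section before starting a new one.
            if title is not None and not has_citation:
                missing.append(title)
            title, has_citation = line[3:].strip(), False
        elif title is not None and CITATION.search(line):
            has_citation = True
    # Close out the final section.
    if title is not None and not has_citation:
        missing.append(title)
    return missing
```

Because the pattern matches only the backticked `path:line` part, it accepts both the plain citation form and the linked form shown in the Architecture Overview.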