RFC: Advanced Workflow Extensions for Greenfield and Brownfield AI-DLC Projects
1. Background & Motivation
The AI-DLC methodology provides a robust, structured framework for guiding AI agents through the software development lifecycle. However, as organizations scale AI-DLC to large, complex enterprise environments (both greenfield and brownfield applications), several gaps emerge:
- Sequential Bottlenecks: Agent execution is inherently sequential. In projects with dozens of independent modules, waiting for sequential code generation drastically increases the end-to-end completion time.
- Verification Gaps: The current Build & Test phase generates instructional Markdown for humans to execute, which breaks the autonomous verification loop.
- Brownfield Data Blindspots: Reverse Engineering currently relies heavily on schema discovery. In data-driven brownfield apps, the AI often guesses or hallucinates categorical values because it never profiles the actual data.
- Enterprise Code Safety: Standard code generation can produce silent, hard-to-detect failures in complex environments (e.g., pagination loops, middleware tracing loss, mock lifecycle pollution).
- Missing Estimation Bridge: Units of work are decomposed effectively, but there is no translation layer for human project managers who need sprint-level sizing.
- Domain-Specific NFR Gaps: A global NFR pass covers system-wide concerns (database choice, caching layer, retry framework), but it cannot anticipate unit-specific performance characteristics — how a particular unit handles slow dependencies, what its latency budget is, or what happens at its specific system boundaries.
To address these challenges, I am proposing 6 new opt-in extensions.
Note: While I am presenting these 6 extensions together here to show the cohesive vision for Advanced/Enterprise scaling, I have opened them as 6 fully independent Pull Requests. This allows the community to debate, accept, or reject each extension on its own merit.
2. Proposed Extensions
2.1 Parallel Execution — 19 Rules (See PR #209)
Problem: Sequential generation of independent units of work is highly inefficient for large projects.
Solution: A comprehensive, coordinator-driven, adaptive parallel execution model built on the principle "Accuracy First — parallelism is a speed optimization, accuracy is non-negotiable."
Key capabilities across 19 rules (PARALLEL-EXEC-001 through PARALLEL-EXEC-019):
- Wave Dependency Planning with formal accuracy safety assessments per wave group (see the sketch after this list)
- Pre-Flight Accuracy Gate that re-validates conditions at execution time
- Sub-Task Parallelism within units (backend ∥ frontend, data pipeline ∥ visualization, ML training ∥ serving, IaC ∥ application code) with mandatory convergence gates
- Anticipatory Test Planning — starts test plans as soon as Functional Design is approved
- Cross-Unit Knowledge Sharing via `shared-patterns.md`
- Coordinator Dispatch Model — full dispatch lifecycle with prompt templates, context discipline, result collection, and fallback-to-sequential
- Runtime Convergence Validation — catches hardcoded values that don't match actual data
- Critical Path Marking and Priority within waves
- Session Resume with context loading discipline for parallel dispatch
- Impact: Massively reduces overall execution time while maintaining deterministic, accuracy-first state tracking.
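To make the wave model concrete, here is a minimal sketch of how a coordinator might group units into dependency waves (topological layering), assuming the unit dependency graph is already known; `plan_waves` and its input shapes are hypothetical illustrations, not the extension's actual interface:

```python
def plan_waves(units, deps):
    """Group units into dependency waves: every unit in wave N depends
    only on units in earlier waves, so each wave can run in parallel.

    units: iterable of unit names
    deps:  dict mapping a unit to the set of units it depends on
    """
    remaining = set(units)
    done = set()
    waves = []
    while remaining:
        # A unit is ready when every one of its dependencies has completed.
        wave = {u for u in remaining if deps.get(u, set()) <= done}
        if not wave:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(sorted(wave))
        done |= wave
        remaining -= wave
    return waves

# Example: auth and catalog are independent; checkout needs both.
print(plan_waves(["auth", "catalog", "checkout"],
                 {"checkout": {"auth", "catalog"}}))
# -> [['auth', 'catalog'], ['checkout']]
```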
2.2 Build & Test Execution — 5 Rules (See PR #210)
Problem: The default AI-DLC Build & Test phase generates test instruction documentation for humans to run.
Solution: Upgrades the phase to enforce actual test execution with self-healing iteration.
Key capabilities across 5 rules (BUILD-TEST-EXEC-001 through BUILD-TEST-EXEC-005):
- Environment Validation Pre-flight — validates dependencies, test runner config, zero-warnings policy, setup file execution, and environment isolation
- Contract Verification — static pre-execution validation of import paths, identifiers, selectors, mock paths, pagination contracts, error serialization contracts, and mock lifecycle alignment
- Test Execution and Iteration — runs suites, categorizes failures, fixes code, iterates until green (see the sketch after this list)
- Inline Test Execution During Code Generation — runs tests immediately after each code gen step, not deferred to end
- Results Presentation — completion messages reflect actual test runner output, not estimates
- Impact: Closes the loop on autonomous verification and increases confidence in generated code.
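As a rough illustration of the self-healing loop, the sketch below runs the suite, hands failures to a fix step, and retries up to a cap (the cap itself is one of the open questions in section 4); `run_and_heal` and `fix_fn` are hypothetical stand-ins for the agent's categorize-and-fix behavior, not the extension's actual interface:

```python
import subprocess

def run_and_heal(test_cmd, fix_fn, max_iterations=5):
    """Run the test suite, let the agent patch failures, and iterate
    until green or the retry cap is hit (then fall back to human review).
    """
    for attempt in range(1, max_iterations + 1):
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            # Present the actual runner output, not an estimate.
            return {"status": "green", "attempts": attempt,
                    "output": result.stdout}
        fix_fn(result.stdout + result.stderr)  # categorize failures, patch code
    return {"status": "needs-human", "attempts": max_iterations}
```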
2.3 Estimation Guidance — 2 Rules (See PR #211)
Problem: The Units Generation phase successfully breaks down work but provides no translation layer for human project managers (e.g., sprint planning).
Solution: Introduces structured estimation with two complementary rules:
- ESTIMATION-001 (Structured Estimation): Adds relative complexity sizing (story points or T-shirt sizes) as a mandatory measure, plus an optional conventional team estimate (developer-weeks) labeled clearly as "Reference Only" (see the example after this list)
- ESTIMATION-002 (Anti-Confusion Guard): Prevents presenting conventional estimates as AI-DLC execution predictions. AI-DLC elapsed time depends on approval gate throughput, not generation speed
- Impact: Bridges the gap between autonomous execution and agile project management tools.
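For illustration only, a unit's dual estimate might be recorded as something like the following; the `UnitEstimate` shape and its field names are hypothetical, not mandated by ESTIMATION-001:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UnitEstimate:
    unit: str
    story_points: int                        # mandatory relative sizing
    t_shirt: str                             # e.g. "S", "M", "L"
    developer_weeks: Optional[float] = None  # optional, "Reference Only"

est = UnitEstimate("checkout", story_points=5, t_shirt="M", developer_weeks=1.5)
# Per ESTIMATION-002: never present developer_weeks as an AI-DLC
# elapsed-time prediction; elapsed time tracks approval-gate throughput.
```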
2.4 Code Safety — 6 Rules (See PR #212)
Problem: AI code generation frequently misses complex, framework-level safety constraints that compile successfully but fail silently at runtime.
Solution: Injects strict rules during the Code Generation phase to prevent common silent failure classes, each derived from real production incidents:
- CODE-SAFETY-001 (Middleware/Request-Context Safety): Traces every request-scoped property to the middleware that populates it; verifies route-level vs global application
- CODE-SAFETY-002 (Configuration/Environment Isolation): Prevents config loaders from loading env files in test mode
- CODE-SAFETY-003 (Test Mock Lifecycle): Handles module reset invalidation, clearing vs resetting distinction, auto-mock limitations with `instanceof`, and mock scope leakage
- CODE-SAFETY-004 (Pagination/Bounded-Query Safety): Ensures all pages are consumed, continuation tokens are acted on, and tests include multi-page scenarios (see the sketch after this list)
- CODE-SAFETY-005 (Error Serialization Consistency): Ensures all semantic error fields are serialized and tests assert the full response shape
- CODE-SAFETY-006 (Post-Refactor Test Alignment): Greps test files after any source refactor, updates stale mocks, and runs affected tests immediately
- Impact: Dramatically reduces hard-to-debug architectural bugs in the generated codebase.
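To make the pagination failure class concrete, here is a sketch of the pattern CODE-SAFETY-004 enforces, written against a DynamoDB-style continuation-token API; the client and token names are illustrative, not a prescribed interface:

```python
def fetch_all_items(client, table):
    """Consume every page of a bounded query (CODE-SAFETY-004)."""
    items = []
    token = None
    while True:
        kwargs = {"TableName": table}
        if token:
            kwargs["ExclusiveStartKey"] = token
        page = client.scan(**kwargs)
        items.extend(page["Items"])
        # The silent-failure class: dropping this token compiles and runs,
        # but quietly truncates results to the first page.
        token = page.get("LastEvaluatedKey")
        if not token:
            break
    return items
```

A loop that drops the continuation token still compiles and passes single-page tests, which is exactly why this failure class is silent.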
2.5 NFR Compensation — 2 Rules (See PR #213)
Problem: A global NFR pass covers system-wide concerns (database choice, caching layer, retry framework), but it cannot anticipate unit-specific performance characteristics — how a particular unit handles slow dependencies, what its latency budget is, or what happens at its specific system boundaries. When units skip their dedicated NFR stages (e.g., because NFR was handled globally by a foundation unit), these domain-specific concerns are lost entirely.
Solution: Adds two complementary rules to the Functional Design phase:
- NFR-COMP-001 (Mandatory Performance Section): When a unit's NFR stages are skipped, its Functional Design must include a "Performance & Behavioral Considerations" section covering latency budgets, timeout strategies, resource constraints, edge-case behavior at system boundaries, and testable acceptance criteria specific to that unit (see the sketch after this list).
- NFR-COMP-002 (Cross-Reference with Global NFR): Requires the unit to explicitly reference inherited global NFR decisions, identify gaps where global NFR doesn't cover this unit's specific needs, and flag conflicts for resolution before Code Generation.
- Impact: Ensures domain-specific performance concerns are captured where they are best understood — during the functional design of each domain unit — without requiring a full NFR stage re-run.
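A minimal sketch of how NFR-COMP-001 could be gated mechanically, assuming the Functional Design is plain Markdown text; the topic list and `check_performance_section` helper are illustrative, not part of the rule text:

```python
REQUIRED_TOPICS = [  # per NFR-COMP-001; topic names are illustrative
    "latency budget",
    "timeout strategy",
    "resource constraints",
    "system boundaries",
    "acceptance criteria",
]

def check_performance_section(design_md: str):
    """Return the performance topics a skipped-NFR unit's Functional
    Design fails to cover; an empty list means the gate passes."""
    text = design_md.lower()
    if "performance & behavioral considerations" not in text:
        return ["missing section: Performance & Behavioral Considerations"]
    return [t for t in REQUIRED_TOPICS if t not in text]
```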
2.6 Data Profile — 4 Rules (See PR #214)
Problem: During Reverse Engineering, agents understand the database schema but remain blind to the actual values, leading to hallucinations when handling enums, categories, or edge-case nulls. Additionally, shared data dependencies have fragilities (CWD-relative paths, deprecated APIs) that aren't surfaced.
Solution: A comprehensive data profiling system that spans the full construction lifecycle:
- DATA-PROFILE-001 (Data Source Profiling): During Reverse Engineering, profiles every data source — extracts exact categorical values, key patterns (for NoSQL), numeric ranges, storage type variance (mixed-type detection for schema-less stores), and audits shared data dependencies (signatures, fragilities, safe usage patterns). Uses a 3-tier accessibility model (local → code-inferable → user-reported); see the sketch after this list
- DATA-PROFILE-002 (Functional Design Alignment): Requires Functional Design to read the profile and use exact values for filters, selectors, and business rules
- DATA-PROFILE-003 (Code Generation Accuracy): Enforces exact value matching for all hardcoded strings in generated code; mandates cross-referencing every filter value against the profile
- DATA-PROFILE-004 (Build & Test Validation): Validates that mixed-type attributes are handled defensively in source code
- Impact: Eliminates "empty result set" errors and runtime KeyErrors by ensuring the AI writes code based on reality, not schema assumptions.
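As a sketch of what DATA-PROFILE-001 might produce for a single attribute, assuming a locally accessible row sample (tier 1 of the accessibility model); the `profile_column` helper and the profile's output shape are hypothetical, not the extension's mandated format:

```python
from collections import Counter, defaultdict

def profile_column(rows, column, max_values=50):
    """Extract exact categorical values, detect storage-type variance,
    and count nulls for one attribute of a row sample."""
    values, types, nulls = Counter(), defaultdict(int), 0
    for row in rows:
        v = row.get(column)
        if v is None:
            nulls += 1
            continue
        types[type(v).__name__] += 1
        values[v] += 1
    return {
        "column": column,
        "exact_values": [v for v, _ in values.most_common(max_values)],
        "types_seen": dict(types),  # more than one entry => mixed-type attribute
        "null_count": nulls,
    }

# "status" stored as both string and int in a schema-less store:
sample = [{"status": "ACTIVE"}, {"status": "Active"},
          {"status": 1}, {"status": None}]
print(profile_column(sample, "status"))
# {'column': 'status', 'exact_values': ['ACTIVE', 'Active', 1],
#  'types_seen': {'str': 2, 'int': 1}, 'null_count': 1}
```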
3. Implementation Status
To demonstrate the viability of these extensions, I have drafted all 6 in accordance with the `CONTRIBUTING.md` standards.
- Fully Agnostic: They do not assume specific IDEs or platforms.
- Linting: All Markdown files strictly adhere to the project's `markdownlint-cli2` rules (0 errors).
- Format: They utilize the established `extension.md` and `extension.opt-in.md` structure.
4. Request for Comments
I would love to hear feedback from the maintainers and the community on the following topics across all 6 extensions:
Parallel Execution
- Does the wave-based Coordinator model align with the long-term vision for AI-DLC multi-agent scaling, or is the project considering alternative parallelism approaches (e.g., DAG-based continuous dispatch)?
- Should the wave tracking state live in `aidlc-state.md` or in a dedicated file to prevent context window bloat on very large projects?
- The extension contains 19 rules — is this granularity appropriate, or should some rules be consolidated?
Build & Test Execution
- Is there a concern about agents running arbitrary test commands in user environments? Should there be a sandboxing or confirmation step before execution?
- How does the self-healing retry loop interact with the existing overconfidence-prevention rules? Should there be a maximum retry cap defined at the extension level?
- The contract verification step (BUILD-TEST-EXEC-002) includes checks for pagination, error serialization, and mock lifecycle — is there overlap concern with the Code Safety extension, or is the redundancy desirable as defense-in-depth?
Estimation Guidance
- Is the dual-estimate model (AI-DLC time vs. conventional developer-weeks) useful for the broader community, or would a single unified estimate be preferred?
- Should estimation outputs be integrated into `aidlc-state.md` or kept as standalone artifacts?
Code Safety
- Are the 6 safety categories (middleware context, config isolation, mock lifecycle, pagination, error serialization, post-refactor alignment) comprehensive enough, or should additional categories be considered?
- How does this extension interact with the existing `property-based-testing` extension? Should there be a formal recommendation to use them together?
NFR Compensation
- Is the two-rule approach (mandatory performance section + cross-reference with global NFR) the right granularity, or should this be a single combined rule?
- Should the extension trigger only when NFR stages are fully skipped, or should it also apply when NFR stages are executed but at reduced depth (e.g., a "lite" NFR pass)?
- How should conflicts between a unit's domain-specific NFR needs and global NFR decisions be escalated — should they block Code Generation or just be flagged as warnings?
Data Profile
- Are there privacy or security concerns with agents profiling real production data (e.g., PII exposure)? Should the extension include data masking guidance?
- Should the profiling depth (number of rows sampled, categorical value limits) be configurable via the opt-in prompt?
- The 3-tier accessibility model (local → code-inferable → user-reported) — is this sufficient, or should there be a tier for API-accessible data sources?
General
- Are there any concerns regarding the cumulative prompt token overhead when multiple extensions are enabled simultaneously?
- Would the maintainers prefer these to be merged gradually or evaluated as a consolidated batch?
Thank you for your time and guidance!