feat: TrueFoundry resilience pivot — three fault-injection demo scenarios #8
Merged
29 commits
| Commit | Message | Author |
|---|---|---|
| 99ebae5 | cleaned up fake infrastructure (fake mcp, fake upstream, old placehol… | Tom-Shuhong-Tang |
| 5329a0e | chore: update localstripe_demo to b2d7273 (eval-trigger service) | Tom-Shuhong-Tang |
| 48d1eff | fixed eval to run test | Tom-Shuhong-Tang |
| cd00680 | feat: add eval web UI and custom eval endpoint | Tom-Shuhong-Tang |
| dfb44e1 | chore: update localstripe_demo to 9fc10bc (seed entrypoint + demo cha… | Tom-Shuhong-Tang |
| a35dbed | docs: add TrueFoundry resilience pivot design spec | henryqingmo |
| 4c608a0 | docs: add TrueFoundry resilience pivot implementation plan | henryqingmo |
| 2c3fb8d | feat(eval-runner): accept upstream_error policyOutcome | henryqingmo |
| 50c0c74 | feat(gateway): write upstream_error audit record on forwarder failure | henryqingmo |
| 489a0ec | feat(gateway): write expired audit record on approval timeout | henryqingmo |
| 33457b3 | feat(gateway): make approval timeout configurable via APPROVAL_LOCK_T… | henryqingmo |
| 6b618eb | feat(compose): add mock-slack service and wire APPROVAL_LOCK_TTL + SL… | henryqingmo |
| 396a831 | feat(evalsuite): add resilience eval cases for mcp-down and approval-… | henryqingmo |
| 56fdf00 | feat: add demo-resilience script and make target for TrueFoundry subm… | henryqingmo |
| 0c471a8 | fix(eval-runner): require policyOutcome field in eval cases | henryqingmo |
| 8f62c6f | Apply suggestion from @gemini-code-assist[bot] | henryqingmo |
| d183066 | Apply suggestion from @gemini-code-assist[bot] | henryqingmo |
| aefa200 | Apply suggestion from @gemini-code-assist[bot] | henryqingmo |
| 49bed8e | fix: resolve port conflict in demo-resilience script | henryqingmo |
| d351d23 | feat: cache initialize/tools-list responses for upstream-down resilience | henryqingmo |
| 86172cf | fix: warm gateway capability cache before fault injection | henryqingmo |
| b47751b | fix: add eval-trigger healthcheck and Makefile build-compose-bins target | henryqingmo |
| cb592c6 | fix: poll audit_log until terminal decision record appears | henryqingmo |
| 21b0362 | fix: allow upstream_error and expired in audit_log decision constraint | henryqingmo |
| 653291e | fix: remove timeout command (not available on macOS) | henryqingmo |
| 854456d | fix: wait for localstripe-mcp health before Scenario 3 | henryqingmo |
| 7c4ebb3 | fix: make demo-resilience 3/3 pass — seed data, per-scenario YAML, se… | henryqingmo |
| f0e596c | merge: integrate remote Gemini suggestions, keep tested resilience fixes | henryqingmo |
| c857c9d | Merge remote-tracking branch 'origin/main' into eval-gate2 | henryqingmo |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,14 @@ | ||
| -.PHONY: demo | ||
| +.PHONY: demo demo-resilience | ||
| | ||
| demo: | ||
| EVAL_COMPOSE_FILE=deploy/docker-compose.yml \ | ||
| POSTGRES_DSN=postgres://gateway:gateway@127.0.0.1:15432/gateway?sslmode=disable \ | ||
| AGENT_URL=http://127.0.0.1:18085 \ | ||
| go run ./cmd/eval-runner evalsuite/default.yaml | ||
| + | ||
| +demo-resilience: build-compose-bins | ||
| +@bash scripts/demo-resilience.sh | ||
| + | ||
| +build-compose-bins: | ||
| +@mkdir -p .compose-bin | ||
| +@GOOS=linux GOARCH=$$(go env GOARCH) CGO_ENABLED=0 go build -o .compose-bin/gateway ./cmd/gateway | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,170 @@ | ||
| package main | ||
| | ||
| import ( | ||
| "context" | ||
| _ "embed" | ||
| "encoding/json" | ||
| "fmt" | ||
| "log/slog" | ||
| "net/http" | ||
| "os" | ||
| "strings" | ||
| | ||
| "github.com/jackc/pgx/v5/pgxpool" | ||
| ) | ||
| | ||
| //go:embed ui.html | ||
| var uiHTML []byte | ||
| | ||
| type evalResponse struct { | ||
| Passed bool `json:"passed"` | ||
| PassCount int `json:"pass_count"` | ||
| TotalCount int `json:"total_count"` | ||
| Cases []CaseResult `json:"cases"` | ||
| Report string `json:"report"` | ||
| } | ||
| | ||
| func serve(suitePath string) error { | ||
| cfg, err := LoadConfig() | ||
| if err != nil { | ||
| return err | ||
| } | ||
| | ||
| suite, err := LoadSuite(suitePath) | ||
| if err != nil { | ||
| return fmt.Errorf("load suite: %w", err) | ||
| } | ||
| | ||
| ctx := context.Background() | ||
| db, err := openPostgresPool(ctx, cfg.PostgresDSN) | ||
| if err != nil { | ||
| return fmt.Errorf("connect to postgres: %w", err) | ||
| } | ||
| defer db.Close() | ||
| | ||
| pool, ok := db.(*pgxpool.Pool) | ||
| if !ok { | ||
| return fmt.Errorf("database connection is not a *pgxpool.Pool") | ||
| } | ||
| runner := NewCaseRunner(cfg.AgentURL, pool) | ||
| | ||
| // AI agent runner — optional, only active when AI_AGENT_URL is set | ||
| aiAgentURL := os.Getenv("AI_AGENT_URL") | ||
| var aiRunner caseExecutor | ||
| var aiSuite *EvalSuite | ||
| if aiAgentURL != "" { | ||
| aiRunner = NewCaseRunner(aiAgentURL, pool) | ||
| aiSuitePath := os.Getenv("AI_SUITE_PATH") | ||
| if aiSuitePath == "" { | ||
| aiSuitePath = "evalsuite/ai-agent.yaml" | ||
| } | ||
| aiSuite, err = LoadSuite(aiSuitePath) | ||
| if err != nil { | ||
| return fmt.Errorf("load AI suite: %w", err) | ||
| } | ||
| } | ||
| | ||
| port := os.Getenv("EVAL_SERVE_PORT") | ||
| if port == "" { | ||
| port = "8099" | ||
| } | ||
| | ||
| http.HandleFunc("GET /", func(w http.ResponseWriter, r *http.Request) { | ||
| w.Header().Set("Content-Type", "text/html; charset=utf-8") | ||
| _, _ = w.Write(uiHTML) | ||
| }) | ||
| | ||
| http.HandleFunc("POST /run-eval", makeEvalHandler(runner, suite, pool)) | ||
| | ||
| http.HandleFunc("POST /run-eval/ai", func(w http.ResponseWriter, r *http.Request) { | ||
| if aiRunner == nil { | ||
| http.Error(w, `{"error":"AI_AGENT_URL not configured"}`, http.StatusServiceUnavailable) | ||
| return | ||
| } | ||
| makeEvalHandler(aiRunner, aiSuite, pool)(w, r) | ||
| }) | ||
| | ||
| http.HandleFunc("POST /run-eval/custom", makeCustomEvalHandler(pool)) | ||
| | ||
| http.HandleFunc("GET /healthz", func(w http.ResponseWriter, r *http.Request) { | ||
| w.WriteHeader(http.StatusOK) | ||
| }) | ||
| | ||
| slog.Info("eval server listening", "port", port) | ||
| return http.ListenAndServe(":"+port, nil) | ||
| } | ||
| | ||
| func makeCustomEvalHandler(pool *pgxpool.Pool) http.HandlerFunc { | ||
| return func(w http.ResponseWriter, r *http.Request) { | ||
| var body struct { | ||
| Suite string `json:"suite"` | ||
| AgentURL string `json:"agent_url"` | ||
| } | ||
| if err := json.NewDecoder(r.Body).Decode(&body); err != nil { | ||
| http.Error(w, fmt.Sprintf("invalid request: %v", err), http.StatusBadRequest) | ||
| return | ||
| } | ||
| if body.AgentURL == "" { | ||
| http.Error(w, "missing agent_url", http.StatusBadRequest) | ||
| return | ||
| } | ||
| if body.Suite == "" { | ||
| http.Error(w, "missing suite", http.StatusBadRequest) | ||
| return | ||
| } | ||
| | ||
| suite, err := LoadSuiteFromReader(strings.NewReader(body.Suite)) | ||
| if err != nil { | ||
| http.Error(w, fmt.Sprintf("invalid suite: %v", err), http.StatusBadRequest) | ||
| return | ||
| } | ||
| | ||
| runner := NewCaseRunner(body.AgentURL, pool) | ||
| makeEvalHandler(runner, suite, pool)(w, r) | ||
| } | ||
| } | ||
| | ||
| func makeEvalHandler(runner caseExecutor, suite *EvalSuite, _ *pgxpool.Pool) http.HandlerFunc { | ||
| return func(w http.ResponseWriter, r *http.Request) { | ||
| results := make([]CaseResult, 0, len(suite.Cases)) | ||
| for _, testCase := range suite.Cases { | ||
| trace, err := runner.Run(r.Context(), testCase) | ||
| result := CaseResult{Name: testCase.Name} | ||
| if err != nil { | ||
| result.Failures = []CheckFailure{{ | ||
| Check: "run", | ||
| Expected: "case completes successfully", | ||
| Observed: err.Error(), | ||
| }} | ||
| } else { | ||
| result = Evaluate(testCase, trace) | ||
| } | ||
| results = append(results, result) | ||
| } | ||
| | ||
| passCount := 0 | ||
| for _, r := range results { | ||
| if r.Passed { | ||
| passCount++ | ||
| } | ||
| } | ||
| | ||
| report := GenerateReport(results) | ||
| | ||
| if r.Header.Get("Accept") == "application/json" { | ||
| resp := evalResponse{ | ||
| Passed: passCount == len(results), | ||
| PassCount: passCount, | ||
| TotalCount: len(results), | ||
| Cases: results, | ||
| Report: report, | ||
| } | ||
| w.Header().Set("Content-Type", "application/json") | ||
| _ = json.NewEncoder(w).Encode(resp) | ||
| return | ||
| } | ||
| | ||
| w.Header().Set("Content-Type", "text/plain") | ||
| _, _ = fmt.Fprint(w, report) | ||
| } | ||
| } | ||
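The `makeEvalHandler` function in the diff above returns JSON only when the `Accept` header equals `application/json` exactly, and falls back to the plain-text report otherwise. The sketch below reproduces just that branching in isolation so the behavior can be observed without Postgres or an agent. The `summary` type, the `respond` and `probe` helpers, and the sample values are hypothetical stand-ins, not code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// summary is a hypothetical, trimmed-down stand-in for the PR's evalResponse.
type summary struct {
	Passed     bool   `json:"passed"`
	PassCount  int    `json:"pass_count"`
	TotalCount int    `json:"total_count"`
	Report     string `json:"report"`
}

// respond mirrors the branching in makeEvalHandler: JSON only when the Accept
// header is exactly "application/json", the plain-text report otherwise.
func respond(w http.ResponseWriter, r *http.Request, s summary) {
	if r.Header.Get("Accept") == "application/json" {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(s)
		return
	}
	w.Header().Set("Content-Type", "text/plain")
	_, _ = fmt.Fprint(w, s.Report)
}

// probe sends one in-memory POST with the given Accept header and returns the
// response Content-Type and the trimmed body. The summary values are made up.
func probe(accept string) (string, string) {
	req := httptest.NewRequest(http.MethodPost, "/run-eval", nil)
	if accept != "" {
		req.Header.Set("Accept", accept)
	}
	rec := httptest.NewRecorder()
	respond(rec, req, summary{Passed: true, PassCount: 3, TotalCount: 3, Report: "3/3 passed"})
	return rec.Header().Get("Content-Type"), strings.TrimSpace(rec.Body.String())
}

func main() {
	for _, accept := range []string{"application/json", "", "application/json, text/plain"} {
		ct, body := probe(accept)
		fmt.Printf("Accept=%q -> %s: %s\n", accept, ct, body)
	}
}
```

Because the comparison is an exact string match, a client that sends a list such as `application/json, text/plain` receives the plain-text report. Callers that want the JSON form would need to send the single value.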
Using `http.HandleFunc` registers handlers on the global `http.DefaultServeMux`, which is a security risk as any package in the dependency tree can register routes on it. Additionally, passing `nil` to `http.ListenAndServe` uses this global mux.

Consider using a local `http.NewServeMux` to isolate your routes.