Skip to content

Conversation

@eoinfennessy
Copy link
Contributor

@eoinfennessy eoinfennessy commented Jan 20, 2026

Summary

Fixes #4657 - security vulnerability where provider_codegen.py leaked AWS credentials and other secrets from environment variables into generated documentation.

Changes

  • Security fix: Inspect default_factory source code before executing to detect environment variable access
  • Extract fallback values from getenv("VAR", "fallback") patterns instead of executing
  • Leave defaults blank for env vars without fallbacks (semantically correct)
  • Update Bedrock config float defaults from "60" to "60.0" for clarity
  • Remove obsolete path normalization HACK (no longer needed)

Security Impact

Before: Running ./scripts/provider_codegen.py with AWS credentials in environment wrote actual secrets to docs:

| `aws_access_key_id` | ... | No | ASIAYQPE7PSIEEXAMPLE | ... |
| `aws_secret_access_key` | ... | No | wJalrXUtnFEMI/K7MDENG... | ... |

After: Secrets are never executed or written:

| `aws_access_key_id` | ... | No |  | ... |
| `aws_secret_access_key` | ... | No |  | ... |
| `connect_timeout` | ... | No | 60.0 | ... |

Testing

  • No secrets appear in regenerated documentation
  • Env vars without fallbacks show blank defaults
  • Env vars with fallbacks show the constant fallback value
  • All existing functionality preserved

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jan 20, 2026
Copy link
Collaborator

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity is not that high IMO, so perhaps we should concentrate on converting to SecretStr the remaining secrets. Thanks!

@eoinfennessy
Copy link
Contributor Author

@leseb I have opened #4681 to use SecretStr type for AWS credentials.

I still do feel there's benefit to the static code analysis approach used in this PR (even if it feels a bit hacky). It prevents any environment variable (or redaction "********") from accidentally being committed -- these may be easy to miss in large diffs.

@leseb
Copy link
Collaborator

leseb commented Jan 21, 2026

@leseb I have opened #4681 to use SecretStr type for AWS credentials.

I still do feel there's benefit to the static code analysis approach used in this PR (even if it feels a bit hacky). It prevents any environment variable (or redaction "********") from accidentally being committed -- these may be easy to miss in large diffs.

Not sure, i'd rather use a dedicated pre-commit hook rather than our own script, hooks like https://github.com/awslabs/git-secrets or https://github.com/Yelp/detect-secrets should help. Bonus point, this works across the code not just doc. What do you think?

@eoinfennessy
Copy link
Contributor Author

Not sure, i'd rather use a dedicated pre-commit hook rather than our own script, hooks like https://github.com/awslabs/git-secrets or https://github.com/Yelp/detect-secrets should help. Bonus point, this works across the code not just doc. What do you think?

I think that's a good call. I'll look into adding one of these to our pre-commit config.

The benefit of this PR is that it ensures env vars will never be added to docs, regardless of whether they are secrets, redacted content, or less interesting things like the list below. Even with the SecretStr fix, my pre-commit hook for the docs step fails because "********" is added as a default value in the generated docs. With this fix, pre-commit passes regardless of which env vars have been set.

These could all potentially end up getting added to docs -- not a big deal for most, but could cause some confusion for users if they were to get merged, and also some frustration for developers whose pre-commit docs check constantly fails and generates unwanted changes:

  • region_name - os.getenv("AWS_DEFAULT_REGION")
  • profile_name - os.getenv("AWS_PROFILE")
  • total_max_attempts - int(val) if (val := os.getenv("AWS_MAX_ATTEMPTS")) else None
  • retry_mode - os.getenv("AWS_RETRY_MODE")
  • connect_timeout - float(os.getenv("AWS_CONNECT_TIMEOUT", "60"))
  • read_timeout - float(os.getenv("AWS_READ_TIMEOUT", "60"))
  • session_ttl - int(os.getenv("AWS_SESSION_TTL", "3600"))
  • ...

@mattf
Copy link
Collaborator

mattf commented Jan 22, 2026

The benefit of this PR is that it ensures env vars will never be added to docs, regardless of whether they are secrets, redacted content, or less interesting things like the list below.

💯 this!

we should not capture developer or ci environment details in the docs.

Copy link
Collaborator

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reset env vars when running https://github.com/llamastack/llama-stack/blob/main/.pre-commit-config.yaml#L106? something like env -i bash?

@mattf
Copy link
Collaborator

mattf commented Jan 29, 2026

can we reset env vars when running https://github.com/llamastack/llama-stack/blob/main/.pre-commit-config.yaml#L106? something like env -i bash?

good idea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: provider_codegen.py leaks secrets from environment variables into generated documentation

3 participants