
Conversation

@JohnRPunk

Gotta test this once the token rollover goes tonight


@github-actions github-actions bot left a comment


Security Assessment Summary

Critical Issues

Sensitive Data Exposure in Logging
The application logs the model name during LLM adapter initialization. The current pattern of passing API keys through command-line arguments and environment variables risks exposing these credentials if the logging level is misconfigured or if error messages inadvertently include them. It is imperative to ensure API keys are never logged. Implement secure credential handling by using environment variables with robust masking and conduct a thorough audit of all logging statements to eliminate any potential exposure of sensitive data.

High Severity Issues

Insufficient Input Validation for LLM Feedback
The mechanism for filtering findings based on LLM feedback is flawed. It relies on a basic string check for the word 'inaccurate' in the first line of feedback. This approach is vulnerable due to case sensitivity issues, lack of structural validation, and the potential for unintended substring matches, leading to false positives and negatives. To address this, implement proper validation. Utilize structured output from the LLM with dedicated boolean fields for accuracy assessment. At a minimum, employ case-insensitive exact matching with comprehensive parsing to ensure the feedback structure is valid before processing.
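
A minimal sketch of the structured-output route, assuming a Pydantic response model passed to the adapter's prompt_structured call quoted later in this thread; the field names (is_accurate, feedback) are illustrative rather than the project's actual schema:

    from pydantic import BaseModel

    class Check(BaseModel):
        # A dedicated boolean verdict removes the need to grep free text for "inaccurate"
        is_accurate: bool
        feedback: str

    # Hypothetical usage inside the verification coroutine:
    # feedback = await adapter.prompt_structured(system_prompt, prompt, Check)
    # keep = feedback.is_accurate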

Medium Severity Issues

Directory Creation Race Condition (TOCTOU)
The code contains a time-of-check-time-of-use (TOCTOU) race condition during directory creation. The sequence of checking for a directory's existence and then creating it allows a window where another process could interfere, such as by creating a symlink, which could lead to security vulnerabilities. To resolve this, replace the manual existence check and creation logic with a single atomic operation. Use os.makedirs with the exist_ok=True parameter, which safely handles the directory creation and eliminates the race condition.
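
For illustration, a minimal sketch of the atomic replacement (the helper name is ours, not the project's):

    import os

    def ensure_cache_folder(cache_folder: str) -> None:
        # Single call: creates the directory if missing, no-ops if it exists,
        # and removes the exists()/isdir() TOCTOU window entirely
        os.makedirs(cache_folder, exist_ok=True)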

- if args.llm == 'anthropic':
-     llm = AnthropicAdapter( api_key = args.llm_api_key, model=model)
+ if args.llm == "anthropic":
+     llm = AnthropicAdapter(api_key=args.llm_api_key, model=model)

Security Issue: The LLM adapter initialization logs the model name but could potentially expose API keys if the logging level is set incorrectly or if error messages include sensitive information. While not directly in the diff, the pattern of passing API keys through command-line arguments and environment variables without proper masking in logs is a concern.

Priority: MEDIUM

CWE: CWE-532

Recommendation: Ensure API keys are never logged, even in debug mode. Implement secure handling of credentials, use environment variables with proper masking, and audit logging statements to ensure no sensitive data is exposed.

Snippet: llm = AnthropicAdapter(api_key=args.llm_api_key, model=model)
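
One way to make "never logged" hard to violate is a logging filter that redacts known secret values from every record; a sketch under the assumption that the key arrives via args.llm_api_key, with the class name ours:

    import logging

    class RedactSecrets(logging.Filter):
        """Replace known secret values with a placeholder in every log record."""

        def __init__(self, secrets):
            super().__init__()
            self.secrets = [s for s in secrets if s]

        def filter(self, record):
            msg = record.getMessage()
            for secret in self.secrets:
                msg = msg.replace(secret, "***REDACTED***")
            record.msg, record.args = msg, ()
            return True

    # Hypothetical wiring: attach to every handler so propagated records are covered
    # for handler in logging.getLogger().handlers:
    #     handler.addFilter(RedactSecrets([args.llm_api_key]))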

saist/main.py Outdated
file_content = await scm.read_file_contents(finding.file)
system_prompt = prompts.CHECK_FINDING
logger.debug(f"Confirming {finding.issue} for {finding.file}")
prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"

Security Issue: The prompt construction includes potentially sensitive file content and security findings without sanitization. If the LLM service is compromised or logs prompts, this could expose sensitive code and security vulnerabilities to unauthorized parties. The prompt includes the entire file content and security issue details.

Priority: MEDIUM

CWE: CWE-200

Recommendation: Implement prompt sanitization or consider using a local LLM for sensitive security analysis. At minimum, truncate or redact sensitive portions of file content before sending to external LLM services.

Snippet: prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"

)
else:
hash: str = await hash_file(scm, filename)
cache_file = os.path.join(cache_folder, hash + ".json")

Security Issue: The cache file path is constructed by concatenating user-controlled input (hash) without proper validation. While the hash is generated from file content, if there's any way to influence the hash or cache_folder, this could lead to path traversal or file overwrite vulnerabilities.

Priority: LOW

CWE: CWE-22

Recommendation: Use os.path.join safely and validate the hash format. Consider using a safer approach like uuid for cache file names or implementing additional validation on the hash string.

Snippet: cache_file = os.path.join(cache_folder, hash + ".json")
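
A sketch of that validation: accept only plain hex digests before the join (the regex bounds and helper name are illustrative):

    import os
    import re

    _HEX_RE = re.compile(r"^[0-9a-f]{32,128}$")  # md5 .. sha512 hex digests

    def safe_cache_path(cache_folder: str, file_hash: str) -> str:
        # A hex-only digest cannot carry "..", separators, or other traversal tricks
        if not _HEX_RE.fullmatch(file_hash):
            raise ValueError(f"unexpected hash format: {file_hash!r}")
        return os.path.join(cache_folder, file_hash + ".json")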

scm, llm, app_files, max_concurrent, disable_tools, disable_caching, cache_folder
):
if disable_caching is False:
if not os.path.exists(cache_folder) or not os.path.isdir(cache_folder):

Security Issue: The directory creation check has a time-of-check-time-of-use (TOCTOU) race condition. Between checking if the directory exists and creating it, another process could create a symlink or modify the filesystem, potentially leading to security issues.

Priority: LOW

CWE: CWE-367

Recommendation: Use os.makedirs with exist_ok=True parameter which handles the race condition safely. Replace the existence check with a single atomic operation.

Snippet: if not os.path.exists(cache_folder) or not os.path.isdir(cache_folder):

@JohnRPunk
Author

After running on cacti using gemini-2.5-pro:

477 findings before LLM check
419 after LLM check
With some improvements to the filter's parser this could be better: atm it only drops findings with 'inaccurate' in the first line of the LLM output.
Could-dos:

  • Better batching for the false-positive checks
  • False-positive checks aren't cached; they'll need to be.

@JohnRPunk JohnRPunk marked this pull request as ready for review February 6, 2026 17:07
saist/main.py Outdated
print(f"{len(all_findings)} before LLM checks")

all_findings = [
f for f in all_findings if not "inaccurate" in f.feedback.strip().split("\n")[0]
Contributor


This check is awful logic - what about cases like:

  • "not accurate"
  • "Inaccurate"
  • "inaccurate" but on line 2

Author


Agreed; this was a quick check so I could see what fell out of the model.
Now that I've had a proper look at the output, I think the better logic would be 'is "inaccurate" in the first three lines when converted to lowercase', or try to get the AI to spit out the final verdict in JSON or something.
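
For reference, that interim check might look roughly like this (a sketch only; f.feedback mirrors the field used in the filter above):

    def flagged_inaccurate(feedback: str, lines_to_check: int = 3) -> bool:
        # Case-insensitive, and look a little deeper than the first line
        head = feedback.strip().lower().splitlines()[:lines_to_check]
        return any("inaccurate" in line for line in head)

    # all_findings = [f for f in all_findings if not flagged_inaccurate(f.feedback)]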

@JohnRPunk JohnRPunk marked this pull request as draft February 10, 2026 09:37
@JohnRPunk
Author

JohnRPunk commented Feb 10, 2026

TODO:

  • Pass finding in (w/ tools = true)
  • New model for review response
  • Replace the batched approach


@github-actions github-actions bot left a comment


SUMMARY OF SECURITY FINDINGS

CRITICAL ISSUES

PATH TRAVERSAL VULNERABILITY
A critical path traversal vulnerability was identified in the check_single_finding function within saist/main.py. The function directly uses user-controlled input from the 'file' field of a Finding object to read file contents via scm.read_file_contents() without any validation. An attacker who can manipulate this field could potentially read arbitrary files from the server's filesystem by using sequences like '../'. This could lead to exposure of sensitive data, including configuration files, source code, or credentials.

The immediate recommendation is to implement strict file path validation before any read operation. This includes normalizing the path, ensuring it resides within the intended and allowed directory scope, and rejecting any paths containing directory traversal sequences. A whitelist of allowed directories or a secure path joining function that anchors to a base directory is strongly advised to mitigate this risk.
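
A sketch of the anchored-join idea, assuming scans are rooted at a known base directory (the helper name and error handling are illustrative):

    from pathlib import Path

    def resolve_inside(base_dir: str, user_path: str) -> Path:
        base = Path(base_dir).resolve()
        candidate = (base / user_path).resolve()
        # Reject anything that escapes the base after ".." and symlink resolution
        if not candidate.is_relative_to(base):
            raise ValueError(f"path escapes scan root: {user_path!r}")
        return candidate

Requires Python 3.9+ for Path.is_relative_to.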

)
else:
hash: str = await hash_file(scm, filename)
cache_file = os.path.join(cache_folder, hash + ".json")


Security Issue: The cache file path is constructed by concatenating cache_folder with hash + ".json" without validating the hash value. If an attacker can control the hash (e.g., through file manipulation), they could perform path traversal attacks to write files outside the cache directory.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Validate the hash parameter to ensure it doesn't contain path traversal sequences. Use os.path.normpath and check that the resulting path stays within the cache directory. Consider using a safer method like os.path.join with validated components.

Snippet: cache_file = os.path.join(cache_folder, hash + ".json")


@github-actions github-actions bot left a comment


Summary of Security Findings

Critical Issues

Sensitive Data Exposure in LLM Processing
The application sends entire file contents to an LLM for verification without any filtering. This poses a high risk of exposing secrets, credentials, or other confidential data if present in the files. Such exposure could occur if the LLM service logs or stores these prompts, leading to significant information disclosure.

High Severity Issues

Insecure Data Merging Leading to Potential Manipulation
The code merges two dictionaries using an operator without validation. This is a high-risk operation because if the feedback dictionary contains conflicting keys, it can overwrite critical attributes from the original findings dictionary. This flaw could allow for the manipulation of security findings if the LLM were to return maliciously structured data, compromising the integrity of the analysis.

Medium Severity Issues

Over-Reliance on LLM Accuracy Resulting in Data Loss
The current filtering logic retains only those findings verified as accurate by the LLM, discarding all others. This creates a dependency on the LLM's correctness and could lead to the loss of legitimate security findings if the verification fails or is incorrect. This design flaw undermines the reliability of the security analysis by having no fallback for unverified or incorrectly flagged issues.

saist/main.py Outdated
file_content = await scm.read_file_contents(finding.file)
system_prompt = prompts.CHECK_FINDING
logger.info(f"Confirming {finding.issue} for {finding.file}")
prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"


Security Issue: The code sends the entire file content to an LLM for verification, which could expose sensitive information if the file contains secrets, credentials, or other confidential data. This could lead to information disclosure if the LLM service logs or stores these prompts.

Priority: MEDIUM

CWE: CWE-200

Recommendation: Implement content filtering before sending to LLM, truncate large files, or use a more targeted approach that only sends relevant portions of the file rather than the entire content.

Snippet: prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"

saist/main.py Outdated
try:
feedback = await adapter.prompt_structured(system_prompt, prompt, Check)
args = dict(finding) | dict(feedback)
new_finding = CheckedFinding(**args)


Security Issue: The code creates a CheckedFinding object using **args without validating the input. An LLM could potentially return malicious data that bypasses Pydantic validation or causes unexpected behavior. This could lead to injection attacks or data corruption in the security analysis pipeline.

Priority: MEDIUM

CWE: CWE-20

Recommendation: Add input validation and sanitization before creating the CheckedFinding object. Consider using Pydantic's model_validate method with strict mode or implementing additional validation logic.

Snippet: new_finding = CheckedFinding(**args)

saist/main.py Outdated
prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"
try:
feedback = await adapter.prompt_structured(system_prompt, prompt, Check)
args = dict(finding) | dict(feedback)


Security Issue: The code merges two dictionaries using the | operator without validation. If the feedback dictionary contains keys that conflict with finding dictionary keys, it could overwrite critical attributes. This could potentially allow manipulation of security findings if the LLM returns malicious structured data.

Priority: MEDIUM

CWE: CWE-20

Recommendation: Validate the feedback dictionary structure before merging, or use a more controlled approach like creating a new CheckedFinding object with explicit field assignments rather than blind dictionary merging.

Snippet: args = dict(finding) | dict(feedback)
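
A sketch of the explicit-assignment alternative: copy only the fields the feedback is allowed to set instead of letting the | merge overwrite anything (the allow-list below is an assumption about CheckedFinding's schema):

    # Illustrative allow-list; adjust to the real CheckedFinding fields
    ALLOWED_FEEDBACK_FIELDS = {"is_accurate", "feedback"}

    def merge_feedback(finding, feedback):
        args = dict(finding)
        args.update(
            {k: v for k, v in dict(feedback).items() if k in ALLOWED_FEEDBACK_FIELDS}
        )
        return CheckedFinding(**args)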


@github-actions github-actions bot left a comment


Security Summary Report

Critical Issues: Input Validation and Injection Risks

Path Traversal Vulnerabilities: Multiple instances of path traversal vulnerabilities were identified. In saist/util/output.py, the csv_path parameter is used directly without validation, which could allow an attacker to write files to arbitrary locations on the file system if user-controlled input is passed. Similarly, in saist/util/argparsing.py, multiple file output path parameters (csv-path, tex-filename, pdf-filename, cache-folder) accept arbitrary strings without validation, creating multiple vectors for path traversal attacks. All file paths that accept user input must be strictly validated and sanitized to prevent directory traversal. Input should be canonicalized and restricted to intended, safe directories.

Server-Side Request Forgery (SSRF): In saist/util/argparsing.py, the ollama-base-uri and openai-base-uri parameters accept arbitrary URLs without restriction. If the tool is exposed to untrusted input, this could be exploited to perform SSRF attacks, allowing an attacker to make requests to internal services or arbitrary external systems. Validate the URI format strictly. For local services like Ollama, consider restricting the base URI to localhost or a set of trusted domains. For external APIs like OpenAI, enforce the use of HTTPS and consider implementing an allowlist of permitted endpoints if feasible.

General Injection Risk: In saist/main.py, the security analysis tool itself may be vulnerable to injection attacks if it processes malicious file names or content without proper validation. This could lead to command injection or other attacks when the tool interacts with external systems or subprocesses. Implement strict, context-aware input validation for all user-controlled inputs, including file names, file content, and other parameters. Sanitize inputs before processing or passing them to any external system.

High Issues: Sensitive Data Exposure

Credential Exposure via Command Line: In saist/util/argparsing.py, sensitive credentials such as GitHub tokens and LLM API keys are accepted via command-line arguments. This practice exposes these credentials in process listings (e.g., ps commands) and shell history files, posing a significant risk of credential theft. The recommendation is to mandate the use of environment variables or secure credential stores for all sensitive tokens and API keys. The tool's argument parser should read from environment variables, and clear warnings should be added to the documentation and possibly to the runtime output if credentials are supplied via the command line.
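
A sketch of the environment-variable route; the tool already has an EnvDefault action, so this only illustrates the general pattern (the flag and variable names are examples):

    import argparse
    import os

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--llm-api-key",
        default=os.environ.get("SAIST_LLM_API_KEY"),  # prefer the environment over argv
        help="LLM API key; set SAIST_LLM_API_KEY rather than passing it on the command line",
    )
    args = parser.parse_args()
    if not args.llm_api_key:
        parser.error("missing API key: set SAIST_LLM_API_KEY")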

Medium Issues: Code Quality and Operational Security

Unhandled Empty Iterable: In saist/util/output.py, the code accesses findings[0] without first checking if the findings list is empty. This will cause an IndexError to be raised when the iterable has no elements, leading to a crash. Add a conditional check to ensure the list is not empty before accessing elements, and handle the empty case gracefully according to application logic.

Debug Artifact in Production Code: In saist/util/output.py, a breakpoint() statement is present. If executed in a production environment, this will pause the application and open a debugger, which could lead to denial of service and potentially expose sensitive application state or memory if accessed. All debug statements, including breakpoint(), must be removed before the code is deployed to a production environment.

saist/main.py Outdated
"""
Checks a single finding against the file
"""
file_content = await scm.read_file_contents(finding.file)


Security Issue: The code reads file contents using finding.file without validating that the file path is within the expected directory scope. An attacker could potentially manipulate finding.file to read sensitive files outside the intended directory.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Validate that finding.file is within the expected directory scope before reading. Use path normalization and check that the resolved path doesn't escape the intended base directory.

Snippet: file_content = await scm.read_file_contents(finding.file)

saist/main.py Outdated
file_content = await scm.read_file_contents(finding.file)
system_prompt = prompts.CHECK_FINDING
logger.info(f"Confirming {finding.issue} for {finding.file}")
prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"


Security Issue: The code constructs a prompt that includes file content from user-controlled input (finding.file). If finding.file contains malicious content or path traversal sequences, it could lead to information disclosure or other security issues when displayed in the prompt.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Validate and sanitize the finding.file input before using it in prompts. Ensure it's a legitimate file path within the expected scope and doesn't contain path traversal sequences or other malicious content.

Snippet: prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"

)
else:
hash: str = await hash_file(scm, filename)
cache_file = os.path.join(cache_folder, hash + ".json")


Security Issue: The code constructs a cache file path by concatenating a hash with .json extension. While the hash is generated from file content, if an attacker can control the hash value (through file content manipulation), they might be able to perform path traversal attacks or overwrite arbitrary files.

Priority: LOW

CWE: CWE-22

Recommendation: Use os.path.join with proper validation and ensure the hash is properly validated. Consider using a safer approach like generating a secure filename from the hash.

Snippet: cache_file = os.path.join(cache_folder, hash + ".json")

scm, llm, app_files, max_concurrent, disable_tools, disable_caching, cache_folder
):
if disable_caching is False:
if not os.path.exists(cache_folder) or not os.path.isdir(cache_folder):


Security Issue: The code checks if a cache folder exists and creates it if not. This creates a race condition where multiple processes could attempt to create the directory simultaneously, potentially leading to errors or security issues.

Priority: LOW

CWE: CWE-362

Recommendation: Use os.makedirs(cache_folder, exist_ok=True) which is atomic and handles the race condition properly.

Snippet: if not os.path.exists(cache_folder) or not os.path.isdir(cache_folder):


@github-actions github-actions bot left a comment


Security Assessment Summary

Critical Findings

Sensitive Data Exposure in LLM Integration
The application transmits entire file contents to an external LLM service without sanitization. This occurs in the check_single_finding function within main.py. Any secrets, personally identifiable information, or other confidential data present in the file will be sent to the third-party service, creating a significant data leakage risk. It is recommended to implement a content sanitization layer that filters or truncates data before transmission. A whitelist approach for safe file types or pattern-matching to redact sensitive data should be considered.
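
A sketch of the redaction layer; the patterns below are generic examples of common secret shapes, not a project-specific or exhaustive list:

    import re

    SECRET_PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key IDs
        re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z ]*PRIVATE KEY-----"),
        re.compile(r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*\S+"),
    ]

    def redact(text: str) -> str:
        for pattern in SECRET_PATTERNS:
            text = pattern.sub("[REDACTED]", text)
        return text

    # file_content = redact(file_content)  # before it is placed in the LLM prompt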

High Severity Findings

Insecure Default Configuration for Network Communication
The default configuration for the Ollama service connection in argparsing.py uses an HTTP endpoint (http://localhost:11434). This transmits data in cleartext, which could lead to interception of sensitive information, especially if the service is exposed to a network or in environments where localhost traffic can be monitored. The recommendation is to default to HTTPS for secure transport or to provide clear warnings and documentation about the security implications of using an insecure connection, allowing users to explicitly configure the protocol.

Medium Severity Findings

Resource Exhaustion via Long Line Processing
A potential denial-of-service vector exists in the file parsing utility within argparsing.py. The configured maximum line length of 1000 characters is excessively high. An attacker could craft files with extremely long lines to cause excessive memory consumption or severe performance degradation during scanning. To mitigate this, reduce the default maximum line length to a more reasonable limit, such as 200 to 500 characters. Alternatively, implement streaming processing for large files to handle content without loading entire lines into memory at once.
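
A sketch of a streaming pre-check that rejects files with overly long lines without ever holding a whole line in memory (the limit and chunk size are example values):

    def has_overlong_line(path: str, max_line_length: int = 500, chunk_size: int = 8192) -> bool:
        # Read fixed-size chunks and count characters since the last newline,
        # so even a single enormous line never sits in memory in one piece
        run = 0
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            while chunk := f.read(chunk_size):
                for ch in chunk:
                    if ch == "\n":
                        run = 0
                    else:
                        run += 1
                        if run > max_line_length:
                            return True
        return False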

saist/main.py Outdated
file_content = await scm.read_file_contents(finding.file)
system_prompt = prompts.CHECK_FINDING
logger.info(f"Confirming {finding.issue} for {finding.file}")
prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"


Security Issue: The prompt construction in check_single_finding function includes the full file content in the prompt sent to the LLM. This could lead to sensitive information disclosure if the file contains secrets, PII, or other confidential data that gets transmitted to external LLM services.

Priority: MEDIUM

CWE: CWE-200

Recommendation: Implement content filtering or truncation before sending file contents to external LLM services. Consider using a whitelist approach for file types or implementing a content sanitization layer that removes sensitive patterns before transmission.

Snippet: prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"

saist/main.py Outdated
"""
Checks a single finding against the file
"""
file_content = await scm.read_file_contents(finding.file)


Security Issue: The check_single_finding function reads the entire file content without any validation of file type, size, or sensitivity. This could allow the LLM to access sensitive files that should not be analyzed, potentially exposing secrets or confidential information.

Priority: MEDIUM

CWE: CWE-552

Recommendation: Implement file access controls and validation before reading file contents. Check file extensions, size limits, and potentially implement a deny-list for sensitive file paths (e.g., .env, config files with secrets).

Snippet: file_content = await scm.read_file_contents(finding.file)

)
else:
hash: str = await hash_file(scm, filename)
cache_file = os.path.join(cache_folder, hash + ".json")


Security Issue: The cache file path is constructed by concatenating cache_folder with a hash value. If an attacker can control the filename or hash value, they might be able to perform path traversal attacks to write files outside the intended cache directory.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Use os.path.join safely and validate that the resulting path is within the intended cache directory. Consider using pathlib.Path with resolve() to ensure the path doesn't escape the cache directory.

Snippet: cache_file = os.path.join(cache_folder, hash + ".json")

help="Host for the web server to bind to",
envvar="SAIST_WEB_HOST",
action=EnvDefault,
default="127.0.0.1",


Security Issue: The web server defaults to binding to 127.0.0.1 only, which is generally secure. However, if users change this to 0.0.0.0 or another interface, the web server might expose sensitive findings data without authentication. The tool lacks authentication mechanisms for the web interface.

Priority: MEDIUM

CWE: CWE-284

Recommendation: Add authentication or at minimum a warning when binding to non-localhost interfaces. Consider implementing basic authentication or token-based access control for the web interface.

Snippet: default="127.0.0.1"

help="Base uri of ollama",
envvar="SAIST_OLLAMA_BASE_URI",
action=EnvDefault,
default="http://localhost:11434",


Security Issue: The default Ollama base URI is set to 'http://localhost:11434' which uses HTTP instead of HTTPS. This could expose sensitive data to interception if the tool is used in a network environment where localhost traffic might be sniffed or if the service is exposed to the network.

Priority: LOW

CWE: CWE-319

Recommendation: Consider using HTTPS by default or providing a warning about insecure connections. Alternatively, allow users to explicitly choose HTTP/HTTPS and document the security implications.

Snippet: default="http://localhost:11434"

envvar="SAIST_CACHE_FOLDER",
action=EnvDefault,
required=False,
default="SAISTCache",


Security Issue: The default cache folder name 'SAISTCache' doesn't specify a full path, which could lead to caching sensitive data in insecure locations. Depending on the current working directory, cache files might be created in world-writable directories, exposing sensitive scan results.

Priority: LOW

CWE: CWE-276

Recommendation: Use a secure default location (e.g., user's home directory with appropriate permissions) or require explicit cache path configuration. Implement proper file permissions for cache files.

Snippet: default="SAISTCache"

type=int,
help="Maximum allowed line length, files with lines longer than this value will be skipped",
required=False,
default=1000,


Security Issue: The default maximum line length is set to 1000 characters, which could allow processing of extremely long lines that might cause memory exhaustion or performance issues. Attackers could craft malicious files with very long lines to disrupt the scanning process.

Priority: LOW

CWE: CWE-400

Recommendation: Consider reducing the default maximum line length to a more reasonable value (e.g., 200-500 characters) or implementing streaming processing for large files to prevent memory exhaustion.

Snippet: default=1000

saist/main.py Outdated
file_content = await scm.read_file_contents(finding.file)
system_prompt = prompts.CHECK_FINDING
logger.info(f"Confirming {finding.issue} for {finding.file}")
prompt = f"\n\nReport:\n{finding.issue}\n\nFile: {finding.file}\n{file_content}"
Contributor


This will never work - not enough context


@github-actions github-actions bot left a comment


SECURITY ASSESSMENT SUMMARY

CRITICAL ISSUES

PATH TRAVERSAL VULNERABILITY
In saist/scm/adapters/filesystem.py, the path traversal check is insufficient. The check uses is_relative_to() after os.path.join(), which could allow directory traversal if the filename contains '../' sequences. Furthermore, exceptions are only logged, potentially enabling silent failures. To remediate, use os.path.normpath() and os.path.commonpath() for robust checks before file operations. Prefer using pathlib.Path for all path manipulations.
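
A sketch of the normpath/commonpath variant of that check (the helper name is ours):

    import os

    def is_within(base_dir: str, target: str) -> bool:
        base = os.path.abspath(base_dir)
        resolved = os.path.abspath(os.path.normpath(target))
        # commonpath collapses to base only when the target sits inside it
        return os.path.commonpath([base, resolved]) == base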

DENIAL OF SERVICE RISK FROM UNBOUNDED FILE READS
In saist/scm/adapters/filesystem.py, file reading operations lack size limitations. The _iter_changed_files method reads entire file contents into memory, which could cause memory exhaustion and denial of service when processing large files. Implement file size checks before reading and enforce maximum size constraints based on application requirements. Consider using chunked reading or streaming approaches for handling large files.
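
A sketch of the size guard; the 1 MiB cap is an arbitrary example, not a value taken from the project:

    import os
    from typing import Optional

    MAX_FILE_BYTES = 1 * 1024 * 1024  # example cap; tune to the application's needs

    def read_limited(path: str) -> Optional[str]:
        # Refuse to load oversized files instead of reading them into memory
        if os.path.getsize(path) > MAX_FILE_BYTES:
            return None
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            return f.read()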

HIGH SEVERITY ISSUES

EXPOSURE OF SECRETS AND API KEYS
In saist/main.py, GitHub tokens and API keys are passed directly to adapter constructors. This practice risks exposure through logging or error messages. All secrets should be managed via environment variables or a secure configuration service. Additionally, implement masking for any tokens or keys that appear in log output.

INFORMATION EXPOSURE THROUGH ERROR MESSAGES
Multiple instances in saist/main.py and saist/scm/adapters/filesystem.py reveal that error messages expose sensitive details. This includes security finding specifics, file paths, system information, and processing errors, which could aid attackers. All error messages in production must be sanitized to provide only generic, non-revealing information.

MEDIUM SEVERITY ISSUES

EXCESSIVE AND SENSITIVE LOGGING
Numerous logging issues exist in saist/main.py. Info and debug logs expose security findings, file names, system configuration, model names, adapter types, concurrency settings, and processing decisions. This information can reveal sensitive system state and internal details. Implement structured logging with appropriate log levels (e.g., limit debug logging in production). Sanitize all log output to remove sensitive data and avoid logging security findings directly.

INSECURE EXCEPTION HANDLING AND LOGGING
In saist/scm/adapters/filesystem.py, error logging captures full exceptions, potentially exposing sensitive filesystem structure and internal paths. Replace this with generic error messages. Consider using logger.exception() within proper exception-handling blocks to control the information disclosed.

VERBOSE CONSOLE OUTPUT
In saist/main.py, console output discloses the number of security findings, which reveals system state. Limit verbose output in production environments and ensure console logging uses appropriate severity levels.

)
filename = os.path.join(self.compare_path, filename)
try:
if not pathlib.Path(filename).is_relative_to(self.compare_path):


Security Issue: The path traversal check uses is_relative_to() which may not be sufficient. The check occurs after os.path.join() which could potentially allow directory traversal if the filename parameter contains '../' sequences. Additionally, the exception is caught and only logged, potentially allowing silent failures.

Priority: HIGH

CWE: CWE-22

Recommendation: Use os.path.normpath() and os.path.commonpath() for more robust path traversal checks. Ensure the check happens before any file operations. Consider using pathlib.Path for all path operations.

Snippet: if not pathlib.Path(filename).is_relative_to(self.compare_path):

- if args.llm == 'anthropic':
-     llm = AnthropicAdapter( api_key = args.llm_api_key, model=model)
+ if args.llm == "anthropic":
+     llm = AnthropicAdapter(api_key=args.llm_api_key, model=model)


Security Issue: API keys are passed directly to adapter constructors and could be logged or exposed in error messages

Priority: MEDIUM

CWE: CWE-200

Recommendation: Use environment variables or secure configuration management, mask API keys in logs

Snippet: llm = AnthropicAdapter(api_key=args.llm_api_key, model=model)

- elif args.llm == 'bedrock':
-     llm = BedrockAdapter( api_key = args.llm_api_key, model=model)
+ elif args.llm == "bedrock":
+     llm = BedrockAdapter(api_key=args.llm_api_key, model=model)


Security Issue: API keys are passed directly to adapter constructors and could be logged or exposed in error messages

Priority: MEDIUM

CWE: CWE-200

Recommendation: Use environment variables or secure configuration management, mask API keys in logs

Snippet: llm = BedrockAdapter(api_key=args.llm_api_key, model=model)

- elif args.llm == 'deepseek':
-     llm = DeepseekAdapter(api_key = args.llm_api_key, model=model)
+ elif args.llm == "deepseek":
+     llm = DeepseekAdapter(api_key=args.llm_api_key, model=model)


Security Issue: API keys are passed directly to adapter constructors and could be logged or exposed in error messages

Priority: MEDIUM

CWE: CWE-200

Recommendation: Use environment variables or secure configuration management, mask API keys in logs

Snippet: llm = DeepseekAdapter(api_key=args.llm_api_key, model=model)

- elif args.llm == 'openai':
-     llm = OpenAiAdapter(api_key = args.llm_api_key, model=model)
+ elif args.llm == "openai":
+     llm = OpenAiAdapter(api_key=args.llm_api_key, model=model)


Security Issue: API keys are passed directly to adapter constructors and could be logged or exposed in error messages

Priority: MEDIUM

CWE: CWE-200

Recommendation: Use environment variables or secure configuration management, mask API keys in logs

Snippet: llm = OpenAiAdapter(api_key=args.llm_api_key, model=model)

raise Exception(f"Tried to access file outside of the root: {filename}")

- async with aiofiles.open(filename, mode='r', encoding='utf-8') as f:
+ async with aiofiles.open(filename, mode="r", encoding="utf-8") as f:


Security Issue: The file reading operation reads the entire file content without any size limitations. This could lead to denial of service if large files are processed, consuming excessive memory.

Priority: LOW

CWE: CWE-400

Recommendation: Implement file size limits before reading. Consider using chunked reading or setting maximum file size constraints based on the application's requirements.

Snippet: async with aiofiles.open(filename, mode="r", encoding="utf-8") as f:

a_path = f"{self.compare_path}/{filename}"

try:
with open(a_path) as a:


Security Issue: The file reading in _iter_changed_files() reads entire files into memory without size limits when generating diffs. This could lead to memory exhaustion when processing large files.

Priority: LOW

CWE: CWE-400

Recommendation: Implement file size checks before reading. Consider using streaming approaches for large files or setting reasonable file size limits.

Snippet: with open(a_path) as a:

help="Host for the web server to bind to",
envvar="SAIST_WEB_HOST",
action=EnvDefault,
default="127.0.0.1",


Security Issue: The web server defaults to binding to 127.0.0.1 (localhost only). While this is secure for preventing unauthorized network access, it could be changed to 0.0.0.0 or a public IP without proper authentication or authorization controls. The web server may expose sensitive findings data.

Priority: LOW

CWE: CWE-284

Recommendation: Implement authentication for the web server interface, or at minimum add strong warnings in documentation about the security implications of exposing the web interface publicly.

Snippet: default="127.0.0.1"

help="Base uri of ollama",
envvar="SAIST_OLLAMA_BASE_URI",
action=EnvDefault,
default="http://localhost:11434",


Security Issue: The default Ollama base URI is set to 'http://localhost:11434' which binds to localhost only. While this is secure for local usage, if the application is deployed in a container or network environment, this could prevent legitimate remote access or cause connectivity issues. However, the more significant issue is that this could be changed to a public IP without authentication.

Priority: LOW

CWE: CWE-1188

Recommendation: Consider adding authentication mechanisms or at minimum document the security implications of exposing Ollama publicly. Add warnings about exposing the service without proper security controls.

Snippet: default="http://localhost:11434"

type=int,
help="Maximum allowed line length, files with lines longer than this value will be skipped",
required=False,
default=1000,


Security Issue: The maximum line length defaults to 1000 characters, which could allow processing of extremely long lines that might cause memory exhaustion or processing delays. While 1000 is reasonable, the parameter is user-configurable and could be set to extremely high values leading to resource exhaustion attacks.

Priority: LOW

CWE: CWE-400

Recommendation: Implement reasonable upper bounds for configurable parameters that affect resource consumption. Add validation to prevent excessively large values that could lead to denial of service.

Snippet: default=1000

@JohnRPunk
Author

JohnRPunk commented Feb 11, 2026

  • Remove json struct in prompt
  • support an optional different LLM for the second opinion
  • The CSV should contain all findings, with false positives flagged, I think. That would be the best user experience
  • fix typos in the prompt
  • pass better detail to the LLM.
  • allow the LLM to use tools, but respect the disable tools arg
  • Breakpoints and logging levels


@github-actions github-actions bot left a comment


Security Assessment Summary

Critical Issues

Data Exposure in External API Calls
Multiple issues were identified where sensitive information is sent to external LLM APIs without proper safeguards. In the main.py file, the prompt construction includes the full patch text, which could expose source code. Furthermore, the check_single_finding function reads and sends entire file contents to the LLM without validation. This poses a significant risk of data leakage if the API is compromised or its logs are exposed. It is recommended to sanitize or truncate patch text before transmission and implement file content filtering or exclusion patterns to prevent sensitive files from being processed.

Path Traversal Vulnerability
A security vulnerability exists in the cache file path construction. The path is built by concatenating user-controlled hash values without validation, creating a potential for path traversal attacks. An attacker could manipulate the hash to access or write files outside the intended directory. To mitigate this, validate the hash format, use os.path.normpath to normalize the path, and implement robust path sanitization routines.

saist/main.py Outdated
Checks a single finding against the file
"""
finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())
file_content = await scm.read_file_contents(finding.file)


Security Issue: The check_single_finding function reads the entire file content and sends it to the LLM without validation. This could expose sensitive files or configuration data to the LLM API.

Priority: MEDIUM

CWE: CWE-200

Recommendation: Implement file content filtering or exclusion patterns to prevent sensitive files from being sent to external LLM services.

Snippet: file_content = await scm.read_file_contents(finding.file)

- prompt = (
-     f"\n\nFile: {filename}\n{patch_text}\n"
- )
+ prompt = f"\n\nFile: {filename}\n{patch_text}\n"


Security Issue: The prompt construction includes the full patch text which may contain sensitive information. If the LLM API is compromised or logs are leaked, this could expose source code or sensitive data.

Priority: MEDIUM

CWE: CWE-200

Recommendation: Sanitize or truncate patch text before sending to external APIs. Consider hashing or tokenizing sensitive parts of the code.

Snippet: prompt = f"\n\nFile: {filename}\n{patch_text}\n"

)
else:
hash: str = await hash_file(scm, filename)
cache_file = os.path.join(cache_folder, hash + ".json")


Security Issue: The cache file path is constructed by concatenating user-controlled hash values without proper validation. An attacker could potentially manipulate the hash to perform path traversal attacks.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Validate the hash format, use os.path.normpath to prevent directory traversal, and implement proper path sanitization.

Snippet: cache_file = os.path.join(cache_folder, hash + ".json")

scm, llm, filename, patch_text, disable_tools
)
else:
hash: str = await hash_file(scm, filename)


Security Issue: Cache files are named using a hash of the file content without any validation. This could lead to cache poisoning or path traversal if the hash function is predictable or if filenames can be manipulated.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Use a secure hash function with salt, validate cache file paths, and implement proper file permission controls.

Snippet: hash: str = await hash_file(scm, filename)

):
if disable_caching is False:
if not os.path.exists(cache_folder) or not os.path.isdir(cache_folder):
os.makedirs(cache_folder, exist_ok=True)


Security Issue: The cache directory is created with default permissions which may be too permissive, allowing other users on the system to read or modify cached findings.

Priority: MEDIUM

CWE: CWE-732

Recommendation: Set secure permissions on the cache directory (e.g., 0o700) to prevent unauthorized access to cached security findings.

Snippet: os.makedirs(cache_folder, exist_ok=True)
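
A sketch of the permissions fix (POSIX semantics; the mode argument is ignored on platforms that do not support it):

    import os

    def ensure_private_cache(cache_folder: str) -> None:
        # 0o700: owner-only read/write/execute
        os.makedirs(cache_folder, mode=0o700, exist_ok=True)
        # Tighten explicitly in case the directory already existed or umask interfered
        os.chmod(cache_folder, 0o700)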

scm, llm, app_files, max_concurrent, disable_tools, disable_caching, cache_folder
):
if disable_caching is False:
if not os.path.exists(cache_folder) or not os.path.isdir(cache_folder):


Security Issue: The directory creation check has a time-of-check-time-of-use (TOCTOU) race condition. An attacker could potentially create a symlink or manipulate the directory between the check and creation.

Priority: LOW

CWE: CWE-367

Recommendation: Use os.makedirs with exist_ok=True to atomically create directories and avoid race conditions.

Snippet: if not os.path.exists(cache_folder) or not os.path.isdir(cache_folder):

@@ -367,15 +474,22 @@ async def main():
if args.ci and len(all_findings) > 0:


Security Issue: The CI mode exits with code 1 when findings are detected, but this could be exploited in CI/CD pipelines to cause denial of service or mask other security issues.

Priority: LOW

CWE: CWE-248

Recommendation: Implement proper exit code handling with different codes for different severity levels, and consider implementing a threshold-based approach.

Snippet: if args.ci and len(all_findings) > 0:


@github-actions github-actions bot left a comment


Security Assessment Summary

Critical Severity Issues

The following issues pose the highest risk and must be addressed immediately.

Insecure Input Handling and Path Traversal
Multiple critical vulnerabilities stem from improper handling of user-controlled input. In the main module, the finding.file value is used directly to read file contents without validation, which could allow an attacker to perform path traversal attacks and read sensitive files outside the intended directory. You must validate and sanitize this path. Furthermore, user-controlled content, such as finding_string and file_content, is embedded directly into LLM prompts without sanitization, creating a significant risk of prompt injection attacks. This could allow malicious content to manipulate the LLM's behavior. Implement strict input validation and sanitization for all content included in prompts.

Information Disclosure in Logging
Sensitive information is being logged insecurely. In the main module, detailed findings containing issue details and file paths are logged at the INFO level, which could expose security vulnerabilities or sensitive paths if logs are accessed by unauthorized parties. Additionally, in the filesystem adapter, warning logs include full exception messages that may contain sensitive system details or file paths. You must sanitize all logged data. For security findings, log only metadata at DEBUG level or redact sensitive details. For exceptions, log only generic error information without exposing internal details.

Server-Side Request Forgery (SSRF) Risk
A default configuration in the argument parsing module introduces an SSRF risk. The --ollama-base-uri parameter defaults to 'http://localhost:11434'. If the application is exposed to untrusted networks, attackers could potentially exploit this to probe or attack local services. Remove this default value or set it to an empty string, requiring explicit configuration. Implement validation to prevent URIs from targeting internal or localhost addresses in production environments.

High Severity Issues

These issues present substantial security risks and should be prioritized for remediation.

Insecure File Operations and Resource Exhaustion
Several file operation parameters lack proper safeguards. Output file parameters (--csv-path, --tex-filename, etc.) use default values without path validation, creating a risk of path traversal attacks where files could be written to unintended locations. Implement strict path validation and sanitization. Separately, the --max-line-length parameter defaults to 1000 characters, which could allow an attacker to craft files with extremely long lines to exhaust system memory. Set a more conservative default and implement streaming file reading to mitigate resource exhaustion.

Exception Handling and Silent Failures
Poor exception handling can mask failures and lead to unstable application states. In the filesystem adapter, the get_file_contents method uses a broad except Exception clause, logs only a warning, and returns None. This causes silent failures where file reading errors are not properly communicated to callers, potentially leading to crashes or unexpected behavior. Implement specific exception handling for expected errors like FileNotFoundError and ensure errors are propagated appropriately. A related issue in the main module involves the check_single_finding function returning a Finding object instead of a CheckedFinding on error, violating the expected return type and causing potential attribute errors. Ensure consistent return types or proper error state handling.

Medium Severity Issues

These issues should be addressed to improve security posture and code reliability.

Inconsistent Encoding and File Processing
The filesystem adapter opens files for diff comparison and general reading without specifying an encoding, relying on the system default. This can cause inconsistent behavior, parsing errors, or data corruption when processing files from different systems. Always specify an encoding, such as UTF-8, when opening text files to ensure consistent and secure processing across platforms.

Inconsistent Asynchronous Error Handling
In the main module, results from asyncio.gather are assigned to a discarded variable, while results are collected via side effects. This pattern makes proper error handling difficult and could mask exceptions from individual asynchronous tasks. Refactor this to collect results directly from the gather call and implement robust exception handling for each task.
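
A sketch of collecting results from gather directly, with return_exceptions=True so one failed task cannot mask the rest (the function and parameter names are illustrative):

    import asyncio

    async def check_all(findings, checker):
        # gather returns results in input order; exceptions come back as values
        outcomes = await asyncio.gather(
            *(checker(f) for f in findings), return_exceptions=True
        )
        results = [o for o in outcomes if not isinstance(o, Exception)]
        errors = [(f, o) for f, o in zip(findings, outcomes) if isinstance(o, Exception)]
        return results, errors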

Network Binding Configuration
The --web-host parameter defaults to '127.0.0.1'. While this limits exposure, it may not be suitable for containerized environments and does not account for IPv6. Consider using '0.0.0.0' as a default for such environments with clear security warnings, or remove the default to require explicit configuration, accompanied by documentation on the security implications of different binding choices.

logger.debug(
f"get_file_contents: reading file {filename} under {self.compare_path}"
)
filename = os.path.join(self.compare_path, filename)


Security Issue: The get_file_contents method joins user-controlled filename with compare_path without proper validation. Although there's a check using is_relative_to, this occurs after the join operation. An attacker could potentially use path traversal sequences like '../../etc/passwd' to read files outside the intended directory.

Priority: HIGH

CWE: CWE-22

Recommendation: Validate the filename parameter before joining paths. Use pathlib.Path to normalize and check if the resolved path is within the compare_path directory. Consider using pathlib.Path(filename).resolve().is_relative_to(pathlib.Path(self.compare_path).resolve()) before any file operations.

Snippet: filename = os.path.join(self.compare_path, filename)

saist/main.py Outdated
Checks a single finding against the file
"""
finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())
file_content = await scm.read_file_contents(finding.file)


Security Issue: The finding.file value is used directly to read file contents without validation. If an attacker can control the finding.file value (e.g., through malicious input or crafted findings), they could potentially perform path traversal attacks to read sensitive files outside the intended directory.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Validate the finding.file path to ensure it's within the expected directory scope before reading. Use path normalization and check for directory traversal attempts.

Snippet: file_content = await scm.read_file_contents(finding.file)

saist/main.py Outdated
system_prompt = prompts.CHECK_FINDING
tools = [] if disable_tools else [scm.read_file_contents]
logger.info(f"Confirming {finding.issue} for {finding.file}")
prompt = f"\n\nReport:\n{finding_string}\n\nFile: {finding.file}\n{file_content}"


Security Issue: User-controlled content (finding_string and file_content) is directly embedded into the LLM prompt without sanitization. This could allow prompt injection attacks where malicious content in findings or file contents could manipulate the LLM's behavior or exfiltrate data.

Priority: MEDIUM

CWE: CWE-74

Recommendation: Implement input validation and sanitization for content that will be included in LLM prompts. Consider using delimiters or escaping mechanisms to prevent prompt injection.

Snippet: prompt = f"\n\nReport:\n{finding_string}\n\nFile: {finding.file}\n{file_content}"
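
A sketch of the delimiter idea: fence untrusted text so the system prompt can instruct the model to treat fenced content strictly as data (the tag names are arbitrary):

    def fence_untrusted(label: str, content: str) -> str:
        # Escape anything that could be mistaken for our own closing tag
        safe = content.replace("</untrusted>", "<\\/untrusted>")
        return f'<untrusted source="{label}">\n{safe}\n</untrusted>'

    # prompt = (
    #     "Report:\n" + fence_untrusted("finding", finding_string)
    #     + "\n\nFile:\n" + fence_untrusted(finding.file, file_content)
    # )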

envvar="SAIST_CSV_PATH",
action=EnvDefault,
required=False,
default="results.csv",


Security Issue: Multiple file output parameters (--csv-path, --tex-filename, --pdf-filename, --cache-folder) use default values without path validation. This could lead to path traversal attacks if user input is not properly sanitized, or could result in files being written to unexpected locations.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Implement path validation and sanitization. Restrict file paths to safe directories. Use absolute path validation and prevent directory traversal attacks. Consider using secure temporary directories for file operations.

Snippet: default="results.csv"

"""
Checks a single finding against the file
"""
finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())


Security Issue: The finding_string construction may expose sensitive information from the Finding object when logged or displayed. If the Finding object contains any sensitive data (like file paths with sensitive information, user data, or internal details), this could lead to information disclosure.

Priority: MEDIUM

CWE: CWE-200

Recommendation: Sanitize the finding data before constructing the string, especially if it will be logged or displayed. Consider implementing a safe serialization method that filters sensitive fields or redacts sensitive information.

Snippet: finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())

return new_finding
except Exception as e:
logger.error(f"[Error] Finding '{finding}': {e}")
return finding


Security Issue: In the check_single_finding function, when an exception occurs, the function returns 'finding' (a Finding object) instead of a CheckedFinding object. This violates the function's return type signature and could cause type errors or unexpected behavior in calling code.

Priority: LOW

CWE: CWE-703

Recommendation: Return a proper CheckedFinding object with appropriate error state, or re-raise the exception. Consider creating an error CheckedFinding with check.is_accurate=False and appropriate feedback.

Snippet: return finding
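A sketch of how the except branch above might look under that recommendation; Check and CheckedFinding are the models referenced elsewhere in this review, and the exact field names (is_accurate, feedback) are assumptions based on those comments:

    except Exception as e:
        # Log a non-sensitive identifier rather than the whole Finding object
        logger.error(f"[Error] check failed for {finding.file}: {type(e).__name__}")
        # Preserve the declared return type; per the recommendation, mark it inaccurate.
        # (A fail-closed variant would set is_accurate=True so the finding is kept.)
        return CheckedFinding(
            finding=finding,
            check=Check(is_accurate=False, feedback=f"verification failed: {type(e).__name__}"),
        )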

results.append(await check_single_finding(scm, adapter, finding, disable_tools))

tasks = [controlled_request(finding) for finding in findings]
_ = await asyncio.gather(*tasks)


Security Issue: The results from asyncio.gather are assigned to '_' (discarded variable), but the actual results are collected via side effects in the controlled_request function. This pattern makes error handling difficult and could mask exceptions from individual tasks.

Priority: LOW

CWE: CWE-754

Recommendation: Collect results directly from asyncio.gather and handle exceptions properly. Use return_exceptions=True if you want to continue on individual task failures.

Snippet: _ = await asyncio.gather(*tasks)
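A sketch of collecting results from gather directly while keeping the semaphore-based throttling that already exists in this code; all names are taken from the snippets in this review:

semaphore = asyncio.Semaphore(max_concurrent)

async def controlled_request(finding):
    async with semaphore:
        return await check_single_finding(scm, adapter, finding, disable_tools)

raw = await asyncio.gather(
    *(controlled_request(f) for f in findings),
    return_exceptions=True,   # keep going if one check fails
)
results = []
for finding, outcome in zip(findings, raw):
    if isinstance(outcome, Exception):
        logger.error(f"check failed for {finding.file}: {type(outcome).__name__}")
    else:
        results.append(outcome)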


return contents
except Exception as e:
logging.warn(f"ERR: {e}")


Security Issue: The warning log includes the full exception message which could contain sensitive information such as file paths, system details, or other internal information. This violates the principle of least information in logging.

Priority: LOW

CWE: CWE-532

Recommendation: Log only generic error information without exposing sensitive details. Use structured logging with appropriate log levels. Consider logging only the exception type and a generic message, not the full exception details.

Snippet: logging.warn(f"ERR: {e}")

a_path = f"{self.compare_path}/{filename}"

try:
with open(a_path) as a:


Security Issue: The file is opened without specifying an encoding, which uses the system default encoding. This can lead to encoding-related issues and potential security problems when processing files from different systems. Inconsistent encoding handling can cause parsing errors or data corruption.

Priority: LOW

CWE: CWE-176

Recommendation: Always specify an encoding when opening text files. Use encoding='utf-8' or another appropriate encoding. This ensures consistent behavior across different platforms and prevents encoding-related security issues.

Snippet: with open(a_path) as a:

@@ -69,7 +91,11 @@ def _iter_changed_files(self) -> list[File]:

try:
with open(a_path, "r") as a, open(b_path, "r") as b:


Security Issue: Similar to the previous issue, files are opened for diff comparison without specifying an encoding. This can cause inconsistent behavior when comparing files with different encodings or when running on systems with different default encodings.

Priority: LOW

CWE: CWE-176

Recommendation: Specify encoding when opening files for diff comparison, e.g., open(a_path, "r", encoding='utf-8'). This ensures consistent text processing and prevents encoding-related issues.

Snippet: with open(a_path, "r") as a, open(b_path, "r") as b:

@github-actions github-actions bot left a comment

Security Assessment Summary

Critical Severity Issues

Prompt Injection Vulnerabilities
The application directly concatenates user-controlled data into LLM prompts without sanitization. This occurs in multiple locations within saist/main.py where finding_string, finding.file, and finding_context are used. An attacker could craft malicious input to perform prompt injection attacks, potentially manipulating the LLM's behavior, exfiltrating data, or causing other security breaches. It is imperative to implement input validation and sanitization for all user-controlled data before inclusion in LLM prompts. Consider using parameterized prompts to separate data from instructions.

High Severity Issues

Concurrent Data Corruption
In saist/main.py, multiple concurrent coroutines append to a shared results list without synchronization. This race condition can lead to data corruption, loss of findings, or inconsistent application state. To resolve this, implement proper synchronization using asyncio.Lock or utilize thread-safe collections designed for concurrent operations to protect all shared mutable state.

Insecure Default Configurations
Several insecure defaults are present in the argument parsing module. The Ollama base URI defaults to an unencrypted HTTP connection to localhost, which could expose sensitive data if the endpoint is changed. The default cache folder uses a predictable name in the current working directory, risking information disclosure if sensitive data is cached. The default CSV output file can overwrite existing files or expose findings. Recommendations include using HTTPS by default for the Ollama URI, defaulting to a secure user-specific cache directory with proper permissions, and requiring an explicit output path or using a timestamped default filename to prevent accidental data loss.

Medium Severity Issues

Improper Error and Type Handling
There are multiple issues related to error handling and type assumptions. In saist/main.py, the function check_single_finding can return a generic Finding object instead of the expected CheckedFinding type when an error occurs, breaking the contract and potentially causing AttributeError exceptions elsewhere in the code. Additionally, the code assumes all check objects have specific nested attributes without proper validation. It is recommended to ensure functions adhere to their return type contracts, potentially by returning a properly marked CheckedFinding object for errors, and to add robust type checking before accessing nested object attributes.

Denial of Service via Resource Exhaustion
In saist/util/argparsing.py, an excessively high default maximum line length of 1000 characters is configured. Processing files with extremely long lines could lead to memory exhaustion and denial of service. This default should be reduced to a more reasonable value, such as 200-500 characters, to mitigate resource exhaustion attacks.

Low Severity Issues

Unchecked Index Access Leading to Crashes
Multiple functions in saist/util/output.py, including _write_checks_csv and write_csv, access the first element of a findings list (findings[0]) without first verifying the list is non-empty. This will cause an IndexError and crash the application if no findings are present. The code also performs isinstance checks on findings[0] without this guard. A check for an empty findings iterable must be added before any index access, with appropriate handling such as returning early or writing an empty file.

Insufficient Default Context Size
The default context size for false-positive checking is set to only 6 lines. For complex security issues, this may be insufficient to provide the LLM with adequate surrounding code for accurate analysis. It is recommended to increase the default context size to a more robust value, such as 10-15 lines, or make it configurable with clear documentation.

Network Binding Confusion
The web server defaults to binding only to localhost (127.0.0.1). While this is usually the safe choice, it can confuse users who need external access or who, conversely, assume the service stays private after the binding is later changed. The recommendation is to change the default to "0.0.0.0" accompanied by a prominent security warning, or to require an explicit host for any non-localhost binding so that exposure is a conscious decision.

logger.info(f"Confirming {finding.issue} for {finding.file}")
prompt = f"\n\nReport:\n{finding_string}\n\nFile: {finding.file}\n{finding_context}"
try:
check = await adapter.prompt_structured(system_prompt, prompt, Check, tools)


Security Issue: The system uses user-controlled input (finding_string) directly in LLM prompts without proper sanitization, which could lead to prompt injection attacks.

Priority: MEDIUM

CWE: CWE-74

Recommendation: Sanitize user input before including it in LLM prompts. Consider using parameterized prompts or input validation.

Snippet: check = await adapter.prompt_structured(system_prompt, prompt, Check, tools)

system_prompt = prompts.CHECK_FINDING
tools = [] if disable_tools else [scm.read_file_contents]
logger.info(f"Confirming {finding.issue} for {finding.file}")
prompt = f"\n\nReport:\n{finding_string}\n\nFile: {finding.file}\n{finding_context}"


Security Issue: User-controlled data (finding_string, finding.file, finding_context) is directly concatenated into LLM prompts without sanitization, enabling prompt injection.

Priority: MEDIUM

CWE: CWE-74

Recommendation: Implement input validation and sanitization for all user-controlled data before including in LLM prompts.

Snippet: prompt = f"\n\nReport:\n{finding_string}\n\nFile: {finding.file}\n{finding_context}"


async def controlled_request(finding):
async with semaphore:
results.append(await check_single_finding(scm, adapter, finding, context_size, disable_tools))


Security Issue: Multiple concurrent coroutines append to the same results list without proper synchronization, potentially causing data corruption or loss.

Priority: MEDIUM

CWE: CWE-362

Recommendation: Use asyncio.Lock or thread-safe collections for shared mutable state in concurrent operations.

Snippet: results.append(await check_single_finding(scm, adapter, finding, context_size, disable_tools))
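If the append-from-coroutines pattern is kept, a sketch of the suggested lock looks like this (in a single asyncio event loop a bare list.append is not interrupted mid-call, so the lock mainly documents intent and protects any future multi-step updates):

semaphore = asyncio.Semaphore(max_concurrent)
results_lock = asyncio.Lock()
results = []

async def controlled_request(finding):
    async with semaphore:
        checked = await check_single_finding(scm, adapter, finding, context_size, disable_tools)
    async with results_lock:
        # Serialize mutation of the shared list across coroutines
        results.append(checked)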

semaphore = asyncio.Semaphore(max_concurrent)
results = []

async def controlled_request(finding):


Security Issue: The controlled_request function appends to a shared results list without proper synchronization, potentially causing race conditions in concurrent execution.

Priority: MEDIUM

CWE: CWE-362

Recommendation: Use thread-safe data structures or proper synchronization mechanisms for shared mutable state in concurrent code.

Snippet: async def controlled_request(finding):

async with semaphore:
results.append(await check_single_finding(scm, adapter, finding, context_size, disable_tools))

tasks = [controlled_request(finding) for finding in findings]


Security Issue: Creating tasks for all findings without rate limiting could lead to excessive resource consumption if there are many findings.

Priority: MEDIUM

CWE: CWE-400

Recommendation: Implement proper rate limiting or batch processing to prevent resource exhaustion.

Snippet: tasks = [controlled_request(finding) for finding in findings]

help="Base uri of ollama",
envvar="SAIST_OLLAMA_BASE_URI",
action=EnvDefault,
default="http://localhost:11434",


Security Issue: Ollama base URI defaults to localhost without encryption (HTTP). This could expose sensitive data in transit if the service is actually running on a remote host or if users modify the endpoint without considering encryption.

Priority: LOW

CWE: CWE-319

Recommendation: Use HTTPS by default or require explicit protocol specification. Add warnings about using unencrypted connections for sensitive LLM operations.

Snippet: default="http://localhost:11434"

envvar="SAIST_CACHE_FOLDER",
action=EnvDefault,
required=False,
default="SAISTCache",


Security Issue: Cache folder defaults to a predictable name (SAISTCache) in the current working directory. This could lead to information disclosure if cache contains sensitive data and is stored in insecure locations.

Priority: LOW

CWE: CWE-922

Recommendation: Use a secure, user-specific directory (e.g., ~/.cache/saist) by default and ensure proper permissions.

Snippet: default="SAISTCache"

help="Host for the web server to bind to",
envvar="SAIST_WEB_HOST",
action=EnvDefault,
default="127.0.0.1",


Security Issue: The web server defaults to binding only to localhost (127.0.0.1), which can surprise users who expect it to be reachable from other machines. Users may then expose the service unintentionally or operate under a false sense of security.

Priority: LOW

CWE: CWE-1188

Recommendation: Change default to "0.0.0.0" with a clear warning about network exposure, or require explicit host specification for non-localhost binding.

Snippet: default="127.0.0.1"

envvar="SAIST_CSV_PATH",
action=EnvDefault,
required=False,
default="results.csv",


Security Issue: CSV output defaults to 'results.csv' in current directory. This could lead to accidental overwriting of existing files or information disclosure if sensitive findings are written to insecure locations.

Priority: LOW

CWE: CWE-922

Recommendation: Require explicit output path or use a timestamped filename by default to prevent accidental data loss or exposure.

Snippet: default="results.csv"

)
"--check-context-size",
help="Number of lines on either side of the issue to supply to the false-positive check.",
default=6,


Security Issue: Default context size of 6 lines for false-positive checking may be insufficient for proper analysis of complex security issues that require more surrounding context.

Priority: LOW

CWE: CWE-1021

Recommendation: Increase default context size to at least 10-15 lines or make it configurable with better documentation about appropriate values.

Snippet: default=6

@github-actions github-actions bot left a comment

Security Assessment Summary

Critical Issues: Defensive Programming and Input Validation

Multiple high-severity issues were identified in the file saist/util/output.py, all related to a lack of defensive programming and proper input validation. The primary risk is application crashes due to unhandled exceptions when processing data.

The most critical pattern is the repeated unsafe access to list elements without checking for an empty collection. This occurs in four distinct locations where code accesses findings[0] without first verifying that the findings list contains any elements. This will result in an IndexError if an empty list is passed to these functions, leading to runtime failures. The specific unsafe operations include checking isinstance(findings[0], Finding), calling findings[0].flat(), and using the element in other logic. The consistent recommendation is to implement a guard clause at the beginning of relevant functions to check if the findings collection is empty (e.g., if not findings: return) and handle this case appropriately to prevent crashes.

A separate but related high-severity issue involves unsafe access to a nested dictionary structure. The code accesses model_json_schema()["properties"] without validating that the returned schema dictionary contains the expected "properties" key. If the schema's structure changes or is malformed, this will raise a KeyError and crash the application. The recommendation is to add validation to ensure the schema has the expected structure before accessing nested keys, for instance by using the .get() method or explicit key existence checks.

All these issues share a common root cause: assuming ideal data structures without accounting for edge cases or malformed inputs. Addressing these is a priority to improve the application's robustness and reliability.

fieldnames = list(Finding.model_json_schema()["properties"].keys())


def write_csv(findings: Iterable[Finding] | Iterable[CheckedFinding], csv_path: str):


Security Issue: The function write_csv accesses findings[0] without checking if the findings iterable is empty. This could lead to an IndexError when an empty list is passed.

Priority: LOW

CWE: CWE-754

Recommendation: Add a check for empty findings before accessing findings[0]. Return early or handle empty collections appropriately.

Snippet: def write_csv(findings: Iterable[Finding] | Iterable[CheckedFinding], csv_path: str):



def write_csv(findings: Iterable[Finding] | Iterable[CheckedFinding], csv_path: str):
if isinstance(findings[0], Finding):


Security Issue: The code performs isinstance(findings[0], Finding) without verifying that findings is not empty. This could cause runtime errors when processing empty result sets.

Priority: LOW

CWE: CWE-754

Recommendation: Check if findings is empty before accessing the first element. Consider using a guard clause: if not findings: return

Snippet: if isinstance(findings[0], Finding):

def write_csv(findings: Iterable[Finding] | Iterable[CheckedFinding], csv_path: str):
if isinstance(findings[0], Finding):
return _write_findings_csv(findings, csv_path)
elif isinstance(findings[0], CheckedFinding):


Security Issue: Similar to the previous issue, this line accesses findings[0] without ensuring the collection has elements. This is a defensive programming issue that could lead to crashes.

Priority: LOW

CWE: CWE-754

Recommendation: Add proper empty collection handling before accessing elements by index.

Snippet: elif isinstance(findings[0], CheckedFinding):



def _write_checks_csv(findings: Iterable[CheckedFinding], csv_path):
fieldnames = list(findings[0].flat().keys())


Security Issue: This line calls findings[0].flat() without verifying that findings has elements. Additionally, it assumes the flat() method exists and returns a dictionary with keys() method.

Priority: LOW

CWE: CWE-754

Recommendation: Add empty collection check and consider defensive programming with try-except blocks for method calls.

Snippet: fieldnames = list(findings[0].flat().keys())



def _write_findings_csv(findings: Iterable[Finding], csv_path):
fieldnames = list(type(findings[0]).model_json_schema()["properties"].keys())


Security Issue: The code accesses model_json_schema()["properties"] without checking if the schema contains the expected structure. If the model schema changes or is malformed, this could raise KeyError.

Priority: LOW

CWE: CWE-754

Recommendation: Add validation to ensure the schema has the expected structure before accessing nested keys.

Snippet: fieldnames = list(type(findings[0]).model_json_schema()["properties"].keys())

@JohnRPunk JohnRPunk marked this pull request as ready for review February 11, 2026 16:17
logger = logging.getLogger("saist")

async def analyze_single_file(scm: Scm, adapter: BaseLlmAdapter, filename, patch_text, disable_tools: bool) -> Optional[list[Finding]]:
N_FINDINGS_TO_CHECK = 3
Contributor

erm?

@github-actions github-actions bot left a comment

Security Assessment Summary

Critical Severity Issues

Path Traversal Vulnerability
In the filesystem adapter, a critical path traversal vulnerability exists due to improper path validation order. The current implementation performs security checks after joining paths, creating a race condition that could allow attackers to bypass directory restrictions using symlinks. This could lead to unauthorized file access. The fix requires validating paths before joining them, using functions like os.path.realpath to resolve symlinks and os.path.commonpath to ensure the resolved path remains within the intended directory.

High Severity Issues

Information Disclosure in Logging and Error Messages
Multiple instances were found where sensitive data is exposed through logging and error handling. In the main application, string representations of Finding objects that may contain file paths, code snippets, or recommendations are logged, potentially leaking confidential information. Similarly, debug logs expose file paths and vulnerability details, and error messages in the filesystem adapter reveal internal filesystem structure. All logging must be reviewed to implement redaction of sensitive fields, using structured logging with non-sensitive identifiers only. Debug output should use hashed identifiers instead of full paths.

Improper Exception Handling Leading to Silent Failures
The filesystem adapter's get_file_contents method catches all exceptions, logs them, and does not propagate the error, resulting in silent failures where callers receive None instead of expected file contents. This can cause unpredictable downstream behavior. The method should be modified to re-raise exceptions after logging or to return a defined error value, with a clear specification of possible exceptions.

Medium Severity Issues

Insecure Default Configurations
Two insecure default configurations were identified. The argument parser sets a very high default maximum line length of 1000 characters, which could facilitate denial of service attacks through resource exhaustion. This should be reduced to a reasonable value such as 200-500 characters. Additionally, default output filenames like results.csv and report.pdf are predictable and could be targeted for overwrite attacks or information disclosure. Using timestamped or randomized default filenames is recommended.

Unsafe File Operations Without Specified Encoding
Files are opened in the filesystem adapter without specifying an encoding, relying on the system default. This leads to inconsistent behavior across platforms and potential security issues related to encoding-based attacks. All file operations should explicitly specify an encoding, preferably UTF-8, to ensure consistent and secure handling.

Low Severity Issues

Unchecked Array Access Leading to Potential Crashes
A recurring pattern of unguarded array access was found in the output module. Multiple functions, including _write_findings_csv, write_csv, and _write_checks_csv, access the first element of a findings iterable without checking if it is empty. This will cause an IndexError if an empty iterable is passed, resulting in a crash. Each function must add a guard condition to check for an empty list and handle it appropriately, such as returning early.

Information Leakage in User Output
Console output in the main application reveals the count of false positives detected. While not directly sensitive, this could provide insights into the scanning tool's accuracy and characteristics of the codebase, which may be undesirable in publicly accessible logs like CI/CD outputs. Evaluate the necessity of this output and restrict it to appropriate contexts.

)
filename = os.path.join(self.compare_path, filename)
try:
if not pathlib.Path(filename).is_relative_to(self.compare_path):


Security Issue: The path traversal check using is_relative_to() occurs after the filename has already been joined with compare_path. This creates a race condition where an attacker could potentially exploit symlinks or other filesystem tricks to bypass the check. Additionally, the check is performed on the constructed pathlib.Path object, but the actual file operation uses aiofiles.open with the potentially malicious filename string.

Priority: MEDIUM

CWE: CWE-22

Recommendation: Perform path traversal checks before joining paths. Use os.path.normpath() and os.path.commonprefix() or os.path.commonpath() to ensure the resolved path is within the intended directory. Consider using os.path.realpath() to resolve symlinks before checking.

Snippet: if not pathlib.Path(filename).is_relative_to(self.compare_path):


Checks a single finding against the file
"""
finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())


Security Issue: The code constructs a string representation of a Finding object that includes all its fields. If sensitive data is present in finding fields (like file paths, snippets, or recommendations), this could lead to information disclosure in error messages or logs.

Priority: LOW

CWE: CWE-209

Recommendation: Sanitize or redact sensitive information before logging or including in error messages. Consider using a custom string representation that excludes potentially sensitive fields.

Snippet: finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())

new_finding = CheckedFinding(finding=finding, check=check)
return new_finding
except Exception as e:
logger.error(f"[Error] Finding '{finding}': {e}")


Security Issue: The error logging includes the entire finding object converted to string, which may contain sensitive information like file paths, code snippets, or other potentially confidential data.

Priority: LOW

CWE: CWE-209

Recommendation: Log only non-sensitive identifiers (e.g., finding ID or file name) and avoid logging the entire object. Implement structured logging with redaction of sensitive fields.

Snippet: logger.error(f"[Error] Finding '{finding}': {e}")


return contents
except Exception as e:
logging.warn(f"ERR: {e}")


Security Issue: The error message is logged using logging.warn() which may expose sensitive information about the filesystem structure or internal errors to attackers. The exception message could contain path information or other system details that should not be exposed.

Priority: LOW

CWE: CWE-209

Recommendation: Use a more generic error message that doesn't expose internal details. Log the error at an appropriate level (error or warning) but sanitize the message. Consider catching specific exceptions and handling them appropriately rather than catching all exceptions.

Snippet: logging.warn(f"ERR: {e}")

type=int,
help="Maximum allowed line length, files with lines longer than this value will be skipped",
required=False,
default=1000,


Security Issue: The default maximum line length of 1000 characters is very high and could allow processing of files with extremely long lines, which might be used for denial of service attacks or buffer overflow attempts.

Priority: LOW

CWE: CWE-400

Recommendation: Consider lowering the default maximum line length to a more reasonable value (e.g., 200-500 characters) to prevent potential resource exhaustion attacks.

Snippet: default=1000

help="Host for the web server to bind to",
envvar="SAIST_WEB_HOST",
action=EnvDefault,
default="127.0.0.1",


Security Issue: The web server defaults to binding to 127.0.0.1 only, which may expose the service to local network attacks if the user expects it to be accessible only locally. While this is a security feature, it could be misconfigured if users change it without understanding the security implications.

Priority: LOW

CWE: CWE-284

Recommendation: Document the security implications of changing the web-host parameter and consider adding a warning when binding to non-localhost addresses. Also consider adding authentication for the web interface.

Snippet: default="127.0.0.1"

help="Base uri of ollama",
envvar="SAIST_OLLAMA_BASE_URI",
action=EnvDefault,
default="http://localhost:11434",


Security Issue: The default Ollama URI points to localhost:11434. If the service is accessible from the network, this could expose the Ollama service to unauthorized access.

Priority: LOW

CWE: CWE-284

Recommendation: Document that users should ensure Ollama service is properly secured if accessible from network, or consider adding authentication support for Ollama connections.

Snippet: default="http://localhost:11434"

envvar="SAIST_CACHE_FOLDER",
action=EnvDefault,
required=False,
default="SAISTCache",


Security Issue: The default cache folder name 'SAISTCache' is predictable and could be targeted for cache poisoning or information disclosure attacks if the cache contains sensitive data.

Priority: LOW

CWE: CWE-200

Recommendation: Consider using a more unique or randomized default cache folder name, or implementing cache encryption if sensitive data is stored.

Snippet: default="SAISTCache"

help="Port for web server to listen on",
required=False,
type=int,
default=8080,


Security Issue: The web server uses port 8080 by default, which is a common port for web applications. This could lead to port conflicts or unintended exposure if other services are running on the same port.

Priority: LOW

CWE: CWE-269

Recommendation: Consider using a less common default port or implementing port availability checking before starting the server. Document how to change the port if conflicts occur.

Snippet: default=8080
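A best-effort sketch of checking the port before starting the server; args.web_host / args.web_port follow the option names above but the exact attribute names are assumptions, and the check is inherently racy (the port can be taken between the check and the actual bind):

import socket

def port_is_free(host: str, port: int) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# if not port_is_free(args.web_host, args.web_port):
#     raise SystemExit(f"Port {args.web_port} is already in use; pass a different --web-port")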

envvar="SAIST_CSV_PATH",
action=EnvDefault,
required=False,
default="results.csv",


Security Issue: Default output filenames (results.csv, report.tex, report.pdf) are predictable and could be targeted for file overwrite attacks or information disclosure.

Priority: LOW

CWE: CWE-200

Recommendation: Consider using timestamped or randomized default filenames, or requiring explicit filename specification for sensitive outputs.

Snippet: default="results.csv"

@github-actions github-actions bot left a comment

Security Review Summary

Critical Issues

Sensitive Information Exposure in Error Logging
Multiple files, including saist/main.py, log detailed error information that may contain sensitive data from LLM API errors, internal processing errors, or entire Finding objects. This exposes security issues, file paths, and code snippets. It is recommended to implement structured error handling that logs only minimal, non-sensitive information and avoid logging entire sensitive objects.

Insecure Default Communication
In saist/util/argparsing.py, the Ollama base URI defaults to unencrypted HTTP on localhost. This could lead to man-in-the-middle attacks and credential or sensitive data exposure if the service is used over a network. It is recommended to use HTTPS by default or require explicit configuration for network access, along with warnings about HTTP usage in production.

High Severity Issues

Insecure Cache Configuration
In saist/util/argparsing.py, the cache folder defaults to a predictable location (SAISTCache) in the current working directory. If the cache contains sensitive data from previous scans, this predictable location could lead to information disclosure. It is recommended to use a secure, user-specific directory (e.g., ~/.cache/saist) with proper permissions and consider adding an option to encrypt cached data.

Denial of Service Risk via Excessive Line Length
In saist/util/argparsing.py, the default maximum line length of 1000 characters is excessively high. This could allow processing of files with extremely long lines, potentially enabling denial of service attacks or exploiting buffer overflows in downstream processing. It is recommended to reduce the default to a more reasonable value (e.g., 200-500 characters) and document the security implications.

Medium Severity Issues

Multiple Runtime Errors from Unchecked Empty Lists
Several functions in saist/util/output.py (_write_findings_csv, _write_checks_csv, and write_csv) access the first element of a findings iterable (e.g., findings[0]) without first checking if the list is empty. This will cause IndexError exceptions when these functions are called with empty iterables. It is recommended to add checks for empty findings before accessing elements, using methods like next(iter(findings), None) or handling the empty case appropriately by writing only headers or skipping the write operation.

Low Severity Issues

Generic Exception Handling
In saist/main.py, the use of the generic Exception class makes error handling and debugging more difficult. It is recommended to use specific exception types or create custom exceptions to improve error handling and debugging capabilities.

help="Base uri of ollama",
envvar="SAIST_OLLAMA_BASE_URI",
action=EnvDefault,
default="http://localhost:11434",


Security Issue: Ollama base URI defaults to localhost without encryption (HTTP). This could lead to man-in-the-middle attacks or credential exposure if the service is exposed over a network. Sensitive data could be transmitted in cleartext.

Priority: MEDIUM

CWE: CWE-319

Recommendation: Use HTTPS by default or require explicit configuration for network access. Add warnings about using HTTP in production environments.

Snippet: default="http://localhost:11434"


Checks a single finding against the file
"""
finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())


Security Issue: The code exposes potentially sensitive information from Finding objects in error messages, which could include file paths, snippets, and other details that might be sensitive.

Priority: LOW

CWE: CWE-209

Recommendation: Sanitize error messages to avoid exposing sensitive information. Log only minimal, non-sensitive details in error contexts.

Snippet: finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())

new_finding = CheckedFinding(finding=finding, check=check)
return new_finding
except Exception as e:
logger.error(f"[Error] Finding '{finding}': {e}")


Security Issue: The error logging exposes the entire Finding object which may contain sensitive information about security issues, file paths, and code snippets.

Priority: LOW

CWE: CWE-209

Recommendation: Log only minimal, non-sensitive error information. Avoid logging entire objects that may contain sensitive data.

Snippet: logger.error(f"[Error] Finding '{finding}': {e}")

help="Host for the web server to bind to",
envvar="SAIST_WEB_HOST",
action=EnvDefault,
default="127.0.0.1",


Security Issue: The web server defaults to binding only to localhost (127.0.0.1), which could lead to exposure of sensitive findings data if the server is intended to be accessed from other machines. This is a configuration vulnerability that could lead to information disclosure.

Priority: LOW

CWE: CWE-16

Recommendation: Change default to "0.0.0.0" for proper network binding, or document clearly that the server only binds to localhost. Add a warning about exposing sensitive findings.

Snippet: default="127.0.0.1"

envvar="SAIST_CACHE_FOLDER",
action=EnvDefault,
required=False,
default="SAISTCache",


Security Issue: Cache folder defaults to a predictable location (SAISTCache) in the current working directory. This could lead to information disclosure if cache contains sensitive data from previous scans, and the location is predictable to attackers.

Priority: LOW

CWE: CWE-922

Recommendation: Use a secure, user-specific directory (e.g., ~/.cache/saist) with proper permissions. Add option to encrypt cached data.

Snippet: default="SAISTCache"



def write_csv(findings: Iterable[Finding] | Iterable[CheckedFinding], csv_path: str):
if isinstance(findings[0], Finding):


Security Issue: The code performs isinstance(findings[0], Finding) without verifying that findings is not empty. This could cause runtime errors when the function is called with an empty iterable.

Priority: LOW

CWE: CWE-754

Recommendation: Check if findings is empty before accessing the first element. Consider using next(iter(findings), None) to safely get the first element.

Snippet: if isinstance(findings[0], Finding):



def _write_findings_csv(findings: Iterable[Finding], csv_path):
fieldnames = list(type(findings[0]).model_json_schema()["properties"].keys())


Security Issue: The _write_findings_csv function accesses findings[0] without checking if the findings iterable is empty. This could lead to IndexError when writing an empty list of findings to CSV.

Priority: LOW

CWE: CWE-754

Recommendation: Add a check for empty findings before accessing the first element. Handle empty case appropriately (e.g., write only header or skip writing).

Snippet: fieldnames = list(type(findings[0]).model_json_schema()["properties"].keys())



def _write_checks_csv(findings: Iterable[CheckedFinding], csv_path):
fieldnames = list(findings[0].flat().keys())


Security Issue: The _write_checks_csv function accesses findings[0] without checking if the findings iterable is empty. This could lead to IndexError when writing an empty list of checked findings to CSV.

Priority: LOW

CWE: CWE-754

Recommendation: Add a check for empty findings before accessing the first element. Handle empty case appropriately (e.g., write only header or skip writing).

Snippet: fieldnames = list(findings[0].flat().keys())

compare_path=args.path, base_path=args.path_for_comparison
)

raise Exception("Could not determine a suitable file source")


Security Issue: Using generic Exception class without specific error type makes error handling and debugging more difficult.

Priority: LOW

CWE: CWE-248

Recommendation: Use specific exception types or create custom exceptions for better error handling and debugging.

Snippet: raise Exception("Could not determine a suitable file source")

llm = FaikeAdapter("", "Fake LLM")
logger.debug("Using LLM: Faike AI")
else:
raise Exception("Could not determine a suitable LLM to use")


Security Issue: Using generic Exception class without specific error type makes error handling and debugging more difficult.

Priority: LOW

CWE: CWE-248

Recommendation: Use specific exception types or create custom exceptions for better error handling and debugging.

Snippet: raise Exception("Could not determine a suitable LLM to use")

@github-actions github-actions bot left a comment

Security Assessment Summary

Critical Issues

Sensitive Data Exposure in Finding Object Stringification
A critical security issue was identified in the main application file. The code constructs a string representation of a Finding object by iterating through its dictionary. This process can inadvertently expose sensitive data such as API keys, passwords, tokens, or sensitive file paths. The risk is significantly increased because this string is incorporated into LLM prompts, which may be logged or stored by external services, leading to potential information disclosure. It is imperative to implement data sanitization. A dedicated function must be developed to redact or mask sensitive patterns (e.g., using regex for keys, tokens, and credential patterns) before the data is logged or sent to any external service.

High Severity Issues

Multiple IndexError Vulnerabilities Due to Missing Empty Checks
A recurring high-severity issue was found across multiple functions in the output utility module. Several functions, including _write_findings_csv, write_csv, and _write_checks_csv, directly access findings[0] without first verifying that the findings iterable contains elements. This omission will cause an IndexError when these functions receive an empty list, leading to application crashes and unreliable output generation. This issue must be addressed in all identified locations by adding a guard condition to check if the findings list is empty before attempting to access its first element. The recommended fix is to either return early from the function or raise a more descriptive, user-friendly error to handle the empty state appropriately.


Checks a single finding against the file
"""
finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())


Security Issue: The code constructs a string representation of a Finding object by iterating through its dictionary representation. If the Finding object contains sensitive data (like file paths with sensitive information, API keys in snippets, etc.), this could lead to information disclosure when logged or displayed. The issue is exacerbated by the fact that this string is passed to an LLM prompt which could be logged or stored.

Priority: MEDIUM

CWE: CWE-200

Recommendation: Sanitize the finding data before constructing the string. Remove or mask sensitive information such as API keys, tokens, passwords, or full file paths that may contain sensitive directory structures. Consider implementing a sanitization function that redacts sensitive patterns before logging or sending to external services.

Snippet: finding_string = "\n".join(f"{k}: {v}" for k, v in dict(finding).items())

finding_context,_,_ = await context_from_finding(scm, finding, context_size = context_size)
system_prompt = prompts.CHECK_FINDING
tools = [] if disable_tools else [scm.read_file_contents]
logger.debug(f"Confirming {finding.issue} for {finding.file}")


Security Issue: Debug logging includes the finding issue and file path which could contain sensitive information. If debug logs are stored or transmitted insecurely, this could lead to information disclosure about security vulnerabilities in the codebase.

Priority: LOW

CWE: CWE-532

Recommendation: Avoid logging sensitive information at debug level. If logging is necessary, sanitize the data by removing or obfuscating sensitive details. Consider implementing log redaction for security findings.

Snippet: logger.debug(f"Confirming {finding.issue} for {finding.file}")

help="Base uri of ollama",
envvar="SAIST_OLLAMA_BASE_URI",
action=EnvDefault,
default="http://localhost:11434",


Security Issue: The default Ollama base URI is set to 'http://localhost:11434' which uses HTTP instead of HTTPS. This could expose LLM API communications to local network eavesdropping if the service is accessible on the network.

Priority: LOW

CWE: CWE-319

Recommendation: Use HTTPS by default or require explicit configuration for HTTP. Consider adding a warning about using HTTP in production environments.

Snippet: default="http://localhost:11434"

fieldnames = list(Finding.model_json_schema()["properties"].keys())


def write_csv(findings: Iterable[Finding] | Iterable[CheckedFinding], csv_path: str):


Security Issue: The function write_csv accesses findings[0] without checking if the findings iterable is empty. This could lead to an IndexError if an empty list is passed.

Priority: LOW

CWE: CWE-754

Recommendation: Add a check for empty findings before accessing findings[0]. Return early or raise a more descriptive error.

Snippet: def write_csv(findings: Iterable[Finding] | Iterable[CheckedFinding], csv_path: str):



def write_csv(findings: Iterable[Finding] | Iterable[CheckedFinding], csv_path: str):
if isinstance(findings[0], Finding):


Security Issue: The code accesses findings[0] without checking if the findings iterable is empty. This could cause an IndexError when processing empty results.

Priority: LOW

CWE: CWE-754

Recommendation: Check if findings is empty before accessing the first element. Handle empty cases appropriately.

Snippet: if isinstance(findings[0], Finding):

all_findings = [c.finding for c in checks if c.check.is_accurate]
n_false_positives = sum(1 for c in checks if not c.check.is_accurate)
logger.info(f"{len(all_findings)} after LLM false positive checks")
print(f"llm confirmation found {n_false_positives}")


Security Issue: Printing the number of false positives to the console could reveal information about the effectiveness of the security scanning tool and the quality of its findings. This information might be useful to attackers trying to understand the tool's behavior.

Priority: LOW

CWE: CWE-200

Recommendation: Consider making this output configurable or only displaying it in verbose modes rather than default output.

Snippet: print(f"llm confirmation found {n_false_positives}")

help="Host for the web server to bind to",
envvar="SAIST_WEB_HOST",
action=EnvDefault,
default="127.0.0.1",


Security Issue: The web server defaults to binding to 127.0.0.1 only, which could expose the application to local network attacks if the user expects it to be accessible only locally but the network configuration allows broader access. This could lead to information disclosure of security findings.

Priority: LOW

CWE: CWE-284

Recommendation: Consider using 'localhost' instead of '127.0.0.1' or provide clearer documentation about the security implications. Alternatively, default to binding to all interfaces but require explicit authentication.

Snippet: default="127.0.0.1"

envvar="SAIST_CACHE_FOLDER",
action=EnvDefault,
required=False,
default="SAISTCache",


Security Issue: The default cache folder name 'SAISTCache' is predictable and could be targeted for cache poisoning or information disclosure attacks. Cache contents may contain sensitive security findings or analysis results.

Priority: LOW

CWE: CWE-922

Recommendation: Use a randomized or user-specific cache directory name. Implement cache encryption for sensitive data and validate cache integrity.

Snippet: default="SAISTCache"

type=int,
help="Maximum allowed line length, files with lines longer than this value will be skipped",
required=False,
default=1000,


Security Issue: The default maximum line length of 1000 characters is excessively high and could allow processing of maliciously crafted files designed to cause buffer overflows or other parsing issues in downstream components.

Priority: LOW

CWE: CWE-20

Recommendation: Set a more reasonable default (e.g., 200-500 characters) that balances usability with security. Document the security implications of very long lines.

Snippet: default=1000

)
"--check-context-size",
help="Number of lines on either side of the issue to supply to the false-positive check.",
default=6,


Security Issue: The default context size of 6 lines for false-positive checking may be insufficient for proper security analysis, potentially leading to incorrect false-positive determinations due to lack of surrounding context.

Priority: LOW

CWE: CWE-1025

Recommendation: Increase the default context size or make it configurable based on the type of analysis being performed. Consider a minimum of 10-15 lines for proper context.

Snippet: default=6
