🔒 Add input validation for git diff before OpenAI API calls #65

@ryota-murakami

Description

Problem

Currently, the staged diff (the output of git diff --cached) is sent directly to the OpenAI API without any validation (index.js:103-108). This creates security and cost risks:

  • Security Risk: Could inadvertently send sensitive data (API keys, passwords, tokens) to OpenAI
  • Cost Risk: Large diffs could exceed token limits, causing API errors or unexpected costs
  • User Experience: Users get no warning when the diff contains potentially sensitive content
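Relating a character limit to token limits takes an estimate. OpenAI gives a rough rule of thumb of about 4 characters per token for common English text. Source code and diff markup often tokenize less efficiently, so the sketch below is only a way to size a limit, not a substitute for a real tokenizer. `estimateTokens` is a hypothetical helper, not part of the current code.

```javascript
// Rough rule of thumb: ~4 characters per token for English text.
// Code and diff syntax may use more tokens per character, so this likely
// underestimates the real count for diffs.
const CHARS_PER_TOKEN = 4

// Hypothetical helper: approximate token count for a string.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// A 10,000-character diff is roughly 2,500 tokens under this heuristic.
console.log(estimateTokens('x'.repeat(10000)))
```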

Current Code

```javascript
const { stdout } = await exec(
  "git diff --cached -- . ':(exclude)*lock.json' ':(exclude)*lock.yaml'"
)
const summary = stdout.trim()
if (summary.length === 0) {
  return null
}
return summary // ⚠️ No validation before sending to OpenAI
```
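Reading the diff is already size-bounded in one place. Assuming `exec` in index.js is `util.promisify(child_process.exec)` (the excerpt does not show its definition), Node buffers at most `maxBuffer` bytes of output (default 1024 * 1024), then terminates the child and rejects. The sketch below sets that limit explicitly and turns the buffer error into a readable message. `GIT_OUTPUT_MAX_BUFFER` is a hypothetical name.

```javascript
const { promisify } = require('node:util')
const { exec: execCallback } = require('node:child_process')

// Assumption: index.js creates `exec` this way.
const exec = promisify(execCallback)

// Hypothetical explicit cap on buffered git output, in bytes.
// This equals Node's default maxBuffer for exec; adjust as needed.
const GIT_OUTPUT_MAX_BUFFER = 1024 * 1024

async function getGitSummary() {
  try {
    const { stdout } = await exec(
      "git diff --cached -- . ':(exclude)*lock.json' ':(exclude)*lock.yaml'",
      { maxBuffer: GIT_OUTPUT_MAX_BUFFER }
    )
    const summary = stdout.trim()
    return summary.length === 0 ? null : summary
  } catch (error) {
    // Node reports an exceeded maxBuffer with this error code.
    if (error.code === 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER') {
      throw new Error('Staged diff is too large to read. Consider committing in smaller chunks.')
    }
    throw error
  }
}
```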

Proposed Solution

Add validation with size limits and sensitive data detection:

```javascript
const MAX_DIFF_SIZE = 10000 // characters
const SENSITIVE_PATTERNS = [
  /api[_-]?key/i,
  /password/i,
  /secret/i,
  /token/i,
  // Matches "PRIVATE KEY", "RSA PRIVATE KEY", "OPENSSH PRIVATE KEY", etc.
  /-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----/
]

// Throws if the diff exceeds MAX_DIFF_SIZE. Warns (but does not block) when
// the diff matches a sensitive-data pattern. Returns the matched patterns so
// the logic can be unit tested.
function validateDiffSafety(diff) {
  // Check size
  if (diff.length > MAX_DIFF_SIZE) {
    throw new Error(
      `Git diff too large (${diff.length} chars). Limit: ${MAX_DIFF_SIZE}\n` +
      'Consider committing in smaller chunks.'
    )
  }

  // Check for sensitive data
  const detectedPatterns = []
  for (const pattern of SENSITIVE_PATTERNS) {
    if (pattern.test(diff)) {
      detectedPatterns.push(pattern.toString())
    }
  }

  if (detectedPatterns.length > 0) {
    console.warn('⚠️  Warning: Potential sensitive data detected in diff:')
    detectedPatterns.forEach(p => console.warn(`   - Pattern: ${p}`))
    console.warn('\nThis content will be sent to the OpenAI API.')
    // Could add user confirmation prompt here
  }

  return detectedPatterns
}

// In getGitSummary():
const summary = stdout.trim()
if (summary.length === 0) {
  return null
}
validateDiffSafety(summary)
return summary
```
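Testing patterns against the whole diff also matches context lines and removed lines, so deleting a secret, or an unchanged line that mentions "token", triggers the warning. One possible refinement, sketched below assuming standard unified diff output from git diff, is to scan only added lines and report where each match occurred. `findSensitiveAdditions` is a hypothetical helper name, not part of the current code.

```javascript
// Same pattern categories as the proposal. The private-key pattern here also
// accepts headers with no algorithm name, such as "BEGIN PRIVATE KEY".
const SENSITIVE_PATTERNS = [
  /api[_-]?key/i,
  /password/i,
  /secret/i,
  /token/i,
  /-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----/
]

// Hypothetical helper: scans only lines added by the diff and returns
// { file, line, pattern } per match, where `line` is the line number
// in the new version of the file.
function findSensitiveAdditions(diff, patterns = SENSITIVE_PATTERNS) {
  const findings = []
  let currentFile = null
  let newLineNo = 0
  let inHunk = false

  for (const line of diff.split('\n')) {
    // Each file section starts with "diff --git"; its headers follow.
    if (line.startsWith('diff --git ')) {
      inHunk = false
      continue
    }
    // "+++ b/path" names the file. Only treated as a header outside a hunk,
    // so an added line whose content starts with "++ " is not misread.
    if (!inHunk && line.startsWith('+++ ')) {
      currentFile = line.slice(4).replace(/^b\//, '')
      continue
    }
    // Hunk header "@@ -a,b +c,d @@": c is the first line number in the new file.
    const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(line)
    if (hunk) {
      inHunk = true
      newLineNo = Number(hunk[1])
      continue
    }
    if (!inHunk) continue

    if (line.startsWith('+')) {
      const added = line.slice(1)
      for (const pattern of patterns) {
        if (pattern.test(added)) {
          findings.push({ file: currentFile, line: newLineNo, pattern: pattern.toString() })
        }
      }
      newLineNo++
    } else if (line.startsWith(' ')) {
      // Context lines exist in the new file too, so they advance the counter.
      newLineNo++
    }
    // Removed lines ("-") and "\ No newline at end of file" do not advance it.
  }
  return findings
}
```

Reporting file and line lets the warning point at the exact location without echoing the matched text, and therefore the possible secret, back to the terminal.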

Benefits

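  • ✅ Reduces the risk of accidentally sending sensitive data to OpenAI (the proposal warns; it does not block)
  • ✅ Bounds OpenAI API costs by rejecting diffs over the size limit
  • ✅ Makes users aware of potentially sensitive content before it is sent
  • ✅ Keeps limits and patterns in one place, ready to be made configurable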

Acceptance Criteria

  • Add size validation for git diff output
  • Add sensitive data pattern detection
  • Display a warning to the user when sensitive patterns are detected
  • Add configuration options for size limits
  • Add tests for validation logic
  • Update documentation
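For the configuration criterion, one option is to let an environment variable override the default limit. The variable name `GIT_DIFF_MAX_SIZE` and the helper `resolveMaxDiffSize` are hypothetical; the project may prefer a CLI flag or a config file. Invalid values fall back to the default rather than silently disabling the check.

```javascript
const DEFAULT_MAX_DIFF_SIZE = 10000 // characters, matching the proposal

// Hypothetical: read an override from GIT_DIFF_MAX_SIZE, else use the default.
function resolveMaxDiffSize(env = process.env) {
  const raw = env.GIT_DIFF_MAX_SIZE
  if (raw === undefined || raw.trim() === '') return DEFAULT_MAX_DIFF_SIZE

  const value = Number(raw)
  // Reject non-numeric, fractional, zero, or negative values.
  if (!Number.isInteger(value) || value <= 0) {
    console.warn(`Ignoring invalid GIT_DIFF_MAX_SIZE="${raw}"; using ${DEFAULT_MAX_DIFF_SIZE}.`)
    return DEFAULT_MAX_DIFF_SIZE
  }
  return value
}
```

Taking `env` as a parameter keeps the function easy to unit test without mutating `process.env`, which also helps with the testing criterion.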

Priority

High - Security and cost implications

Related

Quality analysis report: claudedocs/quality-analysis-report.md section 3.3

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
