feat: implement SHA-256-based artifact storage system #63

Merged
natashaannn merged 7 commits into main from refactor/s1-artifacts
May 9, 2026
Conversation


@natashaannn natashaannn commented May 9, 2026

Summary

  • Added SHA-256-based artifact storage system with content-addressed storage in .ragtech/artifacts/
  • Implemented storeArtifact() function that accepts both string and Buffer inputs with deterministic hashing
  • Added resolveArtifactPath() helper for artifact path resolution
  • Created comprehensive unit tests covering all functionality including duplicate prevention and hash consistency
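A minimal sketch of the content-addressed pattern the summary describes, not the code in scripts/config/artifacts.ts. The function names and the .ragtech/artifacts/ directory come from the summary. The baseDir parameter, the exact signatures, and the file naming (bare hex digest) are assumptions made so the sketch can run in a scratch directory.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Directory named in the PR summary, relative to a base directory.
const ARTIFACT_SUBDIR = path.join(".ragtech", "artifacts");

// Hash the content, write it once under its hash, and return the hex digest.
// baseDir is an assumption here so the sketch can run in a scratch directory.
function storeArtifact(content: string | Buffer, baseDir: string = process.cwd()): string {
  // Encode strings as UTF-8 so equal string and Buffer inputs hash identically.
  const bytes = typeof content === "string" ? Buffer.from(content, "utf8") : content;
  const hash = createHash("sha256").update(bytes).digest("hex");

  const dir = path.join(baseDir, ARTIFACT_SUBDIR);
  fs.mkdirSync(dir, { recursive: true }); // no-op if the directory exists

  const target = path.join(dir, hash);
  if (!fs.existsSync(target)) {
    fs.writeFileSync(target, bytes); // skip the write when the artifact is already stored
  }
  return hash;
}

// Map a hash back to the path where storeArtifact() would have written it.
function resolveArtifactPath(hash: string, baseDir: string = process.cwd()): string {
  return path.join(baseDir, ARTIFACT_SUBDIR, hash);
}

// Usage: store the same content twice, once as a string and once as a Buffer.
const scratch = fs.mkdtempSync(path.join(os.tmpdir(), "artifacts-sketch-"));
const fromString = storeArtifact("test", scratch);
const fromBuffer = storeArtifact(Buffer.from("test", "utf8"), scratch);
console.log(fromString === fromBuffer); // same bytes, same hash
console.log(fs.readdirSync(path.join(scratch, ARTIFACT_SUBDIR)).length); // one file on disk
```

Because the file name is derived from the bytes alone, a second call with identical content finds the existing file and skips the write, which is the duplicate prevention the tests are said to cover.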

How to review

  • scripts/config/artifacts.ts - Core implementation with SHA-256 hashing, directory creation, and duplicate prevention logic
  • scripts/__tests__/artifacts.test.ts - Complete test suite covering string/Buffer inputs, hash consistency, and edge cases
  • Verify the API matches the handoff contract: storeArtifact() and resolveArtifactPath() functions

Test plan

  • npm test -- scripts/__tests__/artifacts.test.ts passes (8/8 tests)
  • Manual verification: npx tsx -e "import './scripts/config/artifacts.ts'" loads without errors
  • Test artifact creation: npx tsx -e "import { storeArtifact } from './scripts/config/artifacts.ts'; console.log(storeArtifact('test'))" prints the returned hash

Issues

Closes #11

- Add storeArtifact() function for content-addressed storage
- Implement resolveArtifactPath() helper for artifact path resolution
- Create .ragtech/artifacts/ directory structure with automatic creation
- Support both string and Buffer input types with deterministic hashing
- Prevent duplicate writes by checking existing artifacts
- Add comprehensive unit tests covering all functionality
- Ensure content integrity with SHA-256 hash verification

This provides the foundational artifact storage needed for future DAG and reproducibility work, with stable API exposure for pipeline integration.
…iew-pr pattern detection

Pattern: Runtime directory not added to .gitignore
Observed in: refactor/s1-artifacts
- Move integration tests from scripts/__tests__/ to tests/integration/ per project convention
- Add .ragtech/ to .gitignore to prevent committing runtime artifacts
- Use full 64-character SHA-256 hash instead of 12-char truncated version
- Add optional ext parameter for file extensions (.mp4, .txt, etc.)
- Add optional baseDir parameter to avoid process.cwd() coupling
- Update tests to use temp directory isolation with fs.mkdtempSync()
- Enhance test coverage to 12 tests including extension handling scenarios

This resolves all blockers and warnings from the code review while maintaining
backward compatibility with existing API calls.
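Two of the review fixes above lend themselves to a short illustration: keeping the full 64-character digest, and isolating tests in a temp directory created with fs.mkdtempSync(). The sketch below is a guess at the shape, not the project's test code. The helper names sha256Hex and withTempDir are hypothetical, as is the file naming with the extension appended after the hash.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Full SHA-256 digest: 32 bytes, 64 hex characters.
function sha256Hex(content: string | Buffer): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical helper: run a callback against a fresh temp directory, then clean up.
function withTempDir<T>(fn: (dir: string) => T): T {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "artifacts-test-"));
  try {
    return fn(dir);
  } finally {
    // Remove the directory even if the callback throws.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const full = sha256Hex("hello");
const truncated = full.slice(0, 12); // 12 hex chars keep only 48 of the 256 bits

const existedDuringTest = withTempDir((dir) => {
  // Assumed naming: extension appended after the hash.
  const file = path.join(dir, full + ".txt");
  fs.writeFileSync(file, "hello");
  return fs.existsSync(file);
});

console.log(full.length, truncated.length, existedDuringTest); // 64 12 true
```

A truncated 12-character name leaves far less room before two different inputs collide, which is the usual argument for storing under the full digest. Passing the temp directory in explicitly is what an optional baseDir parameter makes possible without touching process.cwd().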
- Add extension normalization to handle both '.txt' and 'txt' formats
- Add leading-dot guard to prevent silent API footguns in extension parameter
- Document deduplication behavior: keyed on (content, ext) not content alone
- Update JSDoc to clarify extension parameter requirements
- Add comprehensive test coverage for extension normalization scenarios
- Enhance integration tests to verify path resolution with normalized extensions

This improves API robustness while maintaining backward compatibility and
providing clear documentation of deduplication behavior patterns.
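The extension handling described above could look roughly like the following. This is one plausible reading of "extension normalization" and of deduplication "keyed on (content, ext)", not the merged implementation. The names normalizeExt and artifactFileName are hypothetical, and the naming scheme (content hash followed by the normalized extension) is an assumption.

```typescript
import { createHash } from "node:crypto";

// Accept "txt", ".txt", or nothing; return "" or an extension with exactly one leading dot.
function normalizeExt(ext?: string): string {
  if (!ext) return "";
  const bare = ext.replace(/^\.+/, ""); // drop any leading dots
  return bare === "" ? "" : "." + bare;
}

// Assumed naming: <sha256 of content><normalized ext>.
function artifactFileName(content: string | Buffer, ext?: string): string {
  const hash = createHash("sha256").update(content).digest("hex");
  return hash + normalizeExt(ext);
}

const a = artifactFileName("clip", "mp4");
const b = artifactFileName("clip", ".mp4");
const c = artifactFileName("clip", "txt");

console.log(a === b); // true: both spellings normalize to ".mp4"
console.log(a === c); // false: same content, different extension, separate artifact
```

Under this reading, the same bytes stored with two different extensions produce two files, so callers that want a single copy need to pass a consistent extension.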
@natashaannn
Member Author

  • Used SWE 1.5 (free) to implement the issue, following the implement-issue skill
  • Used SWE 1.5 to run the pre-push audit, following the pre-push audit skill
  • Used Claude to review the PR with the review-PR skill. It produced a review findings document that included requested changes. This used 3% of the 5-hour session window.
  • Used SWE 1.5 to implement the requested changes, working from the review findings document produced by Claude
  • Used Claude to review the fixes, which used 3% of the session window. The review came back clean, with 3 optional improvements.
  • Used SWE 1.5 to implement the optional improvements.
  • Claude performed a final review with no additional findings. This used 1% of the session window.

Note: 4 suites (generate-carousel, CaptionExtractor, transcript-caption, CarouselGenerator) are failing. They are older Carousel-related test suites that predate this PR. I am creating an issue to decide whether those test files should be removed or updated.

@natashaannn natashaannn merged commit 838a246 into main May 9, 2026
1 check passed
@natashaannn natashaannn deleted the refactor/s1-artifacts branch May 9, 2026 10:39

Development

Successfully merging this pull request may close these issues.

Content-addressed artifact store