fix(gitea): tolerate binary file payloads (#2380) #2440
Code Review by Qodo
Context used
1. Non-UTF8 text lost
PR Summary by Qodo
Gitea provider: tolerate non-UTF-8 binary payloads in diffs and raw files
Walkthroughs
AI Description
• Decode Gitea PR .diff payloads with replacement to avoid UTF-8 decode crashes.
• Skip non-UTF-8 raw file contents and return empty content with a warning.
• Add regression tests covering both binary diff and binary raw file paths.
Diagram
graph TD
A["PR-Agent (Gitea)" ] --> B["RepoApi" ] --> C{{"Gitea HTTP API"}} --> D[("Bytes payload")]
D --> E["Diff decode" ] --> F["Text diff parse" ]
D --> G["File decode" ] --> H["Empty content" ]
subgraph Legend
direction LR
_svc["Component" ] ~~~ _ext{{"External"}} ~~~ _db[("Data")]
end
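The two decode paths in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names `decode_diff` and `decode_file`, the `path` parameter, and the log message are assumptions made for the example.

```python
import logging

logger = logging.getLogger(__name__)


def decode_diff(payload: bytes) -> str:
    """Decode a .diff payload, substituting U+FFFD for undecodable bytes.

    Hypothetical helper: the textual headers stay parseable even when a
    binary hunk contains bytes that are not valid UTF-8.
    """
    return payload.decode("utf-8", errors="replace")


def decode_file(payload: bytes, path: str = "<unknown>") -> str:
    """Decode raw file bytes, returning empty content for non-UTF-8 data.

    Hypothetical helper: a warning is logged instead of raising.
    """
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        logger.warning("Skipping non-UTF-8 content for %s", path)
        return ""
```

With this shape, a diff that mixes text headers and binary bytes still yields a string that can be parsed, while a binary raw file yields an empty string rather than an exception.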
High-Level Assessment
The following are alternative approaches to this PR:
1. Detect binary via Content-Type/headers before decoding
2. Decode diffs as latin-1 (lossless byte mapping)
3. Pre-scan bytes and strip/skip binary hunks
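To make alternatives 2 and 3 more concrete, here is a small sketch under stated assumptions. The helper names `decode_latin1` and `looks_binary`, the NUL-byte heuristic, and the 8000-byte sample size are illustrative choices for this example, not something the PR or the Gitea API prescribes.

```python
def decode_latin1(payload: bytes) -> str:
    """Alternative 2 (hypothetical helper): latin-1 maps every byte to one
    code point, so decoding never fails and the bytes can be recovered."""
    return payload.decode("latin-1")


def looks_binary(payload: bytes, sample_size: int = 8000) -> bool:
    """Alternative 3 (hypothetical helper): pre-scan a leading sample and
    treat the payload as binary if it contains a NUL byte."""
    return b"\x00" in payload[:sample_size]


raw = b"\x89PNG\x00\xff"

# latin-1 round-trips arbitrary bytes without loss...
assert decode_latin1(raw).encode("latin-1") == raw

# ...but valid UTF-8 multi-byte text is misread as separate characters.
mojibake = decode_latin1("é".encode("utf-8"))
assert mojibake == "Ã©"

# The NUL pre-scan flags the binary sample and leaves plain text alone.
assert looks_binary(raw)
assert not looks_binary(b"plain text\n")
```

The latin-1 route is lossless at the byte level but garbles legitimate non-ASCII text in the diff, which is one reason decoding as UTF-8 with replacement can be the simpler tradeoff.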
Recommendation: The PR’s approach is the best tradeoff: it preserves provider stability with minimal, targeted changes. Using errors='replace' for .diff keeps the textual headers/context parseable, while skipping non-UTF-8 raw file content prevents crashes on binary assets. Header-based binary detection or binary-hunk filtering could be considered later if replacement characters materially affect downstream behavior, but they add complexity and rely on less reliable signals.
File Changes
Bug fix (1)
Tests (1)
Summary
Fixes #2380.
Gitea PRs that contain binary media changes can currently crash the provider when raw bytes are decoded as UTF-8. This affects both raw file fetches and .diff retrieval before ignored binary extensions can be filtered out cleanly.
This PR makes the Gitea provider tolerate those binary payloads:
Testing
Ran focused local verification against the changed RepoApi methods with invalid UTF-8 bytes:
Also ran syntax checks:
The full pytest test module could not be run in this minimal environment without installing PR-Agent's full pinned dependency set; the added tests follow the existing RepoApi mock style in tests/unittest/test_gitea_provider.py.
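For readers unfamiliar with mock-based tests of this kind, the following is a rough sketch of how a regression test with invalid UTF-8 bytes might look. Everything here is hypothetical: `fetch_file_text`, the `get_raw` client method, and the test names are stand-ins for illustration and are not taken from test_gitea_provider.py or the real RepoApi.

```python
from unittest.mock import MagicMock


def fetch_file_text(api_client, path: str) -> str:
    """Hypothetical stand-in for a provider method that fetches raw bytes."""
    raw = api_client.get_raw(path)  # hypothetical client call returning bytes
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return ""  # tolerate binary payloads instead of raising


def test_binary_file_returns_empty_content():
    client = MagicMock()
    client.get_raw.return_value = b"\xff\xd8\xff\xe0 JPEG-like bytes"
    assert fetch_file_text(client, "media/photo.jpg") == ""
    client.get_raw.assert_called_once_with("media/photo.jpg")


def test_utf8_file_is_returned_unchanged():
    client = MagicMock()
    client.get_raw.return_value = "hello\n".encode("utf-8")
    assert fetch_file_text(client, "README.md") == "hello\n"
```

Because the client is mocked, a test in this style needs no network access or Gitea instance; it only checks that invalid bytes no longer raise.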