Skip to content

Add lfs curl resume#4006

Open
rasapala wants to merge 11 commits intomainfrom
CVS-175912_git_lfs_resume
Open

Add lfs curl resume#4006
rasapala wants to merge 11 commits intomainfrom
CVS-175912_git_lfs_resume

Conversation

@rasapala
Copy link
Collaborator

🛠 Summary

JIRA CVS-175912
Git lfs from libgit2 resume when curl fails to download full file.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.
    ``

@rasapala rasapala changed the title [WIP] Add lfs curl resume Add lfs curl resume Mar 3, 2026
@atobiszei atobiszei requested a review from Copilot March 3, 2026 12:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the Hugging Face (libgit2-based) download flow to better handle Git LFS downloads that fail mid-transfer by adding a libgit2 LFS filter patch with curl resume logic, plus OVMS-side repository status checks and “resume” handling for already-present repositories.

Changes:

  • Update the vendored libgit2 patch to add an LFS smudge/filter implementation and attempt curl resume on partial downloads.
  • Add new StatusCode values/messages for git status failures and unclean repositories.
  • Extend HfDownloader with repository status checking and logic to scan an existing repo for LFS pointer-like files and force-checkout them to re-trigger smudging.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 19 comments.

Show a summary per file
File Description
third_party/libgit2/lfs.patch Adds/updates libgit2 LFS filter and curl partial-download resume handling.
src/status.hpp Introduces new status codes for git status failure/unclean states.
src/status.cpp Adds messages for the new status codes.
src/pull_module/libgit2.hpp Exposes new HfDownloader methods related to repo status/resume.
src/pull_module/libgit2.cpp Implements repo status checking and existing-repo “resume” logic (scan + forced checkout + pull).

}

// Find all files under 'directory' that satisfy the first-3-lines LFS keyword check.
std::vector<fs::path> findLfsLikeFiles(const std::string& directory, bool recursive = true) {
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findLfsLikeFiles is only used within this translation unit but is declared non-static, which unnecessarily exports a symbol and increases the chance of ODR/name collisions. Make it static (or move it to an anonymous namespace).

Suggested change
std::vector<fs::path> findLfsLikeFiles(const std::string& directory, bool recursive = true) {
static std::vector<fs::path> findLfsLikeFiles(const std::string& directory, bool recursive = true) {

Copilot uses AI. Check for mistakes.
Comment on lines +805 to +812
if (matches.empty()) {
std::cout << "No files with LFS-like keywords in the first 3 lines were found.\n";
} else {
std::cout << "Found " << matches.size() << " matching file(s):\n";
for (const auto& p : matches) {
std::cout << " " << p.string() << "\n";
}
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch uses std::cout for runtime logging, which bypasses OVMS logging configuration and can pollute stdout in production. Prefer SPDLOG_* (with appropriate log level) for these messages.

Copilot uses AI. Check for mistakes.
bool CheckIfProxySet();
Status RemoveReadonlyFileAttributeFromDir(const std::string& directoryPath);
Status CheckRepositoryStatus();
int CheckRepositoryForResume();
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CheckRepositoryForResume() is declared on HfDownloader but is not used anywhere in the codebase (only declared/defined). Please remove it or integrate it into the resume flow to avoid dead API surface.

Suggested change
int CheckRepositoryForResume();

Copilot uses AI. Check for mistakes.
Comment on lines +183 to +187
Status HfDownloader::CheckRepositoryStatus() {
git_repository *repo = NULL;
int error = git_repository_open_ext(&repo, this->downloadPath.c_str(), 0, NULL);
if (error < 0) {
const git_error *err = git_error_last();
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CheckRepositoryStatus() and the new HF_GIT_STATUS_* codes add new behavior paths (OK vs UNCLEAN/FAILED) but there are no unit tests covering these outcomes. Add gtests that validate the returned status for a clean repo and for a repo with an untracked/modified file.

Copilot generated this review using guidance from repository custom instructions.
size_t staged = 0, unstaged = 0, untracked = 0, conflicted = 0;
const size_t n = git_status_list_entrycount(status_list); // iterate entries
for (size_t i = 0; i < n; ++i) {
const git_status_entry* e = git_status_byindex(status_list, i);
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git_status_byindex(status_list, i) may return NULL; the current code dereferences e->status unconditionally, which can crash. Add a null check (like in print_changed_and_untracked) before using the entry.

Suggested change
const git_status_entry* e = git_status_byindex(status_list, i);
const git_status_entry* e = git_status_byindex(status_list, i);
if (e == nullptr) {
continue;
}

Copilot uses AI. Check for mistakes.
Comment on lines +876 to +881
+ curl_off_t resume_from = 0;
+ curl_easy_getinfo(
+ dl_curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
+ &resume_from);
+
+ if (resume_from == -1) {
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resume decision uses CURLINFO_CONTENT_LENGTH_DOWNLOAD_T == -1 as a proxy for “server does not support range/resume”, but that field only indicates whether the total size is known, not whether range requests are supported. This can incorrectly skip resume even when ranges are supported (or vice versa); use a range-capability signal (e.g., CURLINFO_ACCEPT_RANGES / HTTP response code/headers) and check the curl_easy_getinfo return value.

Suggested change
+ curl_off_t resume_from = 0;
+ curl_easy_getinfo(
+ dl_curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
+ &resume_from);
+
+ if (resume_from == -1) {
+ curl_off_t content_length = -1;
+ char *accept_ranges = NULL;
+ CURLcode info_status;
+
+ info_status = curl_easy_getinfo(
+ dl_curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
+ &content_length);
+ if (info_status != CURLE_OK) {
+ fprintf(stderr,
+ "\n[ERROR] curl_easy_getinfo() failed to read content length: %s\n",
+ curl_easy_strerror(info_status));
+ }
+
+ info_status = curl_easy_getinfo(
+ dl_curl, CURLINFO_ACCEPT_RANGES,
+ &accept_ranges);
+ if (info_status != CURLE_OK ||
+ accept_ranges == NULL ||
+ strcmp(accept_ranges, "none") == 0) {
+ if (info_status != CURLE_OK) {
+ fprintf(stderr,
+ "\n[ERROR] curl_easy_getinfo() failed to read range support: %s\n",
+ curl_easy_strerror(info_status));
+ }

Copilot uses AI. Check for mistakes.
Comment on lines +353 to 354


Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is trailing whitespace / an extra blank line entry inside statusMessageMap here, which is likely accidental and can trigger style/formatting checks. Please remove the stray whitespace-only line.

Suggested change

Copilot uses AI. Check for mistakes.
Comment on lines 482 to +488
+ git_str full_path = GIT_STR_INIT;
+ if (git_repository_workdir_path(&full_path, repo, path) < 0)
+ if (git_repository_workdir_path(&full_path, repo, path) < 0) {
+ fprintf(stderr,
+ "\n[ERROR] failure, cannot get repository path: %s\n",
+ path);
+ return -1;
+ }
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git_repository_workdir_path(&full_path, ...) allocates into full_path.ptr, and later code stores full_path.ptr in the filter payload. Make sure the payload cleanup frees/disposes this allocation (and any other duplicated strings like LFS oid/size) on both success and error paths; otherwise this leaks per filtered file.

Copilot uses AI. Check for mistakes.
Comment on lines +256 to +259
if (is_unborn || is_detached || staged || unstaged || untracked || conflicted) {
return StatusCode::HF_GIT_STATUS_UNCLEAN;
}
return StatusCode::OK;
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CheckRepositoryStatus() does not free repo on the success path (or on the unclean-status return), leaking the repository handle each call. Free the repository before every return (e.g., via a single cleanup block).

Suggested change
if (is_unborn || is_detached || staged || unstaged || untracked || conflicted) {
return StatusCode::HF_GIT_STATUS_UNCLEAN;
}
return StatusCode::OK;
Status result = StatusCode::OK;
if (is_unborn || is_detached || staged || unstaged || untracked || conflicted) {
result = StatusCode::HF_GIT_STATUS_UNCLEAN;
}
git_repository_free(repo);
return result;

Copilot uses AI. Check for mistakes.
return;
}

const char *path_in_repo = fileToResume.string().c_str();
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path_in_repo is set from fileToResume.string().c_str(), but fileToResume.string() creates a temporary std::string that is destroyed immediately, leaving path_in_repo dangling before it is used by libgit2. Store the string in a local std::string variable that outlives the checkout call.

Suggested change
const char *path_in_repo = fileToResume.string().c_str();
std::string path_in_repo_str = fileToResume.string();
const char *path_in_repo = path_in_repo_str.c_str();

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants