Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the Hugging Face (libgit2-based) download flow to better handle Git LFS downloads that fail mid-transfer by adding a libgit2 LFS filter patch with curl resume logic, plus OVMS-side repository status checks and “resume” handling for already-present repositories.
Changes:
- Update the vendored
libgit2patch to add an LFS smudge/filter implementation and attempt curl resume on partial downloads. - Add new
StatusCodevalues/messages for git status failures and unclean repositories. - Extend
HfDownloaderwith repository status checking and logic to scan an existing repo for LFS pointer-like files and force-checkout them to re-trigger smudging.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
third_party/libgit2/lfs.patch |
Adds/updates libgit2 LFS filter and curl partial-download resume handling. |
src/status.hpp |
Introduces new status codes for git status failure/unclean states. |
src/status.cpp |
Adds messages for the new status codes. |
src/pull_module/libgit2.hpp |
Exposes new HfDownloader methods related to repo status/resume. |
src/pull_module/libgit2.cpp |
Implements repo status checking and existing-repo “resume” logic (scan + forced checkout + pull). |
| } | ||
|
|
||
| // Find all files under 'directory' that satisfy the first-3-lines LFS keyword check. | ||
| std::vector<fs::path> findLfsLikeFiles(const std::string& directory, bool recursive = true) { |
There was a problem hiding this comment.
findLfsLikeFiles is only used within this translation unit but is declared non-static, which unnecessarily exports a symbol and increases the chance of ODR/name collisions. Make it static (or move it to an anonymous namespace).
| std::vector<fs::path> findLfsLikeFiles(const std::string& directory, bool recursive = true) { | |
| static std::vector<fs::path> findLfsLikeFiles(const std::string& directory, bool recursive = true) { |
| if (matches.empty()) { | ||
| std::cout << "No files with LFS-like keywords in the first 3 lines were found.\n"; | ||
| } else { | ||
| std::cout << "Found " << matches.size() << " matching file(s):\n"; | ||
| for (const auto& p : matches) { | ||
| std::cout << " " << p.string() << "\n"; | ||
| } | ||
| } |
There was a problem hiding this comment.
This branch uses std::cout for runtime logging, which bypasses OVMS logging configuration and can pollute stdout in production. Prefer SPDLOG_* (with appropriate log level) for these messages.
| bool CheckIfProxySet(); | ||
| Status RemoveReadonlyFileAttributeFromDir(const std::string& directoryPath); | ||
| Status CheckRepositoryStatus(); | ||
| int CheckRepositoryForResume(); |
There was a problem hiding this comment.
CheckRepositoryForResume() is declared on HfDownloader but is not used anywhere in the codebase (only declared/defined). Please remove it or integrate it into the resume flow to avoid dead API surface.
| int CheckRepositoryForResume(); |
| Status HfDownloader::CheckRepositoryStatus() { | ||
| git_repository *repo = NULL; | ||
| int error = git_repository_open_ext(&repo, this->downloadPath.c_str(), 0, NULL); | ||
| if (error < 0) { | ||
| const git_error *err = git_error_last(); |
There was a problem hiding this comment.
CheckRepositoryStatus() and the new HF_GIT_STATUS_* codes add new behavior paths (OK vs UNCLEAN/FAILED) but there are no unit tests covering these outcomes. Add gtests that validate the returned status for a clean repo and for a repo with an untracked/modified file.
| size_t staged = 0, unstaged = 0, untracked = 0, conflicted = 0; | ||
| const size_t n = git_status_list_entrycount(status_list); // iterate entries | ||
| for (size_t i = 0; i < n; ++i) { | ||
| const git_status_entry* e = git_status_byindex(status_list, i); |
There was a problem hiding this comment.
git_status_byindex(status_list, i) may return NULL; the current code dereferences e->status unconditionally, which can crash. Add a null check (like in print_changed_and_untracked) before using the entry.
| const git_status_entry* e = git_status_byindex(status_list, i); | |
| const git_status_entry* e = git_status_byindex(status_list, i); | |
| if (e == nullptr) { | |
| continue; | |
| } |
third_party/libgit2/lfs.patch
Outdated
| + curl_off_t resume_from = 0; | ||
| + curl_easy_getinfo( | ||
| + dl_curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T, | ||
| + &resume_from); | ||
| + | ||
| + if (resume_from == -1) { |
There was a problem hiding this comment.
The resume decision uses CURLINFO_CONTENT_LENGTH_DOWNLOAD_T == -1 as a proxy for “server does not support range/resume”, but that field only indicates whether the total size is known, not whether range requests are supported. This can incorrectly skip resume even when ranges are supported (or vice versa); use a range-capability signal (e.g., CURLINFO_ACCEPT_RANGES / HTTP response code/headers) and check the curl_easy_getinfo return value.
| + curl_off_t resume_from = 0; | |
| + curl_easy_getinfo( | |
| + dl_curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T, | |
| + &resume_from); | |
| + | |
| + if (resume_from == -1) { | |
| + curl_off_t content_length = -1; | |
| + char *accept_ranges = NULL; | |
| + CURLcode info_status; | |
| + | |
| + info_status = curl_easy_getinfo( | |
| + dl_curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T, | |
| + &content_length); | |
| + if (info_status != CURLE_OK) { | |
| + fprintf(stderr, | |
| + "\n[ERROR] curl_easy_getinfo() failed to read content length: %s\n", | |
| + curl_easy_strerror(info_status)); | |
| + } | |
| + | |
| + info_status = curl_easy_getinfo( | |
| + dl_curl, CURLINFO_ACCEPT_RANGES, | |
| + &accept_ranges); | |
| + if (info_status != CURLE_OK || | |
| + accept_ranges == NULL || | |
| + strcmp(accept_ranges, "none") == 0) { | |
| + if (info_status != CURLE_OK) { | |
| + fprintf(stderr, | |
| + "\n[ERROR] curl_easy_getinfo() failed to read range support: %s\n", | |
| + curl_easy_strerror(info_status)); | |
| + } |
|
|
||
|
|
There was a problem hiding this comment.
There is trailing whitespace / an extra blank line entry inside statusMessageMap here, which is likely accidental and can trigger style/formatting checks. Please remove the stray whitespace-only line.
| + git_str full_path = GIT_STR_INIT; | ||
| + if (git_repository_workdir_path(&full_path, repo, path) < 0) | ||
| + if (git_repository_workdir_path(&full_path, repo, path) < 0) { | ||
| + fprintf(stderr, | ||
| + "\n[ERROR] failure, cannot get repository path: %s\n", | ||
| + path); | ||
| + return -1; | ||
| + } |
There was a problem hiding this comment.
git_repository_workdir_path(&full_path, ...) allocates into full_path.ptr, and later code stores full_path.ptr in the filter payload. Make sure the payload cleanup frees/disposes this allocation (and any other duplicated strings like LFS oid/size) on both success and error paths; otherwise this leaks per filtered file.
| if (is_unborn || is_detached || staged || unstaged || untracked || conflicted) { | ||
| return StatusCode::HF_GIT_STATUS_UNCLEAN; | ||
| } | ||
| return StatusCode::OK; |
There was a problem hiding this comment.
CheckRepositoryStatus() does not free repo on the success path (or on the unclean-status return), leaking the repository handle each call. Free the repository before every return (e.g., via a single cleanup block).
| if (is_unborn || is_detached || staged || unstaged || untracked || conflicted) { | |
| return StatusCode::HF_GIT_STATUS_UNCLEAN; | |
| } | |
| return StatusCode::OK; | |
| Status result = StatusCode::OK; | |
| if (is_unborn || is_detached || staged || unstaged || untracked || conflicted) { | |
| result = StatusCode::HF_GIT_STATUS_UNCLEAN; | |
| } | |
| git_repository_free(repo); | |
| return result; |
| return; | ||
| } | ||
|
|
||
| const char *path_in_repo = fileToResume.string().c_str(); |
There was a problem hiding this comment.
path_in_repo is set from fileToResume.string().c_str(), but fileToResume.string() creates a temporary std::string that is destroyed immediately, leaving path_in_repo dangling before it is used by libgit2. Store the string in a local std::string variable that outlives the checkout call.
| const char *path_in_repo = fileToResume.string().c_str(); | |
| std::string path_in_repo_str = fileToResume.string(); | |
| const char *path_in_repo = path_in_repo_str.c_str(); |
🛠 Summary
JIRA CVS-175912
Git lfs from libgit2 resume when curl fails to download full file.
🧪 Checklist
``