♻️ refactor(project-creation): add full cascade deleteProject() for complete orphan cleanup #4524
Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- validate PR title (required gate) and commit messages (hygiene gate) against emoji-prefixed conventional-commit spec
- add commit-message-check.js with emoji↔type map, VS16 normalization, format/length/case/period validation, merge-commit skip
- add 35 unit tests covering all 12 types, edge cases, and error collection
- bypass via `commit-message-exception` label
- add requireEmoji option (default true) to validateCommitMessage so PR title stays strict while commit messages accept emoji-less conventional-commit format (e.g. "docs(qa): ...")
- auto-skip GitHub web UI patterns (Update/Create/Delete/Rename) in relaxed mode
- update workflow to pass requireEmoji: false for commit-messages job
- add 18 new tests for relaxed mode and isGitHubWebEdit (53 total)
…i/pr-quality-gates
# Conflicts:
#	phpstan.neon
…callers

Declare checked exceptions on all methods using Database::transaction(), which re-throws \Throwable on rollback. Remove 2 stale baseline entries now covered by the broader @throws declaration.
- run PHPUnit with Xdebug coverage on pull requests
- invoke ostico/test-guard@v1.0.1 with 80% coverage threshold
- enable AI-powered test adequacy analysis
- move test-guard from standalone workflow into _ci-cd.yml as a PR-only job that consumes coverage from the tests job
- extract coverage.xml from test container via docker cp and upload as "coverage-report" artifact
- add pull-requests:write and statuses:write permissions to aws_dev.yml and aws_prod.yml callers
- set per-job permissions in _ci-cd.yml (test-guard needs write access, other jobs only need contents:read)
- update docker submodule (php-pcov + --coverage-clover)
- grant models:read permission in PR workflows for repository access

Signed-off-by: domenico <domenico@translated.net>
- update permissions in _ci-cd.yml, aws_prod.yml, and aws_dev.yml
- align with GitHub's recommended permissions model

Signed-off-by: domenico <domenico@translated.net>
…translation failures fatal

- replace getChunksByJobId() re-query in insertPreTranslations() with direct $job->source/$job->target access, avoiding ProxySQL routing standalone SELECT to an unsynced read replica
- remove dead getChunksByJobId() method and ChunkDao import from JobCreationService
- re-throw exception in insertPreTranslations() catch block so failures abort project creation instead of being swallowed
- add clearFailedProject() call in the post-creation catch block to clean up orphaned project/file records on any failure
- consolidate duplicate try/catch in createProject() by moving resolveAndInsertFiles(), determineStatusAndPopulateResult(), and insertFileInstructions() into a single unified method resolveFilesExtractSegmentsAndStoreData()
- rewrite tests: verify exception propagation, addError-before-throw ordering, and first-failure-aborts-foreach behavior
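For illustration, the replica-race part of that change boils down to the following sketch (the surrounding method structure is an assumption; only the `$job->source`/`$job->target` access is taken from the commit message):

```php
// Inside insertPreTranslations(), per job:

// Before — a standalone SELECT that ProxySQL could route to a lagging
// read replica, returning stale/empty rows for a just-created job:
// $chunks = $this->getChunksByJobId($job->id);   // removed as dead code

// After — read the language pair directly from the in-memory Jobs object
// that was just persisted on the writer, so no extra SELECT is issued:
$source = $job->source;
$target = $job->target;
```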
…te orphan cleanup

No FK CASCADE constraints exist in Matecat — manual cascade deletion is required. Covers 15+ tables: comments, qa_chunk_reviews, segment_translation_events/versions, segment_translations, files_job, job_metadata, segment_metadata, segment_notes, segment_original_data, segments, files_parts, files, file_metadata, context_groups, project_metadata, projects, jobs.

Batched segment deletions at 200 rows to avoid replication lag spikes. Includes 7 unit tests covering table order, parameter binding, edge cases (no jobs, no files), and batch splitting.
…roject

Replace the incomplete 2-table cleanup (ProjectDao + FileDao) with full cascade via ProjectManagerModel::deleteProject(). Remove unused FileDao import. Cleanup failures are caught and logged, never rethrown.
#4520)

* 🐛 fix(project-creation): eliminate replication-lag race and make pre-translation failures fatal

  - replace getChunksByJobId() re-query in insertPreTranslations() with direct $job->source/$job->target access, avoiding ProxySQL routing standalone SELECT to an unsynced read replica
  - remove dead getChunksByJobId() method and ChunkDao import from JobCreationService
  - re-throw exception in insertPreTranslations() catch block so failures abort project creation instead of being swallowed
  - add clearFailedProject() call in the post-creation catch block to clean up orphaned project/file records on any failure
  - consolidate duplicate try/catch in createProject() by moving resolveAndInsertFiles(), determineStatusAndPopulateResult(), and insertFileInstructions() into a single unified method resolveFilesExtractSegmentsAndStoreData()
  - rewrite tests: verify exception propagation, addError-before-throw ordering, and first-failure-aborts-foreach behavior

* ♻️ refactor(project-creation): centralize error recording in outer catch and remove redundant email notification

  Remove addError() and sendErrMailReport() from JobCreationService::insertPreTranslations() catch block. Error recording is now handled exclusively by mapSegmentExtractionError() in the outer handler, eliminating duplicate error entries. Debug logging preserved.

  Remove insertPreTranslationsRecordsErrorBeforePropagating test (behavior removed; propagation already covered by insertPreTranslationsFailurePropagatesException).

* 📝 docs: update instructions with used technologies section

  - add PHP 8.3, JavaScript, and React 18 to the list of used technologies in instructions

  Signed-off-by: domenico <domenico@translated.net>

* 📝 docs(project-creation): fix stale docblock, test name, and technology casing

  Update insertPreTranslations() docblock to remove mention of removed email notification. Rename test to insertPreTranslationsFailurePropagatesException to match actual behavior. Fix "Javascript" → "JavaScript" casing in review instructions.

---------

Signed-off-by: domenico <domenico@translated.net>
* fix: subfiltering handlers persist
* Fixed PHPStan error
Merge develop into full-project-cascade-delete. Resolve test conflict: remove addError() assertions from insertPreTranslationsFailurePropagatesException — production code no longer calls addError() in JobCreationService catch block.
Pull request overview
Refactors failed project-creation cleanup by introducing a model-level deleteProject() cascade delete and wiring it into ProjectManager::clearFailedProject() to prevent orphaned segment-scoped rows.
Changes:
- Added `ProjectManagerModel::deleteProject()` plus internal batched deletion helper `deleteInBatches()`.
- Updated `ProjectManager::clearFailedProject()` to delegate cleanup to `deleteProject()`.
- Added/extended unit tests covering cascade order, batching, and `clearFailedProject()` delegation behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| lib/Model/ProjectCreation/ProjectManagerModel.php | Implements cascade project deletion and batched segment-range deletes. |
| lib/Model/ProjectCreation/ProjectManager.php | Switches failed-project cleanup to use the new model cascade delete. |
| tests/unit/Model/ProjectCreation/TestableProjectManager.php | Adds a public wrapper to invoke clearFailedProject() for testing. |
| tests/unit/Model/ProjectCreation/ProjectManagerModelTest.php | Adds unit tests validating delete ordering, batching, and parameters. |
| tests/unit/Model/ProjectCreation/ClearFailedProjectTest.php | Adds tests verifying clearFailedProject() delegates to deleteProject() and swallows cleanup exceptions. |
```php
while ($currentStart <= $lastSegment) {
    $currentEnd = min($currentStart + $batchSize - 1, $lastSegment);

    $params = array_merge($extraParams, [
```
Batching logic can enter an infinite loop when $batchSize <= 0 (currentEnd becomes < currentStart so currentStart never advances). Since deleteProject exposes $batchSize publicly, validate it (>= 1) before calling deleteInBatches (or throw an InvalidArgumentException) to prevent hangs.
Fixed — added `if ($batchSize < 1) { throw new \InvalidArgumentException(...); }` at the top of `deleteProject()`. Two tests cover `batchSize=0` and `batchSize=-5`.
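The guard described above would look roughly like this (a minimal sketch; the exception message text is illustrative, not the exact production wording):

```php
public function deleteProject(int $idProject, int $batchSize = 200): void
{
    // A batch size below 1 would make $currentEnd fall behind $currentStart
    // in deleteInBatches(), so the loop cursor would never advance.
    if ($batchSize < 1) {
        throw new \InvalidArgumentException("batchSize must be >= 1, got {$batchSize}");
    }

    $conn = $this->dbHandler->getConnection();
    // ... cascade phases follow ...
}
```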
```php
public function deleteProject(int $idProject, int $batchSize = 200): void
{
    $conn = $this->dbHandler->getConnection();
```
deleteProject() is described in the PR as being wrapped in a single transaction with rollback on failure, but the implementation performs multiple DELETEs without beginTransaction/commit/rollback. Either add an explicit transaction around the whole cascade (with rollback in a catch) or update the PR description/expectations to match the non-transactional behavior.
Good catch — PR description was wrong. Updated the description to clearly state there is no transaction wrapping. This is intentional: deleteProject() cleans up a failed project creation, so partial progress is preferable to rolling back a long-running cascade. The caller (clearFailedProject) catches exceptions and logs them.
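The caller-side contract described here is, in sketch form (the property and logger names are assumptions, not the exact production code):

```php
private function clearFailedProject(int $idProject): void
{
    try {
        $this->projectManagerModel->deleteProject($idProject);
    } catch (\Throwable $e) {
        // Best-effort cleanup: log and swallow, so a cleanup failure
        // can never mask the original project-creation error.
        $this->logger->error("clearFailedProject({$idProject}) failed: " . $e->getMessage());
    }
}
```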
```php
/**
 * Executes DELETE statements in batches over a segment ID range to avoid
 * large single-statement deletions that spike replication lag.
 *
 * @param PDO $conn
 * @param string $sql DELETE statement with :start and :end placeholders
 * @param int $firstSegment
 * @param int $lastSegment
 * @param int $batchSize
 * @param array<string, mixed> $extraParams Additional bound parameters (e.g. ['id_job' => 10])
 */
public static int $batchSleepMicroseconds = 300000; // 300ms — matches production cleanup script

private function deleteInBatches(
```
The docblock immediately above $batchSleepMicroseconds is describing deleteInBatches(), but in PHP it will attach to the following symbol (the static property). This makes the property documentation incorrect and leaves deleteInBatches() undocumented. Move the property above the docblock, or split into two docblocks so each applies to the intended element.
Fixed — split into two docblocks: a one-line /** @var int ... */ for the $batchSleepMicroseconds property, and the full method docblock moved directly above deleteInBatches().
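After the fix, the split presumably reads like this (reconstructed from the docblock content shown above; the full signature is filled in from the parameter list):

```php
/** @var int Microseconds to sleep between delete batches; 300ms matches the production cleanup script. */
public static int $batchSleepMicroseconds = 300000;

/**
 * Executes DELETE statements in batches over a segment ID range to avoid
 * large single-statement deletions that spike replication lag.
 *
 * @param PDO                  $conn
 * @param string               $sql         DELETE statement with :start and :end placeholders
 * @param int                  $firstSegment
 * @param int                  $lastSegment
 * @param int                  $batchSize
 * @param array<string, mixed> $extraParams Additional bound parameters (e.g. ['id_job' => 10])
 */
private function deleteInBatches(
    PDO    $conn,
    string $sql,
    int    $firstSegment,
    int    $lastSegment,
    int    $batchSize,
    array  $extraParams = []
): void {
    // ... unchanged body ...
}
```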
```php
// --- File-scoped deletions ---
$stmt = $conn->prepare("SELECT id FROM files WHERE id_project = :id_project");
$stmt->execute(['id_project' => $idProject]);
$fileIds = $stmt->fetchAll(PDO::FETCH_COLUMN);

foreach ($fileIds as $idFile) {
    $stmt = $conn->prepare("DELETE FROM files_parts WHERE id_file = :id_file");
    $stmt->execute(['id_file' => $idFile]);
}

$stmt = $conn->prepare("DELETE FROM files WHERE id_project = :id_project");
$stmt->execute(['id_project' => $idProject]);

$stmt = $conn->prepare("DELETE FROM file_metadata WHERE id_project = :id_project");
$stmt->execute(['id_project' => $idProject]);
```
The cascade delete currently removes files_parts/files and file_metadata, but it never deletes from the file_references table (which exists in the schema and is mentioned in the PR description). If file_references rows are created during project creation, they will be left orphaned. Consider adding a DELETE FROM file_references WHERE id_project = :id_project (before deleting files), or adjust the PR description if the table is truly unused.
Fixed — added DELETE FROM file_references WHERE id_project = :id_project in Phase 3. The table is currently unused in PHP code (added Dec 2022 but never populated/read), but included for data hygiene in case rows are created in the future. Test testDeleteProjectDeletesFileReferences verifies it.
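In PDO terms, the added statement is presumably along these lines (a sketch based on the reply above):

```php
// Phase 3: clear file_references before files, so nothing is orphaned
// if this (currently unused) table ever gains rows during creation.
$stmt = $conn->prepare("DELETE FROM file_references WHERE id_project = :id_project");
$stmt->execute(['id_project' => $idProject]);
```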
```php
$firstSegment = (int) min(array_column($jobs, 'job_first_segment'));
$lastSegment  = (int) max(array_column($jobs, 'job_last_segment'));
```
Segment-scoped cleanup uses a project-wide numeric segment-id range derived from min(job_first_segment)..max(job_last_segment). Segment IDs are allocated from a global sequence (see Model\DataAccess\Database::nextSequence()), so this range can include segments belonging to other projects created in between, which would cause cross-project data deletion in the segment_* tables. Prefer deleting via a join constrained to the current project (e.g., segment_* JOIN segments JOIN files WHERE files.id_project = :id_project) and batch on segments.id within that filtered set, rather than deleting purely by BETWEEN range.
Fixed — Phase 2 now iterates per-job, using each job's own job_first_segment/job_last_segment range instead of computing a merged min/max across all jobs. This was a real bug: the production cleanup script (__database_cleaning.php) always operated per-job to avoid exactly this cross-project deletion scenario.
testDeleteProjectWithNonContiguousJobsDoesNotSpanGap verifies two jobs with ranges 100–200 and 500–600 produce separate batch sets (never spanning the 201–499 gap).
Re: using JOIN-based deletes — the per-job range approach matches the production cleanup script's proven pattern and avoids adding JOINs to high-volume DELETE statements that run against replication.
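Sketched out, the per-job iteration described above looks something like this (table list abbreviated; the exact method structure is an assumption):

```php
// Phase 2: segment-scoped cleanup, driven by each job's own range.
// Segment IDs come from a global sequence, so a merged min/max across
// jobs could swallow segments allocated to other projects in between.
foreach ($jobs as $job) {
    $firstSegment = (int) $job['job_first_segment'];
    $lastSegment  = (int) $job['job_last_segment'];

    $this->deleteInBatches(
        $conn,
        "DELETE FROM segment_metadata WHERE id_segment BETWEEN :start AND :end",
        $firstSegment,
        $lastSegment,
        $batchSize
    );
    // ... same pattern for segment_notes, segment_original_data, segments ...
}
```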
```php
$this->deleteInBatches(
    $conn,
    "DELETE FROM context_groups WHERE id_segment BETWEEN :start AND :end",
    $firstSegment,
    $lastSegment,
    $batchSize
);
```
The context_groups delete is filtered only by id_segment range. Since context_groups has an id_project column, this can delete rows for other projects whose segments fall in the same numeric range. Add an id_project = :id_project predicate (and bind it) or delete via a join constrained to the project to avoid cross-project deletion.
Fixed — moved context_groups from Phase 2 (segment range BETWEEN) to Phase 3, using DELETE FROM context_groups WHERE id_project = :id_project. The table has an indexed id_project column (NOT NULL, with KEY id_project_idx), so this is both safer and more efficient than segment-range scoping. Test testDeleteProjectDeletesContextGroupsByProject verifies the query uses id_project and does NOT use BETWEEN.
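The replacement delete would then be along these lines (sketch):

```php
// Phase 3: context_groups carries its own NOT NULL, indexed id_project,
// so scope the delete by project instead of by segment-id range.
$stmt = $conn->prepare("DELETE FROM context_groups WHERE id_project = :id_project");
$stmt->execute(['id_project' => $idProject]);
```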
…se methods

Split 120-line deleteProject() into three private methods to satisfy PHPMD ExcessiveMethodLength threshold (100 lines):

- deleteJobScopedData(): per-job child rows
- deleteSegmentScopedData(): batched segment-range tables
- deleteFileAndProjectScopedData(): files, metadata, root records

No behavior change — same SQL, same order, same parameters.
…ext_groups, docblock split

- Fix cross-project segment deletion: Phase 2 now iterates per-job instead of merging min/max across all jobs (review comment 6)
- Add batchSize guard: throw InvalidArgumentException if < 1 (comment 2)
- Add file_references deletion to Phase 3 (comment 5)
- Move context_groups from Phase 2 (BETWEEN) to Phase 3 (id_project) (comment 7)
- Split misattached docblock on $batchSleepMicroseconds (comment 4)
- Update and add 9 tests covering all review fixes
* feat: added error_code to AI Worker messages
* fixed phpstan
* fixed phpstan
* FE integration

---------

Co-authored-by: pierluigi.dicianni <pierluigi.dicianni@translated.net>
👷 ci: add PR quality gates, commit message enforcement, and QA ICU fix
🧪 Test-Guard Report

✅ PASS — All changed source files have adequate test coverage.

Coverage Analysis: ❌ FAIL — No changed source files found in coverage report (threshold: 80%)
Test File Matching: ❌ FAIL — File matching: 1 pass, 2 fail
Per-File Evaluation: ✅ PASS — Evaluated 3 files: 3 via AI (3 batches), 0 via shortcuts.

Result: ✅ PASS
982ba8a to 3807efb
🧪 Test-Guard Report

✅ PASS — All changed source files have adequate test coverage.

Coverage Analysis: ❌ FAIL — No changed source files found in coverage report (threshold: 80%)
Test File Matching: ❌ FAIL — File matching: 5 pass, 2 warning, 8 fail
Per-File Evaluation: ✅ PASS — AI analysis failed (Error code: 413 - {'error': {'code': 'tokens_limit_reached', 'message': 'Request body too large for gpt-4.1-mini model. Max size: 8000 tokens.'}}) — shortcuts resolved; remaining files deferred to L1+L2.

Result: ✅ PASS
```php
$conn = $this->dbHandler->getConnection();

// Fetch all jobs for the project
$stmt = $conn->prepare("SELECT id, job_first_segment, job_last_segment FROM jobs WHERE id_project = :id_project");
$stmt->execute(['id_project' => $idProject]);
/** @var list<array{id: int, job_first_segment: int, job_last_segment: int}> $jobs */
$jobs = $stmt->fetchAll(PDO::FETCH_ASSOC);

$this->deleteJobScopedData($conn, $jobs, $batchSize);
$this->deleteSegmentScopedData($conn, $jobs, $batchSize);
$this->deleteFileAndProjectScopedData($conn, $idProject, $jobs);
}
```
deleteProject() relies on a SELECT from jobs to drive the rest of the cascade. In environments where reads can be routed to replicas (see existing codebase note about ProxySQL replica lag), this SELECT can return an empty/stale job list right after a failed creation, causing the job-scoped/segment-scoped cleanup (and the final per-job DELETE of jobs) to be skipped and leave orphan rows. Consider making the cascade independent of replica reads (e.g., drive deletes via subqueries on id_project, and/or delete jobs by id_project at the end), or otherwise ensure these reads are executed against the writer connection.
Valid concern in general, but not applicable here. Database::getConnection() returns a single PDO instance connected directly to the writer host — there is no ProxySQL read/write split at the PHP connection level. The $conn used throughout deleteProject() is obtained once via $this->dbHandler->getConnection() and reused for all SELECTs and DELETEs in the same call.
That said, the SELECT on files that drove the N+1 loop was eliminated in c851867 — files_parts is now deleted via a subquery WHERE id_file IN (SELECT id FROM files WHERE id_project = :id_project), so the only remaining SELECT is on jobs at the top of deleteProject(), which is necessary to obtain segment ranges for the batched cascade. That one also runs on the same writer connection.
| $stmt = $conn->prepare("SELECT id FROM files WHERE id_project = :id_project"); | ||
| $stmt->execute(['id_project' => $idProject]); | ||
| $fileIds = $stmt->fetchAll(PDO::FETCH_COLUMN); | ||
|
|
||
| foreach ($fileIds as $idFile) { | ||
| $stmt = $conn->prepare("DELETE FROM files_parts WHERE id_file = :id_file"); | ||
| $stmt->execute(['id_file' => $idFile]); | ||
| } | ||
|
|
The per-file loop issues one DELETE per file to clean files_parts, which can become an N+1 pattern for projects with many files. This can be replaced with a single statement that deletes files_parts by joining/filtering on files.id_project (or an equivalent subquery), which is both faster and avoids depending on a preceding SELECT of file IDs.
| $stmt = $conn->prepare("SELECT id FROM files WHERE id_project = :id_project"); | |
| $stmt->execute(['id_project' => $idProject]); | |
| $fileIds = $stmt->fetchAll(PDO::FETCH_COLUMN); | |
| foreach ($fileIds as $idFile) { | |
| $stmt = $conn->prepare("DELETE FROM files_parts WHERE id_file = :id_file"); | |
| $stmt->execute(['id_file' => $idFile]); | |
| } | |
| $stmt = $conn->prepare( | |
| "DELETE FROM files_parts WHERE id_file IN ( | |
| SELECT id FROM files WHERE id_project = :id_project | |
| )" | |
| ); | |
| $stmt->execute(['id_project' => $idProject]); |
Fixed in c851867. Replaced the SELECT + per-file DELETE loop with a single subquery:

```sql
DELETE FROM files_parts WHERE id_file IN (
    SELECT id FROM files WHERE id_project = :id_project
)
```

This eliminates both the N+1 pattern and the preceding `SELECT id FROM files` entirely.
Summary
Add full cascade `deleteProject()` to `ProjectManagerModel` and wire it into `clearFailedProject()`, replacing the former partial cleanup that left orphaned rows in segment-scoped tables.

Type

- feat — new user-facing feature
- fix — bug fix
- refactor — restructure without behavior change
- chore — build, deps, config, docs
- perf — performance improvement
- test — test coverage

Changes
- `lib/Model/ProjectCreation/ProjectManagerModel.php` — `deleteProject()` with 3-phase cascade: job-scoped → segment-scoped (per-job ranges) → file/project-scoped. Includes `deleteInBatches()` helper, batchSize validation, `file_references` and `context_groups` deletion via `id_project`
- `lib/Model/ProjectCreation/ProjectManager.php` — `deleteProject()` wired into `clearFailedProject()`, replacing partial cleanup with full cascade delete
- `tests/unit/Model/ProjectCreation/ProjectManagerModelTest.php` — unit tests for `deleteProject()`: table order, per-job segment ranges, batch verification, non-contiguous job gap safety, batchSize guard, `context_groups` by `id_project`, `file_references`
- `tests/unit/Model/ProjectCreation/ClearFailedProjectTest.php` — `clearFailedProject()` wiring, exception propagation, idempotent cleanup
- `tests/unit/Model/ProjectCreation/TestableProjectManager.php` — `callClearFailedProject()` test wrapper

Testing
- `vendor/bin/phpunit --exclude-group=ExternalServices --no-coverage` passes
- `./vendor/bin/phpstan` passes (0 errors, with baseline)
- Full suite: 2146 tests, 17986 assertions, 0 failures
AI Disclosure
Claude Code (claude-opus-4-6)
Notes
- `deleteProject()` performs sequential DELETEs without `beginTransaction`/`commit`/`rollback`. This is intentional: a failed project is being cleaned up, and partial progress is preferable to rolling back a long-running delete cascade. The caller (`clearFailedProject`) catches exceptions and logs them.
- Job-scoped order: `comments → qa_chunk_reviews → segment_translation_events → segment_translation_versions → segment_translations → files_job → job_metadata`
- Segment-scoped order: `segment_metadata → segment_notes → segment_original_data → segments` — iterates per-job using each job's `job_first_segment`/`job_last_segment` to avoid spanning gaps that could include other projects' segments
- File/project-scoped order: `files_parts → files → file_references → file_metadata → context_groups → project_metadata → projects → jobs`
- `usleep(300_000)` between batches to avoid replication-lag spikes behind ProxySQL
- `batchSleepMicroseconds` is `public static int` — tests set it to 0, production uses 300ms
- `batchSize` validated ≥ 1 to prevent infinite loop in `deleteInBatches()`
- `context_groups` deleted via `WHERE id_project` (has indexed `id_project` column), not by segment range
- `file_references` included for data hygiene despite being currently unused in PHP code

A sketch of the batching helper built from these notes follows below.
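Putting the Notes together with the snippets reviewed above, the batching helper plausibly reads like this (a reconstruction, not the verbatim implementation; preparing the statement once and skipping the sleep after the final batch are assumptions):

```php
private function deleteInBatches(
    PDO    $conn,
    string $sql,
    int    $firstSegment,
    int    $lastSegment,
    int    $batchSize,
    array  $extraParams = []
): void {
    $stmt         = $conn->prepare($sql);
    $currentStart = $firstSegment;

    while ($currentStart <= $lastSegment) {
        // Clamp the batch to the end of the segment-id range.
        $currentEnd = min($currentStart + $batchSize - 1, $lastSegment);

        $params = array_merge($extraParams, [
            'start' => $currentStart,
            'end'   => $currentEnd,
        ]);
        $stmt->execute($params);

        $currentStart = $currentEnd + 1;

        // Throttle between batches so replicas behind ProxySQL can catch up.
        if ($currentStart <= $lastSegment && self::$batchSleepMicroseconds > 0) {
            usleep(self::$batchSleepMicroseconds);
        }
    }
}
```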