feat: dont regeocode records that havent changed in importdatarecords#419
feat: dont regeocode records that havent changed in importdatarecords#419
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes the importDataRecords job to avoid re-geocoding records when the incoming record JSON hasn’t changed, and adds test coverage to verify the behavior (including deep-equality behavior with differing JSON key order).
Changes:
- Extend
importBatch()to optionally reuse an existing record’s geocode result when the typed JSON is unchanged (deep equality check). - Update
importDataRecordsto fetch existingjson/geocode fields for dirty records and pass them intoimportBatch(). - Implement
CSVAdaptor.fetchByExternalId()and add Vitest coverage for “skip geocode when unchanged” vs “re-geocode when changed”.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/setup.ts | Stops starting the worker during test setup. |
| tests/feature/importDataRecords.test.ts | Adds tests covering geocode skipping, re-geocoding on change, and key-order deep equality. |
| src/server/jobs/importDataSource.ts | Adds deep-equality-based geocode reuse logic in importBatch() and updates its call site. |
| src/server/jobs/importDataRecords.ts | Fetches existing record JSON/geocode info to support the “skip geocode when unchanged” path. |
| src/server/adaptors/csv.ts | Implements fetchByExternalId() by scanning the CSV stream and returning matching row IDs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const existing = existingRecords?.get(record.externalId); | ||
| const jsonUnchanged = | ||
| existing && isDeepStrictEqual(typedJson, existing.json); | ||
|
|
||
| let geocodeResult: GeocodeResult | null; | ||
| if (jsonUnchanged) { | ||
| geocodeResult = existing.geocodeResult; | ||
| } else { | ||
| geocodeResult = await geocodeRecord(record, dataSource.geocodingConfig); | ||
| } |
There was a problem hiding this comment.
existing can be undefined here, but is accessed in the if (jsonUnchanged) branch (existing.geocodeResult). With strict TypeScript this should be a compile error, and at runtime it can also throw if the map doesn't contain the externalId. Consider folding the check into a single condition (e.g. if (existing && isDeepStrictEqual(...))) so existing is guaranteed defined when reading its fields.
| const dataRecords = db | ||
| .selectFrom("dataRecord") | ||
| .select(["id", "externalId"]) | ||
| .select(["id", "externalId", "json", "geocodeResult", "geocodePoint"]) | ||
| .where("dataSourceId", "=", dataSourceId) |
There was a problem hiding this comment.
geocodePoint is selected from the DB but not used anywhere (only json/geocodeResult are consulted for the skip logic, and geocodePoint isn't read). This adds extra DB payload (including a PostGIS geography) with no effect. Either drop geocodePoint from the select, or actually use it when jsonUnchanged to preserve the existing point explicitly.
| json: Record<string, unknown>; | ||
| geocodeResult: GeocodeResult | null; | ||
| geocodePoint: Point | null; | ||
| } |
There was a problem hiding this comment.
The existingRecords value type includes geocodePoint, but importBatch() never reads it (and always derives geocodePoint from geocodeResult). Consider removing geocodePoint from this type unless you plan to use it, to avoid dead fields and make the intent clearer.
| // 4. Run importDataRecords — data hasn't changed, so geocode should be skipped | ||
| await importDataRecords({ dataSourceId: dataSource.id }); | ||
|
|
||
| // 5. Verify geocodeRecord was never called | ||
| expect(geocodeSpy).not.toHaveBeenCalled(); | ||
|
|
||
| // 6. Verify records still have their geocode results | ||
| const recordsAfterSecondImport = await getRecords(dataSource.id); | ||
| expect(recordsAfterSecondImport).toEqual(recordsAfterFirstImport); | ||
|
|
||
| geocodeSpy.mockRestore(); |
There was a problem hiding this comment.
geocodeSpy.mockRestore() is called at the end of the test, but if an assertion throws earlier the spy may never be restored and can leak into subsequent tests. Consider using try/finally around the assertions or moving cleanup to an afterEach(() => vi.restoreAllMocks()).
| // 4. Run importDataRecords — data hasn't changed, so geocode should be skipped | |
| await importDataRecords({ dataSourceId: dataSource.id }); | |
| // 5. Verify geocodeRecord was never called | |
| expect(geocodeSpy).not.toHaveBeenCalled(); | |
| // 6. Verify records still have their geocode results | |
| const recordsAfterSecondImport = await getRecords(dataSource.id); | |
| expect(recordsAfterSecondImport).toEqual(recordsAfterFirstImport); | |
| geocodeSpy.mockRestore(); | |
| try { | |
| // 4. Run importDataRecords — data hasn't changed, so geocode should be skipped | |
| await importDataRecords({ dataSourceId: dataSource.id }); | |
| // 5. Verify geocodeRecord was never called | |
| expect(geocodeSpy).not.toHaveBeenCalled(); | |
| // 6. Verify records still have their geocode results | |
| const recordsAfterSecondImport = await getRecords(dataSource.id); | |
| expect(recordsAfterSecondImport).toEqual(recordsAfterFirstImport); | |
| } finally { | |
| geocodeSpy.mockRestore(); | |
| } |
No description provided.