Skip to content

feat: dont regeocode records that havent changed in importdatarecords#419

Merged
joaquimds merged 1 commit intomainfrom
feat/less-geocoding
Apr 16, 2026
Merged

feat: dont regeocode records that havent changed in importdatarecords#419
joaquimds merged 1 commit intomainfrom
feat/less-geocoding

Conversation

@joaquimds
Copy link
Copy Markdown
Member

No description provided.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the importDataRecords job to avoid re-geocoding records when the incoming record JSON hasn’t changed, and adds test coverage to verify the behavior (including deep-equality behavior with differing JSON key order).

Changes:

  • Extend importBatch() to optionally reuse an existing record’s geocode result when the typed JSON is unchanged (deep equality check).
  • Update importDataRecords to fetch existing json/geocode fields for dirty records and pass them into importBatch().
  • Implement CSVAdaptor.fetchByExternalId() and add Vitest coverage for “skip geocode when unchanged” vs “re-geocode when changed”.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/setup.ts Stops starting the worker during test setup.
tests/feature/importDataRecords.test.ts Adds tests covering geocode skipping, re-geocoding on change, and key-order deep equality.
src/server/jobs/importDataSource.ts Adds deep-equality-based geocode reuse logic in importBatch() and updates its call site.
src/server/jobs/importDataRecords.ts Fetches existing record JSON/geocode info to support the “skip geocode when unchanged” path.
src/server/adaptors/csv.ts Implements fetchByExternalId() by scanning the CSV stream and returning matching row IDs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +140 to +149
const existing = existingRecords?.get(record.externalId);
const jsonUnchanged =
existing && isDeepStrictEqual(typedJson, existing.json);

let geocodeResult: GeocodeResult | null;
if (jsonUnchanged) {
geocodeResult = existing.geocodeResult;
} else {
geocodeResult = await geocodeRecord(record, dataSource.geocodingConfig);
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existing can be undefined here, but is accessed in the if (jsonUnchanged) branch (existing.geocodeResult). With strict TypeScript this should be a compile error, and at runtime it can also throw if the map doesn't contain the externalId. Consider folding the check into a single condition (e.g. if (existing && isDeepStrictEqual(...))) so existing is guaranteed defined when reading its fields.

Copilot uses AI. Check for mistakes.
Comment on lines 20 to 23
const dataRecords = db
.selectFrom("dataRecord")
.select(["id", "externalId"])
.select(["id", "externalId", "json", "geocodeResult", "geocodePoint"])
.where("dataSourceId", "=", dataSourceId)
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geocodePoint is selected from the DB but not used anywhere (only json/geocodeResult are consulted for the skip logic, and geocodePoint isn't read). This adds extra DB payload (including a PostGIS geography) with no effect. Either drop geocodePoint from the select, or actually use it when jsonUnchanged to preserve the existing point explicitly.

Copilot uses AI. Check for mistakes.
Comment on lines +128 to +131
json: Record<string, unknown>;
geocodeResult: GeocodeResult | null;
geocodePoint: Point | null;
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existingRecords value type includes geocodePoint, but importBatch() never reads it (and always derives geocodePoint from geocodeResult). Consider removing geocodePoint from this type unless you plan to use it, to avoid dead fields and make the intent clearer.

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +101
// 4. Run importDataRecords — data hasn't changed, so geocode should be skipped
await importDataRecords({ dataSourceId: dataSource.id });

// 5. Verify geocodeRecord was never called
expect(geocodeSpy).not.toHaveBeenCalled();

// 6. Verify records still have their geocode results
const recordsAfterSecondImport = await getRecords(dataSource.id);
expect(recordsAfterSecondImport).toEqual(recordsAfterFirstImport);

geocodeSpy.mockRestore();
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geocodeSpy.mockRestore() is called at the end of the test, but if an assertion throws earlier the spy may never be restored and can leak into subsequent tests. Consider using try/finally around the assertions or moving cleanup to an afterEach(() => vi.restoreAllMocks()).

Suggested change
// 4. Run importDataRecords — data hasn't changed, so geocode should be skipped
await importDataRecords({ dataSourceId: dataSource.id });
// 5. Verify geocodeRecord was never called
expect(geocodeSpy).not.toHaveBeenCalled();
// 6. Verify records still have their geocode results
const recordsAfterSecondImport = await getRecords(dataSource.id);
expect(recordsAfterSecondImport).toEqual(recordsAfterFirstImport);
geocodeSpy.mockRestore();
try {
// 4. Run importDataRecords — data hasn't changed, so geocode should be skipped
await importDataRecords({ dataSourceId: dataSource.id });
// 5. Verify geocodeRecord was never called
expect(geocodeSpy).not.toHaveBeenCalled();
// 6. Verify records still have their geocode results
const recordsAfterSecondImport = await getRecords(dataSource.id);
expect(recordsAfterSecondImport).toEqual(recordsAfterFirstImport);
} finally {
geocodeSpy.mockRestore();
}

Copilot uses AI. Check for mistakes.
@joaquimds joaquimds merged commit f2ca1b3 into main Apr 16, 2026
5 checks passed
@joaquimds joaquimds deleted the feat/less-geocoding branch April 16, 2026 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants