Merged
3 changes: 3 additions & 0 deletions ROADMAP.md
@@ -185,6 +185,7 @@ In Progress
- [ ] Audit real-corpus result quality now that field-aware ranking, phrase weighting, truncation cues, and multi-term snippets are in place.
- [ ] Decide whether title-only hits should suppress body snippets or use a different presentation policy in the public facade.
- [ ] Keep the persistent `FetchKitLibrary` construction and search API surface under review as real callers exercise the current design.
- [ ] Explore an opt-in extended snippet surface that can use idle time to precompute short document summaries for larger records, with Apple's [`FoundationModels`](https://developer.apple.com/documentation/foundationmodels) or another local summarization path as the first candidate instead of making foreground full-text search wait on summarization.

### Exit Criteria

@@ -258,6 +259,8 @@ Planned
- Fixed Search Kit index ownership during teardown so the Search Kit verification lane is green again under both `swift test` and `xcodebuild test`.
- Added a dedicated repo-maintenance helper for the focused Search Kit test lane and recorded persistent-surface polish plus ranking/snippet refinement as the next FetchKit work.
- Tightened the persistent `FetchKitLibrary` surface around one resolved storage location, with Application Support defaults plus a direct directory override for local callers.
- Added the first checked-in fixture corpus for `FetchKit` result-quality characterization, using a tiny attributed Hugging Face Project Gutenberg sample without adding a live dataset-download dependency to CI.
- Kept title-only snippets as the default result explanation and added typed result metadata for matched fields plus snippet source field, so consumers can distinguish title evidence from body evidence.
- Recorded that the GitHub-hosted `macos-15` Natural Language verification attempt timed out, so Apple-asset coverage stays local-only for now.
- Audited the Core Data-backed `FetchKit` store after a GitHub-hosted Swift Testing crash, recorded the executor-assumption findings, moved Core Data verification onto XCTest, and switched the durable store over to a private-queue Core Data context with the framework's async `perform` path.
- Refined conventional-search result quality with modest field-aware ranking plus query-aware multi-term snippets across the in-memory and SearchKit-backed `FetchKit` paths.
8 changes: 7 additions & 1 deletion Sources/FetchCore/Search.swift
@@ -53,14 +53,20 @@ public struct FetchSearchResult: Hashable, Codable, Sendable {
 public let document: FetchDocument
 public let score: Double
 public let snippet: FetchSnippet?
+public let matchedFields: Set<FetchSearchField>

P1: Preserve backward-compatible decoding for `FetchSearchResult`

Adding `matchedFields` as a non-optional stored property on this public `Codable` type makes synthesized decoding require the key, so JSON produced before this change now fails with `keyNotFound` when decoded. This affects any caller that persists or caches `FetchSearchResult` across app or package upgrades. A manual `init(from:)` should decode `matchedFields` with a fallback (for example `decodeIfPresent(...) ?? []`) so older payloads stay readable.
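
A minimal sketch of that fallback, under stated assumptions: `SearchField` and `SearchResult` below are hypothetical stand-ins for `FetchSearchField` and `FetchSearchResult`, with the document and snippet payloads reduced to a plain `documentID` so the example is self-contained. It is not the package's actual implementation, only an illustration of decoding a payload written before `matchedFields` existed.

```swift
import Foundation

// Hypothetical stand-in for FetchSearchField.
enum SearchField: String, Hashable, Codable, Sendable {
    case title
    case body
}

// Hypothetical, trimmed stand-in for FetchSearchResult.
struct SearchResult: Hashable, Codable, Sendable {
    let documentID: String
    let score: Double
    let matchedFields: Set<SearchField>
    let snippetField: SearchField?

    init(
        documentID: String,
        score: Double,
        matchedFields: Set<SearchField> = [],
        snippetField: SearchField? = nil
    ) {
        self.documentID = documentID
        self.score = score
        self.matchedFields = matchedFields
        self.snippetField = snippetField
    }

    private enum CodingKeys: String, CodingKey {
        case documentID, score, matchedFields, snippetField
    }

    // Manual decoding: a missing "matchedFields" key falls back to an empty set
    // instead of throwing DecodingError.keyNotFound.
    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        documentID = try container.decode(String.self, forKey: .documentID)
        score = try container.decode(Double.self, forKey: .score)
        matchedFields = try container.decodeIfPresent(
            Set<SearchField>.self,
            forKey: .matchedFields
        ) ?? []
        snippetField = try container.decodeIfPresent(SearchField.self, forKey: .snippetField)
    }
}

// A payload shaped like one written before the new keys were added.
let legacyJSON = Data(#"{"documentID":"doc-apple","score":0.9}"#.utf8)
let decoded = try JSONDecoder().decode(SearchResult.self, from: legacyJSON)
```

Encoding can stay synthesized in this sketch, so newly written payloads still carry `matchedFields` and only the read path tolerates its absence.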

+public let snippetField: FetchSearchField?

 public init(
 document: FetchDocument,
 score: Double,
-snippet: FetchSnippet? = nil
+snippet: FetchSnippet? = nil,
+matchedFields: Set<FetchSearchField> = [],
+snippetField: FetchSearchField? = nil
 ) {
 self.document = document
 self.score = score
 self.snippet = snippet
+self.matchedFields = matchedFields
+self.snippetField = snippetField
 }
 }
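
For illustration only, here is one way a consumer might use the two new properties to tell title evidence from body evidence. `SearchField`, `ResultEvidence`, and `evidenceLabel(for:)` are hypothetical names introduced for this sketch and are not part of `FetchCore`; the label strings are likewise assumptions.

```swift
// Hypothetical stand-in for FetchSearchField.
enum SearchField: String, Hashable {
    case title
    case body
}

// Hypothetical view of the two metadata properties added to the result type.
struct ResultEvidence {
    let matchedFields: Set<SearchField>
    let snippetField: SearchField?
}

// Chooses a presentation label from where the snippet came from
// and which fields matched the query.
func evidenceLabel(for evidence: ResultEvidence) -> String {
    switch evidence.snippetField {
    case .some(.title):
        // The snippet is the title itself; note when the body matched as well.
        return evidence.matchedFields.contains(.body) ? "Title and body match" : "Title match"
    case .some(.body):
        return "Body excerpt"
    case .none:
        return "No snippet"
    }
}

let titleOnly = ResultEvidence(matchedFields: [.title], snippetField: .title)
let bodyHit = ResultEvidence(matchedFields: [.body], snippetField: .body)
```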
4 changes: 3 additions & 1 deletion Sources/FetchKit/InMemoryFetchIndex.swift
@@ -68,7 +68,9 @@ actor InMemoryFetchIndex: FetchIndex {
 score: score,
 snippet: snippetMatch.flatMap { match in
 FetchSearchSupport.buildSnippet(from: match.text, query: query)
-}
+},
+matchedFields: Set(matches.map(\.field)),
+snippetField: snippetMatch?.field
 )
 }

20 changes: 14 additions & 6 deletions Sources/FetchKit/SearchKitFetchIndex.swift
@@ -317,23 +317,29 @@ public actor SearchKitFetchIndex: FetchIndex {
 }

 let score = existing.score + new.score
-let snippet = preferredSnippet(existing: existing, new: new)
+let snippetMatch = preferredSnippet(existing: existing, new: new)
 return FetchSearchResult(
 document: existing.document,
 score: score,
-snippet: snippet
+snippet: snippetMatch.snippet,
+matchedFields: existing.matchedFields.union([new.field]),
+snippetField: snippetMatch.field
 )
 }

 private func preferredSnippet(
 existing: FetchSearchResult,
 new: FieldSearchMatch
-) -> FetchSnippet? {
+) -> (snippet: FetchSnippet?, field: FetchSearchField?) {
 if new.field == .body, new.snippet != nil {
-return new.snippet
+return (new.snippet, new.field)
 }

-return existing.snippet ?? new.snippet
+if let existingSnippet = existing.snippet {
+return (existingSnippet, existing.snippetField)
+}
+
+return (new.snippet, new.snippet == nil ? nil : new.field)
 }

 private func normalize(
@@ -413,7 +419,9 @@ private struct FieldSearchMatch {
 FetchSearchResult(
 document: document,
 score: score,
-snippet: snippet
+snippet: snippet,
+matchedFields: [field],
+snippetField: snippet == nil ? nil : field
 )
 }
 }
21 changes: 21 additions & 0 deletions Tests/FetchCoreTests/FetchCoreModelTests.swift
@@ -59,6 +59,27 @@ struct FetchCoreSearchModelTests {
#expect(result.score == 0.9)
#expect(result.snippet?.text == "Apples are bright and crisp.")
#expect(result.snippet?.matchRanges == [FetchMatchRange(lowerBound: 0, upperBound: 6)])
#expect(result.matchedFields.isEmpty)
#expect(result.snippetField == nil)
}

@Test("Fetch search results can describe matched fields and snippet source")
func fetchSearchResultsDescribeMatchedFields() {
let document = FetchDocument(
id: "doc-apple",
title: "Apple Guide",
body: "Apples are bright and crisp."
)
let result = FetchSearchResult(
document: document,
score: 0.9,
snippet: FetchSnippet(text: "Apple Guide"),
matchedFields: [.title],
snippetField: .title
)

#expect(result.matchedFields == [.title])
#expect(result.snippetField == .title)
}

@Test("Fetch document records keep durable metadata separate from search and index views")
8 changes: 8 additions & 0 deletions Tests/FetchKitTests/FetchKitLibraryTests.swift
@@ -116,6 +116,8 @@ struct FetchKitLibraryTests {
#expect(results.count == 1)
#expect(results[0].document.id == "doc-apple")
#expect(results[0].snippet?.text.contains("bright") == true)
#expect(results[0].matchedFields == [.body])
#expect(results[0].snippetField == .body)
}

@Test("FetchKitLibrary prefers title matches over body-only matches")
@@ -139,6 +141,10 @@ struct FetchKitLibraryTests {

#expect(results.count == 2)
#expect(results.map(\.document.id) == ["doc-title", "doc-body"])
#expect(results[0].matchedFields == [.title])
#expect(results[0].snippetField == .title)
#expect(results[1].matchedFields == [.body])
#expect(results[1].snippetField == .body)
}

@Test("FetchKitLibrary snippets highlight multiple query terms")
@@ -159,6 +165,8 @@ struct FetchKitLibraryTests {
#expect(snippet.text.localizedCaseInsensitiveContains("bright"))
#expect(snippet.text.localizedCaseInsensitiveContains("crisp"))
#expect(snippet.matchRanges.count >= 2)
#expect(results.first?.matchedFields == [.body])
#expect(results.first?.snippetField == .body)
}

@Test("FetchKitLibrary snippets show truncation markers when context is cropped")
83 changes: 83 additions & 0 deletions Tests/FetchKitTests/FixtureCorpusQualityTests.swift
@@ -0,0 +1,83 @@
import FetchCore
import Testing
@testable import FetchKit

@Suite("FetchKit fixture corpus quality", .serialized)
struct FixtureCorpusQualityTests {
@Test("Fixture corpus records carry source attribution")
func fixtureCorpusRecordsCarrySourceAttribution() {
#expect(GutenbergMiniCorpus.source.datasetID == "zkeown/gutenberg-corpus")
#expect(GutenbergMiniCorpus.source.config == "chapters")
#expect(GutenbergMiniCorpus.source.split == "train")
#expect(GutenbergMiniCorpus.records.allSatisfy { $0.sourceURI == GutenbergMiniCorpus.source.url })
#expect(GutenbergMiniCorpus.records.allSatisfy { $0.metadata["fixture.dataset"] == GutenbergMiniCorpus.source.datasetID })
}

@Test("Fixture corpus retrieves a body-driven chapter hit")
func fixtureCorpusRetrievesBodyDrivenChapterHit() async throws {
let library = try await indexedFixtureLibrary()

let results = try await library.search(
"storage food seeds",
kind: .allTerms,
fields: [.title, .body],
limit: 3
)
let firstResult = try #require(results.first)

#expect(firstResult.document.id == "gutenberg-78430-chapter-1")
#expect(firstResult.snippet?.text.localizedCaseInsensitiveContains("storage") == true)
#expect(firstResult.snippet?.text.localizedCaseInsensitiveContains("food") == true)
#expect(firstResult.snippet?.text.localizedCaseInsensitiveContains("seeds") == true)
#expect((firstResult.snippet?.matchRanges.count ?? 0) >= 3)
#expect(firstResult.matchedFields == [.body])
#expect(firstResult.snippetField == .body)
}

@Test("Fixture corpus keeps closely related chapters separate")
func fixtureCorpusKeepsRelatedChaptersSeparate() async throws {
let library = try await indexedFixtureLibrary()

let foodStorageResults = try await library.search(
"storage food seeds",
kind: .allTerms,
fields: [.body],
limit: 4
)
let germinationResults = try await library.search(
"germinating seed organic",
kind: .allTerms,
fields: [.body],
limit: 4
)

#expect(foodStorageResults.map(\.document.id) == ["gutenberg-78430-chapter-1"])
#expect(germinationResults.map(\.document.id) == ["gutenberg-78430-chapter-2"])
}

@Test("Fixture corpus title-only hits use the title as the current snippet source")
func fixtureCorpusTitleOnlyHitUsesTitleSnippet() async throws {
let library = try await indexedFixtureLibrary()

let results = try await library.search(
"rocket test pilot",
kind: .allTerms,
fields: [.title, .body],
limit: 3
)
let firstResult = try #require(results.first)
let snippet = try #require(firstResult.snippet)

#expect(firstResult.document.id == "gutenberg-78431-book")
#expect(firstResult.matchedFields == [.title])
#expect(firstResult.snippetField == .title)
#expect(snippet.text.localizedCaseInsensitiveContains("rocket test pilot"))
#expect(!snippet.text.localizedCaseInsensitiveContains("Transcriber's Note"))
}

private func indexedFixtureLibrary() async throws -> FetchKitLibrary {
let library = FetchKitLibrary()
try await library.addDocuments(GutenbergMiniCorpus.records)
return library
}
}
94 changes: 94 additions & 0 deletions Tests/FetchKitTests/Fixtures/GutenbergMiniCorpus.swift
@@ -0,0 +1,94 @@
import FetchCore

enum GutenbergMiniCorpus {
struct Source: Hashable, Sendable {
let datasetID: String
let config: String
let split: String
let license: String
let url: String
}

static let source = Source(
datasetID: "zkeown/gutenberg-corpus",
config: "chapters",
split: "train",
license: "Apache-2.0 dataset packaging; source texts marked public domain in the USA",
url: "https://huggingface.co/datasets/zkeown/gutenberg-corpus"
)

static let records: [FetchDocumentRecord] = [
FetchDocumentRecord(
id: "gutenberg-78430-chapter-1",
title: "A practical course in botany: Chapter I. The Seed",
body: """
I. The storage of food in seeds.

Material. In addition to the four food tests described in the course, provide raw starch, grape sugar, the white of a hard-boiled egg, and a fatty substance such as lard or oil. Living material includes grains of corn and wheat, and seeds of some kind of bean.
""",
kind: .reference,
language: "en",
sourceURI: source.url,
metadata: [
"fixture.dataset": source.datasetID,
"fixture.config": source.config,
"fixture.split": source.split,
"fixture.row": "2",
"fixture.gutenbergID": "78430",
]
),
FetchDocumentRecord(
id: "gutenberg-78430-chapter-2",
title: "A practical course in botany: Chapter II. Germination and Growth",
body: """
Processes accompanying germination.

Material includes corn, peas, beans, or any quickly germinating seed. Before taking up the study of germinating seeds, it is important to learn from what sources the organic substances used by the growing plant are derived.
""",
kind: .reference,
language: "en",
sourceURI: source.url,
metadata: [
"fixture.dataset": source.datasetID,
"fixture.config": source.config,
"fixture.split": source.split,
"fixture.row": "3",
"fixture.gutenbergID": "78430",
]
),
FetchDocumentRecord(
id: "gutenberg-78431-book",
title: "Always Another Dawn: The Story of a Rocket Test Pilot",
body: """
Transcriber's Note: Italicized text is surrounded by underscores. The opening material identifies A. Scott Crossfield with Clay Blair, Jr. and includes publisher front matter before the main narrative begins.
""",
kind: .article,
language: "en",
sourceURI: source.url,
metadata: [
"fixture.dataset": source.datasetID,
"fixture.config": "books",
"fixture.split": "train",
"fixture.row": "2",
"fixture.gutenbergID": "78431",
]
),
FetchDocumentRecord(
id: "gutenberg-78432-book",
title: "The young pioneers of the North-west",
body: """
Transcriber's note: Unusual and inconsistent spelling is as printed. The frontier series opening material introduces a juvenile fiction setting around pioneer children, conduct of life, and frontier life.
""",
kind: .article,
language: "en",
sourceURI: source.url,
metadata: [
"fixture.dataset": source.datasetID,
"fixture.config": "books",
"fixture.split": "train",
"fixture.row": "3",
"fixture.gutenbergID": "78432",
]
),
]
}