feat: add RedisBackedAC with preset-optimized engine and benchmarks#124
Conversation
Implement Redis-backed Aho-Corasick engine (RedisBackedAC) with selectable architecture presets (Speed, Balanced, MemoryEfficient, Ultimate). Redis serves as source of truth; a local preset-optimized automaton is cached for fast reads. Writes use V2 Lua scripts with optimistic locking; reads hit the local engine with no Redis I/O on the hot path. Also add in-memory engine implementations for all 4 presets, comprehensive tests (15 tests with race detector), and a full benchmark suite covering unit, E2E, scale, comparison, unicode, and memory profiling categories.
Reviewer's GuideIntroduces a new pluggable Aho-Corasick engine architecture with in-memory and Redis-backed preset-optimized engines, wires them into the existing AhoCorasick.Create API via new InMemory and Preset options, and adds tests/benchmarks plus documentation for the new modes and presets. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
RedisBackedAC.ensureValidyou callreloadFromRedisLocked, which performsHGetAllwhile holdingac.mu—this can unnecessarily block all readers during slow Redis operations; consider restructuring to fetch the snapshot outside the mutex and only hold the lock while swapping in the new engine state. - There is quite a bit of overlap between
reloadFromRedis,reloadFromRedisLocked, andreadTrieSnapshotin how trie state is loaded and interpreted; consolidating these into a single snapshot-loading helper would reduce duplication and the risk of subtle divergence between code paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `RedisBackedAC.ensureValid` you call `reloadFromRedisLocked`, which performs `HGetAll` while holding `ac.mu`—this can unnecessarily block all readers during slow Redis operations; consider restructuring to fetch the snapshot outside the mutex and only hold the lock while swapping in the new engine state.
- There is quite a bit of overlap between `reloadFromRedis`, `reloadFromRedisLocked`, and `readTrieSnapshot` in how trie state is loaded and interpreted; consolidating these into a single snapshot-loading helper would reduce duplication and the risk of subtle divergence between code paths.
## Individual Comments
### Comment 1
<location path="pkg/acor/engine_flat.go" line_range="209-218" />
<code_context>
+ return matched
+}
+
+func (e *speedEngine) info() *InMemoryInfo {
+ if e.dfa == nil {
+ return &InMemoryInfo{Preset: e.preset}
+ }
+ var mem int64
+ for _, row := range e.dfa {
+ mem += int64(len(row)) * 8
+ }
+ for _, outs := range e.outputMap {
+ mem += int64(16 + len(outs)*16)
+ }
+ mem += int64(len(e.alphabet)) * 16
+ mem += int64(len(e.alphaMap)) * 24
+
+ return &InMemoryInfo{
+ Keywords: e.countKeywords(),
+ Nodes: e.numStates,
</code_context>
<issue_to_address>
**suggestion:** InMemoryInfo from speedEngine does not populate TrieDepth, unlike other engines
info() leaves TrieDepth at its zero value here, while the other engines populate it, which makes preset comparisons less consistent. Since you already know node depths when building the flat trie, consider tracking the maximum depth there and setting TrieDepth in InMemoryInfo for this engine as well.
</issue_to_address>
### Comment 2
<location path="pkg/acor/engine_test.go" line_range="105-106" />
<code_context>
+ for _, kw := range tc.keywords {
+ ac.Add(kw)
+ }
+ matches := ac.Find(tc.text)
+ if !equalUnordered(matches, tc.expected) {
+ t.Errorf("expected %v, got %v", tc.expected, matches)
+ }
</code_context>
<issue_to_address>
**suggestion (testing):** Tests use order-insensitive comparison while the API comment says results are returned in order of occurrence
`TestInMemoryFind` uses `equalUnordered`, but `InMemoryAC.Find` promises results "in order of occurrence". Order-insensitive checks could mask regressions if the implementation stops preserving order. Please either assert the specific expected order (especially for overlapping/repeated matches) or update the API comment if ordering is not guaranteed.
Suggested implementation:
```golang
import (
"reflect"
```
```golang
matches := ac.Find(tc.text)
if !reflect.DeepEqual(matches, tc.expected) {
t.Errorf("expected matches in order %v, got %v", tc.expected, matches)
}
```
</issue_to_address>
### Comment 3
<location path="pkg/acor/redis_backed_test.go" line_range="426-421" />
<code_context>
+ }
+}
+
+func TestRedisBackedDegradedMode(t *testing.T) {
+ mr := miniredis.RunT(t)
+ ac, err := NewRedisBacked(context.Background(), &RedisBackedArgs{
+ AhoCorasickArgs: AhoCorasickArgs{Addr: mr.Addr(), Name: "test-degraded"},
+ Preset: PresetBalanced,
+ })
+ if err != nil {
+ t.Fatalf("NewRedisBacked: %v", err)
+ }
+ defer func() { _ = ac.Close() }()
+
+ ctx := context.Background()
+ ac.Add(ctx, "hello") //nolint:errcheck,gosec
+
+ mr.Close()
+
+ matched, err := ac.Find(ctx, "hello world")
+ if err != nil {
+ t.Fatalf("Find degraded: %v", err)
+ }
+ if len(matched) != 1 || matched[0] != testKeywordHello {
+ t.Errorf("Find degraded = %v, want [hello]", matched)
+ }
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for Find/FindIndex behavior when ensureValid fails due to context cancellation
There’s coverage for degraded mode when Redis goes away, but not for `Find`/`FindIndex` when `ensureValid` fails (e.g. due to context cancellation during reload). Since `ensureValid` is on the hot path, it’d be good to add a focused test that:
- Constructs via `newTestRedisBacked`.
- Marks the instance stale with `ac.markStale()`.
- Calls `Find`/`FindIndex` with a cancelled `context.Context`.
- Asserts the error is `context.Canceled` (or at least non-nil) and that there’s no panic.
This would directly exercise the error and cancellation path in `ensureValid` without relying on Redis failures.
Suggested implementation:
```golang
func TestEnsureValidContextCanceled(t *testing.T) {
ac, mr := newTestRedisBacked(t, "test-ensure-valid-context-canceled")
defer func() { _ = ac.Close() }()
defer mr.Close()
// Mark the instance stale so that the next Find/FindIndex calls go through ensureValid.
ac.markStale()
// Use a context that is already canceled to force ensureValid to fail fast.
ctx, cancel := context.WithCancel(context.Background())
cancel()
_, err := ac.Find(ctx, "hello world")
if err == nil {
t.Fatalf("Find with canceled context: got nil error, want non-nil")
}
if !errors.Is(err, context.Canceled) {
t.Fatalf("Find with canceled context: got %v, want %v", err, context.Canceled)
}
_, err = ac.FindIndex(ctx, "hello world")
if err == nil {
t.Fatalf("FindIndex with canceled context: got nil error, want non-nil")
}
if !errors.Is(err, context.Canceled) {
t.Fatalf("FindIndex with canceled context: got %v, want %v", err, context.Canceled)
}
}
func TestRedisBackedNew(t *testing.T) {
mr := miniredis.RunT(t)
ac, err := NewRedisBacked(context.Background(), &RedisBackedArgs{
AhoCorasickArgs: AhoCorasickArgs{
Addr: mr.Addr(),
Name: "test-new",
```
1. Ensure the test file imports the `errors` package:
- In the import block at the top of `pkg/acor/redis_backed_test.go`, add `errors` if it is not already present:
`import ("errors" ... )`.
2. Confirm that `newTestRedisBacked` returns `(ac *RedisBacked, mr *miniredis.Miniredis)` or analogous types and that the package name of the test matches where `markStale` is defined so the test can access this unexported method.
3. If `FindIndex` has a different signature (e.g., additional parameters or a different return type), adjust the call accordingly while still using the canceled `ctx` and asserting the returned error as shown.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| func (e *speedEngine) info() *InMemoryInfo { | ||
| if e.dfa == nil { | ||
| return &InMemoryInfo{Preset: e.preset} | ||
| } | ||
| var mem int64 | ||
| for _, row := range e.dfa { | ||
| mem += int64(len(row)) * 8 | ||
| } | ||
| for _, outs := range e.outputMap { | ||
| mem += int64(16 + len(outs)*16) |
There was a problem hiding this comment.
suggestion: InMemoryInfo from speedEngine does not populate TrieDepth, unlike other engines
info() leaves TrieDepth at its zero value here, while the other engines populate it, which makes preset comparisons less consistent. Since you already know node depths when building the flat trie, consider tracking the maximum depth there and setting TrieDepth in InMemoryInfo for this engine as well.
| matches := ac.Find(tc.text) | ||
| if !equalUnordered(matches, tc.expected) { |
There was a problem hiding this comment.
suggestion (testing): Tests use order-insensitive comparison while the API comment says results are returned in order of occurrence
TestInMemoryFind uses equalUnordered, but InMemoryAC.Find promises results "in order of occurrence". Order-insensitive checks could mask regressions if the implementation stops preserving order. Please either assert the specific expected order (especially for overlapping/repeated matches) or update the API comment if ordering is not guaranteed.
Suggested implementation:
import (
"reflect" matches := ac.Find(tc.text)
if !reflect.DeepEqual(matches, tc.expected) {
t.Errorf("expected matches in order %v, got %v", tc.expected, matches)
}| if err != nil { | ||
| t.Fatalf("ac2.Find: %v", err) | ||
| } | ||
| if len(matched) != 1 || matched[0] != testKeywordHello { |
There was a problem hiding this comment.
suggestion (testing): Add a test for Find/FindIndex behavior when ensureValid fails due to context cancellation
There’s coverage for degraded mode when Redis goes away, but not for Find/FindIndex when ensureValid fails (e.g. due to context cancellation during reload). Since ensureValid is on the hot path, it’d be good to add a focused test that:
- Constructs via
newTestRedisBacked. - Marks the instance stale with
ac.markStale(). - Calls
Find/FindIndexwith a cancelledcontext.Context. - Asserts the error is
context.Canceled(or at least non-nil) and that there’s no panic.
This would directly exercise the error and cancellation path in ensureValid without relying on Redis failures.
Suggested implementation:
func TestEnsureValidContextCanceled(t *testing.T) {
ac, mr := newTestRedisBacked(t, "test-ensure-valid-context-canceled")
defer func() { _ = ac.Close() }()
defer mr.Close()
// Mark the instance stale so that the next Find/FindIndex calls go through ensureValid.
ac.markStale()
// Use a context that is already canceled to force ensureValid to fail fast.
ctx, cancel := context.WithCancel(context.Background())
cancel()
_, err := ac.Find(ctx, "hello world")
if err == nil {
t.Fatalf("Find with canceled context: got nil error, want non-nil")
}
if !errors.Is(err, context.Canceled) {
t.Fatalf("Find with canceled context: got %v, want %v", err, context.Canceled)
}
_, err = ac.FindIndex(ctx, "hello world")
if err == nil {
t.Fatalf("FindIndex with canceled context: got nil error, want non-nil")
}
if !errors.Is(err, context.Canceled) {
t.Fatalf("FindIndex with canceled context: got %v, want %v", err, context.Canceled)
}
}
func TestRedisBackedNew(t *testing.T) {
mr := miniredis.RunT(t)
ac, err := NewRedisBacked(context.Background(), &RedisBackedArgs{
AhoCorasickArgs: AhoCorasickArgs{
Addr: mr.Addr(),
Name: "test-new",- Ensure the test file imports the
errorspackage:- In the import block at the top of
pkg/acor/redis_backed_test.go, adderrorsif it is not already present:
import ("errors" ... ).
- In the import block at the top of
- Confirm that
newTestRedisBackedreturns(ac *RedisBacked, mr *miniredis.Miniredis)or analogous types and that the package name of the test matches wheremarkStaleis defined so the test can access this unexported method. - If
FindIndexhas a different signature (e.g., additional parameters or a different return type), adjust the call accordingly while still using the canceledctxand asserting the returned error as shown.
Extend AhoCorasickArgs with InMemory and Preset fields so all modes go through the same Create() constructor and AhoCorasick type. Remove NewInMemory, NewRedisBacked, InMemoryAC, RedisBackedAC to preserve backward compatibility of the existing public API. - Add PresetNone (zero value) to avoid implicit preset activation - Add inMemoryOps and presetRedisOps adapters for operations interface - Add mode-based branching in Create() and closeFn pattern in Close() - Guard Debug() for in-memory and preset modes - Exclude errcheck/gosec from test files in golangci-lint config - Update all tests and benchmarks to use unified Create API - Update documentation to reflect unified API
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 6 issues, and left some high level feedback:
- In
redisBackedAC.ensureValid/reloadFromRedisLocked, reload errors are swallowed andstalemay remain true while callers see a nil error; consider at least logging or surfacing a wrapped error so operators can detect and debug persistent reload failures instead of silently staying on a stale snapshot. - There is a fair amount of duplicated logic between the original V2 path and the new Redis-backed preset path (e.g., trie snapshot read/unmarshal and version handling); consider factoring common helpers to reduce maintenance overhead and keep schema/V2 behavior changes in one place.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `redisBackedAC.ensureValid`/`reloadFromRedisLocked`, reload errors are swallowed and `stale` may remain true while callers see a nil error; consider at least logging or surfacing a wrapped error so operators can detect and debug persistent reload failures instead of silently staying on a stale snapshot.
- There is a fair amount of duplicated logic between the original V2 path and the new Redis-backed preset path (e.g., trie snapshot read/unmarshal and version handling); consider factoring common helpers to reduce maintenance overhead and keep schema/V2 behavior changes in one place.
## Individual Comments
### Comment 1
<location path="pkg/acor/acor.go" line_range="260-269" />
<code_context>
// complete even if the caller's context is already canceled.
RollbackTimeout time.Duration
+
+ // InMemory enables pure in-memory mode with no Redis dependency.
+ // When true, Addr, Addrs, MasterName, RingAddrs, Password, DB, SchemaVersion,
+ // and EnableCache must all be unset (zero values). A Preset may optionally be
+ // specified to select the engine architecture (defaults to PresetBalanced).
+ InMemory bool
+
+ // Preset selects the architecture for the local match engine.
+ // When set and InMemory is true, selects the in-memory engine architecture.
+ // When set and InMemory is false, uses Redis-backed engine with a local
+ // preset-optimized automaton for fast reads.
+ // When unset (zero), the original AhoCorasick engine is used.
+ // Preset mode forces V2 schema and is incompatible with EnableCache.
+ Preset Preset
}
</code_context>
<issue_to_address>
**issue (bug_risk):** The documented invariants for InMemory (SchemaVersion, EnableCache) are not fully enforced in Create.
The InMemory docs require SchemaVersion and EnableCache to be unset, but the Create() path for args.InMemory only relies on hasAnyRedisConfig(), which doesn’t consider those fields. As a result, callers can set InMemory=true along with SchemaVersion/EnableCache and still get an in-memory instance, violating the documented contract. Please either extend hasAnyRedisConfig() to include SchemaVersion/EnableCache or add explicit checks in the InMemory branch so behavior matches the documented invariants.
</issue_to_address>
### Comment 2
<location path="pkg/acor/errors.go" line_range="30-32" />
<code_context>
+ // ErrInMemoryWithRedisConfig is returned when InMemory is true but Redis
+ // configuration fields are also set.
+ ErrInMemoryWithRedisConfig = errors.New("InMemory mode cannot be used with Redis configuration")
+ // ErrPresetRequiresRedis is returned when a Preset is specified without
+ // InMemory=true and without any Redis address.
+ ErrPresetRequiresRedis = errors.New("Preset requires either InMemory mode or a Redis address")
+ // ErrPresetRequiresV2 is returned when a Preset is set with SchemaVersion=1.
+ ErrPresetRequiresV2 = errors.New("Preset engine requires V2 schema")
</code_context>
<issue_to_address>
**issue:** ErrPresetRequiresRedis message is inconsistent with how Preset is actually validated.
Create() routes InMemory=true into the in-memory branch before the Preset branch, and the Preset branch requires hasAnyRedisConfig()==true with InMemory=false. So a Preset with InMemory=true never hits the Preset-Redis branch, and a Preset without Redis always yields ErrPresetRequiresRedis regardless of InMemory. Please either update the error text to reflect the actual constraint (e.g. "Preset requires a Redis configuration when InMemory is false") or adjust the branching logic if the intention is to support both modes.
</issue_to_address>
### Comment 3
<location path="pkg/acor/redis_backed_test.go" line_range="434-442" />
<code_context>
+ }
+}
+
+func TestPresetRedisSuggestError(t *testing.T) {
+ ac := newTestPresetRedis(t, PresetBalanced)
+ _, err := ac.Suggest("he")
</code_context>
<issue_to_address>
**suggestion (testing):** Also cover SuggestIndex error behavior in preset-Redis mode
Please add a similar test for `SuggestIndex` (e.g. `TestPresetRedisSuggestIndexError`) that asserts it also returns `ErrSuggestRequiresRedis` in preset-Redis mode, so both methods’ error contracts are covered and protected against regressions.
```suggestion
func TestPresetRedisSuggestError(t *testing.T) {
ac := newTestPresetRedis(t, PresetBalanced)
_, err := ac.Suggest("he")
if err != ErrSuggestRequiresRedis {
t.Errorf("expected ErrSuggestRequiresRedis, got %v", err)
}
}
func TestPresetRedisSuggestIndexError(t *testing.T) {
ac := newTestPresetRedis(t, PresetBalanced)
_, err := ac.SuggestIndex("he")
if err != ErrSuggestRequiresRedis {
t.Errorf("expected ErrSuggestRequiresRedis, got %v", err)
}
}
func TestPresetRedisV1Error(t *testing.T) {
```
</issue_to_address>
### Comment 4
<location path="pkg/acor/engine_test.go" line_range="32-33" />
<code_context>
+func TestInMemoryAdd(t *testing.T) {
+ for _, preset := range allPresets() {
+ t.Run(preset.String(), func(t *testing.T) {
+ ac := createTestInMemory(t, preset)
+ ac.Close()
+
+ added, err := ac.Add("hello")
</code_context>
<issue_to_address>
**suggestion (testing):** Avoid closing in-memory instances at the start of tests or explicitly assert post-close behavior
Several in-memory tests (e.g., `TestInMemoryAdd`, `TestInMemoryRemove`, `TestInMemoryFind`) call `createTestInMemory` and then immediately `ac.Close()` before using `ac`. This makes the tests depend on undocumented post-close behavior and could let them keep passing even if `Close` later starts returning errors or invalidating the instance. Consider either using `t.Cleanup(ac.Close)` so the instance stays open during the test, or adding explicit assertions that operations after `Close` are still valid and error-free.
Suggested implementation:
```golang
func TestInMemoryAdd(t *testing.T) {
for _, preset := range allPresets() {
t.Run(preset.String(), func(t *testing.T) {
ac := createTestInMemory(t, preset)
t.Cleanup(ac.Close)
added, err := ac.Add("hello")
```
The same pattern should be applied to other in-memory tests that currently call `ac.Close()` immediately after `createTestInMemory` (e.g., `TestInMemoryRemove`, `TestInMemoryFind`, and any others with this pattern). For each such test:
1. Replace `ac.Close()` directly after initialization with `t.Cleanup(ac.Close)` so the instance remains valid during the test.
2. If you intentionally want to test post-close behavior, keep `ac.Close()` but then add explicit assertions about the expected behavior of subsequent operations after close (errors or success), rather than relying on implicit behavior.
</issue_to_address>
### Comment 5
<location path="pkg/acor/engine_test.go" line_range="336-345" />
<code_context>
+ }
+}
+
+func TestInMemorySuggestError(t *testing.T) {
+ ac := createTestInMemory(t, PresetBalanced)
+ ac.Close()
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for SuggestIndex error in in-memory mode
There’s no equivalent coverage for `SuggestIndex`. Please add a `TestInMemorySuggestIndexError` (mirroring the preset-Redis tests) that asserts `SuggestIndex` returns `ErrSuggestRequiresRedis` in in-memory mode, so both APIs are consistently guarded against this configuration and regressions are less likely.
```suggestion
func TestInMemorySuggestError(t *testing.T) {
ac := createTestInMemory(t, PresetBalanced)
ac.Close()
_, err := ac.Suggest("he")
if err != ErrSuggestRequiresRedis {
t.Errorf("expected ErrSuggestRequiresRedis, got %v", err)
}
}
func TestInMemorySuggestIndexError(t *testing.T) {
ac := createTestInMemory(t, PresetBalanced)
ac.Close()
_, err := ac.SuggestIndex("he")
if err != ErrSuggestRequiresRedis {
t.Errorf("expected ErrSuggestRequiresRedis, got %v", err)
}
}
func TestInMemoryWithRedisConfigError(t *testing.T) {
```
</issue_to_address>
### Comment 6
<location path="pkg/acor/redis_backed_test.go" line_range="442" />
<code_context>
+ }
+}
+
+func TestPresetRedisV1Error(t *testing.T) {
+ _, err := Create(&AhoCorasickArgs{
+ Addr: "localhost:6379",
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for Preset with no Redis and InMemory=false (ErrPresetRequiresRedis)
One remaining constructor edge case is when `Preset` is set, `InMemory` is false, and no Redis address is provided. In that case `Create` should return `ErrPresetRequiresRedis`. Please add a test (e.g. `TestPresetRequiresRedisError`) that calls `Create(&AhoCorasickArgs{Name: "x", Preset: PresetBalanced})` with no `Addr` and asserts that `ErrPresetRequiresRedis` is returned.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| func TestPresetRedisSuggestError(t *testing.T) { | ||
| ac := newTestPresetRedis(t, PresetBalanced) | ||
| _, err := ac.Suggest("he") | ||
| if err != ErrSuggestRequiresRedis { | ||
| t.Errorf("expected ErrSuggestRequiresRedis, got %v", err) | ||
| } | ||
| } | ||
|
|
||
| func TestPresetRedisV1Error(t *testing.T) { |
There was a problem hiding this comment.
suggestion (testing): Also cover SuggestIndex error behavior in preset-Redis mode
Please add a similar test for SuggestIndex (e.g. TestPresetRedisSuggestIndexError) that asserts it also returns ErrSuggestRequiresRedis in preset-Redis mode, so both methods’ error contracts are covered and protected against regressions.
| func TestPresetRedisSuggestError(t *testing.T) { | |
| ac := newTestPresetRedis(t, PresetBalanced) | |
| _, err := ac.Suggest("he") | |
| if err != ErrSuggestRequiresRedis { | |
| t.Errorf("expected ErrSuggestRequiresRedis, got %v", err) | |
| } | |
| } | |
| func TestPresetRedisV1Error(t *testing.T) { | |
| func TestPresetRedisSuggestError(t *testing.T) { | |
| ac := newTestPresetRedis(t, PresetBalanced) | |
| _, err := ac.Suggest("he") | |
| if err != ErrSuggestRequiresRedis { | |
| t.Errorf("expected ErrSuggestRequiresRedis, got %v", err) | |
| } | |
| } | |
| func TestPresetRedisSuggestIndexError(t *testing.T) { | |
| ac := newTestPresetRedis(t, PresetBalanced) | |
| _, err := ac.SuggestIndex("he") | |
| if err != ErrSuggestRequiresRedis { | |
| t.Errorf("expected ErrSuggestRequiresRedis, got %v", err) | |
| } | |
| } | |
| func TestPresetRedisV1Error(t *testing.T) { |
| ac := createTestInMemory(t, preset) | ||
| ac.Close() |
There was a problem hiding this comment.
suggestion (testing): Avoid closing in-memory instances at the start of tests or explicitly assert post-close behavior
Several in-memory tests (e.g., TestInMemoryAdd, TestInMemoryRemove, TestInMemoryFind) call createTestInMemory and then immediately ac.Close() before using ac. This makes the tests depend on undocumented post-close behavior and could let them keep passing even if Close later starts returning errors or invalidating the instance. Consider either using t.Cleanup(ac.Close) so the instance stays open during the test, or adding explicit assertions that operations after Close are still valid and error-free.
Suggested implementation:
func TestInMemoryAdd(t *testing.T) {
for _, preset := range allPresets() {
t.Run(preset.String(), func(t *testing.T) {
ac := createTestInMemory(t, preset)
t.Cleanup(ac.Close)
added, err := ac.Add("hello")The same pattern should be applied to other in-memory tests that currently call ac.Close() immediately after createTestInMemory (e.g., TestInMemoryRemove, TestInMemoryFind, and any others with this pattern). For each such test:
- Replace
ac.Close()directly after initialization witht.Cleanup(ac.Close)so the instance remains valid during the test. - If you intentionally want to test post-close behavior, keep
ac.Close()but then add explicit assertions about the expected behavior of subsequent operations after close (errors or success), rather than relying on implicit behavior.
| func TestInMemorySuggestError(t *testing.T) { | ||
| ac := createTestInMemory(t, PresetBalanced) | ||
| ac.Close() | ||
| _, err := ac.Suggest("he") | ||
| if err != ErrSuggestRequiresRedis { | ||
| t.Errorf("expected ErrSuggestRequiresRedis, got %v", err) | ||
| } | ||
| } | ||
|
|
||
| func TestInMemoryWithRedisConfigError(t *testing.T) { |
There was a problem hiding this comment.
suggestion (testing): Add a test for SuggestIndex error in in-memory mode
There’s no equivalent coverage for SuggestIndex. Please add a TestInMemorySuggestIndexError (mirroring the preset-Redis tests) that asserts SuggestIndex returns ErrSuggestRequiresRedis in in-memory mode, so both APIs are consistently guarded against this configuration and regressions are less likely.
| func TestInMemorySuggestError(t *testing.T) { | |
| ac := createTestInMemory(t, PresetBalanced) | |
| ac.Close() | |
| _, err := ac.Suggest("he") | |
| if err != ErrSuggestRequiresRedis { | |
| t.Errorf("expected ErrSuggestRequiresRedis, got %v", err) | |
| } | |
| } | |
| func TestInMemoryWithRedisConfigError(t *testing.T) { | |
| func TestInMemorySuggestError(t *testing.T) { | |
| ac := createTestInMemory(t, PresetBalanced) | |
| ac.Close() | |
| _, err := ac.Suggest("he") | |
| if err != ErrSuggestRequiresRedis { | |
| t.Errorf("expected ErrSuggestRequiresRedis, got %v", err) | |
| } | |
| } | |
| func TestInMemorySuggestIndexError(t *testing.T) { | |
| ac := createTestInMemory(t, PresetBalanced) | |
| ac.Close() | |
| _, err := ac.SuggestIndex("he") | |
| if err != ErrSuggestRequiresRedis { | |
| t.Errorf("expected ErrSuggestRequiresRedis, got %v", err) | |
| } | |
| } | |
| func TestInMemoryWithRedisConfigError(t *testing.T) { |
| } | ||
| } | ||
|
|
||
| func TestPresetRedisV1Error(t *testing.T) { |
There was a problem hiding this comment.
suggestion (testing): Add a test for Preset with no Redis and InMemory=false (ErrPresetRequiresRedis)
One remaining constructor edge case is when Preset is set, InMemory is false, and no Redis address is provided. In that case Create should return ErrPresetRequiresRedis. Please add a test (e.g. TestPresetRequiresRedisError) that calls Create(&AhoCorasickArgs{Name: "x", Preset: PresetBalanced}) with no Addr and asserts that ErrPresetRequiresRedis is returned.
- Enforce InMemory invariants: reject SchemaVersion and EnableCache when InMemory=true (matches documented contract) - Update ErrInMemoryWithRedisConfig message to include SchemaVersion and EnableCache - Update ErrPresetRequiresRedis message to reflect actual constraint - Surface reload errors in ensureValid instead of swallowing them - Replace ac.Close() with t.Cleanup in in-memory tests so instances remain valid during test execution - Add TestPresetRedisSuggestIndexError for SuggestIndex error coverage - Add TestInMemorySuggestIndexError for SuggestIndex error coverage - Add TestPresetRequiresRedisError for constructor edge case - Expand TestInMemoryWithRedisConfigError with sub-tests for SchemaVersion and EnableCache
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 4 issues, and left some high level feedback:
- In
TestInMemoryCaseSensitiveyou callac.Close()and then continue to useac(callingAddandFind), which violates the usual expectation that a closed handle is no longer usable and could mask bugs in future Close semantics. AhoCorasick.Debugbecomes a no-op for the new in-memory and preset-Redis modes, which might surprise callers who relied on debug output; consider either documenting this explicitly or adding a basic debug implementation for the new backends.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `TestInMemoryCaseSensitive` you call `ac.Close()` and then continue to use `ac` (calling `Add` and `Find`), which violates the usual expectation that a closed handle is no longer usable and could mask bugs in future Close semantics.
- `AhoCorasick.Debug` becomes a no-op for the new in-memory and preset-Redis modes, which might surprise callers who relied on debug output; consider either documenting this explicitly or adding a basic debug implementation for the new backends.
## Individual Comments
### Comment 1
<location path="pkg/acor/redis_backed.go" line_range="143-152" />
<code_context>
+func (ac *redisBackedAC) reloadFromRedis(ctx context.Context) error {
</code_context>
<issue_to_address>
**suggestion:** Deduplicate the reload-from-Redis logic to avoid divergence between locked and unlocked paths.
`reloadFromRedis` and `reloadFromRedisLocked` are almost identical except for locking, which risks future changes (e.g., new trie state fields) being applied to only one path. Please extract the common `HGETALL` + unmarshal logic into a shared helper that returns the parsed data, and let each function focus only on locking and assigning the results. This centralizes correctness-sensitive logic and simplifies maintenance.
Suggested implementation:
```golang
func (ac *redisBackedAC) loadTrieDataFromRedis(ctx context.Context) (keywords []string, suffixes []string, version int64, err error) {
trieData, err := ac.storage.HGetAll(ctx, trieKey(ac.name))
if err != nil {
return nil, nil, 0, fmt.Errorf("HGETALL %s: %w", trieKey(ac.name), err)
}
if data, ok := trieData["keywords"]; ok {
if err := json.Unmarshal([]byte(data), &keywords); err != nil {
return nil, nil, 0, fmt.Errorf("unmarshal keywords: %w", err)
}
}
if data, ok := trieData["suffixes"]; ok {
if err := json.Unmarshal([]byte(data), &suffixes); err != nil {
return nil, nil, 0, fmt.Errorf("unmarshal suffixes: %w", err)
}
}
if data, ok := trieData["version"]; ok {
v, errParse := strconv.ParseInt(data, 10, 64)
if errParse != nil {
return nil, nil, 0, fmt.Errorf("parse version: %w", errParse)
}
version = v
}
return keywords, suffixes, version, nil
}
func (ac *redisBackedAC) reloadFromRedis(ctx context.Context) error {
```
```golang
func (ac *redisBackedAC) reloadFromRedis(ctx context.Context) error {
keywords, suffixes, version, err := ac.loadTrieDataFromRedis(ctx)
if err != nil {
return err
}
// NOTE: the assignment to ac's trie state should remain here,
// reusing the existing logic but now using keywords, suffixes, and version.
```
1. In the same file, update `reloadFromRedisLocked` to call `ac.loadTrieDataFromRedis(ctx)` and remove its duplicated `HGetAll`/`json.Unmarshal`/version-parsing code. It should:
- Acquire the lock as it currently does.
- Call `loadTrieDataFromRedis`.
- On success, assign `keywords`, `suffixes`, `version` to the locked trie state.
2. Anywhere in `reloadFromRedis` and `reloadFromRedisLocked` that currently references `trieData["suffixes"]`, `trieData["version"]`, or does separate JSON unmarshalling/parsing should be replaced to use the `suffixes` and `version` returned from `loadTrieDataFromRedis`.
3. Ensure the file imports `strconv` if it is not already imported:
- Add `strconv` to the existing import block: `import ( ... "strconv" ... )`.
4. If the trie state struct has additional fields that were previously deserialized in both functions (e.g., new fields added in future), move that deserialization into `loadTrieDataFromRedis` so both paths stay in sync.
</issue_to_address>
### Comment 2
<location path="pkg/acor/engine_test.go" line_range="199-207" />
<code_context>
+ }
+}
+
+func TestInMemoryCaseSensitive(t *testing.T) {
+ for _, preset := range allPresets() {
+ t.Run(preset.String(), func(t *testing.T) {
+ ac, _ := Create(&AhoCorasickArgs{
+ InMemory: true,
+ Name: "test",
+ Preset: preset,
+ })
+ ac.Close()
+ ac.Add("Hello")
+ if matches, _ := ac.Find("say HELLO world"); len(matches) == 0 {
</code_context>
<issue_to_address>
**issue (testing):** Test uses an AhoCorasick instance after it has been closed, which makes the assertions unreliable
In `TestInMemoryCaseSensitive`, `ac` is closed before `ac.Add("Hello")` and `ac.Find(...)`, so the test is exercising undefined behavior rather than case sensitivity. Please either delay `ac.Close()` until after the assertions, or create a new instance for the case-insensitive path and close it via `t.Cleanup` so the test reliably verifies case handling on a valid automaton.
</issue_to_address>
### Comment 3
<location path="pkg/acor/benchmark_redis_backed_test.go" line_range="48" />
<code_context>
+ }
+}
+
+func BenchmarkInMemoryAdd(b *testing.B) {
+ for _, preset := range allPresets() {
+ b.Run(preset.String(), func(b *testing.B) {
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
</code_context>
<issue_to_address>
**issue (performance):** BenchmarkInMemoryAdd creates instances inside the benchmark loop and registers b.Cleanup each iteration, which distorts results
This pattern both inflates `b.Cleanup` registrations and measures construction/teardown rather than steady-state `Add` performance. Instead, initialize the `AhoCorasick` instance once per sub-benchmark before `b.ResetTimer`, register a single cleanup, and only loop over `ac.Add` inside the timed section, e.g.:
```go
for _, preset := range allPresets() {
b.Run(preset.String(), func(b *testing.B) {
ac := createTestInMemory(b, preset)
b.Cleanup(func() { _ = ac.Close() })
b.ResetTimer()
for i := 0; i < b.N; i++ {
for j := 0; j < 100; j++ {
ac.Add(fmt.Sprintf("keyword%d", j))
}
}
})
}
```
</issue_to_address>
### Comment 4
<location path="docs/content/guides/in-memory-engine.md" line_range="96-97" />
<code_context>
+ac.Remove("keyword")
+
+// Find
+matches := ac.Find("text") // []string
+positions := ac.FindIndex("text") // map[string][]int
+
+// Stats
</code_context>
<issue_to_address>
**issue:** Update `Find` and `FindIndex` examples to include the error return for consistency with the actual API.
Elsewhere in the docs (e.g., Quick Start), these functions are shown as returning `(result, error)` (e.g., `matches, _ := ac.Find(...)`), but here they appear to return a single value, implying a different signature. Please update these examples to include the error return (e.g., `matches, err := ac.Find("text")`, `positions, err := ac.FindIndex("text")`) to align with the real API and maintain consistency across the documentation.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| func BenchmarkEngineFind(b *testing.B) { | ||
| keywords := []string{"he", "she", "his", "hers", "hello", "world", "benchmark"} | ||
| for _, preset := range allPresets() { | ||
| b.Run(preset.String(), func(b *testing.B) { |
There was a problem hiding this comment.
issue (performance): BenchmarkInMemoryAdd creates instances inside the benchmark loop and registers b.Cleanup each iteration, which distorts results
This pattern both inflates b.Cleanup registrations and measures construction/teardown rather than steady-state Add performance. Instead, initialize the AhoCorasick instance once per sub-benchmark before b.ResetTimer, register a single cleanup, and only loop over ac.Add inside the timed section, e.g.:
for _, preset := range allPresets() {
b.Run(preset.String(), func(b *testing.B) {
ac := createTestInMemory(b, preset)
b.Cleanup(func() { _ = ac.Close() })
b.ResetTimer()
for i := 0; i < b.N; i++ {
for j := 0; j < 100; j++ {
ac.Add(fmt.Sprintf("keyword%d", j))
}
}
})
}- Deduplicate reload-from-Redis logic: extract loadTrieDataFromRedis and applyReload helpers, remove reloadFromRedisLocked - Fix TestInMemoryCaseSensitive to use t.Cleanup instead of Close before assertions, preventing use-after-close - Fix BenchmarkInMemoryAdd to create instance once outside the benchmark loop instead of per-iteration - Update in-memory engine docs to show error returns on Find/FindIndex - Document that Debug() is a no-op for in-memory and preset-Redis modes
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The new in-memory and preset-Redis paths (
createInMemory,createPresetRedis,redisBackedAC) hard-codecontext.Background()for long-lived operations, which ignores caller cancellation; consider threading the caller’s context (or a derived one) through so shutdown and reload behavior can respect upstream cancellation/timeouts. - Keyword normalization (trim + optional lowercasing) is duplicated across
inMemoryOpsandredisBackedACmethods; consider extracting a shared helper to reduce drift and keep behavior consistent across backends.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new in-memory and preset-Redis paths (`createInMemory`, `createPresetRedis`, `redisBackedAC`) hard-code `context.Background()` for long-lived operations, which ignores caller cancellation; consider threading the caller’s context (or a derived one) through so shutdown and reload behavior can respect upstream cancellation/timeouts.
- Keyword normalization (trim + optional lowercasing) is duplicated across `inMemoryOps` and `redisBackedAC` methods; consider extracting a shared helper to reduce drift and keep behavior consistent across backends.
## Individual Comments
### Comment 1
<location path="pkg/acor/engine_flat.go" line_range="241-250" />
<code_context>
+ return len(seen)
+}
+
+func sortRunes(runes []rune) {
+ for i := 1; i < len(runes); i++ {
+ key := runes[i]
+ j := i - 1
+ for j >= 0 && runes[j] > key {
+ runes[j+1] = runes[j]
+ j--
+ }
+ runes[j+1] = key
+ }
+}
+
+func sortRunesFromPairs(pairs []struct {
+ ch rune
+ child int
+}) {
+ for i := 1; i < len(pairs); i++ {
+ key := pairs[i]
+ j := i - 1
+ for j >= 0 && pairs[j].ch > key.ch {
+ pairs[j+1] = pairs[j]
+ j--
+ }
+ pairs[j+1] = key
+ }
+}
+
+func sortStrings(s []string) {
+ for i := 1; i < len(s); i++ {
+ key := s[i]
</code_context>
<issue_to_address>
**suggestion (performance):** Custom insertion-sort implementations can be slower and harder to maintain than using the standard library sort package.
sortRunes, sortRunesFromPairs, and sortStrings all manually implement insertion sort, which is O(n²) and can be slow for larger inputs. The standard library’s sort.Slice/sort.Strings use more efficient, well-optimized algorithms and are easier to read. Unless you have benchmarks showing these custom sorts are faster for your typical input sizes, consider replacing them with the standard sort package.
Suggested implementation:
```golang
func sortRunes(runes []rune) {
sort.Slice(runes, func(i, j int) bool {
return runes[i] < runes[j]
})
}
```
```golang
func sortRunesFromPairs(pairs []struct {
ch rune
child int
}) {
sort.Slice(pairs, func(i, j int) bool {
return pairs[i].ch < pairs[j].ch
})
}
```
```golang
func sortStrings(s []string) {
sort.Strings(s)
}
```
You’ll also need to ensure the file imports the `sort` package. If you have a standard Go import block like:
```go
import (
"some/package"
)
```
add `"sort"` to it, for example:
```go
import (
"sort"
"some/package"
)
```
Adjust the exact location/order to match your existing import style.
</issue_to_address>
### Comment 2
<location path="pkg/acor/engine_test.go" line_range="199-202" />
<code_context>
+ }
+}
+
+func TestInMemoryCaseSensitive(t *testing.T) {
+ for _, preset := range allPresets() {
+ t.Run(preset.String(), func(t *testing.T) {
+ ac, _ := Create(&AhoCorasickArgs{
+ InMemory: true,
+ Name: "test",
</code_context>
<issue_to_address>
**suggestion (testing):** Check and assert `Create` errors in `TestInMemoryCaseSensitive` instead of ignoring them
In `TestInMemoryCaseSensitive`, both `Create` calls discard the error (`ac, _ := ...`, `ac2, _ := ...`). If construction fails, the test may panic later with an unclear cause. Please capture and assert `err` is nil for both instances and fail the test immediately otherwise, consistent with the other tests.
Suggested implementation:
```golang
ac, err := Create(&AhoCorasickArgs{
InMemory: true,
Name: "test",
Preset: preset,
})
if err != nil {
t.Fatalf("failed to create in-memory case-sensitive automaton: %v", err)
}
t.Cleanup(func() { _ = ac.Close() })
```
```golang
ac2, err := Create(&AhoCorasickArgs{
```
Right after the full `AhoCorasickArgs` literal for the `ac2` creation (i.e., immediately after the closing `})` of `Create(&AhoCorasickArgs{...})`), add an error check analogous to the first one:
```go
if err != nil {
t.Fatalf("failed to create second in-memory case-sensitive automaton: %v", err)
}
```
This ensures both `Create` calls in `TestInMemoryCaseSensitive` assert `err` is nil before proceeding.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
- Replace custom insertion sorts with sort.Slice/sort.Strings in engine_flat.go for better performance and readability - Extract normalizeKeyword and normalizeText helpers in modes.go to deduplicate keyword normalization across inMemoryOps and redisBackedAC - Document context.Background() usage in createInMemory and createPresetRedis (Create API does not accept caller context) - Assert Create errors in TestInMemoryCaseSensitive instead of discarding them
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The Redis-backed preset path (
redisBackedAC) has separate snapshot loading (loadTrieDataFromRedis) andreadTrieSnapshot/marshalTrieArgslogic, which now diverges from the existing V2 code; consider consolidating these into shared helpers so schema evolution only has one code path to maintain. - In
Create, several different invalid configurations for in-memory mode currently map toErrInMemoryWithRedisConfig(includingSchemaVersionandEnableCache), which might be confusing; consider either splitting these into more precise errors or adjusting the error name/message to reflect all the conditions it covers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Redis-backed preset path (`redisBackedAC`) has separate snapshot loading (`loadTrieDataFromRedis`) and `readTrieSnapshot`/`marshalTrieArgs` logic, which now diverges from the existing V2 code; consider consolidating these into shared helpers so schema evolution only has one code path to maintain.
- In `Create`, several different invalid configurations for in-memory mode currently map to `ErrInMemoryWithRedisConfig` (including `SchemaVersion` and `EnableCache`), which might be confusing; consider either splitting these into more precise errors or adjusting the error name/message to reflect all the conditions it covers.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
- Split ErrInMemoryWithRedisConfig into three precise errors: ErrInMemoryWithRedisConfig, ErrInMemoryWithSchemaVersion, ErrInMemoryWithCache - Consolidate trie snapshot loading: rename loadTrieDataFromRedis to loadTrieSnapshot returning trieSnapshot, and remove duplicate readTrieSnapshot from redis_backed_ops.go. Both redisBackedAC and v2Operations now deserialize from the same pattern.
Extract readTrieSnapshot(ctx, storage, name) as a package-level function replacing duplicate implementations in redisBackedAC and v2Operations. Removes ~30 lines of identical deserialization code.
Remove the pure in-memory mode (InMemory bool field) since preset mode with Redis persistence supersedes it. Preset without Redis now returns ErrPresetRequiresRedis. Also rename RedisBacked-prefixed test/benchmark functions to Preset-prefixed for consistency.
Summary
RedisBackedAC) that uses Redis as source of truth and a local preset-optimized automaton for fast readsTest plan
go test ./pkg/acor/... -race -count=1and verify all tests passgo test ./pkg/acor/... -bench=. -benchmemand verify benchmarks produce results