Merged
7 changes: 7 additions & 0 deletions .golangci.yaml
@@ -69,6 +69,13 @@ linters:
      - linters:
          - staticcheck
        text: "S1038:"
      - linters:
          - mnd
        path: pkg/acor/engine_
      - linters:
          - errcheck
          - gosec
        path: _test\.go$

formatters:
  enable:
30 changes: 29 additions & 1 deletion README.md
@@ -186,9 +186,37 @@ Chunk boundaries ensure matches aren't split across chunks:
- `ChunkBoundaryLine`: Split at line breaks
- `ChunkBoundarySentence`: Split at sentence endings

## Redis-Backed Engine with Presets

For distributed deployments that need both Redis persistence and local speed, use the `Preset` field:

```go
ac, err := acor.Create(&acor.AhoCorasickArgs{
	Addr:          "localhost:6379",
	Name:          "my-collection",
	Preset:        acor.PresetBalanced,
	CaseSensitive: false,
})
if err != nil {
	log.Fatal(err) // do not defer Close on a failed Create
}
defer ac.Close()

ac.Add(ctx, "hello")
matches, _ := ac.Find(ctx, "hello world") // 0 RTT on hot path
```

Redis is the source of truth; a local preset-optimized automaton handles reads with no Redis I/O on the hot path. Cross-instance invalidation uses Redis Pub/Sub.
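
The read path described above can be pictured as an atomically swapped pointer to the current local engine. The following is a minimal sketch under stated assumptions, not ACOR's implementation: `engine`, `holder`, `reload`, and `errUnavailable` are hypothetical names, and the `fetch` callback stands in for reloading keywords from Redis after an invalidation message.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errUnavailable is a hypothetical error for an unreachable backing store.
var errUnavailable = errors.New("backing store unavailable")

// engine is a hypothetical stand-in for a locally built automaton.
type engine struct {
	keywords []string
}

// holder publishes the current engine to readers without locks.
type holder struct {
	current atomic.Pointer[engine]
}

// load returns the engine readers should use; it performs no network I/O.
func (h *holder) load() *engine { return h.current.Load() }

// reload rebuilds the engine from the source of truth. On failure the
// previous engine stays published, so reads continue on the last-good data.
func (h *holder) reload(fetch func() ([]string, error)) error {
	kws, err := fetch()
	if err != nil {
		return err
	}
	h.current.Store(&engine{keywords: kws})
	return nil
}

func main() {
	h := &holder{}
	_ = h.reload(func() ([]string, error) { return []string{"hello"}, nil })

	// A failed reload leaves the previously built engine in place.
	err := h.reload(func() ([]string, error) { return nil, errUnavailable })
	fmt.Println(err != nil, h.load().keywords)
}
```

Swapping a whole engine rather than mutating it in place keeps readers lock-free, at the cost of rebuilding the automaton on every invalidation.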

## Architecture Presets

| Preset | Engine | Best For | Trade-off |
|--------|--------|----------|-----------|
| `PresetSpeed` | Full DFA + flat array | Real-time packet inspection, latency-critical paths | Higher memory (states x alphabet) |
| `PresetBalanced` | Double-Array Trie + Banded DFA | General-purpose keyword filtering | Balanced speed and memory |
| `PresetMemoryEfficient` | Map-based + Bloom filter | Large-scale domain blocking, millions of patterns | Slower search |
| `PresetUltimate` | SIMD pre-filter + Double-Array + Banded DFA | Production systems needing max throughput | Reasonable memory with highest speed |

## Local Caching

For read-heavy workloads, enable local caching to eliminate Redis round-trips:
For read-heavy workloads with the original `AhoCorasick`, enable local caching to eliminate Redis round-trips:

```go
ac, _ := acor.Create(&acor.AhoCorasickArgs{
8 changes: 8 additions & 0 deletions docs/content/_index.md
@@ -146,6 +146,14 @@ import (
)
```

## In-Memory and Redis-Backed Engines

ACOR also provides a preset-optimized engine variant:

- **Preset-optimized Redis mode** (`Preset: PresetBalanced`): Redis persistence with a local preset-optimized automaton for 0-RTT reads

It supports four presets: Speed, Balanced, MemoryEfficient, and Ultimate.

## Documentation Sections

- [Getting Started][getting-started-link] - Installation and quick start guide
1 change: 1 addition & 0 deletions docs/content/getting-started/quick-start.md
@@ -103,4 +103,5 @@ args := &acor.AhoCorasickArgs{

- [Batch Operations](../guides/batch-operations/) - Optimize bulk operations
- [Parallel Matching](../guides/parallel-matching/) - Process large texts efficiently
- [Redis-Backed Engine](../guides/redis-backed-engine/) - Redis persistence with local speed
- [API Reference](../reference/api/) - Complete API documentation
1 change: 1 addition & 0 deletions docs/content/guides/_index.md
@@ -11,6 +11,7 @@ Practical guides for using ACOR effectively.

- [Batch Operations](batch-operations/) - Optimize bulk keyword operations
- [Parallel Matching](parallel-matching/) - Process large texts with multiple workers
- [Redis-Backed Engine](redis-backed-engine/) - Redis persistence with local preset-optimized speed

## Navigation

112 changes: 112 additions & 0 deletions docs/content/guides/preset-engine.md
@@ -0,0 +1,112 @@
---
title: "Preset-Optimized Engine"
weight: 3
---

# Preset-Optimized Engine

ACOR provides a Redis-backed Aho-Corasick engine with selectable architecture presets. You create it through the unified `Create` API by setting the `Preset` field. Writes go to Redis atomically (V2 Lua scripts with optimistic locking); reads hit the local engine with no Redis I/O.

## When to Use

- Production deployments requiring Redis persistence
- Distributed systems with multiple instances sharing a keyword collection
- High-throughput text matching with no Redis round-trips on the read hot path
- Applications needing both durability and speed

## Quick Start

```go
package main

import (
	"fmt"
	"log"

	"github.com/skyoo2003/acor/pkg/acor"
)

func main() {
	ac, err := acor.Create(&acor.AhoCorasickArgs{
		Addr:   "localhost:6379",
		Name:   "my-collection",
		Preset: acor.PresetBalanced,
	})
	if err != nil {
		log.Fatal(err) // do not defer Close on a failed Create
	}
	defer ac.Close()

	ac.Add("he")
	ac.Add("her")
	ac.Add("him")

	matches, _ := ac.Find("he is him")
	fmt.Println(matches) // [he him]

	positions, _ := ac.FindIndex("he is him")
	fmt.Println(positions) // map[he:[0] him:[6]]

	info, _ := ac.Info()
	fmt.Printf("Keywords: %d, Nodes: %d, Memory: %d bytes\n",
		info.Keywords, info.Nodes, info.MemoryBytes)
}
```
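
To make the shape of the `FindIndex` result in the Quick Start concrete, here is a minimal map-based Aho-Corasick sketch of the kind the `PresetMemoryEfficient` row below names (sparse trie plus failure links). It is an illustration under assumptions, not ACOR's code: `node`, `build`, and `findIndex` are hypothetical names, and it reports byte offsets, which may differ from how ACOR counts positions in non-ASCII text.

```go
package main

import "fmt"

// node is one trie state; children live in a map, so sparse states stay small.
type node struct {
	next map[byte]*node
	fail *node    // longest proper suffix that is also a trie path
	out  []string // keywords ending here, including those reached via fail links
}

func newNode() *node { return &node{next: map[byte]*node{}} }

// build inserts the keywords into a trie, then sets failure links breadth-first.
func build(keywords []string) *node {
	root := newNode()
	for _, kw := range keywords {
		cur := root
		for i := 0; i < len(kw); i++ {
			if cur.next[kw[i]] == nil {
				cur.next[kw[i]] = newNode()
			}
			cur = cur.next[kw[i]]
		}
		cur.out = append(cur.out, kw)
	}
	var queue []*node
	for _, child := range root.next {
		child.fail = root
		queue = append(queue, child)
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for c, child := range cur.next {
			f := cur.fail
			for f != nil && f.next[c] == nil {
				f = f.fail
			}
			if f == nil {
				child.fail = root
			} else {
				child.fail = f.next[c]
			}
			// Shallower states are finished first, so their outputs are complete.
			child.out = append(child.out, child.fail.out...)
			queue = append(queue, child)
		}
	}
	return root
}

// findIndex scans text once and records the start offset of every match.
func findIndex(root *node, text string) map[string][]int {
	res := map[string][]int{}
	cur := root
	for i := 0; i < len(text); i++ {
		c := text[i]
		for cur != root && cur.next[c] == nil {
			cur = cur.fail
		}
		if nxt := cur.next[c]; nxt != nil {
			cur = nxt
		}
		for _, kw := range cur.out {
			res[kw] = append(res[kw], i-len(kw)+1)
		}
	}
	return res
}

func main() {
	root := build([]string{"he", "her", "him"})
	fmt.Println(findIndex(root, "he is him")) // map[he:[0] him:[6]]
}
```

The failure-link loop in `findIndex` is the traversal cost that the presets table attributes to the memory-efficient layout; a full DFA precomputes those transitions instead.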

## Architecture Presets

Each preset optimizes for a different trade-off between speed, memory, and feature set. The preset is fixed at creation time.

| Preset | Engine | Best For | Trade-off |
|--------|--------|----------|-----------|
| `PresetSpeed` | Full DFA + flat array trie + compact alphabet mapping | Real-time packet inspection, high-speed log scanning, latency-critical paths | Higher memory proportional to states x alphabet size |
| `PresetBalanced` | Double-Array Trie + Banded DFA + output link compression | General-purpose backend keyword filtering, search engines | Balanced speed and memory |
| `PresetMemoryEfficient` | Map-based sparse trie + Bloom filter pre-filtering + standard NFA | Large-scale domain blocking, malware signature matching, millions of patterns | Slower search due to failure link traversal and map lookups |
| `PresetUltimate` | SIMD-aware byte scanning pre-filter + Double-Array Trie + Banded DFA + deferred bit-set output collection | Production systems needing highest throughput | Reasonable memory with highest speed |

### Choosing a Preset

- **Start with `PresetBalanced`** — it provides the best speed-to-memory ratio for most workloads.
- Use `PresetSpeed` when latency is critical and memory is available.
- Use `PresetMemoryEfficient` when you have millions of patterns and memory is constrained.
- Use `PresetUltimate` for production systems that need maximum throughput.
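
Whether "memory is available" for `PresetSpeed` is easier to judge with numbers, since a full DFA stores one transition per (state, symbol) pair. The sketch below is a back-of-the-envelope estimate, not a measurement of ACOR: the state count, the 4-byte entry size, and the 64-symbol compact alphabet are all assumed figures, and `fullDFABytes` is a hypothetical helper.

```go
package main

import "fmt"

// fullDFABytes estimates the size of a dense transition table:
// one entry for every (state, symbol) pair.
func fullDFABytes(states, alphabet, bytesPerEntry int) int {
	return states * alphabet * bytesPerEntry
}

func main() {
	// Assumed figures, for illustration only.
	const states = 100_000 // trie states across all keywords
	const entry = 4        // bytes per transition, e.g. a 32-bit state ID

	raw := fullDFABytes(states, 256, entry)    // one column per possible byte value
	compact := fullDFABytes(states, 64, entry) // assume only 64 distinct bytes occur in the keywords

	fmt.Printf("full byte alphabet: %.1f MB\n", float64(raw)/1e6)     // 102.4 MB
	fmt.Printf("compact alphabet:   %.1f MB\n", float64(compact)/1e6) // 25.6 MB
}
```

Under these assumptions a compact alphabet mapping cuts the table to a quarter of its size, which is why the alphabet width matters as much as the keyword count.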

## Case Sensitivity

By default, matching is case-insensitive. Enable case-sensitive matching when needed:

```go
ac, _ := acor.Create(&acor.AhoCorasickArgs{
	Addr:          "localhost:6379",
	Name:          "my-collection",
	Preset:        acor.PresetBalanced,
	CaseSensitive: true,
})
defer ac.Close()
```
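
The difference between the two modes can be illustrated with a tiny stand-alone sketch. It assumes the common approach of lower-casing both keyword and text when matching is case-insensitive; this guide does not say how ACOR folds case internally, and `normalize` and `contains` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize folds input to lower case unless case-sensitive matching is requested.
func normalize(s string, caseSensitive bool) string {
	if caseSensitive {
		return s
	}
	return strings.ToLower(s)
}

// contains reports whether keyword occurs in text under the chosen mode.
func contains(text, keyword string, caseSensitive bool) bool {
	return strings.Contains(normalize(text, caseSensitive), normalize(keyword, caseSensitive))
}

func main() {
	fmt.Println(contains("Hello World", "hello", false)) // true
	fmt.Println(contains("Hello World", "hello", true))  // false
}
```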

## API Reference

```go
// Create
ac, err := acor.Create(&acor.AhoCorasickArgs{
	Addr:   "localhost:6379",
	Name:   "my-collection",
	Preset: acor.PresetBalanced,
})
defer ac.Close()

// Add/Remove — returns 1 if changed, 0 if no-op
ac.Add("keyword")
ac.Remove("keyword")

// Find (0 RTT on hot path — reads from local engine)
matches, _ := ac.Find("text") // ([]string, error)
positions, _ := ac.FindIndex("text") // (map[string][]int, error)

// Stats
info, err := ac.Info() // (*AhoCorasickInfo, error)

// Reset
ac.Flush()
```

## Next Steps

- [Redis-Backed Engine](redis-backed-engine/) - Redis persistence details
- [API Reference](../reference/api/) - Complete API documentation
144 changes: 144 additions & 0 deletions docs/content/guides/redis-backed-engine.md
@@ -0,0 +1,144 @@
---
title: "Redis-Backed Engine"
weight: 4
---

# Redis-Backed Engine

The preset-optimized Redis mode (enabled via `Preset` on `AhoCorasickArgs`) combines Redis persistence with a local preset-optimized automaton. Redis is the source of truth; reads hit the local engine with no Redis I/O on the hot path.

## When to Use

- Distributed deployments across multiple instances
- Need for Redis persistence and cross-instance synchronization
- Want preset-optimized local speed without giving up Redis durability
- Migrating from the original `AhoCorasick` for better read performance

## Architecture

```
Write Path
  Instance A ──Add()──▶ Lua Script (optimistic lock) ──▶ Redis
                                                           │
  Instance B ◀──────────── Pub/Sub invalidate ◀────────────┘
      └─ ensureValid() ──▶ reload from Redis ──▶ rebuild local engine

Read Path
  Instance A ──Find()──▶ local engine (0 RTT)
```

- **Writes**: V2 Lua scripts with optimistic locking (up to 3 retries with backoff)
- **Reads**: Local preset-optimized automaton — no Redis I/O
- **Invalidation**: Redis Pub/Sub notifies all instances on mutation
- **Degraded mode**: If reload fails, the last-good engine continues serving reads
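
The write-side retry behaviour listed above can be sketched without Redis. This is an illustration under assumptions, not ACOR's code: `store` is a hypothetical in-memory stand-in for the versioned data that the Lua script checks, `addWithRetry` and its `beforeWrite` hook are invented for the example, and the doubling delay is only one possible backoff policy.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict signals that another writer changed the data first.
var errConflict = errors.New("version conflict")

// store is a hypothetical stand-in for the versioned keyword data in Redis.
type store struct {
	version  int
	keywords map[string]bool
}

// addIfVersion applies the write only if the caller saw the latest version,
// which is the kind of check an atomic server-side script would perform.
func (s *store) addIfVersion(seen int, kw string) error {
	if seen != s.version {
		return errConflict
	}
	s.keywords[kw] = true
	s.version++
	return nil
}

// addWithRetry re-reads the version and retries on conflict, doubling the
// delay each time. beforeWrite is a hook used here to simulate a rival writer.
func addWithRetry(s *store, kw string, retries int, delay time.Duration, beforeWrite func(attempt int)) error {
	for attempt := 0; ; attempt++ {
		seen := s.version // read phase
		if beforeWrite != nil {
			beforeWrite(attempt)
		}
		err := s.addIfVersion(seen, kw) // conditional write phase
		if err == nil || !errors.Is(err, errConflict) {
			return err
		}
		if attempt == retries {
			return fmt.Errorf("giving up after %d retries: %w", retries, err)
		}
		time.Sleep(delay)
		delay *= 2
	}
}

func main() {
	s := &store{keywords: map[string]bool{}}
	// A rival writer wins the race on the first attempt only.
	rival := func(attempt int) {
		if attempt == 0 {
			s.version++
		}
	}
	err := addWithRetry(s, "hello", 3, time.Millisecond, rival)
	fmt.Println(err, s.keywords["hello"], s.version) // <nil> true 2
}
```

With a retry budget of 3 the write is attempted at most four times in total; a writer that keeps losing the race receives an error instead of blocking indefinitely.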

## Quick Start

```go
package main

import (
	"fmt"

	"github.com/skyoo2003/acor/pkg/acor"
)

func main() {
	ac, err := acor.Create(&acor.AhoCorasickArgs{
		Addr:          "localhost:6379",
		Name:          "my-collection",
		Preset:        acor.PresetBalanced,
		CaseSensitive: false,
	})
	if err != nil {
		panic(err)
	}
	defer ac.Close()

	added, err := ac.Add("hello")
	if err != nil {
		panic(err)
	}
	fmt.Printf("Added: %d\n", added)

	matches, err := ac.Find("hello world")
	if err != nil {
		panic(err)
	}
	fmt.Println(matches) // [hello]
}
```

## AhoCorasickArgs (Preset field)

The `AhoCorasickArgs` struct includes a `Preset` field for selecting the local engine architecture:

```go
type AhoCorasickArgs struct {
	// ... Addr, Addrs, RingAddrs, Password, DB, Name ...
	Preset        Preset // Architecture preset (default: PresetBalanced)
	CaseSensitive bool   // Enable case-sensitive matching (default: false)
	// ... other fields ...
}
```

All standard Redis topologies are supported (Standalone, Sentinel, Cluster, Ring) via the connection fields on `AhoCorasickArgs`.

## Preset Selection

The same [architecture presets](preset-engine/#architecture-presets) are available:

| Preset | Use Case |
|--------|----------|
| `PresetSpeed` | Latency-critical, memory available |
| `PresetBalanced` | Default — best speed-to-memory ratio |
| `PresetMemoryEfficient` | Millions of patterns, memory constrained |
| `PresetUltimate` | Maximum throughput production systems |

## API Reference

```go
// Create
ac, err := acor.Create(&acor.AhoCorasickArgs{
	Addr:   "localhost:6379",
	Name:   "my-collection",
	Preset: acor.PresetBalanced,
})

// Add/Remove
added, err := ac.Add("keyword") // (int, error)
removed, err := ac.Remove("keyword") // (int, error)

// Find (0 RTT on hot path — reads from local engine)
matches, err := ac.Find("text") // ([]string, error)
positions, err := ac.FindIndex("text") // (map[string][]int, error)

// Stats
info, err := ac.Info() // (*AhoCorasickInfo, error)

// Flush and Close
err = ac.Flush() // err is already declared above, so assign with = rather than :=
err = ac.Close()
```

## Comparison with AhoCorasick

| Feature | `AhoCorasick` (no Preset) | `AhoCorasick` (with Preset) |
|---------|--------------|-----------------|
| Read latency | 3 RTT (V2) or cached | 0 RTT (local engine) |
| Write latency | Lua script | Lua script + optimistic lock |
| Cross-instance sync | Pub/Sub cache invalidation | Pub/Sub engine rebuild |
| Schema | V1 or V2 | V2 only |
| Presets | N/A | Speed, Balanced, MemoryEfficient, Ultimate |
| Suggest/SuggestIndex | Yes | No |
| Batch operations | Yes | No |
| Parallel matching | Yes | No |

Use a `Preset`-optimized `AhoCorasick` when you need the fastest possible reads in a distributed setup and can accept the constraints in the table above: V2 schema only, and no Suggest/SuggestIndex, batch operations, or parallel matching.

## Next Steps

- [Preset-Optimized Engine](preset-engine/) - Redis-backed engine with local speed
- [API Reference](../reference/api/) - Complete API documentation
2 changes: 1 addition & 1 deletion docs/content/reference/_index.md
@@ -9,7 +9,7 @@ Technical reference documentation for ACOR.

## Sections

- [API Reference](api/) - Public API documentation
- [API Reference](api/) - Public API documentation (unified `Create` API with `Preset` options)
- [Schema V1](schema-v1/) - Legacy schema details
- [Schema V2](schema-v2/) - Optimized schema (recommended)
