Horizontal Scaling

Scale PATAS to 10M+ messages/day using multiple instances.

Current Limitation

Distributed locks prevent concurrent processing of the same dataset. You cannot simply run 10 instances on the same dataset: they contend for the same lock and block each other.

Solution: shard the data across instances.

Split data into N shards (by message_id or timestamp)
Each instance processes its own shard under a unique lock key
Merge results after processing
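Why unique lock keys unblock parallelism can be sketched with an in-memory stand-in for a distributed lock. This is a hypothetical illustration, not PATAS's lock implementation: it assumes "acquire only if absent" semantics (as with Redis `SET ... NX`) and omits the expiry/TTL a real distributed lock needs. The unsharded key `pattern_mining:7` is invented for contrast; the sharded keys follow the example key shown under Approach 1.

```python
import threading


class InMemoryLockRegistry:
    """Hypothetical stand-in for a distributed lock store.

    acquire() succeeds only when no one currently holds the key,
    mirroring "set if not exists" semantics. No TTL handling.
    """

    def __init__(self) -> None:
        self._holders: dict[str, str] = {}
        self._mutex = threading.Lock()  # protects _holders within this process

    def acquire(self, key: str, owner: str) -> bool:
        with self._mutex:
            if key in self._holders:
                return False  # someone else holds this key: caller is blocked
            self._holders[key] = owner
            return True

    def release(self, key: str, owner: str) -> None:
        with self._mutex:
            # Only the current owner may release the key.
            if self._holders.get(key) == owner:
                del self._holders[key]


locks = InMemoryLockRegistry()

# Two instances on the same (unsharded) dataset share one key: the second is blocked.
same_a = locks.acquire("pattern_mining:7", "instance-a")
same_b = locks.acquire("pattern_mining:7", "instance-b")

# Instances working on different shards use different keys: both proceed.
shard_1 = locks.acquire("pattern_mining:7:10:shard:1", "instance-1")
shard_2 = locks.acquire("pattern_mining:7:10:shard:2", "instance-2")

print(same_a, same_b, shard_1, shard_2)  # True False True True

# Once the first holder releases, the waiting instance can acquire.
locks.release("pattern_mining:7", "instance-a")
retry_b = locks.acquire("pattern_mining:7", "instance-b")
print(retry_b)  # True
```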

Data Sharding

Approach 1: By message_id

# Instance 1: message_id 1-1,000,000
patas mine-patterns --days=7 --shard-id=1 --total-shards=10

# Instance 2: message_id 1,000,001-2,000,000
patas mine-patterns --days=7 --shard-id=2 --total-shards=10

Lock key: pattern_mining:7:10:shard:1 (unique per shard)
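The message_id ranges in the comments above correspond to fixed-size range partitioning. The sketch below reproduces those ranges and the example lock key. It assumes a shard size of 1,000,000 (taken from the comments) and assumes the key layout is `pattern_mining:<days>:<total_shards>:shard:<shard_id>`; how `--shard-id` and `--total-shards` are actually mapped to message ids inside PATAS is not documented here. The function names are hypothetical.

```python
def shard_id_range(shard_id: int, shard_size: int = 1_000_000) -> tuple[int, int]:
    """Inclusive message_id bounds for a 1-based shard (range partitioning)."""
    if shard_id < 1:
        raise ValueError("shard_id is 1-based")
    first = (shard_id - 1) * shard_size + 1
    last = shard_id * shard_size
    return first, last


def shard_for_message(message_id: int, shard_size: int = 1_000_000) -> int:
    """1-based shard that owns a given message_id under the same scheme."""
    if message_id < 1:
        raise ValueError("message_id must be positive")
    return (message_id - 1) // shard_size + 1


def shard_lock_key(days: int, total_shards: int, shard_id: int) -> str:
    # Assumed layout, modeled on the example key pattern_mining:7:10:shard:1
    return f"pattern_mining:{days}:{total_shards}:shard:{shard_id}"


for shard in (1, 2):
    print(shard, shard_id_range(shard), shard_lock_key(7, 10, shard))
# 1 (1, 1000000) pattern_mining:7:10:shard:1
# 2 (1000001, 2000000) pattern_mining:7:10:shard:2
```

Fixed ranges are easy to reason about, but they only balance load when message ids are dense; if ids are sparse or skewed, a hash- or modulo-based assignment may spread work more evenly.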

Approach 2: By timestamp

Split data into time windows:

Instance 1: days 1-2
Instance 2: days 3-4
etc.
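Splitting a mining period into time windows can be sketched as below. The 2-day window size mirrors the example above; the function name and the fixed end date are illustrative assumptions. Windows are half-open (start inclusive, end exclusive) so a message stamped exactly on a boundary belongs to exactly one instance.

```python
from datetime import datetime, timedelta, timezone


def time_windows(end: datetime, days: int, window_days: int) -> list[tuple[datetime, datetime]]:
    """Split [end - days, end) into consecutive half-open windows.

    The last window is shorter when days is not a multiple of window_days.
    """
    start = end - timedelta(days=days)
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=window_days), end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


# Illustrative end date; a 7-day period in 2-day windows gives four instances.
end = datetime(2024, 1, 8, tzinfo=timezone.utc)
for i, (w_start, w_end) in enumerate(time_windows(end, days=7, window_days=2), start=1):
    print(f"Instance {i}: {w_start:%Y-%m-%d} to {w_end:%Y-%m-%d} (exclusive)")
```

With `--days=7`, the last instance covers a single day, so time-based shards can be uneven in size; message volume per window can also differ from day to day.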

Requirements for 10M Messages/Day

Option 1: Incremental Mining (Recommended)

Process only new messages (~1.4M/day):

CPU: 4-8 vCPU
RAM: 16-32 GB
Disk: 200 GB SSD
PostgreSQL: 32-64 GB RAM
Redis: 4-8 GB RAM
Time: ~10 hours

Option 2: 10 Instances (Parallel)

Per instance:

CPU: 4-8 vCPU
RAM: 16-32 GB

Shared infrastructure:

PostgreSQL: 64-128 GB RAM
Redis: 8-16 GB RAM

Time: ~7 hours (parallel)

Quality Impact

Sharding maintains quality if results are properly merged:

Deduplicate patterns across shards
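Use global metrics (computed on merged counts, not per-shard counts) for quality tiers
Set a low per-shard min_spam_count so that rare patterns, whose occurrences are split across shards, are not dropped before merging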
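The merge step can be sketched as follows: deduplicate patterns across shards, sum their counts, and only then apply the spam threshold and compute metrics. The `ShardPattern` record, the lowercase/strip normalization used for deduplication, and the use of precision as the global metric are all hypothetical assumptions; PATAS's real output schema and tiering rules may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardPattern:
    # Hypothetical per-shard output record; the real schema may differ.
    pattern: str
    spam_count: int
    ham_count: int


def merge_shards(shard_results: list[list[ShardPattern]], min_spam_count: int) -> dict[str, dict]:
    """Deduplicate patterns across shards and compute global metrics.

    Counts are summed per pattern before the threshold is applied, so a
    pattern that is rare in every shard can still pass globally.
    """
    spam: dict[str, int] = defaultdict(int)
    ham: dict[str, int] = defaultdict(int)
    for results in shard_results:
        for p in results:
            key = p.pattern.strip().lower()  # naive normalization for dedup (assumption)
            spam[key] += p.spam_count
            ham[key] += p.ham_count

    merged = {}
    for key, spam_total in spam.items():
        if spam_total < min_spam_count:
            continue  # threshold applied to global, not per-shard, counts
        total = spam_total + ham[key]
        merged[key] = {
            "spam_count": spam_total,
            "ham_count": ham[key],
            "precision": spam_total / total if total else 0.0,
        }
    return merged


shard_a = [ShardPattern("Free Money", 3, 0), ShardPattern("win a prize", 2, 1)]
shard_b = [ShardPattern("free money ", 4, 1)]
merged = merge_shards([shard_a, shard_b], min_spam_count=5)
print(merged)
```

In this example, "free money" appears 3 and 4 times in the two shards, so filtering each shard with a threshold of 5 would discard it; after merging it has 7 spam hits and passes. This is the reason to keep per-shard thresholds low and apply the real threshold to merged counts.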

Roadmap

P1 (after a successful pilot):

Automatic sharding in CLI/API
Result merging and deduplication
Database partitioning

P2 (for 100M+ messages):

Sharded evaluation
Read replicas for evaluation queries
