Need to implement quality scoring and historical learning system to prioritize Reddit engagement candidates based on success probability rather than FIFO order.
Implement as 4-phase system with feature flags for safe rollout:
- Phase 1: Quality scoring with 7 factors
- Phase 2: Performance data collection
- Phase 3: Historical learning
- Phase 4: Engagement tracking
Decision: Add two new workflow nodes between select_by_ratio and filter_candidates:
- score_candidates_node - calculate quality scores
- sort_by_score_node - sort by score with exploration logic
Rationale:
- Scoring happens before filtering removes any candidates
- Works with full candidate set
- Minimal changes to existing workflow
- Can be disabled via feature flag without breaking workflow
Alternative Considered: Modify select_candidate_node directly
Rejected Because: Would require re-sorting on every iteration (inefficient)
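A minimal sketch of the two new nodes as pure functions over a dict-shaped workflow state. The state shape, the `Candidate` wrapper, and the injected `scorer` callable are assumptions for illustration, not the actual implementation:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate wrapper; real fields will differ.
    id: str
    quality_score: float = 0.0

def score_candidates_node(state: dict, scorer) -> dict:
    """Attach a quality_score to every candidate before filtering runs."""
    scored = [replace(c, quality_score=scorer(c)) for c in state["candidates"]]
    return {**state, "candidates": scored}

def sort_by_score_node(state: dict) -> dict:
    """Order candidates once, highest score first; later nodes pop in order."""
    ranked = sorted(state["candidates"], key=lambda c: c.quality_score, reverse=True)
    return {**state, "candidates": ranked}
```

Because sorting happens in its own node, `select_candidate_node` can keep popping from the front of the list without re-sorting on each iteration.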
Decision: Use dataclasses.replace() to attach quality_score field to candidates
Rationale:
- Maintains immutability of dataclass pattern
- No need to modify underlying PRAW objects
- Clear separation of concerns
Alternative Considered: Make dataclasses mutable
Rejected Because: Would violate existing architecture patterns
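The pattern in miniature. The field names here are illustrative; only the `dataclasses.replace()` mechanism is the point:

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class SubmissionCandidate:
    # Hypothetical frozen wrapper around a PRAW submission.
    submission_id: str
    title: str
    quality_score: Optional[float] = None

original = SubmissionCandidate("abc123", "How do I ...?")
scored = replace(original, quality_score=0.72)  # new instance; original untouched
```

`replace()` returns a copy with the updated field, so the frozen dataclass contract holds and the underlying PRAW object is never modified.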
Decision: Implement 15% exploration rate - randomly select from top N candidates
Rationale:
- Prevents detectable patterns that could lead to shadowban
- Maintains behavioral diversity
- Allows system to discover new opportunities
- Configurable via SCORE_EXPLORATION_RATE env var
Alternative Considered: Always pick top-scored candidate
Rejected Because: Creates predictable patterns, shadowban risk
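The exploration logic can be sketched as follows; the function name and `top_n` parameter are assumptions:

```python
import os
import random

# Assumed env-var wiring; default matches the documented 15% rate.
EXPLORATION_RATE = float(os.environ.get("SCORE_EXPLORATION_RATE", "0.15"))

def select_next(ranked, exploration_rate=EXPLORATION_RATE, top_n=5):
    """Usually take the top-scored candidate; with probability
    exploration_rate, pick uniformly from the top N instead."""
    if not ranked:
        return None
    if random.random() < exploration_rate:
        return random.choice(ranked[:top_n])
    return ranked[0]
```

Restricting exploration to the top N keeps the randomness invisible from the outside (all picks are still plausible) while breaking the always-pick-the-best pattern.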
Decision: Create separate performance_history table instead of extending draft_queue
Rationale:
- Separation of concerns (drafts vs performance metrics)
- Allows multiple metric samples over time
- Easier to query for aggregations
- Can track metrics independently of draft lifecycle
Alternative Considered: Extend draft_queue with all performance fields
Rejected Because: Would bloat draft_queue table, mixing concerns
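A possible shape for the table, shown with SQLite for brevity. Column names and types are assumptions; the point is the one-to-many relationship (many metric samples per draft) and the index supporting per-subreddit aggregation:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS performance_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    draft_id INTEGER NOT NULL,           -- FK to draft_queue
    subreddit TEXT NOT NULL,
    comment_score INTEGER,               -- nullable: sampled after posting
    sampled_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_perf_subreddit
    ON performance_history (subreddit, sampled_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Keeping samples in their own rows (rather than columns on draft_queue) means re-sampling a draft's metrics at 7/30/90 days is just another INSERT.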
Decision: Use time-based decay weights (1.0 → 0.7 → 0.4 → 0.2) for 7/30/90+ day buckets
Rationale:
- Recent data more relevant than old data
- Reddit communities evolve over time
- Subreddit rules and culture change
- Prevents overfitting to outdated patterns
Alternative Considered: Equal weighting for all historical data
Rejected Because: Would give too much weight to stale patterns
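The bucketed decay maps directly to a small lookup function (a sketch; the real code may store the boundaries in config):

```python
def decay_weight(age_days: int) -> float:
    """Time-decay buckets: 0-7 days -> 1.0, 8-30 -> 0.7,
    31-90 -> 0.4, 90+ -> 0.2."""
    if age_days <= 7:
        return 1.0
    if age_days <= 30:
        return 0.7
    if age_days <= 90:
        return 0.4
    return 0.2
```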
Decision: Require 5 samples before using historical score, else return 0.5 (neutral)
Rationale:
- Prevents premature optimization on insufficient data
- Avoids overfitting to small sample size
- 5 samples gives enough signal without requiring months of data
- Neutral 0.5 score doesn't penalize new subreddits
Alternative Considered: Use historical score from first sample
Rejected Because: High variance with small sample size
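The gate is a one-liner in front of whatever aggregation is used. The plain mean below is an assumption for illustration (in practice the samples would carry the decay weights above):

```python
MIN_SAMPLES = 5

def historical_score(samples: list) -> float:
    """Aggregate per-subreddit samples into a 0..1 score; return a
    neutral 0.5 until MIN_SAMPLES data points exist."""
    if len(samples) < MIN_SAMPLES:
        return 0.5
    return sum(samples) / len(samples)
```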
Decision: Weighted composite with freshness=0.20 (highest), karma=0.10 (lowest)
Rationale:
- Freshness most important (engage while thread is active)
- Historical learning at 0.15 (significant but not dominant)
- All weights configurable via .env for tuning
- Weights normalized in constructor (handles misconfiguration)
Alternative Considered: Equal weights for all factors
Rejected Because: Not all factors equally important for success
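A sketch of the composite with constructor-time normalization. Only three of the seven factors are shown, and the neutral 0.5 fallback for a missing factor is an assumption:

```python
class CompositeScorer:
    """Combine per-factor scores using weights normalized to sum to 1.0,
    so misconfigured .env values still produce a valid composite."""

    def __init__(self, weights: dict):
        total = sum(weights.values())
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        self.weights = {name: w / total for name, w in weights.items()}

    def score(self, factor_scores: dict) -> float:
        # Missing factors fall back to a neutral 0.5 (assumed behavior).
        return sum(w * factor_scores.get(name, 0.5)
                   for name, w in self.weights.items())
```

Normalizing in the constructor means a user can set raw weights like `2.0, 1.0, 1.0` in `.env` and still get a well-formed distribution.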
Decision: Independent flags for each phase: QUALITY_SCORING_ENABLED, LEARNING_ENABLED, ENGAGEMENT_CHECK_ENABLED
Rationale:
- Safe rollout (can enable incrementally)
- Graceful degradation (workflow works with flags off)
- Easy rollback if issues detected
- Can disable learning without disabling scoring
Alternative Considered: Single master flag
Rejected Because: All-or-nothing approach too risky
Decision: Cache historical scores for 5 minutes per subreddit
Rationale:
- Avoids repeated DB queries during workflow runs
- Historical scores don't change rapidly
- 5 min TTL balances freshness vs performance
- Low memory footprint (only active subreddits cached)
Alternative Considered: No caching
Rejected Because: Would cause performance issues with DB queries
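A minimal per-subreddit TTL cache, sketched with an injectable clock so expiry is testable (class and method names are assumptions):

```python
import time

class TTLCache:
    """Cache subreddit -> historical score for ttl_seconds (default 5 min)."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # subreddit -> (expires_at, score)

    def get(self, subreddit: str):
        entry = self._store.get(subreddit)
        if entry is None or entry[0] < self._clock():
            return None  # miss or expired; caller falls back to the DB
        return entry[1]

    def put(self, subreddit: str, score: float) -> None:
        self._store[subreddit] = (self._clock() + self._ttl, score)
```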
Decision: Stateless services with dependency injection via functools.partial()
Rationale:
- Consistent with existing architecture
- Easy to test (mock dependencies)
- Services composable
- No shared state issues
Alternative Considered: Singleton services with global state
Rejected Because: Would make testing difficult, violate existing patterns
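The injection pattern in miniature: dependencies are bound once at startup with `functools.partial()`, and the workflow only ever sees a one-argument node. The `scorer` and `history` collaborators here are placeholders:

```python
from functools import partial

def score_node(state: dict, *, scorer, history) -> dict:
    """Stateless node: every collaborator is passed in, none stored globally."""
    scored = [(c, scorer(c, history)) for c in state["candidates"]]
    return {**state, "scored": scored}

# Wire dependencies at startup; tests can bind mocks the same way.
node = partial(score_node,
               scorer=lambda c, h: 0.5 + h.get(c, 0.0),
               history={})
```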
- No breaking changes to existing workflow
- All new DB columns nullable or have defaults
- Backward compatible (old code paths remain functional)
- Can deploy with features disabled for validation
✅ Approved - Implementation in progress
During Phase 1 implementation, need to handle errors in scoring gracefully to avoid workflow failures.
Wrap each scoring method in try/except and return 0.5 (neutral) on error. Also wrap overall score_candidate() method.
Rationale:
- Scoring failures shouldn't break the entire workflow
- 0.5 neutral score allows candidate to continue (not rejected, not prioritized)
- Detailed error logging for debugging
- Maintains workflow resilience
Implementation:
```python
try:
    score = self._score_upvote_ratio(candidate)
except Exception as e:
    logger.warning("upvote_ratio_scoring_failed", error=str(e))
    return 0.5
```
- Workflow continues even if scoring fails
- Candidates with failed scoring get neutral priority
- No silent failures (all errors logged)
✅ Implemented in Phase 1
Question signal scoring needs to search for multiple keywords in text, which could be inefficient if done repeatedly.
Compile regex patterns once in __init__() and reuse them for all candidates.
Rationale:
- Regex compilation is expensive
- Keywords are static configuration
- Significant performance improvement for high-volume scoring
- Pattern object is thread-safe
Implementation:
```python
help_keywords = [k.strip() for k in settings.score_help_keywords.split(',')]
self._help_pattern = re.compile(
    '|'.join(re.escape(kw) for kw in help_keywords),
    re.IGNORECASE,
)
```
- Faster scoring (patterns compiled once)
- No performance degradation with scale
- Memory footprint: negligible (2 compiled patterns)
✅ Implemented in Phase 1