Autonomous agent system for a competitive restaurant simulation game. Each turn the system reasons about zone placement, menu composition, ingredient bidding, pricing, and diplomatic manipulation.
A gpt-oss-120b-powered StrategyAgent acts as the turn brain. It produces a structured TurnStrategy JSON before any skill runs, setting parameters for every downstream module: which zone to target, how aggressively to bid, what price multiplier to apply, and which diplomatic posture to take.
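As a rough illustration of the TurnStrategy contract, the sketch below parses the model's JSON output into a typed object before any skill reads it. Only `bid_aggressiveness` and `price_adjustment_factor` are names taken from this description; `target_zone`, `confidence`, `undercut`, and `diplomatic_posture` are hypothetical field names chosen for the example, and the real schema may differ.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnStrategy:
    target_zone: str               # hypothetical field name
    confidence: float              # hypothetical field name, clamped to [0, 1]
    bid_aggressiveness: float      # named in the description above
    price_adjustment_factor: float # named in the description above
    undercut: bool                 # hypothetical field name
    diplomatic_posture: str        # hypothetical field name


def parse_turn_strategy(raw: str) -> TurnStrategy:
    """Parse the LLM's JSON reply; raises KeyError/ValueError on malformed output."""
    data = json.loads(raw)
    return TurnStrategy(
        target_zone=str(data["target_zone"]),
        confidence=min(1.0, max(0.0, float(data["confidence"]))),
        bid_aggressiveness=float(data["bid_aggressiveness"]),
        price_adjustment_factor=float(data["price_adjustment_factor"]),
        undercut=bool(data.get("undercut", False)),
        diplomatic_posture=str(data["diplomatic_posture"]),
    )
```

Failing loudly on a missing key keeps a malformed reply from silently configuring downstream modules with defaults.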
The market is segmented into strategic zones (e.g. DIVERSIFIED, PREMIUM, BUDGET). An algorithmic heuristic picks the zone; the StrategyAgent can override it when confidence ≥ 0.5. Each zone maps to a dedicated datapizza Agent with a tailored system prompt.
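The override rule can be sketched as a single function. The function name and argument names are assumptions made for illustration; only the 0.5 threshold and the "heuristic first, agent may override" ordering come from the description.

```python
from typing import Optional


def choose_zone(
    heuristic_zone: str,
    agent_zone: Optional[str],
    agent_confidence: float,
    threshold: float = 0.5,
) -> str:
    """Return the agent's zone only when it is confident enough; else the heuristic's."""
    if agent_zone is not None and agent_confidence >= threshold:
        return agent_zone
    return heuristic_zone
```

For example, `choose_zone("BUDGET", "PREMIUM", 0.5)` yields the agent's pick, while a confidence of 0.49 falls back to the heuristic.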
Integer Linear Programming solves menu composition and ingredient bid allocation under budget and prep-time constraints, guided by the StrategyAgent's bid_aggressiveness and menu_* parameters.
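To make the 0/1 formulation concrete, the sketch below selects a menu that maximizes expected revenue subject to a budget and a prep-time cap. It deliberately uses exhaustive enumeration from the standard library as a stand-in for an ILP solver, so it only suits tiny instances. The `Recipe` fields and the objective are assumptions; how `bid_aggressiveness` and the `menu_*` parameters enter the real model is not specified here.

```python
from itertools import combinations
from typing import List, NamedTuple, Sequence, Tuple


class Recipe(NamedTuple):
    name: str
    expected_revenue: float  # hypothetical objective coefficient
    ingredient_cost: float   # consumes the budget constraint
    prep_time: float         # consumes the prep-time constraint


def select_menu(
    recipes: Sequence[Recipe], budget: float, max_prep_time: float
) -> Tuple[List[str], float]:
    """Brute-force the binary program: pick a subset maximizing revenue
    while total cost <= budget and total prep time <= max_prep_time."""
    best_value, best_subset = 0.0, ()
    for size in range(1, len(recipes) + 1):
        for subset in combinations(recipes, size):
            cost = sum(r.ingredient_cost for r in subset)
            prep = sum(r.prep_time for r in subset)
            if cost > budget or prep > max_prep_time:
                continue  # infeasible under the constraints
            value = sum(r.expected_revenue for r in subset)
            if value > best_value:
                best_value, best_subset = value, subset
    return [r.name for r in best_subset], best_value
```

A real ILP solver handles the same constraints with far better scaling; the enumeration only shows what the decision variables and constraints look like.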
Prices are computed per recipe from competitor observations, then multiplied by the StrategyAgent's price_adjustment_factor. The StrategyAgent also toggles undercutting mode.
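One possible shape for the pricing step is sketched below. The choice of the median as the base price, the 5% undercut margin, and the `fallback_price` argument are all assumptions for illustration; the description only states that a competitor-derived price is multiplied by `price_adjustment_factor` and that undercutting can be switched on.

```python
from statistics import median
from typing import Sequence


def price_recipe(
    competitor_prices: Sequence[float],
    fallback_price: float,
    price_adjustment_factor: float,
    undercut: bool = False,
    undercut_margin: float = 0.05,  # hypothetical: 5% below the cheapest rival
) -> float:
    """Derive a base price from observed competitor prices, then scale it."""
    if not competitor_prices:
        base = fallback_price  # nothing observed yet for this recipe
    elif undercut:
        base = min(competitor_prices) * (1 - undercut_margin)
    else:
        base = median(competitor_prices)
    return round(base * price_adjustment_factor, 2)
```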
A DagPipeline runs the following stages every turn:
DataCollector → StateBuilder → FeatureExtractor → StrategyInferrer
→ Embedding → TrajectoryPredictor → ClusterClassifier → BriefingGenerator
- StrategyInferrer classifies each competitor into archetypes: PREMIUM_MONOPOLIST, BUDGET_OPPORTUNIST, AGGRESSIVE_HOARDER, MARKET_ARBITRAGEUR, REACTIVE_CHASER, DECLINING_PASSIVE, DORMANT.
- TrajectoryPredictor forecasts competitor behavior over the next few turns.
- VectorStore holds competitor embeddings for similarity search and clustering.
Thompson Sampling (Beta priors, per-competitor) selects from 7 manipulation arms:
truthful_warning · inflated_intel · manufactured_scarcity · ingredient_misdirect · alliance_offer · price_anchoring · silence
Rewards are measured by observable competitor state changes (bid shifts, menu price changes) tracked across turns.
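A minimal sketch of per-competitor Thompson Sampling over the seven arms follows. The class name, the uniform Beta(1, 1) prior, and the convention that rewards are normalized to [0, 1] are assumptions; the arm names come from the list above.

```python
import random
from typing import Dict, List, Optional

ARMS = [
    "truthful_warning", "inflated_intel", "manufactured_scarcity",
    "ingredient_misdirect", "alliance_offer", "price_anchoring", "silence",
]


class ArmBandit:
    """Beta-Bernoulli Thompson Sampling for one competitor."""

    def __init__(self, arms: List[str], rng: Optional[random.Random] = None):
        self.rng = rng or random.Random()
        # [alpha, beta] per arm, starting from a uniform Beta(1, 1) prior (assumed).
        self.params: Dict[str, List[float]] = {arm: [1.0, 1.0] for arm in arms}

    def select(self) -> str:
        # Draw one sample from each arm's posterior and play the largest draw.
        draws = {arm: self.rng.betavariate(a, b) for arm, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, arm: str, reward: float) -> None:
        # Reward is clamped to [0, 1]; fractional rewards split the pseudo-count.
        reward = min(1.0, max(0.0, reward))
        self.params[arm][0] += reward
        self.params[arm][1] += 1.0 - reward


# One bandit per competitor, created lazily (competitor ids are hypothetical).
bandits: Dict[str, ArmBandit] = {}


def bandit_for(competitor_id: str) -> ArmBandit:
    return bandits.setdefault(competitor_id, ArmBandit(ARMS))
```

In use, the turn loop would call `bandit_for(cid).select()` before sending a message and `update(arm, reward)` once the competitor's bid or price shift has been observed on a later turn.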
A generator (gpt-oss-120b) drafts diplomatic messages in Italian; a discriminator (gpt-oss-20b) scores their believability. The loop refines the draft for up to 3 iterations, stopping as soon as the score exceeds 0.7. Messages are grounded in real tracker intel to maximize credibility.
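The refinement loop can be sketched independently of any model client by passing the generator and discriminator in as callables. The function signature and the feedback string are assumptions; the iteration cap of 3 and the 0.7 threshold come from the description.

```python
from typing import Callable, Optional, Tuple


def refine_message(
    generate: Callable[[str, Optional[str]], str],  # (intel, feedback) -> draft
    score: Callable[[str], float],                  # draft -> believability in [0, 1]
    intel: str,
    max_iterations: int = 3,
    threshold: float = 0.7,
) -> Tuple[str, float]:
    """Draft, score, and redraft; return the best draft seen and its score."""
    best_draft, best_score = "", float("-inf")
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        draft = generate(intel, feedback)
        believability = score(draft)
        if believability > best_score:
            best_draft, best_score = draft, believability
        if believability > threshold:
            break  # good enough, stop early
        feedback = f"Previous draft scored {believability:.2f}; make it more believable."
    return best_draft, best_score
```

Returning the best draft rather than the last one guards against a late iteration that scores worse than an earlier attempt.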
Incoming competitor messages are filtered and cross-checked against observed game state to prevent our own agent from acting on manipulated intel.
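As a hedged illustration of the cross-check, the function below accepts a numeric claim from an incoming message only when it roughly matches what our own tracker observed. The function name, the 15% relative tolerance, and the policy of rejecting unverifiable claims are assumptions, not the project's actual filter.

```python
from typing import Optional


def is_claim_consistent(
    claimed_value: float,
    observed_value: Optional[float],
    tolerance: float = 0.15,  # hypothetical relative tolerance
) -> bool:
    """True only if the claim is within `tolerance` of our own observation."""
    if observed_value is None:
        return False  # cannot verify, so do not act on it
    if observed_value == 0:
        return claimed_value == 0
    return abs(claimed_value - observed_value) / abs(observed_value) <= tolerance
```

For example, a message claiming a rival price of 11 passes against an observed 10, while a claim of 20 is discarded.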
