Your Phase 5 training revealed that agents couldn't learn because the world (300×300) was too large for them to discover resources during their exploration phase. I've implemented three critical fixes:
- World size reduced: 300×300 → 150×150
- Resource counts adjusted: Scaled to maintain density
- Spawn location optimized: Agents now spawn near resource clusters
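The density-preserving scaling in the second fix follows from the area ratio: halving each world dimension quarters the area, so resource counts scale by (150/300)² = 1/4. A minimal sketch (the counts below are illustrative assumptions, not the actual values in `configs/default.yaml`):

```python
# Density-preserving resource scaling: counts scale with world area.
OLD_SIDE, NEW_SIDE = 300, 150
scale = (NEW_SIDE / OLD_SIDE) ** 2  # (150/300)^2 = 0.25

# Illustrative old counts (assumed, not the real config values).
old_counts = {"food": 40, "water": 40, "heat": 20}
new_counts = {k: round(v * scale) for k, v in old_counts.items()}

old_density = sum(old_counts.values()) / OLD_SIDE**2
new_density = sum(new_counts.values()) / NEW_SIDE**2
print(new_counts)                              # {'food': 10, 'water': 10, 'heat': 5}
print(abs(old_density - new_density) < 1e-12)  # True: density preserved
```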
```bash
cd soliter-develop
python scripts/train_soliter.py --cycles 50 --output-dir experiments/phase5_quick_test
```

What to look for in the output:
- First consumption should appear before cycle 20
- "Consumed" column should show numbers (not just dots)
- By cycle 50, you should see 10+ consumption events
Previous run (300×300 world):
- 400,000 ticks, 200 cycles
- 7 total consumptions (0.00175% rate)
- Agent explored only 23% of the world
- No learning occurred
Expected with fixes (150×150 world):
- 100,000 ticks, 50 cycles
- 50-150 consumptions (>10% rate)
- Agent explores >50% of the world
- Actual learning visible
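The old run's rate can be checked directly: 7 consumptions over 400,000 ticks.

```python
consumptions, ticks = 7, 400_000
rate = consumptions / ticks * 100  # percentage of ticks with a consumption
print(f"{rate:.5f}%")              # 0.00175%
```

Note that 50-150 consumptions over 100,000 ticks is only 0.05-0.15% on a per-tick basis, so the ">10%" target for the new run presumably uses a different denominator (e.g., consumptions per discovery opportunity); worth confirming which definition the training script reports.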
Files changed:
- `configs/default.yaml` - World size and resource counts
- `scripts/train_soliter.py` - Spawn and respawn logic
No other changes needed - all your Phase 4 work (memory consolidation, EWC, drive system) is intact.
After running the 50-cycle test, check:
- Console shows consumption events (food/water/heat)
- Satisfaction values increase over time
- Agent position varies widely (not stuck in one spot)
- Fewer deaths from the same cause repeatedly
- Buffer size grows steadily (experiences being collected)
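The last two checks can be scripted against the run's JSON logs. This is a hypothetical sketch: it assumes each `training_*.json` file holds a list of per-cycle records with `"consumed"` and `"buffer_size"` keys, which may not match the actual schema; adapt the key names to your logs.

```python
import glob
import json

def summarize(records):
    """Summarize per-cycle records (assumed keys: "consumed", "buffer_size")."""
    consumed = [r.get("consumed", 0) for r in records]
    buffers = [r.get("buffer_size", 0) for r in records]
    return {
        "total_consumptions": sum(consumed),
        "buffer_monotonic": buffers == sorted(buffers),  # steady growth check
    }

# Usage over the quick test's logs:
for path in glob.glob("experiments/phase5_quick_test/training_*.json"):
    with open(path) as f:
        print(path, summarize(json.load(f)))
```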
Run a longer experiment:

```bash
python scripts/train_soliter.py --cycles 200 --output-dir experiments/phase5_validation
```

Then analyze with:

```bash
python scripts/plot_training.py experiments/phase5_validation/training_*.json
```

Troubleshooting by symptom:

Low consumption (<5 events):
- Try even smaller world (100×100)
- Increase initial action_std to 1.0
Resource discovery but no consumption:
- Check drive_system activation in logs
- Verify gradient sensor responses
Consumption but no learning trend:
- Check if rewards are being collected
- Verify PPO updates are occurring
The problem wasn't your code - it was a scale mismatch:
Old configuration:
- World: 300×300 (90,000 units²)
- Agent exploration: ~60 unit radius (~11,300 units²)
- Coverage: ~12.5%
- Result: most resources unreachable → no learning signal

New configuration:
- World: 150×150 (22,500 units²)
- Agent exploration: ~60 unit radius (~11,300 units²)
- Coverage: ~50%
- Result: most resources discoverable → learning possible
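The coverage figures can be reproduced from the circular-exploration assumption (area ≈ π·r² with r ≈ 60 units):

```python
import math

RADIUS = 60                   # assumed exploration radius, in world units
area = math.pi * RADIUS**2    # ~11,310 units^2

for side in (300, 150):
    coverage = area / side**2
    print(f"{side}x{side}: {coverage:.1%}")
# 300x300: 12.6%
# 150x150: 50.3%
```

These match the rounded figures above (~12.5% and ~50%): shrinking the world quadruples the fraction of it the agent can reach.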
Your project aims to demonstrate consciousness prerequisites through continual learning. But you can't test consciousness emergence without functional learning first. These fixes ensure:
- Agents can learn - Resources are discoverable
- Drive system functions - Internal rewards get satisfied
- Memory consolidates - There's something to remember
- Context integrates - Experiences across cycles connect
Without these fixes, you were testing memory systems in a "sensory deprivation" environment. Now the agent can actually engage with its world.
If you encounter issues, the key diagnostic is consumption rate:
- <1% = Still broken (try smaller world or higher action_std)
- 1-10% = Marginal (might work with more cycles)
- >10% = Working! (proceed to long-horizon testing)
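The thresholds above can be encoded as a small triage helper (hypothetical, just for quick checks against logged rates):

```python
def triage(consumption_rate_pct):
    """Map a consumption rate (in percent) to the diagnostic bands above."""
    if consumption_rate_pct < 1:
        return "broken: try a smaller world or higher action_std"
    if consumption_rate_pct <= 10:
        return "marginal: might work with more cycles"
    return "working: proceed to long-horizon testing"

print(triage(0.5))   # broken: ...
print(triage(5))     # marginal: ...
print(triage(25))    # working: ...
```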
Good luck! The mechanical systems are solid - they just needed a world they could actually explore.