Authors: Daniel Namaki, Niccolò Settimelli
Course: Symbolic and Evolutionary Artificial Intelligence
Academic Year: 2024/2025 – University of Pisa
This project investigates non-standard reinforcement learning (RL) methods that leverage lexicographic reward prioritization on the classic LunarLander-v2 environment. Instead of a single scalar reward, our agents optimize a vector reward with strict priorities:
1. Survival (avoid crashing)
2. Landing quality (upright, centered touchdown)
3. Fuel efficiency
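To make the strict-priority idea concrete, here is a minimal sketch of lexicographic action selection over vector-valued Q-estimates. This is illustrative only: the function name, tolerance, and example values are assumptions, not the project's actual code.

```python
import numpy as np

def lex_best_action(q_vectors, tol=1e-6):
    """Pick the action whose Q-vector is lexicographically best.

    q_vectors has shape (n_actions, n_objectives), with objective 0
    (survival) strictly dominating objective 1 (landing quality),
    which dominates objective 2 (fuel). A small tolerance lets
    near-ties at a higher priority pass through, so lower-priority
    objectives can break them.
    """
    candidates = np.arange(q_vectors.shape[0])
    for obj in range(q_vectors.shape[1]):
        vals = q_vectors[candidates, obj]
        candidates = candidates[vals >= vals.max() - tol]
        if len(candidates) == 1:
            break  # a unique winner at this priority level
    return int(candidates[0])

# Actions 0 and 1 tie on survival; action 1 wins on landing quality.
q = np.array([[1.0, 0.2, 0.9],
              [1.0, 0.8, 0.1],
              [0.5, 1.0, 1.0]])
print(lex_best_action(q))  # 1
```

The key property is that a lower-priority objective (e.g. fuel) can never override a difference at a higher one (e.g. survival), regardless of magnitude.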
We implement and compare:
- Potential-Based Survival Shaping
- Cone-Aware Survival Shaping
- Curriculum Learning with Prioritized Replay
- Standard DQN Baselines
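For reference, potential-based shaping augments the environment reward with F(s, s') = γ·Φ(s') − Φ(s), which preserves the optimal policy while steering exploration. The sketch below uses a hypothetical survival potential Φ over a toy (altitude, vertical speed) state; it stands in for, and is not, the project's actual shaping function.

```python
def phi(state):
    """Hypothetical survival potential for illustration only:
    penalize high descent speed close to the ground.
    state = (y, vy): altitude and vertical velocity."""
    y, vy = state
    return -abs(vy) * max(0.0, 1.0 - y)

def shaped_reward(r, s, s_next, gamma=0.99):
    """Potential-based shaping: add F(s, s') = gamma*phi(s') - phi(s)
    to the environment reward r. This form leaves the optimal policy
    unchanged while rewarding transitions toward safer states."""
    return r + gamma * phi(s_next) - phi(s)

# Slowing the descent while losing a little altitude yields a bonus.
bonus = shaped_reward(0.0, s=(0.5, -1.0), s_next=(0.4, -0.5))
```

With γ = 1, the shaping terms telescope over a trajectory, so the total shaped return differs from the true return only by Φ(start) − Φ(end).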
Repository structure:

```
2025_SEAI_F01/
├── models/                              # Saved model checkpoints
├── networks/                            # LexQNetwork & standard Q-network code
├── v_cone/                              # Cone-aware shaping agent
├── v_potential_shaping/                 # Potential-based shaping agent
├── v_prioritized_curriculum_learning/   # Curriculum + prioritized replay agent
├── v_standard/                          # Standard & prioritized DQN agents
├── requirements.txt                     # Python dependencies
├── doc_seai_f01.pdf                     # Full project report
└── README.md                            # This overview
```