Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents

SkillNav

Official implementation of the paper:

Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents

Paper and Appendix

1. Matterport3D Simulator Setup

We use the latest version of the Matterport3D Simulator (not v0.1).

We use python==3.9. The setup steps are:

  • Install system packages (EGL/OSMesa/OpenGL)
sudo apt-get update
# Core dependencies
sudo apt-get install -y libjsoncpp-dev libepoxy-dev libglm-dev libopencv-dev \
                        libegl1 libegl1-mesa-dev libgl1-mesa-dev libtiff-dev
# For OSMesa / GLEW builds
sudo apt-get install -y libosmesa6 libosmesa6-dev libglew-dev
  • Conda packages
conda install -c conda-forge cmake gdal libtiff -y
# If you hit missing C++ symbols at runtime, this newer libstdc++ helps:
conda install -c conda-forge libstdcxx-ng -y
  • Build the simulator
# Replace [your_python_bin_path] with the absolute path to Python in the 'vlnde' env
# Example: /home/$USER/miniconda3/envs/vlnde/bin/python
cd Matterport3DSimulator
mkdir -p build && cd build
cmake -DEGL_RENDERING=ON -DPYTHON_EXECUTABLE=[your_python_bin_path] ..
make -j
  • Add the build directory to your environment (run this from Matterport3DSimulator/build, since it relies on $(pwd)):
export PYTHONPATH=$(pwd):$PYTHONPATH
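
The export above depends on the current directory and has to be repeated in every new shell. As a minimal sketch, assuming you know the absolute path of your build directory, the helper below prepends a directory to PYTHONPATH only when it is not already listed. `prepend_pythonpath` is a hypothetical name and is not part of this repository; the example path in the comments is also an assumption.

```shell
#!/usr/bin/env bash
# Prepend a directory to PYTHONPATH unless it is already listed.
# prepend_pythonpath is a hypothetical helper, not part of the repository.
prepend_pythonpath() {
    local dir="$1"
    case ":${PYTHONPATH:-}:" in
        *":${dir}:"*) ;;  # already present, nothing to do
        *) export PYTHONPATH="${dir}${PYTHONPATH:+:${PYTHONPATH}}" ;;
    esac
}

# Example (assumed clone location; adjust to your machine):
# prepend_pythonpath "$HOME/SkillNav/Matterport3DSimulator/build"
# python -c "import MatterSim"   # exits non-zero if the build is not importable
```

Placing the function and the call in your shell startup file keeps the setting across sessions without growing PYTHONPATH each time the file is sourced.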

2. Construct Data

R2R skill-specific data

Download from: Google Drive Link

Place the extracted files into:

ScaleVLN/datasets/R2R/annotations/
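
One way to place the files, sketched under assumptions: the README does not state the archive name or its internal layout, so the helper below simply copies everything from a directory you have already extracted into the annotations folder, creating the folder if needed. `install_annotations` and both of its arguments are hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# install_annotations is a hypothetical helper, not part of the repository.
# Usage: install_annotations <extracted_dir> <repo_root>
install_annotations() {
    local src="$1" repo_root="$2"
    local dest="${repo_root}/ScaleVLN/datasets/R2R/annotations"
    [ -d "$src" ] || { echo "missing source directory: $src" >&2; return 1; }
    mkdir -p "$dest"
    # Copy the contents of the extracted directory, keeping any subfolders.
    cp -R "$src"/. "$dest"/
    # Print the destination so the caller can confirm where the files went.
    echo "$dest"
}

# Example (both paths are assumptions):
# install_annotations "$HOME/Downloads/r2r_skill_data" "$HOME/SkillNav"
```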

3. Baselines

Both baselines are included in this repo.

4. Training & Testing

  • Train
cd ./ScaleVLN/map_nav_src
bash ./scripts/train_r2r_b16_mix_vertical.sh
  • Test
cd ./ScaleVLN/map_nav_src
bash ./scripts/test_r2r_b16_moe-top1.sh
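
Both commands follow the same pattern: change into ScaleVLN/map_nav_src, then run a script from scripts/. A small wrapper, sketched here under the assumption that you pass the repository root yourself, can check that the script exists first and leave your working directory untouched. `run_nav_script` is a hypothetical name and is not part of this repository.

```shell
#!/usr/bin/env bash
# run_nav_script is a hypothetical wrapper, not part of the repository.
# Usage: run_nav_script <repo_root> <script_name>
run_nav_script() {
    local src_dir="$1/ScaleVLN/map_nav_src"
    local script="scripts/$2"
    [ -f "${src_dir}/${script}" ] || {
        echo "script not found: ${src_dir}/${script}" >&2
        return 1
    }
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$src_dir" && bash "./${script}" )
}

# Example (repository root assumed to be the current directory):
# run_nav_script . train_r2r_b16_mix_vertical.sh
# run_nav_script . test_r2r_b16_moe-top1.sh
```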

5. Citation

@misc{ma2025breakingbuildingupmixture,
      title={Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents}, 
      author={Tianyi Ma and Yue Zhang and Zehao Wang and Parisa Kordjamshidi},
      year={2025},
      eprint={2508.07642},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.07642}, 
}
