Could you please provide instructions for running the evaluation? Also, the model architecture appears to be LLaVA-1.5. Is it VILA?