Fix typo: s3_scales should be ps3_scales in multimodal_encoder/builder.py (AttributeError)

**Description**
When initializing VILA-HD PS3 models for inference, `build_vision_tower` raises an AttributeError because of a typo: the code references the attribute `s3_scales`, but the `PS3VisionEncoder` class defines it as `ps3_scales`.

**Error Log**
`Traceback (most recent call last):
  File "<vila-infer-executable>", line 6, in <module>
    sys.exit(main())
  ...
  File "<builder.py>", line 75, in build_vision_tower
    config.mm_scale_num = len(vision_tower.vision_tower.vision_model.s3_scales)
  ...
AttributeError: 'PS3VisionEncoder' object has no attribute 's3_scales'. Did you mean: 'ps3_scales'?`


**Root Cause**
The PS3VisionEncoder class explicitly defines the multi-scale configuration attribute as ps3_scales (consistent with the PS3 model naming convention), but the code in builder.py incorrectly uses s3_scales (missing the "p" prefix), leading to a failed attribute lookup.
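
The failed lookup can be reproduced in isolation without loading the model. The following is a minimal sketch, not the project's code: `PS3VisionEncoderStandIn` and `count_scales` are hypothetical names, and the scale values are placeholders chosen only to give the list a length.

```python
class PS3VisionEncoderStandIn:
    """Hypothetical stand-in that only mimics the attribute name from the traceback."""

    def __init__(self, ps3_scales):
        # The attribute is defined with the "p" prefix, as in the real class.
        self.ps3_scales = ps3_scales


def count_scales(model, attr_name):
    """Return the number of scales stored under attr_name on the model."""
    return len(getattr(model, attr_name))


# Placeholder values; the real scales are not stated in this issue.
encoder = PS3VisionEncoderStandIn(ps3_scales=[1, 2, 3])

# Misspelled name: the lookup fails the same way as line 75 of builder.py.
try:
    count_scales(encoder, "s3_scales")
    lookup_error = None
except AttributeError as exc:
    lookup_error = str(exc)

# Correct name: the lookup succeeds and yields the number of scales.
mm_scale_num = count_scales(encoder, "ps3_scales")

print(lookup_error)
print(mm_scale_num)
```

The first lookup fails because Python resolves attribute names exactly, so a one-character difference is enough to break initialization.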

**Suggested Fix**
In the build_vision_tower function of multimodal_encoder/builder.py (line 75):
Change:
`config.mm_scale_num = len(vision_tower.vision_tower.vision_model.s3_scales)`

To:
`config.mm_scale_num = len(vision_tower.vision_tower.vision_model.ps3_scales)`
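
To validate the change, one option is to scan the source for any remaining reference to the misspelled name. This is only a sketch under assumptions: `find_typo_lines` is a hypothetical helper, and the pattern assumes the typo always appears as `s3_scales` not preceded by an identifier character.

```python
import re

# Matches "s3_scales" only when it is not part of a longer identifier
# such as "ps3_scales".
TYPO_PATTERN = re.compile(r"(?<![A-Za-z0-9_])s3_scales\b")


def find_typo_lines(source):
    """Return 1-based line numbers in source that still use the misspelled name."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if TYPO_PATTERN.search(line)
    ]


# The two versions of the line quoted in this issue.
before = "config.mm_scale_num = len(vision_tower.vision_tower.vision_model.s3_scales)"
after = "config.mm_scale_num = len(vision_tower.vision_tower.vision_model.ps3_scales)"

print(find_typo_lines(before))
print(find_typo_lines(after))
```

Passing the contents of `builder.py` to the helper would list any lines that still need the fix.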

**Additional Context**
This typo blocks initialization of VILA-HD PS3 models during inference. The PS3VisionEncoder class uses ps3_scales consistently for its multi-scale processing configuration (e.g., defining resolution scales for the vision encoder), so correcting this single typo resolves the AttributeError and allows the model to load successfully.

Thanks for maintaining this great project! Let me know if any additional details are needed to validate this fix.
