
Using AIMV2 as the encoder: unfreezing it with a learning rate of 1e-6 makes the LLaVA model's loss drop to 0 and grad_norm become NaN #30

@shengyuwoo

When I use AIMV2 as the encoder, unfreeze it, and set the learning rate to 1e-6, the LLaVA model's loss drops to 0 after 5000 steps (with grad_norm reported as NaN). The original paper kept the encoder frozen. Why is unfreezing it not recommended for training? If I do decide to unfreeze it, what should I do?
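A minimal sketch for locating where such a run diverges, assuming the per-step training logs are available as a list of dicts with hypothetical keys `step`, `loss`, and `grad_norm`. It flags the first step whose loss is exactly 0 or non-finite, or whose grad_norm is non-finite. It also shows that a single NaN gradient entry is enough to make a global L2 gradient norm NaN. The helper names are illustrative only, not part of AIMV2 or LLaVA.

```python
import math


def global_grad_norm(grads):
    """Global L2 norm over all gradient entries (hypothetical helper).

    A single NaN entry propagates through the sum, so the whole norm becomes NaN.
    """
    return math.sqrt(sum(g * g for g in grads))


def first_divergent_step(log_records):
    """Return the first logged step that looks diverged, or None.

    log_records: iterable of dicts with hypothetical keys "step", "loss", "grad_norm".
    A step is flagged if its loss is exactly 0 or non-finite, or its grad_norm is non-finite.
    """
    for rec in log_records:
        loss, grad_norm = rec["loss"], rec["grad_norm"]
        if loss == 0.0 or not math.isfinite(loss) or not math.isfinite(grad_norm):
            return rec["step"]
    return None


# Illustrative log entries (made-up values, not from the reported run).
logs = [
    {"step": 4990, "loss": 0.85, "grad_norm": 1.3},
    {"step": 5000, "loss": 0.0, "grad_norm": float("nan")},
]

print(first_divergent_step(logs))
print(math.isnan(global_grad_norm([0.1, -0.2, float("nan")])))
```

Checking the logs this way narrows down the first step at which a non-finite value appears, which is a useful starting point before changing the encoder's learning rate or freezing strategy.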
