Why can deepseek-v4-pro be deployed with dp=8 on H20? #402

@yiminghub2024

Why can deepseek-v4-pro be deployed with dp=8 on H20?
As I understand it, dp=8 means the full set of weight files is loaded onto each H20, with 8 replicas to serve more sessions; only tp=8 splits the weight files across the 8 H20s.

But the memory of a single H20 GPU cannot hold all the weight files (900G), so why does this work? Does anyone know?
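
To make the arithmetic behind the question concrete, here is a minimal sketch of the per-GPU weight footprint in the two cases I have in mind. The 900G figure and the 8 GPUs are taken from above; the helper `per_gpu_weight_gb` is a hypothetical name, and the assumption that sharding splits the weights perfectly evenly is mine, not something confirmed by any framework.

```python
def per_gpu_weight_gb(total_weight_gb: float, num_gpus: int, sharded: bool) -> float:
    """Weight memory needed on each GPU (hypothetical helper for illustration).

    sharded=False: every GPU holds a full copy (how the question reads dp=8).
    sharded=True:  weights are split evenly across GPUs (how the question reads tp=8).
    """
    if sharded:
        return total_weight_gb / num_gpus
    return total_weight_gb


TOTAL_WEIGHT_GB = 900.0  # size of all weight files, as stated in the question
NUM_GPUS = 8             # dp=8 or tp=8

replicated = per_gpu_weight_gb(TOTAL_WEIGHT_GB, NUM_GPUS, sharded=False)
split = per_gpu_weight_gb(TOTAL_WEIGHT_GB, NUM_GPUS, sharded=True)

print(f"full replica per GPU: {replicated} GB")
print(f"even split per GPU:   {split} GB")
```

If dp=8 really meant a full replica per GPU, each H20 would need to hold the whole 900G by itself, which is the part that does not add up for me.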
