Why can deepseek-v4-pro be deployed with dp=8 on H20?
As far as I know, dp=8 means the full set of weight files is loaded onto each H20, with 8 replicas to serve more sessions, and only tp=8 splits the weight files across the 8 H20s.
But the GPU memory of a single H20 cannot hold all the weight files (about 900 GB), so why does this work? Does anyone know?
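
To make the arithmetic behind the question concrete, here is a minimal sketch. The 900 GB figure is from the question; `GPU_MEM_GB` is only a placeholder I assumed for the per-card capacity, and `weights_per_gpu_gb` is a hypothetical helper, not part of any serving framework. It only compares full replication with an even 8-way split and ignores KV cache and activations.

```python
def weights_per_gpu_gb(total_weights_gb: float, num_gpus: int, replicated: bool) -> float:
    """Return the weight memory (GB) each GPU must hold.

    replicated=True  -> every GPU holds a full copy (the reading of dp=8 in the question)
    replicated=False -> weights are split evenly across the GPUs (the reading of tp=8)
    """
    if replicated:
        return total_weights_gb
    return total_weights_gb / num_gpus


TOTAL_WEIGHTS_GB = 900.0  # size of the weight files, as stated in the question
NUM_GPUS = 8
GPU_MEM_GB = 96.0  # ASSUMPTION: placeholder capacity, replace with your card's actual memory

for label, replicated in [("full copy per GPU", True), ("split across 8 GPUs", False)]:
    need = weights_per_gpu_gb(TOTAL_WEIGHTS_GB, NUM_GPUS, replicated)
    fits = need <= GPU_MEM_GB
    print(f"{label}: needs {need:.1f} GB per GPU, fits in {GPU_MEM_GB:.0f} GB: {fits}")
```

If every replica really held a full copy, each card would need the whole 900 GB, which is the gap I am asking about.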