DeepSeekV2 Model
Outline / Content
The diagram shows one DeepSeekV2 transformer layer (TransformerLayer i, sitting between TransformerLayer i-1 and TransformerLayer i+1) together with the precision of the tensors flowing through it.

Module structure of TransformerLayer i:
  (input_layernorm): RMSNorm()
  (self_attention): MLASelfAttention
    (linear_q_proj): TEColumnParallelLinear
    (linear_kv_down_proj): TEColumnParallelLinear
    (linear_kv_up_proj): TELayerNormColumnParallelLinear
    (core_attention): TEDotProductAttention
    (linear_proj): TERowParallelLinear
  (pre_mlp_layernorm): RMSNorm()
  (mlp): MoELayer

Tensors labelled along the dataflow:
  hidden_states -> linear_q_proj -> q
  hidden_states -> linear_kv_down_proj -> kv_compressed -> linear_kv_up_proj -> kv
  q and kv_combined -> core_attention -> core_attn_out -> linear_proj

Precision annotations:
  BF16 and FP8 markers on the activations, with CastToFP8 / CastToBF16 nodes where the precision changes around the FP8 linear layers
  P2P: FP8 (point-to-point activations between layers are exchanged in FP8)
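
To make the labelled dataflow concrete, here is a minimal PyTorch sketch of the MLA attention path (hidden_states -> q, hidden_states -> kv_compressed -> kv, then core_attention -> core_attn_out -> linear_proj). It is not the Megatron-Core implementation: it uses plain nn.Linear stand-ins for the TE*ParallelLinear modules, omits RoPE, the rope/no-rope head split, KV caching, and tensor parallelism, and the hidden/latent sizes are illustrative rather than DeepSeekV2's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rms_norm(x, eps=1e-6):
    # Stand-in for the RMSNorm that TELayerNormColumnParallelLinear fuses
    # with the kv up-projection.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)


class MLASelfAttentionSketch(nn.Module):
    """Toy version of (self_attention): MLASelfAttention from the diagram."""

    def __init__(self, hidden_size=64, kv_lora_rank=16, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        # (linear_q_proj): hidden_states -> q
        self.linear_q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        # (linear_kv_down_proj): hidden_states -> kv_compressed (low-rank latent)
        self.linear_kv_down_proj = nn.Linear(hidden_size, kv_lora_rank, bias=False)
        # (linear_kv_up_proj): kv_compressed -> kv (keys and values produced together)
        self.linear_kv_up_proj = nn.Linear(kv_lora_rank, 2 * hidden_size, bias=False)
        # (linear_proj): core_attn_out -> attention output
        self.linear_proj = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, hidden_states):  # [batch, seq, hidden]
        b, s, h = hidden_states.shape
        q = self.linear_q_proj(hidden_states)
        kv_compressed = self.linear_kv_down_proj(hidden_states)
        kv = self.linear_kv_up_proj(rms_norm(kv_compressed))
        k, v = kv.chunk(2, dim=-1)  # split the combined kv back into keys / values

        def split_heads(t):
            return t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)

        # (core_attention): TEDotProductAttention stand-in
        core_attn_out = F.scaled_dot_product_attention(
            split_heads(q), split_heads(k), split_heads(v), is_causal=True
        ).transpose(1, 2).reshape(b, s, h)

        return self.linear_proj(core_attn_out)


if __name__ == "__main__":
    attn = MLASelfAttentionSketch()
    out = attn(torch.randn(2, 8, 64))
    print(out.shape)  # torch.Size([2, 8, 64])
```

The reason for routing kv through the low-rank kv_compressed latent is that only this small tensor needs to be cached at inference time; the up-projection re-expands it into keys and values inside each layer.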
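
One plausible reading of the precision markers is: activations stay in BF16 for the norms and the attention core, are cast to FP8 around the FP8 GEMMs (CastToFP8 / CastToBF16), and are exchanged between layers or pipeline stages in FP8 (P2P: FP8). Below is a minimal sketch of just the cast-for-transport part, assuming plain torch.float8_e4m3fn casts (available in recent PyTorch); real Transformer Engine FP8 additionally keeps per-tensor scaling factors, which this sketch omits entirely.

```python
import torch

# BF16 activations leaving TransformerLayer i
hidden_states = torch.randn(2, 8, 64, dtype=torch.bfloat16)

# CastToFP8: quantize before the FP8 GEMM / before the point-to-point send
hidden_states_fp8 = hidden_states.to(torch.float8_e4m3fn)

# "P2P: FP8": hidden_states_fp8 is what would be sent on to TransformerLayer i+1

# CastToBF16: dequantize on the receiving side before BF16 compute (e.g. RMSNorm)
hidden_states_bf16 = hidden_states_fp8.to(torch.bfloat16)
print(hidden_states_fp8.dtype, hidden_states_bf16.dtype)
```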
