DeepSeekV2 Model
Outline / Content
The diagram shows one DeepSeekV2 transformer layer (TransformerLayer i, sitting between TransformerLayer i-1 and TransformerLayer i+1) together with the precision of the tensors flowing through it.

Module structure of TransformerLayer i:
  (input_layernorm): RMSNorm()
  (self_attention): MLASelfAttention
    (linear_q_proj): TEColumnParallelLinear
    (linear_kv_down_proj): TEColumnParallelLinear
    (linear_kv_up_proj): TELayerNormColumnParallelLinear
    (core_attention): TEDotProductAttention
    (linear_proj): TERowParallelLinear
  (pre_mlp_layernorm): RMSNorm()
  (mlp): MoELayer

Tensors labelled along the dataflow:
  hidden_states -> linear_q_proj -> q
  hidden_states -> linear_kv_down_proj -> kv_compressed -> linear_kv_up_proj -> kv
  q and kv_combined -> core_attention -> core_attn_out -> linear_proj

Precision annotations:
  BF16 and FP8 markers on the activations, with CastToFP8 / CastToBF16 nodes where the precision changes around the FP8 linear layers
  P2P: FP8 (point-to-point activations between layers are exchanged in FP8)
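
To make the labelled dataflow concrete, here is a minimal PyTorch sketch of the MLA attention path (hidden_states -> q, hidden_states -> kv_compressed -> kv, then core_attention -> core_attn_out -> linear_proj). It is not the Megatron-Core implementation: it uses plain nn.Linear stand-ins for the TE*ParallelLinear modules, omits RoPE, the rope/no-rope head split, KV caching, and tensor parallelism, and the hidden/latent sizes are illustrative rather than DeepSeekV2's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rms_norm(x, eps=1e-6):
    # Stand-in for the RMSNorm that TELayerNormColumnParallelLinear fuses
    # with the kv up-projection.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)


class MLASelfAttentionSketch(nn.Module):
    """Toy version of (self_attention): MLASelfAttention from the diagram."""

    def __init__(self, hidden_size=64, kv_lora_rank=16, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        # (linear_q_proj): hidden_states -> q
        self.linear_q_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        # (linear_kv_down_proj): hidden_states -> kv_compressed (low-rank latent)
        self.linear_kv_down_proj = nn.Linear(hidden_size, kv_lora_rank, bias=False)
        # (linear_kv_up_proj): kv_compressed -> kv (keys and values produced together)
        self.linear_kv_up_proj = nn.Linear(kv_lora_rank, 2 * hidden_size, bias=False)
        # (linear_proj): core_attn_out -> attention output
        self.linear_proj = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, hidden_states):  # [batch, seq, hidden]
        b, s, h = hidden_states.shape
        q = self.linear_q_proj(hidden_states)
        kv_compressed = self.linear_kv_down_proj(hidden_states)
        kv = self.linear_kv_up_proj(rms_norm(kv_compressed))
        k, v = kv.chunk(2, dim=-1)  # split the combined kv back into keys / values

        def split_heads(t):
            return t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)

        # (core_attention): TEDotProductAttention stand-in
        core_attn_out = F.scaled_dot_product_attention(
            split_heads(q), split_heads(k), split_heads(v), is_causal=True
        ).transpose(1, 2).reshape(b, s, h)

        return self.linear_proj(core_attn_out)


if __name__ == "__main__":
    attn = MLASelfAttentionSketch()
    out = attn(torch.randn(2, 8, 64))
    print(out.shape)  # torch.Size([2, 8, 64])
```

The reason for routing kv through the low-rank kv_compressed latent is that only this small tensor needs to be cached at inference time; the up-projection re-expands it into keys and values inside each layer.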
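
One plausible reading of the precision markers is: activations stay in BF16 for the norms and the attention core, are cast to FP8 around the FP8 GEMMs (CastToFP8 / CastToBF16), and are exchanged between layers or pipeline stages in FP8 (P2P: FP8). Below is a minimal sketch of just the cast-for-transport part, assuming plain torch.float8_e4m3fn casts (available in recent PyTorch); real Transformer Engine FP8 additionally keeps per-tensor scaling factors, which this sketch omits entirely.

```python
import torch

# BF16 activations leaving TransformerLayer i
hidden_states = torch.randn(2, 8, 64, dtype=torch.bfloat16)

# CastToFP8: quantize before the FP8 GEMM / before the point-to-point send
hidden_states_fp8 = hidden_states.to(torch.float8_e4m3fn)

# "P2P: FP8": hidden_states_fp8 is what would be sent on to TransformerLayer i+1

# CastToBF16: dequantize on the receiving side before BF16 compute (e.g. RMSNorm)
hidden_states_bf16 = hidden_states_fp8.to(torch.bfloat16)
print(hidden_states_fp8.dtype, hidden_states_bf16.dtype)
```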
