Standard FFN implementation

(mlp): LlamaMLP(
  (gate_proj): Linear(in_features=2048, out_features=8192, bias=False)
  (up_proj): Linear(in_features=2048, out_features=8192, bias=False)
  (down_proj): Linear(in_features=8192, out_features=2048, bias=False)
  (act_fn): SiLU()
)

Add a gating network. In the forward function, the gating network selects the most suitable expert, and each expert performs the following computation:
- self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
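The idea above can be sketched in PyTorch as follows. This is a minimal illustration, not the implementation of any particular model. The class names `ExpertMLP` and `SimpleMoE`, the `router` attribute, and the `num_experts` and `top_k` parameters are hypothetical. The routing rule (take the top-k router logits, then apply softmax over only the selected experts) is one common choice and is assumed here; the slide says only that the gating network picks the most suitable expert.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """One expert: the same gated FFN layout as the LlamaMLP printout above."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act_fn = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))


class SimpleMoE(nn.Module):
    """Hypothetical top-k MoE layer (illustrative names and routing rule)."""

    def __init__(self, hidden_size: int, intermediate_size: int,
                 num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating network: one logit per expert for every token.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [ExpertMLP(hidden_size, intermediate_size) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (num_tokens, hidden_size).
        logits = self.router(x)                           # (num_tokens, num_experts)
        top_logits, chosen = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(top_logits, dim=-1)           # normalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Rows (tokens) and top-k slots where expert e was selected.
            token_idx, slot = torch.where(chosen == e)
            if token_idx.numel() == 0:
                continue  # no token was routed to this expert
            weighted = weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
            out.index_add_(0, token_idx, weighted)
        return out


# Small sizes for a quick check; the printout above uses 2048 and 8192.
moe = SimpleMoE(hidden_size=16, intermediate_size=32, num_experts=4, top_k=2)
tokens = torch.randn(5, 16)
print(moe(tokens).shape)
```

Only the selected experts run for each token, so compute per token grows with `top_k` rather than with the total number of experts.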

LLM Intelligent Application Development

Learning path for LLM architectures

Classic Transformer architecture

Mixture-of-Experts (MoE)

Example of mixture of experts

Mixture of experts in NLP models

Sparsity in MoE

Example MoE architecture

MoE and Transformers

How to determine the experts in MoE

Feed-forward network (FFN)

Standard FFN implementation

Training MoE

Low-rank adaptation (LoRA)

Basic idea of LoRA

LoRA inference

LoRA implementation

Floating-point representation
