Ships · 2 months ago
Qwen3.5-397B-A17B: Hybrid Attention MoE with 1M Context, 17B Active Parameters

The Alibaba Qwen team released Qwen3.5-397B-A17B, a 397B-parameter MoE model with 17B active parameters per token and a 1M-token context window. It uses a hybrid attention architecture that alternates Gated DeltaNet (linear attention) layers with full attention layers at a 3:1 ratio, improving long-context efficiency. The API version, Qwen3.5-Plus, includes built-in tools and adaptive tool use. Released amid a cluster of Chinese foundation models (GLM-5, MiniMax M2.5, Kimi K2.5), it emphasizes scalable RL training at million-agent scale.
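A minimal sketch of what a 3:1 hybrid layer layout means in practice, assuming a simple repeating pattern where every fourth layer is full (softmax) attention and the rest are linear-attention (Gated DeltaNet) blocks; the function and names below are illustrative, not taken from the released code.

```python
# Illustrative sketch only: lays out a hybrid attention stack at a 3:1
# linear-to-full ratio. Names and the exact interleaving are assumptions.

def hybrid_layout(num_layers: int, linear_per_full: int = 3) -> list[str]:
    """Return a layer-type list mixing linear and full attention.

    With linear_per_full=3, every 4th layer uses full (softmax) attention
    and the remaining layers use a linear-attention variant
    (Gated DeltaNet in the announcement).
    """
    layout = []
    for i in range(num_layers):
        if (i + 1) % (linear_per_full + 1) == 0:
            layout.append("full_attention")
        else:
            layout.append("gated_deltanet")
    return layout


if __name__ == "__main__":
    print(hybrid_layout(8))
    # ['gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'full_attention',
    #  'gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'full_attention']
```

Because only one in four layers pays the quadratic cost of full attention, most of the stack scales roughly linearly with sequence length, which is what makes the 1M-token context practical.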
- #huggingface
- #qwen3.5
- #mixture-of-experts
- #hybrid-attention
- #alibaba
Hugging Face