Ships · 2 months ago
Qwen3.5-397B-A17B: Hybrid Attention MoE with 1M Context, 17B Active Parameters

The Alibaba Qwen team released Qwen3.5-397B-A17B, a 397B-parameter MoE model with 17B active parameters per token and a 1M-token context window. It uses a hybrid attention architecture that alternates Gated DeltaNet (linear attention) layers with full attention layers at a 3:1 ratio, improving long-context efficiency. The API version, Qwen3.5-Plus, includes built-in tools and adaptive tool use. Released amid a cluster of Chinese foundation models (GLM-5, MiniMax M2.5, Kimi K2.5), it emphasizes scalable RL training at million-agent scale.
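A minimal sketch of what a 3:1 hybrid layer layout means in practice, assuming a simple repeating pattern where every fourth layer is full (softmax) attention and the rest are linear-attention (Gated DeltaNet) blocks; the function and names below are illustrative, not taken from the released code.

```python
# Illustrative sketch only: lays out a hybrid attention stack at a 3:1
# linear-to-full ratio. Names and the exact interleaving are assumptions.

def hybrid_layout(num_layers: int, linear_per_full: int = 3) -> list[str]:
    """Return a layer-type list mixing linear and full attention.

    With linear_per_full=3, every 4th layer uses full (softmax) attention
    and the remaining layers use a linear-attention variant
    (Gated DeltaNet in the announcement).
    """
    layout = []
    for i in range(num_layers):
        if (i + 1) % (linear_per_full + 1) == 0:
            layout.append("full_attention")
        else:
            layout.append("gated_deltanet")
    return layout


if __name__ == "__main__":
    print(hybrid_layout(8))
    # ['gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'full_attention',
    #  'gated_deltanet', 'gated_deltanet', 'gated_deltanet', 'full_attention']
```

Because only one in four layers pays the quadratic cost of full attention, most of the stack scales roughly linearly with sequence length, which is what makes the 1M-token context practical.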
- #huggingface
- #qwen3.5
- #mixture-of-experts
- #hybrid-attention
- #alibaba
Hugging Face