hypes.news
Ships · 2 months ago

Qwen3.5-397B-A17B: Hybrid Attention MoE with 1M Context, 17B Active Parameters

The Alibaba Qwen team released Qwen3.5-397B-A17B, a 397B-parameter MoE model with 17B active parameters per token and a 1M-token context window. It uses a hybrid attention architecture that alternates Gated DeltaNet (linear attention) and full-attention layers in a 3:1 ratio, improving long-context efficiency. The API version, Qwen3.5-Plus, includes built-in tools and adaptive tool use. Released amid a cluster of Chinese foundation models (GLM-5, MiniMax M2.5, Kimi K2.5), it emphasizes scalable RL training at million-agent scale.
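The 3:1 hybrid layer schedule can be sketched as follows. This is a minimal illustration of the alternation pattern only; the function name, layer count, and exact placement of the full-attention layers are assumptions for illustration, not Qwen3.5 internals.

```python
# Hedged sketch: a layer schedule alternating three linear-attention
# (Gated DeltaNet) layers with one full-attention layer, i.e. a 3:1 ratio.
# `layer_types` and its defaults are hypothetical, not from the release.

def layer_types(num_layers: int, ratio: int = 3) -> list[str]:
    """Return a per-layer schedule like L L L F L L L F ..."""
    return [
        "full_attention" if (i + 1) % (ratio + 1) == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

print(layer_types(8))
```

With this pattern, only every fourth layer pays the quadratic cost of full attention, which is the source of the long-context efficiency claim.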

Hugging Face
