Papers · 1 week ago

Stanford's Federation of Experts tackles the MoE communication bottleneck: up to 5.2x lower latency

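A Stanford team has proposed Federation of Experts (FoE), an architecture that addresses the distributed-communication bottleneck in Mixture-of-Experts (MoE) models. To eliminate the all-to-all communication used by conventional MoE, FoE builds a cluster of experts for each KV head and confines communication to within a node (intra-node). On LongBench, it cuts end-to-end latency by up to 5.2x, time to first token (TTFT) by 3.62x, and time between tokens (TBT) by 1.95x, while keeping generation quality on par. One caveat: the implementation is specialized for single-node and multi-node setups, so whether it generalizes to extreme-scale distributed deployments still needs further validation.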
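To make the routing idea concrete, here is a minimal sketch of per-KV-head expert clusters pinned to a single node. It is not the paper's implementation: the function names (`build_clusters`, `place_clusters`, `route`), the round-robin placement, and the top-k selection are all assumptions made for illustration, since the summary above does not describe FoE's actual routing or placement scheme.

```python
from typing import Dict, List


def build_clusters(num_kv_heads: int, experts_per_head: int) -> Dict[int, List[int]]:
    """Give each KV head its own disjoint block of expert ids (hypothetical layout)."""
    return {
        head: list(range(head * experts_per_head, (head + 1) * experts_per_head))
        for head in range(num_kv_heads)
    }


def place_clusters(clusters: Dict[int, List[int]], num_nodes: int) -> Dict[int, int]:
    """Map expert id -> node id so that every expert of a cluster shares one node.

    Round-robin over heads is an assumption; any placement that keeps a
    cluster on a single node would preserve the intra-node property.
    """
    placement: Dict[int, int] = {}
    for head, experts in clusters.items():
        node = head % num_nodes
        for expert in experts:
            placement[expert] = node
    return placement


def route(head: int, scores: List[float], clusters: Dict[int, List[int]], top_k: int) -> List[int]:
    """Pick the top-k experts for one head, considering only that head's cluster.

    `scores[i]` is the router score of the i-th expert in the head's cluster.
    """
    experts = clusters[head]
    if len(scores) != len(experts):
        raise ValueError("need one score per expert in the cluster")
    order = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    return [experts[i] for i in order[:top_k]]


clusters = build_clusters(num_kv_heads=4, experts_per_head=2)
placement = place_clusters(clusters, num_nodes=2)
chosen = route(head=1, scores=[0.1, 0.9], clusters=clusters, top_k=1)
nodes_touched = {placement[e] for e in route(2, [0.4, 0.6], clusters, top_k=2)}
```

Because a head can only select experts from its own cluster and the whole cluster lives on one node, every routing decision in this sketch touches exactly one node, which is the property that removes the need for a cross-node all-to-all exchange.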

#mixture-of-experts
#distributed-inference
#stanford

Stanford University

