hypes.news
Papers · 4 days ago

Omni: Unified multimodal model with Context Unrolling improves reasoning across text, image, video, 3D

Omni, a unified multimodal model trained on text, images, videos, 3D geometry, and hidden representations, achieves strong performance on multimodal generation and understanding benchmarks. Its key innovation, Context Unrolling, enables explicit reasoning across multiple modal representations before prediction, aggregating complementary information for more faithful multimodal reasoning. The model demonstrates in-context generation across all trained modalities.
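The paper does not spell out the mechanism here, but the idea of aggregating complementary per-modality representations before prediction can be sketched with a toy attention-pooling step. Everything below (the `context_unroll` function, the modality names, the feature shapes) is a hypothetical illustration, not the paper's actual architecture:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def context_unroll(query, modality_feats):
    """Toy 'context unrolling' sketch: stack per-modality token
    representations into one context, then let the query attend over
    all of them so complementary evidence from each modality is
    aggregated into a single vector before any prediction is made.
    (Illustrative only; the paper's mechanism may differ.)"""
    context = np.vstack(list(modality_feats.values()))  # (n_tokens, d)
    scores = context @ query                            # one score per token
    weights = softmax(scores)                           # attention weights
    return weights @ context                            # aggregated (d,) vector

rng = np.random.default_rng(0)
d = 8
feats = {
    "text":  rng.normal(size=(4, d)),   # 4 text tokens
    "image": rng.normal(size=(3, d)),   # 3 image patches
    "video": rng.normal(size=(5, d)),   # 5 video frames
}
query = rng.normal(size=d)
fused = context_unroll(query, feats)
print(fused.shape)  # (8,)
```

The point of the sketch is the ordering: all modal representations are made jointly visible to the predictor first, rather than each modality being summarized independently.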

Ceyuan Yang
