hypes.news
Papers · 4 days ago

Omni: Unified multimodal model with Context Unrolling improves reasoning across text, image, video, 3D

Omni, a unified multimodal model trained on text, images, videos, 3D geometry, and hidden representations, achieves strong performance on multimodal generation and understanding benchmarks. Its key innovation, Context Unrolling, enables explicit reasoning across multiple modal representations before prediction, aggregating complementary information for more faithful multimodal reasoning. The model demonstrates in-context generation across all trained modalities.
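The paper does not spell out the mechanism here, but the idea of aggregating complementary per-modality representations before prediction can be sketched with a toy attention-pooling step. Everything below (the `context_unroll` function, the modality names, the feature shapes) is a hypothetical illustration, not the paper's actual architecture:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def context_unroll(query, modality_feats):
    """Toy 'context unrolling' sketch: stack per-modality token
    representations into one context, then let the query attend over
    all of them so complementary evidence from each modality is
    aggregated into a single vector before any prediction is made.
    (Illustrative only; the paper's mechanism may differ.)"""
    context = np.vstack(list(modality_feats.values()))  # (n_tokens, d)
    scores = context @ query                            # one score per token
    weights = softmax(scores)                           # attention weights
    return weights @ context                            # aggregated (d,) vector

rng = np.random.default_rng(0)
d = 8
feats = {
    "text":  rng.normal(size=(4, d)),   # 4 text tokens
    "image": rng.normal(size=(3, d)),   # 3 image patches
    "video": rng.normal(size=(5, d)),   # 5 video frames
}
query = rng.normal(size=d)
fused = context_unroll(query, feats)
print(fused.shape)  # (8,)
```

The point of the sketch is the ordering: all modal representations are made jointly visible to the predictor first, rather than each modality being summarized independently.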

Ceyuan Yang
