hypes.news
Papers · 4 days ago

Structured Motion Description converts 3D pose to text, achieving 66.7% on BABEL-QA and 90.1% on HuMMan-QA

Structured Motion Description (SMD) converts 3D joint positions into rule-based natural-language descriptions covering joint angles, body parts, and trajectory, so that LLMs can reason about motion as plain text without a cross-modal encoder. It reports state-of-the-art results on motion question answering (66.7% on BABEL-QA, 90.1% on HuMMan-QA) and on captioning (R@1 0.584, CIDEr 53.16 on HumanML3D). The text representation works across 8 LLMs with lightweight LoRA adaptation, and because the input is text, attention over it can be inspected and interpreted.
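To make the idea of a rule-based pose-to-text step concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names (`joint_angle`, `describe_elbow`), the angle thresholds, and the sentence template are all assumptions chosen for illustration. It computes the angle at one joint from three 3D positions and renders it as a sentence an LLM could read.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b formed by the segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0:
        raise ValueError("degenerate joint: two of the points coincide")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def describe_elbow(shoulder, elbow, wrist, side="left"):
    """Map an elbow angle to a short sentence (thresholds are hypothetical)."""
    angle = joint_angle(shoulder, elbow, wrist)
    if angle < 60:
        state = "sharply bent"
    elif angle < 150:
        state = "bent"
    else:
        state = "nearly straight"
    return f"The {side} elbow is {state} ({angle:.0f} degrees)."


if __name__ == "__main__":
    # Upper arm pointing up from the elbow, forearm pointing along +x.
    print(describe_elbow((0, 1, 0), (0, 0, 0), (1, 0, 0)))
```

A full description in this style would repeat the same pattern for other joints and add body-part and trajectory sentences per frame or segment; how the paper actually segments and phrases these is not specified in this summary.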

Yao Zhang
