Papers · 4 days ago
Structured Motion Description converts 3D pose to text, achieving 66.7% on BABEL-QA and 90.1% on HuMMan-QA

Structured Motion Description (SMD) converts joint positions into rule-based natural-language descriptions covering joint angles, body parts, and trajectories. This lets LLMs reason about motion without a cross-modal encoder. SMD achieves state-of-the-art results on motion question answering (66.7% on BABEL-QA, 90.1% on HuMMan-QA) and on motion captioning (R@1 0.584, CIDEr 53.16 on HumanML3D). The same text representation works across 8 LLMs with lightweight LoRA adaptation, and it supports interpretable attention analysis.
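To make the pose-to-text idea concrete, here is a minimal sketch. It is not SMD's actual rule set. It computes a joint angle from three 3D joint positions and turns that angle, along with the root displacement, into template sentences that could be placed in an LLM prompt. The sketch assumes a y-up frame in metres, with +z as forward and +x as right. All function names, joint names (`left_hip`, `left_knee`, ...), and angle thresholds are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def norm(v: Vec3) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def joint_angle(parent: Vec3, joint: Vec3, child: Vec3) -> float:
    """Angle in degrees at `joint` between the bones toward `parent` and `child`."""
    u, v = sub(parent, joint), sub(child, joint)
    cos = (u[0] * v[0] + u[1] * v[1] + u[2] * v[2]) / (norm(u) * norm(v))
    cos = max(-1.0, min(1.0, cos))  # clamp rounding error before acos
    return math.degrees(math.acos(cos))


def describe_bend(name: str, angle: float) -> str:
    # Hypothetical thresholds: a straight limb is close to 180 degrees.
    if angle > 160:
        state = "straight"
    elif angle > 90:
        state = "slightly bent"
    else:
        state = "sharply bent"
    return f"The {name} is {state} ({angle:.0f} degrees)."


def describe_knee(frame: Dict[str, Vec3], side: str) -> str:
    angle = joint_angle(frame[f"{side}_hip"], frame[f"{side}_knee"], frame[f"{side}_ankle"])
    return describe_bend(f"{side} knee", angle)


def describe_trajectory(roots: List[Vec3], still_eps: float = 0.05) -> str:
    """Summarise horizontal root motion (x-z plane) from the first to the last frame."""
    dx = roots[-1][0] - roots[0][0]
    dz = roots[-1][2] - roots[0][2]
    dist = math.hypot(dx, dz)
    if dist < still_eps:
        return "The body stays in place."
    # Assumption: +z is forward and +x is the body's right.
    if abs(dz) >= abs(dx):
        direction = "forward" if dz > 0 else "backward"
    else:
        direction = "to the right" if dx > 0 else "to the left"
    return f"The body moves {direction} by about {dist:.1f} m."


def describe_clip(frame: Dict[str, Vec3], roots: List[Vec3]) -> str:
    """Join per-joint and trajectory sentences into one text block for an LLM prompt."""
    parts = [describe_knee(frame, s) for s in ("left", "right") if f"{s}_knee" in frame]
    parts.append(describe_trajectory(roots))
    return " ".join(parts)


if __name__ == "__main__":
    frame = {"left_hip": (0.1, 1.0, 0.0), "left_knee": (0.1, 0.5, 0.0), "left_ankle": (0.1, 0.0, 0.5)}
    roots = [(0.0, 0.9, 0.0), (0.0, 0.9, 1.2)]
    print(describe_clip(frame, roots))
```

Because the output is plain text, it can be given to any LLM, with or without LoRA adaptation, and the model's attention can be read against named body parts. That is the property the summary highlights. A real system would describe many joints and frames rather than a single knee and a start-to-end displacement.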
- #motion understanding
- #LLM
- #structured description
- #question answering
- #captioning
Yao Zhang