Papers · 3 months ago

Structured Motion Description converts 3D pose to text, achieving 66.7% on BABEL-QA and 90.1% on HuMMan-QA

SMD transforms 3D joint positions into rule-based natural language descriptions covering joint angles, body parts, and trajectory, so LLMs can reason about motion without cross-modal encoders. It reports state-of-the-art results on motion QA (66.7% on BABEL-QA, 90.1% on HuMMan-QA) and on captioning (R@1 0.584, CIDEr 53.16 on HumanML3D). The text representation works across 8 LLMs with lightweight LoRA adaptation and enables interpretable attention analysis.
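To make the rule-based idea concrete, here is a minimal sketch of turning three 3D joint positions into one sentence about a joint angle. It is not the paper's implementation: the function names, the angle thresholds (150 and 90 degrees), and the sentence wording are all assumptions chosen for illustration.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b, formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate joint: zero-length bone")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))


def describe_angle(name, angle_deg):
    """Map an angle to a short phrase. Thresholds are hypothetical."""
    if angle_deg >= 150:
        state = "nearly straight"
    elif angle_deg >= 90:
        state = "moderately bent"
    else:
        state = "deeply bent"
    return f"{name} is {state} ({angle_deg:.0f} degrees)"


# Hypothetical single-frame pose: shoulder, elbow, wrist positions in meters.
shoulder, elbow, wrist = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
sentence = describe_angle("right elbow", joint_angle(shoulder, elbow, wrist))
print(sentence)
```

Sentences like this, produced per joint and per frame, could be concatenated into a prompt for an LLM. How SMD actually selects joints, discretizes angles, and summarizes trajectory is described in the paper rather than here.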

#motion understanding
#LLM
#structured description
#question answering
#captioning

Yao Zhang

