Papers·5 days ago
DeVI: Dexterous Video Imitation from Synthetic Videos for Physically Plausible Hand-Object Interaction

DeVI enables physically plausible control of a dexterous agent by imitating text-conditioned synthetic videos, and it outperforms prior methods based on 3D demonstrations in hand-object interaction fidelity. Because the cues from generated video are imprecise, the framework uses a hybrid tracking reward that combines 3D human tracking with robust 2D object tracking. It generalizes zero-shot across diverse objects and interaction types, and is validated in multi-object scenes and on diverse text-driven actions.
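
To make the idea of a hybrid tracking reward concrete, here is a minimal sketch. It is not DeVI's actual formulation: the exponential kernels, the weighted sum, the pinhole camera, and every name and default value (`hybrid_tracking_reward`, `project_pinhole`, `k_human`, `k_object`, `w_human`, `w_object`, `focal`, `cx`, `cy`) are assumptions chosen for illustration. It only shows how a 3D error on the human joints and a 2D image-space error on the object could be combined into one scalar.

```python
import math


def project_pinhole(point_3d, focal, cx, cy):
    """Project a camera-frame 3D point to pixels with an assumed pinhole model."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal * x / z + cx, focal * y / z + cy)


def hybrid_tracking_reward(
    sim_joints,      # simulated 3D joint positions, list of (x, y, z)
    ref_joints,      # reference 3D joint positions estimated from the video
    sim_obj_pos,     # simulated 3D object position in the camera frame
    ref_obj_px,      # reference 2D object position in pixels, (u, v)
    focal=500.0, cx=320.0, cy=240.0,   # hypothetical camera intrinsics
    k_human=10.0, k_object=0.01,       # hypothetical kernel sharpness
    w_human=0.5, w_object=0.5,         # hypothetical term weights
):
    """Illustrative reward: 3D human term plus 2D object term (assumed form)."""
    if len(sim_joints) != len(ref_joints) or not ref_joints:
        raise ValueError("joint lists must be non-empty and of equal length")

    # 3D human term: mean squared joint distance, mapped to (0, 1].
    joint_err = sum(
        math.dist(s, r) ** 2 for s, r in zip(sim_joints, ref_joints)
    ) / len(ref_joints)
    r_human = math.exp(-k_human * joint_err)

    # 2D object term: compare the projected simulated object with the
    # reference pixel location, so no 3D object reference is required.
    u, v = project_pinhole(sim_obj_pos, focal, cx, cy)
    px_err = math.dist((u, v), ref_obj_px)
    r_object = math.exp(-k_object * px_err)

    return w_human * r_human + w_object * r_object


if __name__ == "__main__":
    joints = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
    perfect = hybrid_tracking_reward(joints, joints, (0.0, 0.0, 1.0), (320.0, 240.0))
    shifted = hybrid_tracking_reward(joints, joints, (0.1, 0.0, 1.0), (320.0, 240.0))
    print(perfect, shifted)
```

In this sketch the object term is scored only in image space, which is one way a reward can stay usable when a generated video provides no reliable 3D object pose.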
- #dexterous manipulation
- #video imitation
- #human-object interaction
- #synthetic data
Visual Computing Lab