Papers · 3 months ago

DeVI: Dexterous Video Imitation from Synthetic Videos for Physically Plausible Hand-Object Interaction

DeVI enables physically plausible control of a dexterous agent by imitating text-conditioned synthetic videos, and it outperforms prior methods that rely on 3D demonstrations in hand-object interaction fidelity. The framework uses a hybrid tracking reward that combines 3D human tracking with robust 2D object tracking, which compensates for the imprecise cues in generated videos. It generalizes zero-shot across diverse objects and interaction types, and is validated in multi-object scenes and on diverse text-driven actions.
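
To make the idea of a hybrid tracking reward concrete, here is a minimal sketch of how a 3D human-tracking term and a 2D object-tracking term could be combined. It is an illustration under stated assumptions, not the paper's formulation: the exponential kernels, the weights, the pinhole camera parameters, and the names `project_points` and `hybrid_tracking_reward` are all hypothetical.

```python
import numpy as np


def project_points(points_3d, focal, center):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels.

    Assumes every point has a positive depth (z > 0).
    """
    points_3d = np.asarray(points_3d, dtype=float)
    depth = points_3d[:, 2:3]
    return focal * points_3d[:, :2] / depth + np.asarray(center, dtype=float)


def hybrid_tracking_reward(
    sim_joints_3d,      # (J, 3) simulated hand/body joints, in meters
    ref_joints_3d,      # (J, 3) joints estimated from the video
    sim_obj_points_3d,  # (P, 3) simulated object points, camera frame
    ref_obj_points_2d,  # (P, 2) object points tracked in the video, pixels
    focal=500.0,        # assumed focal length in pixels
    center=(320.0, 240.0),  # assumed principal point in pixels
    k_human=10.0,       # assumed sharpness of the 3D term
    k_object=0.01,      # assumed sharpness of the 2D term
    w_human=0.5,        # assumed weights (sum to 1)
    w_object=0.5,
):
    """Hypothetical reward: weighted sum of a 3D human term and a 2D object term."""
    sim_joints_3d = np.asarray(sim_joints_3d, dtype=float)
    ref_joints_3d = np.asarray(ref_joints_3d, dtype=float)
    ref_obj_points_2d = np.asarray(ref_obj_points_2d, dtype=float)

    # 3D human tracking: mean Euclidean joint error in meters.
    joint_err = np.linalg.norm(sim_joints_3d - ref_joints_3d, axis=-1).mean()
    r_human = np.exp(-k_human * joint_err)

    # 2D object tracking: project the simulated object into the image and
    # compare against the tracked pixels, so no 3D object pose is required.
    sim_obj_2d = project_points(sim_obj_points_3d, focal, center)
    pixel_err = np.linalg.norm(sim_obj_2d - ref_obj_points_2d, axis=-1).mean()
    r_object = np.exp(-k_object * pixel_err)

    return float(w_human * r_human + w_object * r_object)
```

With these assumed defaults, perfect agreement on both terms yields a reward of 1.0, and any joint or pixel error pulls it toward 0. Scoring the object in image space is the design choice the sketch illustrates: the object term needs only 2D tracks from the video, not a 3D object pose recovered from generated footage.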

#dexterous manipulation
#video imitation
#human-object interaction
#synthetic data

Visual Computing Lab


