DiffH2O

  • The object is represented by its position and orientation, while the hand is represented by its global position, global orientation, pose, and the signed distance (SDF value) from each hand joint to its nearest point on the object mesh.

  • The synthesis process is divided into two sequential stages: grasping and interaction.

  • The diffusion model takes as input the text prompt, the object mesh, and the diffusion timestep *t*. The prompt is encoded using CLIP, whereas the mesh is encoded using a Basis Point Set (BPS) representation.

  • During the grasping stage, the hand approaches the object while the object remains stationary. This stage is considered complete once the object’s horizontal and vertical velocities exceed 0.01 m/s and at least seven hand vertices are in contact with the object. The guiding prompt for this stage is: *“The person grasps the .”*

  • Imputation is employed to ensure a smooth transition from the grasping stage to the interaction stage.
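
The end-of-grasp criterion described above can be sketched as a simple per-frame check. This is only an illustration under stated assumptions, not the authors' code: the function name `grasp_end_frame`, the array layouts, the z-up convention, and the 5 mm contact distance are all hypothetical choices, and the velocity condition is read here as requiring both the horizontal and the vertical speed to exceed the threshold. Only the 0.01 m/s threshold and the seven-vertex minimum come from the notes.

```python
import numpy as np

VELOCITY_THRESHOLD = 0.01  # m/s, from the notes above
MIN_CONTACT_VERTICES = 7   # from the notes above
CONTACT_DISTANCE = 0.005   # metres; assumed value, not stated in the notes


def grasp_end_frame(object_positions, hand_object_distances, fps):
    """Return the first frame at which the grasping stage counts as complete, or None.

    object_positions: (T, 3) object positions in metres, z assumed vertical.
    hand_object_distances: (T, V) distance from each hand vertex to the
        object surface in metres.
    fps: frame rate used to convert per-frame displacement to m/s.
    """
    object_positions = np.asarray(object_positions, dtype=float)
    hand_object_distances = np.asarray(hand_object_distances, dtype=float)

    # Finite-difference velocity; frame 0 is assigned zero velocity.
    velocity = np.zeros_like(object_positions)
    velocity[1:] = np.diff(object_positions, axis=0) * fps

    horizontal_speed = np.linalg.norm(velocity[:, :2], axis=1)
    vertical_speed = np.abs(velocity[:, 2])
    moving = (horizontal_speed > VELOCITY_THRESHOLD) & (
        vertical_speed > VELOCITY_THRESHOLD
    )

    # Count hand vertices close enough to the object to be "in contact".
    contacts = (hand_object_distances < CONTACT_DISTANCE).sum(axis=1)
    in_contact = contacts >= MIN_CONTACT_VERTICES

    done = np.flatnonzero(moving & in_contact)
    return int(done[0]) if done.size else None
```

For a sequence in which the hand touches a resting object a frame or two before lifting it, the function returns the index of the first frame where the object is already moving and enough vertices are in contact. Frames before that index would belong to the grasping stage and the remainder to the interaction stage.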



