Object Reconstruction Tasks

3D Shape Reconstruction

Given an RGB image of an object, a sequence of tactile readings from its surface, or a sequence of impact sounds produced by striking locations on its surface, the task is to reconstruct the point cloud of the target object from combinations of these multisensory observations. This task is related to prior efforts on visuo-tactile 3D reconstruction, but here we use all three sensory modalities and study their respective roles.
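To make "combinations of these multisensory observations" concrete, the following minimal sketch enumerates every non-empty subset of the three modalities, which is one plausible way to organize experiments that study each modality's role. The labels `vision`, `touch`, and `audio` and the helper `modality_combinations` are illustrative names assumed here, not identifiers from the original work.

```python
from itertools import combinations

# Illustrative labels for the three sensory inputs (assumed names).
MODALITIES = ("vision", "touch", "audio")


def modality_combinations(modalities=MODALITIES):
    """Return every non-empty subset of the modalities, smallest subsets first."""
    subsets = []
    for size in range(1, len(modalities) + 1):
        subsets.extend(combinations(modalities, size))
    return subsets


if __name__ == "__main__":
    # Print each input configuration a reconstruction model could be given.
    for subset in modality_combinations():
        print(" + ".join(subset))
```

With three modalities this yields seven input configurations: three single-modality settings, three pairs, and the full combination.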

Sound Generation of Dynamic Objects

Given a video clip of a falling object, the goal of this task is to generate the corresponding sound based on the visual appearance and motion of the object. The generated sound must match the object’s intrinsic properties (e.g., material type) and temporally align with the object’s movement in the given video. This task is related to prior work on sound generation from in-the-wild videos, but here we focus more on predicting soundtracks that closely match the object dynamics.
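The temporal-alignment requirement can be illustrated with the basic timing arithmetic that links a video frame to the audio samples it spans. This is only a sketch of that bookkeeping, not the generation method itself; the function name `frame_to_sample_range` and the example frame rate and sample rate are assumptions chosen for illustration.

```python
def frame_to_sample_range(frame_index, fps, sample_rate):
    """Return the half-open [start, end) audio sample range covered by one video frame."""
    if fps <= 0 or sample_rate <= 0:
        raise ValueError("fps and sample_rate must be positive")
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    # A frame starting at time frame_index / fps maps to that time in samples.
    start = round(frame_index * sample_rate / fps)
    end = round((frame_index + 1) * sample_rate / fps)
    return start, end


if __name__ == "__main__":
    # Hypothetical values: a 30 fps clip paired with 44.1 kHz audio.
    print(frame_to_sample_range(12, fps=30, sample_rate=44100))
```

Under these assumed rates, an impact visible in a given frame should produce a sound onset inside that frame's sample range for the audio to appear synchronized with the motion.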

Visuo-Tactile Cross-Generation
