CVPR 2025
6D object pose estimation has shown strong generalizability to novel objects. However, existing methods often require either a complete, well-reconstructed 3D model or numerous reference images that fully cover the object. Estimating 6D poses from partial references, which capture only fragments of an object’s appearance and geometry, remains challenging. To address this, we propose UA-Pose, an uncertainty-aware approach for 6D object pose estimation and online object completion specifically designed for partial references. We assume access to either (1) a limited set of RGBD images with known poses or (2) a single 2D image. For the first case, we initialize a partial object 3D model based on the provided images and poses, while for the second, we use image-to-3D techniques to generate an initial object 3D model. Our method integrates uncertainty into the incomplete 3D model, distinguishing between seen and unseen regions. This uncertainty enables confidence assessment in pose estimation and guides an uncertainty-aware sampling strategy for online object completion, improving both pose estimation robustness and object completeness. We evaluate our method on the YCB-Video, YCBInEOAT, and HO3D datasets, including RGBD sequences of YCB objects manipulated by robots and human hands. Experimental results demonstrate significant performance improvements over existing methods, particularly when object observations are incomplete or partially captured.
To begin, pose estimation requires an object model. We therefore propose a hybrid object representation that integrates the object’s texture, geometry, and uncertainty:
1-a. First, if RGBD object images are provided, a neural SDF is trained and extracted as a mesh representing the object’s appearance and geometry.
1-b. Otherwise, if only one object image is available, the object mesh is generated by single-image-to-3D approaches.
2. Then, we check the visibility of each mesh vertex from the viewpoint of each reference image to create the uncertainty map, which reflects the seen and unseen regions of the incomplete 3D object model (a simplified sketch of this check follows the list).
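As a rough illustration of step 2, the snippet below marks each mesh vertex as seen or unseen by projecting it into every reference view and comparing its projected depth with the reference depth map. The function name, the shared-intrinsics assumption, and the depth tolerance are illustrative choices and not taken from the paper.

```python
import numpy as np

def mark_seen_vertices(vertices, ref_depths, ref_poses, K, depth_tol=0.01):
    """Flag mesh vertices observed in at least one reference RGBD view.

    vertices  : (V, 3) vertex positions in the object frame
    ref_depths: list of (H, W) depth maps in meters, one per reference image
    ref_poses : list of (4, 4) object-to-camera transforms (known reference poses)
    K         : (3, 3) camera intrinsics shared by the reference images (assumption)
    Returns a boolean (V,) array: True = seen, False = unseen (uncertain).
    """
    seen = np.zeros(len(vertices), dtype=bool)
    v_h = np.hstack([vertices, np.ones((len(vertices), 1))])  # homogeneous coords

    for depth, pose in zip(ref_depths, ref_poses):
        cam = (pose @ v_h.T).T[:, :3]              # vertices in the camera frame
        z = np.maximum(cam[:, 2], 1e-9)            # guard against division by zero
        uv = (K @ cam.T).T
        u, v = uv[:, 0] / z, uv[:, 1] / z          # pinhole projection to pixels
        H, W = depth.shape
        in_img = (cam[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        ui = np.clip(u.astype(int), 0, W - 1)
        vi = np.clip(v.astype(int), 0, H - 1)
        d_obs = depth[vi, ui]
        # A vertex counts as visible if its depth matches the observed surface,
        # which also filters out self-occluded vertices.
        seen |= in_img & (d_obs > 0) & (np.abs(cam[:, 2] - d_obs) < depth_tol)

    return seen
```

The unseen vertices (`~seen`) then form the uncertain regions of the hybrid representation.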
Then, we present an uncertainty-aware pipeline for 6D object pose estimation and online object completion using our proposed hybrid object representation:
1. Pose Estimation: Given the object 3D model and a test RGBD image with an object mask, we first estimate the object’s pose using FoundationPose [1].
2. Confidence Assessment: To determine the confidence of an estimated pose, we compute its seen IoU—the overlap between the rendered visible region of the 3D model and the 2D object mask in the image.
3-a. High-Confidence Path: If the seen IoU is high, the pose is treated as a Good estimation. The test image and estimated pose are stored in a memory pool to support future object completion.
3-b. Low-Confidence Path: If the seen IoU is low, the pose is considered unreliable (a Wrong estimation). In this case, we perform online object completion by sampling informative views from the memory pool. This process refines the hybrid object model during testing, enabling more accurate pose estimation on future frames (a simplified sketch of this routing follows the list).
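The sketch below illustrates how the seen IoU can gate the two paths, assuming the seen region of the model has already been rendered into a binary mask under the estimated pose. The function names, the memory-pool structure, and the 0.7 threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def seen_iou(rendered_seen_mask, object_mask):
    """IoU between the rendered seen region of the model and the 2D object mask."""
    inter = np.logical_and(rendered_seen_mask, object_mask).sum()
    union = np.logical_or(rendered_seen_mask, object_mask).sum()
    return float(inter) / union if union > 0 else 0.0

def route_estimate(pose, rendered_seen_mask, object_mask, frame,
                   memory_pool, iou_thresh=0.7):
    """Send a pose estimate down the high- or low-confidence path."""
    iou = seen_iou(rendered_seen_mask, object_mask)
    if iou >= iou_thresh:
        # High confidence ("Good"): keep the frame and pose for later completion.
        memory_pool.append({"frame": frame, "pose": pose, "seen_iou": iou})
        return pose, True
    # Low confidence ("Wrong"/unreliable): the caller should trigger online
    # object completion, e.g. by sampling informative views from the memory pool.
    return pose, False
```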
@article{ming2025UAPose,
  author  = {Ming-Feng Li and Xin Yang and Fu-En Wang and Hritam Basak and Yuyin Sun and Shreekant Gayaka and Min Sun and Cheng-Hao Kuo},
  title   = {UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References},
  journal = {CVPR},
  year    = {2025}
}
[1] Wen, Bowen, et al. "FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects." CVPR 2024.