Robotics paper index

Towards in-the-wild Egocentric 3D Hand-Object Pose Estimation

2026-06-29 · arXiv: 2606.30598

One-line summary

A robotics research paper on Towards in-the-wild Egocentric 3D Hand-Object Pose Estimation.

Engineering notes

Engineering notes will be added by the Robot Papers editorial team.

Chinese explanation / 中文解读

中文解读待补充:本站会优先为 VLA、具身智能、人形机器人控制、机器人操作等高价值论文补充中文说明。

Original abstract

Estimating accurate 3D hand-object pose from in-the-wild egocentric RGB remains challenging due to severe occlusions and ambiguous contact. Existing learning-based methods often struggle to generalise to in-the-wild scenes and are limited by the scarcity of supervision. We address these issues with two contributions. First, we introduce EPIC-Contact, an in-the-wild egocentric dataset of 2.3K clips (62.3K frames) with dense, bijective 3D hand-object contact correspondences and posed meshes. Second, we propose HOPformer, an end-to-end transformer that jointly predicts bi-manual hand and object pose in a single forward pass. A cross-attention decoder conditions object features on hand priors, producing robust pose estimation. We test HOPformer on the in-lab 3D dataset, ARCTIC, as well as our newly introduced EPIC-Contact dataset. HOPformer reaches 82.4% success rate on ARCTIC (+6.2 pts over current SOTA). On EPIC-Contact, it nearly doubles the success rate while reducing contact deviation by 75%. EPIC-Contact, HOPformer code and checkpoints are released: https://sid2697.github.io/epic-contact.

5.0Engineering value
7.0Research novelty
4.0Business relevance

Links and sources

Need this topic turned into a technical roadmap?

Robot Papers can prepare a custom robotics literature review, code map, dataset map, and B2B technology assessment.

Request B2B research

Comments

No comments yet. Be the first to share your thoughts on this paper.
Login or register to leave a comment