Robotics paper index
PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception
One-line summary
A robotics research paper on PerceptionRubrics: Calibrating Multimodal Evaluation to Human Perception.
Engineering notes
Engineering notes will be added by the Robot Papers editorial team.
Chinese explanation / 中文解读
中文解读待补充:本站会优先为 VLA、具身智能、人形机器人控制、机器人操作等高价值论文补充中文说明。
Original abstract
We introduce PerceptionRubrics, a rubric-based evaluation framework that addresses the gap between saturated benchmark scores and real-world brittleness. Shifting evaluation from holistic semantic matching to rigorous atomic auditing, PerceptionRubrics pairs 1,038 information-dense images with over 12,000 instance-specific rubrics. These criteria are derived from golden captions constructed via a novel Circular Peer-Review consensus pipeline and then distilled into a dual-stream system of Must-Right (essential facts) and Easy-Wrong (fine-grained details) rubrics. Crucially, PerceptionRubrics implements a Gated Scoring mechanism: unlike linear averages, failure on mandatory visual facts triggers sharp binary penalties. Extensive evaluation yields critical insights: (1) The Reliability Gap: models often verify fragmented elements correctly yet fail strict conjunctive constraints, exposing brittleness in dense domains; (2) Open-Closed Stratification: contrary to reasoning trends, we reveal a persistent 8% perception deficit between open-source and proprietary frontiers; and (3) Human-Aligned Rigor: our gated metrics substantially out-align conventional benchmarks, validating that strict perceptual fidelity is the prerequisite for reliable generation.
Links and sources
Need this topic turned into a technical roadmap?
Robot Papers can prepare a custom robotics literature review, code map, dataset map, and B2B technology assessment.
Request B2B research
Comments