Overview
Welcome to HANDS@ICCV25! We are very happy to organize the HANDS workshop again this year. This year's workshop will be held at ICCV25. See you in Honolulu.
The ninth edition of this workshop will emphasize the use of multimodal LLMs for hand-related tasks. Multimodal LLMs have revolutionized AI perception and made groundbreaking contributions to multimodal understanding, zero-shot learning, and transfer learning. These models can process and integrate information from different types of hand data (modalities), allowing them to better understand complex hand-object and hand-hand interactions by capturing richer, more diverse representations.
During the workshop, we will explore multimodal LLMs for hand-related tasks through invited talks, presentations of accepted papers, and workshop challenges.
Invited Speakers
Seungryul Baek UNIST
Jihyun Lee Meta
Rolandos Potamias Imperial College London
Srinath Sridhar Brown University
Jingya Wang ShanghaiTech University
Schedule
Time: 13:00 - 17:00 Oct. 20 (Hawai'i time)
Location: Room 305 B, Hawai'i Convention Center (ICCV25)
The detailed schedule is below.
| 13:00 - 13:10 | Opening Remarks |
| 13:10 - 13:40 | Invited Talk: Srinath Sridhar Bio: Srinath Sridhar (srinathsridhar.com) is the John E. Savage Assistant Professor of Computer Science at Brown University, where he leads the Interactive 3D Vision & Learning Lab (ivl.cs.brown.edu). He received his PhD at the Max Planck Institute for Informatics and was subsequently a postdoctoral researcher at Stanford. His research interests are in 3D computer vision and artificial intelligence. Specifically, his group builds foundational methods for 3D spatiotemporal (4D) visual understanding of the world including objects in it, humans in motion, and human-object interactions, with applications ranging from robotics to mixed reality. He is the recipient of an NSF CAREER award, a Google Research Scholar award, and his work received a Best Student Paper award at WACV and a Best Paper Honorable Mention at Eurographics. He spends part of his time as an Amazon Scholar and a visiting faculty at the Indian Institute of Science (IISc). |
| | Title: Vision and Touch in Robot Learning and Interaction Abstract: Touch, together with vision, is a fundamental sensing modality for robots. However, sensing and combining touch with vision has been hard due to hardware and algorithmic challenges. In this talk, I will discuss my group's work on visuo-tactile sensing and fusion. Specifically, I will introduce (1) GigaHands, a new large-scale 3D human hand activity dataset that provides visual and contact information for robot manipulation learning, and (2) UniTac, a new method for touch sensing that operates without any tactile sensors. We show that touch sensing does not always need cumbersome hardware, and can add significant information for better robot learning. |
| 13:40 - 14:10 | Invited Talk: Jihyun Lee |
| | Title: Towards a Universal Generative Prior for Hands and Their Interactions Abstract: We are witnessing remarkable progress in generative modeling, with recent diffusion- and flow-based models demonstrating powerful generative capabilities. In this talk, I will discuss our recent efforts to harness these advances to build a deep generative prior for hands and their interactions. Such priors capture the distribution of plausible hand shapes, poses, and interactions, serving as a universal regularizer for long-standing hand-related vision problems, such as monocular 3D reconstruction. By constraining the solution space to what is physically and semantically plausible, generative priors reduce the ill-posedness of these problems and are particularly effective for in-the-wild generalization, where training supervision is often noisy or insufficient — helping advance progress toward more robust and reliable real-world vision systems. |
| 14:10 - 14:40 | Invited Talk: Seungryul Baek |
| | Title: Two Hands and an Object: From Perception to Generation Abstract: In this talk, I will present our lab's recent efforts to advance the understanding of hand–object interactions, which play a crucial role in human activities and everyday manipulation. In particular, we are addressing one of the most challenging scenarios: the complex interaction between two hands and an object, where coordination, occlusion, and fine-grained motion understanding become highly demanding. I will begin by describing a Transformer-based framework that we have developed for modeling and interpreting the dynamics of two hands interacting with an object. Next, I will introduce BiGS, a method that goes a step further by extending this setting to previously unseen or unknown objects. Finally, I will present Text2HOI, a generative model that takes natural language text prompts as input and synthesizes plausible two-hand–object interaction motions. |
| 14:40 - 15:30 | Coffee Break & Poster Session |
| 15:30 - 16:00 | Invited Talk: Jingya Wang |
| | Title: Open-World Hand-Object Interaction Synthesis: Towards Generalizable and Dexterous Embodied Manipulation Abstract: Dexterous hand-object interaction constitutes a fundamental component of human physical intelligence, enabling the execution of complex manipulation tasks in unstructured environments. The synthesis of such interactions from open-ended instructions presents significant challenges, particularly in cross-object generalization, long-horizon task reasoning, and physical plausibility assurance. In this talk, we will introduce OpenHOI, a framework that employs a 3D Multimodal Large Language Model to translate open-vocabulary instructions into executable interaction sequences through semantic task decomposition and affordance reasoning. Subsequently, we will discuss UniHM, which establishes a unified representation space for heterogeneous hand morphologies to facilitate cross-dexterous-hand manipulation. Furthermore, we will examine how the integration of human gaze as a biological prior in our GHO-Diffusion model enhances intentionality and human-likeness in synthesized interactions. |
| 16:00 - 16:30 | Invited Talk: Rolandos Potamias |
| | Title: Building the Tools of Embodied AI: From Human Hands to Dexterous Agents Abstract: Hands are essential tools for humans to act, interact, and communicate in nearly all daily activities, highlighting the critical need for precise modeling to achieve highly realistic digital agents. However, the complexity of hands, characterized by their scale, degrees of freedom, and versatility, poses significant challenges for current human-centered AI frameworks. These limitations are evident across various domains, including human motion modeling, generative image and video synthesis, and 3D human reconstruction. In this talk, I will discuss key challenges in 3D hand shape and appearance modeling, introducing our large-scale Handy model and WiLoR, our approach for real-time hand detection and reconstruction of hand-object interactions from in-the-wild images. I will also present HaWoR, designed to reconstruct 4D hand motion in world space, particularly from egocentric wearable camera settings where both the hands and camera are in motion. Finally, I will introduce our recent work, CEDex, a large-scale dataset for cross-embodied dexterous grasping derived from human-like contact representations. |
| 16:30 - 16:50 | Challenge Winner Talks |
| 16:50 - 17:00 | Closing Remarks |
Accepted Papers & Extended Abstracts
We are delighted to announce that the following accepted papers and extended abstracts will appear in the workshop! Authors of all full-length papers, extended abstracts, and invited posters should prepare a poster for presentation during the workshop. Poster size is 84" x 42".
Full-length Papers
- Board 112
DyTact: Capturing Dynamic Contacts in Hand-Object Manipulation
Xiaoyan Cong, Angela Xing, Chandradeep Pokhariya, Rao Fu, Srinath Sridhar
[pdf]
- Board 113
HOSt3R: Keypoint-free Hand-Object 3D Reconstruction from RGB images
Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Jean-Sébastien Franco, Grégory Rogez
[pdf]
- Board 114
WACU: Multi-Modal Wristband Assistant for Contextual Understanding
Constantin Patsch, Jaden Goter, Joseph Greer, Lingni Ma, Rajinder Sodhi
[pdf]
Extended Abstracts
- Board 115
Towards Consistent Long-Term Pose Generation
Yayuan Li, Filippos Bellos, Jason J Corso
[pdf]
- Board 116
HANDI: Hand-Centric Text-and-Image Conditioned Video Generation
Yayuan Li, Zhi Cao, Jason J Corso
[pdf]
- Board 117
DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions
Takehiko Ohkawa, Yifan Zhou, Guwenxiao Zhou, Kanoko Goto, Takumi Hirose, Yusuke Sekikawa, Nakamasa Inoue
[pdf]
- Board 118
Generative Modeling of Shape-Dependent Self-Contact Human Poses
Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito, Jason Saragih, Fabian Prada, Yichen Xu, Shoou-I Yu, Ryosuke Furuta, Yoichi Sato, Takaaki Shiratori
[pdf]
- Board 119
AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities
Tatsuro Banno, Takehiko Ohkawa, Ruicong Liu, Ryosuke Furuta, Yoichi Sato
[pdf]
- Board 120
Affordance-Guided Diffusion Prior for 3D Hand Reconstruction
Naru Suzuki, Takehiko Ohkawa, Tatsuro Banno, Jihyun Lee, Ryosuke Furuta, Yoichi Sato
[pdf]
- Board 121
Leveraging RGB Images for Pre-Training of Event-Based Hand Pose Estimation
Ruicong Liu, Takehiko Ohkawa, Tze Ho Elden Tse, Mingfang Zhang, Angela Yao, Yoichi Sato
[pdf]
- Board 122
Egocentric 3D Hand-Object Tracking in the Wild with Mobile Multi-Camera Rig
Patrick Rim, Kun He, Kevin Harris, Braden Copple, Shangchen Han, Sizhe An, Ivan Shugurov, Tomas Hodan, He Wen, Xu Xie
[pdf]
- Board 123
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger, Snehal Jauhri, Vignesh Prasad, Georgia Chalvatzaki
[pdf]
- Board 124
Generating Egocentric View from Exocentric View via Multimodal Observations
Junho Park, Andrew Sangwoo Ye, Taein Kwon
[pdf]
- Board 125
Replace-in-Ego: Text-Guided Object Replacement in Egocentric Hand-Object Interaction
Minsuh Song, Junho Park, Suk-Ju Kang
[pdf]
- Board 126
HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics
Masatoshi Tateno, Gido Kato, Kensho Hara, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi
[pdf]
- Board 127
Music Performance Hands-included-Motion Generation via dual domain loss with Audio Reconstruction
Hiroki Nishizawa, Seong Jong Yoo, Keitaro Tanaka, Shugo Yamaguchi, Qi Feng, Masatoshi Hamanaka, Cornelia Fermuller, Shigeo Morishima
[pdf]
- Board 128
VQ-MyoPose: Movement tokenization improves decoding of hand kinematics from surface EMG wristbands
Rossana Lovecchio, Pranav Mamidanna, Bart Jansen, Tom Verstraten, Dario Farina
[pdf]
- Board 129
Understanding Co-speech Gestures in-the-wild
Sindhu B Hegde, K R Prajwal, Taein Kwon, Andrew Zisserman
- Board 130
MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
Shibo Wang, Haonan He, Maria Parelli, Christoph Gebhardt, Zicong Fan, Jie Song
Technical Reports
- GHOST: Gaussian Hand–Object Surface Reconstruction with Geometric Priors
Ahmed Tawfik Aboukhadra, Marcel Rogge, Nadia Robertini, Ahmed Elhayek, Abdalla Arafa, Jameel Malik, Didier Stricker
[pdf]
- Technical Report of HCB-Hand Team for Dexterous HO Tracker Challenge on HANDS 2025 Challenge
Hang Xu, Yang Xiao, Changlong Jiang, Haohong Kuang, Yangfan Deng, Qihang Zhou, Ran Wang, Min Du, Zhiwen Fang, Zhiguo Cao, Joey Tianyi Zhou
[pdf]
Organizers
Hyung Jin Chang University of Birmingham
Rongyu Chen National University of Singapore
Zicong Fan Meshcapade
Rao Fu Brown University
Kun He Meta Reality Labs
Kailin Li Shanghai AI Laboratory
Take Ohkawa University of Tokyo
Yoichi Sato University of Tokyo
Linlin Yang Communication University of China
Lixin Yang Shanghai Jiao Tong University
Angela Yao National University of Singapore
Qi Ye Zhejiang University
Linguang Zhang Meta Reality Labs
Zhongqun Zhang University of Birmingham
Sponsors
Contact
hands2025@googlegroups.com
