HANDS

Observing and Understanding Hands in Action
in conjunction with ICCV 2025



Join Us: Oct. 20, 13:00 - 17:00, 305 B, Hawai'i Convention Center

Poster Sessions: Oct. 20, 14:00 - 16:30, Boards 112 - 131, Exhibit Hall I

Overview

Welcome to HANDS@ICCV25.

We are very happy to organize the HANDS workshop again. This year's workshop will be held at ICCV25. See you in Honolulu.

The ninth edition of this workshop will emphasize the use of multimodal LLMs for hand-related tasks. Multimodal LLMs have revolutionized AI perception and made groundbreaking contributions to multimodal understanding, zero-shot learning, and transfer learning. These models can process and integrate different types of hand data (modalities), capturing richer, more diverse representations and thereby better understanding complex hand-object and hand-hand interaction scenarios.

During the workshop, we will explore multimodal LLMs for hand-related tasks through invited talks, presentations of accepted papers, and workshop challenges.

Invited Speakers

Jihyun Lee
Meta

Rolandos Potamias
Imperial College London

Srinath Sridhar
Brown University

Jingya Wang
ShanghaiTech University

Schedule

Time: 13:00 - 17:00 Oct. 20 (Hawai'i time)

Location: 305 B, Hawai'i Convention Center

The detailed schedule is below.

13:00 - 13:10 Opening Remarks
13:10 - 13:40 Invited Talk: Srinath Sridhar
Bio: Srinath Sridhar (srinathsridhar.com) is the John E. Savage Assistant Professor of Computer Science at Brown University, where he leads the Interactive 3D Vision & Learning Lab (ivl.cs.brown.edu). He received his PhD at the Max Planck Institute for Informatics and was subsequently a postdoctoral researcher at Stanford. His research interests are in 3D computer vision and artificial intelligence. Specifically, his group builds foundational methods for 3D spatiotemporal (4D) visual understanding of the world including objects in it, humans in motion, and human-object interactions, with applications ranging from robotics to mixed reality. He is the recipient of an NSF CAREER award, a Google Research Scholar award, and his work received a Best Student Paper award at WACV and a Best Paper Honorable Mention at Eurographics. He spends part of his time as an Amazon Scholar and a visiting faculty at the Indian Institute of Science (IISc).
Title: Vision and Touch in Robot Learning and Interaction
Abstract: Touch, together with vision, is a fundamental sensing modality for robots. However, sensing and combining touch with vision has been hard due to hardware and algorithmic challenges. In this talk, I will discuss my group's work on visuo-tactile sensing and fusion. Specifically, I will introduce (1) GigaHands, a new large-scale 3D human hand activity dataset that provides visual and contact information for robot manipulation learning, and (2) UniTac, a new method for touch sensing that operates without any tactile sensors. We show that touch sensing does not always need cumbersome hardware, and can add significant information for better robot learning.
13:40 - 14:10 Invited Talk: Jihyun Lee
Title: Towards a Universal Generative Prior for Hands and Their Interactions
Abstract: We are witnessing remarkable progress in generative modeling, with recent diffusion- and flow-based models demonstrating powerful generative capabilities. In this talk, I will discuss our recent efforts to harness these advances to build a deep generative prior for hands and their interactions. Such priors capture the distribution of plausible hand shapes, poses, and interactions, serving as a universal regularizer for long-standing hand-related vision problems, such as monocular 3D reconstruction. By constraining the solution space to what is physically and semantically plausible, generative priors reduce the ill-posedness of these problems and are particularly effective for in-the-wild generalization, where training supervision is often noisy or insufficient — helping advance progress toward more robust and reliable real-world vision systems.
14:10 - 14:40 Invited Talk: Seungryul Baek
Title: Two Hands and an Object: From Perception to Generation
Abstract: In this talk, I will present our lab's recent efforts to advance the understanding of hand–object interactions, which play a crucial role in human activities and everyday manipulation. In particular, we are addressing one of the most challenging scenarios: the complex interaction between two hands and an object, where coordination, occlusion, and fine-grained motion understanding become highly demanding. I will begin by describing a Transformer-based framework that we have developed for modeling and interpreting the dynamics of two hands interacting with an object. Next, I will introduce BiGS, a method that goes a step further by extending this setting to previously unseen or unknown objects. Finally, I will present Text2HOI, a generative model that takes natural language text prompts as input and synthesizes plausible two-hand–object interaction motions.
14:40 - 15:30 Coffee Break & Poster Session
15:30 - 16:00 Invited Talk: Jingya Wang
Title: Open-World Hand-Object Interaction Synthesis: Towards Generalizable and Dexterous Embodied Manipulation
Abstract: Dexterous hand-object interaction constitutes a fundamental component of human physical intelligence, enabling the execution of complex manipulation tasks in unstructured environments. The synthesis of such interactions from open-ended instructions presents significant challenges, particularly in cross-object generalization, long-horizon task reasoning, and physical plausibility assurance. In this talk, we will introduce OpenHOI, a framework that employs a 3D Multimodal Large Language Model to translate open-vocabulary instructions into executable interaction sequences through semantic task decomposition and affordance reasoning. Subsequently, we will discuss UniHM, which establishes a unified representation space for heterogeneous hand morphologies to facilitate cross-dexterous-hand manipulation. Furthermore, we will examine how the integration of human gaze as a biological prior in our GHO-Diffusion model enhances intentionality and human-likeness in synthesized interactions.
16:00 - 16:30 Invited Talk: Rolandos Potamias
Title: Building the Tools of Embodied AI: From Human Hands to Dexterous Agents
Abstract: Hands are essential tools for humans to act, interact, and communicate in nearly all daily activities, highlighting the critical need for precise modeling to achieve highly realistic digital agents. However, the complexity of hands, characterized by their scale, degrees of freedom, and versatility, poses significant challenges for current human-centered AI frameworks. These limitations are evident across various domains, including human motion modeling, generative image and video synthesis, and 3D human reconstruction. In this talk, I will discuss key challenges in 3D hand shape and appearance modeling, introducing our large-scale Handy model and WiLoR, our approach for real-time hand detection and reconstruction of hand-object interactions from in-the-wild images. I will also present HaWoR, designed to reconstruct 4D hand motion in world space, particularly from egocentric wearable camera settings where both the hands and camera are in motion. Finally, I will introduce our recent work, CEDex, a large-scale dataset for cross-embodied dexterous grasping derived from human-like contact representations.
16:30 - 16:50 Challenge Winner Talks
16:50 - 17:00 Closing Remarks

Accepted Papers & Extended Abstracts

We are delighted to announce that the following accepted papers and extended abstracts will appear in the workshop! Authors of full-length papers, extended abstracts, and invited posters should prepare a poster to present during the workshop. The poster size is 84” x 42”.


Full-length Papers

  • Board 112
    DyTact: Capturing Dynamic Contacts in Hand-Object Manipulation
    Xiaoyan Cong, Angela Xing, Chandradeep Pokhariya, Rao Fu, Srinath Sridhar
    [pdf]
  • Board 113
    HOSt3R: Keypoint-free Hand-Object 3D Reconstruction from RGB images
    Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Jean-Sébastien Franco, Grégory Rogez
    [pdf]
  • Board 114
    WACU: Multi-Modal Wristband Assistant for Contextual Understanding
    Constantin Patsch, Jaden Goter, Joseph Greer, Lingni Ma, Rajinder Sodhi
    [pdf]

Extended Abstracts

  • Board 115
    Towards Consistent Long-Term Pose Generation
    Yayuan Li, Filippos Bellos, Jason J Corso
    [pdf]
  • Board 116
    HANDI: Hand-Centric Text-and-Image Conditioned Video Generation
    Yayuan Li, Zhi Cao, Jason J Corso
    [pdf]
  • Board 117
    DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions
    Takehiko Ohkawa, Yifan Zhou, Guwenxiao Zhou, Kanoko Goto, Takumi Hirose, Yusuke Sekikawa, Nakamasa Inoue
    [pdf]
  • Board 118
    Generative Modeling of Shape-Dependent Self-Contact Human Poses
    Takehiko Ohkawa, Jihyun Lee, Shunsuke Saito, Jason Saragih, Fabian Prada, Yichen Xu, Shoou-I Yu, Ryosuke Furuta, Yoichi Sato, Takaaki Shiratori
    [pdf]
  • Board 119
    AssemblyHands-X: Modeling 3D Hand-Body Coordination for Understanding Bimanual Human Activities
    Tatsuro Banno, Takehiko Ohkawa, Ruicong Liu, Ryosuke Furuta, Yoichi Sato
    [pdf]
  • Board 120
    Affordance-Guided Diffusion Prior for 3D Hand Reconstruction
    Naru Suzuki, Takehiko Ohkawa, Tatsuro Banno, Jihyun Lee, Ryosuke Furuta, Yoichi Sato
    [pdf]
  • Board 121
    Leveraging RGB Images for Pre-Training of Event-Based Hand Pose Estimation
    Ruicong Liu, Takehiko Ohkawa, Tze Ho Elden Tse, Mingfang Zhang, Angela Yao, Yoichi Sato
    [pdf]
  • Board 122
    Egocentric 3D Hand-Object Tracking in the Wild with Mobile Multi-Camera Rig
    Patrick Rim, Kun He, Kevin Harris, Braden Copple, Shangchen Han, Sizhe An, Ivan Shugurov, Tomas Hodan, He Wen, Xu Xie
    [pdf]
  • Board 123
    2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
    Marvin Heidinger, Snehal Jauhri, Vignesh Prasad, Georgia Chalvatzaki
    [pdf]
  • Board 124
    Generating Egocentric View from Exocentric View via Multimodal Observations
    Junho Park, Andrew Sangwoo Ye, Taein Kwon
    [pdf]
  • Board 125
    Replace-in-Ego: Text-Guided Object Replacement in Egocentric Hand-Object Interaction
    Minsuh Song, Junho Park, Suk-Ju Kang
    [pdf]
  • Board 126
    HanDyVQA: A Video QA Benchmark for Fine-Grained Hand-Object Interaction Dynamics
    Masatoshi Tateno, Gido Kato, Kensho Hara, Hirokatsu Kataoka, Yoichi Sato, Takuma Yagi
    [pdf]
  • Board 127
    Music Performance Hands-included-Motion Generation via dual domain loss with Audio Reconstruction
    Hiroki Nishizawa, Seong Jong Yoo, Keitaro Tanaka, Shugo Yamaguchi, Qi Feng, Masatoshi Hamanaka, Cornelia Fermuller, Shigeo Morishima
    [pdf]
  • Board 128
    VQ-MyoPose: Movement tokenization improves decoding of hand kinematics from surface EMG wristbands
    Rossana Lovecchio, Pranav Mamidanna, Bart Jansen, Tom Verstraten, Dario Farina
    [pdf]
  • Board 129
    Understanding Co-speech Gestures in-the-wild
    Sindhu B Hegde, K R Prajwal, Taein Kwon, Andrew Zisserman
  • Board 130
    MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
    Shibo Wang, Haonan He, Maria Parelli, Christoph Gebhardt, Zicong Fan, Jie Song

Technical Reports

  • GHOST: Gaussian Hand–Object Surface Reconstruction with Geometric Priors
    Ahmed Tawfik Aboukhadra, Marcel Rogge, Nadia Robertini, Ahmed Elhayek, Abdalla Arafa, Jameel Malik, Didier Stricker
    [pdf]
  • Technical Report of HCB-Hand Team for Dexterous HO Tracker Challenge on HANDS 2025 Challenge
    Hang Xu, Yang Xiao, Changlong Jiang, Haohong Kuang, Yangfan Deng, Qihang Zhou, Ran Wang, Min Du, Zhiwen Fang, Zhiguo Cao, Joey Tianyi Zhou
    [pdf]

Organizers

Hyung Jin Chang
University of Birmingham

Rongyu Chen
National University of Singapore

Zicong Fan
Meshcapade

Rao Fu
Brown University

Kun He
Meta Reality Labs

Kailin Li
Shanghai AI Laboratory

Take Ohkawa
University of Tokyo

Yoichi Sato
University of Tokyo

Linlin Yang
Communication University of China

Lixin Yang
Shanghai Jiao Tong University

Angela Yao
National University of Singapore

Qi Ye
Zhejiang University

Linguang Zhang
Meta Reality Labs

Zhongqun Zhang
University of Birmingham

Sponsors

Contact

hands2025@googlegroups.com