HANDS

Observing and Understanding Hands in Action
in conjunction with ECCV 2024


The recordings and the papers are available below.

Overview

Welcome to HANDS@ECCV24.

Our HANDS workshop will gather vision researchers working on perceiving hands performing actions, including 2D & 3D hand detection, segmentation, pose/shape estimation, tracking, etc. We will also cover related applications including gesture recognition, hand-object manipulation analysis, hand activity understanding, and interactive interfaces.

The eighth edition of this workshop will emphasize the use of large foundation models (e.g. CLIP, Point-E, Segment Anything, Latent Diffusion Models) for hand-related tasks. These models have revolutionized AI perception, making groundbreaking contributions to multimodal understanding, zero-shot learning, and transfer learning. However, their potential for hand-related tasks remains largely untapped.

Schedule (Italy Time)

September 30th (2 pm-6 pm), 2024
Room Suite 8, MiCo Milano
Poster Boards Position: inside: 11, outside: 11
Online Zoom Link: https://nus-sg.zoom.us/j/9015323166?omn=82071666030

14:00 - 14:10 Opening Remarks
14:10 - 14:40 Invited Talk: Hanbyul Joo
Title: Towards Capturing Everyday Movements to Scale Up and Enrich Human Motion Data
Abstract: In this talk, I will present our lab's efforts to scale and enrich 3D human motion data by capturing everyday human movements and natural human-object interactions. I will begin by describing our new multi-camera system, ParaHome, designed to capture human-object interactions in a natural home environment. Next, I will introduce MocapEve, a lightweight, cost-effective motion capture solution that uses two smartwatches and a head-mounted camera, enabling full-body 3D motion capture in diverse settings. Finally, I will discuss our recent work, CHRUS and COMa, which enable machines to model comprehensive affordances for 3D objects by leveraging pre-trained 2D diffusion models, allowing for unbounded object interactions.
14:40 - 15:10 Invited Talk: Shubham Tulsiani
Title: Understanding Human-object Interactions for Enabling Generalizable Robot Manipulation
Abstract: We humans continually use our hands to interact with the world around us. From making our morning coffee to cleaning dishes after dinner, we effortlessly perform a plethora of tasks in our everyday lives. A central goal in robot learning is to develop similar generalist agents — ones capable of performing a diverse set of tasks across a wide range of environments. In this talk, I will highlight some of our recent efforts to build perception systems that better understand human interactions and allow robots to act in diverse scenarios. I will show how learning a (3D) generative model over human-object interactions can allow reconstructing interactions from in-the-wild clips, and how (2D) generative models over human interactions can guide robots acting in the real world.
15:10 - 16:10 Coffee Break & Poster Session
16:10 - 16:40 Invited Talk: Qi Ye
Title: Understanding Hand-Object Interaction – From human hand reconstruction and generation to dexterous manipulation of robotic hands
Abstract: In recent years, humanoid robots and embodied intelligence have attracted extensive attention. One of the most challenging aspects of endowing humanoid robots with intelligence is human-like dexterous manipulation with robotic hands. Unlike simple parallel grippers, human-like multi-fingered hands involve a high degree of freedom and complex variations in hand-object interaction, making it difficult for humanoid robots to acquire these manipulation skills. This talk will explore how to effectively use human manipulation experience to overcome these challenges and to develop and transfer human-like dexterous manipulation skills. It will cover our recent work on hand-object reconstruction, grasp generation and motion planning, multi-modal pretraining with human manipulation data for robotic hands, etc.
16:40 - 17:10 Invited Talk: Shunsuke Saito
Title: Foundations for 3D Digital Hand Avatars
Abstract: What constitutes the foundation for 3D digital hand avatars? In this presentation, we aim to establish the essential components necessary for creating high-fidelity digital hand models. We argue that relighting, animation/interaction, and in-the-wild generalization are crucial for bringing high-quality avatars to everyone. We will discuss several relightable appearance representations that achieve a photorealistic appearance under various lighting conditions. Furthermore, we will introduce techniques to effectively model animation and interaction priors, and demonstrate how to estimate complex hand-to-hand and hand-to-object interactions, even with data captured in uncontrolled environments. Finally, the talk will cover bridging the domain gap between high-quality studio data and large-scale in-the-wild data, which is key to enhancing robustness and diversity in avatar modeling algorithms. We will also explore how these foundations can complement and enhance each other.
17:10 - 17:25 Invited Talk: Prithviraj Banerjee
Title: HOT3D: A new benchmark dataset for vision-based understanding of 3D hand-object interactions
Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/putdown actions, HOT3D contains scenarios resembling typical actions in a kitchen, office, and living room environment. The dataset is recorded by two head-mounted devices from Meta: Project Aria, a research prototype of light-weight AR/AI glasses, and Quest 3, a production VR headset sold in millions of units. Ground-truth poses were obtained by a professional motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats, and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. We aim to accelerate research on egocentric hand-object interaction by making the HOT3D dataset publicly available and by co-organizing public challenges on the dataset. The dataset can be downloaded from the project website: facebookresearch.github.io
17:25 - 17:53 Competition Talks: Team JVhands
Competition Talks: Team HCB
Competition Talks: Team UVHANDS
Competition Talks: Team ACE
17:53 - 18:00 Closing Remarks

Accepted Papers & Extended Abstracts

We are delighted to announce that the following accepted papers and extended abstracts will appear in the workshop! Authors of all full-length papers, extended abstracts, and invited posters should prepare a poster for presentation during the workshop.


Poster size: the posters should be portrait (vertical), with a maximum size of 90x180 cm.

Full-length Papers

  • W01 AirLetters: An Open Video Dataset of Characters Drawn in the Air
    Rishit Dagli, Guillaume Berger, Joanna Materzynska, Ingo Bax, Roland Memisevic
    [pdf]
  • W02 RegionGrasp: A Novel Task for Contact Region Controllable Hand Grasp Generation
    Yilin Wang, Chuan Guo, Li Cheng, Hai Jiang
    [pdf]
  • W03 Generative Hierarchical Temporal Transformer for Hand Pose and Action Modeling
    Yilin Wen, Hao Pan, Takehiko Ohkawa, Lei Yang, Jia Pan, Yoichi Sato, Taku Komura, Wenping Wang
    [pdf]
  • W04 Adaptive Multi-Modal Control of Digital Human Hand Synthesis using a Region-Aware Cycle Loss
    Qifan Fu, Xiaohang Yang, Muhammad Asad, Changjae Oh, Shanxin Yuan, Gregory Slabaugh
    [pdf]
  • W05 Conditional Hand Image Generation using Latent Space Supervision in Random Variable Variational Autoencoders
    Vassilis Nicodemou, Iason Oikonomidis, Giorgos Karvounas, Antonis Argyros
    [pdf]
  • W06 ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild
    Arya Farkhondeh*, Samy Tafasca*, Jean-Marc Odobez
    [pdf]
  • W07 EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
    Masashi Hatano, Ryo Hachiuma, Hideo Saito
    [pdf]

Extended Abstracts

  • W08 AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
    Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Jose J Guerrero, Giovanni Maria Farinella, Antonino Furnari
    [pdf]
  • W09 Diffusion-based Interacting Hand Pose Transfer
    Junho Park*, Yeieun Hwang*, Suk-Ju Kang#
    [pdf]
  • W10 Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
    Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella
    [pdf]
  • W11 Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
    Xueyi Liu, Kangbo Lyu, Jieqiong Zhang, Tao Du, Li Yi
    [pdf]
  • W12 Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild
    Nie Lin*, Takehiko Ohkawa*, Mingfang Zhang, Yifei Huang, Ryosuke Furuta, Yoichi Sato
    [pdf]
  • W13 Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers
    An-Lun Liu, Yu-Wei Chao, Yi-Ting Chen
    [pdf]
  • W14 Action Scene Graphs for Long-Form Understanding of Egocentric Videos
    Ivan Rodin*, Antonino Furnari*, Kyle Min*, Subarna Tripathi, Giovanni Maria Farinella
    [pdf]
  • W15 Get a Grip: Reconstructing Hand-Object Stable Grasps in Egocentric Videos
    Zhifan Zhu, Dima Damen
    [pdf]
  • W16 Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation
    Huan Yang, Jiahui Chen, Chaofan Ding, Runhua Shi, Siyu Xiong, Qingqi Hong, Xiaoqi Mo, Xinhan Di
    [pdf]
  • W17 OCC-MLLM-Alpha: Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning
    Shuxin Yang, Xinhan Di
    [pdf]
  • W18 Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
    Zhengdi Yu, Alara Dirik, Stefanos Zafeiriou, Tolga Birdal
    [pdf]
  • W19 Learning Dexterous Object Manipulation with a Robotic Hand via Goal-Conditioned Visual Reinforcement Learning Using Limited Demonstrations
    Samyeul Noh, Hyun Myung
    [pdf]

Invited Posters

  • W20 AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
    Junho Park*, Kyeongbo Kong*, Suk-Ju Kang#
    [pdf]
  • W21 HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
    Wencan Cheng, Eunji Kim, Jong Hwan Ko
    [poster]
  • W22 On the Utility of 3D Hand Poses for Action Recognition
    Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao
    [pdf]
  • W23 ActionVOS: Actions as Prompts for Video Object Segmentation
    Liangyang Ouyang, Ruicong Liu, Yifei Huang, Ryosuke Furuta, Yoichi Sato
    [poster]
  • W24 GraspXL: Generating Grasping Motions for Diverse Objects at Scale
    Hui Zhang, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song
    [poster]

Technical Reports

  • 3DGS-based Bimanual Category-agnostic Interaction Reconstruction
    Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Hyein Hwang, Soohyun Hwang, Junuk Cha, Jaewook Han, Seungryul Baek
    [pdf]
  • 2nd Place Solution Technical Report for Hands’24 ARCTIC Challenge from Team ACE
    Congsheng Xu*, Yitian Liu*, Yi Cui, Jinfan Liu, Yichao Yan, Weiming Zhao, Yunhui Liu, Xingdong Sheng
    [pdf]
  • Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024
    Minqiang Zou, Zhi Lv, Riqiang Jin, Tian Zhan, Mochen Yu, Yao Tang, Jiajun Liang#
    [pdf]
  • Technical report of HCB team for Multiview Egocentric Hand Tracking Challenge on HANDS 2024 Challenge
    Haohong Kuang, Yang Xiao#, Changlong Jiang, Jinghong Zheng, Hang Xu, Ran Wang, Zhiguo Cao, Min Du, Zhiwen Fang, Joey Tianyi Zhou
    [pdf]

Invited Speakers

Hanbyul Joo
Seoul National University

Qi Ye
Zhejiang University

Shunsuke Saito
Reality Labs Research

Shubham Tulsiani
Carnegie Mellon University

Organizers

Hyung Jin Chang
University of Birmingham

Rongyu Chen
National University of Singapore

Zicong Fan
ETH Zurich

Otmar Hilliges
ETH Zurich

Kun He
Meta Reality Labs

Take Ohkawa
University of Tokyo

Yoichi Sato
University of Tokyo

Elden Tse
National University of Singapore

Linlin Yang
Communication University of China

Lixin Yang
Shanghai Jiao Tong University

Angela Yao
National University of Singapore

Linguang Zhang
Facebook Reality Labs (Oculus)

Technical Program Committee

Thank you so much to the Technical Program Committee for their thoughtful reviews.

  • Chenyangguang Zhang (Tsinghua University)
  • Gyeongsik Moon (Meta)
  • Jiayin Zhu (NUS)
  • Jihyun Lee (KAIST)
  • Junuk Cha (UNIST)
  • Kailin Li (Shanghai Jiao Tong University)
  • Keyang Zhou (University of Tübingen)
  • Pengzhan Sun (NUS)
  • Rolandos Alexandros Potamias (Imperial College London)
  • Seungryul Baek (UNIST)
  • Takuma Yagi (AIST)
  • Zerui Chen (Inria Paris)
  • Zhiying Leng (Beihang University)
  • Zhongqun Zhang (University of Birmingham)

Sponsors

Contact

hands2024@googlegroups.com