HANDS

Observing and Understanding Hands in Action
in conjunction with ECCV 2024


The recordings and the papers are available below.

Overview

Welcome to HANDS@ECCV24.

Our HANDS workshop will gather vision researchers working on perceiving hands performing actions, including 2D & 3D hand detection, segmentation, pose/shape estimation, tracking, etc. We will also cover related applications including gesture recognition, hand-object manipulation analysis, hand activity understanding, and interactive interfaces.

The eighth edition of this workshop will emphasize the use of large foundation models (e.g. CLIP, Point-E, Segment Anything, Latent Diffusion Models) for hand-related tasks. These models have revolutionized AI perception, making groundbreaking contributions to multimodal understanding, zero-shot learning, and transfer learning. However, their potential for hand-related tasks remains largely untapped.

Schedule (Italy Time)

September 30th (2 pm-6 pm), 2024
Room Suite 8, MiCo Milano
Poster Boards Position: inside: 11, outside: 11
Online Zoom Link: https://nus-sg.zoom.us/j/9015323166?omn=82071666030

14:00 - 14:10 Opening Remarks
14:10 - 14:40 Invited Talk: Hanbyul Joo
Title: Towards Capturing Everyday Movements to Scale Up and Enrich Human Motion Data
Abstract: In this talk, I will present our lab's efforts to scale and enrich 3D human motion data by capturing everyday human movements and natural human-object interactions. I will begin by describing our new multi-camera system, ParaHome, designed to capture human-object interactions in a natural home environment. Next, I will introduce MocapEve, a lightweight, cost-effective motion capture solution that uses two smartwatches and a head-mounted camera, enabling full-body 3D motion capture in diverse settings. Finally, I will discuss our recent work, CHRUS and COMa, which enable machines to model comprehensive affordances for 3D objects by leveraging pre-trained 2D diffusion models, allowing for unbounded object interactions.
14:40 - 15:10 Invited Talk: Shubham Tulsiani
Title: Understanding Human-object Interactions for Enabling Generalizable Robot Manipulation
Abstract: We humans continually use our hands to interact with the world around us. From making our morning coffee to cleaning dishes after dinner, we effortlessly perform a plethora of tasks in our everyday lives. A central goal in robot learning is to develop similar generalist agents — ones capable of performing a diverse set of tasks across a wide range of environments. In this talk, I will highlight some of our recent efforts to build perception systems that better understand human interactions and allow robots to act in diverse scenarios. I will show how learning a (3D) generative model over human-object interactions can allow reconstructing interactions from in-the-wild clips, and how (2D) generative models over human interactions can guide robots acting in the real world.
15:10 - 16:10 Coffee Break & Poster Session
16:10 - 16:40 Invited Talk: Qi Ye
Title: Understanding Hand-Object Interaction – From human hand reconstruction and generation to dexterous manipulation of robotic hands
Abstract: In recent years, humanoid robots and embodied intelligence have attracted extensive attention. One of the most challenging aspects of endowing humanoid robots with intelligence is human-like dexterous manipulation with robotic hands. Unlike simple parallel grippers, human-like multi-fingered hands involve a high degree of freedom and complex variations in hand-object interaction, making it difficult for humanoid robots to acquire these manipulation skills. This talk will explore how to effectively use human manipulation experience to overcome these challenges and to develop and transfer human-like dexterous manipulation skills. It will cover our recent work on hand-object reconstruction, grasp generation and motion planning, multi-modal pretraining with human manipulation data for robotic hands, etc.
16:40 - 17:10 Invited Talk: Shunsuke Saito
Title: Foundations for 3D Digital Hand Avatars
Abstract: What constitutes the foundation for 3D digital hand avatars? In this presentation, we aim to establish the essential components necessary for creating high-fidelity digital hand models. We argue that relighting, animation/interaction, and in-the-wild generalization are crucial for bringing high-quality avatars to everyone. We will discuss several relightable appearance representations that achieve a photorealistic appearance under various lighting conditions. Furthermore, we will introduce techniques to effectively model animation and interaction priors, and demonstrate how to estimate complex hand-to-hand and hand-to-object interactions, even with data captured in uncontrolled environments. Finally, the talk will cover bridging the domain gap between high-quality studio data and large-scale in-the-wild data, which is key to enhancing robustness and diversity in avatar modeling algorithms. We will also explore how these foundations can complement and enhance each other.
17:10 - 17:25 Invited Talk: Prithviraj Banerjee
Title: HOT3D: A new benchmark dataset for vision-based understanding of 3D hand-object interactions
Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/putdown actions, HOT3D contains scenarios resembling typical actions in a kitchen, office, and living room environment. The dataset is recorded by two head-mounted devices from Meta: Project Aria, a research prototype of light-weight AR/AI glasses, and Quest 3, a production VR headset sold in millions of units. Ground-truth poses were obtained by a professional motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats, and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. We aim to accelerate research on egocentric hand-object interaction by making the HOT3D dataset publicly available and by co-organizing public challenges on the dataset. The dataset can be downloaded from the project website: facebookresearch.github.io
17:25 - 17:53 Competition Talks: Team JVhands
Competition Talks: Team HCB
Competition Talks: Team UVHANDS
Competition Talks: Team ACE
17:53 - 18:00 Closing Remarks

Accepted Papers & Extended Abstracts

We are delighted to announce that the following accepted papers and extended abstracts will appear in the workshop! Authors of all full-length papers, extended abstracts, and invited posters should prepare a poster for presentation during the workshop.


Poster size: the posters should be portrait (vertical), with a maximum size of 90x180 cm.

Full-length Papers

  • W01 AirLetters: An Open Video Dataset of Characters Drawn in the Air
    Rishit Dagli, Guillaume Berger, Joanna Materzynska, Ingo Bax, Roland Memisevic
    [pdf]
  • W02 RegionGrasp: A Novel Task for Contact Region Controllable Hand Grasp Generation
    Yilin Wang, Chuan Guo, Li Cheng, Hai Jiang
    [pdf]
  • W03 Generative Hierarchical Temporal Transformer for Hand Pose and Action Modeling
    Yilin Wen, Hao Pan, Takehiko Ohkawa, Lei Yang, Jia Pan, Yoichi Sato, Taku Komura, Wenping Wang
    [pdf]
  • W04 Adaptive Multi-Modal Control of Digital Human Hand Synthesis using a Region-Aware Cycle Loss
    Qifan Fu, Xiaohang Yang, Muhammad Asad, Changjae Oh, Shanxin Yuan, Gregory Slabaugh
    [pdf]
  • W05 Conditional Hand Image Generation using Latent Space Supervision in Random Variable Variational Autoencoders
    Vassilis Nicodemou, Iason Oikonomidis, Giorgos Karvounas, Antonis Argyros
    [pdf]
  • W06 ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild
    Arya Farkhondeh*, Samy Tafasca*, Jean-Marc Odobez
    [pdf]
  • W07 EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
    Masashi Hatano, Ryo Hachiuma, Hideo Saito
    [pdf]

Extended Abstracts

  • W08 AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
    Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Jose J Guerrero, Giovanni Maria Farinella, Antonino Furnari
    [pdf]
  • W09 Diffusion-based Interacting Hand Pose Transfer
    Junho Park*, Yeieun Hwang*, Suk-Ju Kang#
    [pdf]
  • W10 Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
    Rosario Leonardi, Antonino Furnari, Francesco Ragusa, Giovanni Maria Farinella
    [pdf]
  • W11 Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
    Xueyi Liu, Kangbo Lyu, Jieqiong Zhang, Tao Du, Li Yi
    [pdf]
  • W12 Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild
    Nie Lin*, Takehiko Ohkawa*, Mingfang Zhang, Yifei Huang, Ryosuke Furuta, Yoichi Sato
    [pdf]
  • W13 Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers
    An-Lun Liu, Yu-Wei Chao, Yi-Ting Chen
    [pdf]
  • W14 Action Scene Graphs for Long-Form Understanding of Egocentric Videos
    Ivan Rodin*, Antonino Furnari*, Kyle Min*, Subarna Tripathi, Giovanni Maria Farinella
    [pdf]
  • W15 Get a Grip: Reconstructing Hand-Object Stable Grasps in Egocentric Videos
    Zhifan Zhu, Dima Damen
    [pdf]
  • W16 Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation
    Huan Yang, Jiahui Chen, Chaofan Ding, Runhua Shi, Siyu Xiong, Qingqi Hong, Xiaoqi Mo, Xinhan Di
    [pdf]
  • W17 OCC-MLLM-Alpha: Empowering Multi-modal Large Language Model for the Understanding of Occluded Objects with Self-Supervised Test-Time Learning
    Shuxin Yang, Xinhan Di
    [pdf]
  • W18 Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
    Zhengdi Yu, Alara Dirik, Stefanos Zafeiriou, Tolga Birdal
    [pdf]
  • W19 Learning Dexterous Object Manipulation with a Robotic Hand via Goal-Conditioned Visual Reinforcement Learning Using Limited Demonstrations
    Samyeul Noh, Hyun Myung
    [pdf]

Invited Posters

  • W20 AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
    Junho Park*, Kyeongbo Kong*, Suk-Ju Kang#
    [pdf]
  • W21 HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
    Wencan Cheng, Eunji Kim, Jong Hwan Ko
    [poster]
  • W22 On the Utility of 3D Hand Poses for Action Recognition
    Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao
    [pdf]
  • W23 ActionVOS: Actions as Prompts for Video Object Segmentation
    Liangyang Ouyang, Ruicong Liu, Yifei Huang, Ryosuke Furuta, Yoichi Sato
    [poster]
  • W24 GraspXL: Generating Grasping Motions for Diverse Objects at Scale
    Hui Zhang, Sammy Christen, Zicong Fan, Otmar Hilliges, Jie Song
    [poster]

Technical Reports

  • 3DGS-based Bimanual Category-agnostic Interaction Reconstruction
    Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Hyein Hwang, Soohyun Hwang, Junuk Cha, Jaewook Han, Seungryul Baek
    [pdf]
  • 2nd Place Solution Technical Report for Hands’24 ARCTIC Challenge from Team ACE
    Congsheng Xu*, Yitian Liu*, Yi Cui, Jinfan Liu, Yichao Yan, Weiming Zhao, Yunhui Liu, Xingdong Sheng
    [pdf]
  • Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024
    Minqiang Zou, Zhi Lv, Riqiang Jin, Tian Zhan, Mochen Yu, Yao Tang, Jiajun Liang#
    [pdf]
  • Technical report of HCB team for Multiview Egocentric Hand Tracking Challenge on HANDS 2024 Challenge
    Haohong Kuang, Yang Xiao#, Changlong Jiang, Jinghong Zheng, Hang Xu, Ran Wang, Zhiguo Cao, Min Du, Zhiwen Fang, Joey Tianyi Zhou
    [pdf]

Invited Speakers

Hanbyul Joo
Seoul National University

Qi Ye
Zhejiang University

Shunsuke Saito
Reality Labs Research

Shubham Tulsiani
Carnegie Mellon University

Organizers

Hyung Jin Chang
University of Birmingham

Rongyu Chen
National University of Singapore

Zicong Fan
ETH Zurich

Otmar Hilliges
ETH Zurich

Kun He
Meta Reality Labs

Take Ohkawa
University of Tokyo

Yoichi Sato
University of Tokyo

Elden Tse
National University of Singapore

Linlin Yang
Communication University of China

Lixin Yang
Shanghai Jiao Tong University

Angela Yao
National University of Singapore

Linguang Zhang
Facebook Reality Labs (Oculus)

Technical Program Committee

Thank you so much to the Technical Program Committee for their thoughtful reviews.

  • Chenyangguang Zhang (Tsinghua University)
  • Gyeongsik Moon (Meta)
  • Jiayin Zhu (NUS)
  • Jihyun Lee (KAIST)
  • Junuk Cha (UNIST)
  • Kailin Li (Shanghai Jiao Tong University)
  • Keyang Zhou (University of Tübingen)
  • Pengzhan Sun (NUS)
  • Rolandos Alexandros Potamias (Imperial College London)
  • Seungryul Baek (UNIST)
  • Takuma Yagi (AIST)
  • Zerui Chen (Inria Paris)
  • Zhiying Leng (Beihang University)
  • Zhongqun Zhang (University of Birmingham)

Sponsors

Contact

hands2024@googlegroups.com