OHTA is a novel approach capable of creating implicit animatable hand avatars using just a single image. It facilitates 1) text-to-avatar conversion, 2) hand texture and geometry editing, and 3) interpolation and sampling within the latent space.
[06/2024] 🤩 Code released!
[02/2024] 🥳 OHTA is accepted to CVPR 2024! Working on code release!
Create the conda environment for OHTA with the given script:
bash scripts/create_env.sh
You should accept the SMPL-X Model License and install SMPL-X.
You should accept the MANO License and download the MANO model from the official website.
Download the pre-trained PairOF and MANO-HD models from here; they are provided by HandAvatar, and our MANO-HD implementation follows HandAvatar.
We provide the pre-trained prior-learning model, which can be used for one-shot creation. Please download the weights from the link.
To train the prior model or evaluate one-shot performance on InterHand2.6M, you should download the dataset from the official website. After downloading the pre-trained models and data, organize the directories as follows:
ROOT
├── data
│   └── InterHand
│       └── 5
│           ├── annotations
│           └── InterHand2.6M_5fps_batch1
├── output
│   └── pretrained_prior_learning.tar
└── third_parties
    ├── mano
    │   ├── MANO_RIGHT.pkl -> models/MANO_RIGHT.pkl
    │   └── models
    ├── pairof
    │   └── out
    └── smplx
        └── out
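If you want to double-check the layout before running anything, a small script like the one below (not part of the repository; paths taken from the tree above) can verify that everything is in place:

```python
# Sanity check for the expected directory layout (illustrative, not part of the repo).
import os

EXPECTED = [
    "data/InterHand/5/annotations",
    "data/InterHand/5/InterHand2.6M_5fps_batch1",
    "output/pretrained_prior_learning.tar",
    "third_parties/mano/models/MANO_RIGHT.pkl",
    "third_parties/pairof/out",
    "third_parties/smplx/out",
]

missing = [p for p in EXPECTED if not os.path.exists(p)]
if missing:
    print("Missing paths:\n  " + "\n  ".join(missing))
else:
    print("All expected paths are in place.")
```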
For training and evaluation, you also need to generate hand segmentations.
First, you should follow HandAvatar to generate masks by MANO rendering.
Please refer to scripts/seg_interhand2.6m_from_mano.py for generating the MANO segmentation:
python scripts/seg_interhand2.6m_from_mano.py
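scripts/seg_interhand2.6m_from_mano.py handles the InterHand2.6M cameras and file layout; the sketch below only illustrates the underlying idea of rasterizing a posed MANO mesh into a binary mask with PyTorch3D. The zero pose, the placeholder FoV camera, and the image size are our assumptions, not the settings used by the script:

```python
# Illustration only: rasterize a MANO mesh into a silhouette mask with PyTorch3D.
# The real script uses the InterHand2.6M annotations and camera calibration.
import torch
import smplx
from pytorch3d.structures import Meshes
from pytorch3d.renderer import FoVPerspectiveCameras, RasterizationSettings, MeshRasterizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the right-hand MANO model downloaded above (path from the tree in this README).
mano = smplx.MANO("third_parties/mano/models/MANO_RIGHT.pkl", is_rhand=True, use_pca=False).to(device)

# Zero pose/shape for illustration; in practice use the annotated MANO parameters.
out = mano(
    betas=torch.zeros(1, 10, device=device),
    global_orient=torch.zeros(1, 3, device=device),
    hand_pose=torch.zeros(1, 45, device=device),
)
# Push the hand in front of the placeholder camera (the real script uses dataset cameras).
verts = out.vertices + torch.tensor([0.0, 0.0, 0.4], device=device)
faces = torch.as_tensor(mano.faces.astype("int64"), device=device)[None]

mesh = Meshes(verts=verts, faces=faces)
cameras = FoVPerspectiveCameras(znear=0.1, device=device)  # placeholder camera
rasterizer = MeshRasterizer(
    cameras=cameras,
    raster_settings=RasterizationSettings(image_size=512, blur_radius=0.0, faces_per_pixel=1),
)
fragments = rasterizer(mesh)
mask = (fragments.pix_to_face[..., 0] >= 0).float()  # (1, 512, 512) binary hand mask
```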
To better train the prior model, we further utilize SAM with joint and bounding-box prompts to generate segmentations that are better aligned with the hand.
We strongly recommend using segmentations that are as accurate as possible for prior learning.
Please refer to scripts/seg_with_sam.py for more details:
python scripts/seg_with_sam.py
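As a rough illustration of how SAM can be prompted with projected joints and a hand bounding box (the checkpoint path, image path, and prompt coordinates below are placeholders; see scripts/seg_with_sam.py for the actual pipeline):

```python
# Illustrative SAM prompting with joint keypoints and a bounding box.
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("hand.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

joints_2d = np.array([[256, 240], [300, 260], [280, 310]], dtype=np.float32)  # projected MANO joints (px)
bbox = np.array([180, 160, 380, 400], dtype=np.float32)                       # x0, y0, x1, y1

masks, scores, _ = predictor.predict(
    point_coords=joints_2d,
    point_labels=np.ones(len(joints_2d), dtype=np.int32),  # all prompts are foreground
    box=bbox,
    multimask_output=False,
)
hand_mask = masks[0]  # (H, W) boolean hand mask
```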
For one-shot creation, you should use a hand pose estimator to predict the MANO parameters of the input image and then process the data into the input format.
We have provided a tool for obtaining the hand mesh through fitting, along with metadata in the required format; you can refer to HandMesh for data preparation tools. Our method is not limited to HandMesh; you can also use other hand mesh estimators such as HaMeR. You can also refer to scripts/seg_with_sam.py for generating the hand mask of in-the-wild hand images.
We provide the processing script scripts/process_interhand2.6m.py, which can process the data of InterHand2.6M into the format for one-shot creation:
python scripts/process_interhand2.6m.py
We also provide some processed samples in example_data.
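The authoritative input format is whatever scripts/process_interhand2.6m.py produces and example_data contains; the snippet below only sketches the kind of per-image record a hand mesh estimator would need to provide, with purely hypothetical field names and file paths:

```python
# Hypothetical example of packaging estimator output for one-shot creation.
# Field names and layout are illustrative; check example_data for the real format.
import pickle
import numpy as np

sample = {
    "image_path": "example_data/in_the_wild/img/02023.jpg",
    "mask_path": "hand_mask.png",                  # e.g., produced by seg_with_sam.py
    "mano_pose": np.zeros(48, dtype=np.float32),   # global orient (3) + hand pose (45)
    "mano_shape": np.zeros(10, dtype=np.float32),  # MANO betas
    "camera_K": np.eye(3, dtype=np.float32),       # intrinsics used by the estimator
}

with open("sample_meta.pkl", "wb") as f:
    pickle.dump(sample, f)
```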
After processing the image to the input format, you can use the create.py script to create the hand avatar as below:
python create.py --cfg configs/interhand/ohta_create.yaml \
--input example_data/in_the_wild/img/02023.jpg \
--checkpoint output/pretrained_prior_learning.tar
You can also edit the avatar with the given content and the corresponding mask:
python create.py --cfg configs/interhand/ohta_create.yaml \
--input example_data/editing/img/rainbow.jpg \
--checkpoint output/pretrained_prior_learning.tar \
--edit
If you are interested in generating hand avatars from text prompts, you can utilize image generation tools (e.g., ControlNet) conditioned on text and a depth map obtained by MANO rendering. After that, you can convert the result into the input format described above for avatar creation.
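A minimal sketch of that text-to-avatar route with diffusers is shown below; the model IDs, prompt, and depth map path are examples, and the generated image still has to go through the same MANO estimation, masking, and create.py steps as any other input:

```python
# Sketch: generate a hand image with a depth ControlNet conditioned on a
# MANO-rendered depth map, then feed it to the one-shot creation pipeline.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

depth = Image.open("mano_depth.png").convert("RGB")  # depth map rendered from a posed MANO mesh
image = pipe(
    prompt="a photo of a hand with a rainbow tattoo, realistic skin",
    image=depth,
    num_inference_steps=30,
).images[0]
image.save("generated_hand.png")
# Next: estimate MANO parameters and a mask for generated_hand.png, then run create.py.
```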
After creating the one-shot avatar using InterHand2.6M, you can evaluate the performance on the subset:
python train.py --cfg configs/interhand/ohta_create.yaml
You can use the following script to train the prior model on InterHand2.6M:
python train.py --cfg configs/interhand/ohta_train.yaml
If you find our work useful for your research, please consider citing the paper:
@inproceedings{zheng2024ohta,
title={OHTA: One-shot Hand Avatar via Data-driven Implicit Priors},
author={Zheng, Xiaozheng and Wen, Chao and Su, Zhuo and Xu, Zeran and Li, Zhaohu and Zhao, Yang and Xue, Zhou},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
Distributed under the MIT License. See LICENSE for more information.
This project is built on source code shared by HandAvatar and PyTorch3D. We thank the authors for their great work!