EmoStyle: One-Shot Facial Expression Editing Using Continuous Emotion Parameters

Bita Azari*, Angelica Lim*

*Simon Fraser University, Burnaby, BC, Canada
samples.png

Paper Code

Abstract


Recent studies have achieved impressive results in face generation and editing of facial expressions. However, existing approaches either generate a discrete number of facial expressions or have limited control over the emotion of the output image. To overcome this limitation, we introduce EmoStyle, a method to edit facial expressions based on valence and arousal, two continuous emotional parameters that can specify a broad range of emotions. EmoStyle is designed to separate emotions from other facial characteristics and to edit the face to display a desired emotion. We employ the pre-trained generator from StyleGAN2, taking advantage of its rich latent space. We also propose an adapted inversion method that lets us apply our system to out-of-domain (OOD) images, i.e., images outside the StyleGAN2 domain, in a one-shot manner. Qualitative and quantitative evaluations show that our approach can synthesize a wide range of expressions in high-resolution output images.
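For context, valence measures how positive or negative an emotion is and arousal measures its intensity; together they span Russell's circumplex of emotions. The values below are illustrative placements only, not parameters taken from the paper.

# Illustrative (valence, arousal) pairs on Russell's circumplex.
# Values lie in [-1, 1]; these are example placements, not values from the paper.
EXAMPLE_EMOTIONS = {
    "happy":   ( 0.8,  0.5),   # positive valence, moderately high arousal
    "excited": ( 0.6,  0.9),   # positive valence, high arousal
    "calm":    ( 0.6, -0.6),   # positive valence, low arousal
    "sad":     (-0.7, -0.4),   # negative valence, low arousal
    "angry":   (-0.6,  0.8),   # negative valence, high arousal
    "neutral": ( 0.0,  0.0),
}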

Approach


workflow.png

Phase 1 - Training EmoExtract: We train the EmoExtract and up-sampling modules (green) by alternating between Emotion Variation, using random emotion parameters sampled from the valence-arousal space (top), and Emotion Reconstruction of the input face (bottom). Five auxiliary losses are used for this purpose, as indicated by the dashed lines. The Inversion module extracts the latent code w of the input image Iinput. The EmoExtract module is trained to determine the modification d that should be applied to the latent code w; note that d should be 0 for the Emotion Reconstruction branch. The final latent code is obtained by adding d to the original latent code w. Finally, the StyleGAN2 generator produces the desired image.
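A minimal sketch of the Phase 1 forward pass described above, assuming hypothetical module names (inverter, emo_extract, generator) and interfaces; the actual training code, loss weights, and module definitions are in the released repository.

import torch

def edit_expression(img, valence, arousal, inverter, emo_extract, generator):
    """One forward pass of the Phase 1 pipeline (hypothetical interfaces).

    img: input face image tensor, shape (1, 3, H, W)
    valence, arousal: floats in [-1, 1] specifying the target emotion
    """
    w = inverter(img)                                # latent code in the StyleGAN2 W space
    emotion = torch.tensor([[valence, arousal]])     # continuous emotion parameters
    d = emo_extract(w, emotion)                      # modification to apply to w
    # For Emotion Reconstruction (target emotion equals the input's emotion),
    # d is trained to be ~0, so the edited latent code stays close to the original.
    w_edited = w + d
    return generator(w_edited)                       # edited, high-resolution face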

Phase 2 - Fine-tuning StyleGAN2: We freeze the EmoExtract module trained previously and fine-tune our StyleGAN2 component. The inputs during this phase are emotion parameters and one out-of-domain face. First, we obtain the face's latent code in the StyleGAN2 W space using an inversion framework, then perform a fine-tuning step.
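A simplified sketch of the Phase 2 one-shot fine-tuning loop, again with assumed names and a placeholder pixel reconstruction objective; the actual method also feeds emotion parameters through the frozen EmoExtract module and uses the paper's full set of losses.

import torch
import torch.nn.functional as F

def finetune_generator(ood_img, inverter, emo_extract, generator, steps=300, lr=1e-4):
    """One-shot fine-tuning of StyleGAN2 on a single out-of-domain face (sketch only)."""
    w = inverter(ood_img).detach()                   # latent code in W space, computed once
    for p in emo_extract.parameters():               # Phase 1 EmoExtract module stays frozen
        p.requires_grad_(False)
    opt = torch.optim.Adam(generator.parameters(), lr=lr)

    for _ in range(steps):
        recon = generator(w)                         # regenerate the input face
        # Placeholder objective: pixel reconstruction only; the real objective is richer.
        loss = F.l1_loss(recon, ood_img)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator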

Results


Results on StyleGAN images

results.png

Results on real images

barak_obama.gif
taylor_swift.gif

Resources


BibTeX

If you find this work useful for your research, please cite:
@inproceedings{azari2024emostyle,
  title={EmoStyle: One-Shot Facial Expression Editing Using Continuous Emotion Parameters},
  author={Azari, Bita and Lim, Angelica},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={6385--6394},
  year={2024}
}

© This webpage was in part inspired by this template.