Zero-shot audio-driven digital human
Style-Talking
Style-Talking clones the speaking style of a reference video, predicts LivePortrait expression motion from the target audio, and renders a talking video that preserves the original non-face regions.
Speaking-style prompt
wav2vec audio features
LivePortrait rendering
Video-driven identity
Trump / English style-clone result, driven by cloned English speech.
Pipeline
Audio to expression to video
source video -> wav2vec -> style-prompted expression -> renderer -> face blend
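The final face blend stage is what keeps the non-face regions of the source frame untouched. The page does not say how Style-Talking implements it, so the following is only a minimal sketch of one common approach: alpha compositing the rendered face over the original frame with a feathered mask. The function names `feather_mask` and `blend_face`, the rectangular face box, and the linear feather width are all assumptions made for illustration.

```python
import numpy as np


def feather_mask(height, width, box, feather):
    """Soft mask: 1.0 inside `box`, falling linearly to 0.0 over `feather` pixels outside it.

    `box` is (top, left, bottom, right) with exclusive bottom/right (hypothetical convention).
    `feather` must be greater than zero.
    """
    top, left, bottom, right = box
    ys = np.arange(height, dtype=np.float32)[:, None]
    xs = np.arange(width, dtype=np.float32)[None, :]
    # Per-axis distance to the box; zero for pixels inside it.
    dy = np.maximum(np.maximum(top - ys, ys - (bottom - 1)), 0.0)
    dx = np.maximum(np.maximum(left - xs, xs - (right - 1)), 0.0)
    dist = np.maximum(dy, dx)  # Chebyshev distance to the box
    return np.clip(1.0 - dist / float(feather), 0.0, 1.0)


def blend_face(original, rendered, mask):
    """Composite `rendered` over `original` (both HxWx3 uint8) using an HxW mask in [0, 1]."""
    m = mask[..., None]  # broadcast the mask over the colour channels
    out = m * rendered.astype(np.float32) + (1.0 - m) * original.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    original = np.zeros((8, 8, 3), dtype=np.uint8)        # stand-in for a source frame
    rendered = np.full((8, 8, 3), 200, dtype=np.uint8)    # stand-in for renderer output
    mask = feather_mask(8, 8, box=(2, 2, 6, 6), feather=2)
    frame = blend_face(original, rendered, mask)
    print(frame[:, :, 0])
```

Pixels where the mask is 1 come entirely from the renderer, pixels where it is 0 are copied from the source frame, and the feathered band in between hides the seam. In practice the mask would likely follow a face or landmark region rather than a rectangle.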
Multilingual style clone
One reference, four target languages
English
Japanese
Korean
Cross-identity samples
Source video and generated result
Michelle Yeoh / English source
Style-Talking
Zendaya / English source
Style-Talking
Liu Yifei / Chinese source
Style-Talking
Morgan Freeman / English source
Style-Talking
Jake Gyllenhaal / Japanese source
Style-Talking
Timothee Chalamet / Korean source
Style-Talking
Method comparison