
Zero-shot audio-driven digital human

Style-Talking

Style-Talking clones the speaking style of a reference video, predicts LivePortrait expression motion from the target audio, and renders a talking video that preserves the original non-face regions.

Speaking-style prompt
wav2vec audio features
LivePortrait rendering
Video-driven identity

Trump / English style-clone result, driven by cloned English speech.

Pipeline

Audio to expression to video

Source video
wav2vec
Style-prompted expression
Renderer
Face blend
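The page does not say how the face blend stage merges the rendered face back into the source frame. A common way to keep non-face regions untouched is to alpha-composite the rendered face onto the original frame with a soft face mask. The sketch below assumes that approach; `blend_face`, its arguments, and the mask are hypothetical illustrations, not Style-Talking's actual code.

```python
import numpy as np


def blend_face(original: np.ndarray, rendered: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Paste a rendered face back into the original frame (hypothetical sketch).

    Assumption: this is a generic alpha-composite, not the Style-Talking code.
    original, rendered: H x W x 3 uint8 frames of identical shape.
    mask: H x W soft face mask; 1.0 takes the rendered pixel, 0.0 keeps the
    original pixel, and values in between mix the two.
    """
    if original.shape != rendered.shape:
        raise ValueError("original and rendered frames must have the same shape")
    if mask.shape != original.shape[:2]:
        raise ValueError("mask must match the frame height and width")
    # Clamp the mask to [0, 1] and add a channel axis so it broadcasts over RGB.
    alpha = np.clip(mask.astype(np.float32), 0.0, 1.0)[..., None]
    mixed = alpha * rendered.astype(np.float32) + (1.0 - alpha) * original.astype(np.float32)
    # Round back to valid 8-bit pixel values.
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    frame = np.zeros((2, 2, 3), dtype=np.uint8)     # stand-in source frame
    face = np.full((2, 2, 3), 200, dtype=np.uint8)  # stand-in rendered face
    soft_mask = np.array([[1.0, 0.0], [0.5, 0.25]])  # hypothetical face mask
    print(blend_face(frame, face, soft_mask)[..., 0])
```

With a feathered mask (values falling off smoothly at the face boundary), the transition between rendered and original pixels has no visible seam; a hard 0/1 mask reduces this to a plain cut-and-paste.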

Multilingual style clone

One reference, four target languages

Chinese

English

Japanese

Korean

Cross-identity samples

Source video and generated result

Michelle Yeoh / English source

Style-Talking

Zendaya / English source

Style-Talking

Liu Yifei / Chinese source

Style-Talking

Morgan Freeman / English source

Style-Talking

Jake Gyllenhaal / Japanese source

Style-Talking

Timothee Chalamet / Korean source

Style-Talking

Method comparison

All methods use the same source video and target audio

Trump / English

Style-Talking
Wav2Lip
MuseTalk
LatentSync

Liu Yifei / Chinese

Style-Talking
Wav2Lip
MuseTalk
LatentSync

Morgan Freeman / English

Style-Talking
Wav2Lip
MuseTalk
LatentSync

Michelle Yeoh / English

Style-Talking
Wav2Lip
MuseTalk
LatentSync

Zendaya / English

Style-Talking
Wav2Lip
MuseTalk
LatentSync

Jake Gyllenhaal / Japanese

Style-Talking
Wav2Lip
MuseTalk
LatentSync

Timothee Chalamet / Korean

Style-Talking
Wav2Lip
MuseTalk
LatentSync