Turn a Photo Into a Talking Video
Upload a still portrait, add a voice clip or a script, and gVideo renders a fully animated talking video: lip-sync, natural motion, broadcast-quality audio. Launching with HeyGen V3 (TTS) and Omnihuman (bring-your-own-audio) inside the gVideo Avatar Studio.
Avatar Studio
Generate your avatar video in Studio
Avatar generation uses HeyGen V3 (prompt → talking video) or Omnihuman (photo + audio → full-body avatar). Both need inputs larger than an inline box can handle, so open Studio's avatar mode to upload and generate from the same credit pool you already use for Kling, Veo, Sora and the rest.
Video Examples
See it in action
Why gVideo
Built for results
Any decent portrait works
Phone snapshot, historical scan, corporate headshot, concept-art character: if it's front-facing, sharp, and at least 512px on the short edge, it can be animated.
Two audio paths
Use HeyGen V3's built-in TTS (30+ languages, natural narration voices) or bring your own audio file into Omnihuman for maximum voice control.
Tight lip-sync, believable motion
Not old-school 'animate the mouth-rectangle' avatar work. Both models learn from real speech data — lip shapes, head nods, micro-expression timing all match the audio.
Iterate without re-shooting
Change the script, re-render. Swap the photo, re-render. Pay per render, not per seat — one Pro subscription covers 20+ HeyGen V3 renders per month.
Model Recommendation
Best model for this use case
For most 'photo to talking video' work (explainers, testimonials, greetings, memories), HeyGen V3 with built-in TTS is the fastest path. Bringing your own audio? Use Omnihuman instead for richer full-body motion.
“I uploaded a 30-year-old family photo and rendered grandpa saying 'happy birthday' in his own recorded voice. My mom cried. Shipped the same day.”
Common questions
What kind of photo works best?
Front-facing or 3/4-facing portrait, clearly lit face, one person only, 512px+ on the short edge. Group photos, side profiles, and heavily stylized art (cartoons, paintings) produce weaker results on HeyGen V3; Omnihuman handles stylized inputs slightly better but still prefers photoreal portraits.
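If you'd like to sanity-check a photo before uploading, here is a minimal pre-flight sketch in Python using Pillow. The function name and filename are placeholders, not part of gVideo; the 512px floor is simply the short-edge requirement stated above.

```python
# Minimal pre-upload sanity check for the portrait rules above.
# Requires Pillow (pip install Pillow); the 512px floor matches
# the short-edge requirement stated in this answer.
from PIL import Image

MIN_SHORT_EDGE = 512  # px, per the guidance above

def check_portrait(path: str) -> list[str]:
    """Return a list of problems; an empty list means the photo passes."""
    problems = []
    with Image.open(path) as img:
        if min(img.size) < MIN_SHORT_EDGE:
            problems.append(
                f"short edge is {min(img.size)}px; needs {MIN_SHORT_EDGE}px+"
            )
        if img.mode not in ("RGB", "RGBA", "L"):
            problems.append(f"unusual color mode {img.mode!r}; convert to RGB")
    return problems

for issue in check_portrait("headshot.jpg"):  # placeholder filename
    print("fix before upload:", issue)
```

This only catches resolution and color-mode issues; framing (one face, front-facing) is still a judgment call you make by eye.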
Can I animate old / historical photos?
Yes. Both models are trained on diverse face data and can animate vintage photos, scanned family portraits, or upscaled old images. Best results come from photos that have been gently upscaled (4K is overkill; around 1080px on the short edge is plenty) and color-corrected first.
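A gentle upscale is easy to script. Here's a sketch using Pillow's LANCZOS resampling and the ~1080px short-edge target from the answer above; a dedicated AI upscaler will do better on badly degraded scans, and the filenames are placeholders.

```python
# Gently upscale a scanned photo so its short edge reaches ~1080px,
# per the guidance above. Uses Pillow's LANCZOS resampling; a
# dedicated AI upscaler will do better on heavily degraded scans.
from PIL import Image

TARGET_SHORT_EDGE = 1080  # px; more than this is overkill per this FAQ

def gentle_upscale(src: str, dst: str) -> None:
    with Image.open(src) as img:
        short_edge = min(img.size)
        if short_edge >= TARGET_SHORT_EDGE:
            img.save(dst)  # already large enough; don't upscale further
            return
        scale = TARGET_SHORT_EDGE / short_edge
        new_size = (round(img.width * scale), round(img.height * scale))
        img.resize(new_size, Image.LANCZOS).save(dst)

gentle_upscale("scan_1992.jpg", "scan_1992_upscaled.jpg")  # placeholders
```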
Do I have to write a script, or can I just speak?
Either. HeyGen V3 takes a written script and renders with built-in TTS. Omnihuman takes an audio file (your voice, a pro VO, or any licensed track) and drives motion from it. Recording a 20-second voice memo is often the fastest path when you want a specific voice.
How realistic does the output look?
On a 1080p portrait with clean lighting and a natural-voice audio track, the output holds up under casual viewing. At close inspection you'll notice subtle stiffness around the cheeks and a slightly off-rhythm blink cadence: state-of-the-art as of April 2026, but not indistinguishable from real footage.
Can I use this output commercially?
Yes on paid plans. Commercial rights are included with every paid tier on gVideo. But if the photo is of a real person, you still need their permission to publish a talking video of them: rights-of-publicity laws apply to AI-animated portraits just as they do to actual video.
How long can the talking video be?
Both HeyGen V3 and Omnihuman render 30 seconds per invocation on gVideo. For longer content, render multiple 30-second clips and stitch them in an editor — the models are frame-consistent so cuts feel natural.
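If you'd rather script the stitching than open an editor, ffmpeg's concat demuxer joins clips without re-encoding. A minimal sketch, assuming ffmpeg is installed, the clips came from the same render settings (so their codecs match), and the filenames are placeholders:

```python
# Stitch several 30-second renders into one video with ffmpeg's
# concat demuxer. Stream copy ("-c copy") joins without re-encoding,
# which is lossless and fast but requires matching codecs across clips.
import os
import subprocess
import tempfile

def stitch(clips: list[str], output: str) -> None:
    # The concat demuxer reads a text file listing the inputs in order.
    # Absolute paths keep the list valid regardless of where it lives.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for clip in clips:
            f.write(f"file '{os.path.abspath(clip)}'\n")
        list_path = f.name
    try:
        subprocess.run(
            ["ffmpeg", "-f", "concat", "-safe", "0",
             "-i", list_path, "-c", "copy", output],
            check=True,
        )
    finally:
        os.unlink(list_path)

stitch(["part1.mp4", "part2.mp4", "part3.mp4"], "full_message.mp4")
```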
Ready to generate?
Start free — 100 credits on signup, no credit card required.