Generate AI Avatars From a Single Photo
Upload one portrait photo, write a script (or bring your own audio), and get a talking avatar with lip-sync, natural voice, and expressive motion. Ships with two flagship models under one credit pool.
Avatar Studio
Generate your avatar video in Studio
Avatar generation uses HeyGen V3 (photo + script → talking video) or Omnihuman (photo + audio → full-body avatar). Both need inputs bigger than an inline box can handle, so open Studio’s avatar mode to upload and generate from the same credit pool you already use for Kling, Veo, Sora and the rest.
Video Examples
See it in action
Why gVideo
Built for results
One photo, one script, done
No green screen, no teleprompter, no video booking. Upload a portrait photo, paste 1–3 sentences, get a fully lip-synced talking avatar in under a minute.
Built-in TTS or bring your own audio
HeyGen V3 synthesizes natural narration in 30+ languages. Omnihuman takes a raw audio file (your voice, a pro VO, a licensed clip) and drives whole-body motion from it.
Same credit pool as 9 video models
One subscription covers HeyGen V3, Omnihuman, Kling 3.0, Veo 3.1, Sora 2 Pro, and 5 more. Mix avatar shots with b-roll from text-to-video models in the same project.
Commercial rights on every paid plan
Generated avatars are yours to use in ads, product videos, tutorials, and social posts. No per-render licensing, no separate enterprise contract.
Model Recommendation
Best model for this use case
HeyGen V3 is the right starting point for most avatar work — photo + script renders a broadcast-quality talking presenter at ~$2 per 30-second clip. Omnihuman is the upgrade pick when you need full-body motion synced to an audio track.
“Cut my explainer-video production time from 3 days to an afternoon. Script → photo → avatar render, straight into the edit.”
Common questions
When does the Avatar Studio launch?
The two avatar models (HeyGen V3, Omnihuman) are integrated in our model catalog with validated pricing. What's pending is the Studio UI — photo upload + script/audio input flow. Pro-plan subscribers get early access the day it ships. Sign up now to be first in line.
What's the difference between HeyGen V3 and Omnihuman?
HeyGen V3 = photo + built-in TTS → head-and-shoulders talking avatar (~$2 per 30s). Omnihuman = photo + your own audio → full-body avatar with torso and gesture motion (~$8 per 30s). Use HeyGen for explainer-style content; use Omnihuman when body language matters.
What kind of photo do I need?
A front-facing or 3/4-facing portrait, well lit, 512px+ on the short edge. Higher resolution (1024px+) produces sharper output. One person per photo — crowds or group shots won't work on V3.
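If you want to pre-check photos before uploading, the size guidance above can be sketched as a small helper. This is an illustrative sketch only: the function name and return labels are hypothetical, it is not gVideo's actual validation, and it checks pixel dimensions only (not lighting, pose, or the number of people in the shot).

```python
# Hypothetical pre-upload check based on the stated guidance:
# 512px+ on the short edge is required, 1024px+ gives sharper output.
MIN_SHORT_EDGE = 512
SHARP_SHORT_EDGE = 1024


def check_portrait_size(width_px: int, height_px: int) -> str:
    """Classify a photo by its short edge; labels are illustrative."""
    short_edge = min(width_px, height_px)
    if short_edge < MIN_SHORT_EDGE:
        return "too_small"
    if short_edge < SHARP_SHORT_EDGE:
        return "ok"
    return "sharp"


if __name__ == "__main__":
    for size in [(400, 800), (512, 900), (1024, 1536)]:
        print(size, check_portrait_size(*size))
```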
Can I use my own voice for the avatar?
Yes — both models support bring-your-own-audio. Record a voice memo, upload a pro VO file, or use any licensed audio (MP3, WAV, AAC, OGG, M4A). If you'd rather skip the recording step, HeyGen V3's built-in TTS covers 30+ languages.
Is the avatar output commercial-safe?
Yes, on all paid plans. gVideo passes through HeyGen's and ByteDance Omnihuman's commercial license terms. Generated avatars can be used in paid ads, client deliverables, and product marketing. Free-tier generations are for personal use only.
How much does an avatar render cost?
On the Pro plan ($39.99/mo, 1,800 credits): HeyGen V3 is ~90 credits per 30-second clip (about $2.00), Omnihuman is ~360 credits per 30 seconds (about $8.00). Per-credit prices drop at higher tiers: Premium ($79.99/mo, 4,000 credits) lowers the per-credit cost by about 10% compared with Pro.
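The arithmetic behind those figures can be reproduced with a few lines of Python. This is a rough sketch using only the plan prices and credit costs quoted above; the dictionary and function names are made up for illustration, and it assumes the whole monthly allowance is spent on one model.

```python
# Figures quoted in the pricing answer; names are illustrative only.
PLANS = {
    "Pro": {"price_usd": 39.99, "credits": 1800},
    "Premium": {"price_usd": 79.99, "credits": 4000},
}
CREDITS_PER_30S_CLIP = {"HeyGen V3": 90, "Omnihuman": 360}


def cost_per_credit(plan: str) -> float:
    """Dollar cost of one credit on the given plan."""
    p = PLANS[plan]
    return p["price_usd"] / p["credits"]


def clip_cost_usd(plan: str, model: str) -> float:
    """Approximate dollar cost of one 30-second clip."""
    return CREDITS_PER_30S_CLIP[model] * cost_per_credit(plan)


def clips_per_month(plan: str, model: str) -> int:
    """Whole 30-second clips that fit in one month's credits."""
    return PLANS[plan]["credits"] // CREDITS_PER_30S_CLIP[model]


if __name__ == "__main__":
    for model in CREDITS_PER_30S_CLIP:
        print(model, f"${clip_cost_usd('Pro', model):.2f}",
              clips_per_month("Pro", model), "clips/month on Pro")
    saving = 1 - cost_per_credit("Premium") / cost_per_credit("Pro")
    print(f"Premium per-credit saving vs Pro: {saving:.1%}")
```

On these numbers, a Pro allowance covers 20 HeyGen V3 clips or 5 Omnihuman clips of 30 seconds each per month.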
Ready to generate?
Start free — 100 credits on signup, no credit card required.