Photo + audio → full-body avatar
Omnihuman v1.5 — Drive a Whole-Body Avatar From One Photo
Upload a photo and an audio track; Omnihuman drives head pose, facial expression, torso motion, and gesture, all synced to the speech signal. Launching inside the gVideo Avatar Studio — same credit pool as Kling 3.0, Veo 3.1, Sora 2 Pro, and 6 others.
Avatar Studio
Generate with Omnihuman in Studio
Omnihuman needs a photo plus an audio track to animate a full-body avatar, and those inputs are larger than an inline box can handle. Open Studio’s avatar mode to upload both and generate directly from the same credit pool you already use for Kling, Veo, Sora, and the rest.
What’s different
Why creators reach for Omnihuman
Whole-body motion, not just head + shoulders
Unlike most talking-avatar tools that only animate the face, Omnihuman drives torso posture, shoulder position, and hand gestures from the audio track's rhythm and emphasis. Characters feel like they're actually presenting, not reading a teleprompter.
Bring-your-own audio (not tied to TTS)
Omnihuman takes an audio file as input — your own voice recording, a professionally produced VO, licensed music with vocals, anything. No built-in TTS means no language lock and no voice-match limitation.
Trained on diverse movement data
ByteDance trained v1.5 on a large corpus of single-person speaking videos across age, ethnicity, and body type ranges. The output holds up across subjects far better than earlier audio-driven avatar models.
Premium tier on gVideo — worth the credit draw
Omnihuman costs $0.16/s at fal wholesale and 12 credits per second on gVideo, so a 30-second render runs about $8 on our Pro plan. That is steeper than HeyGen V3, but full-body motion quality is the reason to choose it over cheaper talking-head models.
Sample generations
Credits
Omnihuman credit cost on gVideo
Omnihuman costs 360 credits per 30-second video. All 9 models share a single credit pool under your gVideo subscription.
Omnihuman is billed at 12 credits per second (roughly $0.27/s on the Pro plan). One 30-second full-body avatar render costs 360 credits, or about $8, which is the highest per-render cost in the gVideo catalog. Audio is bring-your-own, so there is no TTS surcharge.
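The arithmetic behind those figures can be sketched in a few lines of Python. This is an illustrative estimate only: the 12 credits/second rate and the rough $0.27/s Pro figure come from the text above, while the function names and the choice to round partial seconds up are assumptions, not documented gVideo billing behavior.

```python
import math

CREDITS_PER_SECOND = 12     # rate stated in the pricing text
USD_PER_SECOND_PRO = 0.27   # "roughly $0.27/s" on the Pro plan, per the text


def omnihuman_cost(duration_s: float) -> tuple[int, float]:
    """Estimate (credits, approximate USD) for a render of duration_s seconds.

    Rounding credits up for fractional seconds is an assumption made for
    this sketch; actual billing granularity may differ.
    """
    credits = math.ceil(duration_s * CREDITS_PER_SECOND)
    usd = round(duration_s * USD_PER_SECOND_PRO, 2)
    return credits, usd


def seconds_affordable(credit_balance: int) -> int:
    """Whole seconds of Omnihuman output a credit balance could cover."""
    return credit_balance // CREDITS_PER_SECOND


if __name__ == "__main__":
    print(omnihuman_cost(30))       # credits and estimated USD for 30 seconds
    print(seconds_affordable(100))  # whole seconds covered by 100 credits
```

Under these assumptions, a 30-second render comes to 360 credits and roughly $8, and a 100-credit balance would cover about 8 seconds of output if there is no minimum render length.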
Common questions about Omnihuman
When does Omnihuman launch on gVideo?
The ByteDance Omnihuman endpoint is already integrated in our model catalog with validated pricing. What's pending is the Avatar Studio UI, specifically the photo and audio upload flow. Pro-plan subscribers get early access the day it ships. Join the waitlist on the pricing page to get notified.
What inputs does Omnihuman need?
A photo (full-body or half-body recommended, front or 3/4 facing, 720px+ on the short edge) and an audio file (MP3, WAV, AAC, M4A, or OGG). Omnihuman does NOT use TTS — you bring the voice. If you need TTS, use HeyGen V3 instead.
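If you want to pre-check files before uploading, the requirements above can be expressed as a small local check. This is a minimal sketch, not part of any gVideo or Omnihuman API: the function name and return shape are hypothetical, and it only tests the two machine-checkable rules from the answer above (short edge of at least 720px, and one of the listed audio extensions). It takes the image dimensions as arguments, so you would read them with whatever image tool you already use.

```python
from pathlib import Path

# Values taken from the input requirements stated above.
MIN_SHORT_EDGE_PX = 720
ALLOWED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".aac", ".m4a", ".ogg"}


def check_inputs(width: int, height: int, audio_filename: str) -> list[str]:
    """Return a list of problems found; an empty list means both checks pass.

    Hypothetical helper for illustration. It checks the file extension only,
    not the actual audio encoding.
    """
    problems = []
    short_edge = min(width, height)
    if short_edge < MIN_SHORT_EDGE_PX:
        problems.append(
            f"photo short edge is {short_edge}px; {MIN_SHORT_EDGE_PX}px or more is recommended"
        )
    extension = Path(audio_filename).suffix.lower()
    if extension not in ALLOWED_AUDIO_EXTENSIONS:
        problems.append(f"audio extension '{extension}' is not in the supported list")
    return problems


if __name__ == "__main__":
    print(check_inputs(1080, 1920, "voiceover.wav"))  # passes both checks
    print(check_inputs(640, 480, "voiceover.flac"))   # fails both checks
```

Framing (full-body or half-body, front or 3/4 facing) still needs a visual check; the sketch does not attempt to detect it.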
How does Omnihuman compare to HeyGen V3?
HeyGen V3 = head + shoulders, built-in TTS, ~$2 per 30s. Omnihuman = full-body motion, bring-your-own audio, ~$8 per 30s. Use HeyGen for talking-head explainers where you want quick TTS; use Omnihuman for keynote-style or social-first content where body language matters.
Can I use Omnihuman output commercially?
Yes. Commercial usage rights are included with every paid tier on gVideo. ByteDance's Omnihuman license permits commercial use on fal's API tier, and we pass that permission through.
What aspect ratios are supported?
16:9 landscape, 9:16 vertical, and 1:1 square. Vertical 9:16 is the common choice for social posts; 16:9 for keynote / presentation use.
Does the audio source hold copyright over the generated motion?
The motion is an Omnihuman-generated derivative, not a recording of a real person. However, if the audio you feed in is someone else's copyrighted speech, the audio portion still carries its original license. Use your own recordings, public-domain audio, or licensed audio to stay clean.
Ready to generate with Omnihuman?
Start free — 100 credits on signup, no credit card required.