How It Works

Deep Word needs only two inputs to create your videos: a video and an audio track. For the video input, you can select one of our video actors or upload your own. For the audio input, you can select one of our samples, type what you want your video actor to say (text-to-speech), or upload your own audio. Deep Word will sync the lip and jaw movements of the video actor with your selected audio in minutes.

Upload Guidelines

Note: Your video and audio inputs DO NOT have to be the same length. Deep Word works by repeatedly looping the video of your actor forwards and backwards until it reaches the length of your audio input. Our servers then modify the actor’s lip and jaw movements to sync with your audio.

For example, if your uploaded video is 20 seconds long and your typed or uploaded audio is 60 seconds long, Deep Word will automatically loop your video forwards (20 seconds), backwards (20 seconds), and then forwards again (20 seconds) to reach the length of your 60-second audio.
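
The forwards-and-backwards looping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Deep Word's actual implementation: the function name `pingpong_segments` is hypothetical, and the assumption that the final pass is simply cut short when the audio ends partway through a loop is ours, since the page does not say how that case is handled.

```python
def pingpong_segments(video_len, audio_len):
    """Plan forward/backward passes of a clip until the audio length is covered.

    Hypothetical helper for illustration only. Lengths are in seconds.
    Assumption: the last pass is truncated if the audio ends mid-pass.
    """
    if video_len <= 0:
        raise ValueError("video_len must be positive")
    if audio_len < 0:
        raise ValueError("audio_len must not be negative")

    segments = []
    remaining = audio_len
    forward = True  # the first pass always plays forwards
    while remaining > 0:
        played = min(video_len, remaining)  # truncate the final pass if needed
        segments.append(("forward" if forward else "backward", played))
        remaining -= played
        forward = not forward  # alternate direction on every pass
    return segments


# The example from the text: a 20-second video and 60 seconds of audio.
print(pingpong_segments(20, 60))
# [('forward', 20), ('backward', 20), ('forward', 20)]
```

Alternating direction on each pass means every pass starts on the frame where the previous one ended, so the actor does not visibly jump at the loop point.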

Video Input

The person you want talking

Audio Input

The words you want them to say

Guidelines

This is a short video clip of the actor you want talking. For best results:

Not too close and not too far. 5-10 feet from the camera is optimal, framed approximately from the waist up.
Looking directly at the camera.
Stationary (seated or standing) and not moving excessively.
Against a solid-colored background that contrasts with the clothing and skin tone of the actor.
Actor’s lips should not be moving, but they can be making facial expressions and hand gestures as long as their head is not moving very much.
No obstructions to the nose, lips, or jaw. Even a single rogue frame can produce poor results.