One channel, multiple content types. Switch between a calm tutorial, an upbeat launch video, and a punchy Short with a single selection — no settings to rebuild.
Feature page
Voice styles that match your brand, format, and audience.
Voice styles are reusable narration presets that define how your AI voiceover sounds — the tone, pacing, energy, and delivery pattern — across every video you produce. Define once. Apply everywhere.
Problem
Why voice styles matter
You sound different when you teach a colleague than when you pitch an investor. Your videos should too.
Most AI voice tools give you a voice selector and a speed slider. That is it. Every video sounds the same — or you manually reconfigure settings every single time. At scale, that creates two problems:
Without locked presets, team members pick different voices, speeds, and tones. Your tutorial sounds like a sales pitch. Your product demo sounds like a bedtime story.
Copying voice settings across 15 pipeline runs per week is busywork. Forget one field, and a video goes out with the wrong energy.
Voice styles solve both. Define the style once. Apply it forever. Change it in one place when your brand evolves.
Anatomy
What is a voice style in Outbox?
A voice style is a saved configuration bundle that controls how your narration renders. Each style combines six settings into a reusable preset:
| Setting | What it controls | Example |
|---|---|---|
| Voice ID | Which voice profile renders the audio | echo, onyx, coral |
| Voice speed | Playback rate from 0.25x to 4.0x | 1.0x for tutorials, 1.15x for Shorts |
| Narrator brief | Plain-language tone and delivery description | Calm, technical, unhurried. Like pair-programming. |
| Style hint | Audience-facing tone context | Professional but approachable |
| Audience hint | Who is watching | SaaS founders, 30-45, technical |
| Environment hint | Content type context | Screen recording tutorial |
Instead of setting these six fields every time you start a pipeline run, you select a style. One click. Done.
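As a rough sketch, the six settings could be modelled as one immutable record. The class name `VoiceStyle`, the field names, the extra `name` label, and the validation are illustrative assumptions, not Outbox's actual schema. Only the six settings, the sample values, and the 0.25x to 4.0x speed range come from the table above.

```python
from dataclasses import dataclass

# Speed bounds taken from the settings table.
MIN_SPEED = 0.25
MAX_SPEED = 4.0


@dataclass(frozen=True)
class VoiceStyle:
    """Hypothetical container for the six voice style settings."""

    name: str                   # assumed label for the style picker, not one of the six settings
    voice_id: str               # e.g. "echo", "onyx", "coral"
    speed: float = 1.0          # playback rate multiplier
    narrator_brief: str = ""    # plain-language tone and delivery description
    style_hint: str = ""        # audience-facing tone context
    audience_hint: str = ""     # who is watching
    environment_hint: str = ""  # content type context

    def __post_init__(self) -> None:
        # Reject speeds outside the documented range.
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {self.speed}"
            )


# Example values copied from the "Example" column of the table.
tutorial_calm = VoiceStyle(
    name="tutorial-calm",  # made-up slug
    voice_id="echo",
    speed=1.0,
    narrator_brief="Calm, technical, unhurried. Like pair-programming.",
    style_hint="Professional but approachable",
    audience_hint="SaaS founders, 30-45, technical",
    environment_hint="Screen recording tutorial",
)
```

Freezing the record mirrors the idea of a locked preset: a run can read the style but cannot change it in passing.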
Pipeline
How voice styles work in the pipeline
1. Pick a saved preset or let your workspace default apply.
2. The voiceover stage activates after script approval.
3. All six fields from your style are applied to the voice rendering engine.
4. The rendered track passes automatically to alignment and auto captions.
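The steps above can be sketched as a single function that gates on script approval and then folds the style into a render request. The function name, the dictionary keys, and the way the brief and hints are joined into one instruction string are all assumptions made for illustration. The source does not describe how Outbox combines these fields internally.

```python
def build_voiceover_request(script: str, script_approved: bool, style: dict) -> dict:
    """Assemble a hypothetical render request from an approved script and a style."""
    if not script_approved:
        # The voiceover stage only activates after script approval.
        raise RuntimeError("script must be approved before the voiceover stage runs")

    # Fold the narrator brief and the three hints into one instruction string.
    # This layout is an assumption, not a documented format.
    labelled = [
        ("", style.get("narrator_brief", "")),
        ("Style: ", style.get("style_hint", "")),
        ("Audience: ", style.get("audience_hint", "")),
        ("Environment: ", style.get("environment_hint", "")),
    ]
    instructions = "\n".join(prefix + text for prefix, text in labelled if text)

    return {
        "input": script,
        "voice": style["voice_id"],
        "speed": style["speed"],
        "instructions": instructions,
    }


# Sample style using the example values from the settings table.
style = {
    "voice_id": "echo",
    "speed": 1.0,
    "narrator_brief": "Calm, technical, unhurried. Like pair-programming.",
    "style_hint": "Professional but approachable",
    "audience_hint": "SaaS founders, 30-45, technical",
    "environment_hint": "Screen recording tutorial",
}
request = build_voiceover_request("Open the dashboard and pick a style.", True, style)
```

The returned dictionary stands in for whatever the rendering engine actually receives. Downstream stages such as alignment and captions would consume the rendered audio, not this request.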
Templates
Built-in style templates
Outbox ships with eight starter templates covering the most common video formats. Use them as-is or duplicate and customize.
| Style | Voice | Speed | Character | Best for |
|---|---|---|---|---|
| Tutorial — Calm | echo | 1.0x | Clear, unhurried, technical | Dev walkthroughs, how-to videos |
| Explainer — Warm | ash | 1.05x | Warm, steady, conversational | Feature overviews, product education |
| Commentary — Authoritative | onyx | 1.05x | Deep, measured, confident | Thought leadership, industry analysis |
| Product Launch — Upbeat | coral | 1.1x | Bright, energetic, articulate | Feature releases, launch announcements |
| Shorts — Punchy | nova | 1.15x | Fast, energetic, direct | YouTube Shorts, TikTok clips, Reels |
| Course — Educational | sage | 0.95x | Calm, knowledgeable, patient | Online courses, educational series |
| Story — Cinematic | ballad | 1.0x | Smooth, measured, dramatic | Case studies, brand storytelling |
| Demo — Founder | verse | 1.05x | Refined, premium, confident | Product demos, investor updates |
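"Duplicate and customize" could look something like the sketch below: copy a starter template, override one or two fields, and leave the original untouched. The `Template` class, the slug keys, and the `duplicate` helper are hypothetical. The voices, speeds, and character descriptions are taken from three rows of the table above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Template:
    """Hypothetical record for one starter template row."""

    voice: str
    speed: float
    character: str


# Three of the eight starter templates, keyed by made-up slugs.
TEMPLATES = {
    "tutorial-calm": Template("echo", 1.0, "Clear, unhurried, technical"),
    "shorts-punchy": Template("nova", 1.15, "Fast, energetic, direct"),
    "course-educational": Template("sage", 0.95, "Calm, knowledgeable, patient"),
}


def duplicate(slug: str, **overrides) -> Template:
    """Return a copy of a starter template with selected fields overridden."""
    return replace(TEMPLATES[slug], **overrides)


# A slower variant of the calm tutorial style for a patient walkthrough.
client_tutorial = duplicate("tutorial-calm", speed=0.95)
```

Because the templates are frozen, `duplicate` always produces a new object, so a customized copy never alters the built-in version.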
Custom
Creating a custom voice style
1. Name the style. Use something descriptive: 'Client X — Tutorial' or 'Main Channel — Commentary.'
2. Pick a voice. Choose from 11 voice profiles powered by OpenAI's TTS engine.
3. Set the speed. 0.95x for patient tutorials. 1.15x for Shorts. Dial it in.
4. Write the narrator brief. Describe delivery in plain language. Be specific, not just 'professional.'
5. Add context hints. Style hint, audience hint, and environment hint shape the output.
6. Save. Your style is available across all future pipeline runs.
| Format | Narrator brief |
|---|---|
| SaaS product demo | Confident, measured pace. Short pauses between features. Sounds like a founder walking through their own product — proud but not pushy. |
| Developer tutorial | Calm, clear, and technical. No hype. Explain like pair-programming with a colleague who's two levels junior. |
| Faceless explainer | Warm but authoritative. Slightly faster than conversational. Think documentary narrator for a tech audience. |
| YouTube Shorts | High energy, direct, zero filler. Get to the point in the first two seconds. Think news anchor doing a headline. |
| E-commerce walkthrough | Friendly, upbeat, concise. Highlight benefits without overselling. Natural energy — not a car commercial. |
Formats
One channel. Multiple styles.
Running a single YouTube channel does not mean every video has to sound the same. Different content formats need different delivery:
| Content type | Voice style | Why |
|---|---|---|
| Weekly tutorial | Tutorial — Calm | Viewers are learning. Slow, clear delivery helps retention. |
| Feature release | Product Launch — Upbeat | It's a launch. The energy should match the moment. |
| Industry commentary | Commentary — Authoritative | Credibility matters. Measured delivery signals expertise. |
| YouTube Short | Shorts — Punchy | 60 seconds. No room for warm-up. Fast and direct. |
| Customer case study | Story — Cinematic | You're telling a story. Pacing should be smooth and intentional. |
Create all five styles once. Select the right one per run. Your channel has a consistent identity across formats — without adjusting settings every time.
Scale
Multi-channel voice management
Agencies and operators running multiple channels face a harder version of the same problem: every channel needs its own voice identity.
Without styles, you are copying narrator briefs between projects and hoping the new hire remembers which voice "Client X" uses. With styles, create named presets per client — voice, speed, brief, and context hints are all locked in. Your team selects the right style per run. No configuration drift. No cross-client contamination.
When voice styles are combined with Team Workspaces, workspace admins can enforce which styles are available and lock brand-level base instructions that apply across every style.
Consistency
Voice styles and brand consistency
Voice styles work within Outbox's three-tier instruction system to ensure brand consistency at scale:
| Layer | Controls | Who sets it |
|---|---|---|
| Provider-safe instructions | Baseline audio quality — clean pacing, crisp pronunciation | Outbox (always active) |
| Base instructions | Brand-wide voice rules — pronunciation, forbidden phrases, quality floor | Workspace admin |
| Voice style | Per-format or per-channel tone, energy, and delivery | You (selected per run) |
Your admin can lock rules like "never use filler phrases" or enforce product name pronunciation — and those rules apply inside every style, automatically. Individual creators control the creative direction per video. The brand guardrails stay intact.
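One way to picture the three layers is as text stacked from broadest to most specific. The sketch below is an assumption about how such layering might work. The constant `PROVIDER_SAFE`, its wording, and the `compose_instructions` helper are invented for illustration. Only the three layer names and who sets each one come from the table above.

```python
from typing import List

# Placeholder wording for the always-on layer; the real text is not given in the source.
PROVIDER_SAFE = "Keep pacing clean and pronunciation crisp."


def compose_instructions(base_instructions: List[str], narrator_brief: str) -> str:
    """Stack the three layers, broadest first, skipping empty entries."""
    layers = [PROVIDER_SAFE, *base_instructions, narrator_brief]
    return "\n".join(layer.strip() for layer in layers if layer.strip())


# Admin-level rule from the example above, plus a per-style narrator brief.
admin_rules = ["Never use filler phrases."]
prompt = compose_instructions(
    admin_rules, "Calm, technical, unhurried. Like pair-programming."
)
```

Swapping in a different narrator brief changes only the last line of `prompt`, which is the point of the layering: the guardrails stay fixed while the creative direction varies per style.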
Comparison
Voice styles vs. manual configuration
| Dimension | Manual voice config | Voice styles |
|---|---|---|
| Setup per run | Set 6 fields manually every time | Select a style from the dropdown |
| Consistency | Depends on who remembers | Locked into the style preset |
| Team handoff | Check Notion for voice settings | Use the Tutorial — Calm style |
| Brand updates | Update every future run manually | Edit the style once; all future runs inherit the change |
| Multi-channel | Copy-paste configs between channels | Named styles per channel |
| Onboarding | Long training on voice settings | Pick a style from the list |
Examples
Real-world style configurations
| Format | Narrator brief |
|---|---|
| Developer tutorial | Clear, technical, unhurried. Explain like pairing with a mid-level engineer. Pause briefly before key concepts. No excitement, just calm competence. |
| Founder product demo | Warm, premium delivery. Proud of the product but not pushy. Short pauses between feature demonstrations. Think Apple keynote meets indie dev. |
| Industry commentary | Authoritative and measured. Think Bloomberg anchor meets podcast host. No excitement, just confident clarity. Emphasize data points and conclusions. |
| YouTube Shorts | High energy, direct, zero filler. Get to the point in the first second. Punchy sentences. Think news anchor doing a rapid-fire headline segment. |
Audience
Who uses voice styles?
Agencies: Create named, locked presets for every client. Your team applies the right style per run. No configuration drift. No cross-client voice contamination.
Multi-channel operators: Running 3 to 5 channels means 15+ pipeline runs per week. Without styles, that is 90+ manual field entries (15 runs × 6 fields). With styles, it is a dropdown.
Course creators: Building a 40-lesson course? Lock a style and apply it to every run. Update pacing for the entire series by editing the style once.
Connected
How voice styles connect to the stack
Voiceover: Every voice style maps to voiceover settings and controls how the narration renders.
Auto captions: Captions are generated from voiced audio, so style-driven pacing directly affects caption timing.
Team Workspaces: Admins manage approved styles. Base instructions apply across all styles for brand enforcement.
FAQ
Common questions about voice styles
How many voice styles can I create?
No hard limit. Create as many styles as your workflow needs — one per channel, one per content format, one per client.
Can I change a style after creating it?
Yes. Edit any style at any time. Changes apply to all future pipeline runs that use that style. Previously rendered videos stay unchanged.
What happens if I run without selecting a style?
Outbox uses your workspace default voice configuration. If no default is set, standard settings apply (alloy, 1.0x speed, no narrator brief). Set a workspace-level default so every run starts with your preferred config.
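The fallback order described here can be sketched as a short resolver. The function name and the dictionary shape are hypothetical. The order (selected style, then workspace default, then standard settings) and the standard values come from the answer above.

```python
from typing import Optional

# Standard settings named in the answer above.
STANDARD_SETTINGS = {"voice_id": "alloy", "speed": 1.0, "narrator_brief": ""}


def resolve_style(selected: Optional[dict], workspace_default: Optional[dict]) -> dict:
    """Pick the selected style, else the workspace default, else standard settings."""
    if selected is not None:
        return selected
    if workspace_default is not None:
        return workspace_default
    # Return a copy so callers cannot mutate the shared standard settings.
    return dict(STANDARD_SETTINGS)


# No style selected and no workspace default: standard settings apply.
fallback = resolve_style(None, None)

# No style selected, but the workspace has a default.
workspace_default = {"voice_id": "sage", "speed": 0.95, "narrator_brief": "Calm, patient."}
from_default = resolve_style(None, workspace_default)
```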
Can workspace admins restrict which styles are available?
Yes. With Team Workspaces, admins control which styles are available and lock base instructions that apply across all styles. Individual creators choose from the approved list.
Do voice styles affect caption generation?
Indirectly. Styles control narration pacing and speed. The auto captions stage generates timed subtitles from that audio, so a faster style compresses caption timing while a slower style gives each caption more time on screen.
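To get a feel for the effect, the sketch below rescales caption cues under the simplifying assumption that speed acts as a uniform playback-rate multiplier. That is not how the captions stage works (it derives timing from the rendered audio), and the `scale_cues` helper and sample cues are invented for illustration.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, caption text)


def scale_cues(cues: List[Cue], speed: float) -> List[Cue]:
    """Rescale cues timed at 1.0x to another speed, assuming uniform scaling."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [
        (round(start / speed, 3), round(end / speed, 3), text)
        for start, end, text in cues
    ]


# Made-up cues for a narration rendered at 1.0x.
cues_at_1x = [
    (0.0, 2.3, "Open the dashboard."),
    (2.3, 4.6, "Select a voice style."),
]

shorts_cues = scale_cues(cues_at_1x, 1.15)  # each cue now lasts 2.0 s instead of 2.3 s
course_cues = scale_cues(cues_at_1x, 0.95)  # each cue lasts slightly longer than 2.3 s
```

At 1.15x the same words get about 13% less screen time, which is why punchy styles pair best with short caption lines.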
Do styles work with future voice providers?
The voice style system is provider-agnostic. When additional providers like ElevenLabs ship, existing styles transfer — voice ID mapping is handled at the rendering layer.
Get started
Define the voice once. Apply it everywhere.
Start with a built-in template. Customize the narrator brief and speed to match your brand. Every future video inherits your voice identity automatically — across channels, formats, and team members.