Feature page

Voice styles that match your brand, format, and audience.

Voice styles are reusable narration presets that define how your AI voiceover sounds — the tone, pacing, energy, and delivery pattern — across every video you produce. Define once. Apply everywhere.

TL;DR: Voice styles give your videos a consistent sonic identity. Define a style once — voice, speed, narrator brief, audience context — and apply it to every run. Switch styles per channel or format. No more copying settings between projects or hoping the tone stays consistent across 20 videos.

8 built-in templates · Unlimited custom styles · Per-channel presets · Brand-locked delivery

Voice styles

One dropdown. Consistent delivery.

4 saved

Tutorial — Calm

echo · 1.0x

Product Launch — Upbeat

coral · 1.1x

Commentary — Authoritative

onyx · 1.05x

Shorts — Punchy

nova · 1.15x

Problem

Why voice styles matter

You sound different when you teach a colleague than when you pitch an investor. Your videos should too.

Most AI voice tools give you a voice selector and a speed slider. That is it. Every video sounds the same — or you manually reconfigure settings every single time. At scale, that creates two problems:

Brand drift

Without locked presets, team members pick different voices, speeds, and tones. Your tutorial sounds like a sales pitch. Your product demo sounds like a bedtime story.

Configuration fatigue

Copying voice settings across 15 pipeline runs per week is busywork. Forget one field, and a video goes out with the wrong energy.

Voice styles solve both. Define the style once. Apply it forever. Change it in one place when your brand evolves.

Anatomy

What is a voice style in Outbox?

A voice style is a saved configuration bundle that controls how your narration renders. Each style combines six settings into a reusable preset:

Setting	What it controls	Example
Voice ID	Which voice profile renders the audio	echo, onyx, coral
Voice speed	Playback rate from 0.25x to 4.0x	1.0 for tutorials, 1.15 for Shorts
Narrator brief	Plain-language tone and delivery description	Calm, technical, unhurried. Like pair-programming.
Style hint	Audience-facing tone context	Professional but approachable
Audience hint	Who is watching	SaaS founders, 30-45, technical
Environment hint	Content type context	Screen recording tutorial

Instead of setting these six fields every time you start a pipeline run, you select a style. One click. Done.
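
As a rough illustration, the six settings above could be bundled into one record. This is a minimal sketch, not Outbox's actual data model: the class name `VoiceStyle`, its field names, and the `tutorial-calm` slug are assumptions made for the example. Only the six settings, the example values, and the 0.25x to 4.0x speed range come from the table.

```python
from dataclasses import dataclass

# Speed range as stated in the settings table.
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass(frozen=True)
class VoiceStyle:
    """Hypothetical bundle of the six voice style settings, plus a name."""
    name: str
    voice_id: str
    speed: float = 1.0
    narrator_brief: str = ""
    style_hint: str = ""
    audience_hint: str = ""
    environment_hint: str = ""

    def __post_init__(self) -> None:
        # Reject speeds outside the documented range.
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed {self.speed} is outside {MIN_SPEED} to {MAX_SPEED}")


# Values taken from the Example column of the table.
tutorial_calm = VoiceStyle(
    name="tutorial-calm",
    voice_id="echo",
    speed=1.0,
    narrator_brief="Calm, technical, unhurried. Like pair-programming.",
    style_hint="Professional but approachable",
    audience_hint="SaaS founders, 30-45, technical",
    environment_hint="Screen recording tutorial",
)
```

Freezing the record mirrors the "define once" idea: a run reads the style but cannot change it by accident.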

Pipeline

How voice styles work in the pipeline

Select a voice style

Pick a saved preset or let your workspace default apply.

Pipeline reaches voiceover

The voiceover stage activates after script approval.

Style settings injected

All six fields from your style are applied to the voice rendering engine.

Audio flows downstream

The rendered track passes to alignment and auto captions automatically.

Pipeline flow

Analyze -> Script -> Voiceover (style applied) -> Align -> Captions -> Edit -> Render -> Metadata -> Publish
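
The flow above can be sketched as an ordered list of stages in which only the voiceover stage reads the style. The stage names come from the flow line. The function `run_pipeline` and the dictionary keys are hypothetical and stand in for whatever Outbox does internally.

```python
# Stage order copied from the pipeline flow; everything else is illustrative.
STAGES = [
    "analyze", "script", "voiceover", "align",
    "captions", "edit", "render", "metadata", "publish",
]


def run_pipeline(style: dict) -> list[str]:
    """Walk the stages in order and record where the style is consumed."""
    trace = []
    for stage in STAGES:
        if stage == "voiceover":
            # Only this stage reads the style; later stages work from its audio.
            trace.append(f"voiceover(voice={style['voice_id']}, speed={style['speed']})")
        else:
            trace.append(stage)
    return trace


trace = run_pipeline({"voice_id": "echo", "speed": 1.0})
```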

Templates

Built-in style templates

Outbox ships with eight starter templates covering the most common video formats. Use them as-is or duplicate and customize.

Style	Voice	Speed	Character	Best for
Tutorial — Calm	echo	1.0x	Clear, unhurried, technical	Dev walkthroughs, how-to videos
Explainer — Warm	ash	1.05x	Warm, steady, conversational	Feature overviews, product education
Commentary — Authoritative	onyx	1.05x	Deep, measured, confident	Thought leadership, industry analysis
Product Launch — Upbeat	coral	1.1x	Bright, energetic, articulate	Feature releases, launch announcements
Shorts — Punchy	nova	1.15x	Fast, energetic, direct	YouTube Shorts, TikTok clips, Reels
Course — Educational	sage	0.95x	Calm, knowledgeable, patient	Online courses, educational series
Story — Cinematic	ballad	1.0x	Smooth, measured, dramatic	Case studies, brand storytelling
Demo — Founder	verse	1.05x	Refined, premium, confident	Product demos, investor updates
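
"Duplicate and customize" might look like the following sketch, which copies a template and overrides one field while leaving the original untouched. The `Template` class, the slug keys, and the `BUILT_IN` dictionary are assumptions for illustration. The voice, speed, and character values are taken from two rows of the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Template:
    """Hypothetical shape of a starter template row."""
    voice_id: str
    speed: float
    character: str


# Two rows from the table, keyed by made-up slugs.
BUILT_IN = {
    "shorts-punchy": Template("nova", 1.15, "Fast, energetic, direct"),
    "course-educational": Template("sage", 0.95, "Calm, knowledgeable, patient"),
}

# Duplicate the course template and speed it up slightly for one client.
client_course = replace(BUILT_IN["course-educational"], speed=1.0)
```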

Custom

Creating a custom voice style

Name your style

Something descriptive: 'Client X — Tutorial' or 'Main Channel — Commentary.'

Pick a voice

Choose from 11 voice profiles powered by OpenAI's TTS engine.

Set the speed

0.95x for patient tutorials. 1.15x for Shorts. Dial it in.

Write a narrator brief

Describe delivery in plain language. Be specific — not just 'professional.'

Add context hints

Style hint, audience hint, and environment hint shape the output.

Save and apply

Your style is available across all future pipeline runs.

Writing effective narrator briefs

Format	Narrator brief
SaaS product demo	Confident, measured pace. Short pauses between features. Sounds like a founder walking through their own product — proud but not pushy.
Developer tutorial	Calm, clear, and technical. No hype. Explain like pair-programming with a colleague who's two levels junior.
Faceless explainer	Warm but authoritative. Slightly faster than conversational. Think documentary narrator for a tech audience.
YouTube Shorts	High energy, direct, zero filler. Get to the point in the first two seconds. Think news anchor doing a headline.
E-commerce walkthrough	Friendly, upbeat, concise. Highlight benefits without overselling. Natural energy — not a car commercial.

Formats

One channel. Multiple styles.

Running a single YouTube channel does not mean every video sounds the same. Different content formats need different delivery:

Content type	Voice style	Why
Weekly tutorial	Tutorial — Calm	Viewers are learning. Slow, clear delivery helps retention.
Feature release	Product Launch — Upbeat	It's a launch. The energy should match the moment.
Industry commentary	Commentary — Authoritative	Credibility matters. Measured delivery signals expertise.
YouTube Short	Shorts — Punchy	60 seconds. No room for warm-up. Fast and direct.
Customer case study	Story — Cinematic	You're telling a story. Pacing should be smooth and intentional.

Create all five styles once. Select the right one per run. Your channel has a consistent identity across formats — without adjusting settings every time.

Scale

Multi-channel voice management

Agencies and operators running multiple channels face a harder version of the same problem: every channel needs its own voice identity.

Without styles, you are copying narrator briefs between projects and hoping the new hire remembers which voice "Client X" uses. With styles, you create named presets per client, with voice, speed, brief, and context hints all locked in. Your team selects the right style per run. No configuration drift. No cross-client contamination.

Combined with Team Workspaces, workspace admins can enforce which styles are available and lock brand-level base instructions that apply across every style.

Multi-channel management

Named styles per client. Zero cross-contamination.

3 clients

Acme Corp

Product Demos · Tutorials

DevStream

Commentary · Shorts

LearnFast

Course Lectures · Promos

Consistency

Voice styles and brand consistency

Voice styles work within Outbox's three-tier instruction system to ensure brand consistency at scale:

Layer	Controls	Who sets it
Provider-safe instructions	Baseline audio quality — clean pacing, crisp pronunciation	Outbox (always active)
Base instructions	Brand-wide voice rules — pronunciation, forbidden phrases, quality floor	Workspace admin
Voice style	Per-format or per-channel tone, energy, and delivery	You (selected per run)

Your admin can lock rules like "never use filler phrases" or enforce product name pronunciation — and those rules apply inside every style, automatically. Individual creators control the creative direction per video. The brand guardrails stay intact.
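
One way to picture the three layers is as text blocks joined in a fixed order, so the admin's base rules are always present regardless of which style is selected. This is a sketch under assumptions: the function name, the join order, and the newline separator are invented here, and the page does not describe how Outbox actually merges the layers.

```python
def compose_instructions(provider_safe: str, base: str, style_brief: str) -> str:
    """Join the three instruction layers, skipping any that are empty."""
    layers = [provider_safe, base, style_brief]
    return "\n".join(layer.strip() for layer in layers if layer.strip())


instructions = compose_instructions(
    provider_safe="Clean pacing, crisp pronunciation.",   # set by Outbox
    base="Never use filler phrases.",                      # set by the workspace admin
    style_brief="Calm, technical, unhurried.",             # from the selected style
)
```

Switching styles changes only the last layer; the first two stay the same for every run.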

Comparison

Voice styles vs. manual configuration

Dimension	Manual voice config	Voice styles
Setup per run	Set 6 fields manually every time	Select a style from the dropdown
Consistency	Depends on who remembers	Locked into the style preset
Team handoff	Check Notion for voice settings	Use the Tutorial — Calm style
Brand updates	Update every future run manually	Edit the style once; all future runs inherit
Multi-channel	Copy-paste configs between channels	Named styles per channel
Onboarding	Long training on voice settings	Pick a style from the list

Examples

Real-world style configurations

Main Channel — Dev Tutorial

echo · 1.0x

Narrator brief

Clear, technical, unhurried. Explain like pairing with a mid-level engineer. Pause briefly before key concepts. No excitement — just calm competence.

Style

Technical but approachable

Audience

Developers learning new tools, 25-40

Environment

Screen recording tutorial with code

Main Channel — Product Release

coral · 1.1x

Narrator brief

Warm, premium delivery. Proud of the product but not pushy. Short pauses between feature demonstrations. Think Apple keynote meets indie dev.

Style

Professional, energetic

Audience

SaaS buyers, technical decision-makers, 30-45

Environment

Product demo walkthrough

Client — Finance Commentary

onyx · 1.05x

Narrator brief

Authoritative and measured. Think Bloomberg anchor meets podcast host. No excitement — just confident clarity. Emphasize data points and conclusions.

Style

Premium, institutional

Audience

Retail investors, 30-55, financially literate

Environment

Commentary with charts and data overlays

Social Clips — YouTube Shorts

nova · 1.15x

Narrator brief

High energy, direct, zero filler. Get to the point in the first second. Punchy sentences. Think news anchor doing a rapid-fire headline segment.

Style

Fast, engaging

Audience

Mobile-first scrollers, 18-35

Environment

60-second vertical video

Audience

Who uses voice styles?

Solo creators with multiple formats

One channel, multiple content types. Switch between a calm tutorial, an upbeat launch video, and a punchy Short with a single selection — no settings to rebuild.

Agencies managing client channels

Create named, locked presets for every client. Your team applies the right style per run. No configuration drift. No cross-client voice contamination.

Faceless channel operators

Running 3-5 channels means 15+ pipeline runs per week. Without styles, that is 90+ manual field entries (15 runs × 6 settings). With styles, it is a dropdown.

Course creators

Building a 40-lesson course? Lock a style and apply it to every run. Update pacing for the entire series by editing the style once.

Connected

How voice styles connect to the stack

Related feature

AI Voiceover

Voice styles configure how voiceover renders. Every style maps to voiceover settings.

Related feature

Auto Captions

Captions are generated from voiced audio. Style-driven pacing directly affects caption timing.

Related feature

Team Workspaces

Admins manage approved styles. Base instructions apply across all styles for brand enforcement.

FAQ

Common questions about voice styles

How many voice styles can I create?

No hard limit. Create as many styles as your workflow needs — one per channel, one per content format, one per client.

Can I change a style after creating it?

Yes. Edit any style at any time. Changes apply to all future pipeline runs that use that style. Previously rendered videos stay unchanged.

What happens if I run without selecting a style?

Outbox uses your workspace default voice configuration. If no default is set, standard settings apply (alloy, 1.0x speed, no narrator brief). Set a workspace-level default so every run starts with your preferred config.
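
The fallback order described in this answer can be sketched as a small resolver: selected style first, then the workspace default, then the standard settings. The function name and dictionary keys are hypothetical. The standard values (alloy, 1.0x, no narrator brief) are the ones stated above.

```python
from typing import Optional

# Standard settings named in the answer above.
STANDARD = {"voice_id": "alloy", "speed": 1.0, "narrator_brief": ""}


def resolve_style(selected: Optional[dict], workspace_default: Optional[dict]) -> dict:
    """Pick the configuration for a run, in the documented fallback order."""
    if selected is not None:
        return selected
    if workspace_default is not None:
        return workspace_default
    return dict(STANDARD)  # copy so callers cannot mutate the shared default


config = resolve_style(selected=None, workspace_default=None)
```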

Can workspace admins restrict which styles are available?

Yes. With Team Workspaces, admins control which styles are available and lock base instructions that apply across all styles. Individual creators choose from the approved list.

Do voice styles affect caption generation?

Indirectly. Styles control narration pacing and speed, and the auto captions stage generates timed subtitles from that audio. A faster style produces more compressed captions; a slower style produces more readable ones.
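
To get a feel for the effect, here is a back-of-the-envelope estimate of narration length at different speeds. It assumes duration scales inversely with the speed setting and uses a baseline of 150 words per minute at 1.0x. Both are simplifying assumptions made for this example, not measured Outbox behavior.

```python
def narration_seconds(word_count: int, speed: float, base_wpm: float = 150.0) -> float:
    """Estimate narration length, assuming speed scales the word rate linearly."""
    return word_count / (base_wpm * speed) * 60.0


# The same 150-word script at three speeds used by the built-in templates.
at_course_speed = narration_seconds(150, 0.95)   # roughly 63 seconds
at_normal_speed = narration_seconds(150, 1.0)    # 60 seconds
at_shorts_speed = narration_seconds(150, 1.15)   # roughly 52 seconds
```

Under these assumptions the captions for the same script have about 11 fewer seconds on screen at 1.15x than at 0.95x.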

Do styles work with future voice providers?

The voice style system is provider-agnostic. When additional providers like ElevenLabs ship, existing styles transfer — voice ID mapping is handled at the rendering layer.
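
A minimal sketch of what "voice ID mapping at the rendering layer" could mean: the style stores one voice ID, and a lookup table translates it per provider at render time. The table, the provider key `other-provider`, and the placeholder voice names are entirely hypothetical. No real provider IDs are shown, and the page does not describe the actual mapping.

```python
# Hypothetical lookup: style voice ID -> provider-specific voice name.
VOICE_MAP = {
    "openai": {"echo": "echo", "onyx": "onyx"},
    "other-provider": {"echo": "placeholder-voice-a", "onyx": "placeholder-voice-b"},
}


def provider_voice(provider: str, style_voice_id: str) -> str:
    """Translate a style's voice ID into the name a given provider expects."""
    try:
        return VOICE_MAP[provider][style_voice_id]
    except KeyError:
        raise ValueError(f"no mapping for {style_voice_id!r} on {provider!r}") from None


mapped = provider_voice("other-provider", "echo")
```

Because the translation happens at render time, the saved style itself would not need to change when a provider is added.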

Get started

Define the voice once. Apply it everywhere.

Start with a built-in template. Customize the narrator brief and speed to match your brand. Every future video inherits your voice identity automatically — across channels, formats, and team members.
