Voice styles that match your brand, format, and audience.

Voice styles are reusable narration presets that define how your AI voiceover sounds — the tone, pacing, energy, and delivery pattern — across every video you produce. Define once. Apply everywhere.

TL;DR: Voice styles give your videos a consistent sonic identity. Define a style once — voice, speed, narrator brief, audience context — and apply it to every run. Switch styles per channel or format. No more copying settings between projects or hoping the tone stays consistent across 20 videos.
8 built-in templates · Unlimited custom styles · Per-channel presets · Brand-locked delivery
Voice styles
One dropdown. Consistent delivery.
4 saved
Tutorial — Calm: echo · 1.0x
Product Launch — Upbeat: coral · 1.1x
Commentary — Authoritative: onyx · 1.05x
Shorts — Punchy: nova · 1.15x

Problem

Why voice styles matter

You sound different when you teach a colleague than when you pitch an investor. Your videos should too.

Most AI voice tools give you a voice selector and a speed slider. That is it. Every video sounds the same — or you manually reconfigure settings every single time. At scale, that creates two problems:

Brand drift

Without locked presets, team members pick different voices, speeds, and tones. Your tutorial sounds like a sales pitch. Your product demo sounds like a bedtime story.

Configuration fatigue

Copying voice settings across 15 pipeline runs per week is busywork. Forget one field, and a video goes out with the wrong energy.

Voice styles solve both. Define the style once. Apply it forever. Change it in one place when your brand evolves.

Anatomy

What is a voice style in Outbox?

A voice style is a saved configuration bundle that controls how your narration renders. Each style combines six settings into a reusable preset:

Setting | What it controls | Example
Voice ID | Which voice profile renders the audio | echo, onyx, coral
Voice speed | Playback rate from 0.25x to 4.0x | 1.0 for tutorials, 1.15 for Shorts
Narrator brief | Plain-language tone and delivery description | Calm, technical, unhurried. Like pair-programming.
Style hint | Audience-facing tone context | Professional but approachable
Audience hint | Who is watching | SaaS founders, 30-45, technical
Environment hint | Content type context | Screen recording tutorial

Instead of setting these six fields every time you start a pipeline run, you select a style. One click. Done.
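
As a rough illustration, the six settings could be modeled as a single record. This is a sketch only: the class name `VoiceStyle` and its field names are assumptions made for this example, not Outbox's actual schema. The values are taken from the Example column of the table above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VoiceStyle:
    """Hypothetical record for a saved voice style (field names are illustrative)."""
    voice_id: str
    voice_speed: float
    narrator_brief: str
    style_hint: str
    audience_hint: str
    environment_hint: str


example_style = VoiceStyle(
    voice_id="echo",
    voice_speed=1.0,
    narrator_brief="Calm, technical, unhurried. Like pair-programming.",
    style_hint="Professional but approachable",
    audience_hint="SaaS founders, 30-45, technical",
    environment_hint="Screen recording tutorial",
)

# One object carries all six settings, so a run only needs to reference the style.
print(len(asdict(example_style)))
```

Freezing the record is a design choice for the sketch: a style that cannot be mutated in place is harder to change by accident mid-run.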

Pipeline

How voice styles work in the pipeline

01
Select a voice style

Pick a saved preset or let your workspace default apply.

02
Pipeline reaches voiceover

The voiceover stage activates after script approval.

03
Style settings injected

All six fields from your style are applied to the voice rendering engine.

04
Audio flows downstream

The rendered track passes straight to alignment and auto captions, with no manual handoff.

Pipeline flow
Analyze -> Script -> Voiceover (style applied) -> Align -> Captions -> Edit -> Render -> Metadata -> Publish
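
The four steps can be pictured as a loop over the stages in which only the voiceover stage reads the style. The stage names follow the flow above; the function `run_pipeline`, the field names, and the returned trace are hypothetical and stand in for real stage logic.

```python
STAGES = ["analyze", "script", "voiceover", "align", "captions",
          "edit", "render", "metadata", "publish"]

STYLE_FIELDS = ("voice_id", "voice_speed", "narrator_brief",
                "style_hint", "audience_hint", "environment_hint")


def run_pipeline(style):
    """Return a (stage, settings) trace; only the voiceover stage receives style settings."""
    trace = []
    for stage in STAGES:
        if stage == "voiceover":
            # Step 03: copy all six fields from the style into the voiceover settings.
            settings = {field: style[field] for field in STYLE_FIELDS}
        else:
            settings = None
        trace.append((stage, settings))
    return trace


shorts_punchy = {
    "voice_id": "nova",
    "voice_speed": 1.15,
    "narrator_brief": "High energy, direct, zero filler.",
    "style_hint": "Fast, engaging",
    "audience_hint": "Mobile-first scrollers, 18-35",
    "environment_hint": "60-second vertical video",
}

for stage, settings in run_pipeline(shorts_punchy):
    print(stage, "<- style applied" if settings else "")
```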

Templates

Built-in style templates

Outbox ships with eight starter templates covering the most common video formats. Use them as-is or duplicate and customize.

Style | Voice | Speed | Character | Best for
Tutorial — Calm | echo | 1.0x | Clear, unhurried, technical | Dev walkthroughs, how-to videos
Explainer — Warm | ash | 1.05x | Warm, steady, conversational | Feature overviews, product education
Commentary — Authoritative | onyx | 1.05x | Deep, measured, confident | Thought leadership, industry analysis
Product Launch — Upbeat | coral | 1.1x | Bright, energetic, articulate | Feature releases, launch announcements
Shorts — Punchy | nova | 1.15x | Fast, energetic, direct | YouTube Shorts, TikTok clips, Reels
Course — Educational | sage | 0.95x | Calm, knowledgeable, patient | Online courses, educational series
Story — Cinematic | ballad | 1.0x | Smooth, measured, dramatic | Case studies, brand storytelling
Demo — Founder | verse | 1.05x | Refined, premium, confident | Product demos, investor updates

Custom

Creating a custom voice style

01
Name your style

Something descriptive: 'Client X — Tutorial' or 'Main Channel — Commentary.'

02
Pick a voice

Choose from 11 voice profiles powered by OpenAI's TTS engine.

03
Set the speed

0.95x for patient tutorials. 1.15x for Shorts. Dial it in.

04
Write a narrator brief

Describe delivery in plain language. Be specific — not just 'professional.'

05
Add context hints

Style hint, audience hint, and environment hint shape the output.

06
Save and apply

Your style is available across all future pipeline runs.
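
A pre-save check for a custom style might look like the sketch below. Everything here is an assumption made for illustration: `validate_style` is a hypothetical helper, the voice names are the eleven built-in voices assumed from OpenAI's text-to-speech documentation, and the three-word minimum for a narrator brief is an arbitrary threshold standing in for "be specific."

```python
# Assumed list of the eleven voice profiles; verify against the provider's docs.
KNOWN_VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable",
                "nova", "onyx", "sage", "shimmer", "verse"}


def validate_style(name, voice_id, voice_speed, narrator_brief):
    """Return a list of problems; an empty list means the style looks savable."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    if voice_id not in KNOWN_VOICES:
        problems.append(f"unknown voice: {voice_id}")
    if not 0.25 <= voice_speed <= 4.0:
        problems.append(f"speed {voice_speed} is outside 0.25 to 4.0")
    # Arbitrary heuristic: a one-word brief such as "professional" is too vague.
    if len(narrator_brief.split()) < 3:
        problems.append("narrator brief is too short to describe delivery")
    return problems


print(validate_style("Client X — Tutorial", "echo", 0.95,
                     "Calm, clear, and technical. No hype."))
print(validate_style("Main Channel — Commentary", "onyx", 5.0, "professional"))
```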

Writing effective narrator briefs
Format | Narrator brief
SaaS product demo | Confident, measured pace. Short pauses between features. Sounds like a founder walking through their own product: proud but not pushy.
Developer tutorial | Calm, clear, and technical. No hype. Explain like pair-programming with a colleague who's two levels junior.
Faceless explainer | Warm but authoritative. Slightly faster than conversational. Think documentary narrator for a tech audience.
YouTube Shorts | High energy, direct, zero filler. Get to the point in the first two seconds. Think news anchor doing a headline.
E-commerce walkthrough | Friendly, upbeat, concise. Highlight benefits without overselling. Natural energy, not a car commercial.

Formats

One channel. Multiple styles.

Running a single YouTube channel does not mean every video sounds the same. Different content formats need different delivery:

Content type | Voice style | Why
Weekly tutorial | Tutorial — Calm | Viewers are learning. Slow, clear delivery helps retention.
Feature release | Product Launch — Upbeat | It's a launch. The energy should match the moment.
Industry commentary | Commentary — Authoritative | Credibility matters. Measured delivery signals expertise.
YouTube Short | Shorts — Punchy | 60 seconds. No room for warm-up. Fast and direct.
Customer case study | Story — Cinematic | You're telling a story. Pacing should be smooth and intentional.

Create all five styles once. Select the right one per run. Your channel has a consistent identity across formats — without adjusting settings every time.
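
Selecting a style per run amounts to a lookup from content type to style name. The sketch below mirrors the table above; the dictionary and the `style_for` helper are hypothetical, and returning `None` for an unlisted content type is an assumption meaning "use the workspace default."

```python
STYLE_BY_CONTENT_TYPE = {
    "weekly tutorial": "Tutorial — Calm",
    "feature release": "Product Launch — Upbeat",
    "industry commentary": "Commentary — Authoritative",
    "youtube short": "Shorts — Punchy",
    "customer case study": "Story — Cinematic",
}


def style_for(content_type):
    """Return the style name for a content type, or None if no style is mapped."""
    return STYLE_BY_CONTENT_TYPE.get(content_type.strip().lower())


print(style_for("YouTube Short"))
print(style_for("Podcast"))
```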

Scale

Multi-channel voice management

Agencies and operators running multiple channels face a harder version of the same problem: every channel needs its own voice identity.

Without styles, you are copying narrator briefs between projects and hoping the new hire remembers which voice "Client X" uses. With styles, create named presets per client — voice, speed, brief, and context hints are all locked in. Your team selects the right style per run. No configuration drift. No cross-client contamination.

Combined with Team Workspaces, workspace admins can enforce which styles are available and lock brand-level base instructions that apply across every style.

Multi-channel management
Named styles per client. Zero cross-contamination.
3 clients
Acme Corp: Product Demos · Tutorials
DevStream: Commentary · Shorts
LearnFast: Course Lectures · Promos

Consistency

Voice styles and brand consistency

Voice styles work within Outbox's three-tier instruction system to ensure brand consistency at scale:

Layer | Controls | Who sets it
Provider-safe instructions | Baseline audio quality: clean pacing, crisp pronunciation | Outbox (always active)
Base instructions | Brand-wide voice rules: pronunciation, forbidden phrases, quality floor | Workspace admin
Voice style | Per-format or per-channel tone, energy, and delivery | You (selected per run)

Your admin can lock rules like "never use filler phrases" or enforce product name pronunciation — and those rules apply inside every style, automatically. Individual creators control the creative direction per video. The brand guardrails stay intact.
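
One way to picture the three layers is as text stacked in a fixed order before it reaches the voice engine. This is a sketch under stated assumptions: the function `compose_instructions`, the placeholder wording of the provider-safe layer, and the ordering itself are illustrative and may differ from how Outbox actually combines the layers.

```python
# Placeholder wording for the always-on layer; the real text is not documented here.
PROVIDER_SAFE = "Keep pacing clean and pronunciation crisp."


def compose_instructions(base_instructions, narrator_brief):
    """Stack the layers: provider-safe first, then admin rules, then the style's brief."""
    layers = [PROVIDER_SAFE, *base_instructions, narrator_brief]
    # Skip empty layers, for example a style saved without a narrator brief.
    return "\n".join(layer for layer in layers if layer)


admin_rules = ["Never use filler phrases."]
brief = "Calm, clear, and technical. No hype."

print(compose_instructions(admin_rules, brief))
```

Because the admin rules are passed in separately from the style, swapping the style changes only the last layer and leaves the brand rules in place.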

Comparison

Voice styles vs. manual configuration

Dimension | Manual voice config | Voice styles
Setup per run | Set 6 fields manually every time | Select a style from the dropdown
Consistency | Depends on who remembers | Locked into the style preset
Team handoff | "Check Notion for voice settings" | "Use the Tutorial — Calm style"
Brand updates | Update every future run manually | Edit the style once; all future runs inherit
Multi-channel | Copy-paste configs between channels | Named styles per channel
Onboarding | Long training on voice settings | Pick a style from the list

Examples

Real-world style configurations

Main Channel — Dev Tutorial
echo · 1.0x
Narrator brief: Clear, technical, unhurried. Explain like pairing with a mid-level engineer. Pause briefly before key concepts. No excitement, just calm competence.
Style: Technical but approachable
Audience: Developers learning new tools, 25-40
Environment: Screen recording tutorial with code

Main Channel — Product Release
coral · 1.1x
Narrator brief: Warm, premium delivery. Proud of the product but not pushy. Short pauses between feature demonstrations. Think Apple keynote meets indie dev.
Style: Professional, energetic
Audience: SaaS buyers, technical decision-makers, 30-45
Environment: Product demo walkthrough

Client — Finance Commentary
onyx · 1.05x
Narrator brief: Authoritative and measured. Think Bloomberg anchor meets podcast host. No excitement, just confident clarity. Emphasize data points and conclusions.
Style: Premium, institutional
Audience: Retail investors, 30-55, financially literate
Environment: Commentary with charts and data overlays

Social Clips — YouTube Shorts
nova · 1.15x
Narrator brief: High energy, direct, zero filler. Get to the point in the first second. Punchy sentences. Think news anchor doing a rapid-fire headline segment.
Style: Fast, engaging
Audience: Mobile-first scrollers, 18-35
Environment: 60-second vertical video

Audience

Who uses voice styles?

Solo creators with multiple formats

One channel, multiple content types. Switch between a calm tutorial, an upbeat launch video, and a punchy Short with a single selection — no settings to rebuild.

Agencies managing client channels

Create named, locked presets for every client. Your team applies the right style per run. No configuration drift. No cross-client voice contamination.

Faceless channel operators

Running 3-5 channels means 15+ pipeline runs per week. Without styles, that is 90+ fields set by hand (15 runs × 6 fields each). With styles, it is a dropdown.

Course creators

Building a 40-lesson course? Lock a style and apply it to every run. Update pacing for the entire series by editing the style once.

Connected

How voice styles connect to the stack

Related feature: Voiceover

Voice styles configure how voiceover renders. Every style maps to voiceover settings.

Related feature: Auto captions

Captions are generated from voiced audio. Style-driven pacing directly affects caption timing.

Related feature: Team Workspaces

Admins manage approved styles. Base instructions apply across all styles for brand enforcement.

FAQ

Common questions about voice styles

How many voice styles can I create?

No hard limit. Create as many styles as your workflow needs — one per channel, one per content format, one per client.

Can I change a style after creating it?

Yes. Edit any style at any time. Changes apply to all future pipeline runs that use that style. Previously rendered videos stay unchanged.
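
The "future runs only" behavior can be sketched as each run taking its own copy of the style when it starts. This is an assumption about one way to get that behavior, not a description of Outbox's internals; `StyleLibrary` and its methods are hypothetical.

```python
import copy


class StyleLibrary:
    """Hypothetical in-memory store for named voice styles."""

    def __init__(self):
        self._styles = {}

    def save(self, name, settings):
        # Saving under an existing name edits the style in one place.
        self._styles[name] = dict(settings)

    def start_run(self, name):
        # Each run takes its own copy, so later edits cannot change it.
        return copy.deepcopy(self._styles[name])


library = StyleLibrary()
library.save("Course — Educational", {"voice_id": "sage", "voice_speed": 0.95})
earlier_run = library.start_run("Course — Educational")

library.save("Course — Educational", {"voice_id": "sage", "voice_speed": 1.0})
later_run = library.start_run("Course — Educational")

print(earlier_run["voice_speed"], later_run["voice_speed"])
```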

What happens if I run without selecting a style?

Outbox uses your workspace default voice configuration. If no default is set, standard settings apply (alloy, 1.0x speed, no narrator brief). Set a workspace-level default so every run starts with your preferred config.
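
The fallback order described above can be sketched as a three-step lookup. The function name `resolve_voice_config` and the dictionary keys are hypothetical; the standard values (alloy, 1.0x, no narrator brief) come from the answer above.

```python
STANDARD_SETTINGS = {"voice_id": "alloy", "voice_speed": 1.0, "narrator_brief": ""}


def resolve_voice_config(selected_style=None, workspace_default=None):
    """Pick the first available config: selected style, workspace default, then standard."""
    if selected_style is not None:
        return selected_style
    if workspace_default is not None:
        return workspace_default
    # Return a copy so callers cannot mutate the shared standard settings.
    return dict(STANDARD_SETTINGS)


workspace_default = {"voice_id": "echo", "voice_speed": 1.0,
                     "narrator_brief": "Calm, technical, unhurried."}

print(resolve_voice_config(workspace_default=workspace_default)["voice_id"])
print(resolve_voice_config()["voice_id"])
```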

Can workspace admins restrict which styles are available?

Yes. With Team Workspaces, admins control which styles are available and lock base instructions that apply across all styles. Individual creators choose from the approved list.

Do voice styles affect caption generation?

Indirectly. Styles control narration pacing and speed, and the auto captions stage generates timed subtitles from that audio. A faster style produces shorter, more compressed caption timings; a slower style leaves each caption on screen longer, which makes it easier to read.
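
As a back-of-the-envelope illustration, suppose the time a spoken line takes scales as 1 divided by the voice speed. That linear model is an assumption for this sketch (real rendered audio may not scale exactly this way), and `scaled_duration` is a hypothetical helper.

```python
def scaled_duration(base_seconds, voice_speed):
    """Estimate how long a line takes at a given speed, assuming duration scales as 1/speed."""
    if not 0.25 <= voice_speed <= 4.0:
        raise ValueError("voice_speed must be between 0.25 and 4.0")
    return base_seconds / voice_speed


# A line that takes 3.0 seconds at 1.0x, compared across three template speeds.
for label, speed in [("Course — Educational", 0.95),
                     ("Tutorial — Calm", 1.0),
                     ("Shorts — Punchy", 1.15)]:
    print(label, round(scaled_duration(3.0, speed), 2))
```

Under this assumption, the same line takes about 3.16 seconds at 0.95x and about 2.61 seconds at 1.15x, so its caption stays on screen for roughly half a second less in the faster style.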

Do styles work with future voice providers?

The voice style system is provider-agnostic. When additional providers like ElevenLabs ship, existing styles transfer — voice ID mapping is handled at the rendering layer.
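
A rendering-layer mapping could be as simple as a per-provider lookup table, so the style keeps one voice ID and the provider-specific ID is resolved at render time. This is a hypothetical sketch: the provider key `other-provider`, the `PLACEHOLDER_*` values, and `resolve_voice` are all invented for illustration and do not correspond to real provider voice IDs.

```python
# Hypothetical mapping; the PLACEHOLDER values are not real provider voice IDs.
VOICE_MAP = {
    "openai": {"echo": "echo", "onyx": "onyx"},
    "other-provider": {"echo": "PLACEHOLDER_VOICE_A", "onyx": "PLACEHOLDER_VOICE_B"},
}


def resolve_voice(provider, style_voice_id):
    """Translate a style's voice ID into the ID a given provider expects."""
    try:
        return VOICE_MAP[provider][style_voice_id]
    except KeyError:
        raise LookupError(
            f"no mapping for voice {style_voice_id!r} on provider {provider!r}"
        ) from None


print(resolve_voice("openai", "echo"))
print(resolve_voice("other-provider", "echo"))
```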

Get started

Define the voice once. Apply it everywhere.

Start with a built-in template. Customize the narrator brief and speed to match your brand. Every future video inherits your voice identity automatically — across channels, formats, and team members.