

Caption styles that match your brand — on every platform, every format, every video.

Three built-in presets control font, color, outline, shadow, and positioning. Pick one and every video in your workspace renders with the same branded look. No subtitle editor. No per-video styling.

TL;DR: Outbox ships three caption presets — Classic Bold, Minimal Clean, and Highlight Brand — that control every visual property of your burned-in captions. Presets adapt to aspect ratio automatically, respect platform safe areas, and lock per workspace for brand consistency across hundreds of videos.
3 presets · Responsive sizing · Platform-safe margins · Workspace lock · 2 timing modes · Effect-driven
Caption styles: pick a preset. Done.
  • classic_bold (Classic Bold): WELCOME TO OUTBOX
  • minimal_clean (Minimal Clean): Welcome to Outbox
  • highlight_brand (Highlight Brand): WELCOME TO OUTBOX

Problem

What makes caption styling hard at scale?

Styling captions is a separate job from generating captions. Even after you have timed subtitles, you still need to decide: What font? What size? Bold or regular? White text with a black outline or colored text with a glow?

Multiply that by three aspect ratios, five videos a week, and a team of three editors — and "just pick a font" becomes a brand consistency nightmare.

Outbox reduces this to one decision: pick a preset. The preset handles every visual detail, scales to any resolution, and stays locked across your workspace.

Questions you stop asking
  • What font and size for this video?
  • Does it look right on a phone in portrait?
  • Will TikTok's share button cover my text?
  • Does it match the captions from last week?
  • Did the freelancer use the right style guide?

Presets

Three styles. Zero manual work.

classic_bold
Classic Bold

Large white text. Strong black outline. Center-bottom placement.

Clear and reliable
Font: Arial
Font size: 36–40px (responsive)
Color: White (#FFFFFF)
Outline: 3.0 (heavy black outline for maximum contrast)
Shadow: 1.0 (subtle drop shadow)
Bold: Yes
Uppercase: No (configurable per run)
Best for: YouTube long-form, tutorials, product demos, explainer videos
minimal_clean
Minimal Clean

Smaller text. Subtle shadow. Lower-third positioning.

Quiet and professional
Font: Helvetica
Font size: 34–38px (responsive)
Color: White (#FFFFFF)
Outline: 1.6 (softer outline, semi-transparent black)
Shadow: 0.6 (barely-there shadow for depth)
Bold: No
Uppercase: No
Best for: SaaS demos, course videos, professional content, LinkedIn
highlight_brand
Highlight Brand

Bold text with accent color. Center placement. Uppercase forced.

Loud and scroll-stopping
Font: Arial
Font size: 38–42px (responsive; largest preset)
Color: Golden yellow (#FFD916)
Outline: 3.4 (heaviest outline for feed visibility)
Shadow: 1.0 + glow effect
Bold: Yes
Uppercase: Yes (forced)
Best for: YouTube Shorts, TikTok, Reels, high-energy short-form

Comparison

Side-by-side preset comparison

Property | Classic Bold | Minimal Clean | Highlight Brand
Font | Arial | Helvetica | Arial
Font size | 36–40px | 34–38px | 38–42px
Color | White | White | Golden yellow (#FFD916)
Outline | 3.0 (heavy) | 1.6 (soft) | 3.4 (heaviest)
Shadow | 1.0 | 0.6 | 1.0 + glow
Bold | Yes | No | Yes
Uppercase | Optional | No | Forced
Position | Center-bottom | Lower-third | Center-screen
Vibe | Clear and reliable | Quiet and professional | Loud and scroll-stopping
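If you wire presets into your own tooling, the comparison table maps naturally onto one immutable record per preset. The sketch below is hypothetical: `CaptionPreset`, its field names, and `effective_uppercase` are illustrative, not Outbox's schema. It stores the table's values and reads "Optional" uppercase as decided per run, "No" as always off, and "Forced" as always on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaptionPreset:
    """Hypothetical preset record; field names are illustrative, not Outbox's schema."""
    name: str
    font: str
    size_range_px: tuple[int, int]  # responsive min/max font size
    color: str                      # primary text color, #RRGGBB
    outline: float
    shadow: float
    glow: bool
    bold: bool
    uppercase: Optional[bool]       # None = optional, chosen per run
    position: str


PRESETS = {
    p.name: p
    for p in (
        CaptionPreset("classic_bold", "Arial", (36, 40), "#FFFFFF", 3.0, 1.0,
                      False, True, None, "center-bottom"),
        CaptionPreset("minimal_clean", "Helvetica", (34, 38), "#FFFFFF", 1.6, 0.6,
                      False, False, False, "lower-third"),
        CaptionPreset("highlight_brand", "Arial", (38, 42), "#FFD916", 3.4, 1.0,
                      True, True, True, "center-screen"),
    )
}


def effective_uppercase(preset: CaptionPreset, requested: bool) -> bool:
    """Presets with a fixed uppercase setting ignore the request; optional ones honor it."""
    return requested if preset.uppercase is None else preset.uppercase
```

Modeling "Optional" as `None` keeps the three states distinct without a separate flag.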

Timing

How text chunks appear on screen

Caption styling isn't just visual — it's temporal. How many words appear at once and how fast they cycle affects readability and energy. Outbox offers two timing modes.

Timing mode | Words per chunk | Pace | Pause threshold | Best for
short | 1–4 (target: 3) | Fast | 0.14s | Shorts, Reels, TikTok, punchy delivery
cinematic | 2–8 (target: 6) | Relaxed | 0.24s | Long-form YouTube, tutorials, walkthroughs

Short mode keeps chunks tight: about three words at a time, never more than four, flipping quickly. Think Hormozi Shorts or viral TikToks. Cinematic mode lets longer phrases breathe, which suits 10-minute tutorials.

The timing mode pairs with the visual preset. highlight_brand + short gives high-energy short-form. minimal_clean + cinematic produces calm professional long-form.
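This page doesn't document how Outbox splits words into chunks. One plausible greedy approach, sketched below under stated assumptions, walks word-level timestamps from the alignment stage and closes a chunk when it hits the mode's word cap, when the gap before the next word reaches the pause threshold (and the chunk has its minimum words), or when it has reached the target length and the current word ends in punctuation. `Word`, `TIMING_MODES`, and `chunk_words` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


# Hypothetical parameters mirroring the timing-mode table.
TIMING_MODES = {
    "short":     {"min_words": 1, "max_words": 4, "target": 3, "pause": 0.14},
    "cinematic": {"min_words": 2, "max_words": 8, "target": 6, "pause": 0.24},
}


def chunk_words(words: list[Word], mode: str) -> list[list[Word]]:
    """Greedily group aligned words into on-screen chunks."""
    cfg = TIMING_MODES[mode]
    chunks: list[list[Word]] = []
    current: list[Word] = []
    for i, word in enumerate(words):
        current.append(word)
        nxt = words[i + 1] if i + 1 < len(words) else None
        gap = (nxt.start - word.end) if nxt else float("inf")
        at_max = len(current) >= cfg["max_words"]            # hard cap
        long_pause = gap >= cfg["pause"] and len(current) >= cfg["min_words"]
        natural_break = (len(current) >= cfg["target"]
                         and word.text.endswith((",", ".", "!", "?")))
        if nxt is None or at_max or long_pause or natural_break:
            chunks.append(current)
            current = []
    return chunks
```

For example, if "Ship" ends 0.2s before "videos," starts, short mode (0.14s threshold) breaks there while cinematic mode (0.24s) keeps the phrase together.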

Timing modes: fast or relaxed. You pick.
  • Short mode (1–4 words, fast cadence): "Ship" / "videos," / "not edits."
  • Cinematic mode (2–8 words, relaxed cadence): "Ship videos, not edits."

Responsive

One preset, every resolution and ratio

Font sizes aren't fixed pixel values; they scale with your video's resolution and aspect ratio. Portrait video gets slightly larger text and a larger vertical margin because it's watched on phone screens. Landscape text is slightly smaller because viewers typically sit farther from larger screens.

Aspect ratio | Resolution | Classic Bold font size | Vertical margin
16:9 landscape | 1920×1080 | 36px | 76px
9:16 portrait | 1080×1920 | 38px | 80px
1:1 square | 1080×1080 | 36px | 76px
4K landscape | 3840×2160 | Scales proportionally | Scales proportionally
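The rows above are consistent with a simple rule: scale a base size by the frame's short side relative to 1080 pixels, then add a small portrait bonus. The sketch below encodes that rule. It is an assumption that reproduces the table, not Outbox's documented formula; `layout` and the bonus constants are hypothetical.

```python
# Hypothetical base values at a 1080-pixel short side, taken from the table.
BASE_FONT_PX = {"classic_bold": 36, "minimal_clean": 34, "highlight_brand": 38}
BASE_MARGIN_PX = 76
PORTRAIT_FONT_BONUS_PX = 2    # assumption: portrait text is 2px larger
PORTRAIT_MARGIN_BONUS_PX = 4  # assumption: portrait margin is 4px larger


def layout(preset: str, width: int, height: int) -> tuple[int, int]:
    """Return (font_size_px, vertical_margin_px) for a frame size."""
    scale = min(width, height) / 1080  # 1080p -> 1.0, 4K -> 2.0
    portrait = height > width
    font = BASE_FONT_PX[preset] + (PORTRAIT_FONT_BONUS_PX if portrait else 0)
    margin = BASE_MARGIN_PX + (PORTRAIT_MARGIN_BONUS_PX if portrait else 0)
    return round(font * scale), round(margin * scale)
```

Under this rule, a 3840×2160 frame gets 72px text and a 152px margin for Classic Bold, which is one reading of "scales proportionally".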

Platforms

Platform-safe positioning out of the box

Every platform overlays its own UI on top of your video. If your captions sit in the wrong spot, platform chrome covers your text. Outbox's presets include safe-area margins that keep captions clear of platform UI for each aspect ratio.

Platform | Danger zone | Outbox behavior
YouTube (16:9) | Bottom 48px: progress bar and controls | Captions placed above the control area with a 76px+ margin
YouTube Shorts | Bottom 120px: description, sound, subscribe | Center-screen placement avoids the bottom UI entirely
TikTok | Right 80px: share, comment, like buttons | Text width constrained to avoid right-side overlap
Instagram Reels | Bottom 140px: likes, comments, share, music label | Safe-area margins push captions above the interaction zone

You don't configure this. The preset knows the format. The layout adjusts.
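The safe-area idea reduces to a rectangle test. The sketch below treats each danger zone from the platform table as a fixed pixel inset on the frame, which is a simplification (real platform overlays vary by device and app version). `Insets`, `DANGER_ZONES`, and `caption_is_clear` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insets:
    """Pixels reserved by platform UI on each edge of the frame."""
    top: int = 0
    right: int = 0
    bottom: int = 0
    left: int = 0


# Danger zones copied from the platform table; fixed insets are a simplification.
DANGER_ZONES = {
    "youtube": Insets(bottom=48),
    "youtube_shorts": Insets(bottom=120),
    "tiktok": Insets(right=80),
    "instagram_reels": Insets(bottom=140),
}

Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), origin at top-left


def safe_rect(platform: str, width: int, height: int) -> Box:
    z = DANGER_ZONES[platform]
    return (z.left, z.top, width - z.right, height - z.bottom)


def caption_is_clear(platform: str, width: int, height: int, box: Box) -> bool:
    """True if the caption box lies entirely inside the platform's safe rectangle."""
    sx0, sy0, sx1, sy1 = safe_rect(platform, width, height)
    x0, y0, x1, y1 = box
    return sx0 <= x0 and sy0 <= y0 and x1 <= sx1 and y1 <= sy1
```

For a 1920×1080 YouTube frame, a caption whose bottom edge sits 76px above the frame edge (y1 = 1004) passes, since the safe area ends at y = 1032.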

Effects

Script effects shape caption presentation

Your script segments carry an effects field from the scripting stage. Caption styles respond to those effects — adjusting presentation per segment without you changing settings mid-run.

Script effect | Style adjustment
upbeat | Slightly larger font, faster chunk transitions
clear | High-contrast white on black, no decoration
instructional | Stable lower-third positioning, smaller font
technical | Tighter layout, narrower line lengths
professional | Restrained styling, clean lines
fade_out | Reduced opacity near the segment end

A video that opens with an upbeat hook and transitions into a technical walkthrough gets caption styling that matches the energy shift — automatically.
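The effects table gives directions ("slightly larger", "faster") rather than numbers. The sketch below shows one way effect tags could fold into a per-segment style; every numeric delta, the `SegmentStyle` fields, and `apply_effects` are illustrative assumptions, not Outbox's implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SegmentStyle:
    """Hypothetical per-segment style; fields are illustrative."""
    font_px: int
    position: str        # e.g. "center-bottom", "lower-third", "center-screen"
    max_line_chars: int  # line-length cap used when wrapping text
    chunk_speed: float   # 1.0 = timing-mode default; >1.0 cycles chunks faster
    end_opacity: float   # opacity at the end of the segment (1.0 = fully opaque)
    decoration: bool     # glow / extra shadow on or off
    color: str           # primary text color, #RRGGBB


def apply_effects(base: SegmentStyle, effects: list[str]) -> SegmentStyle:
    """Fold a segment's effect tags into its style. All numbers are assumptions."""
    s = base
    for effect in effects:
        if effect == "upbeat":
            s = replace(s, font_px=s.font_px + 2, chunk_speed=s.chunk_speed * 1.2)
        elif effect == "clear":
            s = replace(s, color="#FFFFFF", decoration=False)
        elif effect == "instructional":
            s = replace(s, position="lower-third", font_px=s.font_px - 2)
        elif effect == "technical":
            s = replace(s, max_line_chars=min(s.max_line_chars, 32))
        elif effect == "professional":
            s = replace(s, decoration=False)
        elif effect == "fade_out":
            s = replace(s, end_opacity=0.4)
        # Unrecognized tags leave the style unchanged.
    return s
```

Tags are applied in order, so a segment tagged `["upbeat", "fade_out"]` gets both the larger font and the fade.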

Technology

ASS subtitle format under the hood

Outbox renders captions using the ASS (Advanced SubStation Alpha) subtitle format — the same format used in professional subtitle production. ASS supports precise font control, per-character color, exact positioning, and timed effects.

Script → Voiceover → Align → Captions ([your_preset].ass) → FFmpeg burn-in → Final video

Result: captions baked into the video file. Viewers always see them; there is no subtitle toggle. This is how most short-form creators ship captioned content.
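The page doesn't show Outbox's generated .ass file, but the format itself is public. As a sketch, the code below builds one `[V4+ Styles]` line from preset-like values, plus the FFmpeg arguments for a burn-in with FFmpeg's `ass` filter (which needs an FFmpeg build with libass). The function names and file paths are hypothetical; the 23-field order, the `&HAABBGGRR` color encoding, `-1` for true, and numpad `Alignment` values follow ASS V4+ conventions.

```python
def ass_color(hex_rgb: str, alpha: int = 0) -> str:
    """Convert #RRGGBB to ASS &HAABBGGRR (alpha 0 = opaque, 0xFF = invisible)."""
    r, g, b = hex_rgb[1:3], hex_rgb[3:5], hex_rgb[5:7]
    return f"&H{alpha:02X}{b}{g}{r}".upper()


def ass_style_line(name: str, font: str, size: int, color: str, outline: float,
                   shadow: float, bold: bool, alignment: int, margin_v: int) -> str:
    """Build a V4+ 'Style:' line in the standard 23-field order."""
    fields = [
        name, font, str(size),
        ass_color(color),             # PrimaryColour
        ass_color("#FFFFFF"),         # SecondaryColour (karaoke fill; unused here)
        ass_color("#000000"),         # OutlineColour
        ass_color("#000000", 0x80),   # BackColour (shadow), half transparent
        "-1" if bold else "0",        # Bold: -1 = true in ASS
        "0", "0", "0",                # Italic, Underline, StrikeOut
        "100", "100", "0", "0",       # ScaleX, ScaleY, Spacing, Angle
        "1",                          # BorderStyle 1 = outline + drop shadow
        f"{outline:g}", f"{shadow:g}",
        str(alignment),               # numpad layout: 2 = bottom-center, 5 = middle-center
        "20", "20", str(margin_v),    # MarginL, MarginR, MarginV
        "1",                          # Encoding
    ]
    return "Style: " + ",".join(fields)


def ffmpeg_burn_in_cmd(src: str, ass_path: str, dst: str) -> list[str]:
    """Argument list for burning an .ass file into a video; audio is copied as-is."""
    return ["ffmpeg", "-i", src, "-vf", f"ass={ass_path}", "-c:a", "copy", dst]
```

Passing the list to `subprocess.run(..., check=True)` would perform the burn-in. Paths containing colons or commas need escaping inside the filter argument.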

Consistency

Workspace-level preset lock for teams

When multiple people produce content for the same brand, caption consistency breaks down. One editor picks Arial Bold. Another uses Helvetica Light. A freelancer sticks with CapCut's default style.

Outbox solves this at the workspace level. The admin sets the default caption preset and timing mode. Every pipeline run starts with the locked configuration.

Role or scope | Preset control
Workspace admin | Sets the default; can override per run
Team member | Uses the workspace default; can override if the admin allows it
Template | Locks the preset per template; overrides the workspace default

A 50-video batch across three editors produces identical caption styling without anyone remembering to check the style guide.
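The precedence in the role table can be written as a small resolver. This is a hypothetical sketch: `resolve_preset` and its parameters are illustrative, and it assumes a template lock also beats a per-run override and that a disallowed override silently falls back to the workspace default.

```python
from typing import Optional


def resolve_preset(
    workspace_default: str,
    role: str,                      # "admin" or "member"
    members_may_override: bool,     # admin-controlled workspace setting
    run_override: Optional[str] = None,
    template_preset: Optional[str] = None,
) -> str:
    """Pick the caption preset for one pipeline run."""
    # Assumption: a template lock wins over everything, including run overrides.
    if template_preset is not None:
        return template_preset
    if run_override is not None:
        if role == "admin" or (role == "member" and members_may_override):
            return run_override
    # No override, or the override was not permitted.
    return workspace_default
```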

→ Related: Team Workspaces

Iteration

Switch styles without re-processing everything

Changed your mind about the caption look after your first run? Outbox's stage isolation architecture means you can re-run from the captions stage only. Analysis, script, voiceover, and alignment stay cached.

Example

You rendered 10 videos with classic_bold last week. The client now wants minimal_clean. Re-run from the captions stage. Voiceover doesn't regenerate. Alignment doesn't re-sync. Only captions and the stages after it (edit, render, metadata, publish) re-execute.

Stage isolation: change the style, keep the rest cached.
  • Analyze: cached
  • Script: cached
  • Voiceover: cached
  • Align: cached
  • Captions: re-run
  • Edit: re-run
  • Render: re-run
  • Metadata: re-run
  • Publish: re-run
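Re-running "from the captions stage" amounts to splitting the ordered stage list at that point. A hypothetical sketch: `STAGES` mirrors the stage list above, and `plan_rerun` is an illustrative name, not an Outbox API.

```python
STAGES = ["analyze", "script", "voiceover", "align", "captions",
          "edit", "render", "metadata", "publish"]


def plan_rerun(from_stage: str) -> dict[str, str]:
    """Mark every stage before `from_stage` as cached and the rest as re-run."""
    start = STAGES.index(from_stage)  # raises ValueError for an unknown stage
    return {s: ("cached" if i < start else "re-run") for i, s in enumerate(STAGES)}
```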

Examples

Real-world configuration recipes

YouTube 16:9
Faceless explainer channel
Preset: classic_bold
Timing mode: cinematic
Uppercase: true
Aspect ratio: 16:9

Large bold text at the bottom of the frame. All caps. Cinematic pacing for comfortable reading on desktop and TV.

TikTok / Shorts / Reels 9:16
Viral short-form content
Preset: highlight_brand
Timing mode: short
Uppercase: true
Highlight: #adff2f
Aspect ratio: 9:16

Center-screen golden text. Fast chunk cycling. Neon green active-word highlighting for the karaoke effect.

LinkedIn / YouTube 16:9
SaaS product demo
Preset: minimal_clean
Timing mode: cinematic
Uppercase: false
Aspect ratio: 16:9

Lower-third Helvetica text that doesn't compete with the screen recording. Sentence case for professional tone.

Education 16:9
Course module
Preset: classic_bold
Timing mode: cinematic
Uppercase: false
Aspect ratio: 16:9

Readable classic styling without uppercase. Cinematic pacing matches the deliberate teaching rhythm.
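The four recipes can be expressed as pipeline configuration. Only `caption_timing_mode` is named on this page (in the FAQ); the other keys (`caption_preset`, `caption_uppercase`, `highlight_color`, `aspect_ratio`) and the `validate` helper are hypothetical. The sketch shows how the preset rules from the comparison table could be checked before a run.

```python
import re

# Hypothetical allowed values; only caption_timing_mode is a documented key.
VALID = {
    "caption_preset": {"classic_bold", "minimal_clean", "highlight_brand"},
    "caption_timing_mode": {"short", "cinematic"},
    "aspect_ratio": {"16:9", "9:16", "1:1"},
}

RECIPES = {
    "faceless_explainer": {"caption_preset": "classic_bold", "caption_timing_mode": "cinematic",
                           "caption_uppercase": True, "aspect_ratio": "16:9"},
    "viral_short_form": {"caption_preset": "highlight_brand", "caption_timing_mode": "short",
                         "caption_uppercase": True, "highlight_color": "#adff2f",
                         "aspect_ratio": "9:16"},
    "saas_demo": {"caption_preset": "minimal_clean", "caption_timing_mode": "cinematic",
                  "caption_uppercase": False, "aspect_ratio": "16:9"},
    "course_module": {"caption_preset": "classic_bold", "caption_timing_mode": "cinematic",
                      "caption_uppercase": False, "aspect_ratio": "16:9"},
}


def validate(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    errors = [f"{key}: {cfg.get(key)!r} not in {sorted(allowed)}"
              for key, allowed in VALID.items() if cfg.get(key) not in allowed]
    color = cfg.get("highlight_color")
    if color is not None and not re.fullmatch(r"#[0-9a-fA-F]{6}", color):
        errors.append(f"highlight_color: {color!r} is not #RRGGBB")
    # Rules from the comparison table: highlight_brand forces uppercase,
    # minimal_clean never uses it.
    preset, upper = cfg.get("caption_preset"), cfg.get("caption_uppercase")
    if preset == "highlight_brand" and upper is False:
        errors.append("highlight_brand forces uppercase")
    if preset == "minimal_clean" and upper is True:
        errors.append("minimal_clean does not use uppercase")
    return errors
```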

vs Manual

Caption styles vs. manual subtitle styling

Dimension | Manual workflow | Outbox Caption Styles
Style setup | Configure font, size, color, and position per video in your editor | Select a preset once; applied to every run
Multi-format | Resize and reposition manually for each aspect ratio | Automatic responsive sizing and safe-area margins
Brand consistency | Depends on every editor following the same style guide | Preset locked at workspace level and enforced automatically
Style change across videos | Re-edit every video one at a time | Re-run from the captions stage as a bulk update
Platform safety | Test on each platform and adjust margins by trial and error | Built-in safe-area margins per format
Effect-driven styling | Manually adjust styling per section or timeline region | Script effects drive per-segment style adjustments automatically

Reference

Every style property you can control

Property | What it controls | Range / values
Font name | Typeface for caption text | Arial, Helvetica
Font size | Base text size (scales with resolution) | 34–42px base
Primary color | Main text color | #FFFFFF, #FFD916
Outline width | Thickness of the text outline | 1.6 (soft) to 3.4 (heavy)
Outline color | Color of the text outline | Black or semi-transparent black
Shadow | Drop shadow depth | 0.6 (subtle) to 1.0 (standard)
Bold | Font weight | On / off
Uppercase | Force all-caps rendering | On / off
Vertical margin | Distance from bottom edge | 76–80px base
Highlight color | Accent color for active-word highlighting | #adff2f by default (configurable per run)

Audience

Who needs caption styles?

YouTube operators (5+ videos/week)

At volume, consistency matters more than creativity. Your audience expects a recognizable look. Caption styles lock that look across every video without per-video effort.

Agencies managing client brands

Different clients want different visual identities. Workspace-level presets mean each client workspace has its own caption style — no cross-contamination between brands.

Short-form creators

The highlight_brand preset with short timing mode produces high-energy, karaoke-style captions that perform on TikTok and Shorts — generated in your pipeline, not hand-animated in CapCut.

Educators and course creators

Captions should help students learn, not distract them. minimal_clean with cinematic timing produces quiet, readable captions that respect the educational content.

Connected

How caption styles connect to the stack

Related feature

Generates the timed caption events that styles are applied to.

Related feature

Produces the script and effect tags that drive per-segment style adjustments.

Related feature

Voice pacing determines caption chunk timing — faster delivery means shorter display windows.

Related feature

Narration energy pairs with caption energy. Upbeat voice matches highlight_brand.

Related feature

Locks presets per workspace — the enforcement layer for brand-consistent styling.

FAQ

Common questions about caption styles

Can I create custom presets beyond the three built-in ones?

Not yet. Outbox ships with three presets that cover the most common use cases. Custom preset creation is on the roadmap for teams that need bespoke styling — custom brand colors, specific fonts, unique positioning rules.

Can I use different presets for different videos in the same workspace?

Yes. The workspace default sets the starting point, but individual pipeline runs can override the preset. Useful when a workspace produces both long-form (Classic Bold) and short-form (Highlight Brand) content.

Do caption styles affect active-word highlighting?

Yes. Active-word highlighting uses the preset's highlight color (default: #adff2f neon green). The highlight color is configurable per run and the visual intensity adapts to the preset.
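In ASS, active-word highlighting is commonly done by emitting one dialogue line per word timing, coloring only the active word with an inline `\c` override and resetting with `\r`. The sketch below assumes that approach (Outbox's actual method isn't documented here); `ass_color_tag` and `highlighted_lines` are hypothetical names. ASS writes colors in BGR order, so #adff2f becomes `&H2FFFAD&`.

```python
def ass_color_tag(hex_rgb: str) -> str:
    """#RRGGBB -> inline primary-color override, e.g. {\\c&H2FFFAD&}."""
    r, g, b = hex_rgb[1:3], hex_rgb[3:5], hex_rgb[5:7]
    return "{\\c&H" + (b + g + r).upper() + "&}"


def highlighted_lines(words: list[str], highlight: str = "#adff2f") -> list[str]:
    """One Dialogue text per active word: the active word gets the highlight
    color, then {\\r} resets to the line's style for the remaining words."""
    out = []
    for i in range(len(words)):
        parts = [
            f"{ass_color_tag(highlight)}{w}{{\\r}}" if j == i else w
            for j, w in enumerate(words)
        ]
        out.append(" ".join(parts))
    return out
```

Each returned string would go into its own `Dialogue:` event, timed to the start and end of the word it highlights.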

What happens to captions on a video I've already rendered?

Captions are burned into the video file. To change the style, re-run from the captions stage. Upstream stages (analyze, script, voiceover, align) stay cached.

Do styles look the same across YouTube, TikTok, and Instagram?

The caption styling itself (font, color, outline) is identical — it's baked into the video pixels. What changes is the layout: positioning and margins adapt to each platform's safe areas.

How do I switch timing modes?

Set caption_timing_mode to 'short' or 'cinematic' in your pipeline configuration, or select it in the Outbox dashboard when configuring a run.

Will custom fonts be supported?

Custom font support is planned. Current presets use Arial and Helvetica because they're widely available across rendering environments. Custom fonts will require uploading font files to your workspace.

Get started

Pick a preset. Run the pipeline. Captions match your brand.

Upload your footage. Select a caption preset. Choose a timing mode. Styled captions generate, burn into the video, and flow to publishing — automatically. Not sure? Start with classic_bold. It's the default for a reason.