Feature page
Caption styles that match your brand — on every platform, every format, every video.
Three built-in presets control font, color, outline, shadow, and positioning. Pick one and every video in your workspace renders with the same branded look. No subtitle editor. No per-video styling.
Problem
What makes caption styling hard at scale?
Styling captions is a separate job from generating captions. Even after you have timed subtitles, you still need to decide: What font? What size? Bold or regular? White text with a black outline or colored text with a glow?
Multiply that by three aspect ratios, five videos a week, and a team of three editors — and "just pick a font" becomes a brand consistency nightmare.
Outbox reduces this to one decision: pick a preset. The preset handles every visual detail, scales to any resolution, and stays locked across your workspace.
- ✕ What font and size for this video?
- ✕ Does it look right on a phone in portrait?
- ✕ Will TikTok's share button cover my text?
- ✕ Does it match the captions from last week?
- ✕ Did the freelancer use the right style guide?
Presets
Three styles. Zero manual work.
Classic Bold
Large white text. Strong black outline. Center-bottom placement.
| Property | Value |
|---|---|
| Font | Arial |
| Font size | 36–40px (responsive) |
| Color | White (#FFFFFF) |
| Outline | 3.0 (heavy black outline for maximum contrast) |
| Shadow | 1.0 (subtle drop shadow) |
| Bold | Yes |
| Uppercase | No (configurable per run) |
| Best for | YouTube long-form, tutorials, product demos, explainer videos |
Minimal Clean
Smaller text. Subtle shadow. Lower-third positioning.
| Property | Value |
|---|---|
| Font | Helvetica |
| Font size | 34–38px (responsive) |
| Color | White (#FFFFFF) |
| Outline | 1.6 (softer outline, semi-transparent black) |
| Shadow | 0.6 (barely-there shadow for depth) |
| Bold | No |
| Uppercase | No |
| Best for | SaaS demos, course videos, professional content, LinkedIn |
Highlight Brand
Bold text with accent color. Center placement. Uppercase forced.
| Property | Value |
|---|---|
| Font | Arial |
| Font size | 38–42px (responsive; largest preset) |
| Color | Golden yellow (#FFD916) |
| Outline | 3.4 (heaviest outline for feed visibility) |
| Shadow | 1.0 + glow effect |
| Bold | Yes |
| Uppercase | Yes (forced) |
| Best for | YouTube Shorts, TikTok, Reels, high-energy short-form |
Comparison
Side-by-side preset comparison
| Property | Classic Bold | Minimal Clean | Highlight Brand |
|---|---|---|---|
| Font | Arial | Helvetica | Arial |
| Font size | 36–40px | 34–38px | 38–42px |
| Color | White | White | Golden yellow (#FFD916) |
| Outline | 3.0 (heavy) | 1.6 (soft) | 3.4 (heaviest) |
| Shadow | 1.0 | 0.6 | 1.0 + glow |
| Bold | Yes | No | Yes |
| Uppercase | Optional | No | Forced |
| Position | Center-bottom | Lower-third | Center-screen |
| Vibe | Clear and reliable | Quiet and professional | Loud and scroll-stopping |
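The comparison table maps naturally onto a small data structure. The sketch below is one way to model it, not Outbox's actual schema: the `CaptionPreset` class, its field names, and the `render_text` helper are hypothetical, while the values come straight from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionPreset:
    font: str
    base_size_px: int   # low end of the documented size range
    color_hex: str
    outline: float
    shadow: float
    bold: bool
    uppercase: str      # "optional", "no", or "forced"
    position: str


# Values copied from the comparison table; the dict layout is an assumption.
PRESETS = {
    "classic_bold": CaptionPreset("Arial", 36, "#FFFFFF", 3.0, 1.0, True, "optional", "center-bottom"),
    "minimal_clean": CaptionPreset("Helvetica", 34, "#FFFFFF", 1.6, 0.6, False, "no", "lower-third"),
    "highlight_brand": CaptionPreset("Arial", 38, "#FFD916", 3.4, 1.0, True, "forced", "center-screen"),
}


def render_text(preset_name, text, uppercase_override=False):
    """Apply the preset's uppercase rule to a caption string."""
    preset = PRESETS[preset_name]
    if preset.uppercase == "forced":
        return text.upper()
    if preset.uppercase == "optional" and uppercase_override:
        return text.upper()
    return text
```

With this model, `render_text("highlight_brand", "pick a preset")` always comes back in capitals, while `classic_bold` only uppercases when the run asks for it.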
Timing
How text chunks appear on screen
Caption styling isn't just visual — it's temporal. How many words appear at once and how fast they cycle affects readability and energy. Outbox offers two timing modes.
| Timing mode | Words per chunk | Pace | Pause threshold | Best for |
|---|---|---|---|---|
| short | 1–4 (target: 3) | Fast | 0.14s | Shorts, Reels, TikTok, punchy delivery |
| cinematic | 2–8 (target: 6) | Relaxed | 0.24s | Long-form YouTube, tutorials, walkthroughs |
Short mode keeps chunks tight: 1–4 words at a time (typically 3), flipping quickly. Think Hormozi Shorts or viral TikToks. Cinematic mode lets longer sentences breathe, which suits 10-minute tutorials.
The timing mode pairs with the visual preset. highlight_brand + short gives high-energy short-form. minimal_clean + cinematic produces calm professional long-form.
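To make the timing table concrete, here is a minimal chunking sketch. The word limits and pause thresholds come from the table above; the splitting rules themselves (break at a pause, break at the target size, absorb a too-short tail) are an assumption about how such a chunker could work, and `TimingMode` and `chunk_words` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingMode:
    min_words: int
    target_words: int
    max_words: int
    pause_threshold_s: float


# Limits taken from the timing table.
MODES = {
    "short": TimingMode(1, 3, 4, 0.14),
    "cinematic": TimingMode(2, 6, 8, 0.24),
}


def chunk_words(words, mode_name):
    """Group (text, start_s, end_s) tuples into caption chunks."""
    mode = MODES[mode_name]
    chunks, current = [], []
    for i, (text, _start, end) in enumerate(words):
        current.append(text)
        remaining = len(words) - i - 1
        if remaining == 0:
            break
        gap = words[i + 1][1] - end
        # A pause in speech closes the chunk once it has enough words.
        at_pause = gap >= mode.pause_threshold_s and len(current) >= mode.min_words
        at_target = len(current) >= mode.target_words
        # Absorb a short tail rather than emit an undersized final chunk.
        can_absorb_tail = (
            remaining < mode.min_words
            and len(current) + remaining <= mode.max_words
        )
        if at_pause or (at_target and not can_absorb_tail):
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Fed the same seven evenly spaced words, `short` produces three quick chunks while `cinematic` keeps them together as one line.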
Responsive
One preset, every resolution and ratio
Font sizes aren't fixed pixel values — they scale with your video resolution and aspect ratio. Portrait video gets slightly larger text and wider margins because phone screens demand it. Landscape gets tighter because viewers sit further from larger screens.
| Aspect ratio | Resolution | Classic Bold font size | Vertical margin |
|---|---|---|---|
| 16:9 landscape | 1920×1080 | 36px | 76px |
| 9:16 portrait | 1080×1920 | 38px | 80px |
| 1:1 square | 1080×1080 | 36px | 76px |
| 4K landscape | 3840×2160 | Scales proportionally | Scales proportionally |
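The table above can be reproduced with a short scaling function. This is a sketch under two assumptions: "scales proportionally" means linear scaling against the short edge of a 1080-pixel reference frame, and portrait is the only orientation that gets the larger base values. The function name is hypothetical.

```python
def classic_bold_layout(width, height):
    """Return font size and vertical margin for Classic Bold at a given resolution."""
    # Assumption: sizes scale linearly with the short edge, relative to 1080 px.
    scale = min(width, height) / 1080
    portrait = height > width
    # Base values from the table: portrait is slightly larger with a wider margin.
    base_font, base_margin = (38, 80) if portrait else (36, 76)
    return {
        "font_px": round(base_font * scale),
        "margin_v_px": round(base_margin * scale),
    }
```

Under that assumption a 3840×2160 frame doubles the 1080p landscape values.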
Platforms
Platform-safe positioning out of the box
Every platform overlays its own UI on top of your video. If your captions sit in the wrong spot, platform chrome covers your text. Outbox's presets include safe-area margins that keep captions clear of platform UI for each aspect ratio.
| Platform | Danger zone | Outbox behavior |
|---|---|---|
| YouTube (16:9) | Bottom 48px — progress bar and controls | Captions placed above the control area with 76px+ margin |
| YouTube Shorts | Bottom 120px — description, sound, subscribe | Center-screen placement avoids bottom UI entirely |
| TikTok | Right 80px — share, comment, like buttons | Text width constrained to avoid right-side overlap |
| Instagram Reels | Bottom 140px — likes, comments, share, music label | Safe-area margins push captions above interaction zone |
You don't configure this. The preset knows the format. The layout adjusts.
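If you want to reason about what the preset is doing for you, the danger zones can be treated as insets on the frame. The sketch below uses the pixel values from the table and assumes they apply at the rendered resolution, which the table does not state. `DANGER_ZONES`, `safe_box`, and `is_caption_safe` are hypothetical names.

```python
# Inset sizes in pixels, taken from the platform table.
DANGER_ZONES = {
    "youtube": {"bottom": 48},
    "youtube_shorts": {"bottom": 120},
    "tiktok": {"right": 80},
    "instagram_reels": {"bottom": 140},
}


def safe_box(platform, width, height):
    """Return (left, top, right, bottom) of the area free of platform UI."""
    zone = DANGER_ZONES[platform]
    return (
        zone.get("left", 0),
        zone.get("top", 0),
        width - zone.get("right", 0),
        height - zone.get("bottom", 0),
    )


def is_caption_safe(platform, width, height, caption_box):
    """Check that a caption box (left, top, right, bottom) stays inside the safe area."""
    left, top, right, bottom = safe_box(platform, width, height)
    c_left, c_top, c_right, c_bottom = caption_box
    return c_left >= left and c_top >= top and c_right <= right and c_bottom <= bottom
```

A caption sitting 76px above the bottom edge of a 1920×1080 frame clears YouTube's 48px control strip; one sitting 20px above it does not.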
Effects
Script effects shape caption presentation
Your script segments carry an effects field from the scripting stage. Caption styles respond to those effects — adjusting presentation per segment without you changing settings mid-run.
| Script effect | Style adjustment |
|---|---|
| upbeat | Slightly larger font, faster chunk transitions |
| clear | High-contrast white on black, no decoration |
| instructional | Stable lower-third positioning, smaller font |
| technical | Tighter layout, narrower line lengths |
| professional | Restrained styling, clean lines |
| fade_out | Reduced opacity near the segment end |
A video that opens with an upbeat hook and transitions into a technical walkthrough gets caption styling that matches the energy shift — automatically.
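One way to picture effect-driven styling is as a lookup of per-effect adjustments layered on top of the preset. The effect names below come from the table; every number, key name, and the `apply_effects` helper are placeholders, since this page does not publish the actual magnitudes.

```python
# Hypothetical adjustment values: the page names the effects, not their magnitudes.
EFFECT_ADJUSTMENTS = {
    "upbeat": {"font_scale": 1.05},
    "instructional": {"font_scale": 0.95, "position": "lower-third"},
    "fade_out": {"fade_out_ms": 300},
}


def apply_effects(base_style, effects):
    """Return a copy of base_style with each known effect's adjustment applied."""
    style = dict(base_style)
    for effect in effects:
        for key, value in EFFECT_ADJUSTMENTS.get(effect, {}).items():
            if key == "font_scale":
                style["font_px"] = round(style["font_px"] * value)
            else:
                style[key] = value
    return style
```

Effects with no entry leave the preset untouched, so a segment tagged only `professional` would render with the base style in this sketch.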
Technology
ASS subtitle format under the hood
Outbox renders captions using the ASS (Advanced SubStation Alpha) subtitle format — the same format used in professional subtitle production. ASS supports precise font control, per-character color, exact positioning, and timed effects.
Result: captions baked into the video file. Viewers always see them, with no subtitle toggle. Burned-in captions are standard practice for short-form video.
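For readers curious what a preset looks like in ASS terms, the sketch below builds a `Style:` line in the standard V4+ field order from the Classic Bold values. It illustrates the format and is not Outbox's generator: the function names, the 10px side margins, and the choice of opaque black for outline and shadow are assumptions. ASS stores colours as `&HAABBGGRR`, so RGB hex values have to be reordered.

```python
def hex_to_ass_colour(hex_rgb, alpha=0):
    """Convert '#RRGGBB' to ASS '&HAABBGGRR' (alpha 00 means opaque)."""
    rgb = hex_rgb.lstrip("#")
    r, g, b = rgb[0:2], rgb[2:4], rgb[4:6]
    return "&H{:02X}{}{}{}".format(alpha, b, g, r).upper()


def ass_style_line(name, font, size, primary_hex, outline, shadow, bold, alignment, margin_v):
    """Build a V4+ 'Style:' line; field order follows the ASS format definition."""
    fields = [
        name, font, size,
        hex_to_ass_colour(primary_hex),   # PrimaryColour
        hex_to_ass_colour(primary_hex),   # SecondaryColour (unused in this sketch)
        hex_to_ass_colour("#000000"),     # OutlineColour (assumed opaque black)
        hex_to_ass_colour("#000000"),     # BackColour, used for the shadow
        -1 if bold else 0,                # Bold: -1 means true in ASS
        0, 0, 0,                          # Italic, Underline, StrikeOut
        100, 100, 0, 0,                   # ScaleX, ScaleY, Spacing, Angle
        1,                                # BorderStyle 1: outline plus drop shadow
        outline, shadow,
        alignment,                        # numpad layout: 2 bottom-center, 5 middle-center
        10, 10, margin_v,                 # MarginL, MarginR (assumed), MarginV
        1,                                # Encoding
    ]
    return "Style: " + ",".join(str(f) for f in fields)
```

Calling `ass_style_line("classic_bold", "Arial", 36, "#FFFFFF", 3.0, 1.0, True, 2, 76)` yields a bottom-center white style with a 3.0 outline, matching the 1080p landscape row of the responsive table.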
Consistency
Workspace-level preset lock for teams
When multiple people produce content for the same brand, caption consistency breaks down. One editor picks Arial Bold. Another uses Helvetica Light. A freelancer defaults to CapCut.
Outbox solves this at the workspace level. The admin sets the default caption preset and timing mode. Every pipeline run starts with the locked configuration.
| Role | Can override preset? |
|---|---|
| Workspace admin | Sets the default, can override per run |
| Team member | Uses the workspace default, can override if admin allows |
| Template | Locks preset per template — overrides workspace default |
A 50-video batch across three editors produces identical caption styling without anyone remembering to check the style guide.
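The role table reads as a precedence rule, sketched below. The function and parameter names are hypothetical, and one point is an assumption: the table says a template lock overrides the workspace default but does not say whether a per-run override can beat a template, so this sketch treats the template lock as final.

```python
def resolve_preset(workspace_default, template_preset=None, run_override=None,
                   role="member", members_may_override=False):
    """Pick the caption preset for a run from workspace, template, and run settings."""
    # Assumption: a template lock wins over everything, including run overrides.
    if template_preset is not None:
        return template_preset
    if run_override is None:
        return workspace_default
    # Admins can always override; members only if the admin allows it.
    if role == "admin" or members_may_override:
        return run_override
    return workspace_default
```

A team member who requests `highlight_brand` in a workspace locked to `classic_bold` still gets `classic_bold` unless the admin has enabled overrides.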
→ Related: Team Workspaces
Iteration
Switch styles without re-processing everything
Changed your mind about the caption look after your first run? Outbox's stage isolation architecture means you can re-run from the captions stage only. Analysis, script, voiceover, and alignment stay cached.
You rendered 10 videos with classic_bold last week. The client now wants minimal_clean. Re-run from the captions stage. Voiceover doesn't re-generate. Alignment doesn't re-sync. Only captions, editing, and rendering re-execute.
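Stage isolation comes down to splitting an ordered stage list at the point you re-run from. The sketch below assumes the stage order implied above; the names `edit` and `render` and the `plan_run` helper are hypothetical, while `analyze`, `script`, `voiceover`, and `align` are the stage names this page uses.

```python
# Assumed pipeline order; only the first four names appear verbatim on this page.
STAGES = ["analyze", "script", "voiceover", "align", "captions", "edit", "render"]


def plan_run(start_stage):
    """Split the pipeline into stages served from cache and stages to re-execute."""
    idx = STAGES.index(start_stage)
    return {"cached": STAGES[:idx], "rerun": STAGES[idx:]}
```

`plan_run("captions")` leaves the four upstream stages cached and re-executes only captions, editing, and rendering.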
Examples
Real-world configuration recipes
Center-screen golden text. Fast chunk cycling. Neon green active-word highlighting for the karaoke effect.
Lower-third Helvetica text that doesn't compete with the screen recording. Sentence case for professional tone.
Readable classic styling without uppercase. Cinematic pacing matches the deliberate teaching rhythm.
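Written out as configuration, the three recipes above might look like the sketch below. Only `caption_timing_mode` and the preset identifiers appear on this page; the recipe labels, the `caption_preset`, `uppercase`, and `highlight_color` keys, and the cinematic timing on the second recipe are assumptions for illustration.

```python
VALID_PRESETS = {"classic_bold", "minimal_clean", "highlight_brand"}
VALID_TIMING_MODES = {"short", "cinematic"}

# Hypothetical recipe labels and key names, apart from caption_timing_mode.
RECIPES = {
    "short_form": {
        "caption_preset": "highlight_brand",
        "caption_timing_mode": "short",
        "highlight_color": "#adff2f",
    },
    "screen_recording": {
        "caption_preset": "minimal_clean",
        "caption_timing_mode": "cinematic",  # assumed; the recipe does not state it
        "uppercase": False,
    },
    "tutorial": {
        "caption_preset": "classic_bold",
        "caption_timing_mode": "cinematic",
        "uppercase": False,
    },
}


def validate(config):
    """Reject configs that name an unknown preset or timing mode."""
    if config["caption_preset"] not in VALID_PRESETS:
        raise ValueError("unknown preset: %r" % config["caption_preset"])
    if config["caption_timing_mode"] not in VALID_TIMING_MODES:
        raise ValueError("unknown timing mode: %r" % config["caption_timing_mode"])
    # Highlight Brand forces uppercase, so turning it off is a contradiction.
    if config["caption_preset"] == "highlight_brand" and config.get("uppercase") is False:
        raise ValueError("highlight_brand forces uppercase")
    return config
```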
vs Manual
Caption styles vs. manual subtitle styling
| Dimension | Manual workflow | Outbox Caption Styles |
|---|---|---|
| Style setup | Configure font, size, color, position per video in your editor | Select a preset once — applied to every run |
| Multi-format | Resize and reposition for each aspect ratio manually | Responsive sizing and safe-area margins — automatic |
| Brand consistency | Depends on every editor following the same style guide | Preset locked at workspace level — enforced automatically |
| Style change across videos | Re-edit every video one at a time | Re-run from the captions stage — bulk update |
| Platform safety | Test on each platform, adjust margins by trial and error | Built-in safe-area margins per format |
| Effect-driven styling | Manually adjust styling per section / timeline region | Script effects drive style adjustments per segment automatically |
Reference
Every style property you can control
| Property | What it controls | Range |
|---|---|---|
| Font name | Typeface for caption text | Arial, Helvetica |
| Font size | Base text size (scales with resolution) | 34–42px base |
| Primary color | Main text color | #FFFFFF, #FFD916 |
| Outline width | Thickness of the text outline | 1.6 (soft) to 3.4 (heavy) |
| Outline color | Color of the text outline | Black, semi-transparent |
| Shadow | Drop shadow depth | 0.6 (subtle) to 1.0 (standard) |
| Bold | Font weight | On / off |
| Uppercase | Force all-caps rendering | On / off |
| Vertical margin | Distance from bottom edge | 76–80px base |
| Highlight color | Accent color for active-word highlighting | #adff2f |
Audience
Who needs caption styles?
At volume, consistency matters more than creativity. Your audience expects a recognizable look. Caption styles lock that look across every video without per-video effort.
Different clients want different visual identities. Workspace-level presets mean each client workspace has its own caption style — no cross-contamination between brands.
The highlight_brand preset with short timing mode produces high-energy, karaoke-style captions that perform on TikTok and Shorts — generated in your pipeline, not hand-animated in CapCut.
Captions should help students learn, not distract them. minimal_clean with cinematic timing produces quiet, readable captions that respect the educational content.
Connected
How caption styles connect to the stack
Generates the timed caption events that styles are applied to.
Produces the script and effect tags that drive per-segment style adjustments.
Voice pacing determines caption chunk timing — faster delivery means shorter display windows.
Narration energy pairs with caption energy. Upbeat voice matches highlight_brand.
Locks presets per workspace — the enforcement layer for brand-consistent styling.
FAQ
Common questions about caption styles
Can I create custom presets beyond the three built-in ones?
Not yet. Outbox ships with three presets that cover the most common use cases. Custom preset creation is on the roadmap for teams that need bespoke styling — custom brand colors, specific fonts, unique positioning rules.
Can I use different presets for different videos in the same workspace?
Yes. The workspace default sets the starting point, but individual pipeline runs can override the preset. Useful when a workspace produces both long-form (Classic Bold) and short-form (Highlight Brand) content.
Do caption styles affect active-word highlighting?
Yes. Active-word highlighting uses the preset's highlight color (default: #adff2f neon green). The highlight color is configurable per run and the visual intensity adapts to the preset.
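In ASS terms, active-word highlighting can be expressed with inline override tags: `\c` switches the primary colour for one word and `\r` resets to the line's style. The sketch below shows that idea for a single chunk. It assumes one subtitle event is emitted per active word, which this page does not confirm, and `highlight_active_word` is a hypothetical helper.

```python
def highlight_active_word(words, active_index, highlight_hex="#adff2f"):
    """Return ASS dialogue text with the active word wrapped in a colour override."""
    rgb = highlight_hex.lstrip("#")
    # Inline ASS colours use blue-green-red order: &HBBGGRR&
    bgr = (rgb[4:6] + rgb[2:4] + rgb[0:2]).upper()
    parts = []
    for i, word in enumerate(words):
        if i == active_index:
            parts.append("{\\c&H" + bgr + "&}" + word + "{\\r}")
        else:
            parts.append(word)
    return " ".join(parts)
```

Stepping `active_index` through the chunk, one event per word timing, gives the karaoke effect while the rest of the text keeps the preset's colour.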
What happens to captions on a video I've already rendered?
Captions are burned into the video file. To change the style, re-run from the captions stage. Upstream stages (analyze, script, voiceover, align) stay cached.
Do styles look the same across YouTube, TikTok, and Instagram?
The caption styling itself (font, color, outline) is identical — it's baked into the video pixels. What changes is the layout: positioning and margins adapt to each platform's safe areas.
How do I switch timing modes?
Set caption_timing_mode to 'short' or 'cinematic' in your pipeline configuration, or select it in the Outbox dashboard when configuring a run.
Will custom fonts be supported?
Custom font support is planned. Current presets use Arial and Helvetica because they're available in every rendering environment. Custom fonts will require uploading a font file to your workspace.
Get started
Pick a preset. Run the pipeline. Captions match your brand.
Upload your footage. Select a caption preset. Choose a timing mode. Styled captions generate, burn into the video, and flow to publishing — automatically. Not sure? Start with classic_bold. It's the default for a reason.