
Auto-Add Captions to Every Video Your Team Uploads (n8n + FFmpeg)

Javid Jamae · 6 min read

Every video your team publishes needs captions: accessibility requirements, muted social media autoplay, and SEO all depend on them. But nobody wants to sit there adding subtitles by hand, and outsourcing transcription adds days of turnaround.

You can automate the entire thing. This n8n workflow watches a folder (Google Drive, Dropbox, or an S3 bucket), transcribes new videos with FFmpeg Micro's Whisper-based API, burns the captions directly into the video, and drops the result wherever you need it. Your team uploads a video. A few minutes later, the captioned version appears. No manual steps.

What you're building

The workflow has four stages:

  • Trigger: n8n watches for new video files in a cloud folder
  • Transcribe: FFmpeg Micro generates an SRT subtitle file from the audio track
  • Burn captions: A second FFmpeg Micro call overlays the SRT onto the video
  • Deliver: The captioned video gets saved back to your storage or sent to Slack

Every stage runs over plain HTTP requests: one call to start a job, then a short polling loop until it finishes. No FFmpeg installed anywhere. No GPU instances. No separate transcription service accounts.

Prerequisites

You'll need an n8n instance (cloud or self-hosted), an FFmpeg Micro API key from ffmpeg-micro.com, and a cloud storage folder your team already uses for video files.

Step 1: Set up the trigger

In n8n, create a new workflow and add a trigger node for your storage provider. Google Drive and Dropbox both have native trigger nodes. For S3, use the Schedule Trigger and check for new objects.

The trigger should fire when a new video file lands in a specific folder. Filter for video MIME types (video/mp4, video/quicktime, video/webm) so you don't accidentally process PDFs.
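If you would rather do this filtering in a Code node than in the trigger's settings, the check is small. Here is a minimal Python sketch of it. The function name is made up for illustration, and it assumes the trigger hands you the file's MIME type as a string.

```python
# MIME types named in this article; extend the set if your team uploads other formats.
VIDEO_MIME_TYPES = {"video/mp4", "video/quicktime", "video/webm"}


def is_video(mime_type: str) -> bool:
    """Return True only for the video MIME types the workflow should process."""
    # Drop parameters such as "; codecs=vp9" and normalise case before comparing.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    return base_type in VIDEO_MIME_TYPES
```

Anything that returns False (a PDF, a thumbnail image) can simply be dropped before the transcribe step.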

Step 2: Transcribe the audio to SRT

Add an HTTP Request node that calls FFmpeg Micro's transcribe endpoint. This uses Whisper under the hood to generate an SRT subtitle file from the video's audio track.

{
  "method": "POST",
  "url": "https://api.ffmpeg-micro.com/v1/transcribe",
  "headers": {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  "body": {
    "media_url": "{{ $json.fileUrl }}",
    "language": "en"
  }
}

The media_url field accepts any public URL. If your storage provides signed URLs (like S3 presigned URLs or Supabase signed URLs), those work too. The language field is optional. Leave it out and Whisper will auto-detect.

This returns a job with status: "queued". You need to poll until it's done.
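If it helps to see the same request outside n8n, here is a rough Python sketch that assembles it. It only builds the request and does not send anything. The helper name is hypothetical, and the field names are copied from the JSON above rather than from an API reference.

```python
import json
from typing import Optional

API_BASE = "https://api.ffmpeg-micro.com/v1"


def build_transcribe_request(media_url: str, api_key: str,
                             language: Optional[str] = None) -> dict:
    """Assemble the POST /v1/transcribe request described in this step."""
    body = {"media_url": media_url}
    # The language field is optional; leaving it out lets Whisper auto-detect.
    if language is not None:
        body["language"] = language
    return {
        "method": "POST",
        "url": f"{API_BASE}/transcribe",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }
```

Calling build_transcribe_request(url, key) with no language produces a body containing only media_url, which matches the auto-detect behavior described above.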

Step 3: Poll until transcription completes

Add a loop that checks the transcribe job status every 3 seconds. In n8n, use an HTTP Request node inside a Loop Over Items node, or use a Wait node with a loop-back connection.

The simplest pattern:

  1. HTTP Request to GET https://api.ffmpeg-micro.com/v1/transcribe/{{ $json.id }}
  2. IF node checking {{ $json.status }} equals completed
  3. If not completed, Wait 3 seconds and loop back to the GET request
  4. If completed, continue to the next step
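The four steps above boil down to a small loop. Here is a minimal Python sketch of the same logic, with the status check passed in as a function so it is not tied to any HTTP library. The max_attempts cap is an addition that is not part of the n8n pattern above; it is there so a stuck job cannot loop forever.

```python
import time
from typing import Callable


def poll_until_completed(
    fetch_status: Callable[[], str],
    interval_seconds: float = 3.0,
    max_attempts: int = 200,  # assumption: a safety cap, not an API limit
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch_status until it returns "completed"; return how many checks it took."""
    for attempt in range(1, max_attempts + 1):
        if fetch_status() == "completed":
            return attempt
        # Not done yet: wait, then loop back to the status check.
        sleep(interval_seconds)
    raise TimeoutError(f"job not completed after {max_attempts} checks")
```

In a real run, fetch_status would perform the GET request from step 1 and return the status field of the response.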

Once the status is completed, grab the SRT download URL:

{
  "method": "GET",
  "url": "https://api.ffmpeg-micro.com/v1/transcribe/{{ $json.id }}/download",
  "headers": {
    "Authorization": "Bearer YOUR_API_KEY"
  }
}

This returns a signed URL for the SRT file that's valid for 10 minutes. You don't need to download the SRT yourself. Just pass the URL to the next step.

Step 4: Burn captions into the video

Now you have two URLs: the original video and the generated SRT file. Send both to FFmpeg Micro's transcode endpoint to burn the subtitles directly into the video.

{
  "method": "POST",
  "url": "https://api.ffmpeg-micro.com/v1/transcodes",
  "headers": {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
  },
  "body": {
    "inputs": [{ "url": "{{ $json.originalVideoUrl }}" }],
    "outputFormat": "mp4",
    "options": [
      { "option": "-vf", "argument": "subtitles={{ $json.srtUrl }}" }
    ]
  }
}

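The -vf subtitles= filter renders the SRT text into the video frames. The captions become part of the video itself, so they show up everywhere, not just in players that support SRT tracks. Keep in mind that in n8n, $json refers to the output of the previous node, so originalVideoUrl and srtUrl above stand for values you have carried forward from the trigger and the SRT download step (for example with an Edit Fields node).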

Poll this job the same way you polled the transcribe job. When it's completed, grab the download URL from /v1/transcodes/{id}/download.
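To make it clearer where the two URLs end up, here is a small Python sketch that builds the transcode body from this step. The helper name is hypothetical, and the structure is copied from the JSON above rather than from an API reference.

```python
def build_transcode_body(video_url: str, srt_url: str,
                         output_format: str = "mp4") -> dict:
    """Build the POST /v1/transcodes body that burns an SRT file into a video."""
    return {
        # The original video is the single input.
        "inputs": [{"url": video_url}],
        "outputFormat": output_format,
        # The SRT URL is passed to FFmpeg's subtitles filter through -vf.
        "options": [{"option": "-vf", "argument": f"subtitles={srt_url}"}],
    }
```

One caveat: in stock FFmpeg, a colon inside a filter argument normally has to be escaped. The request in this article passes the signed URL as is, so the sketch does the same and assumes the API copes with it. If a job fails on the filter, that is the first thing worth checking.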

Step 5: Save the captioned video

The final step depends on where your team wants the output. Common options:

Google Drive: Use n8n's Google Drive node to upload the file to a "Captioned" subfolder.

Slack notification: Post the download URL to a channel so the team knows it's ready.

S3/Supabase: Upload to your storage bucket with a predictable naming convention like captioned-{original-filename}.mp4.

Add an HTTP Request node to download the processed video from the FFmpeg Micro signed URL, then pipe it to your output node.
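For the naming convention mentioned under S3/Supabase, a minimal Python sketch might look like this. The function name is made up, and it assumes the output is always mp4, as in the transcode request from Step 4.

```python
from pathlib import PurePosixPath


def captioned_name(original: str) -> str:
    """Turn an uploaded file's path or name into captioned-{original-filename}.mp4."""
    # stem drops the folder part and the final extension, e.g. "demo" from "Team Videos/demo.mov".
    stem = PurePosixPath(original).stem
    return f"captioned-{stem}.mp4"
```

For example, captioned_name("Team Videos/demo.mov") gives "captioned-demo.mp4", so a .mov upload and its mp4 output stay easy to pair up.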

The complete n8n workflow

The finished workflow, node by node:

  1. Google Drive Trigger (new file in /Team Videos/)
  2. HTTP Request: POST /v1/transcribe (start transcription)
  3. Loop: Poll GET /v1/transcribe/{id} until completed
  4. HTTP Request: GET /v1/transcribe/{id}/download (get SRT URL)
  5. HTTP Request: POST /v1/transcodes (burn captions)
  6. Loop: Poll GET /v1/transcodes/{id} until completed
  7. HTTP Request: GET /v1/transcodes/{id}/download (get final video)
  8. Google Drive Upload: Save to /Team Videos/Captioned/

Total: 8 steps (each polling loop is a few nodes: an HTTP Request, an IF, and a Wait), zero code, runs automatically.

Common pitfalls

The SRT download URL expires in 10 minutes. The signed URL from /v1/transcribe/{id}/download is only valid for that window, so don't add a long delay between fetching it and sending the transcode request. Start the transcode immediately after getting the SRT URL, especially if your workflow has other steps in between.
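If your workflow does have steps between the two calls, a guard like the following Python sketch can tell you whether to fetch a fresh SRT URL first. The 10-minute window comes from this article. The one-minute safety margin and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

SRT_URL_TTL = timedelta(minutes=10)  # validity window stated in this article


def srt_url_usable(issued_at: datetime, now: datetime,
                   safety_margin: timedelta = timedelta(minutes=1)) -> bool:
    """Return True while the signed SRT URL has more than safety_margin left."""
    return now < issued_at + SRT_URL_TTL - safety_margin


# Usage: record the time when you call the download endpoint, then check before transcoding.
issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh_enough = srt_url_usable(issued, issued + timedelta(minutes=5))
```

If the check fails, calling the download endpoint again for a new signed URL is likely cheaper than debugging a failed transcode.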

Publicly reachable URLs only. FFmpeg Micro needs to download your video file, so the URL you send must work without a login. If your Google Drive or Dropbox file isn't publicly accessible, you'll need to generate a sharing link or use a signed URL. Many n8n storage nodes return a download link, but confirm that it opens without authentication before relying on it.

Caption styling is default. The basic subtitles= filter uses FFmpeg's default white-text-with-black-outline style. If you want custom fonts, colors, or positioning, you'll need to pass additional FFmpeg filter options. Check the FFmpeg subtitles filter documentation for styling parameters.
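As a starting point, stock FFmpeg's subtitles filter takes a force_style option with ASS style fields such as FontName and FontSize. The Python sketch below builds that filter string. Whether FFmpeg Micro accepts the extra option in the -vf argument is an assumption to verify, and the helper name is hypothetical.

```python
from typing import Dict, Optional


def subtitles_filter(srt_path: str, style: Optional[Dict[str, str]] = None) -> str:
    """Build a subtitles= filter string, optionally with force_style overrides."""
    if not style:
        # No overrides: FFmpeg's default caption style applies.
        return f"subtitles={srt_path}"
    # force_style takes comma-separated Key=Value pairs using ASS style field names.
    overrides = ",".join(f"{key}={value}" for key, value in style.items())
    return f"subtitles={srt_path}:force_style='{overrides}'"
```

For example, subtitles_filter("captions.srt", {"FontName": "Arial", "FontSize": "28"}) yields subtitles=captions.srt:force_style='FontName=Arial,FontSize=28'. The example uses a plain filename because a URL in the same position may need extra escaping.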

Language detection isn't perfect. If your team records in a specific language, set the language field explicitly in the transcribe request instead of relying on auto-detection. Auto-detect works well for common languages but can struggle with code-switching or heavy accents.

FAQ

How accurate is the transcription?
FFmpeg Micro uses Whisper for transcription, which handles English and most European languages very well. For clean audio with a single speaker, expect 95%+ accuracy. Background noise, multiple speakers, or heavy accents will reduce accuracy.

Can I process videos longer than 10 minutes?
Yes. FFmpeg Micro handles videos of any reasonable length. Transcription time scales roughly linearly with audio duration. A 10-minute video typically takes 1-2 minutes to transcribe. Billing is per minute of input media.

What about non-English content?
Set the language field to the appropriate BCP-47 code (like "es" for Spanish or "fr" for French). You can also use "task": "translate" to transcribe foreign-language audio and output the subtitles in English.

Does this work with n8n Cloud?
Yes. Everything runs through HTTP requests to the FFmpeg Micro API. No Execute Command nodes, no local FFmpeg binaries. n8n Cloud handles it just fine.

What does this cost per video?
Two FFmpeg Micro API calls per video: one for transcription, one for burning captions. Both bill per minute of input, so a 5-minute video is billed as roughly 10 minutes of processing. The free tier is enough to test the workflow. Production pricing starts at $19/month.
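To estimate usage for a batch of videos, the arithmetic can be sketched in Python as below. It assumes both calls bill on the full input duration with no rounding or minimum charge, which this article does not specify. The function name is made up.

```python
from typing import Iterable


def billed_minutes(video_minutes: Iterable[float], calls_per_video: int = 2) -> float:
    """Estimate total billed minutes: each video is processed once per API call."""
    # Two calls per video by default: one transcription, one caption burn.
    return sum(video_minutes) * calls_per_video
```

So billed_minutes([5]) is 10, matching the example above, and a week of uploads such as billed_minutes([5, 12, 3]) comes to 40 billed minutes.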

About Javid Jamae

Founder & CEO at FFmpeg Micro

Javid is a software engineer, author, and entrepreneur with over 25 years of professional software development experience across enterprise, startup, and consulting environments. He founded FFmpeg Micro to make video processing accessible to developers through a simple, automation-first REST API.
