One still image, one audio file, one bash script → a Ken Burns zoom video with the narration muxed in and auto-generated captions burned on screen. All through the FFmpeg Micro API.
No `brew install ffmpeg`. No Whisper Docker image. No Python ML stack. Just bash + curl + python3 (all preinstalled on macOS and Linux) and a free API key.
Download Resources for This Video
Get the bash script and setup guide — completely free.
What You'll Learn
- ✅ Presigned-URL uploads — push bytes directly to GCS, no egress through our API
- ✅ The `/v1/transcribe` endpoint — Whisper-powered SRT generation without hosting Whisper yourself
- ✅ Chaining transcribe → transcode — the signed SRT URL drops straight into an FFmpeg `subtitles=` filter
- ✅ Ken Burns zoom via `zoompan` — a still image becomes a moving video without a render farm
- ✅ Audio mux with `-shortest` — keep the video as long as the voice, no manual trimming
How the Script Works
One run of the script performs a six-step pipeline end-to-end:
1. Upload the image — `POST /v1/upload/presigned-url` + `PUT` + `POST /v1/upload/confirm`
2. Upload the audio — same flow, and the confirm response hands back the audio duration so the script knows how long to zoom for
3. Transcribe — `POST /v1/transcribe` with the audio's `gs://` URL, then poll until complete
4. Fetch the signed SRT URL — `GET /v1/transcribe/:id/download` returns an HTTPS URL good for 10 minutes
5. Render the final video — `POST /v1/transcodes` with one filter graph: `zoompan` + `subtitles='<signed-url>'`, and audio mapped from the second input with `-shortest`
6. Download the MP4 — `GET /v1/transcodes/:id/download`, then curl the signed URL to disk
The whole pipeline is three API calls (transcribe, transcode, download) plus a pair of uploads. Whisper runs on our side, FFmpeg runs on our side, you never install either.
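Two pieces of plumbing that pipeline leans on can be sketched as small bash helpers. The function names (`json_get`, `poll`) are ours, not the script's, and a real run would point `poll` at a curl status check against the transcribe job:

```shell
# Pull one top-level field out of a JSON API response using only python3 —
# no jq dependency needed.
json_get() {
  python3 -c "import sys, json; print(json.load(sys.stdin)['$1'])"
}

# Re-run a status command until it prints "complete", with a retry cap.
# Usage: poll <command> [args...]
poll() {
  local tries=0
  until [ "$("$@")" = "complete" ]; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && { echo "gave up waiting" >&2; return 1; }
    sleep 2
  done
}

echo '{"id":"tr_123","status":"complete"}' | json_get status   # prints: complete
```

In the real script, `poll` would wrap something like `curl -s "$API/v1/transcribe/$id" -H "Authorization: Bearer $FFMPEG_MICRO_API_KEY" | json_get status` (assuming a `status` field in the response).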
What You'll Need
- 🔧 Bash + curl + python3 — already installed on macOS and Linux
- 🔑 FFmpeg Micro API key — free tier, no credit card required
- 🖼️ A still image — JPG or PNG, landscape works best for zoom
- 🎙️ An audio file — MP3, WAV, or M4A with clear narration (music-only tracks transcribe to empty SRTs)
Cost Breakdown
Video processing minutes are billed per transcode job, rounded up — a 30-second clip consumes 1 billable minute, a 61-second clip consumes 2. Every clip in this pipeline is one transcode, so plan sizing scales directly with how many videos you render.
- • Free plan ($0/mo, 100 min): roughly 100 clips/month at ≤60s each
- • Starter plan ($19/mo, 2,000 min): roughly 2,000 clips/month
- • Pro plan ($89/mo, 12,000 min): roughly 12,000 clips/month
Check FFmpeg Micro pricing for the latest plan limits and features.
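The rounding rule above is just a ceiling division, which you can sanity-check in bash (the function name is ours, for illustration):

```shell
# Billable minutes = ceil(clip length in seconds / 60), per the rounding
# rule described above. Integer trick: (n + 59) / 60 rounds up.
billable_minutes() {
  echo $(( ($1 + 59) / 60 ))
}

billable_minutes 30    # → 1  (a 30-second clip bills a full minute)
billable_minutes 61    # → 2  (one second over rolls into the next minute)
billable_minutes 600   # → 10
```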
Run It — with sample assets (fastest)
The zip also includes a `fetch-demo-assets.sh` script that pulls a sample image and narration clip from ffmpeg-micro.com/samples, so you can watch the pipeline run end-to-end before curating your own content:
```bash
export FFMPEG_MICRO_API_KEY=sk_live_xxx
./fetch-demo-assets.sh
./zoom-and-captions.sh sample-image.jpg sample-narration.mp3 demo.mp4
open demo.mp4
```
Run It — with your own content
```bash
export FFMPEG_MICRO_API_KEY=sk_live_xxx
./zoom-and-captions.sh my-photo.jpg narration.mp3 output.mp4
```
Come back in ~60 seconds and output.mp4 is a 1280×720 H.264 + AAC file with your image slowly zooming, your voice playing, and captions crawling along the bottom in sync.
Tweaks
The script is short and readable — open it in your editor. Easy knobs:
- • Zoom speed: the `zoom+0.0005` increment per frame — lower = slower zoom
- • Max zoom level: the `min(zoom+..., 1.3)` cap — raise it for a more dramatic finish
- • Caption position: `force_style='Alignment=2'` — 2 = bottom center, 8 = top center, 1/3 = corners
- • Output resolution: `s=1280x720` in the filter — swap to `720x1280` for vertical Shorts / Reels / TikTok
- • Caption font size: `Fontsize=22` — raise it for more readable captions on mobile
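For reference, here is roughly how those knobs slot together into a single filter graph. This string is an approximation of what the script builds, not a copy of it, and the SRT URL is a placeholder:

```shell
SRT_URL="https://storage.example.com/captions.srt"   # placeholder, not a real signed URL

# zoompan: per-frame zoom increment, zoom cap, frame count (d), output size.
# d=900 assumes ~30 s of video at 30 fps — the script would derive this from
# the audio duration returned at upload time.
FILTER="zoompan=z='min(zoom+0.0005,1.3)':d=900:s=1280x720"

# subtitles: burn the signed SRT in, styled via force_style.
FILTER="$FILTER,subtitles='$SRT_URL':force_style='Alignment=2,Fontsize=22'"

echo "$FILTER"
```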
Related training
- • Watermark a video with a logo — same bash + FFmpeg Micro shape, different filter (overlay instead of zoompan + subtitles)
- • Make 100 shorts in under a minute — parallel bash pipeline driven by a CSV
Related endpoints
See the FFmpeg API documentation for the full surface this script uses. Key endpoints:
- • `POST /v1/upload/presigned-url` — generate signed PUT URLs for direct-to-GCS uploads
- • `POST /v1/transcribe` — Whisper-powered audio → SRT
- • `GET /v1/transcribe/:id/download` — signed URL for the generated SRT
- • `POST /v1/transcodes` — multi-input FFmpeg pipeline with `zoompan` + `subtitles` filters
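As a rough illustration of the multi-input transcode call, here is a guess at the shape of the request body. The field names (`inputs`, `filter_graph`, `extra_args`, `output`) are assumptions, not the documented schema — check the API docs for the real one:

```shell
# Build a hypothetical POST /v1/transcodes body with python3.
BODY=$(python3 - <<'PY'
import json
print(json.dumps({
    "inputs": ["gs://bucket/photo.jpg", "gs://bucket/narration.mp3"],
    "filter_graph": "zoompan=...,subtitles=...",   # sketch, not the real graph
    "extra_args": ["-map", "0:v", "-map", "1:a", "-shortest"],
    "output": "output.mp4",
}))
PY
)
echo "$BODY"

# The real call would look something like:
#   curl -s -X POST "$API/v1/transcodes" \
#        -H "Authorization: Bearer $FFMPEG_MICRO_API_KEY" -d "$BODY"
```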
Skip the install chore
FFmpeg Micro runs FFmpeg and Whisper on our side so you don't have to. Send URLs, get finished videos.
Get a free API key