
RSS Feed to YouTube Short: Build the Full Pipeline

Javid Jamae · 7 min read

Every time a new article drops in your RSS feed, you could turn it into a YouTube Short. Not manually. Not with a VA. Fully automated: new post triggers the pipeline, and a finished Short lands in your upload queue.

The pieces exist. RSS triggers, LLMs that summarize text, text-to-speech APIs, stock background video, and a video composition API that stitches it all together. The trick is wiring them into a pipeline that actually runs end to end without you touching it.

The Pipeline Architecture

Five stages, each handled by a different tool:

  • RSS trigger catches new articles (n8n, Make.com, or a cron job)
  • LLM rewrites the article into a 30-60 second script (Claude, GPT-4, or any API)
  • TTS converts the script to audio (ElevenLabs, OpenAI TTS, or Google Cloud TTS)
  • FFmpeg Micro composites the audio over background video with text overlay
  • Upload pushes the finished Short to YouTube (YouTube Data API)

Each stage produces one artifact that feeds the next. If any stage fails, the pipeline stops cleanly without leaving half-finished garbage.
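
The "one artifact in, one artifact out, stop on failure" contract can be sketched in a few lines of Python. This is a minimal illustration, not how n8n implements it; `run_pipeline` and `StageError` are hypothetical names, and each stage is assumed to be a plain function that raises on failure.

```python
from typing import Any, Callable, Sequence, Tuple


class StageError(RuntimeError):
    """Raised when a named pipeline stage fails (hypothetical helper)."""


def run_pipeline(stages: Sequence[Tuple[str, Callable[[Any], Any]]], item: Any) -> Any:
    """Run stages in order, feeding each stage's output to the next one."""
    artifact = item
    for name, stage in stages:
        try:
            artifact = stage(artifact)
        except Exception as exc:
            # Stop at the first failure so later stages never see partial output.
            raise StageError(f"stage '{name}' failed: {exc}") from exc
    return artifact
```

In a real pipeline the stage list would hold your script, TTS, compose, and upload functions, and a `StageError` would be the signal to log the article and move on.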

Stage 1: RSS Trigger

In n8n, drop an RSS Feed Trigger node pointing at your source feed. It fires every time a new item appears.

{
  "feedUrl": "https://techcrunch.com/feed/",
  "pollInterval": 15
}

The trigger outputs the article title, link, and content body. That body is what the LLM will work with.

If you're not using n8n, a simple cron script works too. Fetch the RSS XML, parse it, check against a list of already-processed GUIDs, and pass new items downstream.
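
Here's a rough sketch of that cron approach using only the Python standard library. It assumes a plain RSS 2.0 feed where the article text lives in `<description>`; the `new_items` name and the returned dictionary keys are illustrative choices, not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def new_items(rss_xml: str, seen_guids: Set[str]) -> List[Dict[str, str]]:
    """Return RSS items whose GUID has not been processed yet."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link") or ""
        if not guid or guid in seen_guids:
            continue
        fresh.append({
            "guid": guid,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "body": item.findtext("description", default=""),
        })
    return fresh
```

Fetch the feed however you like (for example with `urllib.request.urlopen`), pass the XML in, and persist the GUIDs you've handled to a file or database so the next cron run skips them.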

Stage 2: LLM Script Generation

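Feed the article body to an LLM with a system prompt that outputs a tight script. This pipeline targets the classic 60-second Short (YouTube now accepts Shorts up to three minutes, but the one-minute format is what we're building here), so the script should land between 120 and 180 words. At a natural speaking pace of roughly 0.4 seconds per word (about 150 words per minute), 150 words fills the full minute, so scripts near the top of that range only fit if the TTS voice reads a little faster.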

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 512,
  "system": "You are a social media scriptwriter. Convert the article into a 60-second video script. Output ONLY the narration text, no stage directions. Keep it between 120-180 words. Start with a hook. End with a takeaway.",
  "messages": [
    {
      "role": "user",
      "content": "Article: {{article_body}}"
    }
  ]
}

The output is plain text narration. No formatting, no SSML tags, just words someone would say out loud.
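
LLMs don't always respect word limits, so it's worth checking the script before paying for TTS. A minimal sketch, assuming a speaking pace of 150 words per minute (0.4 seconds per word); the function names and the 60-second default are illustrative.

```python
def estimated_seconds(script: str, words_per_minute: int = 150) -> float:
    """Estimate narration length from the word count."""
    return len(script.split()) * 60 / words_per_minute


def fits_short(script: str, max_seconds: float = 60.0) -> bool:
    """True when the estimated narration fits inside the target duration."""
    return estimated_seconds(script) <= max_seconds
```

If `fits_short` returns False, you can ask the LLM for a tighter rewrite instead of letting an overlong narration reach the composition step. Real TTS pacing varies by voice, so treat the estimate as a guardrail rather than a measurement.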

Stage 3: Text-to-Speech

Send the script to a TTS API. ElevenLabs gives the most natural results. OpenAI's TTS is solid and cheaper. Google Cloud TTS works if you're already in the GCP ecosystem.

curl -s -X POST "https://api.openai.com/v1/audio/speech" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Your generated script text here...",
    "voice": "onyx",
    "response_format": "mp3"
  }' --output narration.mp3

Upload the resulting audio file to FFmpeg Micro so it's accessible for the composition step:

# Measure the real file size so the presign and confirm requests match the upload
FILE_SIZE=$(wc -c < narration.mp3 | tr -d ' ')

# Get presigned upload URL
PRESIGN=$(curl -s -X POST "https://api.ffmpeg-micro.com/v1/upload/presigned-url" \
  -H "Authorization: Bearer $FFMPEG_MICRO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"filename": "narration.mp3", "contentType": "audio/mpeg", "fileSize": '"$FILE_SIZE"'}')

UPLOAD_URL=$(echo "$PRESIGN" | python3 -c "import sys,json; print(json.load(sys.stdin)['result']['uploadUrl'])")
FILENAME=$(echo "$PRESIGN" | python3 -c "import sys,json; print(json.load(sys.stdin)['result']['filename'])")

# Upload the file
curl -s -X PUT "$UPLOAD_URL" \
  -H "Content-Type: audio/mpeg" \
  --data-binary @narration.mp3

# Confirm upload
curl -s -X POST "https://api.ffmpeg-micro.com/v1/upload/confirm" \
  -H "Authorization: Bearer $FFMPEG_MICRO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"filename": "'"$FILENAME"'", "fileSize": '"$FILE_SIZE"'}'

Stage 4: Compose with FFmpeg Micro

This is where it comes together. You have a background video (stock footage, animated gradient, or a looping clip) and the narration audio. FFmpeg Micro merges them and adds a text overlay with the article headline.

curl -s -X POST "https://api.ffmpeg-micro.com/v1/transcodes" \
  -H "Authorization: Bearer $FFMPEG_MICRO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      {"url": "https://your-bucket.com/background-9x16.mp4"},
      {"url": "gs://your-bucket/'"$FILENAME"'"}
    ],
    "outputFormat": "mp4",
    "options": [
      {"option": "@text-overlay", "argument": {
        "text": "'"$ARTICLE_TITLE"'",
        "style": {
          "charsPerLine": 18,
          "fontSize": 60,
          "lineSpacing": 15,
          "y": "(h-text_h)/2",
          "boxBorderW": 12
        }
      }}
    ]
  }'

FFmpeg Micro takes the two inputs (video and audio), composites them, and adds the text overlay with auto word-wrap. The @text-overlay virtual option handles font rendering, positioning, and line breaking without you writing raw drawtext filter syntax.
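
One thing to watch: splicing a headline directly into a JSON string breaks as soon as the title contains a double quote or a backslash. A safer route is to build the request body with a JSON serializer. The sketch below reuses the field names from the request above; `build_transcode_payload` is a hypothetical helper, not part of the FFmpeg Micro API.

```python
import json


def build_transcode_payload(background_url: str, audio_url: str, title: str) -> str:
    """Serialize the transcode request so the headline is escaped correctly."""
    payload = {
        "inputs": [{"url": background_url}, {"url": audio_url}],
        "outputFormat": "mp4",
        "options": [{
            "option": "@text-overlay",
            "argument": {
                "text": title,  # json.dumps escapes quotes and backslashes
                "style": {
                    "charsPerLine": 18,
                    "fontSize": 60,
                    "lineSpacing": 15,
                    "y": "(h-text_h)/2",
                    "boxBorderW": 12,
                },
            },
        }],
    }
    return json.dumps(payload)
```

Send the returned string as the request body with the same headers as the curl call.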

Poll for completion:

# Poll until done (typically 10-30 seconds for a 60s Short)
JOB_ID="<from previous response>"
STATUS=""
for attempt in $(seq 1 100); do   # give up after roughly 5 minutes
  STATUS=$(curl -s "https://api.ffmpeg-micro.com/v1/transcodes/$JOB_ID" \
    -H "Authorization: Bearer $FFMPEG_MICRO_API_KEY" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
    break
  fi
  sleep 3
done

# Stop unless the job succeeded; a failed or timed-out job has nothing to download
if [ "$STATUS" != "completed" ]; then
  echo "Transcode did not complete (status: $STATUS)" >&2
  exit 1
fi

# Get download URL
DOWNLOAD_URL=$(curl -s "https://api.ffmpeg-micro.com/v1/transcodes/$JOB_ID/download" \
  -H "Authorization: Bearer $FFMPEG_MICRO_API_KEY" | python3 -c "import sys,json; print(json.load(sys.stdin)['url'])")
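
If you script this stage outside the shell, the polling step might look like the sketch below. It assumes the job reports the same "completed" and "failed" statuses used above; `wait_for_job` and its parameters are illustrative, and `get_status` stands in for whatever function calls the status endpoint.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        # Give up rather than sleeping past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} seconds")
        sleep(interval_s)
```

The caller should only request the download URL when the return value is "completed". The timeout matters in an unattended pipeline: without it, one stuck job blocks every article behind it.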

Stage 5: Upload to YouTube

Download the finished video and push it to YouTube via the Data API. You'll need OAuth2 credentials with the youtube.upload scope.

# Download the composed video
curl -s -o short.mp4 "$DOWNLOAD_URL"

# Upload to YouTube (simplified - real implementation needs OAuth token refresh)
# --data-binary sends the file byte-for-byte; plain -d strips newlines and corrupts the video
curl -s -X POST "https://www.googleapis.com/upload/youtube/v3/videos?part=snippet,status" \
  -H "Authorization: Bearer $YOUTUBE_ACCESS_TOKEN" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @short.mp4

In practice, you'd use the YouTube API client library for proper multipart uploads with metadata (title, description, tags, #Shorts hashtag). n8n has a built-in YouTube node that handles this.
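
Whichever upload route you pick, the metadata is a small JSON body with `snippet` and `status` parts. A minimal sketch of building it from the article, assuming you want the #Shorts hashtag in the title and a private upload for review; `build_short_metadata` is a hypothetical helper, and the 100-character cap reflects YouTube's title limit.

```python
from typing import Dict, List


def build_short_metadata(title: str, article_url: str, tags: List[str]) -> Dict:
    """Build the snippet/status body for a YouTube upload request."""
    suffix = " #Shorts"
    max_title = 100  # YouTube rejects titles longer than 100 characters
    trimmed = title.strip()
    if len(trimmed) + len(suffix) > max_title:
        # Leave room for the ellipsis and the hashtag suffix.
        trimmed = trimmed[: max_title - len(suffix) - 1].rstrip() + "…"
    return {
        "snippet": {
            "title": trimmed + suffix,
            "description": f"Source: {article_url}\n\n#Shorts",
            "tags": tags,
        },
        "status": {"privacyStatus": "private"},
    }
```

Uploading as private gives you a review queue. Switch `privacyStatus` to "public" once you trust the output.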

Putting It Together in n8n

The full n8n workflow has five nodes chained sequentially:

  • RSS Feed Trigger fires on new articles
  • HTTP Request node calls your LLM API for script generation
  • HTTP Request node calls TTS API and uploads audio to FFmpeg Micro
  • HTTP Request node creates the transcode job and polls for completion
  • YouTube node uploads the finished Short

Each node passes its output to the next. If the RSS feed publishes 3 articles in one poll cycle, n8n processes all three as separate executions. No batching logic needed.

Common Pitfalls

Background video shorter than narration. If your background clip is 30 seconds but the narration is 55 seconds, FFmpeg cuts the output at the shorter input. Use a looping background or pick footage longer than your max script length.

Audio format mismatch. Some TTS APIs output WAV, others MP3 or OGG. FFmpeg handles all of them, but make sure the contentType in your upload matches the actual file format. A mismatch causes the upload confirmation to fail.
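
One way to avoid the mismatch is to derive the contentType from the file's first bytes instead of trusting the extension. This is a rough heuristic covering only the three formats named above; `sniff_audio_content_type` is a hypothetical helper, and the MP3 frame-sync check can match a few other MPEG audio streams too.

```python
from typing import Optional


def sniff_audio_content_type(header: bytes) -> Optional[str]:
    """Guess the MIME type from the first 12 bytes of an audio file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "audio/wav"
    if header[:4] == b"OggS":
        return "audio/ogg"
    if header[:3] == b"ID3":
        return "audio/mpeg"  # MP3 with an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "audio/mpeg"  # bare MPEG audio frame sync
    return None
```

Read the first 12 bytes with `open(path, "rb").read(12)` and fail fast when the function returns None, before any upload request goes out.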

Shorts aspect ratio. YouTube Shorts need 9:16 vertical video (1080x1920). Your background video must already be 9:16. FFmpeg Micro can resize, but starting with the right dimensions avoids quality loss from scaling.

Rate limits on TTS APIs. ElevenLabs and OpenAI both have per-minute rate limits on their free tiers. If your RSS feed dumps 10 articles at once, you'll hit the limit. Add a delay between executions or queue them.
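
If you run the pipeline from your own script, the simplest form of "add a delay" is to space requests evenly under the provider's per-minute cap. A minimal sketch; `process_with_spacing` is an illustrative name, and the requests-per-minute figure is something you'd look up for your own plan.

```python
import time
from typing import Any, Callable, Iterable, List


def process_with_spacing(
    items: Iterable[Any],
    handler: Callable[[Any], Any],
    requests_per_minute: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Call handler on each item, pausing so calls stay under the rate limit."""
    min_gap = 60.0 / requests_per_minute
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(min_gap)  # no pause needed before the first request
        results.append(handler(item))
    return results
```

This ignores how long each request takes, so the actual rate ends up a little below the cap, which is the safe direction to be wrong in.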

FAQ

Can I use Make.com instead of n8n for this pipeline?

Yes. Make.com has HTTP modules that call the same APIs. The workflow is structurally identical: RSS trigger, HTTP to LLM, HTTP to TTS, HTTP to FFmpeg Micro, HTTP to YouTube. n8n gives you more control over error handling and retries, but Make.com works fine for simpler setups. Check the Make.com FFmpeg integration guide for setup details.

How long does the FFmpeg Micro composition step take?

For a 60-second YouTube Short, composition typically takes 10-30 seconds. The bulk of that is audio merging and text overlay rendering. FFmpeg Micro processes asynchronously, so your pipeline polls until it's done. You won't notice the wait in an automated workflow.

What does this cost to run per video?

Rough breakdown for one Short: LLM script generation costs about $0.01-0.03. TTS (OpenAI tts-1) is roughly $0.015 per 1000 characters. FFmpeg Micro charges by processing minutes, and a 60-second composition uses 1 billable minute. The free tier covers testing and low-volume pipelines.
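
To put numbers on that, here's a small estimator using the TTS rate quoted above. The per-minute rounding for FFmpeg Micro is an assumption inferred from "a 60-second composition uses 1 billable minute"; check the pricing page for the actual rule. Both function names are illustrative.

```python
import math


def tts_cost_usd(script: str, usd_per_1k_chars: float = 0.015) -> float:
    """Estimate TTS cost from the script's character count."""
    return len(script) / 1000 * usd_per_1k_chars


def billable_minutes(duration_seconds: float) -> int:
    """Assumed rule: round processing time up to whole minutes, minimum one."""
    return max(1, math.ceil(duration_seconds / 60))
```

A 150-word script runs to roughly 900 characters, which works out to about $0.0135 of TTS per Short at that rate.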

Can I add captions instead of a headline overlay?

Yes. Use FFmpeg Micro's transcribe endpoint to generate SRT subtitles from your audio, then burn them in during composition. The auto-caption guide walks through this exact flow. It adds one more API call to the pipeline but the result looks more polished.

What RSS feeds work best for this?

Industry news feeds with short, punchy articles convert best. Tech news (TechCrunch, The Verge), product launches, and AI research summaries all produce good 60-second scripts. Long-form essays and opinion pieces don't compress well into Shorts. Test with 5-10 articles before committing to a feed.

About Javid Jamae

Founder & CEO at FFmpeg Micro

Javid is a software engineer, author, and entrepreneur with over 25 years of professional software development experience across enterprise, startup, and consulting environments. He founded FFmpeg Micro to make video processing accessible to developers through a simple, automation-first REST API.
