
Automating Audio Narration with Sanity Blueprints and Google Text-to-Speech

Published: 22.12.2025 · Last updated: 25.12.2025



I recently built an automated audio narration pipeline for my blog using Sanity, Next.js, and Google Cloud Text-to-Speech. The goal was simple: whenever the actual content of a blog post changes, regenerate a high-quality MP3 narration automatically — without manual steps, wasted API calls, or accidental infinite loops.

The key to making this reliable was leaning into Sanity's native automation tools, especially Blueprints, delta detection, and GROQ projections.

The Core Idea

Instead of generating audio on every page request, the system reacts to content changes at the CMS level:

  1. A blog post is updated in Sanity
  2. A Blueprint triggers only on document updates
  3. Sanity's delta function is used to detect whether the body field actually changed
  4. If (and only if) the body changed, Sanity calls a secure webhook
  5. A Next.js API route generates narration using Google Text-to-Speech
  6. The resulting MP3 is uploaded back into Sanity as a file asset
  7. The post is patched with a reference to the audio file

This ensures narration is generated once per meaningful content change, not per request or per publish event.
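
Concretely, steps 4 through 7 can live in a single Next.js route handler. Below is a minimal sketch with a few assumptions flagged up front: the route path, the audioFile field name, and a webhook projection that delivers {_id, body} are illustrative choices, not the only way to wire it.

```ts
// app/api/narration/route.ts — a sketch of the webhook handler (steps 4–7).
import {NextRequest, NextResponse} from 'next/server'
import {isValidSignature, SIGNATURE_HEADER_NAME} from '@sanity/webhook'
import {createClient} from '@sanity/client'
import textToSpeech from '@google-cloud/text-to-speech'
import {toPlainText} from '@portabletext/toolkit' // flattens Portable Text to a string

const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: 'production',
  apiVersion: '2025-01-01',
  token: process.env.SANITY_WRITE_TOKEN!, // server-side only, never shipped to the client
  useCdn: false,
})

const tts = new textToSpeech.TextToSpeechClient()

export async function POST(req: NextRequest) {
  // Verify the signature so only Sanity can trigger a (billable) TTS run
  const raw = await req.text()
  const signature = req.headers.get(SIGNATURE_HEADER_NAME) ?? ''
  if (!(await isValidSignature(raw, signature, process.env.SANITY_WEBHOOK_SECRET!))) {
    return NextResponse.json({message: 'Invalid signature'}, {status: 401})
  }

  // Assumes the webhook projection delivers {_id, body}
  const {_id, body} = JSON.parse(raw)

  // Synthesize an MP3 from the plain-text version of the post body.
  // NB: synthesizeSpeech caps input at ~5,000 bytes, so very long posts need chunking.
  const [response] = await tts.synthesizeSpeech({
    input: {text: toPlainText(body)},
    voice: {languageCode: 'en-US', name: 'en-US-Neural2-D'},
    audioConfig: {audioEncoding: 'MP3'},
  })

  // Upload the MP3 back into Sanity as a file asset
  const asset = await sanity.assets.upload(
    'file',
    Buffer.from(response.audioContent as Uint8Array),
    {filename: `${_id}-narration.mp3`, contentType: 'audio/mpeg'},
  )

  // Patch the post with a reference to the new audio file
  await sanity
    .patch(_id)
    .set({audioFile: {_type: 'file', asset: {_type: 'reference', _ref: asset._id}}})
    .commit()

  return NextResponse.json({ok: true})
}
```

The write token and webhook secret live only in server-side environment variables, which is what keeps secrets off the client.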

Why Blueprints + Delta Matter

Sanity already knows exactly what changed in a document. By using delta, I avoided fragile workarounds like hashing Portable Text or doing deep JSON diffs. Combined with GROQ filters and projections, the Blueprint stays declarative, readable, and easy to maintain.
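
The Blueprint itself ends up being a short declarative config. The sketch below follows the shape Sanity documents for document functions; the function name is a placeholder, and exact option names may differ slightly between releases.

```ts
// blueprint.config.ts — a sketch based on Sanity's documented Blueprint shape
import {defineBlueprint, defineDocumentFunction} from '@sanity/blueprints'

export default defineBlueprint({
  resources: [
    defineDocumentFunction({
      name: 'generate-narration', // placeholder name
      src: './functions/generate-narration', // its handler forwards the payload to the Next.js route
      event: {
        on: ['publish'], // react to document updates...
        // ...but only when the body field itself changed
        filter: "_type == 'post' && delta::changedAny(body)",
        projection: '{_id, body}', // what gets sent to the webhook
      },
    }),
  ],
})
```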

To prevent infinite loops (uploading the audio also counts as a document update), the function only fires when the delta shows the body text actually changed, or when the post has no narration file yet.
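
In GROQ terms, the guard is roughly the following (audioFile is again a placeholder field name):

```ts
// Sketch of the loop-safe event filter. The patch that attaches the MP3 never
// touches body, so delta::changedAny(body) stays false on that update and the
// function does not retrigger itself.
const filter = `_type == "post" && (delta::changedAny(body) || !defined(audioFile))`
```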

Why Store Audio in Sanity?

For a personal blog, storing the MP3 directly in Sanity works surprisingly well:

  • The audio is served via Sanity's CDN
  • Each post owns its narration
  • No extra storage service is required
  • Editorial state and content stay in one place
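
On the schema side, all this needs is a single file field on the post document. A minimal sketch, using the same placeholder audioFile name as above; marking it readOnly is a design choice so editors can't overwrite the generated audio:

```ts
import {defineField, defineType} from 'sanity'

export const post = defineType({
  name: 'post',
  type: 'document',
  fields: [
    // ...title, body, and the rest of the post schema
    defineField({
      name: 'audioFile',
      title: 'Narration audio',
      type: 'file',
      options: {accept: 'audio/mpeg'}, // restrict uploads to MP3
      readOnly: true, // written by the automation, not by editors
    }),
  ],
})
```

On the front end, the narration URL then resolves with an ordinary projection such as audioFile.asset->url — no extra storage service involved.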

The Result

The end result is a fully automated, content-driven audio system:

  • No manual triggers
  • No unnecessary TTS calls
  • No client-side secrets
  • Clean separation of concerns

This setup scales well, respects cost limits, and fits naturally into Sanity's content model — exactly what I was aiming for.