I Built an AI That Wrote Me a Country Breakup Song

· Source: Siraj Raval · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Generative AI for Media · Depth: Intermediate, medium

Summary

An AI-generated country breakup song was created in 8 minutes using Fish Audio's S2 Pro, a platform for end-to-end text-to-song generation with fine-grained emotional control. The process involved recording 10-210 seconds of audio for voice cloning, then programmatically embedding natural language emotion tags such as "whisper like a man who just lost his dog" into AI-generated lyrics. S2 Pro is built on an open-source 4.4 billion parameter model trained on 10 million hours of audio across 50 languages. The model uses a two-stage architecture: a 4 billion parameter model takes the text and emotion input and feeds a 400 million parameter model that outputs the raw audio waveform. It is tuned with reinforcement learning using Group Relative Policy Optimization (GRPO) and is reported to outperform competitors such as ElevenLabs and Inworld in benchmarks. The final song was produced by combining the AI vocal with a backing track in CapCut and was uploaded to Spotify via RouteNote.
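For readers unfamiliar with GRPO, its core idea is to score each sampled output relative to the other outputs in its group rather than against a learned value function. The sketch below shows only that group-relative advantage step in general form. The source does not describe how Fish Audio computes rewards or applies the update, so the reward values and the function name here are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    This is the generic GRPO advantage step, not Fish Audio's training code.
    """
    baseline = mean(rewards)   # group mean acts as the baseline
    spread = pstdev(rewards)   # population standard deviation of the group
    # eps avoids division by zero when every sample gets the same reward
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four audio samples generated from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Samples scoring above the group mean get a positive advantage and are reinforced; those below it get a negative one.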

Key takeaway

For creative technologists and musicians exploring AI-driven content creation, Fish Audio's S2 Pro offers a fast way to produce emotionally expressive songs in a custom voice. You can clone your voice from a short recording and steer the delivery with natural language emotion tags, which in the source's demo brought production time down to 8 minutes. Consider using the platform to prototype or finalize tracks quickly when a personalized vocal matters more than studio polish.

Key insights

Fish Audio's S2 Pro enables rapid, emotionally nuanced AI song generation via voice cloning and natural language emotion tagging.

Principles

Method

Record 10-210 seconds of audio for voice cloning. Write the lyrics with natural language emotion tags embedded in them. Generate the vocal audio, then combine it with a backing track. Upload the finished song to distribution platforms.
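The tagging step can be sketched in a few lines of Python. This is a minimal illustration of embedding emotion tags into lyrics programmatically, not the author's script. The bracketed "[tag] line" syntax, the function name, and the sample lyrics are assumptions, so check the platform's documentation for the tag format it actually accepts.

```python
from typing import Dict, List


def tag_lyrics(lines: List[str], tags_by_line: Dict[int, str]) -> str:
    """Prefix selected lyric lines with a natural language emotion tag.

    The "[tag] text" bracket syntax is assumed for illustration only.
    """
    tagged = []
    for index, line in enumerate(lines):
        tag = tags_by_line.get(index)
        # Lines without an entry in tags_by_line pass through unchanged.
        tagged.append(f"[{tag}] {line}" if tag else line)
    return "\n".join(tagged)


# Hypothetical placeholder lyrics; the tag text is the example from the source.
lyrics = [
    "She took the truck and left the key",
    "Now it's just this old guitar and me",
]
tags = {0: "whisper like a man who just lost his dog"}
tagged_lyrics = tag_lyrics(lyrics, tags)
print(tagged_lyrics)
```

The resulting text is what would be sent to the speech model. Keeping the tags in a separate mapping makes it easy to try a different delivery without touching the lyrics.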

Topics

Best for: AI Engineer, AI Student, Creative Technologist

Editorial summary, takeaway, and curation by AIssential. Original article published by Siraj Raval.