Posted by Zalán Borsos, Research Software Engineer, and Marco Tagliasacchi, Senior Staff Research Scientist, Google Research

The recent progress in generative AI has unlocked the possibility of creating new content in several different domains, including text, vision and audio. These models often rely on first converting raw data into a compressed format, namely a sequence of tokens. In the case of audio, neural audio codecs (e.g., SoundStream or EnCodec) can efficiently compress waveforms to a compact representation, which can be inverted to reconstruct an approximation of the original audio signal. Such a representation consists of a sequence of discrete audio tokens, capturing the local properties of sounds (e.g., phonemes) and their temporal structure (e.g., prosody). Representing audio as a sequence of discrete tokens makes it possible to perform audio generation with Transformer-based sequence-to-sequence models, which has unlocked rapid progress in speech continuation (e.g., with AudioLM), text-to-speech (e.g., with SPEAR-TTS), and general audio and music generation (e.g., AudioGen and MusicLM). Many generative audio models, including AudioLM, rely on autoregressive decoding, which produces tokens one by one. While this method achieves high acoustic quality, inference (i.e., calculating an output) can be slow, especially when decoding long sequences.
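Neural codecs such as SoundStream turn audio into discrete tokens with residual vector quantization (RVQ): each frame of the encoder output is quantized by a stack of codebooks, where each level encodes the error left over by the previous levels, so every frame becomes several tokens. The pure-Python sketch below illustrates that idea with tiny hand-picked 2-D codebooks. In a real codec the codebooks are learned and the vectors come from a neural encoder; the names `rvq_encode`, `rvq_decode`, `autoregressive_steps` and `CODEBOOKS`, as well as the frame rate and level count in the example, are hypothetical and not taken from the post.

```python
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(frame: Sequence[float], codebooks: List[List[Vector]]) -> List[int]:
    """Quantize one frame embedding into one discrete token per RVQ level."""
    residual = list(frame)
    tokens: List[int] = []
    for codebook in codebooks:
        # Choose the codeword closest to what the previous levels left unexplained.
        idx = min(range(len(codebook)), key=lambda i: _sq_dist(residual, codebook[i]))
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens: Sequence[int], codebooks: List[List[Vector]]) -> Vector:
    """Approximately invert rvq_encode by summing the chosen codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def autoregressive_steps(duration_s: float, frame_rate_hz: int, num_levels: int) -> int:
    """Sequential model calls needed to emit every token one at a time."""
    return round(duration_s * frame_rate_hz) * num_levels


# Hypothetical 2-D codebooks: level 1 is coarse, level 2 refines the residual.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]

if __name__ == "__main__":
    print(rvq_encode([1.25, 0.75], CODEBOOKS))  # [1, 1]
    print(rvq_decode([1, 1], CODEBOOKS))        # [1.25, 0.75] (exact reconstruction)
    print(rvq_encode([0.9, 1.1], CODEBOOKS))    # [1, 0] (decodes to an approximation)
    # Illustrative settings only: 30 s of audio, 50 frames/s, 12 RVQ levels.
    print(autoregressive_steps(30, 50, 12))     # 18000
```

Because the token count grows with both the duration and the number of RVQ levels, a model that emits one token per forward pass needs thousands of sequential calls for tens of seconds of audio under these assumed settings, which is the slowdown described above.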

To address this issue, in “SoundStorm: Efficient Parallel Audio Generation”, we propose a new method for efficient and high-quality audio generation. SoundStorm addresses the problem of generating long audio token sequences by relying on two novel elements: 1) an architecture adapted to the specific nature of audio tokens as produced by the SoundStream neural codec, and 2) a decoding scheme inspired by MaskGIT, a recently proposed method for image generation, which is tailored to operate on audio tokens. Compared to the autoregressive decoding approach of AudioLM, SoundStorm generates tokens in parallel, which decreases inference time by 100x for long sequences, while producing audio of the same quality and with higher consistency in voice and acoustic conditions. Moreover, we show that SoundStorm, coupled with the text-to-semantic modeling stage of SPEAR-TTS, can synthesize high-quality, natural dialogues, allowing one to control the spoken content (via transcripts), speaker voices (via short voice prompts) and speaker turns (via transcript annotations), as demonstrated by the examples below:

Input: Text (transcript used to drive the audio generation in bold)

Something really funny happened to me this morning. Well, uh I woke up as usual.
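The MaskGIT-inspired decoding mentioned above replaces one-token-per-pass generation with a fixed number of refinement iterations: start from a fully masked sequence, predict every masked position in a single forward pass, commit the most confident predictions, and leave the rest masked for the next pass according to a schedule. The sketch below shows that loop on a single flat token sequence, assuming a cosine masking schedule and a hypothetical `Model` callable that returns per-position probabilities; `parallel_decode`, `masked_after` and `ToyModel` are illustrative names. It deliberately omits how SoundStorm tailors the scheme to the hierarchical SoundStream tokens, as well as other MaskGIT details such as any noise or temperature applied when ranking confidences, so it is not the paper's decoder.

```python
import math
import random
from typing import Callable, List, Optional

# Hypothetical model interface: given the current sequence (None = masked),
# return a probability distribution over the vocabulary for every position.
Model = Callable[[List[Optional[int]]], List[List[float]]]


def masked_after(step: int, num_steps: int, length: int) -> int:
    """Cosine schedule: how many positions remain masked after `step` iterations."""
    return math.floor(length * math.cos(math.pi / 2 * step / num_steps))


def parallel_decode(model: Model, length: int, num_steps: int,
                    rng: random.Random) -> List[int]:
    """Fill a fully masked sequence in `num_steps` forward passes."""
    tokens: List[Optional[int]] = [None] * length
    for step in range(1, num_steps + 1):
        probs = model(tokens)  # one forward pass scores every position at once
        candidates = []  # (confidence, position, sampled token) for masked slots
        for pos, tok in enumerate(tokens):
            if tok is None:
                vocab = range(len(probs[pos]))
                sampled = rng.choices(vocab, weights=probs[pos])[0]
                candidates.append((probs[pos][sampled], pos, sampled))
        # Commit the most confident samples; the rest are re-predicted next step.
        candidates.sort(key=lambda c: c[0], reverse=True)
        num_to_commit = len(candidates) - masked_after(step, num_steps, length)
        for _, pos, sampled in candidates[:num_to_commit]:
            tokens[pos] = sampled
    # At the last step cos(pi / 2) is ~6e-17, so the schedule leaves nothing masked.
    assert None not in tokens, "every position should be filled by the last step"
    return [t for t in tokens if t is not None]


class ToyModel:
    """Hypothetical stand-in for the network: the same distribution everywhere."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, tokens: List[Optional[int]]) -> List[List[float]]:
        self.calls += 1
        return [[0.7, 0.1, 0.1, 0.1] for _ in tokens]


if __name__ == "__main__":
    model = ToyModel()
    out = parallel_decode(model, length=16, num_steps=4, rng=random.Random(0))
    print(len(out), model.calls)  # 16 4
```

Here the number of model calls equals `num_steps` (4) regardless of the sequence length (16), whereas token-by-token decoding would need one call per token. Decoupling the number of forward passes from the sequence length is what makes this style of decoding attractive for long audio.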

SoundStorm: Efficient parallel audio generation